Wednesday, April 6, 2011

emacs - regular expressions in lisp need to be double-escaped...why?

I've been playing around with Emacs Lisp, and I wanted to write a little function to do a regular expression search and replace. I had a heck of a time getting the regular expression to work because I didn't realize that the backslashes in front of the special constructs need to be doubled when the regexp is written in Lisp code (but not when using query-replace-regexp interactively!).

So for example, using query-replace-regexp interactively you can use

^\(.*\)[\t]-.*$

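but when writing elisp code the regexp lives in a string literal, so you need to double the backslashes of the regexp constructs (the \t stays single because it is a string escape for a tab character):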

^\\(.*\\)[\t]-.*$

I finally found a reference to this in a Steve Yegge article, but I was wondering if anyone knew why this is?
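
For context, here is a minimal sketch of the kind of function described above, using the double-escaped pattern. The function name is made up for illustration, and the choice to keep only the text before the tab-dash is an assumption about what the replacement was meant to do.

```elisp
;; Hypothetical helper (name and behavior are assumptions for illustration):
;; on every line of the form TEXT<TAB>-REST, keep only TEXT.
(defun my-strip-tab-dash-suffix ()
  "Replace each line of the form TEXT<TAB>-REST with just TEXT."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; In the string literal, "\\(" becomes \( for the regexp engine
    ;; (a group), while "\t" is a string escape for a real tab character.
    (while (re-search-forward "^\\(.*\\)[\t]-.*$" nil t)
      ;; "\\1" reaches replace-match as \1, meaning "text of group 1".
      ;; The second argument t keeps the case of the replacement as-is.
      (replace-match "\\1" t))))
```

The replacement string goes through the same string parse, which is why the group reference is written "\\1" rather than "\1".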

From stackoverflow
  • It's because you need to escape backslashes in strings. If you don't escape the backslash of \( in the string it turns out to be just (

    scottfrazer : Dang, the backslash escape thing just got me when I posted that :) I had to put \\\\( to get it right
    Kevin Tighe : Ok, that makes sense. Thanks!
    Chris Conway : The same thing is true of any language that doesn't have direct syntactic support for regexes and constructs them via string literals (e.g., Java, Javascript), right?
    bendin : ah yes, the dreaded leaning toothpick syndrome.
    Kevin Tighe : That's true Chris. I think I was thrown off in elisp because I wasn't used to escaping () and {} chars.
  • scottfrazer is correct: one escape is consumed when the string is read, and another when the regular expression is compiled. It's fairly easy to remember, but it can become a pain, especially when you're trying to match a literal backslash '\'. You end up having to write it four times, '\\\\', because the backslash has to be doubled once for the initial string parse and once more for the regular expression parse.

    And when you write about this problem on Stack Overflow you have to use 8 backslashes, because Markdown uses the backslash as an escape character as well.

  • You already have the answer, but a built-in aid for creating regular expressions inside Emacs is re-builder.

    M-x re-builder
    
    Kevin Tighe : Awesome, I didn't know about that. Next time I do this it should go much more smoothly. Thank you :).
  • FWIW, emacs-lisp-mode will fontify the special constructs (like "\\(" and "\\)") for you. You can then change the faces to something that stands out. (They are font-lock-regexp-grouping-construct and font-lock-regexp-grouping-backslash.)
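
The two parsing levels described in the answers can be checked directly, for instance with M-x ielm or in the *scratch* buffer. This is only a small sketch; the sample strings are made up for illustration.

```elisp
;; Level 1: the Lisp reader processes string escapes.
(length "\\(")                      ; 2: a backslash and an open paren
(string= "\(" "(")                  ; t: the lone backslash is simply dropped

;; Level 2: the regexp engine interprets what the reader left behind.
(string-match "\\(a+\\)b" "xaaab")  ; 1: \( \) form a group
(match-string 1 "xaaab")            ; "aaa"
(string-match "(a+)b" "xaaab")      ; nil: bare parens match literal parens

;; A literal backslash needs doubling at both levels, hence four.
(length "\\\\")                     ; 2: the regexp itself is \\
(string-match "\\\\" "C:\\temp")    ; 2: position of the backslash in C:\temp
```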
