in reply to Re: [emacs] converting perl regex into elisp regex

Hi Joe!

I know most of this and just experimented a little bit. It's a hack, but I think it's a good start 8)

Of course substituting \\ as \0 is only a temporary solution ...

$\="\n";

#--- flags
my $flag_interactive; # true => no extra escaping of backslashes

my $RE='\w*(a|b|c)\d\(';
$RE='\d{2,3}';

print $RE;

#--- hide pairs of backslashes
$RE=~s#\\\\#\0#g;

#--- toggle escaping of 'backslash constructs'
my $bsc='(){}|';
$RE=~s#[$bsc]#\\$&#g;  # escape them once
$RE=~s#\\\\##g;        # and erase double-escaping

#--- replace character classes
my %charclass=(
    w => 'word' ,  # TODO: emacs22 already knows \w ???
    d => 'digit',
    s => 'space'
);

my $kc=join "|",keys %charclass;
$RE=~s#\\($kc)#[[:$charclass{$1}:]]#g;

#--- unhide pairs of backslashes
$RE=~s#\0#\\\\#g;

#--- escape backslashes for elisp string
$RE=~s#\\#\\\\#g unless $flag_interactive;

print $RE;
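
One way the temporary \0 placeholder might be made more robust is to pick a character that is verified to be absent from the input first. This is only a minimal sketch: `pick_placeholder` is a hypothetical helper, not part of the script, and it assumes that a single unused control character is enough (none of the later steps introduce control characters).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the script above: return one control
# character that does not occur in $re, so it can safely stand in for "\\".
sub pick_placeholder {
    my ($re) = @_;
    for my $code (0 .. 8, 11 .. 31) {       # control chars, skipping tab and newline
        my $ch = chr $code;
        return $ch if index($re, $ch) < 0;
    }
    die "no unused control character available as placeholder\n";
}

my $RE = 'a\\\\b' . "\0";                   # regex text a\\b followed by a literal NUL
my $ph = pick_placeholder($RE);             # NUL is taken, so the next free one is used

(my $hidden = $RE) =~ s#\\\\#$ph#g;         # hide pairs of backslashes
# ... the other conversion steps would run here ...
(my $restored = $hidden) =~ s#\Q$ph\E#\\\\#g;   # unhide them again

print $restored eq $RE ? "round trip ok\n" : "round trip FAILED\n";
```

A single absent character avoids the ambiguity that a longer run of one repeated character could cause when the input already contains that character next to a hidden pair.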

Do you see any problems?

Cheers Rolf

Re^3: [emacs] converting perl regex into elisp regex
by LanX (Saint) on Sep 18, 2009 at 03:30 UTC
    This version translates your example well. IMHO only the two TODOs marked in the code are left to be covered.

    /usr/bin/perl -w /tmp/plre2el.pl
    Perlcode: "(.*?)"
    Elispcode: \"\\(.*?\\)\"

    use strict;
    use warnings;

    # version 0.2

    $\="\n";

    #--- flags
    my $flag_interactive; # true => no extra escaping of backslashes

    my $RE='\w*(a|b|c)\d\(';
    $RE='\d{2,3}';
    $RE='"(.*?)"';

    print "Perlcode: $RE";

    #--- hide pairs of backslashes
    $RE=~s#\\\\#\0#g;
    # TODO check for suitable long "hidesequence" instead of a simple \0

    #--- TODO normalisation of needless escaping
    # e.g. from /\"/ to /"/, since it's no difference in perl but might
    # confuse elisp

    #--- toggle escaping of 'backslash constructs'
    my $bsc='(){}|';
    $RE=~s#[$bsc]#\\$&#g;  # escape them once
    $RE=~s#\\\\##g;        # and erase double-escaping

    #--- replace character classes
    my %charclass=(
        w => 'word' ,  # TODO: emacs22 already knows \w ???
        d => 'digit',
        s => 'space'
    );

    my $kc=join "|",keys %charclass;
    $RE=~s#\\($kc)#[[:$charclass{$1}:]]#g;

    #--- unhide pairs of backslashes
    $RE=~s#\0#\\\\#g;

    #--- escaping for elisp string
    unless ($flag_interactive){
        $RE=~s#\\#\\\\#g;  # ... backslashes
        $RE=~s#"#\\"#g;    # ... quotes
    }

    print "Elispcode: $RE";
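
    The "normalisation of needless escaping" TODO could look roughly like the sketch below. It matters beyond quotes because Emacs gives sequences such as \< , \> , \= , \' and \` a special meaning, while in Perl a backslash before a non-word character only makes that character literal. `normalise_escapes` is a hypothetical name, and the list of characters is an assumption: a deliberately short set that is never a metacharacter in Perl regexes, not a complete one.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: drop backslashes that Perl does not need.
    # The character list is an assumption and intentionally conservative.
    sub normalise_escapes {
        my ($re) = @_;
        $re =~ s#\\\\#\0#g;                  # hide pairs of backslashes
        $re =~ s#\\(["'<>=,:;!%&~`])#$1#g;   # \" -> ", \< -> <, ...
        $re =~ s#\0#\\\\#g;                  # unhide pairs of backslashes
        return $re;
    }

    print normalise_escapes($_), "\n" for '\"(.*?)\"', '\<a\>', '\d\(';
    ```

    In the script this step would sit where the TODO comment is, i.e. after the pairs of backslashes are hidden, so the hide and unhide lines inside the function would then be redundant.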

    Cheers Rolf

    Please note: XEmacs knows "Raw Strings" where escaping is not necessary, but I doubt that normal Perl RE syntax can be used from within GNU Emacs Lisp because of the quoting problem, so in general the conversion can only be done with Perl!

    UPDATE:

    -TODO: Just noticed that I still need special treatment for escape sequences like \t and \n. I'll add this tomorrow.
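
    A possible shape for that special treatment, sketched under one assumption: the result is pasted into an Emacs Lisp string literal, where the Lisp reader itself turns \t and \n into a tab and a newline. So these two must keep a single backslash while all other backslashes are doubled. `escape_for_elisp_string` is a hypothetical name, and \x01 is used as a second hide marker on the assumption that it does not occur in the input.

    ```perl
    use strict;
    use warnings;

    # Hypothetical replacement for the final "escaping for elisp string" step.
    sub escape_for_elisp_string {
        my ($re) = @_;
        $re =~ s#\\\\#\0#g;            # hide pairs of backslashes
        $re =~ s#\\([tn])#\x01$1#g;    # hide \t and \n behind a second marker
        $re =~ s#\\#\\\\#g;            # double the remaining backslashes
        $re =~ s#"#\\"#g;              # escape quotes
        $re =~ s#\x01([tn])#\\$1#g;    # restore \t and \n with a single backslash
        $re =~ s#\0#\\\\\\\\#g;        # unhide: each pair becomes four backslashes
        return $re;
    }

    print escape_for_elisp_string($_), "\n" for 'a\tb', '\d+\n', '\\\\t';
    ```

    The interactive case is not covered here: without the Lisp reader, a literal tab or newline character would have to be inserted instead.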

    last version re_pl2el.pl

      This looks very useful. Has it been updated or put somewhere like CPAN or github?
        > Has it been updated or put somewhere like CPAN or github?

        nope! :)

        I was once exploring a way to use the regex-debugging interface as a cleaner approach° to get there, but ... too many projects.

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

        °) It was discussed here, try searching the archives.

        update

        Regarding footnote, see Dynamically inspecting Regex OP-Codes at runtime?