in reply to Re: Escape special characters for a LaTeX file
in thread Escape special characters for a LaTeX file

This should have produced a nice, fat warning.
The script never gets there; apparently /^"([^"]{2,})","(\d)","(.+)"$/ never matches because of carriage returns... Specifically,
(.+)"$
can't match
"\r\n (ends of strings from __DATA__ section)

(perl -E '"foo\r\n" =~ /foo$/ or say "cant match!"').

perl -i -pe 'tr/\r//d' latex.pl fixes that and the script starts to emit tons of warnings.

I thought $ works on Windows with carriage returns but it seems it doesn't...
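For what it's worth, the same fix can be sketched inside the script instead of editing the file: strip a trailing "\n" or "\r\n" from each record before applying the pattern. The sub name parse_record and the two sample records are made up for illustration; only the regex is taken from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): normalize the line
# ending, then apply the pattern quoted above. In list context it returns
# the three captured fields, or an empty list if the record does not match.
sub parse_record {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop "\n" or "\r\n" at the end of the record
    return $line =~ /^"([^"]{2,})","(\d)","(.+)"$/;
}

# Made-up sample records, one with CRLF and one with LF only.
for my $raw (qq{"Sonntag","0","text"\r\n}, qq{"Sonntag","0","text"\n}) {
    my @fields = parse_record($raw);
    print @fields ? "matched: @fields\n" : "no match\n";
}
```

Both records should match here, because the carriage return is removed before $ is ever tested.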

Re^3: Escape special characters for a LaTeX file
by AnomalousMonk (Archbishop) on Dec 14, 2014 at 15:43 UTC
    I thought $ works on Windows with carriage returns but it seems it doesn't...

    It does. The OS-specific line terminator is translated somewhere in Perl's I/O layers to a common  \n newline character (which, IIRC, is 0x0a linefeed), unless you're using binmode or some other form of raw read/write. The following code should execute identically (I believe) under any OS (I'm running this under Windoze 7):

    c:\@Work\Perl\monks\marek1703>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     for my $s (qq{foo\r\n}, qq{foo\r}, qq{foo\n}, qq{foo}) {
       if ($s =~ /foo$/) { print 'matched: ', pp $s; }
       else { print 'no match: ', pp $s; }
       }
    "
    no match: "foo\r\n"
    no match: "foo\r"
    matched: "foo\n"
    matched: "foo"

    See definition of  $ in Regular Expressions - Metacharacters and its general discussion in perlre and perlretut. The match failures are due to the  \r stuck in the middle of things.

    Also consider the following, which I also expect would work the same on any OS (it certainly works under Windows):

    use warnings;
    use strict;

    while( <DATA> ) {
        if (/^"([^"]{2,})","(\d)","(.+)"$/) {
            chomp;
            my ($one, $two, $three) = ($1, $2, $3);
            print "matched: ``$_'' ('$one', '$two', '$three') \n";
            }
        else {
            print "no match: ``$_'' \n";
            }
        }

    __DATA__
    "Sonntag, 26.05.2013 - 13:13:27","0","Lieber Herr % , & text text with + some %special characters"
    xyzzy
    "Mittwoch, 05.06.2013 - 18:12:09","0","Besten Dank, & hat prima geklappt. {Greetings!}"

      Well now I'm confused. Does his script do anything on Windows? (as in any output?)

        I never tried to run marek1703's script as originally posted (I didn't want to start a fight with that tarbaby), so I don't know. However, I think it should be possible to write a script that will run under any OS and will do what I think marek1703 wants done given that the  s/// problem is resolved!

Re^3: Escape special characters for a LaTeX file
by marek1703 (Acolyte) on Dec 14, 2014 at 15:54 UTC

    Thank you again!

    Funnily enough, I had already tried BrowserUK's suggestion in my script, but had commented it out.
    Now I have put it back:

    $text =~ s![#$%&~_}{^]!\\$&!g;

    But only one character is not escaped: "%"
    strange!

      $text =~ s![#$%&~_}{^]!\\$&!g;

      But only one character is not escaped: "%"

      The sequence  $% is interpreted as the Perl special variable  $% (see perlvar) and its current value is interpolated. Try  [#\$%&~_}{^] (escaping the  $ character) instead.

      Update: E.g.:

      c:\@Work\Perl\monks\marek1703>perl -wMstrict -le
      "my $sans_esc = qr/[#$%&~_}{^]/;
       print $sans_esc;
       ;;
       my $with_esc = qr/[#\$%&~_}{^]/;
       print $with_esc;
      "
      (?^:[#0&~_}{^])
      (?^:[#\$%&~_}{^])

      Try to use the quotemeta operator (or, equivalently, the \Q ... \E tags) on your search list. It should remove all problems of this sort.
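One way the quotemeta suggestion might look in practice is sketched below. It assumes the list of characters is kept in a single-quoted string, so that $% is never interpolated in the first place; as far as I know, \Q ... \E written directly in the pattern is applied after interpolation and would not prevent that by itself. The names $specials, $class and escape_latex are made up for this example.

```perl
use strict;
use warnings;

# Characters to escape. Single quotes keep Perl from interpolating $% here.
my $specials = '#$%&~_{}^';

# quotemeta backslashes every non-word character, so the result can be
# interpolated safely inside a bracketed character class.
my $class = quotemeta $specials;

# Hypothetical helper: put a backslash in front of each listed character.
sub escape_latex {
    my ($text) = @_;
    $text =~ s/([$class])/\\$1/g;
    return $text;
}

print escape_latex('50% of $10 & more_0'), "\n";   # prints: 50\% of \$10 \& more\_0
```

Note that the zeros in the sample string are left alone, since 0 never ends up in the character class.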

        Another way to completely defeat scalar or array interpolation in regexes (as in strings) is by use of  ' (single-quote) delimiters for the  s/// substitution:

        c:\@Work\Perl\monks\marek1703>perl -wMstrict -le
        "my $s = '-#-$-%-&-~-_-}-{-^-0-$%&~_}{^0';
         print qq{'$s'};
         ;;
         $s =~ s' (?= [#$%&~_{}^]) '\\'xmsg;
         print qq{'$s'};
        "
        '-#-$-%-&-~-_-}-{-^-0-$%&~_}{^0'
        '-\#-\$-\%-\&-\~-\_-\}-\{-\^-0-\$\%\&\~\_\}\{\^0'
        (This applies also to  m// and  qr// operators.) See perlop.

      Sorry! One strange thing is happening. I get this:
      1\0:\0\0
      Every "0" is escaped.

        ... all "0" are escaped.

        That's because '0' (zero) is part of the character class: the value of  $% happens to be 0 at the moment of regex compilation. Please see my update to the above.

        Update: E.g.:

        c:\@Work\Perl\monks\marek1703>perl -wMstrict -le
        "my $s = '-#-$-%-&-~-_-}-{-^-0-$%&~_}{^0';
         print qq{'$s'};
         ;;
         $s =~ s{ ([#\$%&~_{}^]) }{\\$1}xmsg;
         print qq{'$s'};
        "
        '-#-$-%-&-~-_-}-{-^-0-$%&~_}{^0'
        '-\#-\$-\%-\&-\~-\_-\}-\{-\^-0-\$\%\&\~\_\}\{\^0'
        and also (no capture):
        c:\@Work\Perl\monks\marek1703>perl -wMstrict -le
        "my $s = '-#-$-%-&-~-_-}-{-^-0-$%&~_}{^0';
         print qq{'$s'};
         ;;
         $s =~ s{ (?= [#\$%&~_{}^]) }{\\}xmsg;
         print qq{'$s'};
        "
        '-#-$-%-&-~-_-}-{-^-0-$%&~_}{^0'
        '-\#-\$-\%-\&-\~-\_-\}-\{-\^-0-\$\%\&\~\_\}\{\^0'