in reply to Escape special characters for a LaTeX file

BrowserUk's approach to your escaping problem is, of course, the way to go. However, in the hope that you may be spared some future headaches, a couple of features of the OPed code other than those already noted deserve comment.



my @special_characters = q{# $ % & ~ _ \\ { } ^};

This statement assigns a single element that is a string to an array (which it also creates).

c:\@Work\Perl>perl -wMstrict -MData::Dump -le "my @special_characters = q{# $ % & ~ _ \\ { } ^}; dd \@special_characters; print 'number of elements in array: ', scalar @special_characters; "
["# \$ % & ~ _ \\ { } ^"]
number of elements in array: 1
See perlintro, and "List value constructors" in perldata.
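If the intent was one array element per character, a minimal sketch is to split the single-quoted string on whitespace. (qw() looks tempting, but under warnings it complains about the '#' with "Possible attempt to put comments in qw() list".) The printed count is just for illustration.

```perl
use strict;
use warnings;

# q{} yields ONE string; split ' ' breaks it on whitespace into a list.
# Inside q{}, the \\ becomes a single backslash character.
my @special_characters = split ' ', q{# $ % & ~ _ \\ { } ^};

print scalar(@special_characters), " elements: @special_characters\n";
```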



foreach my $i (@special_characters) {
    $text =~ s/$special_characters[$i]/\\$&/g;
}

This loops over the single element of the array, so $i holds the whole string rather than an index. That string is not numeric (i.e., it doesn't "look like" a number), so when it is then used as an index into the array, Perl evaluates it as 0. This should have produced a nice, fat warning.

c:\@Work\Perl>perl -wMstrict -le "my @special_characters = q{# $ % & ~ _ \\ { } ^}; ;; foreach my $i (@special_characters) { print qq{>>$special_characters[$i]<<}; } "
Argument "# $ % & ~ _ \\ { } ^" isn't numeric in array element at -e line 1.
>># $ % & ~ _ \ { } ^<<
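For comparison, here is a sketch of the loop iterating over the characters themselves, with \Q...\E so that metacharacters such as $, { and ^ are matched literally. The sample text is made up. The backslash is deliberately left out of the list: in a sequential loop, escaping "\" in a later pass would re-escape the backslashes inserted by earlier passes, which is one reason the single-pass character class is preferable.

```perl
use strict;
use warnings;

# Hypothetical input text for illustration.
my $text = 'a_b & 50% #1';

# Backslash omitted on purpose (see above).
my @special_characters = split ' ', q{# $ % & ~ _ { } ^};

# Loop over the characters, not over array indices.
foreach my $char (@special_characters) {
    # \Q...\E quotes any metacharacter in $char; the replacement
    # puts a literal backslash in front of the matched character.
    $text =~ s/\Q$char\E/\\$char/g;
}

print "$text\n";
```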

Re^2: Escape special characters for a LaTeX file
by Anonymous Monk on Dec 14, 2014 at 14:46 UTC
    This should have produced a nice, fat warning.
    The script never gets there; apparently /^"([^"]{2,})","(\d)","(.+)"$/ never matches because of carriage returns... Specifically,
    (.+)"$
    can't match
    "\r\n (the line ending of each string read from the __DATA__ section).

    (perl -E '"foo\r\n" =~ /foo$/ or say "cant match!"').

    perl -i -pe 'tr/\r//d' latex.pl fixes that and the script starts to emit tons of warnings.
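    The same fix can be sketched in-script, on a hypothetical record shaped like the OP's __DATA__ lines: the pattern fails while the "\r\n" ending is present and matches once tr/\r//d has removed the carriage return.

```perl
use strict;
use warnings;

# Hypothetical record in the OP's format, with a CRLF line ending.
my $line = qq{"Sonntag, 26.05.2013 - 13:13:27","0","Lieber Herr %"\r\n};
my $re   = qr/^"([^"]{2,})","(\d)","(.+)"$/;

my $before = $line =~ $re ? 'match' : 'no match';

# Delete every carriage return, as the -pe one-liner does.
(my $clean = $line) =~ tr/\r//d;
my $after = $clean =~ $re ? 'match' : 'no match';

print "before: $before, after: $after\n";
```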

    I thought $ works on Windows with carriage returns but it seems it doesn't...

      I thought $ works on Windows with carriage returns but it seems it doesn't...

      It does. The OS-specific line terminator character(s) are translated somewhere in Perl's I/O layers (the :crlf layer on Windows) to a common \n newline character (which, IIRC, is 0x0a, linefeed), unless you're using binmode or some other form of raw read/write. The following code should execute identically (I believe) under any OS (I'm running this under Windoze 7):
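      A small sketch of that translation, assuming an in-memory filehandle with the :crlf layer pushed explicitly (the data is made up): reading through :crlf turns "\r\n" into "\n", while :raw leaves the "\r" in place, and only the translated lines satisfy a $-anchored pattern.

```perl
use strict;
use warnings;

# Hypothetical data with Windows (CRLF) line endings.
my $data = "foo\r\nbar\r\n";

# :crlf converts each CR,LF pair to a single "\n" on read.
open my $fh, '<:crlf', \$data or die "Cannot open in-memory handle: $!";
my @lines = <$fh>;
close $fh;

# :raw does no translation, so the "\r" survives.
open my $raw, '<:raw', \$data or die "Cannot open in-memory handle: $!";
my @raw_lines = <$raw>;
close $raw;

for my $line (@lines, @raw_lines) {
    print $line =~ /^\w+$/ ? "matched\n" : "no match\n";
}
```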

      c:\@Work\Perl\monks\marek1703>perl -wMstrict -le "use Data::Dump qw(pp); ;; for my $s (qq{foo\r\n}, qq{foo\r}, qq{foo\n}, qq{foo}) { if ($s =~ /foo$/) { print 'matched: ', pp $s; } else { print 'no match: ', pp $s; } } "
      no match: "foo\r\n"
      no match: "foo\r"
      matched: "foo\n"
      matched: "foo"
      See the definition of $ under "Metacharacters" in the "Regular Expressions" section of perlre, and its general discussion in perlre and perlretut. The match failures are due to the \r stuck between foo and the end of the string (or the final \n).
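      Instead of cleaning the data, the pattern itself can tolerate an optional carriage return before the $ anchor. A minimal sketch on the same four strings:

```perl
use strict;
use warnings;

my @matched;
for my $s ("foo\r\n", "foo\r", "foo\n", "foo") {
    # \r? absorbs an optional carriage return just before $.
    push @matched, $s if $s =~ /foo\r?$/;
}

print scalar(@matched), " of 4 matched\n";
```

      Applied to the OP's pattern, the same idea would end in (.+)"\r?$ instead of (.+)"$.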

      Also consider the following, which I also expect would work the same on any OS (it certainly works under Windows):

use warnings;
use strict;

while( <DATA> ) {
    if (/^"([^"]{2,})","(\d)","(.+)"$/) {
        chomp;
        my ($one, $two, $three) = ($1, $2, $3);
        print "matched: ``$_'' ('$one', '$two', '$three') \n";
    }
    else {
        print "no match: ``$_'' \n";
    }
}

__DATA__
"Sonntag, 26.05.2013 - 13:13:27","0","Lieber Herr % , & text text with some %special characters"
xyzzy
"Mittwoch, 05.06.2013 - 18:12:09","0","Besten Dank, & hat prima geklappt. {Greetings!}"
        Well, now I'm confused. Does his script do anything on Windows (as in, produce any output)?

      Thank you again!

      Funnily enough, I had already tried BrowserUk's suggestion in my script, but it was commented out.
      Now I put it back:

      $text =~ s![#$%&~_}{^]!\\$&!g;

      But one character is still not escaped: "%".
      Strange!

        $text =~ s![#$%&~_}{^]!\\$&!g;

        But only one character is not escaped: "%"

        The sequence $% is interpreted as the Perl special variable $% (see perlvar), and its current value (0 by default) is interpolated, so the character class contains neither $ nor %. Try [#\$%&~_}{^] (escaping the $ character) instead.

        Update: E.g.:

        c:\@Work\Perl\monks\marek1703>perl -wMstrict -le "my $sans_esc = qr/[#$%&~_}{^]/; print $sans_esc; ;; my $with_esc = qr/[#\$%&~_}{^]/; print $with_esc; "
        (?^:[#0&~_}{^])
        (?^:[#\$%&~_}{^])

        Try using the quotemeta function (or, equivalently, the \Q ... \E escapes) on your search list, keeping the list itself in a single-quoted string. That should remove all problems of this sort.
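        A sketch of that approach, with a made-up sample string. The list lives in a single-quoted string first because \Q only quotes the result of interpolation: writing \Q#$%\E directly in the pattern would still interpolate $% before quoting.

```perl
use strict;
use warnings;

# Single quotes: no interpolation, so '$%' stays a literal dollar and percent.
my $specials = '#$%&~_{}^';

# quotemeta backslashes every non-word character, so each one is
# literal inside the character class.
my $quoted = quotemeta $specials;
my $class  = qr/([$quoted])/;

# Hypothetical input text for illustration.
my $text = 'a_b 50% $x {y}';
(my $escaped = $text) =~ s/$class/\\$1/g;

print "$escaped\n";
```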

        Sorry! One strange thing is happening. I get this:
        1\0:\0\0
        All the "0" characters are escaped.
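        That output follows directly from the $% interpolation described above. A small reproduction, assuming an input like '10:00': with the unescaped class, the pattern becomes [#0&~_}{^] and every "0" gets a backslash; with \$ the class really contains $ and %, and the string is left alone.

```perl
use strict;
use warnings;

# Hypothetical input resembling the OP's output.
my $text = '10:00';

# Unescaped: "$%" interpolates the format variable $% (0 by default).
(my $buggy = $text) =~ s![#$%&~_}{^]!\\$&!g;

# Escaped: \$ keeps the dollar sign, and % is then just a literal.
(my $fixed = $text) =~ s![#\$%&~_}{^]!\\$&!g;

print "buggy: $buggy\nfixed: $fixed\n";
```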