in reply to Removing the carriage return in a Find & Replace?

Per the HTML standard, <CR>, <LF> and <space> are interchangeable separators in an HTML document. Moreover, a run of two or more separators is treated as a single separator.

So, if you want to catch <TD><FONT FACE=arial SIZE=-1> with a regex you should expect 0 or more separators wherever a separator is optional and 1 or more wherever it is required. That said, I think that /<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/ should be enough.

Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re^2: Removing the carriage return in a Find & Replace?
by bobafifi (Beadle) on Sep 26, 2008 at 12:04 UTC
    Thanks for the quick reply psini!
    Using your suggestion, I just tried
    perl -i -pe 's/<TD>\s*<FONT FACE=arial SIZE=-1>/widget/g' * test.php
    unfortunately it didn't work.

    However, when I remove the carriage return in the html and run
    perl -i -pe 's/<TD><FONT FACE=arial SIZE=-1>/widget/g' * test.php
    no problem. Not sure why, but the \s* doesn't seem to be recognized.

    Thanks again,
    Bob

      Because you've told perl to read the file a line at a time (or rather, you haven't told it to do otherwise, and line-by-line is the default), $_ will only contain <TD>\n, and the next line will have <FONT ....>. At no point is the entire text you expect to match in $_ at the same time, so the match never happens and the substitution never triggers.
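
      To see this for yourself, here is a small sketch that prints what ends up in $_ for each record. The two-line sample input is made up to mirror the HTML in question.

      ```shell
      # Hypothetical two-line input; printf expands the \n escapes.
      html='<TD>\n<FONT FACE=arial SIZE=-1>\n'

      # Default record separator: perl hands the code one line at a time,
      # so <TD> and <FONT ...> are never in $_ together.
      records=$(printf "$html" | perl -ne 'chomp; print "record $.: [$_]\n"')
      echo "$records"

      # Count the lines on which the two-tag pattern matches.
      hits=$(printf "$html" | perl -ne '$n++ if /<TD>\s*<FONT/; END { print $n + 0 }')
      echo "matches in line mode: $hits"
      ```

      Two separate records come out, and the match count stays at zero.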

      See the documentation for the -0 switch in perlrun, specifically the part about turning on paragraph mode.
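
      As a rough sketch of what that switch does (sample input made up, output sent to STDOUT instead of using -i): -00 selects paragraph mode, which only helps if no blank line separates the two tags, while -0777 reads the whole input as a single record.

      ```shell
      # Hypothetical input with the two tags on separate lines.
      html='<TD>\n<FONT FACE=arial SIZE=-1>text\n'

      # -00: paragraph mode, a record ends at one or more blank lines.
      para=$(printf "$html" | perl -00 -pe 's/<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/widget/g')

      # -0777: slurp mode, the whole input is one record.
      slurp=$(printf "$html" | perl -0777 -pe 's/<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/widget/g')

      echo "$para"
      echo "$slurp"
      ```

      Both commands should print widgettext for this sample.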

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

        The -p option you use reads the input one line at a time. For Perl, \n isn't the same as a space, even if it is for HTML. One solution is to undef $/ in order to enable "slurp mode" (or to use the aforementioned command-line option -0, for example as -0777):

        perl -i -pe 'BEGIN { undef $/ } s/<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1> +/widget/g' test.html
        I just tried mscharrer's variation on this and it worked!
        perl -i -pe 'BEGIN { undef $/ } s/<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/widget/g' test.html

        Thanks so much!
        Bob

      Are you sure it is a CR and not some evil non-printable character used by MS?

      Try editing the file with a text editor (not a word processor!), delete the current newline character, insert a CR and try again. If it works, the remaining problem is to find out which newline character the file actually uses.
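
      If you'd rather check from the command line, something along these lines should show which line-ending bytes are in a file. It is only a sketch: the temporary sample file with CR/LF endings is made up, so point the commands at your own file instead.

      ```shell
      # Hypothetical sample file with Windows-style CR/LF line endings.
      f=$(mktemp)
      printf '<TD>\r\n<FONT FACE=arial SIZE=-1>\r\n' > "$f"

      # Count CR and LF bytes; tr with an empty replacement list only counts.
      counts=$(perl -0777 -ne 'printf "CR=%d LF=%d", tr/\r//, tr/\n//' "$f")
      echo "$counts"

      # Dump the first bytes so any other control character becomes visible.
      od -c "$f" | head -n 2
      rm -f "$f"
      ```

      A file with only LF endings would report CR=0, and a file with old Mac-style endings would report LF=0.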

      Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

        I'm on a Mac using TextEdit in text mode (no MS) and Terminal to run Perl.
        Have you been able to get my example to work on your machine? Thanks,
        Bob