dsayars has asked for the wisdom of the Perl Monks concerning the following question:

I'm using ActivePerl on Windows on a file that contains \r\n carriage returns, but I can't match them in my Perl regex. Neither of these expressions works:

<cp IX='1'\/>([A-Z][A-Z][A-Z])(.*?)\r\n>

<cp IX='1'\/>([A-Z][A-Z][A-Z])(.*?)\x0d\x0a

Both expressions work in EditPad Pro. I ended up having to replace every instance of "\r\n" with "crlf" and use this expression:

cp IX='1'\/>([A-Z][A-Z][A-Z])(.*?)crlf

Is Perl not supposed to recognize both "\r\n" and "\x0d\x0a" as carriage returns? They are definitely present in the file or I couldn't have found and replaced them with "crlf".

Replies are listed 'Best First'.
Re: ActivePerl won't match carriage returns (binmode)
by tye (Sage) on Nov 01, 2011 at 22:12 UTC
Re: ActivePerl won't match carriage returns
by GrandFather (Saint) on Nov 02, 2011 at 00:26 UTC

    By default Perl uses an I/O translation layer to convert between whatever character it uses internally for \n and the native line ending sequence on the host OS so that generally Perl just does the right thing for native text files.
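
    A minimal sketch of what that translation does to a regex, assuming the usual Windows default where text files are read through the :crlf layer. The sample data and the helper name slurp_with_layer are made up for illustration; the layer is requested explicitly here so the script behaves the same on any platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file with explicit CRLF line endings.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                        # raw bytes: no newline translation on write
print {$out} "AAA\r\nBBB\r\n";
close $out or die "close failed: $!";

# Hypothetical helper: read a whole file through the given PerlIO layer.
sub slurp_with_layer {
    my ($layer, $file) = @_;
    open my $in, "<$layer", $file or die "Cannot open $file: $!";
    local $/;                        # slurp mode: read everything at once
    my $text = <$in>;
    close $in;
    return $text;
}

my $translated = slurp_with_layer(':crlf', $path);  # CRLF is turned into "\n"
my $raw        = slurp_with_layer(':raw',  $path);  # bytes arrive untouched

print "crlf layer, /\\r\\n/ matches: ", ($translated =~ /\r\n/ ? "yes" : "no"), "\n";
print "raw layer,  /\\r\\n/ matches: ", ($raw        =~ /\r\n/ ? "yes" : "no"), "\n";
```

    If the file really has to be matched byte for byte, calling binmode on the handle after opening it (as the title of the first reply hints) should have the same effect as the :raw layer used above. Otherwise, matching on \n alone is usually enough once the layer has done its work.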

    Not directly related to your immediate question: using regexen for hand parsing XML is thought a bad thing because it is really hard to get right in the general case. You may do much better to investigate modules like XML::Twig to do the heavy lifting of parsing the XML.

    True laziness is hard work
Re: ActivePerl won't match carriage returns
by SuicideJunkie (Vicar) on Nov 01, 2011 at 22:05 UTC

    You should show your actual code.

    Did you read the whole file or just one line at a time? Perhaps you forgot to give the regex the multi-line option so that you can match across newlines?
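
    For reference, two different modifiers tend to get called "multi-line" in Perl: /s lets . match a newline, while /m lets ^ and $ match at internal line boundaries. A small sketch with made-up sample text shows the difference:

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";

# Without /s, "." refuses to match the newline between the two words.
my $dot_default = $text =~ /first.second/  ? 1 : 0;
my $dot_s       = $text =~ /first.second/s ? 1 : 0;

# Without /m, "^" only matches at the very start of the string.
my $anchor_default = $text =~ /^second$/  ? 1 : 0;
my $anchor_m       = $text =~ /^second$/m ? 1 : 0;

print "dot: default=$dot_default, /s=$dot_s\n";
print "anchors: default=$anchor_default, /m=$anchor_m\n";
```

    A pattern that spells out its line endings explicitly, rather than relying on . or on anchors, should not need either modifier.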

      Thanks for the reply. I used the global /g. It's searching for certain code sequences in a huge XML file. Here are the actual matching statements:

      while ($text=~/IX='1'\/>([A-Z][A-Z][A-Z])(.*?)\r\n(.*?)\r\n|IX='0'\/>\r\n([A-Z][A-Z][A-Z])(.*?)\r\n(.*?)\r\n/g)              # doesn't work

      while ($text=~/IX='1'\/>([A-Z][A-Z][A-Z])(.*?)CRLF(.*?)CRLF|IX='0'\/>CRLF([A-Z][A-Z][A-Z])(.*?)CRLF(.*?)CRLF/g)     # works
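
      One way to make a pattern like the first one indifferent to how the file was read is to write each line ending as \r?\n. The sketch below uses a simplified version of the first alternative only; the sample text and the name extract_records are hypothetical, since the real file's contents are not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: collect (code, rest of line, following line) triples.
sub extract_records {
    my ($text) = @_;
    my @records;
    # \r?\n matches both a translated "\n" and an untranslated "\r\n".
    while ($text =~ /IX='1'\/>([A-Z]{3})(.*?)\r?\n(.*?)\r?\n/g) {
        push @records, [$1, $2, $3];
    }
    return @records;
}

# Made-up sample data in both line-ending styles.
my $lf_text   = "<cp IX='1'/>ABC first\nsecond line\n<cp IX='1'/>DEF third\nfourth line\n";
(my $crlf_text = $lf_text) =~ s/\n/\r\n/g;

my @from_lf   = extract_records($lf_text);
my @from_crlf = extract_records($crlf_text);

printf "%s|%s|%s\n", @$_ for @from_lf;
print "same record count for CRLF input: ",
    (@from_lf == @from_crlf ? "yes" : "no"), "\n";
```

      Because .*? is lazy and . does not match \n by default, each capture stops at the first line ending it reaches, whichever form that ending takes.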