Sandy has asked for the wisdom of the Perl Monks concerning the following question:

UPDATE!!

It isn't a regex problem at all! It was the way I was reading in the file.

I erroneously had $/ = ''; (paragraph mode), so the read only picked up my data as far as the first blank line.

When I changed the above code to undef $/;, everything worked fine.
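For anyone who lands here with the same symptom, here is a minimal sketch of the difference between the two settings. It is not the original script: the sample text and the helper first_record() are made up for illustration, and it reads from an in-memory filehandle, which needs Perl 5.8 or later (so it would not run on the 5.6.1 mentioned below).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: two "paragraphs" separated by one blank line.
my $text = "My Header\nPage 53\nSome Text\n\nMy Header\nPage 54\n";

# Hypothetical helper: read the first record from an in-memory
# filehandle using the given value of $/ (the input record separator).
sub first_record {
    my ($separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $record = <$fh>;
    close $fh;
    return $record;
}

my $paragraph = first_record('');       # paragraph mode: stops at the blank line
my $slurped   = first_record(undef);    # slurp mode: reads everything

printf "paragraph mode read %d of %d characters\n",
    length($paragraph), length($text);
printf "slurp mode read %d of %d characters\n",
    length($slurped), length($text);
```

In paragraph mode the second header never makes it into the string, so no regex could find it. Using local keeps the change to $/ from leaking into the rest of the program.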

Oops!

END_OF_UPDATE

Hello all,

I have a file that I need to parse. Each page has a header and a page number.

At this point, I am having difficulties just grabbing the header for each page (part of a more complex regex).

Problem:

If there is an extra blank line in front of page 54's header, the regex does not find that page (or any page after it). If the blank line is replaced with an 'x', the page header is found.
Code & output when it doesn't work:
#!/usr/bin/perl
use warnings;
use strict;

my $header = join("\\s*\\n",
    'My\s+Header',
    'Page\s+\d+',
);

$/ = '';
my $file = <DATA>;

while ($file =~/($header\n+)/g) {
    print "pos = ", pos($file),"\n";
    print $1,"\n";
}
__DATA__
My Header
Page 53
Some Text
Some More Text
Some More Text
Some More Text

My Header
Page 54
3 Chapter Title
My Header
Page 55
Some Text
Some More Text
Some More Text
Some More Text
Result:
pos = 18
My Header
Page 53
Code & output when it does work:

same code, slightly different data

#!/usr/bin/perl
use warnings;
use strict;

my $header = join("\\s*\\n",
    'My\s+Header',
    'Page\s+\d+',
);

$/ = '';
my $file = <DATA>;

while ($file =~/($header\n+)/g) {
    print "pos = ", pos($file),"\n";
    print $1,"\n";
}
__DATA__
My Header
Page 53
Some Text
Some More Text
Some More Text
Some More Text
x
My Header
Page 54
3 Chapter Title
My Header
Page 55
Some Text
Some More Text
Some More Text
Some More Text
Result:
pos = 18
My Header
Page 53

pos = 93
My Header
Page 54

pos = 127
My Header
Page 55
I have checked the data with a binary editor, and nothing weird is on the blank line.

I'm using ActiveState Perl v5.6.1 on a Windows 2000 Professional machine.

Any insight would be appreciated.

Sandy

UPDATE:

Just a note: the code above was part of a series of tests. The same result is obtained even if I use the 'sgmx' modifiers for the regex.
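One way to take the file reading out of the picture is to run the same pattern against a string built in the script. The sketch below does that with a hand-typed copy of the failing data, blank line included. The @pages array and the shortened data are additions for illustration, not part of the original test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same header pattern as in the post.
my $header = join("\\s*\\n", 'My\s+Header', 'Page\s+\d+');

# Hypothetical in-memory copy of the failing data (shortened),
# with the blank line in front of page 54's header.
my $file = join "\n",
    'My Header', 'Page 53', 'Some Text', 'Some More Text',
    '',
    'My Header', 'Page 54', '3 Chapter Title',
    'My Header', 'Page 55', 'Some Text', '';

# Collect the page number of every header the pattern finds.
my @pages;
while ($file =~ /($header\n+)/g) {
    my $match = $1;    # copy before running another match
    my ($page) = $match =~ /Page\s+(\d+)/;
    push @pages, $page;
}

print "pattern: $header\n";
print "pages found: @pages\n";
```

With the whole string in memory the pattern finds all three headers, blank line or not, which points the finger at how $file gets filled rather than at the regex or its modifiers.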

Replies are listed 'Best First'.
Re: Need help with a regex
by GrandFather (Saint) on Sep 15, 2006 at 18:53 UTC

    One of the ways PerlMonks works best is when you distill your problem down to a small test case. If you don't find the answer during distillation, then there's a 90% chance you will see it 10 seconds after clicking "create".

    Well posted node!


    DWIM is Perl's answer to Gödel