It wasn't a regex problem at all! It was the way I was reading in the file.
I had erroneously set $/ = ''; (paragraph mode), so the read stopped at the first blank line and the regex never saw the rest of my data.
When I changed that to undef $/; everything worked fine.
Oops!
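For anyone who hits the same thing, here is a minimal sketch of the difference between the two settings of $/. It is not the code from my script: the sample text and the first_record helper are made up for illustration, and it reads from an in-memory filehandle, which assumes Perl 5.8 or later (so it will not run as-is on 5.6.1).

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample text: two blocks separated by one blank line.
my $text = "My Header\nPage 53\nSome Text\n\nMy Header\nPage 54\n";

# Read the first record from an in-memory filehandle, using the given
# value for the input record separator $/.
# (In-memory filehandles are an assumption here; they need Perl 5.8+.)
sub first_record {
    my ($separator) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = $separator;    # '' = paragraph mode, undef = slurp mode
    my $record = <$fh>;
    close($fh);
    return $record;
}

my $paragraph = first_record('');       # stops at the first blank line
my $slurped   = first_record(undef);    # reads the whole file

print "paragraph mode read ", length($paragraph), " characters\n";
print "slurp mode read     ", length($slurped),   " characters\n";
print "page 54 seen in paragraph mode? ",
      ($paragraph =~ /Page 54/ ? "yes" : "no"), "\n";
print "page 54 seen in slurp mode?     ",
      ($slurped =~ /Page 54/ ? "yes" : "no"), "\n";
```

Using local $/ inside the sub keeps the change from leaking into the rest of the program, which is probably a safer habit than assigning to $/ globally as I did.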
END_OF_UPDATE
Hello all,
I have a file that I need to parse. Each page has a header and a page number.
At this point, I am having difficulties just grabbing the header for each page (part of a more complex regex).
Problem:
If there is an extra line in front of page 54's header, the regex does not find this page. If the blank line is replaced with an 'x', the page header will be found.

Code & output when it doesn't work:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $header = join("\\s*\\n",
        'My\s+Header',
        'Page\s+\d+',
    );
    $/ = '';
    my $file = <DATA>;
    while ($file =~/($header\n+)/g) {
        print "pos = ", pos($file),"\n";
        print $1,"\n";
    }
    __DATA__
    My Header
    Page 53
    Some Text
    Some More Text
    Some More Text
    Some More Text

    My Header
    Page 54
    3 Chapter Title
    My Header
    Page 55
    Some Text
    Some More Text
    Some More Text
    Some More Text

Result:

    pos = 18
    My Header
    Page 53
Same code, slightly different data:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $header = join("\\s*\\n",
        'My\s+Header',
        'Page\s+\d+',
    );
    $/ = '';
    my $file = <DATA>;
    while ($file =~/($header\n+)/g) {
        print "pos = ", pos($file),"\n";
        print $1,"\n";
    }
    __DATA__
    My Header
    Page 53
    Some Text
    Some More Text
    Some More Text
    Some More Text
    x
    My Header
    Page 54
    3 Chapter Title
    My Header
    Page 55
    Some Text
    Some More Text
    Some More Text
    Some More Text

Result:

    pos = 18
    My Header
    Page 53

    pos = 93
    My Header
    Page 54

    pos = 127
    My Header
    Page 55
I'm using ActiveState Perl v5.6.1 on a Windows 2000 Professional machine.
Any insight would be appreciated.
Sandy
UPDATE:
Just a note: the code above was part of a series of tests. The same result is obtained even if I use the 'sgmx' modifiers on the regex.
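A quick way to check whether the pattern or the input is at fault is to run the same header pattern against a string that is known to hold the whole file, and against only the part before the first blank line. The sketch below does that. The data is a shortened, made-up version of mine, and count_headers and both variable names are hypothetical; only the $header pattern is taken from the post.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Same header pattern as in the post, precompiled with qr//.
my $header    = join("\\s*\\n", 'My\s+Header', 'Page\s+\d+');
my $header_re = qr/($header\n+)/;

# Count how many times the header pattern matches in a string.
sub count_headers {
    my ($string) = @_;
    my $count = 0;
    $count++ while $string =~ /$header_re/g;
    return $count;
}

# Hypothetical data with a blank line in front of page 54's header.
my $whole_file = "My Header\nPage 53\nSome Text\n\n"
               . "My Header\nPage 54\n3 Chapter Title\n"
               . "My Header\nPage 55\nSome Text\n";

# Everything up to (not including) the first blank line: the portion
# of the data a paragraph-mode read stops at.
my ($first_paragraph) = $whole_file =~ /\A(.*?\n)\n/s;

print "pattern: $header\n";
print "headers in whole file:      ", count_headers($whole_file), "\n";
print "headers in first paragraph: ", count_headers($first_paragraph), "\n";
```

With this data the pattern finds three headers in the whole string and one in the first paragraph, so the blank line does not bother the pattern itself. That is consistent with the modifiers making no difference: they change how the regex matches, not how much of the file ends up in the string.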
In reply to Need help with a regex by Sandy