in reply to Problem extracting date with regex
It's best to enclose code in <CODE>..</CODE> tags so that it looks like this:
$dir='C:/texts/';
opendir(directory,$dir) or die "cant";
while($file=readdir directory){
    next if $file=~/^\./;
    $rfname=$dir.$file;
    # print "Found file: '$rfname'\n";
    open (CONT, $rfname);
    while (<CONT>){
        if($_=~m/[0-3]?[0-9(th)?(st)?(nd)?(rd)?]\s+(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[0-9]?[0-9]?[0-9][0-9]/ig){
            print "$file\t $_\n";
        }
        elsif($_=~m/(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[1-3]?[0-9](th)?(nd)?(st)?(rd)?\s+[0-9]?[0-9]?[0-9][0-9]/ig){
            print "$file\t $_\n";
        }
    }
}
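Looking at your code, it prints the name of the file and the complete line whenever the line matches the regular expression. If that's not what you want, you'll need to capture the part of the match you're after using parentheses and print the value of $1, not $_. Note that in both patterns the first parenthesised group is the month name, so $1 would hold only the month; wrap the whole date pattern in an outer pair of parentheses if you want the full date.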
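Here's a rough sketch of what capturing the date might look like. It isn't your regex: the pattern is a simplified stand-in for the day-month-year case only, and the sub name extract_date and the sample line are made up for illustration. The inner groups are written as (?:...) so they don't capture, which leaves the outer parentheses as group 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the first
# "day month year" date found in a line, or undef if there is none.
sub extract_date {
    my ($line) = @_;

    # Month names; (?:...) groups do not capture, so they leave $1 alone.
    my $month = qr/(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)/i;

    # The outer parentheses are the only capturing group, so $1 is the whole date.
    if ($line =~ /\b([0-3]?[0-9](?:st|nd|rd|th)?\s+$month\s+[0-9]{2,4})\b/) {
        return $1;
    }
    return;
}

# Made-up sample line for illustration.
my $line = "The meeting was held on 22nd June 2000 in London.";
my $date = extract_date($line);
print "$date\n" if defined $date;    # prints "22nd June 2000"
```

Inside your while (<CONT>) loop you would then print "$file\t$1\n" (or the returned value) rather than $_.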
RE: Re: Problem extracting date with regex
by Adam (Vicar) on Jun 22, 2000 at 05:21 UTC