Regular Expression help

joec_ has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have been trying for most of today to get a regular expression working for this scenario, but can't seem to arrive at the right answer. Would you have any ideas?

I have this text and I need to extract certain portions of it:

Name "main::dbh" used only once: possible typo at /home/joe/test.pl line 60.
Script version 1.02 Test version 2.0.2

10 hashes read from index file
hashes from 1 database(s)

2 items read, 2 found, 2 written
Can't call method "disconnect" on an undefined value at /home/joe/test.pl line 66.
abc wwdddddddd 123.23
def wwdddddddd 456.56

The bits I need are:

items read
items found
abc
wwdddddddd (two letters and 8 digits)
123.23

and the same for def. The letters and numbers may have the form

wwdddddddd:wwdddddddd:wwddddddddd

This is dependent on the outcome of another script.
Any ideas for a regex would be useful...
Thanks in advance...
Joe

P.S. Don't worry about the warnings; they are output as the result of another script. Also, the abc / def names could be any length and contain any characters.

-----

Eschew obfuscation, espouse elucidation!


Re: Regular Expression help by ikegami (Patriarch) on Jun 22, 2009 at 17:10 UTC
```perl
# $fh is assumed to be a filehandle already opened on the output.

# Skip ahead to the summary line and capture the two counts.
my ($read, $found);
for (;;) {
   defined( $_ = <$fh> )
      or die("Premature eof\n");

   ($read, $found) = /^(\d+) items read, (\d+) found,/
      and last;
}

# Collect data lines until the first line that does not match.
my @foos;
while (<$fh>) {
   next if /^Can't call method "disconnect"/;

   my ($field0, $field1, $field2) = /
      ^
      ( [a-z]+ )
      \s+
      ( [a-z]{2} [0-9]{8} (?: : [a-z]{2} [0-9]{8} )* )
      \s+
      ( \d+(\.\d+)? )
      $
   /x
      or last;

   # A colon-separated field becomes a list of its parts.
   my @field1s = split /:/, $field1;
   push @foos, [ $field0, \@field1s, $field2 ];
}

# ... do something with $read, $found and @foos ...
```

`foo`, `field0`, `field1` and `field2` need to be replaced with better names.

Update: Fixed bug.
Re: Regular Expression help by SuicideJunkie (Vicar) on Jun 22, 2009 at 17:12 UTC
It sounds like you've got most of the requirements laid out already. What have you got so far, and what parts aren't working?

Are you trying to parse it as a giant multiline string, or are you reading a line at a time and collecting each piece as you come across it until you've got everything you want or hit the end?

Personally, I would suggest reading a line at a time. For each line, attempt to match against each of the known types of line you want info out of. These regexes will be simple and easy to write and maintain! If it matches, squirrel away the captured data into a hash. Once you are done, do a check to make sure your hash contains all the necessary fields. Once you've caught them all, you can do whatever needs doing.
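
The line-at-a-time approach described above could be sketched roughly as follows. This is only an illustration, not anyone's posted solution: the sub name `parse_report` and the hash keys `read`, `found`, `rows`, `name`, `codes` and `value` are made up for the example, and matching the name with `\S+` is an assumption based on the question's remark that abc / def could be any length or character.

```perl
use strict;
use warnings;

# Read the report one line at a time and try each known line type in turn.
# Returns a hash reference; all key names here are hypothetical.
sub parse_report {
    my ($fh) = @_;
    my %info = ( rows => [] );

    while ( my $line = <$fh> ) {
        chomp $line;

        # Summary line, e.g. "2 items read, 2 found, 2 written"
        if ( my ( $read, $found ) =
            $line =~ /^(\d+) items read, (\d+) found,/ )
        {
            @info{qw(read found)} = ( $read, $found );
        }
        # Data line: a name, one or more colon-separated codes (two
        # letters plus 8 digits each), then a number.
        # Assumption: the name is any run of non-whitespace characters.
        elsif ( my ( $name, $codes, $value ) = $line =~ /
                ^ (\S+)
                \s+ ( [a-z]{2} \d{8} (?: : [a-z]{2} \d{8} )* )
                \s+ ( \d+ (?: \. \d+ )? ) $
            /x )
        {
            push @{ $info{rows} },
              { name => $name, codes => [ split /:/, $codes ], value => $value };
        }
        # Anything else (warnings, version banner, blank lines) is ignored.
    }

    # Final check that every required piece was actually seen.
    my @missing = grep { !defined $info{$_} } qw(read found);
    push @missing, 'rows' unless @{ $info{rows} };
    die "Missing fields: @missing\n" if @missing;

    return \%info;
}
```

Called as `parse_report($fh)` on an open filehandle, this returns the two counts plus one entry per data line, and dies if the summary line or the data lines never turned up. Because unknown lines are simply skipped, the warnings from the other script need no special handling.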