joec_ has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have been trying for most of today to get a regular expression working for this scenario, but I can't seem to get it right. Do you have any ideas?

I have the following text and I need to extract certain portions of it:

    Name "main::dbh" used only once: possible typo at /home/joe/test.pl line 60.
    Script version 1.02
    Test version 2.0.2
    10 hashes read from index file
    hashes from 1 database(s)
    2 items read, 2 found, 2 written
    Can't call method "disconnect" on an undefined value at /home/joe/test.pl line 66.
    abc wwdddddddd 123.23
    def wwdddddddd 456.56

The bits I need are:

and the same for def. The letters-and-numbers field may also contain several colon-separated values, in the form

wwdddddddd:wwdddddddd:wwdddddddd

This is dependent on the outcome of another script.
Any regex ideas would be useful.
Thanks in advance...
Joe

P.S. Don't worry about the warnings; they are output by another script. Also, abc and def could be of any length and contain any characters.
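A minimal sketch of a regex for the data lines, assuming each one is `name ids value` separated by whitespace, that `ww` stands for two letters and `d` for a digit, and that several ids are joined by colons. Following the P.S., the name is matched as any run of non-whitespace characters. The sample lines and ids below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample lines in the shape described above; the ids are made up.
my @lines = (
    "Can't call method \"disconnect\" on an undefined value at /home/joe/test.pl line 66.",
    "abc ab12345678 123.23",
    "def ab12345678:cd23456789:ef34567890 456.56",
);

# name:  any non-whitespace run (abc/def may be any length or character)
# ids:   one or more "two letters + eight digits" groups joined by colons
# value: an integer or decimal number at the end of the line
my $data_line = qr/
    ^ (\S+)
    \s+ ( [A-Za-z]{2} [0-9]{8} (?: : [A-Za-z]{2} [0-9]{8} )* )
    \s+ ( [0-9]+ (?: \. [0-9]+ )? )
    \s* $
/x;

my %records;
for my $line (@lines) {
    next unless $line =~ $data_line;    # skip warnings and other noise
    my ($name, $ids, $value) = ($1, $2, $3);
    # Split the colon-separated ids into a list.
    $records{$name} = { ids => [ split /:/, $ids ], value => $value };
}

for my $name (sort keys %records) {
    my $r = $records{$name};
    print "$name: ids=@{ $r->{ids} } value=$r->{value}\n";
}
```

Using `\S+` for the name is deliberately permissive; if the names turn out to be plain words, `[a-z]+` or `\w+` would be stricter.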

-----

Eschew obfuscation, espouse elucidation!

Re: Regular Expression help
by ikegami (Patriarch) on Jun 22, 2009 at 17:10 UTC
    my ($read, $found);
    # Skip ahead to the "items read" summary line and capture its counts.
    for (;;) {
        defined( $_ = <$fh> )
            or die("Premature eof\n");
        ($read, $found) = /^(\d+) items read, (\d+) found,/
            and last;
    }

    my @foos;
    # Parse the remaining "name ids value" lines, skipping the disconnect warning.
    while (<$fh>) {
        next if /^Can't call method "disconnect"/;
        my ($field0, $field1, $field2) = /
            ^
            ( [a-z]+ )
            \s+
            ( [a-z]{2} [0-9]{8} (?: : [a-z]{2} [0-9]{8} )* )
            \s+
            ( \d+(\.\d+)? )
            $
        /x or last;
        # The middle field may hold several colon-separated ids.
        my @field1s = split /:/, $field1;
        push @foos, [ $field0, \@field1s, $field2 ];
    }

    # ... do something with $read, $found and @foos ...

    foo, field0, field1 and field2 need to be replaced with better names.

    Update: Fixed bug.

Re: Regular Expression help
by SuicideJunkie (Vicar) on Jun 22, 2009 at 17:12 UTC
    It sounds like you've got most of the requirements laid out already. What have you got so far and what parts aren't working?

    Are you trying to parse it as a giant multiline string, or are you reading a line at a time and collecting each piece as you come across it until you've got everything you want or hit the end?

    Personally, I would suggest reading a line at a time. For each line, try to match it against each of the known kinds of line you want information from.
    These regexes will be simple and easy to write and maintain!
    If one matches, squirrel the captured data away in a hash. When you reach the end, check that your hash contains all the necessary fields. Once you've caught them all, you can do whatever needs doing.
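The line-at-a-time approach described above might look like the following sketch. It assumes the sample output from the question, with hypothetical ids in place of `wwdddddddd`; the hash keys, the list of required fields, and the use of an in-memory filehandle (instead of the real script output) are illustrative choices, not part of the original post.

```perl
use strict;
use warnings;

# Sample output from the question; the ids are hypothetical stand-ins
# for the real "wwdddddddd" values.
my $output = <<'END';
Name "main::dbh" used only once: possible typo at /home/joe/test.pl line 60.
Script version 1.02
Test version 2.0.2
10 hashes read from index file
hashes from 1 database(s)
2 items read, 2 found, 2 written
Can't call method "disconnect" on an undefined value at /home/joe/test.pl line 66.
abc ab12345678 123.23
def cd12345678:ef12345678 456.56
END

my %info;

# Each handler pairs a pattern for one known kind of line with code that
# stores that pattern's captures (passed in @_) in %info.
my @handlers = (
    [ qr/^Script version (\S+)/,
      sub { $info{script_version} = $_[0] } ],
    [ qr/^(\d+) items read, (\d+) found, (\d+) written/,
      sub { @info{qw(read found written)} = @_ } ],
    [ qr/^(\S+)\s+([A-Za-z]{2}[0-9]{8}(?::[A-Za-z]{2}[0-9]{8})*)\s+([0-9]+(?:\.[0-9]+)?)\s*$/,
      sub { $info{items}{ $_[0] } = { ids => [ split /:/, $_[1] ], value => $_[2] } } ],
);

# Read one line at a time; the first pattern that matches handles it,
# and lines that match nothing (warnings, etc.) are skipped.
open my $fh, '<', \$output or die "Cannot open in-memory handle: $!";
LINE: while (my $line = <$fh>) {
    chomp $line;
    for my $h (@handlers) {
        my ($re, $store) = @$h;
        if (my @caps = $line =~ $re) {
            $store->(@caps);
            next LINE;
        }
    }
}
close $fh;

# Once everything is read, check that all the required fields were seen.
my @missing = grep { !exists $info{$_} } qw(script_version read found written items);
die "Missing fields: @missing\n" if @missing;

printf "read=%d found=%d written=%d\n", @info{qw(read found written)};
for my $name (sort keys %{ $info{items} }) {
    my $item = $info{items}{$name};
    print "$name: @{ $item->{ids} } $item->{value}\n";
}
```

Supporting another kind of line means appending one pattern/handler pair to `@handlers`, which keeps each regex small, as suggested above.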