vxp has asked for the wisdom of the Perl Monks concerning the following question:

I've a file to parse.

Here's a sample of its contents. This is cat -vue's output, so each "$" marks the end of a line, and the file holds about 12,000 sections like the ones below. I masked the IPs, so wherever you see "x.x.x.x" there used to be an IP :)

----- cut here -----
abc1-wl-wlc2.domain.com$
$
Successful downloads: 0$
Failed downloads: 1$
Warnings: 0$
$
Detailed Log Info$
$
General Expect Errors: 1$
Text Dump Failure: 1$
Binary Transfer Successful: 1$
Attempts to get Text Dump: 1$
Attempts to use TFTP: 1$
Attempted binary transfers: 1$
$
Textual log of the file transfer:$
Processing abc1-wl-wlc2.domain.com Performing [/usr/bin/ssh -2 -a -o "StrictHostKeyChecking no" abc1-wl-wlc2.domain.com] Exception: 3:Child PID 20855 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 58. ExpectLog: Sorry, telnet is not allowed on this port!Connection to abc1-wl-wlc2.domain.com closed. Performing TFTP using SNMP: DestHost [x.x.x.x] Community [tasMANia] agentTransferUploadServerIP [x.x.x.x] agentTransferUploadFilename [x.x.x.x.UCFM.18441] Exception: 3:Child PID 20855 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 58. Updating CVS$
$
apl02-wl-wlc1.domain.com$
$
Successful downloads: 0$
Failed downloads: 1$
Warnings: 0$
$
Detailed Log Info$
$
Failed Binary transfers: 1$
General Expect Errors: 1$
Text Dump Failure: 1$
TFTP failures due to SNMP set error: 1$
Attempts to get Text Dump: 1$
Attempts to use TFTP: 1$
Attempted binary transfers: 1$
$
Textual log of the file transfer:$
Processing apl02-wl-wlc1.cisco.com Performing [/usr/bin/ssh -2 -a -o "StrictHostKeyChecking no" apl02-wl-wlc1.cisco.com] Exception: 3:Child PID 20683 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 56. ExpectLog: Sorry, telnet is not allowed on this port!Connection to apl02-wl-wlc1.cisco.com closed. Performing TFTP using SNMP: DestHost [64.101.206.133] Community [private] agentTransferUploadServerIP [171.70.168.173] agentTransferUploadFilename [64.101.206.133.UCFM.18436] Exception: Could not initialise SNMP [Timeout] at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 450, line 56. Exception: 3:Child PID 20683 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 56. Could not initialise SNMP [Timeout] at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 450, line 56. at /usr/SD/perl/lib/site_perl/EMAN/Config/Download.pl line 888 at /usr/SD/perl/lib/site_perl/EMAN/Config/Download.pl line 888$
$
----- cut here -----

The task at hand is as follows: I need to come up with a multiline regex that matches everything between "abc1-wl-wlc2.domain.com" and "apl02-wl-wlc1.domain.com", so I can process the contents of that file section by section. If it helps any, I'm mainly interested in getting a list of hostnames that have a "Failed Binary transfers: 1" in their section.

Any thoughts / suggestions on the multiline?

I tried doing it on my own, but unfortunately without much success. I know about the "m" modifier on the end of my regex, btw. I'm not completely useless, but I can't seem to craft the necessary regex for this task :/

Replies are listed 'Best First'.
Re: Multi-line regex help needed
by Limbic~Region (Chancellor) on Jun 18, 2009 at 15:30 UTC
    vxp,
    As I indicated in the CB, this limited data sample appears to be a perfect fit for changing $/ into paragraph mode (see perlvar):

        {
            local $/ = "";
            while (<$fh>) {
                # $_ now has exactly 1 section in it
            }
        }

    Cheers - L~R

      Thanks, tried that by doing the following:

          #!/usr/bin/perl
          $file = shift;
          open(LOG, $file) or die "Couldn't open $file: $!\n";
          local $/ = "";
          while (<LOG>) {
              print "================BLAH================\n$_\n";
          }
          close(LOG);

      It didn't quite do what's expected, though: $_ doesn't have exactly one section in it. Here's what's in it:

          ================BLAH================
          abc1-wl-wlc1.domain.com
          ================BLAH================
          Successful downloads: 0
          Failed downloads: 1
          Warnings: 0
          ================BLAH================
          Detailed Log Info
          ================BLAH================
          General Expect Errors: 1
          Text Dump Failure: 1
          Binary Transfer Successful: 1
          Attempts to get Text Dump: 1
          Attempts to use TFTP: 1
          Attempted binary transfers: 1
          ================BLAH================
          Textual log of the file transfer:
          Processing abc1-wl-wlc1.domain.com Performing [/usr/bin/ssh -2 -a -o "StrictHostKeyChecking no" abc1-wl-wlc1.domain.com] Exception: 3:Child PID 20854 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 62. ExpectLog: Sorry, telnet is not allowed on this port!Connection to abc1-wl-wlc1.domain.com closed. Performing TFTP using SNMP: DestHost [x.x.x.x] Community [private] agentTransferUploadServerIP [x.x.x.x] agentTransferUploadFilename [x.x.x.x.UCFM.18433] Exception: 3:Child PID 20854 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 62. Updating CVS
          ================BLAH================
          abc1-wl-wlc2.domain.com
        vxp,
        Well then your real data and your posted data are not the same thing. What I provided you would have worked otherwise. Apparently all of your $ are actually not there in your real data.

        At this point, I would use the flip flop operator or my own flag to tell me the start and end of a section. It doesn't make sense to me to use a multi-line regex unless you intend to slurp the entire file into memory (scalability issues) or you intend to create your own sliding buffer window (code complexity issues).

        Cheers - L~R
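
A minimal sketch of the "own flag" idea mentioned above, reading line by line so the whole file never has to sit in memory. It assumes that a section starts at a line holding nothing but a hostname ending in ".domain.com"; the sample data, `$host_re`, `$current` and `@failed` are made up for illustration and are not from the original posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the real log file.
my $log = <<'END';
abc1-wl-wlc2.domain.com

Successful downloads: 0
Failed downloads: 1

Detailed Log Info

General Expect Errors: 1
Binary Transfer Successful: 1

apl02-wl-wlc1.domain.com

Successful downloads: 0
Failed downloads: 1

Detailed Log Info

Failed Binary transfers: 1
General Expect Errors: 1
END

# Assumption: a section header is a line containing only a hostname.
my $host_re = qr/^(\S+\.domain\.com)\s*$/;

# Read from the in-memory string; a real script would open the log file.
open my $fh, '<', \$log or die "Couldn't open in-memory log: $!\n";

my ($current, @failed);
while (my $line = <$fh>) {
    if ($line =~ $host_re) {
        $current = $1;              # the flag: we are now inside this section
    }
    elsif (defined $current && $line =~ /^Failed Binary transfers: 1\s*$/) {
        push @failed, $current;     # remember the hostname of this section
        undef $current;             # report each section at most once
    }
}
close $fh;

print "$_\n" for @failed;
```

Because the hostname pattern is anchored at both ends, a line such as "Processing abc1-wl-wlc2.domain.com Performing ..." inside the textual log does not start a new section.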

        Now I tried changing $/ to "domain.com" and it's a little closer. Actually _almost_ there:

            #!/usr/bin/perl
            $file = shift;
            open(LOG, $file) or die "Couldn't open $file: $!\n";
            local $/ = "domain.com\n";
            while (<LOG>) {
                print "================BLAH================\n$_\n";
            }
            close(LOG);

        Resulting output:

            abc1-wl-wlc1.domain.com
            ================BLAH================
            Successful downloads: 0
            Failed downloads: 1
            Warnings: 0
            Detailed Log Info
            General Expect Errors: 1
            Text Dump Failure: 1
            Binary Transfer Successful: 1
            Attempts to get Text Dump: 1
            Attempts to use TFTP: 1
            Attempted binary transfers: 1
            Textual log of the file transfer:
            Processing abc1-wl-wlc1.domain.com Performing [/usr/bin/ssh -2 -a -o "StrictHostKeyChecking no" abc1-wl-wlc1.domain.com] Exception: 3:Child PID 20854 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 62. ExpectLog: Sorry, telnet is not allowed on this port!Connection to abc1-wl-wlc1.domain.com closed. Performing TFTP using SNMP: DestHost [x.x.x.x] Community [private] agentTransferUploadServerIP [x.x.x.x] agentTransferUploadFilename [x.x.x.x.UCFM.18433] Exception: 3:Child PID 20854 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 62. Updating CVS
            abc1-wl-wlc2.domain.com
            ================BLAH================
            Successful downloads: 0
            Failed downloads: 1
            Warnings: 0
            Detailed Log Info
            General Expect Errors: 1
            Text Dump Failure: 1
            Binary Transfer Successful: 1
            Attempts to get Text Dump: 1
            Attempts to use TFTP: 1
            Attempted binary transfers: 1
            Textual log of the file transfer:
            Processing abc1-wl-wlc2.cisco.com Performing [/usr/bin/ssh -2 -a -o "StrictHostKeyChecking no" abc1-wl-wlc2.domain.com] Exception: 3:Child PID 20855 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 58. ExpectLog: Sorry, telnet is not allowed on this port!Connection to abc1-wl-wlc2.domain.com closed. Performing TFTP using SNMP: DestHost [x.x.x.x] Community [tasMANia] agentTransferUploadServerIP [x.x.x.x] agentTransferUploadFilename [x.x.x.x.UCFM.18441] Exception: 3:Child PID 20855 exited with status 256 at /usr/SD/perl-5.6.1/lib/site_perl/EMAN/Config/DeviceConfig/WLC.pm line 288, line 58. Updating CVS
        The correct hostname needs to be in between the blah's - right now BLAH comes right after the correct hostname for the section :/
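
One possible way to get the hostname on the right side of the separator while keeping $/ set to "domain.com\n": each chunk read this way ends with the hostname line of the *next* section, so that line can be cut off the end and carried forward. This is only a sketch under the assumption that hostnames sit on a line of their own; the sample data and the names `@sections`, `$host` and `$body` are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the real log file.
my $log = "abc1-wl-wlc1.domain.com\nFailed downloads: 1\nUpdating CVS\n"
        . "abc1-wl-wlc2.domain.com\nFailed downloads: 1\nUpdating CVS\n";

open my $fh, '<', \$log or die "Couldn't open in-memory log: $!\n";

my @sections;
my ($host, $body) = (undef, '');
{
    local $/ = "domain.com\n";      # same separator as in the attempt above
    while (my $chunk = <$fh>) {
        # If the chunk ends in a hostname-only line, that line belongs to
        # the NEXT section: remove it here and carry it forward.
        if ($chunk =~ s/^(\S+\.domain\.com)\n\z//m) {
            my $next_host = $1;
            $body .= $chunk;
            push @sections, { host => $host, body => $body } if defined $host;
            ($host, $body) = ($next_host, '');
        }
        else {
            # The separator matched in the middle of a section (or this is
            # the last chunk), so keep collecting under the current host.
            $body .= $chunk;
        }
    }
}
push @sections, { host => $host, body => $body } if defined $host;
close $fh;

for my $section (@sections) {
    print "================BLAH================\n";
    print "$section->{host}\n$section->{body}\n";
}
```

With this arrangement each BLAH line is followed by the hostname and then by that host's own section.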
Re: Multi-line regex help needed
by BioLion (Curate) on Jun 18, 2009 at 17:12 UTC

    Maybe you could do a pseudo slurp:

        my $slrp;
        while (<$fh>) {
            ## regex to catch lines that match the section start (capturing it too);
            ## the hostname pattern here is only a guess at the 'domain.com' address bit
            if (/^(\S+\.domain\.com)$/) {
                ## you have found a new section, so process the previous one,
                ## provided you have already seen one...
                ## (or store $slrp somewhere appropriate)
                process_slrp($slrp) if defined $slrp;
                ## restart the slurp, starting with the new domain.com bit
                $slrp = "$1\n";
            }
            elsif (defined $slrp) {
                ## it isn't the start of a section, so stick it into the slurp
                $slrp .= $_;
            }
        }
        ## the last section has no following header, so process it here
        process_slrp($slrp) if defined $slrp;

    Although I don't see why you can't slurp the whole file and do a multiline regex?

        m{
            (domain bit .+?)               ## domain bit and everything until
            Successful\ downloads:(.*?)    ## next title, capture everything else until..
            Failed\ downloads:(.*?)        ## next title, capture everything until...
            etc...
        }msxg   ## /s lets . cross newlines; under /x literal spaces must be escaped

    This way you can directly parse out all the sections, even if they are not defined, and stick them in an array to work with afterwards?

    Just a something something...
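
For what it's worth, the slurp-everything idea could be sketched roughly as follows: read the whole log into one string, `split` it in front of every hostname-only line, then test each section. The hostname pattern, the sample data and the names `@sections` and `@failed` are assumptions made for this example. Whether slurping is acceptable depends on how large the real file is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the slurped log file.
my $log = <<'END';
abc1-wl-wlc2.domain.com

Successful downloads: 0
Failed downloads: 1

Detailed Log Info

Binary Transfer Successful: 1

Textual log of the file transfer:
Processing abc1-wl-wlc2.domain.com Performing [...]

apl02-wl-wlc1.domain.com

Successful downloads: 0
Failed downloads: 1

Detailed Log Info

Failed Binary transfers: 1
END

# Split at the start of every line that holds only a hostname. The
# lookahead consumes nothing, so each piece keeps its own hostname line.
my @sections = split /^(?=\S+\.domain\.com[ \t]*$)/m, $log;

my @failed;
for my $section (@sections) {
    # Skip anything before the first hostname line (a preamble, say).
    next unless $section =~ /\A(\S+\.domain\.com)/;
    my $host = $1;
    push @failed, $host
        if $section =~ /^Failed Binary transfers: 1[ \t]*$/m;
}

print "$_\n" for @failed;
```

Here /m only changes what ^ and $ mean (start and end of each line); no /s is needed because nothing relies on "." matching a newline.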