periodicalcoder has asked for the wisdom of the Perl Monks concerning the following question:

First time poster on this forum. I periodically write scripts to parse data or to reduce server administration effort. Typically I write batch (ouch) or bash, and I have only written perl scripts a few times.

In this case I have spent a couple of hours looking for a solution to this and so far I have not had much luck, or I may not have understood the solutions that I had found. I am trying to match a range of lines, but only print the matching lines to a file if there is a specific string inside the initial match. I need to run this script on Windows (currently using Strawberry Perl), and I may end up handing the script to a non-technical user after setting up the environment to allow it to run.

This script successfully grabs data when only defining the beginning and end string match.

perl -nle "print if /\QLX*\E/ .. /\QCAS*\E/" "filename"
Here is an example of the source file (UPDATE: added the proceeding text on the "00003" lines. I'm embarrassed that I missed this as I somehow didn't think it would matter...):
LX* other data SVC*HC:00003 other data CAS* LX* other data SVC*HC:00001 other data CAS* LX* other data SVC*HC:00003 other data CAS*

Could someone point me in the right direction for how to print the matching range, between "LX*" and "CAS*", to a file only if the string "00003" exists within that range? So long as it can be run in Windows I am open to suggestions.

Thank you very much for your assistance. This script could potentially help save quite a lot of time on a weekly basis.

Sean

Re: Matching range of text with another string in between
by stevieb (Canon) on Apr 22, 2016 at 00:52 UTC

    Welcome to the Monastery, periodicalcoder!

    If the last line of each entry will always be CAS*, you can split the file into chunks by setting the input record separator ($/) to that string, then set the regex modifiers so that ^ and $ match at embedded newlines (/m) and . matches newlines (/s). The zero-width lookahead (?=...) ensures that '00003' comes after "LX*", at the end of a line, and before the record separator ("CAS*").

    use warnings;
    use strict;

    open my $fh, '<', 'file.txt' or die $!;

    {
        local $/ = 'CAS*';
        while (<$fh>){
            print $_ if /LX\*.*(?=00003$)/ms;
        }
    }
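The record-separator idea can also be sketched as a small helper that tests for the target string anywhere between "LX*" and "CAS*", not only at the end of a line. This is an illustration rather than stevieb's code: the helper name records_containing is hypothetical, and it assumes every entry ends with the literal string CAS* as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read "CAS*"-terminated records
# from a filehandle and return those that contain $needle after "LX*".
sub records_containing {
    my ($fh, $needle) = @_;
    my @hits;
    local $/ = 'CAS*';    # each read now ends at the next "CAS*"
    while (my $record = <$fh>) {
        # /s lets . cross newlines; \Q...\E treats $needle as literal text.
        # The capture starts at "LX*", dropping any text in front of it.
        if ($record =~ /(LX\*.*?\Q$needle\E.*)\z/s) {
            push @hits, $1;
        }
    }
    return @hits;
}

# Usage (assumption: the file name is passed as the first argument).
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for records_containing($fh, '00003');
    close $fh;
}
```

Because $/ is localized inside the helper, the change to the record separator does not leak into the rest of the program.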

      The /m isn't required. /m enables multi-line mode, which allows ^ and $ to match next to embedded newline characters.

      Update: Argh! I missed seeing the $ in stevieb's reply. I apologise to periodicalcoder for causing confusion and to stevieb for being silly!

      Premature optimization is the root of all job security

        Stevieb and GrandFather, thank you for your replies. I feel that we are heading in the right direction, but I have a few questions.

        First, I should have been more thorough in my question. What you gave me does apply, but I also need the first 14 lines and the last 3 lines of the file. I wanted to make sure that you had this information.

        Stevieb, I used your code but I get no results. I used your code as is, and the perl command line window got no output. If I define the output file the file gets created with no content. On a guess I changed the input file name to a nonexistent file and I got the same results, and no errors. Please pardon my noobness, and I hope that you can help :)

        UPDATED: Here is what I have at the moment as I am trying to output the results to another file (the .pl file sits in the same directory as the input/output files):

        use warnings;
        use strict;

        #Note that this script throws errors when full file paths are defined.
        #Must be run from the path that the input/output files exist.
        open my $fhi, '<', 'cr835.txt' or die "$!";
        open my $fho, '>', 'cr_output.txt' or die "$!";

        #Prints the first 14 lines to the output file
        while(<>) {
            1 .. 14 ? print : last;
        }

        #Prints content starting with LX* and ending with CAS*
        #but only if 00003 exists
        {
            local $/ = 'CAS*';
            while (<$fhi>){
                print $fho if /LX\*.*(?=00003$)/s;
            }
        }

        #Prints the last 3 lines of the file
        while (<F>) {
            $. < $lines - 3 and print while <F>
        }

        close $fhi;
        close $fho;
Re: Matching range of text with another string in between
by Anonymous Monk on Apr 22, 2016 at 00:42 UTC
    perl -ne "BEGIN{ $/ = qq(\nCAS*\n) } /00003/ and print" "filename"

    Since I don't have Windows, it is untested!

      Anonymous, thank you for your contribution, but unfortunately your code returns the entire contents of the file. If you have any other thoughts please feel free to let us know.

        Does CAS* always start at the beginning of a line? And does the CAS* line have any trailing blanks?

        For Windows, try replacing \n with \r\n

        Try both of my one-liners on your little test data set from your original post. If one of them works, then your big data file is not what you think it is.
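One way to check the questions above (does CAS* start a line, and is it followed by blanks or a carriage return?) is to make the invisible characters visible. A minimal sketch, assuming the file name is given on the command line; the helper name show_invisibles is hypothetical. The file is opened with the :raw layer so that Perl on Windows does not translate \r\n to \n before the line is inspected.

```perl
use strict;
use warnings;

# Hypothetical helper: render CR, LF and TAB as visible escapes and wrap the
# line in brackets so that trailing blanks stand out.
sub show_invisibles {
    my ($line) = @_;
    (my $shown = $line) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    $shown =~ s/\t/\\t/g;
    return "[$shown]";
}

# Usage (assumption: file name passed as the first argument).
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        # Only report the lines that mention the end marker.
        print show_invisibles($line), "\n" if index($line, 'CAS*') >= 0;
    }
    close $fh;
}
```

If the output shows something other than the marker followed directly by the line ending, a record separator of "\nCAS*\n" will never match and the whole file comes back as a single record.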

Re: Matching range of text with another string in between
by GrandFather (Saint) on Apr 23, 2016 at 00:30 UTC

    XML::ASCX12 may be useful, although it looks like you have processed data to play with rather than the raw XML.

Re: Matching range of text with another string in between
by Anonymous Monk on Apr 26, 2016 at 13:29 UTC
    perl -ne "BEGIN{print scalar <> for 1..14} END{print +(split /^/, $x)[-3..-1]} $x = $x x !/\QLX*/ . $_; /\QCAS*\E/ && print $x x $x =~ /00003/" "filename"
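If the one-liner is going to be handed to someone else, the same idea may be easier to maintain as a script. The following is a sketch of that idea, not a literal translation of the one-liner: the function name filter_blocks is hypothetical, the header and trailer counts (14 and 3) come from the thread, and it assumes the file has at least that many lines.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $head lines, every LX*..CAS* block that
# contains $needle, and the last $tail lines. Takes and returns array refs.
sub filter_blocks {
    my ($lines, $head, $tail, $needle) = @_;
    die "input is shorter than header plus trailer\n"
        if @{$lines} < $head + $tail;

    my @out = @{$lines}[0 .. $head - 1];    # header lines, unchanged
    my @block;
    my $in_block = 0;

    for my $line (@{$lines}[$head .. $#{$lines} - $tail]) {
        if (index($line, 'LX*') >= 0) {     # a new block starts here
            @block    = ();
            $in_block = 1;
        }
        push @block, $line if $in_block;
        if ($in_block && index($line, 'CAS*') >= 0) {    # block ends here
            push @out, @block if grep { index($_, $needle) >= 0 } @block;
            @block    = ();
            $in_block = 0;
        }
    }

    push @out, @{$lines}[$#{$lines} - $tail + 1 .. $#{$lines}];    # trailer
    return \@out;
}

# Usage (assumption: input file name passed as the first argument).
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my @lines = <$fh>;
    close $fh;
    print @{ filter_blocks(\@lines, 14, 3, '00003') };
}
```

Saved under a name of your choosing, it could be run as `perl yourscript.pl cr835.txt > cr_output.txt`. Using index instead of a regex avoids having to escape the * in the markers.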

      This works perfectly! Thank you for your patience and I can't adequately say how much I appreciate your help. I expect project creep with this but hopefully I'll be able to handle the rest on my own.

      Thank you all for your help!

        Thanks for letting me know it worked. Also, thanks for the interesting problem. :)

        Also, in the future, details like the first 14 lines and the last 3 should be mentioned up front, because code can be very sensitive to such seemingly small requirements.