Matching range of text with another string in between

periodicalcoder has asked for the wisdom of the Perl Monks concerning the following question:

First-time poster on this forum. I periodically write scripts to parse data or to reduce server administration effort. Typically I write batch (ouch) or bash, and I have only written Perl scripts a few times.

In this case I have spent a couple of hours looking for a solution to this and so far I have not had much luck, or I may not have understood the solutions that I had found. I am trying to match a range of lines, but only print the matching lines to a file if there is a specific string inside the initial match. I need to run this script on Windows (currently using Strawberry Perl), and I may end up handing the script to a non-technical user after setting up the environment to allow it to run.

This script successfully grabs data when only defining the beginning and end string match.

perl -nle "print if /\QLX*\E/ .. /\QCAS*\E/" "filename"
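The `..` in that one-liner is Perl's flip-flop (range) operator in scalar context. As a small sketch of how it behaves, using a few made-up lines shaped like the sample file: outside the range it returns an empty string, inside it returns 1, 2, 3, ..., and on the line that closes the range the number carries an "E0" suffix. That suffix is one way to tell when a block has just ended.

```perl
use strict;
use warnings;

# A few made-up input lines shaped like the sample file.
my @lines = ('header', 'LX*', 'other data', 'SVC*HC:00003', 'CAS*', 'trailer');

# In scalar context the flip-flop returns '' outside the range, 1, 2, 3, ...
# inside it, and a number ending in "E0" on the line that closes the range.
my @states;
for (@lines) {
    my $state = /\QLX*\E/ .. /\QCAS*\E/;
    push @states, $state;
}

print join('|', @states), "\n";    # prints |1|2|3|4E0|
```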

Here is an example of the source file (UPDATE: added the preceding text on the "00003" lines. I'm embarrassed that I missed this, as I somehow didn't think it would matter...):

LX*
other data
SVC*HC:00003
other data
CAS*
LX*
other data
SVC*HC:00001
other data
CAS*
LX*
other data
SVC*HC:00003
other data
CAS*

Could someone point me in the right direction for how to print the matching range, between "LX* and CAS*", to a file only if the string "00003" exists within that range? So long as it can be run in Windows I am open to suggestions.
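One way to do this is to collect each LX*..CAS* block in a buffer and decide only when the block is complete. The sketch below assumes that `LX*` and `CAS*` start their lines, as in the example data; `matching_blocks` is a hypothetical helper name, and the sample data is read from an in-memory string rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every LX*..CAS* block that contains $needle.
# Assumes the LX* and CAS* markers start their lines.
sub matching_blocks {
    my ($fh, $needle) = @_;
    my $block = '';
    my @found;
    while (my $line = <$fh>) {
        my $state = $line =~ /^\QLX*\E/ .. $line =~ /^\QCAS*\E/;
        next unless $state;                 # line is outside any block
        $block .= $line;
        if ($state =~ /E0\z/) {             # this line closed the block
            push @found, $block if index($block, $needle) >= 0;
            $block = '';
        }
    }
    return @found;
}

# The first two blocks of the example data, read from memory.
my $data = <<'END';
LX*
other data
SVC*HC:00003
other data
CAS*
LX*
other data
SVC*HC:00001
other data
CAS*
END

open my $in, '<', \$data or die "Cannot read sample data: $!";
my @found = matching_blocks($in, '00003');
print @found;    # only the first block
```

To write to a file instead of the screen, open an output handle (for example `open my $out, '>', 'output.txt' or die $!;`) and use `print {$out} @found;`.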

Thank you very much for your assistance. This script could potentially help save quite a lot of time on a weekly basis.

Sean


Re: Matching range of text with another string in between by stevieb (Canon) on Apr 22, 2016 at 00:52 UTC
Welcome to the Monastery, periodicalcoder! If the last line of each entry will always be `CAS*`, you can split the file into chunks by setting the input record separator (`$/`) to that marker. Then add `/s` so that `.` also matches newlines, and `/m` so that `$` can match at the end of any line inside a chunk, not only at the end of the chunk. The zero-width lookahead `(?=...)` ensures that '00003' (at the end of a line) comes after "LX" and before the record separator ("CAS*").

```perl
use warnings;
use strict;

open my $fh, '<', 'file.txt' or die $!;
{
    local $/ = "CAS*\n";                  # one LX* ... CAS* chunk per read
    while (<$fh>) {
        print $_ if /LX.*(?=00003$)/ms;   # 00003 ends some line in this chunk
    }
}
```
Re^2: Matching range of text with another string in between by GrandFather (Saint) on Apr 22, 2016 at 03:30 UTC
The /m isn't required: /m enables multi-line mode, which allows ^ and $ to match at the start and end of each embedded line rather than only at the ends of the string. Update: Argh! I missed seeing the $ in stevieb's reply. I apologise to periodicalcoder for causing confusion and to stevieb for being silly!

Premature optimization is the root of all job security
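To see why stevieb's regex needs both modifiers, here is a small check on a made-up chunk: `/s` lets `.` cross newlines, and `/m` lets `$` match at the end of each embedded line rather than only at the end of the string.

```perl
use strict;
use warnings;

# A made-up chunk, as it would be read with $/ set to the CAS* marker.
my $chunk = "LX*\nother data\nSVC*HC:00003\nother data\nCAS*\n";

# /s: without it, "." cannot cross the newline right after "LX*".
my $dot_plain = $chunk =~ /LX.*00003/  ? 1 : 0;
my $dot_s     = $chunk =~ /LX.*00003/s ? 1 : 0;

# /m: without it, "$" only matches at the end of the string
# (or just before a final newline).
my $end_plain = $chunk =~ /00003$/  ? 1 : 0;
my $end_m     = $chunk =~ /00003$/m ? 1 : 0;

print "without /s: $dot_plain, with /s: $dot_s\n";    # 0 and 1
print "without /m: $end_plain, with /m: $end_m\n";    # 0 and 1
```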
Re^3: Matching range of text with another string in between by periodicalcoder (Novice) on Apr 22, 2016 at 17:44 UTC
Stevieb and GrandFather, thank you for your replies. I feel that we are heading in the right direction, but I have a few questions. First, I should have been more thorough in my question: what you gave me does apply, but I also need the first 14 lines and the last 3 lines of the file. I wanted to make sure that you had this information. Stevieb, I used your code as is but got no results: the Perl command-line window showed no output, and if I define an output file, the file gets created with no content. On a guess I changed the input file name to a nonexistent file and got the same results, with no errors. Please pardon my noobness, and I hope that you can help :) UPDATED: Here is what I have at the moment, as I am trying to output the results to another file (the .pl file sits in the same directory as the input/output files):

```perl
use warnings;
use strict;

#Note that this script throws errors when full file paths are defined.
#Must be run from the path that the input/output files exist.
open my $fhi, '<', 'cr835.txt' or die "$!";
open my $fho, '>', 'cr_output.txt' or die "$!";

#Prints the first 14 lines to the output file
while(<>) {
    1 .. 14 ? print : last;
}

#Prints content starting with LX* and ending with CAS*
#but only if 00003 exists
{
    local $/ = 'CAS';
    while (<$fhi>){
        print $fho if /LX\.*(?=00003$)/s;
    }
}

#Prints the last 3 lines of the file
while (<F>) {
    $. < $lines - 3 and print while <F>
}

close $fhi;
close $fho;
```
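For the extra requirement (the first 14 lines and the last 3 lines), one approach that reads the input only once is to count lines for the head and keep a small rolling buffer for the tail. A sketch, with the hypothetical helper `head_and_tail` and twenty made-up numbered lines standing in for the real file:

```perl
use strict;
use warnings;

# Hypothetical helper: returns array refs holding the first $head lines and
# the last $tail lines of $fh. Lines already in the head are never repeated
# in the tail, so short inputs are not duplicated.
sub head_and_tail {
    my ($fh, $head, $tail) = @_;
    my (@first, @last);
    my $count = 0;
    while (my $line = <$fh>) {
        if (++$count <= $head) {
            push @first, $line;
            next;
        }
        push @last, $line;
        shift @last if @last > $tail;    # keep only the most recent $tail lines
    }
    return (\@first, \@last);
}

# Twenty made-up lines stand in for the real input file.
my $data = join '', map { "line $_\n" } 1 .. 20;
open my $in, '<', \$data or die "Cannot read sample data: $!";
my ($first, $last) = head_and_tail($in, 14, 3);
print @$first, "...\n", @$last;    # lines 1-14, then lines 18-20
```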
Re^4: Matching range of text with another string in between by Anonymous Monk on Apr 22, 2016 at 18:11 UTC
Re^5: Matching range of text with another string in between by periodicalcoder (Novice) on Apr 22, 2016 at 19:28 UTC
Re: Matching range of text with another string in between by Anonymous Monk on Apr 22, 2016 at 00:42 UTC
`perl -ne "BEGIN{ $/ = qq(\nCAS*\n) } /00003/ and print" "filename"` Since I don't have Windows, it is untested!
Re^2: Matching range of text with another string in between by periodicalcoder (Novice) on Apr 22, 2016 at 17:17 UTC
Anonymous, thank you for your contribution, but unfortunately your code returns the entire contents of the file. If you have any other thoughts please feel free to let us know.
Re^3: Matching range of text with another string in between by Anonymous Monk on Apr 22, 2016 at 18:04 UTC
Does CAS* always start at the beginning of a line? And does the CAS* line have any trailing blanks?
Re^4: Matching range of text with another string in between by periodicalcoder (Novice) on Apr 22, 2016 at 18:18 UTC
Re^5: Matching range of text with another string in between by Anonymous Monk on Apr 22, 2016 at 18:41 UTC
Re^5: Matching range of text with another string in between by Anonymous Monk on Apr 22, 2016 at 18:46 UTC
Re^3: Matching range of text with another string in between by Anonymous Monk on Apr 22, 2016 at 17:52 UTC
For Windows, try replacing \n with \r\n
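Since these replies hinge on what the line endings actually are, a quick diagnostic can help. The hypothetical `inspect_endings` below reads the input in raw mode (so any `\r` stays visible), then counts CRLF line endings and `CAS*` lines that carry trailing blanks or tabs. Trailing blanks after `CAS*` would stop a separator such as `qq(\nCAS*\n)` from ever matching, and unexpected line endings can do the same; in that case the whole file is read as a single record, which would explain getting the entire file back. The sample input is made up.

```perl
use strict;
use warnings;

# Hypothetical diagnostic: counts lines, CRLF endings, and CAS* lines
# with trailing blanks or tabs.
sub inspect_endings {
    my ($fh) = @_;
    binmode $fh;    # raw bytes: keep any "\r" visible
    my %count = (lines => 0, crlf => 0, cas_trailing => 0);
    while (my $line = <$fh>) {
        $count{lines}++;
        $count{crlf}++         if $line =~ /\r\n\z/;
        $count{cas_trailing}++ if $line =~ /^CAS\*[ \t]+\r?\n?\z/;
    }
    return \%count;
}

# Made-up sample: CRLF endings and one blank after CAS*.
my $data = "LX*\r\nSVC*HC:00003\r\nCAS* \r\n";
open my $in, '<', \$data or die "Cannot read sample data: $!";
my $counts = inspect_endings($in);
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

To check the real file, open it with `open my $fh, '<', 'filename' or die $!;` and pass `$fh` in place of the in-memory handle.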
Re^3: Matching range of text with another string in between by Anonymous Monk on Apr 22, 2016 at 23:26 UTC
Try both of my one-liners on your little test data set from your original post. If one of them works, then your big data file is not what you think it is.
Re^4: Matching range of text with another string in between by periodicalcoder (Novice) on Apr 22, 2016 at 23:45 UTC
Re^5: Matching range of text with another string in between by periodicalcoder (Novice) on Apr 25, 2016 at 23:56 UTC
Re: Matching range of text with another string in between by GrandFather (Saint) on Apr 23, 2016 at 00:30 UTC
XML::ASCX12 may be useful, although it looks like you have processed data to play with rather than the raw XML.
Re: Matching range of text with another string in between by Anonymous Monk on Apr 26, 2016 at 13:29 UTC
`perl -ne "BEGIN{print scalar <> for 1..14} END{print +(split /^/, $x)[-3..-1]} $x = $x x !/\QLX/ . $_; /\QCAS\E/ && print $x x $x =~ /00003/" "filename"`
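For readability, here is one way to expand that one-liner into a script. It is a sketch of the same logic rather than the poster's own code: `filter_blocks` is a hypothetical name, the sample input is made up (14 header lines, two blocks, three trailer lines), and the header loop and tail slice add small guards for very short inputs. The buffer is reset at every line containing `LX`, printed at a `CAS` line if it contains `00003`, and, because trailer lines keep accumulating after the last block, its final three lines are the last three lines of the input.

```perl
use strict;
use warnings;

# Hypothetical expansion of the one-liner: copies the first 14 lines,
# every LX..CAS block that contains 00003, and the last 3 lines.
sub filter_blocks {
    my ($in, $out) = @_;
    for (1 .. 14) {                          # header lines
        my $line = <$in>;
        last unless defined $line;
        print {$out} $line;
    }
    my $buffer = '';
    while (my $line = <$in>) {
        $buffer = '' if $line =~ /LX/;       # a new block starts here
        $buffer .= $line;
        print {$out} $buffer if $line =~ /CAS/ && $buffer =~ /00003/;
    }
    # Lines after the last block stay in the buffer, so its final three
    # lines are the last three lines of the input.
    my @lines = split /^/, $buffer;
    print {$out} @lines > 3 ? @lines[-3 .. -1] : @lines;
}

my $data = join('', map { "HDR$_\n" } 1 .. 14) . <<'END';
LX*
SVC*HC:00003
CAS*
LX*
SVC*HC:00001
CAS*
trailer 1
trailer 2
trailer 3
END

open my $in,  '<', \$data      or die "Cannot read sample data: $!";
open my $out, '>', \my $result or die "Cannot open output buffer: $!";
filter_blocks($in, $out);
close $out;
print $result;
```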
Re^2: Matching range of text with another string in between by periodicalcoder (Novice) on Apr 27, 2016 at 21:29 UTC
This works perfectly! Thank you for your patience and I can't adequately say how much I appreciate your help. I expect project creep with this but hopefully I'll be able to handle the rest on my own. Thank you all for your help!
Re^3: Matching range of text with another string in between by Anonymous Monk on Apr 27, 2016 at 21:48 UTC
Thanks for letting me know it worked. Also, thanks for the interesting problem. :)
Re^3: Matching range of text with another string in between by Anonymous Monk on Apr 27, 2016 at 21:53 UTC
Also, in the future, details like the first 14 lines and the last 3 should be mentioned up front, because code can be very sensitive to such seemingly small requirements.
Re^4: Matching range of text with another string in between by periodicalcoder (Novice) on Apr 29, 2016 at 00:22 UTC