Anonymous, thank you for your contribution, but unfortunately your code returns the entire contents of the file. If you have any other thoughts please feel free to let us know.

Re^3: Matching range of text with another string in between
by Anonymous Monk on Apr 22, 2016 at 18:04 UTC

    Does CAS* always start at the beginning of a line? And does the CAS* line have any trailing blanks?

      I tried changing \n to \r\n as follows, but it still returned the entire contents:
      perl -ne "BEGIN{ $/ = qq(\r\nCAS*\r\n) } /00003/ and print" "filename"
      Yes, CAS* always starts at the beginning of the line, and those lines do not have any trailing blanks. I want to make sure that the matching entries start with LX* and end with CAS*. I could be wrong, but it looks as though the code above starts matching at CAS* instead.

        $/ is the input record separator, which defaults to the end-of-line character (see perldoc perlvar). At this point I no longer trust your data file and would use a hex dump program to verify that the sequence \r\nCAS*\r\n actually exists in the file.

        Are you sure it is not a Mac file? Classic Mac files end each line with a bare \r rather than \n or \r\n.
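The behaviour described in this sub-thread (the whole file coming back) is what $/ produces when the separator string never occurs in the input. The following is a minimal sketch under stated assumptions, not a diagnosis of the real file: the sample data and the helper name read_records are hypothetical, and the CAS lines are assumed to carry text after "CAS*", as in the line quoted later in the thread. Note also that Windows builds of Perl normally translate \r\n to \n when reading in text mode, so a separator containing \r may never match unless the handle is read raw.

```perl
use strict;
use warnings;

# Hypothetical sample data: two LX ... CAS sections with CRLF line endings.
my $data = join '', map { "$_\r\n" }
    'LX*1~', 'SVC*HC:00001*999.99~', 'CAS*CO*999*999.99~',
    'LX*2~', 'SVC*HC:00003*999.99~', 'CAS*CO*999*999.99~';

# Read $text record by record, using $sep as the input record separator.
sub read_records {
    my ($text, $sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    binmode $fh;    # keep \r\n untouched
    my @records = <$fh>;
    close $fh;
    return @records;
}

# The CAS lines have more text after "CAS*", so this literal never occurs
# and the whole input comes back as a single record.
my @whole = read_records($data, "\r\nCAS*\r\n");

# Splitting on "\r\nLX*" instead starts a new record at every later LX line.
my @sections = read_records($data, "\r\nLX*");

printf "separator never found: %d record(s)\n", scalar @whole;
printf "separator found:       %d record(s)\n", scalar @sections;
print "matching record:\n$_\n" for grep { /00003/ } @sections;
```

Because the separator stays attached to the end of the record it terminates, the matching record here begins with "2~" rather than "LX*2~". That is one reason a line-by-line approach may be easier to reason about than changing $/.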

Re^3: Matching range of text with another string in between
by Anonymous Monk on Apr 22, 2016 at 17:52 UTC

    For Windows, try replacing \n with \r\n

Re^3: Matching range of text with another string in between
by Anonymous Monk on Apr 22, 2016 at 23:26 UTC

    Try both of my one-liners on your little test data set from your original post. If one of them works, then your big data file is not what you think it is.

      It is the end of the day for me but I'll definitely take a look on Monday. I also may owe you all an apology as I left text out of my sample data, and I edited it so that it is now accurate. There is text that comes before the "00003" string, but I don't know how heavily that impacts the code that you gave me.

      I have made progress and I will post the updated code on Monday. I tend to like to sleep on things before asking for another round of help. I sincerely appreciate your help so far, and I apologize that it has been a bit frustrating.

        I downloaded "Free Hex Editor Neo" and I saw that the lines seem to end (or start?) with two dots ".." whose hex values are "0d 0a". These are not visible when the file is opened normally. I don't know if this helps.
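The hex values 0d 0a are a carriage return followed by a line feed, i.e. Windows-style \r\n line endings. As a rough way to confirm this from Perl rather than a hex editor, the sketch below counts each style of line ending in raw bytes. The helper names count_line_endings and slurp_raw are hypothetical, and the demonstration runs on made-up sample text, not the real file.

```perl
use strict;
use warnings;

# Count each style of line ending in a string of raw bytes.
sub count_line_endings {
    my ($bytes) = @_;
    my %count = (crlf => 0, lf => 0, cr => 0);
    while ($bytes =~ /(\r\n|\r|\n)/g) {
        if    ($1 eq "\r\n") { $count{crlf}++ }
        elsif ($1 eq "\n")   { $count{lf}++ }
        else                 { $count{cr}++ }
    }
    return \%count;
}

# Read a whole file as raw bytes so no I/O layer rewrites the line endings.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # undef: read everything in one go
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Demonstration on a hypothetical in-memory sample rather than the real file.
my $sample = "CAS*CO*999*999.99~\r\nLX*2~\r\nSVC*HC:00003*999.99*999.9**1~\r\n";
my $c = count_line_endings($sample);
print "CRLF: $c->{crlf}, bare LF: $c->{lf}, bare CR: $c->{cr}\n";
```

To check the actual file, something like count_line_endings(slurp_raw('cr835.txt')) should work, assuming the script is run from the directory that holds it.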

        Here are two full lines of actual text from the file, which mark the end of one section and the beginning of the next. It was probably unnecessary, but I replaced the dollar figures with nines:

        CAS*CO*999*999.99~
        LX*2~

        A line with the "00003" string:

        SVC*HC:00003*999.99*999.9**1~

        At the moment I have the following code in a file named test_script.pl, which gets me the first 14 lines and the regex one-liner match gets me the last three:

        use warnings;
        use strict;

        #Note that this script throws errors when full file paths are defined.
        #Must be run from the path that the input/output files exist.
        open my $fhi, '<', 'cr835.txt' or die "$!";
        open my $fho, '>', 'cr_output.txt' or die "$!";

        #Prints the first 14 lines to the output file
        while(<$fhi>) {
            1 .. 14 ? print : last;
        }

        close $fhi;
        close $fho;

        system( q(perl -nle "print if /SE\Q*\E3841/ .. /IEA\Q*\E/" "cr835.txt") );

        I hope that the difficulty we have had wasn't caused by my leaving something important out of my initial post.
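For the goal stated earlier in the thread (entries that start with LX* and end with CAS* and contain "00003"), one possible line-by-line approach is sketched below. It is only a sketch: it assumes each section opens with a line starting "LX*" and closes at the first following line starting "CAS*", which may not hold for the real file. The sample data is hypothetical, modelled on the lines quoted above, and matching_sections is an illustrative name.

```perl
use strict;
use warnings;

# Return every LX* ... CAS* section that mentions $needle.
# Assumes each section opens with a line starting "LX*" and closes at the
# first following line starting "CAS*".
sub matching_sections {
    my ($needle, @lines) = @_;
    my (@found, @current);
    my $inside = 0;
    for my $line (@lines) {
        (my $text = $line) =~ s/\r?\n\z//;    # tolerate CRLF or LF endings
        if ($text =~ /^LX\*/) {
            @current = ();
            $inside  = 1;
        }
        next unless $inside;
        push @current, $text;
        if ($text =~ /^CAS\*/) {
            push @found, [@current]
                if grep { index($_, $needle) >= 0 } @current;
            $inside = 0;
        }
    }
    return @found;
}

# Hypothetical sample modelled on the lines quoted above.
my @sample = map { "$_\r\n" }
    'LX*1~', 'SVC*HC:00001*999.99*999.9**1~', 'CAS*CO*999*999.99~',
    'LX*2~', 'SVC*HC:00003*999.99*999.9**1~', 'CAS*CO*999*999.99~';

for my $section (matching_sections('00003', @sample)) {
    print "$_\n" for @$section;
}
```

Collecting lines into a section and testing it only when the closing CAS* line arrives avoids depending on $/ or on the exact line endings, at the cost of holding one section in memory at a time.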