
I recommend looking at substr and (if on Unix) the egrep utility, too.
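As a rough sketch of the substr idea: for fixed strings at the start and end of a line, two substr comparisons can stand in for the regexes. The helper name starts_and_ends is made up for illustration, and the strings abcde and PARTNAME are assumed from the command line quoted elsewhere in this thread. Whether this is actually faster than an anchored regex on your data is something to benchmark, not assume.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if $line starts with
# $prefix and ends with $suffix, using substr instead of a regex.
sub starts_and_ends {
    my ($line, $prefix, $suffix) = @_;
    chomp $line;                              # ignore the trailing newline
    # A line shorter than either fixed string cannot match.
    return 0 if length($line) < length($prefix)
             || length($line) < length($suffix);
    return ( substr($line, 0, length($prefix)) eq $prefix
          && substr($line, -length($suffix))  eq $suffix ) ? 1 : 0;
}

# Small demonstration with made-up lines.
for my $line ("abcde 123 PARTNAME\n", "abcde 123 OTHER\n") {
    print $line if starts_and_ends($line, 'abcde', 'PARTNAME');
}
# prints: abcde 123 PARTNAME
```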

Caution: Contents may have been coded under pressure.

Re^3: looking for speed!! large file search and extract
by smbs (Acolyte) on Jan 12, 2005 at 17:00 UTC
    Thanks for the answer, but now I have to make a small change.
    I only want to extract the lines on condition that the
    line directly above starts and ends with the
    following 5 characters: "xyzdf".
    Basically I'm looking for a two-line match.
    Thanks

      A couple minor points (and maybe I'm a bit too new to PM to make the comments):

      1. You probably should have this in a new question, not a reply on the previous question, since it's now a new question.
      2. You probably should try something yourself, and then come back if it doesn't work. Or even if it does - share your answer and get feedback on it.
      One WTDI is to use a simplified state machine:
        my $match;
        while (<FH>) {
            print C $_ if $match and /^abcde.*PARTNAME$/;
            $match = /^xyzdf/ && /xyzdf$/;
        }
      This sets $match to true if the current line starts and ends with xyzdf, false otherwise. The next time through the loop, we only check your second-line regexp if $match is already true (that is, the previous line matched the other regexp).
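For anyone who wants to try the state-machine idea without a real file, here is a minimal self-contained sketch. The helper name lines_after_marker and the sample lines are invented for illustration. The marker pattern here requires xyzdf at both ends of the line, as the question asks, and the second pattern is the abcde/PARTNAME one from earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return each line that matches $want and whose
# immediately preceding line matched $marker.
sub lines_after_marker {
    my ($lines, $marker, $want) = @_;
    my $armed = 0;      # state: did the previous line match $marker?
    my @out;
    for my $line (@$lines) {
        push @out, $line if $armed && $line =~ $want;
        $armed = $line =~ $marker ? 1 : 0;   # remember for the next line
    }
    return @out;
}

# Made-up sample lines, not the poster's data.
my @sample_lines = (
    "xyzdf one xyzdf",
    "abcde keep PARTNAME",
    "abcde skip PARTNAME",
    "xyzdf only at start",
    "abcde skip too PARTNAME",
);
my @hits = lines_after_marker(\@sample_lines,
                              qr/^xyzdf.*xyzdf$/, qr/^abcde.*PARTNAME$/);
print "$_\n" for @hits;
# prints: abcde keep PARTNAME
```

Only one flag is carried between iterations, so memory use stays constant however large the file is.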

        It's not uncommon for people to modify their requirements. I think it belongs in the same thread as long as it's pretty close to the original problem.

        Caution: Contents may have been coded under pressure.

      Then change holli's command-line statement from
        perl -n -e "print if /^abcde/ && /PARTNAME$/" c:\somefile.txt>k:\1\somefile.txt
      to
        perl -n -e "print if /^xyzdf/ && /xyzdf$/" c:\somefile.txt>k:\1\somefile.txt

      If you don't understand this, I really recommend you read perlre.

      Update: Somehow I missed reading "line directly above", so ignore the rest... except for reading perlre; that's always a good idea if you haven't.

      "Cogito cogito ergo cogito sum - I think that I think, therefore I think that I am." Ambrose Bierce

      If I get your comment right, this could be:
      c:\> perl -n -e "print $last, $_ if /xyzdf$/ && $last; $last= /^xyzdf +/ ? $_ : ''" file1>file2
      Assuming file1 looks like
        abc
        xyzdf def
        hij xyzdf
        klm
        xyzdf nop
        qrs xyzdf
      file2 will end up as
        xyzdf def
        hij xyzdf
        xyzdf nop
        qrs xyzdf
      Is that what you want?

      Update: If not, post some sample data and the desired output.
        Thanks, but that's not what I would like.
        Input file

        xyzdfhhlhlljjlxyzdf
        PARTNAMEhjjhhjhjkjkjkjkjPARTNAME
        hjill''';
        hgkjlklj
        xyzdfhhlhll666666jlxyzdf
        PARTNAMEhjjh88888888888jkjkjkjkjPARTNAME
        xyzdfh
        PARTNAMEh_not_to_be_extracted_jkjkjkjPARTNAME
        ghghjhj
        jlkjpkj
        xyzdfhhlh888888888ljjlxyzdf
        PARTNAMEhjjh8888iiiiiiiiiiiii888jkjkjkjkjPARTNAME

        Output file

        PARTNAMEhjjhhjhjkjkjkjkjPARTNAME
        PARTNAMEhjjh88888888888jkjkjkjkjPARTNAME
        PARTNAMEhjjh8888iiiiiiiiiiiii888jkjkjkjkjPARTNAME

        Only 3 lines are extracted, because for the fourth PARTNAME line
        the line-above condition is not met.
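One possible reading of this requirement, sketched against the sample above: remember the previous line while streaming through the file, and keep a line only if it ends in PARTNAME and the remembered line starts and ends with xyzdf. The helper name extract_pairs is invented, and testing only the end of the extracted line for PARTNAME is an assumption. The in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: read from filehandle $in and return every line
# ending in PARTNAME whose preceding line starts and ends with xyzdf.
sub extract_pairs {
    my ($in) = @_;
    my $prev = '';      # the line directly above the current one
    my @out;
    while (my $line = <$in>) {
        chomp $line;
        push @out, $line
            if $line =~ /PARTNAME$/
            && $prev =~ /^xyzdf/ && $prev =~ /xyzdf$/;
        $prev = $line;
    }
    return @out;
}

# The sample input from the post, read through an in-memory filehandle.
my $sample = <<'END';
xyzdfhhlhlljjlxyzdf
PARTNAMEhjjhhjhjkjkjkjkjPARTNAME
hjill''';
hgkjlklj
xyzdfhhlhll666666jlxyzdf
PARTNAMEhjjh88888888888jkjkjkjkjPARTNAME
xyzdfh
PARTNAMEh_not_to_be_extracted_jkjkjkjPARTNAME
ghghjhj
jlkjpkj
xyzdfhhlh888888888ljjlxyzdf
PARTNAMEhjjh8888iiiiiiiiiiiii888jkjkjkjkjPARTNAME
END

open my $in, '<', \$sample or die "cannot open in-memory file: $!";
print "$_\n" for extract_pairs($in);
close $in;
```

On this sample the sketch prints the three lines listed under "Output file" and skips the one below "xyzdfh", since that line does not end with xyzdf.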