juo has asked for the wisdom of the Perl Monks concerning the following question:

How can I remove the leading lines of an ASCII file, up to the first line whose first word matches a given string? The matching line and all the lines after it should be passed through. Example, where CPN is the word to be matched:

BOM Report
-------------------
BOM :
Date : 19 Sep 2001 14:57:17
Total # of CPN's (41)
-----------------------
CPN Quantity MPN Vendor
51-0597-000 15 06035C103MAT2A AVX
51-0597-000 15 06035C103MAT4A AVX
51-0597-000 15 C0603C103M5RAC KEMET ELECTRONICS

New file :
CPN Quantity MPN Vendor
51-0597-000 15 06035C103MAT2A AVX
51-0597-000 15 06035C103MAT4A AVX
51-0597-000 15 C0603C103M5RAC KEMET ELECTRONICS

Edit Masem 2001-09-19 - Code Tags

Will the range operator help you?
by Rhose (Priest) on Sep 19, 2001 at 17:15 UTC
    How about this? The range operator will allow you to skip (not remove) lines up to (and including) the matching line.
    use strict;
    while (<DATA>) {
        next if 1../^CPN/;
        #-- Parse lines...
        print;
    }
    __DATA__
    BOM Report
    -------------------
    BOM :
    Date : 19 Sep 2001 14:57:17
    Total # of CPN's (41)
    -----------------------
    CPN Quantity MPN Vendor
    51-0597-000 15 06035C103MAT2A AVX
    51-0597-000 15 06035C103MAT4A AVX
    51-0597-000 15 C0603C103M5RAC KEMET ELECTRONICS
      Used in scalar context like this, the range operator is in fact acting as the flip-flop operator (oooh, it's a sneaky one!). perlop says it emulates the line-range (comma) operator of sed and AWK, and it is *uber* handy when doing text processing (like AWK). You can have all sorts of funky variations on the above, from fancy regexps to simple line numbers. I shall use the examples from the always informative perl docs (man perlop):
      if (101 .. 200) { print; }   # print 2nd hundred lines
      next line if (1 .. /^$/);    # skip header lines
      s/^/> / if (/^$/ .. eof());  # quote body

      # parse mail messages
      while (<>) {
          $in_header =   1  .. /^$/;
          $in_body   = /^$/ .. eof();
          # do something based on those
      } continue {
          close ARGV if eof;       # reset $. each file
      }
      Indeed a thing of beauty!
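
To make the two boundary behaviours concrete, here is a minimal sketch run against the sample report from the question. It is only an illustration: the array names and the index-based end condition are assumptions of this sketch, not something from the replies. It uses one flip-flop that ends at the match and one that starts at it.

```perl
use strict;
use warnings;

# Sample report from the question, split into lines (newlines kept).
my @lines = split /^/, <<'END';
BOM Report
-------------------
BOM :
Date : 19 Sep 2001 14:57:17
Total # of CPN's (41)
-----------------------
CPN Quantity MPN Vendor
51-0597-000 15 06035C103MAT2A AVX
51-0597-000 15 06035C103MAT4A AVX
51-0597-000 15 C0603C103M5RAC KEMET ELECTRONICS
END

my (@after_match, @from_match);
for my $i (0 .. $#lines) {
    my $line = $lines[$i];

    # Flips on at the first line and off again after the first /^CPN/ line,
    # so the matching line is still inside the range.
    my $in_header = ($i == 0 .. $line =~ /^CPN/);

    # Flips on at the first /^CPN/ line and off again after the last line,
    # so the matching line is the first line of the range.
    my $in_body = ($line =~ /^CPN/ .. $i == $#lines);

    push @after_match, $line unless $in_header;
    push @from_match,  $line if $in_body;
}

print "Skipping through the match keeps:\n", @after_match;
print "Starting at the match keeps:\n", @from_match;
```

Skipping while the first flip-flop is true drops the CPN header line along with the preamble, whereas keeping lines while the second one is true retains it. Note that the anchored /^CPN/ does not trip on the "Total # of CPN's (41)" line.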
      HTH

      broquaint

      This works fine, but I don't want the matching line to be skipped. Is that possible?

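        Sure, just start the range at the match instead of ending it there, something like this:
        while (<FILE>) {
            # true from the first line starting with CPN through the end of FILE
            next unless /^CPN/ .. eof(FILE);
            # munge data here
        }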
        HTH

        broquaint

Re: How to remove lines out of an ascii file
by tommyw (Hermit) on Sep 19, 2001 at 17:10 UTC

      Of course, in light of the other responses, I suspect that I've been misled by your use of "remove": you just want to skip the irrelevant lines.

      Which actually raises the question of whether you're interested in the header lines:

      New file :
      CPN     Quantity        MPN     Vendor  
      
      which lie between the first and second sections (you may well be, since these will allow you to spot the fact that you are indeed changing section).

      However, if you're only interested in the data lines themselves, such as

      51-0597-000     15      06035C103MAT2A  AVX
      then:
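      while (<INFILE>) {
          # The last field allows embedded spaces, so that a vendor such as
          # "KEMET ELECTRONICS" still matches.
          next unless /^(\d{2}-\d{4}-\d{3})\s+(\d+)\s+(\S+)\s+(\S.*?)\s*$/;
          # Not only have we thrown out the garbage,
          # but the data fields are now in $1..$4
          # ... more processing ...
      }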
      will skip all the non-data lines, as well as breaking the string apart into its constituent fields (you can obviously tailor the regexp to your precise needs, if you need to break out the middle component of the first field, for example).
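
As a rough sketch of that kind of tailoring, the variation below captures the three parts of the first field separately and lets the vendor name contain spaces. The helper name parse_bom_line and the hash keys are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turns one report line into a hash of fields.
# Returns undef (in scalar context) when the line is not a data line.
sub parse_bom_line {
    my ($line) = @_;
    return unless $line =~
        /^(\d{2})-(\d{4})-(\d{3})\s+(\d+)\s+(\S+)\s+(\S.*?)\s*$/;
    return {
        cpn      => "$1-$2-$3",   # whole first field
        cpn_mid  => $2,           # middle component of the first field
        quantity => $4,
        mpn      => $5,
        vendor   => $6,           # may contain spaces
    };
}

my $rec = parse_bom_line("51-0597-000 15 C0603C103M5RAC KEMET ELECTRONICS\n");
print "$rec->{cpn_mid} / $rec->{vendor}\n" if $rec;
```

Header and separator lines simply fail the match, so the caller can skip them with a plain `next unless $rec`.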

Re: How to remove lines out of an ascii file
by dragonchild (Archbishop) on Sep 19, 2001 at 17:12 UTC
    LINE: while (my $line = <INFILE>) {
        # Skip everything before the first line that starts with CPN.
        next LINE unless $line =~ /^CPN/ .. eof(INFILE);
        # Once here, you have matched the first CPN line and onwards.
    }

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: How to remove lines out of an ascii file
by broquaint (Abbot) on Sep 19, 2001 at 18:40 UTC
    If you do in fact *really* want to remove lines from the file, you may want to use the flip-flop operator suggested above and take advantage of perl's in-place edit command-line switch, like so:
    # cnt=`wc -l filename | sed -e 's/ *\([0-9]\+\).*/\1/'`
    # perl -ni -e 'print if /^CPN/ .. '$cnt';' filename
    Admittedly that's very filthy, but if you use eof() you lose the last line (or at least I do). I'm sure there's a *far* better wholly Perl solution, but that works just fine (if you've got a Unix and GNU ;o)
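
One possible wholly-Perl variant is sketched below. The helper name strip_before_cpn is hypothetical, and the diagnosis is an assumption rather than something verified in this thread: eof() with empty parentheses refers to the whole <> pseudo-file, while eof without parentheses tests only the file currently being read, which is what perlfunc recommends inside a while (<>) loop. The sketch therefore uses the latter, together with $^I, the variable behind the -i switch.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrites $file in place, dropping every line before
# the first one that starts with CPN.
sub strip_before_cpn {
    my ($file) = @_;
    local $_;
    # '' means edit in place without keeping a backup copy (an assumption
    # that suits Unix-like systems; use e.g. '.bak' to keep one).
    local ($^I, @ARGV) = ('', $file);
    while (<>) {
        # eof without parentheses: end of the file currently being read
        print if /^CPN/ .. eof;
    }
    return;
}

strip_before_cpn($_) for @ARGV;
```

Be aware that a file with no line starting with CPN ends up empty, because the flip-flop never turns on. The equivalent one-liner would be `perl -ni -e 'print if /^CPN/ .. eof' filename`.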
    HTH

    broquaint