rahulruns has asked for the wisdom of the Perl Monks concerning the following question:

I need help writing a Perl script that matches a pattern in a file and deletes every occurrence of the pattern from the second one onward. The file contains output from vmstat, and I need to parse the log with Test::Parser::Vmstat, which accepts the log only in a particular format. Any alternative to Test::Parser::Vmstat would also be a great help.

    use strict;
    use warnings;
    use Test::Parser::Vmstat;

    my $parser = new Test::Parser::Vmstat
        or die "Couldn't create Test::Parser::Vmstat object\n";
    $parser->parse($ARGV[0] or \*STDIN)
        or die "Could not parse Vmstat log.\n";
    print $parser->to_xml();

    FILE

    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r b swpd free buff cache si so bi bo in cs us sy id wa st
     0 0 0 50101200 234628 9636240 0 0 0 34 10782 6802 1 1 98 0 0
     1 0 0 50102044 234628 9636276 0 0 0 96 8630 6980 1 1 98 0 0
     1 0 0 50113020 234628 9626112 0 0 0 3092 13393 10324 3 1 96 0 0
     1 0 0 50111244 234628 9628188 0 0 0 1540 10106 8874 2 1 97 0 0
     0 0 0 50111256 234628 9628228 0 0 0 0 8674 6961 1 1 98 0 0
     0 0 0 50109884 234628 9628228 0 0 0 280 11290 7593 1 1 97 0 0
     0 0 0 50110672 234628 9628264 0 0 0 16 8886 7301 1 1 98 0 0
     1 0 0 50110708 234628 9628268 0 0 0 40 11285 6833 1 1 98 0 0

    NEED TO MATCH AND DELETE

    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r b swpd free buff cache si so bi bo in cs us sy id wa st

Replies are listed 'Best First'.
Re: Removing matched pattern except the first pattern
by roboticus (Chancellor) on May 28, 2013 at 12:04 UTC

    rahulruns:

    If the two header lines are always the first two lines, you could just discard the first two lines, like:

        my $line = <$INPUT_FH>;
        $line = <$INPUT_FH>;
        while ($line = <$INPUT_FH>) {
            # process file
        }

    Of course, that'll be a problem if you ever have a file with a missing header, or if the header repeats later on in the file. In that case, you could take advantage of the fact that the lines you want to keep always start with a number:

        while (my $line = <$INPUT_FH>) {
            # Ignore all lines not beginning with a number
            next unless $line =~ /^\s*\d/;
            # process file
        }

    If you have other lines that you want to keep that don't start with a number, though, then you'll have to match the lines and reject them:

        while (my $line = <$INPUT_FH>) {
            # Ignore the header lines
            next if $line =~ /^(procs|\s*r\s+b\s+swpd)/;
            # process file
        }

    In this case, you'll need to make your pattern complete enough to recognize the header and not reject the other lines you want to keep.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Removing matched pattern except the first pattern
by hdb (Monsignor) on May 28, 2013 at 10:54 UTC

    An alternative approach is to match the lines you want to see:

        use strict;
        use warnings;

        while(<DATA>){
            print if /^\s+\d\s+\d\s+/;
        }

        __DATA__
        procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
         r b swpd free buff cache si so bi bo in cs us sy id wa st
         0 0 0 50101200 234628 9636240 0 0 0 34 10782 6802 1 1 98 0 0
         1 0 0 50102044 234628 9636276 0 0 0 96 8630 6980 1 1 98 0 0
         1 0 0 50113020 234628 9626112 0 0 0 3092 13393 10324 3 1 96 0 0
         1 0 0 50111244 234628 9628188 0 0 0 1540 10106 8874 2 1 97 0 0
         0 0 0 50111256 234628 9628228 0 0 0 0 8674 6961 1 1 98 0 0
         0 0 0 50109884 234628 9628228 0 0 0 280 11290 7593 1 1 97 0 0
         0 0 0 50110672 234628 9628264 0 0 0 16 8886 7301 1 1 98 0 0
         1 0 0 50110708 234628 9628268 0 0 0 40 11285 6833 1 1 98 0 0

Re: Removing matched pattern except the first pattern
by RMGir (Prior) on May 28, 2013 at 12:11 UTC
    You're asking a strange question... Why use a pattern to do this at all?

    In this case, you want to capture everything from line 3 onwards, so why not use the $. variable which captures the line #, and save all the lines where $.>2? Or just use a variable to count the lines and save lines based on the counter value?

    If you HAVE to do this with a regex for the lines to exclude, you still don't need to count anything - just exclude anything matching /procs|swpd/ for example...
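
    The two ideas above can be combined: use $. to protect the first two lines, and a pattern to drop any header that shows up later. This is only a minimal sketch; the helper name keep_first_header and the in-memory sample log are made up for illustration, and it assumes the header pair really is on lines 1 and 2.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first two lines (the header pair) and
# drop any later line that looks like a vmstat header.
sub keep_first_header {
    my ($fh) = @_;
    my @kept;
    while (my $line = <$fh>) {
        # $. holds the line number of the filehandle read most recently
        next if $. > 2 && $line =~ /procs|swpd/;
        push @kept, $line;
    }
    return @kept;
}

# Sample data assembled from the log shown in the question
my $log = <<'END';
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 0 50101200 234628 9636240 0 0 0 34 10782 6802 1 1 98 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 0 0 50102044 234628 9636276 0 0 0 96 8630 6980 1 1 98 0 0
END

open my $fh, '<', \$log or die "Can't open in-memory log: $!";
print keep_first_header($fh);
close $fh;
```

    For a real file, open it by name instead of using the in-memory string. If the file might start without a header, the line-number test is not safe and a pure pattern test is the better choice.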


    Mike
Re: Removing matched pattern except the first pattern
by 2teez (Vicar) on May 28, 2013 at 13:24 UTC

    Hi rahulruns,

    If I understand your title, "Removing matched pattern except the first pattern", and your question correctly,
    you want to match and remove the following:

        procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
         r b swpd free buff cache si so bi bo in cs us sy id wa st

    which occurs repeatedly in your data, EXCEPT for its first occurrence.
    If that is correct, you might do it like so:
        use warnings;
        use strict;

        my $flag = 0;
        my $matched_control = qr/^\s+?\d/;

        while (<DATA>) {
            chomp;
            ++$flag if !/$matched_control/;
            if ( $flag > 2 ) {
                next if !/$matched_control/;
            }
            print $_, $/;
        }

        __DATA__
        procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
         r b swpd free buff cache si so bi bo in cs us sy id wa st
         0 0 0 50101200 234628 9636240 0 0 0 34 10782 6802 1 1 98 0 0
         1 0 0 50102044 234628 9636276 0 0 0 96 8630 6980 1 1 98 0 0
         1 0 0 50113020 234628 9626112 0 0 0 3092 13393 10324 3 1 96 0 0
         1 0 0 50111244 234628 9628188 0 0 0 1540 10106 8874 2 1 97 0 0
         0 0 0 50111256 234628 9628228 0 0 0 0 8674 6961 1 1 98 0 0
         0 0 0 50109884 234628 9628228 0 0 0 280 11290 7593 1 1 97 0 0
         0 0 0 50110672 234628 9628264 0 0 0 16 8886 7301 1 1 98 0 0
         1 0 0 50110708 234628 9628268 0 0 0 40 11285 6833 1 1 98 0 0
        procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
         r b swpd free buff cache si so bi bo in cs us sy id wa st
         0 0 0 50101200 234628 9636240 0 0 0 34 10782 6802 1 1 98 0 0
         1 0 0 50102044 234628 9636276 0 0 0 96 8630 6980 1 1 98 0 0
         1 0 0 50113020 234628 9626112 0 0 0 3092 13393 10324 3 1 96 0 0
         1 0 0 50111244 234628 9628188 0 0 0 1540 10106 8874 2 1 97 0 0
        procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
         r b swpd free buff cache si so bi bo in cs us sy id wa st
         0 0 0 50111256 234628 9628228 0 0 0 0 8674 6961 1 1 98 0 0
         0 0 0 50109884 234628 9628228 0 0 0 280 11290 7593 1 1 97 0 0
         0 0 0 50110672 234628 9628264 0 0 0 16 8886 7301 1 1 98 0 0
         1 0 0 50110708 234628 9628268 0 0 0 40 11285 6833 1 1 98 0 0

    Please note that I have modified the OP's data to illustrate my point.
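
    A different way to get the same effect, which does not depend on counting non-matching lines, is to remember each header line the first time it is seen and drop exact repeats. This is a sketch of that swapped-in technique, not the code posted above; the helper name drop_repeated_headers is made up, and it assumes that every data row starts with optional whitespace followed by a digit and that repeated headers are byte-for-byte identical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep every data row, and keep any other line
# (such as a header) only the first time that exact line appears.
sub drop_repeated_headers {
    my @lines = @_;
    my %seen;
    # A data row short-circuits the test; anything else goes through %seen.
    return grep { /^\s*\d/ || !$seen{$_}++ } @lines;
}

# split /^/ breaks the text into lines and keeps each newline
my @log = split /^/, <<'END';
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 0 50101200 234628 9636240 0 0 0 34 10782 6802 1 1 98 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 0 0 50102044 234628 9636276 0 0 0 96 8630 6980 1 1 98 0 0
END

print drop_repeated_headers(@log);
```

    Because data rows bypass the %seen check, two identical samples in a row are both kept. A repeated header that differs only in trailing whitespace would not be recognised as a repeat.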

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me

      2teez: It is deleting all the lines with

          procs -----------memory---------- ---swap-- -----io---- --system--

      but leaving all the lines with

          r b swpd free buff cache si so bi bo in

      I need to keep the first two lines

          procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
           r b swpd free buff cache si so bi bo in cs us sy id wa st

      and delete any further occurrence of these lines.

        rahulruns,
        It is deleting all the lines with procs...

        Which one: your actual code, or the code I presented in Re: Removing matched pattern except the first pattern?
        Maybe you need to check your data and how you implemented the code given.
        The output of the code presented previously is:

        Please try it yourself (TIY).

        If you tell me, I'll forget.
        If you show me, I'll remember.
        if you involve me, I'll understand.
        --- Author unknown to me
Re: Removing matched pattern except the first pattern
by AnomalousMonk (Archbishop) on May 28, 2013 at 23:44 UTC

    When I first read the OP, the phrase "matched pattern" triggered an "I know, I'll use a regex..." knee-jerk response, and I formulated the problem statement "keep the first n occurrences of a pattern in a string along with any intervening matter and delete all occurrences thereafter".

    It turns out that a regex-based approach is not appropriate for the OPed problem unless it is a "keep all lines that match a given regex" strategy, which would, IMHO, be quite good because it combines a possibly quite large element of validation with data extraction and is also completely scalable.
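
    As a rough illustration of how a "keep all lines that match" strategy can also validate, the pattern below accepts only rows made of exactly 17 whitespace-separated integers. The count of 17 is an assumption taken from the header shown in this thread (r through st), and the helper name valid_rows is made up; a vmstat build with a different column set would need a different count.

```perl
use strict;
use warnings;

# Assumption: 17 numeric columns, matching the header shown in this thread.
my $row_re = qr/^\s*\d+(?:\s+\d+){16}\s*$/;

# Hypothetical helper: return only the lines that look like complete data rows.
sub valid_rows {
    return grep { $_ =~ $row_re } @_;
}

my @lines = split /^/, <<'END';
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 0 50101200 234628 9636240 0 0 0 34 10782 6802 1 1 98 0 0
 1 0 0 50102044 234628
END

# Prints only the complete 17-column row; the headers and the
# truncated last row are rejected.
print valid_rows(@lines);
```

    The stricter pattern rejects truncated or corrupted rows as well as headers, which a looser test such as "starts with a digit" would let through.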

    In any event, proceeding along the lines of my first-but-not-necessarily-best thought, I came up with this, which may be of interest:

        >perl -wMstrict -le
        "my $s = 'x foo1 foo2 x foo3 x yfoo9 foo4 foo5 foo9y x foo6 foo7 x';
         print qq{'$s' \n};
         ;;
         my $pat = qr{ \b foo \d \b }xms;
         ;;
         for my $n (0 .. 4) {
           (my $t = $s) =~ s{ \A (?: .*? $pat){$n} \K | $pat }''xmsg;
           print qq{'$t'};
           }
        "
        'x foo1 foo2 x foo3 x yfoo9 foo4 foo5 foo9y x foo6 foo7 x'

        'x x x yfoo9 foo9y x x'
        'x foo1 x x yfoo9 foo9y x x'
        'x foo1 foo2 x x yfoo9 foo9y x x'
        'x foo1 foo2 x foo3 x yfoo9 foo9y x x'
        'x foo1 foo2 x foo3 x yfoo9 foo4 foo9y x x'

    Update: I should add that the \K regex operator used above is available with Perl versions 5.10+.

      I was able to make the file look the way I needed, but I am still not able to parse the vmstat logs.

          use strict;
          use warnings;
          use Test::Parser::Vmstat;
          use Tie::File;

          my @vmstat_data;
          tie @vmstat_data, 'Tie::File', $ARGV[0] or die $!;
          `sed '/[procs|swpd]/d' $ARGV[0] > /tmp/vmstat_intermidiate`;
          open (my $in, '<', "/tmp/vmstat_intermidiate") or die "Can't read old file: $!";
          open (my $out, '>', "/tmp/vmstat_log") or die "Can't write new file: $!";
          print $out "$vmstat_data[0]\n";
          print $out "$vmstat_data[1]\n";
          while( <$in> ) {
              print $out $_;
          }
          close $out;
          close $in;
          `rm -rf /tmp/vmstat_intermidiate`;

          my $parser = new Test::Parser::Vmstat
              or die "Couldn't create Test::Parser::Vmstat object\n";
          my $logfile = '/tmp/vmstat_log';
          $parser->parse($logfile) or die "Could not parse Vmstat log.\n";
          print $parser->to_xml();

      OUTPUT

          procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
           r b swpd free buff cache si so bi bo in cs us sy id wa st
           1 0 0 50122424 234628 9616504 0 0 0 20 2 4 0 0 99 0 0
           0 0 0 50121664 234628 9616864 0 0 0 0 7813 5956 1 1 98 0 0
           1 0 0 50122252 234628 9616864 0 0 0 190 10727 6872 2 1 97 0 0
           0 0 0 50122092 234628 9616864 0 0 0 164 8645 6966 1 1 98 0 0

      PERL SCRIPT OUTPUT

          [root@r01mgt ~]# perl parse_vmstat.pl /tmp/vmstat
          <vmstat>
          </vmstat>
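
      Since the original question also asked for an alternative to Test::Parser::Vmstat, here is a minimal sketch of parsing the columns by hand in plain Perl. It is not the module's method and does not explain why the module returns an empty <vmstat> element; the helper name parse_vmstat is made up, and it assumes the column-name line starts with "r" and every data row has as many fields as that line.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Test::Parser::Vmstat: turn vmstat text
# into a list of hash references keyed by the column names in the header.
sub parse_vmstat {
    my ($fh) = @_;
    my (@cols, @rows);
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;            # also trims leading space and newline
        next unless @fields;                      # skip blank lines
        if ($fields[0] eq 'r') {                  # column-name header, first or repeated
            @cols = @fields;
            next;
        }
        next unless $fields[0] =~ /^\d+$/;        # skip the "procs ---memory---" banner
        next unless @cols && @fields == @cols;    # skip rows that do not fit the header
        my %row;
        @row{@cols} = @fields;                    # hash slice: column name => value
        push @rows, \%row;
    }
    return @rows;
}

# Sample data taken from the OUTPUT section above
my $log = <<'END';
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 0 0 50122424 234628 9616504 0 0 0 20 2 4 0 0 99 0 0
 0 0 0 50121664 234628 9616864 0 0 0 0 7813 5956 1 1 98 0 0
END

open my $fh, '<', \$log or die "Can't open in-memory log: $!";
printf "free=%s id=%s\n", $_->{free}, $_->{id} for parse_vmstat($fh);
close $fh;
```

      Repeated headers need no cleanup with this approach, because every header line is simply consumed. As a side note on the script above, the sed expression [procs|swpd] is a bracket expression that matches any single one of those characters rather than either word; it removes the header lines only because the data rows contain nothing but digits and spaces.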