rycher has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!

I have a file that contains multiple, very similar lines. How would I read through the file and remove every line in a block, except the very last one?

Example:

Before ---
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonH","Homer","Simpson","NULL","647","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","BurnsM"
"SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonM"
"SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonH"
"SimpsonB","Bart","Simpson","NULL","748","1","218 555","Springfield Elementary","Student","SkinnerP"

After ---
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonB","Bart","Simpson","NULL","748","1","218 555","Springfield Elementary","Student","SkinnerP"

Replies are listed 'Best First'.
Re: Removing multiple lines
by BrowserUk (Patriarch) on May 01, 2009 at 05:47 UTC

    Adjust -MIN=50 to suit:

    perl -snle"print if 1+index$_,substr($last,0,$MIN),0;$last=$_" -- -MIN=50 junk
    "SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
    "SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonH"

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Removing multiple lines
by mikeraz (Friar) on May 01, 2009 at 12:02 UTC

    From your example what distinguishes a block appears to be the first field (ID?), or name of the person. Is that true?

    However, your example doesn't match your description:

    Before Snip ---
    "SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
    "SimpsonH","Homer","Simpson","NULL","647","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
    "SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","BurnsM"

    After Snip ---
    "SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"

    The line containing "SimpsonM" and "648" is the first in the block, not the last. ??

    In any case, consider:

    #!/usr/bin/perl
    use strict;

    my %output_hash;
    while (<DATA>) {
        my ($id_field, @undef) = split /,/, $_;
        $output_hash{$id_field} = $_;
    }

    # Quickie Print
    print values %output_hash;

    # Or loop around it if there is more to be done:
    # foreach my $id_key (sort keys %output_hash) {
    #     # the more to be done stuff
    #     print $output_hash{$id_key};
    # }
    __DATA__
    "SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
    "SimpsonH","Homer","Simpson","NULL","647","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
    "SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","BurnsM"
    "SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonM"
    "SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonH"
    "SimpsonB","Bart","Simpson","NULL","748","1","218 555","Springfield Elementary","Student","SkinnerP"

    Getting the output sorted to suit is left as an exercise for you.
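
    One possible way to approach that exercise, as a rough sketch: since values on a hash returns its entries in no particular order, the keys can be recorded in an array the first time each is seen, and the output emitted in that order. The sub name last_line_per_key and the shortened sample rows are made up for illustration, and the naive split assumes the first field never contains a comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only the last line seen for each key (the first comma-separated
# field), returning lines in the order their keys first appeared.
sub last_line_per_key {
    my @lines = @_;
    my (%last, @order);
    for my $line (@lines) {
        # Naive split: assumes no comma inside the first field.
        my ($key) = split /,/, $line, 2;
        push @order, $key unless exists $last{$key};
        $last{$key} = $line;    # later lines overwrite earlier ones
    }
    return map { $last{$_} } @order;
}

# Hypothetical sample rows, shortened from the example in this thread.
my @sample = (
    qq("SimpsonH","Homer","648","SimpsonM"\n),
    qq("SimpsonH","Homer","648","BurnsM"\n),
    qq("SimpsonB","Bart","748","SimpsonM"\n),
    qq("SimpsonB","Bart","748","SkinnerP"\n),
);

print last_line_per_key(@sample);
```

    Unlike a line-by-line comparison, this also collapses lines for the same key that are not adjacent in the file, which may or may not be what is wanted.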

    Also consider using Text::CSV to manipulate CSV data like the type you've presented as an example.
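
    As a small illustration of why that can matter, here is a sketch that parses one line with Text::CSV (a CPAN module, so it has to be installed separately). The helper name csv_fields and the sample row with a comma inside a quoted field are invented for the example; a plain split /,/ would break that row into four pieces instead of three.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;    # from CPAN, not part of core Perl

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Return the fields of one CSV line, or an empty list if it does not parse.
sub csv_fields {
    my ($line) = @_;
    return () unless $csv->parse($line);
    return $csv->fields;
}

# Hypothetical row: the second field contains a comma inside the quotes.
my $line   = q{"SimpsonH","Simpson, Homer","648"};
my @fields = csv_fields($line);

print scalar(@fields), " fields; first is $fields[0]\n";
```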


    Be Appropriate && Follow Your Curiosity
      I solved it by adding more data to the beginning...so basically, I cheated by not using PERL. :-\

      There is an audit_date stamp in the MySQL database where this information is being extracted from.

      I simply added the audit_date field and removed everything that wasn't 'audited' in 2009.

      Perhaps not the most ideal solution since that particular database gets audited twice a year, but it will do for now.

Re: Removing multiple lines
by codeacrobat (Chaplain) on May 01, 2009 at 06:33 UTC
    perl -ne 'print if $_ ne $last;$last=$_' file

    print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});

      You missed the "very similar" bit.
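
      One way the exact-match comparison could be loosened, sketched under the assumption that "very similar" means "same first field": hold each line back and print it only when the next line's key differs, then flush the held line at end of input. That keeps the last line of every consecutive block using constant memory. The sub name last_of_each_block and the shortened sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print the last line of each run of consecutive lines sharing a key.
# The key is the first comma-separated field (an assumption about
# what counts as "very similar" for this data).
sub last_of_each_block {
    my ($in, $out) = @_;
    my ($prev_line, $prev_key);
    while (my $line = <$in>) {
        my ($key) = split /,/, $line, 2;
        # Key changed: the held line was the last of its block.
        print {$out} $prev_line if defined $prev_key && $key ne $prev_key;
        ($prev_line, $prev_key) = ($line, $key);
    }
    print {$out} $prev_line if defined $prev_line;    # flush the final block
}

# Hypothetical sample input, read through an in-memory filehandle.
my $data = join '', map { "$_\n" } (
    '"SimpsonH","Homer","648","SimpsonM"',
    '"SimpsonH","Homer","647","SimpsonM"',
    '"SimpsonH","Homer","648","BurnsM"',
    '"SimpsonB","Bart","748","SimpsonM"',
    '"SimpsonB","Bart","748","SkinnerP"',
);
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
last_of_each_block($in, \*STDOUT);
close $in;
```

      To run it against a real file, open that file for reading and pass the handle in place of the in-memory one.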