Removing multiple lines

rycher has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!

I have a file that contains multiple, very similar lines. How would I read through the file and remove every line in a block, except the very last one?

Example:

Before ---
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonH","Homer","Simpson","NULL","647","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","BurnsM"
"SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonM"
"SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonH"
"SimpsonB","Bart","Simpson","NULL","748","1","218 555","Springfield Elementary","Student","SkinnerP"

After ---
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonB","Bart","Simpson","NULL","748","1","218 555","Springfield Elementary","Student","SkinnerP"

Replies are listed 'Best First'.
Re: Removing multiple lines by BrowserUk (Patriarch) on May 01, 2009 at 05:47 UTC
Adjust -MIN=50 to suit:

`perl -snle"print if 1+index$_,substr($last,0,$MIN),0;$last=$_" -- -MIN=50 junk`

"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonH"

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
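For readers who find the one-liner dense, here is a minimal sketch of the same comparison written out as a script. The sub name `lines_matching_previous_prefix` and the variable names are hypothetical, chosen for illustration. The logic mirrors the one-liner: a line is printed only if it contains the first `$min` characters of the line immediately before it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep each line that contains the first $min characters of the line
# immediately before it. The first line always passes, because the
# empty prefix is found at position 0.
sub lines_matching_previous_prefix {
    my ($min, @lines) = @_;
    my $last = '';
    my @kept;
    for my $line (@lines) {
        my $prefix = substr($last, 0, $min);
        # index() returns -1 when the prefix does not occur in $line
        push @kept, $line if index($line, $prefix) >= 0;
        $last = $line;
    }
    return @kept;
}

# The six sample lines from the question
my @sample = split /\n/, <<'END';
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonH","Homer","Simpson","NULL","647","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","BurnsM"
"SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonM"
"SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonH"
"SimpsonB","Bart","Simpson","NULL","748","1","218 555","Springfield Elementary","Student","SkinnerP"
END

print "$_\n" for lines_matching_previous_prefix(50, @sample);
```

With a prefix length of 50 the comparison covers the fifth and sixth fields of these lines, so only the first Homer line and the second Bart line survive. A smaller value compares fewer leading fields and lets more lines through.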
Re: Removing multiple lines by mikeraz (Friar) on May 01, 2009 at 12:02 UTC
From your example, what distinguishes a block appears to be the first field (an ID?) or the person's name. Is that true? However, your example doesn't match your description:

Before Snip ---
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonH","Homer","Simpson","NULL","647","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","BurnsM"

After Snip ---
"SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"

The line containing "SimpsonM" and "648" is the first in the block, not the last. In any case, consider:

    #!/usr/bin/perl
    use strict;

    my %output_hash;

    # Later lines overwrite earlier ones, so each ID keeps its last line
    while (<DATA>) {
        my ($id_field, @undef) = split /,/, $_;
        $output_hash{$id_field} = $_;
    }

    # Quickie Print
    print values %output_hash;

    # Or loop around it if there is more to be done:
    # foreach my $id_key (sort keys %output_hash) {
    #     # the more to be done stuff
    #     print $output_hash{$id_key};
    # }

    __DATA__
    "SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
    "SimpsonH","Homer","Simpson","NULL","647","0","218 555","Nuclear Control","Nuclear Operator","SimpsonM"
    "SimpsonH","Homer","Simpson","NULL","648","0","218 555","Nuclear Control","Nuclear Operator","BurnsM"
    "SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonM"
    "SimpsonB","Bart","Simpson","NULL","748","0","218 555","Springfield Elementary","Student","SimpsonH"
    "SimpsonB","Bart","Simpson","NULL","748","1","218 555","Springfield Elementary","Student","SkinnerP"

Getting the output sorted to suit is left as an exercise for you. Also consider using Text::CSV to manipulate CSV data like the type you've presented as an example.

Be Appropriate && Follow Your Curiosity
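Because `values %output_hash` returns lines in no particular order, one way to approach the sorting exercise is to remember the order in which each ID first appears. The following is a minimal sketch under that assumption, not mikeraz's code: the sub name `last_line_per_id` and the shortened sample records are hypothetical. It still splits naively on commas, so a quoted field that itself contains a comma would need a real CSV parser such as the Text::CSV module mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep the last line seen for each ID (the first comma-separated field)
# and return the survivors in the order each ID first appeared.
sub last_line_per_id {
    my @lines = @_;
    my (%last_for, @order);
    for my $line (@lines) {
        my ($id) = split /,/, $line, 2;
        next unless defined $id;                  # skip empty lines
        push @order, $id unless exists $last_for{$id};
        $last_for{$id} = $line;                   # later lines overwrite
    }
    return map { $last_for{$_} } @order;
}

# Hypothetical, shortened records for illustration
my @records = (
    '"SimpsonH","Homer","648"',
    '"SimpsonH","Homer","647"',
    '"SimpsonB","Bart","748"',
    '"SimpsonH","Homer","649"',
);

print "$_\n" for last_line_per_id(@records);
```

Note that this groups every line with the same ID, even when the lines are not adjacent in the file. Whether that is desirable depends on what "block" means in the original data.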
Re^2: Removing multiple lines by rycher (Acolyte) on May 04, 2009 at 00:58 UTC
I solved it by adding more data to the beginning... so basically, I cheated by not using Perl. :-\ There is an audit_date stamp in the MySQL database this information is extracted from. I simply added the audit_date field and removed everything that wasn't 'audited' in 2009. Perhaps not the ideal solution, since that particular database gets audited twice a year, but it will do for now.
Re: Removing multiple lines by codeacrobat (Chaplain) on May 01, 2009 at 06:33 UTC
`perl -ne 'print if $_ ne $last;$last=$_' file`

`print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});`
Re^2: Removing multiple lines by BrowserUk (Patriarch) on May 01, 2009 at 06:37 UTC
You missed the "very similar" bit.
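To handle lines that are similar rather than identical, the comparison can be made on a key instead of on the whole line. The sketch below assumes, as mikeraz guessed and the original poster never confirmed, that the first field identifies a block. The sub name `last_line_of_each_block` and the sample records are hypothetical. It emits the last line of each run of adjacent lines sharing that key, which is what the question literally asks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the last line of every run of adjacent lines that share the
# same first comma-separated field.
sub last_line_of_each_block {
    my @lines = @_;
    my ($prev_line, $prev_key);
    my @kept;
    for my $line (@lines) {
        my ($key) = split /,/, $line, 2;
        next unless defined $key;                 # skip empty lines
        # A change of key means the previous line closed its block
        push @kept, $prev_line
            if defined $prev_key && $key ne $prev_key;
        ($prev_line, $prev_key) = ($line, $key);
    }
    push @kept, $prev_line if defined $prev_line; # close the final block
    return @kept;
}

# Hypothetical, shortened records for illustration
my @rows = (
    '"SimpsonH","648","SimpsonM"',
    '"SimpsonH","647","SimpsonM"',
    '"SimpsonH","648","BurnsM"',
    '"SimpsonB","748","SimpsonM"',
    '"SimpsonB","748","SkinnerP"',
);

print "$_\n" for last_line_of_each_block(@rows);
```

Unlike a hash keyed on the ID, this only needs to remember one line at a time, and an ID that reappears later in the file starts a new block.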