in reply to Regular Expression to find duplicate text blocks
```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($key, %hash, @keys);

while (<DATA>) {
    # header lines look like @:<10+ digits>:<optional 10+ digits>:1
    if (/^\@:\d{10,}:(?:\d{10,})?:1$/) {
        chomp;
        $key = $_;
        # needed to process keys in file order
        push @keys, $key;
    }
    else {
        # every other line belongs to the body of the current block
        $hash{$key} .= $_;
    }
}

# %revhash maps a block body to the first header seen with that body
my ($value, %revhash);

# changed to foreach to process keys in file order
# while (($key, $value) = each %hash) {
foreach $key (@keys) {
    $value = $hash{$key};
    if (exists $revhash{$value}) {
        print "$key is a duplicate of $revhash{$value}\n";
    }
    else {
        $revhash{$value} = $key;
    }
}

__DATA__
@:1107530184::1
kkkkkkkkkkkkmkmkmk kkkkkk confused.gif
@:1107530257:1107530439:1
kmkmkm <br>kmkmkm <br> <br>Fri Feb 4 10:17:37 2005 <br> mad.gif
@:1107530709::1
ygyg ygygygyg lol.gif
@:1107530717::1
ygyg ygygygyg lol.gif
@:1107530963::1
cool help cool.gif
@:1107532649:1107532689:1
k <br>kkkkkkkkkkkkkkkkk <br> <br>Fri Feb 4 10:57:29 2005 <br> lol lol.gif
@:1107532758::1
lll Lets mad.gif
@:1107532976::1
lll Lets mad.gif
```

And the output is:

```
@:1107530717::1 is a duplicate of @:1107530709::1
@:1107532976::1 is a duplicate of @:1107532758::1
```

Update: If you only want to eliminate one block from each pair of consecutive duplicate blocks, then this might work:
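```perl
# Replaces the loops in the script above; reads the same __DATA__ section.
# Slurp the whole of DATA into one string.
my $file = do { local $/; <DATA> };

# With /m, ^ anchors at each line start; /s lets .*? cross newlines;
# /x allows the layout below. A header and its body, followed directly by
# another header with the same body, is replaced by the second block.
$file =~ s/^\@:\d{10,}:(?:\d{10,})?:1\n (.*?)
           (^\@:\d{10,}:(?:\d{10,})?:1\n \1)/$2/msxg;
print $file;
```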
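One thing to watch with the substitution above: `\1` is not anchored at its end, so a block whose body is only a prefix of the next block's body (say `lll` followed by `lll` plus one more line) is also treated as a duplicate, and the first block is dropped. Here is a minimal sketch, not tested against the real file, that runs the same substitution on an in-memory string and adds a lookahead `(?=^\@:|\z)` so the repeated body must end at the next header or at the end of the string. The lookahead is my addition, and the sub name `collapse_consecutive` and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header line such as "@:1107532758::1" or "@:1107530257:1107530439:1".
my $header = qr/\@:\d{10,}:(?:\d{10,})?:1\n/;

# Hypothetical helper: replace each pair of consecutive blocks with an
# identical body by the second block of the pair. The trailing lookahead
# (an addition to the original regex) requires the repeated body to be
# followed by another header or by the end of the string.
sub collapse_consecutive {
    my ($text) = @_;
    $text =~ s/^$header (.*?) (^$header \1) (?=^\@:|\z)/$2/msxg;
    return $text;
}

my $sample = <<'END';
@:1107532649:1107532689:1
k
@:1107532758::1
lll Lets mad.gif
@:1107532976::1
lll Lets mad.gif
END

print collapse_consecutive($sample);
```

This should print the `1107532649` block followed by only the `1107532976` copy of `lll Lets mad.gif`. As in the original, three identical blocks in a row come out as two, because `/g` resumes after each match.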
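If the goal is to drop every later copy of a block, not just the second of two adjacent ones, the reverse-hash idea from the first script can be turned around to write out a cleaned file instead of a report. A minimal sketch, assuming the same header format and that any lines before the first header can be ignored; `parse_blocks` and `dedupe_blocks` are hypothetical names, and the sample is rearranged from the data above so the duplicate is not adjacent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split the text into [header, body] pairs using the
# same header regex as the script above. Lines before the first header
# are ignored.
sub parse_blocks {
    my ($text) = @_;
    my @blocks;
    for my $line (split /^/, $text) {    # /^/ splits into lines, keeping "\n"
        if ($line =~ /^\@:\d{10,}:(?:\d{10,})?:1$/) {
            push @blocks, [ $line, '' ];
        }
        elsif (@blocks) {
            $blocks[-1][1] .= $line;     # append to the current block's body
        }
    }
    return @blocks;
}

# Keep the first block with a given body and drop any later block whose
# body is identical, wherever it appears in the text.
sub dedupe_blocks {
    my ($text) = @_;
    my %seen;
    my $out = '';
    for my $block (parse_blocks($text)) {
        my ($header, $body) = @$block;
        next if $seen{$body}++;          # body already seen: skip this block
        $out .= $header . $body;
    }
    return $out;
}

my $sample = <<'END';
@:1107530709::1
ygyg ygygygyg lol.gif
@:1107530963::1
cool help cool.gif
@:1107530717::1
ygyg ygygygyg lol.gif
END

print dedupe_blocks($sample);
```

Unlike the regex, this removes the `1107530717` block even though another block sits between it and its earlier copy.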