in reply to Removing duplicate lines from a file
First, always use strictures (use strict; use warnings;). They will give you an early heads up about many silly errors and typos.
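As a small illustration of the kind of typo strictures catch, here is a minimal sketch. The variable names are made up for the example, and the snippet is compiled inside a string eval only so the failure can be reported without killing the script:

```perl
use strict;
use warnings;

# A snippet containing a typo: $coutn instead of $count.
# The single-quoted heredoc stops the variables being interpolated here.
my $snippet = <<'CODE';
use strict;
use warnings;
my $count = 0;
$coutn++;    # typo: strict refuses to compile an undeclared variable
1;
CODE

# String eval returns undef and sets $@ when the snippet fails to compile.
my $ok = eval $snippet;
print "strict caught it: $@" unless $ok;
```

Without `use strict` the misspelled variable would quietly spring into existence and `$count` would never change.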
Use the three parameter version of open and check the result. It's more secure, the intent is clearer and checking the result saves a heap of time debugging silly errors.
Avoid slurping files (@file = <MYFILE>;). It doesn't scale well. It doesn't generally improve performance and it doesn't generally help code clarity.
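A minimal sketch of the last two points together: a three parameter open with a checked result, then a loop that reads one line at a time instead of slurping. The sample data and variable names are placeholders, and an in-memory file handle stands in for a real file so the example is self-contained:

```perl
use strict;
use warnings;

# Stand-in for a real file on disk.
my $data = "first line\nsecond line\nthird line\n";

# Three parameter open with a lexical handle; report the reason on failure.
open my $in, '<', \$data or die "Can't open input: $!";

my $count = 0;
while (my $line = <$in>) {    # only one line is held in memory at a time
    chomp $line;
    $count++;
    print "$count: $line\n";
}

close $in;
```

For a real file, replace `\$data` with the file name; the rest stays the same.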
push (@sort ,"@split[2]\t@split[1]\t@split[3]\t@split[0]\t@split[4]");
is better written:
push (@sort ,"$split[2]\t$split[1]\t$split[3]\t$split[0]\t$split[4]");
but is much clearer using an array slice:
push @sort, join "\t", @split[2, 1, 3, 0, 4];
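A quick way to convince yourself that a slice and the interpolated string build the same thing, provided the indexes are listed in the same order. The field values here are hypothetical:

```perl
use strict;
use warnings;

my @split = qw(a b c d e);    # hypothetical fields 0 .. 4

# Element-by-element interpolation.
my $interpolated = "$split[2]\t$split[1]\t$split[3]\t$split[0]\t$split[4]";

# The same fields picked with an array slice and joined with tabs.
my $sliced = join "\t", @split[2, 1, 3, 0, 4];

print $interpolated eq $sliced ? "same\n" : "different\n";    # prints "same"
```

The slice version also makes the field order easy to see and change in one place.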
However, in Perl when you think 'unique', you should generally then think 'hash'. Consider:
use strict;
use warnings;

my $data = <<DATA;
2009-01-08 09:29:19 ABCDEF 943973 MS08-011 Security Update for Microsoft Works Suite 2005 (KB943973)
2009-01-08 09:29:19 ABCDEF 943973 MS08-011 Security Update for Microsoft Works Suite 2005 (KB943973)
2009-01-08 09:29:19 ABCDEF 951944 MS08-055 Security Update for the 2007 Microsoft Office System (KB951944)
2009-01-08 09:29:19 ABCDEF 953432 Update for Microsoft Office Outlook 2003 (KB953432)
2009-01-08 09:29:19 ABCDEF 954038 MS08-051 Security Update for 2007 Microsoft Office System (KB954038)
2009-01-08 09:29:19 ABCDEF 954326 MS08-052 Security Update for the 2007 Microsoft Office System (KB954326)
2009-01-08 09:29:19 ABCDEF 956391 Cumulative Security Update for ActiveX Killbits for Windows 2000 (KB956391)
2009-01-08 09:29:20 ABCDEF 956828 MS08-072 Security Update for the 2007 Microsoft Office System (KB956828)
2009-01-08 09:29:20 ABCDEF 956828 MS08-072 Security Update for the 2007 Microsoft Office System (KB956828)
2009-01-08 09:29:20 ABCDEF 957832 Update for Microsoft Office Outlook 2003 Junk Email Filter (KB957832)
2009-01-08 09:29:22 ABCDEF 958439 MS08-074 Security Update for the 2007 Microsoft Office System (KB958439)
2009-01-08 09:29:22 ABCDEF 958439 MS08-074 Security Update for the 2007 Microsoft Office System (KB958439)
DATA

open my $inFile, '<', \$data;
my %entries;

while (<$inFile>) {
    my ($date, $time, $endpoint, $kbid, $id, $title) =
        /(\S+)\s+ (\S+)\s+ (\S+)\s+ (\S+)\s+ (\w+-\d+\s+)? (.*)/x;

    $id ||= '';
    $entries{$kbid} = {
        date     => $date,
        time     => $time,
        endpoint => $endpoint,
        id       => $id,
        kbid     => $kbid,
        title    => $title,
    };
}

close $inFile;

print join ("\t", @{$_}{qw(id kbid title endpoint date)}), "\n"
    for sort {$a->{id} cmp $b->{id} or $a->{kbid} <=> $b->{kbid}} values %entries;
Prints:
         953432 Update for Microsoft Office Outlook 2003 (KB953432) ABCDEF 2009-01-08
         956391 Cumulative Security Update for ActiveX Killbits for Windows 2000 (KB956391) ABCDEF 2009-01-08
         957832 Update for Microsoft Office Outlook 2003 Junk Email Filter (KB957832) ABCDEF 2009-01-08
MS08-011 943973 Security Update for Microsoft Works Suite 2005 (KB943973) ABCDEF 2009-01-08
MS08-051 954038 Security Update for 2007 Microsoft Office System (KB954038) ABCDEF 2009-01-08
MS08-052 954326 Security Update for the 2007 Microsoft Office System (KB954326) ABCDEF 2009-01-08
MS08-055 951944 Security Update for the 2007 Microsoft Office System (KB951944) ABCDEF 2009-01-08
MS08-072 956828 Security Update for the 2007 Microsoft Office System (KB956828) ABCDEF 2009-01-08
MS08-074 958439 Security Update for the 2007 Microsoft Office System (KB958439) ABCDEF 2009-01-08
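If all you need is to drop lines that are exact duplicates, the usual hash idiom is shorter still. This is a minimal sketch with placeholder data, not a drop-in replacement for the code above: it compares whole lines rather than a key field, keeps the first occurrence of each line, and preserves input order.

```perl
use strict;
use warnings;

# Placeholder input; an in-memory file handle stands in for a real file.
my $data = <<'DATA';
alpha
beta
alpha
gamma
beta
DATA

open my $in, '<', \$data or die "Can't open input: $!";

my %seen;      # lines already encountered
my @unique;    # first occurrence of each line, in input order

while (my $line = <$in>) {
    chomp $line;
    # $seen{$line}++ is false (0) only the first time a line is seen.
    push @unique, $line unless $seen{$line}++;
}

close $in;

print "$_\n" for @unique;    # prints alpha, beta, gamma on separate lines
```

To treat lines as duplicates when only one field matches, use that field as the hash key instead of the whole line.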