Here's a solution that caches the filehandles (so each output file is opened only once instead of being opened and closed over and over), works line by line (meaning you don't have to keep all lines hanging around in memory)*, and skips duplicate lines:
#!/usr/bin/perl -w
use strict;
use warnings;
use autodie; # I'm too lazy to write open ... or die stuff right here
# cache for file handles;
my %fh = ();
my %seen = ();
while (<DATA>) {
    next unless /^(\d{2})/;   # ignore lines starting with anything other than 2 digits
    next if $seen{$_}++;      # ignore a line if it comes again
    unless ($fh{$1}) {
        warn "'$1.txt' already exists" if -e "$1.txt";
        open my $FH, '>>', "$1.txt";
        $fh{$1} = $FH;
    }
    print {$fh{$1}} $_;
}
foreach my $FH (values %fh) {close $FH};
__DATA__
01 The quick red fox and dog as test.
02 Time flies like an arrow, fruit flies like a banana.
02 Time flies like an arrow, fruit flies like a banana.
03 Now is the time for all good men to come to the aid of their party.
01 The quick red fox jumped over the lazy brown dog.
01 The quick red fox jumped over the lazy brown dog.
02 Time flies like an arrow.
03 Now is the time for all good men to come to the aid of their party and not going.
03 Now is the time for all.
Greetings,
Janek Schleicher
*PS: O.K., that's not the whole truth, as the keys of %seen are the lines themselves :-). If that becomes a memory problem, we can replace them with a hash digest, e.g. SHA1:
...
use Digest::SHA1 qw/sha1/;
# cache for file handles;
my %fh = ();
my %seen = ();
while (<DATA>) {
    next unless /^(\d{2})/;        # ignore lines starting with anything other than 2 digits
    next if $seen{sha1($_)}++;     # ignore a line if it comes again
....
...
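For anyone who wants to try the digest idea end to end, here is a minimal, self-contained sketch. It makes a few assumptions that are not in the code above: it uses the core module Digest::SHA (which also exports sha1) rather than Digest::SHA1, it wraps the loop in a helper I've called split_by_prefix (a made-up name), and it reads from an in-memory string and writes into a temporary directory so that it can be run without touching real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use Digest::SHA qw(sha1);     # assumption: core Digest::SHA instead of Digest::SHA1
use File::Temp qw(tempdir);

# Hypothetical helper: read lines from $in and append each new line to
# "$dir/NN.txt", where NN is the line's two-digit prefix.
sub split_by_prefix {
    my ($in, $dir) = @_;
    my (%fh, %seen);
    while (my $line = <$in>) {
        next unless $line =~ /^(\d{2})/;   # skip lines without a 2-digit prefix
        my $prefix = $1;
        next if $seen{ sha1($line) }++;    # key is the 20-byte digest, not the line
        unless ($fh{$prefix}) {
            open my $out, '>>', "$dir/$prefix.txt";
            $fh{$prefix} = $out;           # cache the handle for later lines
        }
        print { $fh{$prefix} } $line;
    }
    close $_ for values %fh;
    return scalar keys %fh;                # number of output files used
}

# Small demo input: one duplicate line and one line that gets skipped.
my $input = join '', map { "$_\n" }
    '01 fox', '02 arrow', '02 arrow', '01 dog', 'xx skipped';
open my $in, '<', \$input;
my $dir   = tempdir(CLEANUP => 1);
my $files = split_by_prefix($in, $dir);
print "wrote $files files\n";
```

Note that a binary SHA-1 digest is always 20 bytes, so this only saves memory when the lines are noticeably longer than that; for short lines like the test data, the plain $seen{$_} version is just as good.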