in reply to Grouping unique lines into a file.

There are no "unique lines" in your sample data... at least, not in the usual (computing included) sense of "unique."

Perhaps Limbic~Region's interpretation is correct; IMO it's the most plausible reading... but what do you want done if an account includes multiple, unduplicated lines, as account 03 does, for example?

If not, please explain what you actually want, in more precise language. Another (semi-)plausible interpretation of your description is that you want each unique data element, per account number, collected into a single file that covers all account numbers.
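A minimal sketch of that second reading might look like the following (the output name all.txt, and treating the leading digits as the account number, are my assumptions; the sample lines are from your post):

  #!/usr/bin/perl
  use strict;
  use warnings;

  # one combined output file; keep only the first occurrence
  # of each line within each account number
  my %seen;

  open my $out, '>', 'all.txt' or die "Can't open 'all.txt': $!\n";

  while (my $line = <DATA>) {
      my ($acct) = $line =~ /^(\d+)/ or next;   # leading digits = account number
      print {$out} $line unless $seen{$acct}{$line}++;
  }

  close $out;

  __DATA__
  01 The quick red fox jumped over the lazy brown dog.
  01 The quick red fox jumped over the lazy brown dog.
  03 Now is the time for all good men to come to the aid of their party.
  03 Now is the time for all.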


Questions containing the words "doesn't work" (or their moral equivalent) will usually get a downvote from me unless accompanied by:
  1. code
  2. verbatim error and/or warning messages
  3. a coherent explanation of what "doesn't work" actually means.


Re^2: Grouping unique lines into a file.
by Anonymous Monk on Apr 21, 2014 at 17:27 UTC
    By "unique lines" I meant the numbers at the front of each line.
    I'm trying to read a data file and group lines into separate files whenever those leading numbers match:

    One file name with all the "01s".
    One file name with all the "02s".
    One file name with all the "03s" and so on.

    Sorry for the lack of explanation.

      So does this do what you want?

      use strict;
      use warnings;

      my %accounts;

      # collect the rest of each line, keyed by its leading account number
      while ( <DATA> ) {
          push @{ $accounts{$1} }, $2 if /^(\d+)\s+(.+)/;
      }

      # write one file per account number
      for my $k ( keys %accounts ) {
          open my $fh, '>', "$k.txt" or die "Can't open '$k.txt': $!\n";
          print $fh join "\n", @{ $accounts{$k} };
      }

      Update: Modified code slightly.

      Here's a solution that caches the filehandles (so there's no need to keep opening and closing files), processes the input as a stream (meaning you don't have to keep all lines in memory)*, and skips duplicate lines:
      #!/usr/bin/perl -w
      use strict;
      use warnings;
      use autodie;   # I'm too lazy to write open ... or die stuff right here

      # cache for file handles
      my %fh   = ();
      my %seen = ();

      while (<DATA>) {
          next unless /^(\d{2})/;   # ignore lines starting with anything other than 2 digits
          next if $seen{$_}++;      # ignore if a line comes again

          unless ($fh{$1}) {
              warn "'$1.txt' already exists" if -e "$1.txt";
              open my $FH, '>>', "$1.txt";
              $fh{$1} = $FH;
          }

          print {$fh{$1}} $_;
      }

      foreach my $FH (values %fh) { close $FH }

      __DATA__
      01 The quick red fox and dog as test.
      02 Time flies like an arrow, fruit flies like a banana.
      02 Time flies like an arrow, fruit flies like a banana.
      03 Now is the time for all good men to come to the aid of their party.
      01 The quick red fox jumped over the lazy brown dog.
      01 The quick red fox jumped over the lazy brown dog.
      02 Time flies like an arrow.
      03 Now is the time for all good men to come to the aid of their party and not going.
      03 Now is the time for all.

      Greetings,
      Janek Schleicher

      *PS: O.K., that's not the whole truth, as the keys of %seen are the full lines :-). If that ever becomes a memory problem, we can replace them with a hash digest, e.g. SHA1:
      ...
      use Digest::SHA1 qw/sha1/;

      # cache for file handles
      my %fh   = ();
      my %seen = ();

      while (<DATA>) {
          next unless /^(\d{2})/;      # ignore lines starting with anything other than 2 digits
          next if $seen{sha1($_)}++;   # ignore if a line comes again
          ...
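      If Digest::SHA1 isn't installed, the Digest::SHA module that ships with modern Perls should work as a drop-in here, since it exports a sha1() with the same interface:

      use Digest::SHA qw/sha1/;   # core module; same sha1() as Digest::SHA1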