sorting a filelist array by filename

Vasek has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have to make a script which is for filtering developers' sources before adding them into a CVS repo. There is just one thing which I solved very ugly: finding filename repeatitions.

Here is my solutions, but I'm really frustrated about this. Is there any suggestion to make it quite simply?

With the Bests: Vasek

### code from here ###
@classedSources = (
  "hitdb/CustomerPartner/Opten/Scripts/Daily/egy_batch.sql",
  "hitdb/CustomerPartner/Opten/Scripts/Daily/megegy_batch.sql",
  "hitdb/CustomerPartner/Opten/Scripts/Other/cp_OptenTruncTables.sql",
  "hitdb/CustomerPartner/Opten/Scripts/Other/htcp_firm_loadAll.sql",
  "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_EKN.sql",
  "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_FIRM.sql",
  "hitdb/Policy/Views/Types/HtTypoWsAlaTarifFE1.sql",
  "hitdb/Policy/Types/HtTypoWsHobMfactorFE1.sql",
  "hitdb/Policy/Types/HtTypoWsCoverAlaFE1.sql",
  "hitdb/Policy/Views/Types/HtTypoWsCoverAlaFE1.sql",
  "hitdb/Policy/Types/HtTypoWsCoverAlaListFE1.sql",
  "hitdb/Policy/Views/Types/HtTypoWsCoverAlaListFE1.sql",
  "hitdb/Security/Scripts/ht_tut_mail_address_ws.sql",
  "hitdb/Policy/Views/Types/HtTypoWsHobMfactorFE1.sql",
  "hitdb/Accounting/Scripts/ht_tut_mail_address_ws.sql",
  "hitdb/Policy/Scripts/ht_tut_mail_address_ws.sql",
  "hitdb/Policy/Types/HtTypoWsAlaTarifFE1.sql",
  "hitdb/astools/as#arch/Setup/Scripts/ht_tut_mail_address_ws.sql"
);

foreach (@classedSources) {
  my @path = split /\//, $_;
  my $file = pop @path;
  my $path = join '/', @path;
  @sourcesAndPaths = (@sourcesAndPaths, "$file|$path");
}
print "\n";
print "-" x 80, "\n";
print "-- REPEATED SOURCE(S):\n";
print "-" x 80, "\n";

@sourcesAndPaths = sort @sourcesAndPaths;
foreach (@sourcesAndPaths) {
  $fileidx++;
  my ($file, $path) = split /\|/, $_;
  my ($nextFile, $nextPath) = split /\|/, $sourcesAndPaths[$fileidx];
  if ($file eq $nextFile) {
    $hit = 1; $hitcntr++;
    print "\n$file\n" if $hitcntr == 1;
    print " " x 6, $path."\n";
  } else {
    print " " x 6, $path."\n" if $hit; $hit = ''; $hitcntr = '';
  }
}
[download]

Comment on sorting a filelist array by filename Download Code

Replies are listed 'Best First'.
Re: sorting a filelist array by filename by Util (Priest) on May 19, 2009 at 15:28 UTC
Use a hash-of-arrays to group the paths by filename. `my %sourcesAndPaths; foreach (@classedSources) { my ( $dir, $file ) = m{ ^ (.+) / ([^/]+?) $ }x or do { warn "Can't parse '$_'"; next; }; push @{ $sourcesAndPaths{$file} }, $dir; } foreach my $file ( sort keys %sourcesAndPaths ) { my @dirs = @{ $sourcesAndPaths{$file} }; next if @dirs == 1; print "\n$file\n"; print "\t$_\n" for @dirs; }` [download]	[reply] [d/l]
Re: sorting a filelist array by filename by roubi (Hermit) on May 19, 2009 at 15:38 UTC
Something like this (not tested): `use strict; use File::Basename qw(fileparse); my %paths; for my $source (@classedSources){ my ($file,$path) = fileparse($source); push @{$paths{$file}}, $path; } print "\n"; print "-" x 80, "\n"; print "-- REPEATED SOURCE(S):\n"; print "-" x 80, "\n"; my @duplicate_paths = grep { scalar @{$paths{$_}} > 1 } keys %paths; for my $dup (@duplicate_paths){ print "\n$file" . join("\n ", @{$paths{$dup}}); }` [download]	[reply] [d/l]
Re: sorting a filelist array by filename by tweetiepooh (Hermit) on May 19, 2009 at 16:01 UTC
Hashes are a good way to handle uniqueness. Look up "hash of arrays". Here is some code that illustrates some of what you want. #!/usr/local/bin/perl -w use 5.010; use strict; ### code from here ### my @classedSources = ( "hitdb/CustomerPartner/Opten/Scripts/Daily/egy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Daily/megegy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/cp_OptenTruncTables.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/htcp_firm_loadAll.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_EKN.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_FIRM.sql", "hitdb/Policy/Views/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/Policy/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Security/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Views/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Accounting/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/astools/as#arch/Setup/Scripts/ht_tut_mail_address_ws.sql" ); my %hash; # this loop parses the array, splitting entries into path and filename # then populating a hash of arrays keyed on filename foreach (@classedSources) { my ($path,$file) = $_ =~ /(.\/)(.)/; push @{ $hash{$file} }, $path; } # parse the hash of arrays listing out the paths for each filename foreach my $file (sort keys %hash) { my @array = @{$hash{$file}}; say $file; foreach my $path (@array) { say "\t$path" } } [download]	[reply] [d/l]
Re^2: sorting a filelist array by filename by Vasek (Acolyte) on May 19, 2009 at 23:19 UTC
Dear tweetiepooh, thanks a lot for your clear and useful code. Exactly that was what I needed! Anyway util's code is very similar, but roubi uses the File::Basename modul which I should avoid in this case, cause I must make a very pure and native perl code, without using any modul. Thx for all, again!	[reply]
Re^3: sorting a filelist array by filename by toolic (Bishop) on May 19, 2009 at 23:36 UTC
the File::Basename modul which I should avoid in this case, cause I must make a very pure and native perl code, without using any modul File::Basename is a Core module which is included with every installation of Perl. There is no need to download and install it from CPAN, if that's what you're worried about.	[reply]
Re^4: sorting a filelist array by filename by Vasek (Acolyte) on May 20, 2009 at 07:33 UTC
Re: sorting a filelist array by filename by dwm042 (Priest) on May 19, 2009 at 23:07 UTC
If you use File::Basename, you can skip all that prep work and analyze your data directly. #!/usr/bin/perl use warnings; use strict; use File::Basename; ### code from here ### my @classedSources = ( "hitdb/CustomerPartner/Opten/Scripts/Daily/egy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Daily/megegy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/cp_OptenTruncTables.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/htcp_firm_loadAll.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_EKN.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_FIRM.sql", "hitdb/Policy/Views/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/Policy/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Security/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Views/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Accounting/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/astools/as#arch/Setup/Scripts/ht_tut_mail_address_ws.sql" ); print "\n"; print "-" x 80, "\n"; print "-- REPEATED SOURCE(S):\n"; print "-" x 80, "\n"; my $fileidx = 0; my $hit = 0; my $hitcntr = 0; my $last_source = $#classedSources; foreach (sort { basename($a) cmp basename($b) } @classedSources ) { $fileidx++; next if $fileidx > $last_source; my ($file, $path) = fileparse( $_ ); my ($nextFile, $nextPath) = fileparse( $classedSources[$fileidx] ); if ($file eq $nextFile) { $hit = 1; $hitcntr++; print "\n$file\n" if $hitcntr == 1; print " " x 6, $path."\n"; } else { print " " x 6, $path."\n" if $hit; $hit = 0; $hitcntr = 0; } } [download]	[reply] [d/l]
Re: sorting a filelist array by filename by bichonfrise74 (Vicar) on May 20, 2009 at 04:47 UTC
How about this? #!/usr/bin/perl use strict; my @classedSources = ( "hitdb/CustomerPartner/Opten/Scripts/Daily/egy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Daily/megegy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/cp_OptenTruncTables.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/htcp_firm_loadAll.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_EKN.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_FIRM.sql", "hitdb/Policy/Views/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/Policy/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Security/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Views/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Accounting/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/astools/as#arch/Setup/Scripts/ht_tut_mail_address_ws.sql" ); my %hash; for (@classedSources) { $hash{ (split /\//)[-1] }++; } while ( my ($key, $val) = each %hash ) { if ( $val > 1 ) { my @files = grep /\b$key\b/, @classedSources; print "$key\n"; print join "\n", @files , "\n"; } } [download]	[reply] [d/l]