Vasek has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have to make a script which is for filtering developers' sources before adding them into a CVS repo. There is just one thing which I solved very ugly: finding filename repeatitions.

Here is my solutions, but I'm really frustrated about this. Is there any suggestion to make it quite simply?

With the Bests: Vasek

### code from here ### @classedSources = ( "hitdb/CustomerPartner/Opten/Scripts/Daily/egy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Daily/megegy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/cp_OptenTruncTables.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/htcp_firm_loadAll.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_EKN.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_FIRM.sql", "hitdb/Policy/Views/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/Policy/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Security/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Views/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Accounting/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/astools/as#arch/Setup/Scripts/ht_tut_mail_address_ws.sql" ); foreach (@classedSources) { my @path = split /\//, $_; my $file = pop @path; my $path = join '/', @path; @sourcesAndPaths = (@sourcesAndPaths, "$file|$path"); } print "\n"; print "-" x 80, "\n"; print "-- REPEATED SOURCE(S):\n"; print "-" x 80, "\n"; @sourcesAndPaths = sort @sourcesAndPaths; foreach (@sourcesAndPaths) { $fileidx++; my ($file, $path) = split /\|/, $_; my ($nextFile, $nextPath) = split /\|/, $sourcesAndPaths[$fileidx]; if ($file eq $nextFile) { $hit = 1; $hitcntr++; print "\n$file\n" if $hitcntr == 1; print " " x 6, $path."\n"; } else { print " " x 6, $path."\n" if $hit; $hit = ''; $hitcntr = ''; } }

Replies are listed 'Best First'.
Re: sorting a filelist array by filename
by Util (Priest) on May 19, 2009 at 15:28 UTC

    Use a hash-of-arrays to group the paths by filename.

    my %sourcesAndPaths; foreach (@classedSources) { my ( $dir, $file ) = m{ ^ (.+) / ([^/]+?) $ }x or do { warn "Can't parse '$_'"; next; }; push @{ $sourcesAndPaths{$file} }, $dir; } foreach my $file ( sort keys %sourcesAndPaths ) { my @dirs = @{ $sourcesAndPaths{$file} }; next if @dirs == 1; print "\n$file\n"; print "\t$_\n" for @dirs; }
Re: sorting a filelist array by filename
by roubi (Hermit) on May 19, 2009 at 15:38 UTC
    Something like this (not tested):
    use strict; use File::Basename qw(fileparse); my %paths; for my $source (@classedSources){ my ($file,$path) = fileparse($source); push @{$paths{$file}}, $path; } print "\n"; print "-" x 80, "\n"; print "-- REPEATED SOURCE(S):\n"; print "-" x 80, "\n"; my @duplicate_paths = grep { scalar @{$paths{$_}} > 1 } keys %paths; for my $dup (@duplicate_paths){ print "\n$file" . join("\n ", @{$paths{$dup}}); }
Re: sorting a filelist array by filename
by tweetiepooh (Hermit) on May 19, 2009 at 16:01 UTC
    Hashes are a good way to handle uniqueness. Look up "hash of arrays".

    Here is some code that illustrates some of what you want.

    #!/usr/local/bin/perl -w use 5.010; use strict; ### code from here ### my @classedSources = ( "hitdb/CustomerPartner/Opten/Scripts/Daily/egy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Daily/megegy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/cp_OptenTruncTables.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/htcp_firm_loadAll.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_EKN.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_FIRM.sql", "hitdb/Policy/Views/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/Policy/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Security/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Views/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Accounting/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/astools/as#arch/Setup/Scripts/ht_tut_mail_address_ws.sql" ); my %hash; # this loop parses the array, splitting entries into path and filename # then populating a hash of arrays keyed on filename foreach (@classedSources) { my ($path,$file) = $_ =~ /(.*\/)(.*)/; push @{ $hash{$file} }, $path; } # parse the hash of arrays listing out the paths for each filename foreach my $file (sort keys %hash) { my @array = @{$hash{$file}}; say $file; foreach my $path (@array) { say "\t$path" } }
      Dear tweetiepooh, thanks a lot for your clear and useful code. Exactly that was what I needed! Anyway util's code is very similar, but roubi uses the File::Basename modul which I should avoid in this case, cause I must make a very pure and native perl code, without using any modul.

      Thx for all, again!
        the File::Basename modul which I should avoid in this case, cause I must make a very pure and native perl code, without using any modul
        File::Basename is a Core module which is included with every installation of Perl. There is no need to download and install it from CPAN, if that's what you're worried about.
Re: sorting a filelist array by filename
by dwm042 (Priest) on May 19, 2009 at 23:07 UTC
    If you use File::Basename, you can skip all that prep work and analyze your data directly.

    #!/usr/bin/perl use warnings; use strict; use File::Basename; ### code from here ### my @classedSources = ( "hitdb/CustomerPartner/Opten/Scripts/Daily/egy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Daily/megegy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/cp_OptenTruncTables.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/htcp_firm_loadAll.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_EKN.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_FIRM.sql", "hitdb/Policy/Views/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/Policy/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Security/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Views/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Accounting/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/astools/as#arch/Setup/Scripts/ht_tut_mail_address_ws.sql" ); print "\n"; print "-" x 80, "\n"; print "-- REPEATED SOURCE(S):\n"; print "-" x 80, "\n"; my $fileidx = 0; my $hit = 0; my $hitcntr = 0; my $last_source = $#classedSources; foreach (sort { basename($a) cmp basename($b) } @classedSources ) { $fileidx++; next if $fileidx > $last_source; my ($file, $path) = fileparse( $_ ); my ($nextFile, $nextPath) = fileparse( $classedSources[$fileidx] ); if ($file eq $nextFile) { $hit = 1; $hitcntr++; print "\n$file\n" if $hitcntr == 1; print " " x 6, $path."\n"; } else { print " " x 6, $path."\n" if $hit; $hit = 0; $hitcntr = 0; } }
Re: sorting a filelist array by filename
by bichonfrise74 (Vicar) on May 20, 2009 at 04:47 UTC
    How about this?
    #!/usr/bin/perl use strict; my @classedSources = ( "hitdb/CustomerPartner/Opten/Scripts/Daily/egy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Daily/megegy_batch.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/cp_OptenTruncTables.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/htcp_firm_loadAll.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_EKN.sql", "hitdb/CustomerPartner/Opten/Scripts/Other/runHT_CP_FIRM.sql", "hitdb/Policy/Views/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/Policy/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaFE1.sql", "hitdb/Policy/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Policy/Views/Types/HtTypoWsCoverAlaListFE1.sql", "hitdb/Security/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Views/Types/HtTypoWsHobMfactorFE1.sql", "hitdb/Accounting/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Scripts/ht_tut_mail_address_ws.sql", "hitdb/Policy/Types/HtTypoWsAlaTarifFE1.sql", "hitdb/astools/as#arch/Setup/Scripts/ht_tut_mail_address_ws.sql" ); my %hash; for (@classedSources) { $hash{ (split /\//)[-1] }++; } while ( my ($key, $val) = each %hash ) { if ( $val > 1 ) { my @files = grep /\b$key\b/, @classedSources; print "$key\n"; print join "\n", @files , "\n"; } }