First, thank you all for your suggestions. The problem has been one of algorithm. I am iterating over @select files from an @allfile loop, and parsing for equality conditions.
As the actual code is over 300 lines, I have included an edited snippet. This code is derived from an earlier script where I needed to parse for regexes in files, not necessarily exact matches, and not necessarily at the beginning. Parsing @selectexpr against @allfiles made sense there.
Hashes are an excellent idea. With them I can parse foo-bar-baz.doc as a hash directly against all foo keys, with proper splitting and filtering, of course. This would allow me to scale up more efficiently.
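A minimal sketch of the hash idea, assuming duplicates are files whose names differ only in extension, case, or surrounding whitespace. The helper names name_key and find_duplicates and the sample file names are made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a file name to a comparison key by trimming
# whitespace, dropping a known ebook extension, and lowercasing.
sub name_key {
    my ($file) = @_;
    (my $key = $file) =~ s/^\s+|\s+$//g;
    $key =~ s/\.(?:mobi|epub|azw3)$//i;
    return lc $key;
}

# Group file names by key in a single pass; any key that collects more
# than one file name is a set of duplicates.
sub find_duplicates {
    my (@files) = @_;
    my %by_key;
    push @{ $by_key{ name_key($_) } }, $_ for @files;

    my %dups;
    for my $key (keys %by_key) {
        $dups{$key} = $by_key{$key} if @{ $by_key{$key} } > 1;
    }
    return %dups;
}

# Sample names are invented for the example.
my %dups = find_duplicates('Foo - Bar.epub', 'foo - bar.mobi', 'Baz.epub');
for my $key (sort keys %dups) {
    print "DUPLICATE FOUND: @{ $dups{$key} }\n";
}
```

Each file is visited once, so the work grows with the number of files rather than with the number of epub/mobi pairs, which is where the nested loops get expensive.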
I would, however, prefer if possible to keep the matching to a regexp rather than an equality operator.
Please ignore syntax errors in the code below; it has been abbreviated.
if ($mymobi =~ m/($myepub)/) { print "DUPLICATE FOUND !\n"; &movetodir($myfilt, $dupdir); }  # Does NOT work
if ($mymobi eq $myepub)      { print "DUPLICATE FOUND !\n"; &movetodir($myfilt, $dupdir); }  # Works
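One possible reason the regexp version fails where eq works: a variable interpolated into m/.../ is compiled as a pattern, so characters such as ( ) [ ] . + in a file name act as metacharacters instead of matching themselves. Wrapping the variable in \Q...\E escapes them. The sketch below assumes that is the cause here; contains_literal and the sample names are hypothetical.

```perl
use strict;
use warnings;

# Returns 1 when $needle occurs literally somewhere inside $haystack.
# \Q...\E escapes regex metacharacters in the interpolated variable.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ m/\Q$needle\E/ ? 1 : 0;
}

# Invented names containing parentheses and brackets.
my $myepub = 'some title (vol. 1) [retail]';
my $mymobi = 'some title (vol. 1) [retail]';

print "raw regex:    ", ($mymobi =~ m/($myepub)/ ? "match" : "no match"), "\n";
print "quoted regex: ", (contains_literal($mymobi, $myepub) ? "match" : "no match"), "\n";
```

With these two identical strings the raw pattern does not match, because (vol. 1) becomes a capture group and [retail] a character class, while the quoted pattern does. Note that an unanchored regexp is a substring test, so "foo" would also match "foobar"; anchor with ^ and $ if whole-name equality is wanted.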
For an author-title pair, the matching would be done in the title (value) portion rather than the key, which would be expected to be identical (though there might be exceptions).
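A rough sketch of that author-title layout, assuming file names follow an "Author - Title.ext" pattern (an assumption, as are the helper names split_author_title and duplicate_pairs). Authors are hash keys compared by equality; titles are compared by a quoted regexp only within the same author.

```perl
use strict;
use warnings;

# Assumed naming scheme: "Author - Title.ext". Returns (author, title),
# both lowercased, or an empty list when the name does not fit.
sub split_author_title {
    my ($file) = @_;
    (my $base = $file) =~ s/\.[^.]+$//;
    my ($author, $title) = split /\s+-\s+/, $base, 2;
    return unless defined $title;
    return (lc $author, lc $title);
}

# Build author => [ [title, file], ... ], then compare titles only
# within one author, treating each title as a literal substring.
sub duplicate_pairs {
    my (@files) = @_;
    my (%by_author, @pairs);
    for my $file (@files) {
        my ($author, $title) = split_author_title($file);
        next unless defined $title;
        push @{ $by_author{$author} }, [ $title, $file ];
    }
    for my $author (sort keys %by_author) {
        my @entries = @{ $by_author{$author} };
        for my $i (0 .. $#entries - 1) {
            for my $j ($i + 1 .. $#entries) {
                my ($t1, $f1) = @{ $entries[$i] };
                my ($t2, $f2) = @{ $entries[$j] };
                push @pairs, [ $f1, $f2 ]
                    if $t1 =~ /\Q$t2\E/ || $t2 =~ /\Q$t1\E/;
            }
        }
    }
    return @pairs;
}

# Invented sample names.
my @pairs = duplicate_pairs(
    'Doe - Some Title.epub',
    'Doe - Some Title (Retail).mobi',
    'Roe - Some Title.epub',
);
print "DUPLICATE FOUND: $_->[0] <=> $_->[1]\n" for @pairs;
```

The pairwise loop is still quadratic, but only inside each author's list, which is normally short.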
I need to hit the books on hashes here, as I haven't really dealt much with them outside of a 20,000+ listing database with about 2 dozen hash fields.
opendir(DIR, $dir2) or die $!;
while ( $file = readdir(DIR) ) {
    if (-f $file) {    # read only files
        chomp($file);
        $file =~ s/^\s+|\s+$//g;
        $filenam = "";
        push( @srcarray, $file );
        if ($file =~ m/\.mobi$/ig) { &typefiles($file, "mobifile"); }
        if ($file =~ m/\.azw3$/ig) { &typefiles($file, "azw3file"); }
# ... (abbreviated)

sub typefiles( $tfile , $filetype ) {
    ($tfile, $filetype) = @_;
    if ($filetype eq "mobifile") { push( @mobiarray, $file ); }    # End mobifiles
# ... (abbreviated)

# Main body - parsing directory listing and performing actions
foreach $authf (@srcarray) {
    if ($authf =~ m/\.pl$/) { next; }
    if ($authf =~ m/\.epub/ig) {
        our $authf2 = $authf;
        foreach my $myfilt (@mobiarray) {
            my $mymobi = $myfilt;
            my $myepub = $authf2;
            $mymobi = &extfilter($mymobi);
            $myepub = &extfilter($myepub);
# ... (abbreviated)

sub extfilter($line) {
    ($line) = @_;
    $line =~ s/\.mobi//ig;
    $line =~ s/\.epub//ig;
    $line =~ s/^\s+|\s+$//g;
    $line = lc $line;
    return $line;
}
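One detail that may be worth checking in the directory scan: readdir returns bare entry names, so -f $file is tested relative to the current working directory, not $dir2, unless the script runs from inside that directory. A sketch of a scan that tests the full path instead; plain_files is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Spec;

# List the plain files in $dir, testing each entry by its full path so the
# result does not depend on the current working directory.
sub plain_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files;
    while (defined(my $name = readdir $dh)) {
        my $path = File::Spec->catfile($dir, $name);
        push @files, $name if -f $path;
    }
    closedir $dh;
    return sort @files;
}

# Example: split one listing into per-type arrays with grep.
my @srcarray  = plain_files('.');
my @mobiarray = grep { /\.mobi$/i } @srcarray;
my @epubarray = grep { /\.epub$/i } @srcarray;
```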
In reply to Re^2: Duplicates in Directories
by kel
in thread Duplicates in Directories
by kel