First, thank you all for your suggestions. The problem has been one of algorithm. I am iterating over @select files from within an @allfile loop, and parsing for equality conditions.

As the actual code is over 300 lines, I have included an edited snippet. This code is derived from an earlier script where I needed to parse for regexes in files, not necessarily exact matches, and not necessarily at the beginning. Parsing @selectexpr against @allfiles made sense there.

Hashes are an excellent idea. With them I can parse foo-bar-baz.doc as a hash directly against all foo keys, with proper splitting and filtering, of course. This would allow me to scale up more efficiently.
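A minimal sketch of that idea, assuming the key is the part of the name before the first hyphen. The helper names split_name and index_files and the sample file names are made up for illustration and are not from the real script.

```perl
use strict;
use warnings;

# Hypothetical helper: split "foo-bar-baz.doc" into a key ("foo") and
# the remainder ("bar-baz"), lower-cased, with the extension removed.
sub split_name {
    my ($file) = @_;
    (my $base = lc $file) =~ s/\.[^.]+\z//;   # drop the final extension
    $base =~ s/^\s+|\s+$//g;                  # trim surrounding whitespace
    my ($key, $rest) = split /-/, $base, 2;
    $key  = '' unless defined $key;
    $rest = '' unless defined $rest;
    return ($key, $rest);
}

# Build key => [ [rest, original file name], ... ] in a single pass.
sub index_files {
    my (@files) = @_;
    my %index;
    for my $file (@files) {
        my ($key, $rest) = split_name($file);
        push @{ $index{$key} }, [ $rest, $file ];
    }
    return \%index;
}

# Sample listing (invented for this example).
my @allfiles = ('foo-bar-baz.doc', 'Foo-Bar-Baz.mobi', 'qux-one.epub');
my $index    = index_files(@allfiles);

for my $key (sort keys %$index) {
    printf "%s: %d file(s)\n", $key, scalar @{ $index->{$key} };
}
```

With an index like this, each file only needs to be compared against the entries stored under its own key instead of against every other file.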

I would, however, prefer, if possible, to keep the matching to a regexp rather than an equality operator.


Please ignore syntax errors in the code below; it has been abbreviated.

if ($mymobi =~ m/($myepub)/) { print "DUPLICATE FOUND !\n"; &movetodir($myfilt, $dupdir); }   # Does NOT work
if ($mymobi eq $myepub)      { print "DUPLICATE FOUND !\n"; &movetodir($myfilt, $dupdir); }   # Works
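One possible explanation, offered as a guess rather than a diagnosis: if the file names contain regex metacharacters such as parentheses or square brackets, an interpolated name is read as a pattern rather than as literal text. The sketch below shows the difference using \Q...\E; the helper contains_literal and the sample names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs literally inside $haystack.
# \Q...\E escapes every regex metacharacter in the interpolated string.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

# Invented names that contain metacharacters.
my $mymobi = 'some title (2nd edition) [retail]';
my $myepub = 'some title (2nd edition) [retail]';

# Unescaped, "(...)" is a capture group and "[...]" is a character class.
print "plain regex:  ", ($mymobi =~ /$myepub/ ? "match" : "no match"), "\n";
print "quoted regex: ", (contains_literal($mymobi, $myepub) ? "match" : "no match"), "\n";
print "eq:           ", ($mymobi eq $myepub ? "match" : "no match"), "\n";
```

Quoting the pattern keeps the regex form, so a substring or case-insensitive match (with the /i flag) remains possible where eq would be too strict.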

For an author-title pair, the matching would be done in the title (value) portion rather than the key, which would be expected to be identical (though there might be exceptions).
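A rough sketch of that arrangement, assuming one hash per format with the author as key and a list of titles as value. The hash layout, the sub find_title_matches and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: author => list of titles, one hash per format.
my %epub = ( 'jane doe' => [ 'the long road' ] );
my %mobi = ( 'jane doe' => [ 'the long road (retail)', 'another book' ] );

# Return [author, epub title, mobi title] for every mobi title that
# contains an epub title by the same author (case-insensitive).
sub find_title_matches {
    my ($epub, $mobi) = @_;
    my @hits;
    for my $author (sort keys %$epub) {
        next unless exists $mobi->{$author};   # the key lookup is plain equality
        for my $etitle (@{ $epub->{$author} }) {
            my $re = qr/\Q$etitle\E/i;         # regex is applied to the value only
            push @hits, [ $author, $etitle, $_ ]
                for grep { $_ =~ $re } @{ $mobi->{$author} };
        }
    }
    return @hits;
}

print "DUPLICATE FOUND: $_->[0]: '$_->[1]' ~ '$_->[2]'\n"
    for find_title_matches(\%epub, \%mobi);
```

Authors whose keys are not identical would be missed by this lookup, so the exceptions mentioned above would need a separate normalisation step.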

I need to hit the books on hashes here, as I haven't really dealt much with them outside of a 20,000+ listing database with about 2 dozen hash fields.

opendir(DIR, $dir2 ) or die $!;
while ( $file = readdir(DIR)) {
    if (-f $file) { # read only files
        chomp($file);
        $file =~ s/^\s+|\s+$//g;
        $filenam = "" ;
        push ( @srcarray, $file) ;
        if ($file =~ m/\.mobi$/ig ) { &typefiles($file, "mobifile"); }
        if ($file =~ m/\.azw3$/ig ) { &typefiles($file, "azw3file"); }

sub typefiles( $tfile , $filetype ) {
    ($tfile, $filetype ) = @_ ;
    if ($filetype eq "mobifile" ) { push ( @mobiarray, $file) ; } # End mobifiles

# Main body - parsing directory listing and performing actions
foreach $authf (@srcarray){
    if ($authf =~ m/\.pl$/) { next; }
    if ($authf =~ m/\.epub/ig ) {
        our $authf2 = $authf ;
        foreach my $myfilt (@mobiarray){
            my $mymobi = $myfilt;
            my $myepub = $authf2;
            $mymobi = &extfilter($mymobi);
            $myepub = &extfilter($myepub);

sub extfilter($line) {
    ($line) = @_;
    $line =~ s/\.mobi//ig ;
    $line =~ s/\.epub//ig ;
    $line =~ s/^\s+|\s+$//g;
    $line = lc $line;
    return $line;
}
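For comparison, here is a compact sketch of the directory-reading and typing steps, not a drop-in replacement for the real script. It assumes the -f test should be applied to the full path, which matters whenever the directory is not the current one; the sub names list_files and files_by_type are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Return the names of the plain files in $dir, sorted.
sub list_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    # readdir returns bare names, so join each with $dir before testing -f
    my @files = grep { -f File::Spec->catfile($dir, $_) } readdir $dh;
    closedir $dh;
    @files = sort @files;
    return @files;
}

# Bucket file names by extension (case-insensitive): type => [names].
sub files_by_type {
    my (@files) = @_;
    my %by_type;
    for my $file (@files) {
        next unless $file =~ /\.(mobi|azw3|epub)\z/i;
        push @{ $by_type{ lc $1 } }, $file;
    }
    return \%by_type;
}

# Example run against the current directory.
my $types = files_by_type(list_files('.'));
for my $type (sort keys %$types) {
    printf "%s: %d file(s)\n", $type, scalar @{ $types->{$type} };
}
```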

In reply to Re^2: Duplicates in Directories by kel
in thread Duplicates in Directories by kel
