
Hi again,

When I announced in my reply to Anonymous Monk that the solution to all my problems had been found, I was too hasty.

The script worked well in the setup I built to test it. When I ran it under real conditions, I was surprised that many files had been copied whose filenames didn't include the IDs listed in the text file. At least, that was what I thought at first glance.

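Then I realized that they did: the operation "$_ =~ /^\Q$file/" on the ID "I C 17" finds not only "I C 17.jpg" and "I C 17 -A.jpg" but also "I C 170.jpg" and "I C 1778 - A.jpg", etc. I think of this as the regex being too "greedy", although strictly speaking greediness describes how quantifiers behave; what happens here is that the pattern matches the ID as a prefix, and nothing stops a further digit from following it.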
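To make the over-matching concrete, here is a minimal sketch. The helper name `prefix_matches` is hypothetical and not part of the main script; it only wraps the same /^\Q$id/ test in a grep.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the main script): returns every filename
# that starts with the ID, which is all that /^\Q$id/ checks.
sub prefix_matches {
    my ($id, @files) = @_;
    return grep { /^\Q$id/ } @files;
}

my @files = ("I C 17.jpg", "I C 17 -A.jpg", "I C 170.jpg", "I C 1778 - A.jpg");

# All four filenames are printed, including the two unwanted ones.
print "$_\n" for prefix_matches("I C 17", @files);
```

Because the pattern says nothing about what may follow the ID, "I C 170.jpg" is as good a match as "I C 17.jpg".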

Now here is my next step: a snippet that deals exclusively with this greediness of the regex:

#Script tests matching of IDs with a list of filenames 
#Should find matches without being too greedy
#For testing input is given as two arrays, defined within the script
#Input will be a list of IDs in a text file and a scan of a directory containing the image files
#
#
use strict;
use warnings;

my @dir = ("I C 17.jpg", "I C 17 a.jpg", "I C 17 a,b -A x.jpg", "I C 170.jpg", "I C 171 a,b -A x.jpg", "I C 171 a,b -B x.jpg");
my @ids = ("I C 17", "I C 171");


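# $id and $file are used instead of $a and $b, which are reserved for sort
foreach my $id (@ids) {
    # \Q...\E quotes the ID literally; after it, allow any run of
    # non-digits, then a literal ".jpg" at the end of the name.
    # (In a double-quoted string "\." would collapse to ".", so qr// is used.)
    my $pattern = qr/^\Q$id\E[^0-9]*\.jpg$/;

    foreach my $file (@dir) {
        if ($file =~ $pattern) {
            print "Found file: $file\n";
        }
    }
}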

All I have to do now is integrate this into the main script. I hope that, once this is done, the routine for importing files will work better (annoying, this dull play on words, isn't it?).

Update: I implemented this not-too-greedy matching in the main script, and it works better than before. But it's getting even more complicated, because IDs named "I C 17 <1>" refer to image files named "I C 17 _1_ -A.jpg". So I have to replace the angle brackets before matching.
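One way the bracket replacement could look, as a sketch only: both helper names are hypothetical, and the pattern assumes the same shape as in the test snippet (the ID, then no further digits, then ".jpg").

```perl
use strict;
use warnings;

# Hypothetical helper: turn an ID such as "I C 17 <1>" into the form
# used in the filenames, "I C 17 _1_".
sub id_to_filename_prefix {
    my ($id) = @_;
    (my $prefix = $id) =~ tr/<>/__/;   # "<" and ">" both become "_"
    return $prefix;
}

# Assumed matching rule: converted ID, no further digits, then ".jpg".
sub id_matches_file {
    my ($id, $file) = @_;
    my $prefix = id_to_filename_prefix($id);
    return $file =~ /^\Q$prefix\E[^0-9]*\.jpg$/ ? 1 : 0;
}

for my $file ("I C 17 _1_ -A.jpg", "I C 17 _10_ -A.jpg", "I C 17.jpg") {
    print "Found file: $file\n" if id_matches_file("I C 17 <1>", $file);
}
```

With these three sample names, only "I C 17 _1_ -A.jpg" is reported. Doing the conversion on a copy leaves the original ID untouched in case it is needed elsewhere.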

In reply to Re^4: Looking up elements of an array in another array! by better
in thread Looking up elements of an array in another array! by better
