
Oh great keepers of Perl Wisdom. I come before you humbly today with a question of strategy rather than tactics.

I have a very large number of files (15,000+) in several directories. Each file name contains information that I need to process, like:

name-country-language-date.pdf

Currently I end up going through the whole list many times, and it's taking forever.

First I go through and put all of the name entries into a hash. But then for each entry, I have to look again to see which files go together with that one. (Files go together by name+country+language, and differ by date.)

I do make sure I don't go back over files that have already been looked at, but that doesn't speed things up much at all.

The end result of all this should be a hash with the identifying elements of the file as the key, and the value should be an array of all the files that fit under that name in date order.

Here's how it goes: I go through the output of readdir and match on:
/([a-z0-9]*?-[a-z]{2}-[a-z]{2,3})-(\d{8})(-eol)?\.(pdf|html)$/
Then I open the directory again and look for files that match the name-country-language part (captured as $1 above).
Then I reverse sort them by the date in the filename, create an array out of that, and put it into a hash, with the name-country-language string being the key and the array being the value.

I know this is inefficient, but am at a loss as to what to do better. Any ideas?

Obviously after reading this tale, you'll know that I'm unworthy to receive your assistance, but I beg to receive it.
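
For illustration only, here is a minimal sketch of how the hash described above could be built in a single pass over the file names, rather than by re-reading the directory for every entry. It reuses the pattern from the post. The sub name group_by_prefix and the sample file names are hypothetical, and the sort assumes the eight digits are a YYYYMMDD date, which the post does not state.

```perl
use strict;
use warnings;

# Hypothetical helper: group file names by their name-country-language
# prefix in one pass, then sort each group newest-first.
sub group_by_prefix {
    my @names = @_;
    my %files_for;

    for my $file (@names) {
        # Pattern from the post: $1 = name-country-language, $2 = date.
        next unless $file =~ /([a-z0-9]*?-[a-z]{2}-[a-z]{2,3})-(\d{8})(-eol)?\.(pdf|html)$/;
        push @{ $files_for{$1} }, [ $2, $file ];
    }

    for my $key (keys %files_for) {
        # Assumption: dates are YYYYMMDD, so a descending string
        # comparison puts the newest file first.
        $files_for{$key} = [
            map  { $_->[1] }
            sort { $b->[0] cmp $a->[0] }
            @{ $files_for{$key} }
        ];
    }

    return \%files_for;
}

# Made-up sample names standing in for readdir output.
my $grouped = group_by_prefix(qw(
    report-us-en-20040101.pdf
    report-us-en-20050315.pdf
    report-de-ger-20031120-eol.html
    README.txt
));
```

With real directories, the list returned by readdir could be passed in directly. Each name is matched once and each group is sorted once, so the directory never has to be opened a second time. Files with the same prefix in different directories would end up under the same key unless the directory is made part of the key.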

In reply to Need help with efficient processing by cardozo
