I have a very large number of files (15,000+) in several directories. The name of each file contains information that I need to process, like:
name-country-language-date.pdf
Currently I end up going through the whole list many times, and it's taking forever.
First I go through the listing and put all of the name entries into a hash. Then, for each entry, I have to scan the listing again to see which files belong with it (files belong together when they share name+country+language and differ only by date).
I do make sure I don't go back over files that have already been looked at, but that doesn't speed things up much at all.
The end result of all this should be a hash with the identifying elements of the file as the key, and the value should be an array of all the files belonging to that key, sorted by date.
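To make the goal concrete, the result I'm after would look something like this (the filenames here are invented for the example, newest date first within each array):

my %result = (
    'smith-us-eng' => [
        'smith-us-eng-20090412.pdf',
        'smith-us-eng-20081130.pdf',
    ],
    'garcia-mx-spa' => [
        'garcia-mx-spa-20090215-eol.html',
    ],
);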
Here's how it goes:
I go through the output of readdir and match on:
/([a-z0-9]*?-[a-z]{2}-[a-z]{2,3})-(\d{8})(-eol)?\.(pdf|html)$/
Then I open the directory again and look for all the files that share the name-country-language part captured in $1.
Then I reverse-sort those by the date in the filename, build an array out of them, and put it into a hash, with the name-country-language key as the hash key and the array as the value.
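Roughly, what I'm doing now looks something like this (a simplified sketch, not my actual code; the directory path is just a placeholder):

use strict;
use warnings;

my $dir = '/path/to/pdfs';    # placeholder path

opendir my $dh, $dir or die "Cannot open $dir: $!";
my %seen;
my %result;

while (defined(my $file = readdir $dh)) {
    next unless $file =~ /([a-z0-9]*?-[a-z]{2}-[a-z]{2,3})-(\d{8})(-eol)?\.(pdf|html)$/;
    my $key = $1;                 # name-country-language
    next if $seen{$key}++;        # skip keys I've already handled

    # Re-read the directory to collect every file that shares this key
    opendir my $dh2, $dir or die "Cannot reopen $dir: $!";
    my @group = grep { /^\Q$key\E-\d{8}(-eol)?\.(pdf|html)$/ } readdir $dh2;
    closedir $dh2;

    # Reverse sort on the YYYYMMDD date embedded in the filename (newest first)
    my @sorted = sort { ($b =~ /(\d{8})/)[0] <=> ($a =~ /(\d{8})/)[0] } @group;

    $result{$key} = \@sorted;
}
closedir $dh;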
I know this is inefficient, but I'm at a loss as to what to do better. Any ideas?
Obviously after reading this tale, you'll know that I'm unworthy to receive your assistance, but I beg to receive it.