in reply to Increase script speed

It's not clear what you're trying to accomplish. A brief explanation of the "initial state" and the "desired result state" would be helpful (e.g. "start with a directory containing, say, 20000 mbox files" and "when done, there should be a (set of) file(s) with 20000 lines containing the following info about the mboxes").

You seem to be running "find" on every iteration over the lines of your single input file. Are you sure you aren't traversing the same directories over and over? When directory trees are really large, full traversals using File::Find can be very time-consuming. Get a clear idea of which tree(s) you need to traverse, and make sure you traverse each one only once, storing what you need in a hash or array.
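Here is a minimal sketch of the "traverse once" idea. It assumes the layout described further down in the thread, where each mailbox is a directory exactly three levels below a volume root (volume#/98/76/123456789). The sub name index_mailboxes is hypothetical. The sub walks each volume root once with File::Find and records every mailbox directory in a hash keyed by mailbox number. It prunes at that level, so it never descends into the mailboxes themselves:

```perl
use strict;
use warnings;
use File::Find;

# Walk each volume root once and map mailbox number => full path.
# Assumption: mailboxes are directories exactly three levels below a
# volume root (root/98/76/123456789). index_mailboxes is a hypothetical name.
sub index_mailboxes {
    my @roots = @_;
    my %path_for;
    for my $root (@roots) {
        (my $top = $root) =~ s{/+\z}{};       # drop any trailing slash
        my $top_depth = ($top =~ tr{/}{});    # count slashes in the root path
        find(sub {
            my $depth = ($File::Find::name =~ tr{/}{}) - $top_depth;
            return unless $depth == 3 && -d $_;
            $path_for{$_} = $File::Find::name;   # $_ is the basename here
            $File::Find::prune = 1;              # don't descend into the mailbox
        }, $top);
    }
    return \%path_for;
}
```

With the index built, each line of the input file becomes a hash lookup (e.g. $index->{$mbox}) instead of a new traversal. A hash with 10+ million entries needs a lot of memory, though. If that becomes a problem, an on-disk store is the alternative.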

Apart from that, the timing issue would seem to depend mainly on the size of your input file and the size of the directory trees you're traversing. Can you give us some stats on those?

Re^2: Increase script speed
by ctrevgo_learn_perl (Initiate) on Jun 06, 2015 at 05:43 UTC

    Purpose: remove old account directories to save space.

    Initial State

    test.txt contains only mbox,userid pairs, one per line. The file is about 2 GB and contains 10+ million lines.

    Expected results.

    Read test.txt line by line.

    Check each volume directory on the server for the mailbox. There are multiple volumes, and mailboxes are hashed in reverse: a mailbox numbered 123456789 should be found in a directory like /home/folder/volume#/98/76/123456789.
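    If the hashing scheme really is this deterministic, you may not need to traverse the trees at all. The candidate path can be computed from the mailbox number and checked with a single -d test per volume. The sketch below is minimal and rests on two assumptions. From the one example above, the two hash levels are taken to be the first two pairs of digits of the reversed mailbox number. Mailbox numbers are also assumed to have at least four digits. mailbox_path and find_mailbox are hypothetical names:

```perl
use strict;
use warnings;

# Build the expected mailbox directory under one volume.
# Assumption (from the single example given): reverse the digits of the
# mailbox number and use the first two pairs as the two hash levels,
# so 123456789 -> 98/76/123456789.
sub mailbox_path {
    my ($volume_root, $mbox) = @_;
    my $rev = reverse $mbox;    # scalar context: reverses the string
    return sprintf '%s/%s/%s/%s',
        $volume_root, substr($rev, 0, 2), substr($rev, 2, 2), $mbox;
}

# Return the path on the first volume that actually holds this mailbox,
# or undef if none does.
sub find_mailbox {
    my ($mbox, @volumes) = @_;
    for my $vol (@volumes) {
        my $path = mailbox_path($vol, $mbox);
        return $path if -d $path;
    }
    return;
}
```

    For 10 million lines, that is at most one stat per volume per line. This should be far cheaper than repeated File::Find runs.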

    Once the mailbox is found, build the full path to it for later removal.

    If the mailbox is found, move its directory from /home/folder/volume#/98/76/123456789 to /home/folder/volume#/98/76/123456789-trash (for later deletion by another script, if the space saving is worth it).
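    Here is a minimal sketch of the move step (trash_mailbox is a hypothetical name). The -trash directory sits next to the original on the same volume, so Perl's built-in rename only has to update a directory entry. That makes it fast no matter how big the mailbox is. The sketch refuses to overwrite an existing -trash directory and returns the new path on success:

```perl
use strict;
use warnings;

# Move a mailbox aside by renaming it to "<path>-trash".
# rename() within one filesystem only updates directory entries,
# so no mailbox data is copied.
sub trash_mailbox {
    my ($path) = @_;
    my $dest = "$path-trash";
    if (-e $dest) {
        warn "skipping $path: $dest already exists\n";
        return;
    }
    rename $path, $dest
        or do { warn "cannot rename $path to $dest: $!\n"; return };
    return $dest;
}
```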

    Write the actions to a file called mbox_stats.txt with the following information:

    id, full mailbox path (for example /home/folder/volume#/98/76/123456789), directory size
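    Here is a sketch of producing one line of mbox_stats.txt. It assumes "directory size" means the sum of the file sizes (in bytes) under the mailbox directory, as reported by -s. That is the apparent size, not the disk blocks actually used. dir_size and record_stats are hypothetical names:

```perl
use strict;
use warnings;
use File::Find;

# Sum of the sizes (bytes) of all regular files under $dir.
# This is the apparent size reported by -s, not allocated disk blocks.
sub dir_size {
    my ($dir) = @_;
    my $total = 0;
    find(sub { $total += (-s _) || 0 if -f $_ }, $dir);
    return $total;
}

# Append one "id,path,size" line to an already-open filehandle and
# return the size, so the caller can keep a running total.
sub record_stats {
    my ($fh, $id, $path) = @_;
    my $size = dir_size($path);
    print {$fh} join(',', $id, $path, $size), "\n";
    return $size;
}
```

    Open the output once (e.g. open my $out, '>>', 'mbox_stats.txt') rather than once per mailbox. Also take the size before the directory is renamed, so the recorded path still exists.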

    Add up the sizes of all directories listed in mbox_stats.txt to determine the total savings if they are removed.
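    Here is a sketch of the final tally (total_savings is a hypothetical name). It assumes mbox_stats.txt uses the id,path,size format above, with the size in bytes as the last field. Reading only the last comma-separated field keeps it working even if a path ever contains a comma:

```perl
use strict;
use warnings;

# Sum the size field (bytes, last field on each line) of mbox_stats.txt.
sub total_savings {
    my ($file) = @_;
    open my $in, '<', $file or die "cannot open $file: $!\n";
    my $total = 0;
    while (my $line = <$in>) {
        chomp $line;
        my ($size) = $line =~ /,(\d+)\z/;   # size is the last field
        $total += $size if defined $size;
    }
    close $in;
    return $total;
}
```

    To report the result in GiB, divide by 1024**3.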

      As mentioned by Count Zero below, the best bet for the quantities involved will be some sort of indexed database for storing the info you want about each mbox path. Something like SQLite should do reasonably well and will be easy to put in place.
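      Here is a minimal sketch of that approach using DBI with the DBD::SQLite driver, which has to be installed from CPAN. The table, column, and sub names are hypothetical. The PRIMARY KEY on mbox gives you the index for free. The rows are inserted inside a single transaction, because committing each row separately is very slow in SQLite:

```perl
use strict;
use warnings;
use DBI;    # requires the DBD::SQLite driver from CPAN

# Open (or create) an SQLite database file and make sure the table exists.
sub open_stats_db {
    my ($file) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$file", '', '',
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS mbox_stats (
                  mbox TEXT PRIMARY KEY, userid TEXT, path TEXT, size INTEGER)');
    return $dbh;
}

# Insert many rows inside one transaction.
# Each row is an array ref: [mbox, userid, path, size].
sub store_rows {
    my ($dbh, @rows) = @_;
    my $sth = $dbh->prepare(
        'INSERT OR REPLACE INTO mbox_stats (mbox, userid, path, size) VALUES (?,?,?,?)');
    $dbh->begin_work;
    $sth->execute(@$_) for @rows;
    $dbh->commit;
}
```

      The total savings then becomes a single query: $dbh->selectrow_array('SELECT SUM(size) FROM mbox_stats').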

      As for traversing the directory tree to get information, you might want to have a look at a script that I posted here a while back: Get useful info about a directory tree. It was designed to do the fastest possible traversal of a directory, and produce a one-line summary for every directory in the tree. You could use it as-is to get summaries for (particular portions or volumes of) your system (the man page is included in the script), or you can adapt the approach used there to your own needs.