in reply to Re: Increase script speed
in thread Increase script speed

Purpose: remove old account directories to save space.

Initial State

test.txt contains only mbox,userid pairs, one per line. The file is about 2 GB and has more than 10 million lines.

Expected results.

read test.txt into script line by line

check each directory volume on the server for the mailbox. Mailboxes are hashed in reverse: an id with mailbox number 123456789 should be found in a directory like /home/folder/volume#/98/76/123456789 (the last two digits reversed, then the two digits before them reversed). There are multiple volumes.

once the mailbox is found build a path to mailbox for later removal

if mbox is found move mailbox directory from /home/folder/volume#/98/76/123456789 to /home/folder/volume#/98/76/123456789-trash (for later deletion by another script if space saving is worth it)

write the actions to a file called mbox_stats.txt with the following information

id,full mailbox path (for example /home/folder/volume#/98/76/123456789),directory size

add up the size of each directory listed in mbox_stats.txt to determine the total savings if they are removed
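
The steps above can be sketched roughly as follows. This is only an illustration in Python, not the script under discussion. The function names, the list of volume roots, and the choice to write the userid as the "id" column are all assumptions. It also assumes every mailbox number has at least four digits.

```python
import csv
import os

def hashed_subpath(mbox_id):
    """Build the reversed-hash subpath, e.g. '123456789' -> '98/76/123456789'."""
    rev = mbox_id[::-1]
    return os.path.join(rev[0:2], rev[2:4], mbox_id)

def find_mailbox(mbox_id, volumes):
    """Return the mailbox directory on the first volume that has it, else None."""
    sub = hashed_subpath(mbox_id)
    for vol in volumes:
        path = os.path.join(vol, sub)
        if os.path.isdir(path):
            return path
    return None

def dir_size(path):
    """Sum the sizes of all files below path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
    return total

def process(list_path, volumes, stats_path):
    """Stream the mbox,userid list, rename found mailboxes, log them, return total bytes."""
    total = 0
    with open(list_path) as src, open(stats_path, "w", newline="") as out:
        writer = csv.writer(out)
        for line in src:  # one line at a time; the 2 GB file is never held in memory
            line = line.strip()
            if not line:
                continue
            mbox_id, _sep, user_id = line.partition(",")
            path = find_mailbox(mbox_id, volumes)
            if path is None:
                continue
            size = dir_size(path)
            os.rename(path, path + "-trash")  # same directory, so a cheap rename
            writer.writerow([user_id, path, size])  # assumption: "id" means userid
            total += size
    return total
```

A call might look like `process("test.txt", ["/home/folder/volume1", "/home/folder/volume2"], "mbox_stats.txt")`, where the volume names are placeholders. Because the candidate path is computed directly from the mailbox number, each line costs at most one `isdir` check per volume instead of a directory search.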

Re^3: Increase script speed
by graff (Chancellor) on Jun 07, 2015 at 01:34 UTC
    As mentioned by Count Zero below, the best bet for the quantities involved will be some sort of indexed database for storing the info you want about each mbox path. Something like sqlite should do reasonably well, and will be easy to put in place.
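
As a rough illustration of the indexed-database idea, here is a minimal sketch using Python's standard `sqlite3` module. The table name `mbox` and its columns are hypothetical, chosen only for this example. The point is that a primary-key lookup avoids rescanning a flat file for each of the 10+ million ids.

```python
import sqlite3

def build_index(db_path, rows):
    """Create (if needed) and fill a table of (mbox_id, path, size_bytes) rows."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mbox ("
        " mbox_id TEXT PRIMARY KEY,"   # the primary key gives an index for lookups
        " path TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL)"
    )
    with conn:  # one transaction for the whole batch, committed on exit
        conn.executemany("INSERT OR REPLACE INTO mbox VALUES (?, ?, ?)", rows)
    return conn

def lookup(conn, mbox_id):
    """Return (path, size_bytes) for a mailbox id, or None if it is not indexed."""
    cur = conn.execute(
        "SELECT path, size_bytes FROM mbox WHERE mbox_id = ?", (mbox_id,)
    )
    return cur.fetchone()

def total_size(conn):
    """Sum the recorded sizes of all indexed mailboxes."""
    return conn.execute("SELECT COALESCE(SUM(size_bytes), 0) FROM mbox").fetchone()[0]
```

Inserting in one transaction matters at this scale, since committing after every row is usually far slower.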

    As for traversing the directory tree to get information, you might want to have a look at a script that I posted here a while back: Get useful info about a directory tree. It was designed to do the fastest possible traversal of a directory, and produce a one-line summary for every directory in the tree. You could use it as-is to get summaries for (particular portions or volumes of) your system (the man page is included in the script), or you can adapt the approach used there to your own needs.
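
The following is not the script mentioned above, only a small Python sketch of the general idea of one pass over a tree that yields a summary for every directory. The function name and the shape of the returned summary are assumptions made for illustration.

```python
import os

def summarize_tree(top):
    """Map each directory under top to (file_count, total_bytes), subdirectories included."""
    totals = {}
    # topdown=False visits children before their parent, so child totals
    # are already known when the parent is processed.
    for root, dirs, files in os.walk(top, topdown=False):
        count = len(files)
        size = sum(os.lstat(os.path.join(root, name)).st_size for name in files)
        for d in dirs:
            # Symlinked directories are not walked, so they contribute (0, 0).
            child_count, child_size = totals.get(os.path.join(root, d), (0, 0))
            count += child_count
            size += child_size
        totals[root] = (count, size)
    return totals
```

Each file is examined once, and printing one line per entry of the returned mapping gives a per-directory summary.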