You need to consider a different approach -- something that will make efficient use of existing tools for doing basic things, and that will reduce the comparison problem to a simple matter of string diffs between two plain-text listings (i.e. using the standard "diff" tool that comes with unix/linux). There's no need to have 1 GB data structures in memory.
How about breaking the problem down to three separate procedures:
- Create a sorted list of all the directories of interest on each scan.
- For each directory, create a separate sorted list for the symlinks and data files in that directory.
- To find differences between yesterday and today, use the standard unix/linux "diff" tool on consecutive directory lists, and on consecutive file lists for each directory.
File::Find will be good for the first step, though you might want to consider just using the available unix/linux tools:
find /path/of_interest -type d | sort > /path/for_log_info/dirlist.yymmdd
Using "diff" on two consecutive "dirlist" files will reveal the addition or removal of directories.
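As a rough sketch of that comparison: the listing names dirlist.070101 and dirlist.070102 and the /data paths below are made up for illustration, and the temp directory just stands in for your log area. Lines that diff marks with "<" are directories that disappeared, and lines marked with ">" are new ones.

```shell
#!/bin/sh
# Hypothetical sample data: two consecutive, already sorted directory listings.
tmp=$(mktemp -d)
printf '%s\n' /data/a /data/b /data/c > "$tmp/dirlist.070101"
printf '%s\n' /data/a /data/c /data/d > "$tmp/dirlist.070102"

# diff exits with status 1 when the files differ, so don't treat that as a failure.
changes=$(diff "$tmp/dirlist.070101" "$tmp/dirlist.070102" || true)

# "<" lines exist only in the older list (removed); ">" lines only in the newer one (added).
removed=$(printf '%s\n' "$changes" | sed -n 's/^< //p')
added=$(printf '%s\n' "$changes" | sed -n 's/^> //p')

echo "removed: $removed"
echo "added:   $added"
rm -rf "$tmp"
```

Because both lists are sorted the same way, diff only has to report the lines that really differ, no matter how large the tree is.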
For step 2, I would do something like:
open( DLIST, "<", $dirlist ) or die "cannot read $dirlist: $!\n";
while ( my $dir = <DLIST> ) {
    chomp $dir;
    opendir( D, $dir ) or do {
        warn "opendir failed on $dir: $!\n";
        next;
    };
    # turn the directory path into a flat file name: "/" becomes "%"
    ( my $file_list_name = $dir ) =~ tr{/}{%};
    open( FLIST, ">", "$log_path/$file_list_name.$today" )
        or die "cannot write to $log_path/$file_list_name.$today: $!\n";
    # skip subdirectories, but keep symlinks even when they point at a directory
    for my $file ( sort grep { -l "$dir/$_" or !-d "$dir/$_" } readdir( D ) ) {
        # check for symlink vs. datafile
        # gather other stat info as needed,
        # print a nicely formatted line to FLIST
    }
    close FLIST;
    closedir D;
}
close DLIST;
With that done, running the basic "diff" command on two consecutive file listings for a given directory (assuming that the directory existed on both days) will tell you which files changed, which were added, and which were removed. Just figure out what you want to do with the output from "diff".
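Here is one way the per-directory comparison might be driven, as a sketch only. It assumes the file lists are named by replacing "/" with "%" as in the Perl loop above; the date stamps, the /data paths, and the "name size" line format are all made up for illustration. "comm -12" picks out the directories that appear in both directory lists, so diff is only run where both days have a file list.

```shell
#!/bin/sh
# Hypothetical log area: dirlist.<date> plus one file list per directory.
log=$(mktemp -d)
old=070101; new=070102          # made-up date stamps
printf '%s\n' /data/a /data/b > "$log/dirlist.$old"
printf '%s\n' /data/a /data/c > "$log/dirlist.$new"
printf '%s\n' 'f1 100' 'f2 200' > "$log/%data%a.$old"
printf '%s\n' 'f1 100' 'f2 250' 'f3 10' > "$log/%data%a.$new"

# comm -12 prints only the lines common to both sorted inputs,
# i.e. the directories that existed on both days.
report=$(
  comm -12 "$log/dirlist.$old" "$log/dirlist.$new" |
  while IFS= read -r dir; do
    name=$(printf '%s' "$dir" | tr '/' '%')
    echo "== $dir"
    # diff exits 1 when the lists differ; that is expected here.
    diff "$log/$name.$old" "$log/$name.$new" || true
  done
)
printf '%s\n' "$report"
rm -rf "$log"
```

In the report, a file name that shows up on both a "<" line and a ">" line changed, one that shows up only on a ">" line was added, and one that shows up only on a "<" line was removed.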