Memory Question

PrimeLord has asked for the wisdom of the Perl Monks concerning the following question:

Once again, monks, I have come to you for some advice. I have written a script that searches a unix file system for files or directories with user-defined permissions. For example, it will search for world-readable files. It then reads a baseline file of the world-readable files it found the day before and produces a report on what has changed between the two days. I am running into some memory issues, though.

The script runs a find for the files it is looking for and then reads them into a hash. It also reads the contents of the baseline file into a hash and then does some cross comparisons of the two to generate the report. The problem I am having is that the list of files can be as big as 19 meg. So when it reads in 19 meg of current data and another 19 meg of yesterday's data and then tries to compare them, it takes forever to run and bogs the system down some. Is there a more efficient way for me to do this? Here is some of the code.

sub _todays_files {
        my ($host, $script_mode, $options) = @_;
        my %today;
        my $search_files = (split /:/, $options->{$script_mode})[0];
        if ($host =~ /hosta/) {
                open IN, "find / $search_files -print |"
                        or die "Error: Could not run find command:\n$!";
                while (<IN>) {
                        chomp;
                        $today{$_}++;
                }
                close IN
                        or warn "Error: Could not close find command:\n$!";
        } else {
                open IN, "find / -path '/usr/home' -prune -o $search_files -print |"
                        or die "Error: Could not run find command:\n$!";
                while (<IN>) {
                        chomp;
                        $today{$_}++;
                }
                close IN
                        or warn "Error: Could not close find command:\n$!";
        }
        return \%today;
}

sub _read_benchmark {
        my ($bench_dir, $host) = @_;
        my %yesterday;
        if (-e "$bench_dir/$host.benchmark") {
                open BENCH, "$bench_dir/$host.benchmark"
                        or die "Error: Could not open $bench_dir/$host.benchmark:\n$!";
                while (<BENCH>) {
                        chomp;
                        $yesterday{$_}++;
                }
                close BENCH
                        or warn "Error: Could not close $bench_dir/$host.benchmark:\n$!";
        }
        return \%yesterday;
}

sub _write_benchmark {
        my ($bench_dir, $host, $today, $user_uid, $user_gid) = @_;
        open BENCH, "> $bench_dir/$host.benchmark"
                or die "Error: Could not open $bench_dir/$host.benchmark for writing:\n$!";
        for (sort keys %$today) {
                print BENCH "$_\n";
        }
        close BENCH
                or warn "Error: Could not close $bench_dir/$host.benchmark:\n$!";
        chown $user_uid, $user_gid, "$bench_dir/$host.benchmark";
        chmod 0640, "$bench_dir/$host.benchmark";
        return;
}

sub _print_report {
        my ($dirs, $today, $yesterday, $options, $script_mode, $host) = @_;
        my $skip = join ('|', map { quotemeta } keys %$dirs);
        my $title = (split /:/, $options->{$script_mode})[2];
        my $title_count = length($title);
        my $host_count = length($host);
        my $new_count = ($host_count + $title_count + 30);
        my $old_count = ($host_count + $title_count + 34);
        print NEW "########## New $title on $host ##########\n\n";
        for (sort keys %$today) {
                print NEW "$_\n" unless (/^($skip)/) || exists $yesterday->{$_};
        }
        print NEW "\n";
        print NEW "#" x $new_count;
        print OLD "########## Removed $title on $host ##########\n\n";
        for (sort keys %$yesterday) {
                print OLD "$_\n" unless (/^($skip)/) || exists $today->{$_};
        }
        print OLD "\n";
        print OLD "#" x $old_count;
        return;
}

I didn't post the whole script because I figured this was a lot to read through as it is, but I think it should be enough to get the gist of what I am doing. Basically, it reads all the files it finds into a today hash and all the files it found yesterday into a yesterday hash. It then compares the hashes and writes the differences into two different reports. Any suggestions on how I can optimize this?

Thanks,
Prime


Replies are listed 'Best First'.
Re: Memory Question by dragonchild (Archbishop) on Feb 21, 2003 at 19:05 UTC
Would it just be easier to use unix commands to do this? find, sort, grep ... those were unix commands before they were co-opted as Perl keywords. To me, this sounds like a job for the shell, not Perl.

------
We are the carpenters and bricklayers of the Information Age.
Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
Re: Re: Memory Question by l2kashe (Deacon) on Feb 21, 2003 at 20:03 UTC
While this could be done in the shell, I have found personally that Perl gets things done slightly faster and cleaner than spawning massive chains of shell commands. It's also easier to collect all the info and compare / process it in Perl than with shell utils, but then again I am no shell guru by any means. Someone pointed out File::Find, which will get you rolling and will be slightly kinder on your system than a 'find' would be. Also, there are some decent chapters in the panther book (Advanced Perl Programming) that deal with efficiently comparing 2 hashes and pulling out the differences between them. Best of luck

/* And the Creator, against his better judgement, wrote man.c */
Re: Re: Re: Memory Question by Limbic~Region (Chancellor) on Feb 21, 2003 at 22:40 UTC
While I LOVE Perl, I disagree in this case. If the *only* thing needing to be checked is permissions, the following would suffice:

        find / -exec ls -l {} \; > /tmp/pass1
        find / -exec ls -l {} \; > /tmp/pass2
        diff /tmp/pass1 /tmp/pass2

Of course, you would tailor the find command as PrimeLord indicated (only get specific files with specific permissions). In my experience, this specific task is faster, easier, and more efficient using Unix commands. Cheers - L~R
Re: Memory Question by derby (Abbot) on Feb 21, 2003 at 19:39 UTC
Instead of piping out to find and saving that to a hash, you could use File::Find and, in its wanted function, compare each found file against yesterday's pre-loaded finds. -derby
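A minimal sketch of the single-hash idea derby describes, not PrimeLord's actual script: only yesterday's list is held in memory, and each file found today is checked against it as File::Find visits it. The names load_baseline and compare_tree and the $is_wanted predicate are hypothetical; the predicate stands in for whatever tests $search_files passes to find.

```perl
use strict;
use warnings;
use File::Find;

# Load yesterday's baseline (one path per line) into a hash.
# load_baseline is a hypothetical name, not part of the original script.
sub load_baseline {
    my ($path) = @_;
    my %seen;
    open my $fh, '<', $path or die "Could not open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close $fh or warn "Could not close $path: $!";
    return \%seen;
}

# Walk $top once. $is_wanted is a hypothetical predicate replacing the
# find(1) tests, e.g. for world-readable entries:
#   sub { my @st = lstat $_[0]; @st && ($st[2] & 0004) }
# Returns (\@new, \@removed). The baseline hash is consumed: every path
# seen today is deleted from it, so whatever remains was removed.
sub compare_tree {
    my ($top, $baseline, $is_wanted) = @_;
    my @new;
    find({
        no_chdir => 1,    # keep full paths; do not chdir into each directory
        wanted   => sub {
            my $file = $File::Find::name;
            return unless $is_wanted->($file);
            # delete returns the stored value (1) if the path was in the baseline
            push @new, $file unless delete $baseline->{$file};
        },
    }, $top);
    return ([sort @new], [sort keys %$baseline]);
}
```

Because matched paths are deleted as they are found, the hash shrinks during the walk, and only the differences are collected. On a first run with an empty baseline, @new would still hold every path, so printing from inside wanted instead of collecting may be preferable there.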
Re: Memory Question by Thelonius (Priest) on Feb 21, 2003 at 20:57 UTC
As derby points out, you don't need both hashes in memory at once. Further than that, you could use a tied hash. Another approach is to keep the files sorted and then use the comm command to find the differences. E.g.

        comm -23 old new >oldonly
        comm -13 old new >newonly
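The sorted-file comparison that comm performs can also be sketched in Perl as a streaming merge, which holds only one line from each file at a time. This is an illustration under assumptions, not a drop-in replacement: next_line and diff_sorted are hypothetical names, and both inputs must already be sorted in the order Perl's cmp uses. The baseline written by _write_benchmark with sort keys should satisfy that; today's find output would need sorting first, for example through sort with LC_ALL=C.

```perl
use strict;
use warnings;

# Read the next line from a handle, chomped; returns undef at end of file.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Walk two sorted inputs in step. $on_removed is called for lines found
# only in the old input, $on_added for lines found only in the new one.
sub diff_sorted {
    my ($old_fh, $new_fh, $on_removed, $on_added) = @_;
    my $old = next_line($old_fh);
    my $new = next_line($new_fh);
    while (defined $old || defined $new) {
        # An exhausted input sorts after everything left in the other one.
        my $order = !defined $old ? 1
                  : !defined $new ? -1
                  :                 $old cmp $new;
        if ($order < 0) {            # only in old: removed since yesterday
            $on_removed->($old);
            $old = next_line($old_fh);
        }
        elsif ($order > 0) {         # only in new: added since yesterday
            $on_added->($new);
            $new = next_line($new_fh);
        }
        else {                       # in both: unchanged, skip
            $old = next_line($old_fh);
            $new = next_line($new_fh);
        }
    }
    return;
}
```

The two callbacks could simply print to the OLD and NEW report handles, so memory use stays flat no matter how large the file lists grow.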

