Once again, monks, I have come to you for some advice. I have written a script that searches a Unix file system for files or directories with user-defined permissions; for example, it can search for world-readable files. It then reads a baseline file of the world-readable files it found the day before and produces a report on what has changed between the two days. I am running into some memory issues, though.

The script runs a find for the files it is looking for and reads the results into a hash. It also reads the contents of the baseline file into a hash and then cross-compares the two to generate the report. The problem is that the list of files can be as big as 19 MB. So when it reads in 19 MB of current data and another 19 MB of yesterday's data and then tries to compare them, it takes forever to run and bogs the system down. Is there a more efficient way for me to do this? Here is some of the code.

sub _todays_files {
        my ($host, $script_mode, $options) = @_;
        my %today;
        my $search_files = (split /:/, $options->{$script_mode})[0];
        if ($host =~ /hosta/) {
                open IN, "find / $search_files -print |"
                        or die "Error: Could not run find command:\n$!";
                while (<IN>) {
                        chomp;
                        $today{$_}++;
                }
                close IN
                        or warn "Error: Could not close find command:\n$!";
        } else {
                open IN, "find / -path '/usr/home' -prune -o $search_files -print |"
                        or die "Error: Could not run find command:\n$!";
                while (<IN>) {
                        chomp;
                        $today{$_}++;
                }
                close IN
                        or warn "Error: Could not close find command:\n$!";
        }
        return \%today;
}

sub _read_benchmark {
        my ($bench_dir, $host) = @_;
        my %yesterday;
        if (-e "$bench_dir/$host.benchmark") {
                open BENCH, "$bench_dir/$host.benchmark"
                        or die "Error: Could not open $bench_dir/$host.benchmark:\n$!";
                while (<BENCH>) {
                        chomp;
                        $yesterday{$_}++;
                }
                close BENCH
                        or warn "Error: Could not close $bench_dir/$host.benchmark:\n$!";
        }
        return \%yesterday;
}

sub _write_benchmark {
        my ($bench_dir, $host, $today, $user_uid, $user_gid) = @_;
        open BENCH, "> $bench_dir/$host.benchmark"
                or die "Error: Could not open $bench_dir/$host.benchmark for writing:\n$!";
        for (sort keys %$today) {
                print BENCH "$_\n";
        }
        close BENCH
                or warn "Error: Could not close $bench_dir/$host.benchmark:\n$!";
        chown $user_uid, $user_gid, "$bench_dir/$host.benchmark";
        chmod 0640, "$bench_dir/$host.benchmark";
        return;
}

sub _print_report {
        my ($dirs, $today, $yesterday, $options, $script_mode, $host) = @_;
        my $skip = join ('|', map { quotemeta } keys %$dirs);
        my $title = (split /:/, $options->{$script_mode})[2];
        my $title_count = length($title);
        my $host_count = length($host);
        my $new_count = ($host_count + $title_count + 30);
        my $old_count = ($host_count + $title_count + 34);
        print NEW "########## New $title on $host ##########\n\n";
        for (sort keys %$today) {
                print NEW "$_\n" unless (/^($skip)/) || exists $yesterday->{$_};
        }
        print NEW "\n";
        print NEW "#" x $new_count;
        print OLD "########## Removed $title on $host ##########\n\n";
        for (sort keys %$yesterday) {
                print OLD "$_\n" unless (/^($skip)/) || exists $today->{$_};
        }
        print OLD "\n";
        print OLD "#" x $old_count;
        return;
}

I didn't post the whole script because I figured this was already a lot to read through, but it should be enough to get the gist of what I am doing. Basically, it reads all the files it finds into a today hash and all the files it found yesterday into a yesterday hash, then compares the hashes and writes the differences into two different reports. Any suggestions on how I can optimize this?
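
To make the comparison step concrete, here is a stripped-down, self-contained sketch of what the two loops in _print_report boil down to. The helper name only_in and the sample paths are made up for illustration, and the $skip filtering and report headers are left out; it is not the real script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the real script): returns the sorted keys
# that exist in %$left but not in %$right.
sub only_in {
    my ($left, $right) = @_;
    return [ grep { !exists $right->{$_} } sort keys %$left ];
}

# Made-up sample data standing in for the find output and the baseline file.
my %today     = map { $_ => 1 } qw(/etc/motd /tmp/new.txt /var/log/app.log);
my %yesterday = map { $_ => 1 } qw(/etc/motd /tmp/old.txt /var/log/app.log);

my $new     = only_in(\%today, \%yesterday);   # appeared since the baseline
my $removed = only_in(\%yesterday, \%today);   # gone since the baseline

print "New: @$new\n";
print "Removed: @$removed\n";
```

Both hashes have to be fully populated before either list can be built, which is where the two 19 MB data sets end up in memory at the same time.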

Thanks,
Prime

In reply to Memory Question by PrimeLord
