Once again, monks, I have come to you for some advice. I have written a script that searches a Unix file system for files or directories with user-defined permissions. For example, it will search for world-readable files. It then reads a baseline file of the world-readable files it found the day before and produces a report on what has changed between the two days. I am running into some memory issues, though.
The script runs a find for the files it is looking for and reads the results into a hash. It also reads the contents of the baseline file into a hash and then cross-compares the two to generate the report. The problem is that the list of files can be as big as 19 MB. So when it reads in 19 MB of current data and another 19 MB of yesterday's data and then tries to compare them, it takes forever to run and bogs the system down. Is there a more efficient way for me to do this? Here is some of the code.
sub _todays_files {
    my ($host, $script_mode, $options) = @_;
    my %today;
    # The first colon-separated field of the option string is passed to find as its search expression.
    my $search_files = (split /:/, $options->{$script_mode})[0];

    # Every host except hosta prunes /usr/home from the search.
    my $find = $host =~ /hosta/
        ? "find / $search_files -print"
        : "find / -path '/usr/home' -prune -o $search_files -print";

    open IN, "$find |"
        or die "Error: Could not run find command:\n$!";
    while (<IN>) {
        chomp;
        $today{$_}++;
    }
    close IN
        or warn "Error: Could not close find command:\n$!";
    return \%today;
}
sub _read_benchmark {
    my ($bench_dir, $host) = @_;
    my %yesterday;
    # If no baseline file exists yet, an empty hash is returned.
    if (-e "$bench_dir/$host.benchmark") {
        open BENCH, '<', "$bench_dir/$host.benchmark"
            or die "Error: Could not open $bench_dir/$host.benchmark:\n$!";
        while (<BENCH>) {
            chomp;
            $yesterday{$_}++;
        }
        close BENCH
            or warn "Error: Could not close $bench_dir/$host.benchmark:\n$!";
    }
    return \%yesterday;
}
sub _write_benchmark {
    my ($bench_dir, $host, $today, $user_uid, $user_gid) = @_;
    open BENCH, '>', "$bench_dir/$host.benchmark"
        or die "Error: Could not open $bench_dir/$host.benchmark for writing:\n$!";
    # Today's file list becomes the baseline for the next run.
    for (sort keys %$today) {
        print BENCH "$_\n";
    }
    close BENCH
        or warn "Error: Could not close $bench_dir/$host.benchmark:\n$!";
    chown $user_uid, $user_gid, "$bench_dir/$host.benchmark";
    chmod 0640, "$bench_dir/$host.benchmark";
    return;
}
sub _print_report {
    my ($dirs, $today, $yesterday, $options, $script_mode, $host) = @_;
    # Paths that start with any key of %$dirs are left out of both reports.
    # Note: if %$dirs is empty, the pattern matches every path and nothing is reported.
    my $skip = join ('|', map { quotemeta } keys %$dirs);
    my $title = (split /:/, $options->{$script_mode})[2];
    my $title_count = length($title);
    my $host_count = length($host);
    # Footer widths match the header lines: 30 or 34 fixed characters plus title and host.
    my $new_count = ($host_count + $title_count + 30);
    my $old_count = ($host_count + $title_count + 34);
    print NEW "########## New $title on $host ##########\n\n";
    for (sort keys %$today) {
        print NEW "$_\n" unless (/^($skip)/) || exists $yesterday->{$_};
    }
    print NEW "\n";
    print NEW "#" x $new_count;
    print OLD "########## Removed $title on $host ##########\n\n";
    for (sort keys %$yesterday) {
        print OLD "$_\n" unless (/^($skip)/) || exists $today->{$_};
    }
    print OLD "\n";
    print OLD "#" x $old_count;
    return;
}
I didn't post the whole script because this is already a lot to read through, but I think it should be enough to get the gist of what I am doing. Basically, it reads all the files it finds into a today hash and all the files it found yesterday into a yesterday hash. It then compares the hashes and writes the differences into two different reports. Any suggestions on how I can optimize this?
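In case it helps, the comparison boils down to something like the stripped-down sketch below. The sample paths and the diff_sets helper are made up for illustration and are not part of the real script; the real report step also skips certain directories and writes to two report files.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the real script): return the sorted keys of
# %$left that do not exist in %$right.
sub diff_sets {
    my ($left, $right) = @_;
    return [ grep { !exists $right->{$_} } sort keys %$left ];
}

# Made-up sample data standing in for the find output and the baseline file.
my %today     = map { $_ => 1 } qw(/etc/motd /tmp/new.txt /var/log/app.log);
my %yesterday = map { $_ => 1 } qw(/etc/motd /tmp/old.txt /var/log/app.log);

my $new     = diff_sets(\%today, \%yesterday);   # in today, not in yesterday
my $removed = diff_sets(\%yesterday, \%today);   # in yesterday, not in today

print "New: @$new\n";
print "Removed: @$removed\n";
```

Both hashes and the sorted key lists are held in memory at the same time, which is presumably where most of the memory goes.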
Thanks,
Prime