PerlMonks  

Memory Question

by PrimeLord (Pilgrim)
on Feb 21, 2003 at 18:44 UTC ( [id://237532]=perlquestion )

PrimeLord has asked for the wisdom of the Perl Monks concerning the following question:

Once again, monks, I have come to you for some advice. I have written a script that searches a Unix file system for files or directories with user-defined permissions. For example, it will search for world-readable files. It then reads a baseline file of the world-readable files it found the day before and produces a report on what has changed between the two days. I am running into some memory issues, though.

The script runs a find for the files it is looking for and reads them into a hash. It also reads the contents of the baseline file into a hash and then cross-compares the two to generate the report. The problem is that the list of files can be as big as 19 MB. So when it reads in 19 MB of current data and another 19 MB of yesterday's data and then tries to compare them, it takes forever to run and bogs the system down. Is there a more efficient way for me to do this? Here is some of the code.

    sub _todays_files {
        my ($host, $script_mode, $options) = @_;
        my %today;
        my $search_files = (split /:/, $options->{$script_mode})[0];

        if ($host =~ /hosta/) {
            open IN, "find / $search_files -print |"
                or die "Error: Could not run find command:\n$!";
            while (<IN>) {
                chomp;
                $today{$_}++;
            }
            close IN
                or warn "Error: Could not close find command:\n$!";
        }
        else {
            open IN, "find / -path '/usr/home' -prune -o $search_files -print |"
                or die "Error: Could not run find command:\n$!";
            while (<IN>) {
                chomp;
                $today{$_}++;
            }
            close IN
                or warn "Error: Could not close find command:\n$!";
        }
        return \%today;
    }

    sub _read_benchmark {
        my ($bench_dir, $host) = @_;
        my %yesterday;

        if (-e "$bench_dir/$host.benchmark") {
            open BENCH, "$bench_dir/$host.benchmark"
                or die "Error: Could not open $bench_dir/$host.benchmark:\n$!";
            while (<BENCH>) {
                chomp;
                $yesterday{$_}++;
            }
            close BENCH
                or warn "Error: Could not close $bench_dir/$host.benchmark:\n$!";
        }
        return \%yesterday;
    }

    sub _write_benchmark {
        my ($bench_dir, $host, $today, $user_uid, $user_gid) = @_;

        open BENCH, "> $bench_dir/$host.benchmark"
            or die "Error: Could not open $bench_dir/$host.benchmark for writing:\n$!";
        for (sort keys %$today) {
            print BENCH "$_\n";
        }
        close BENCH
            or warn "Error: Could not close $bench_dir/$host.benchmark:\n$!";
        chown $user_uid, $user_gid, "$bench_dir/$host.benchmark";
        chmod 0640, "$bench_dir/$host.benchmark";
        return;
    }

    sub _print_report {
        my ($dirs, $today, $yesterday, $options, $script_mode, $host) = @_;
        my $skip = join ('|', map { quotemeta } keys %$dirs);
        my $title = (split /:/, $options->{$script_mode})[2];
        my $title_count = length($title);
        my $host_count = length($host);
        my $new_count = ($host_count + $title_count + 30);
        my $old_count = ($host_count + $title_count + 34);

        print NEW "########## New $title on $host ##########\n\n";
        for (sort keys %$today) {
            print NEW "$_\n" unless (/^($skip)/) || exists $yesterday->{$_};
        }
        print NEW "\n";
        print NEW "#" x $new_count;

        print OLD "########## Removed $title on $host ##########\n\n";
        for (sort keys %$yesterday) {
            print OLD "$_\n" unless (/^($skip)/) || exists $today->{$_};
        }
        print OLD "\n";
        print OLD "#" x $old_count;
        return;
    }

I didn't put the whole script because I figured that was a lot of information to read through as it is, but I think that should be enough to get the gist of what I am doing. Basically it reads all the files it finds into a today hash and all the files it found yesterday into a yesterday hash. And then compares the hashes and writes the differences into two different reports. Any suggestions on how I can optimize this?

Thanks,
Prime

Replies are listed 'Best First'.
Re: Memory Question
by dragonchild (Archbishop) on Feb 21, 2003 at 19:05 UTC
    Would it just be easier to use unix commands to do this? find, sort, grep ... those were unix commands before they were co-opted as Perl keywords. To me, this sounds like a job for the shell, not Perl.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      While this could be done in the shell, I have personally found that Perl gets things done slightly faster and more cleanly than spawning long chains of shell commands. It's also easier to collect all the info and compare/process it in Perl than with shell utilities, but then again I am not a shell guru by any means.

      Someone pointed out File::Find, which will get you rolling and will be slightly kinder on your system than a 'find' would be. Also, there are some decent chapters in the panther book (Advanced Perl Programming) that deal with efficiently comparing two hashes and pulling out the differences between them.

      Best of luck

      /* And the Creator, against his better judgement, wrote man.c */
        While I LOVE Perl, I disagree in this case.

        If the only thing needing to be checked is permissions, the following would suffice:

        find / -exec ls -l {} \; > /tmp/pass1
        find / -exec ls -l {} \; > /tmp/pass2
        diff /tmp/pass1 /tmp/pass2
        Of course you would tailor the find command as PrimeLord indicated (only get specific files with permissions).

        In my experience, this specific task is faster, easier, and more efficient using Unix commands.

        Cheers - L~R

Re: Memory Question
by derby (Abbot) on Feb 21, 2003 at 19:39 UTC
    Instead of piping out to find and saving the output to a hash, you could use File::Find and, in the wanted function, compare each found file with the preloaded list of yesterday's finds.
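
A rough sketch of that idea, not taken from the original script: only yesterday's list is held in a hash, and each path is checked as File::Find visits it. The names `diff_against_baseline` and `is_world_readable` are hypothetical, and the world-readable test is just one assumed example of a permission check.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical example predicate: true if "other" has read permission.
sub is_world_readable {
    my ($path) = @_;
    my @st = lstat $path;
    return 0 unless @st;                 # file vanished or is unreadable
    return ($st[2] & 0004) ? 1 : 0;      # mode is field 2 of the stat list
}

# Walk $root once and compare against yesterday's list.
# $baseline is an array ref of yesterday's paths; $is_match is a code ref.
sub diff_against_baseline {
    my ($root, $baseline, $is_match) = @_;
    my %yesterday = map { $_ => 1 } @$baseline;
    my @new;

    find({
        no_chdir => 1,                   # keep full paths usable in the callback
        wanted   => sub {
            my $path = $File::Find::name;
            return unless $is_match->($path);
            # delete returns the old value, so a false result means
            # the path was not in yesterday's list.
            push @new, $path unless delete $yesterday{$path};
        },
    }, $root);

    # Whatever is left in %yesterday was not seen today.
    my @removed = sort keys %yesterday;
    return ([ sort @new ], \@removed);
}
```

Called as `diff_against_baseline('/', \@old_paths, \&is_world_readable)`, this returns the new and removed paths. The list of new paths is still collected in memory here for simplicity; it could be printed straight to the report instead.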

    -derby

Re: Memory Question
by Thelonius (Priest) on Feb 21, 2003 at 20:57 UTC
    As derby points out, you don't need both hashes in memory at once. Further than that, you could use a tied hash.
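
One way the tied-hash suggestion might look, as a sketch only. It assumes SDBM_File (which ships with Perl) is acceptable; SDBM limits each key/value pair to roughly 1 KB, so DB_File or a similar module may be a better fit for very long paths. The function names and file names are hypothetical.

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;

# Load a one-path-per-line list into an on-disk hash instead of RAM.
sub load_list_into_dbm {
    my ($list_file, $dbm_path) = @_;
    tie my %seen, 'SDBM_File', $dbm_path, O_RDWR | O_CREAT, 0640
        or die "Could not tie $dbm_path: $!";
    open my $in, '<', $list_file or die "Could not open $list_file: $!";
    while (my $line = <$in>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close $in;
    untie %seen;
    return;
}

# Return the lines of $list_file that are absent from the on-disk hash.
sub lines_not_in_dbm {
    my ($list_file, $dbm_path) = @_;
    tie my %seen, 'SDBM_File', $dbm_path, O_RDONLY, 0640
        or die "Could not tie $dbm_path: $!";
    open my $in, '<', $list_file or die "Could not open $list_file: $!";
    my @missing;
    while (my $line = <$in>) {
        chomp $line;
        push @missing, $line unless exists $seen{$line};
    }
    close $in;
    untie %seen;
    return @missing;
}
```

Lookups go to disk, so this trades speed for a much smaller memory footprint.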

    Another approach is to keep the files sorted and then use the comm command to find the differences. E.g.

    comm -23 old new >oldonly
    comm -13 old new >newonly
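
For anyone who would rather stay in Perl, the same sorted-merge idea can be sketched as below. This is an illustration of the technique, not how comm itself is implemented, and `merge_diff` is a hypothetical name. It assumes both inputs are already sorted in the order Perl's `cmp` uses (plain string order, as produced by `sort keys` in `_write_benchmark`); today's find output would need to be sorted the same way first.

```perl
use strict;
use warnings;

# Walk two sorted filehandles in step, holding one line from each in memory.
# $on_old_only and $on_new_only are callbacks that receive a chomped line.
sub merge_diff {
    my ($old_fh, $new_fh, $on_old_only, $on_new_only) = @_;

    # Read one line and strip the newline so comparisons match sort order.
    my $next = sub {
        my ($fh) = @_;
        my $line = <$fh>;
        chomp $line if defined $line;
        return $line;
    };

    my $old = $next->($old_fh);
    my $new = $next->($new_fh);

    while (defined $old || defined $new) {
        # An exhausted input sorts after everything in the other one.
        my $cmp = !defined $old ?  1
                : !defined $new ? -1
                :                 $old cmp $new;

        if ($cmp < 0) {              # present yesterday, missing today
            $on_old_only->($old);
            $old = $next->($old_fh);
        }
        elsif ($cmp > 0) {           # present today, missing yesterday
            $on_new_only->($new);
            $new = $next->($new_fh);
        }
        else {                       # in both lists, nothing to report
            $old = $next->($old_fh);
            $new = $next->($new_fh);
        }
    }
    return;
}
```

The callbacks could simply print to the NEW and OLD report handles, so neither list ever has to fit in memory.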
