vxp has asked for the wisdom of the Perl Monks concerning the following question:

I have a file which contains the output of df from 3 servers. It also lists a summary for each server, giving total/used/available space at the bottom of each df output (an example of this file is provided for your convenience in the replies below). What I'm trying to do is probably best described by example: when I run the script, I'd like it to parse that file for today's input and make a nice little list, saying "server such-and-such, total/used/available space is blah blah," and then for each server list the partitions, with how much space they take up. Now, what I don't understand how to do is this: I made a regular expression to find the server's name and today's datestamp, but how do I now parse its "chunk" of this input file? The code that I have so far, and the example of that file, follow below in the replies. Please help/guide :)

Replies are listed 'Best First'.
Re: couple of file content manipulation questions
by halley (Prior) on May 23, 2003 at 15:10 UTC
    Here's a rough sketch of the approach I would take. Fill in the code to match the comments.
    # A hash can keep the interesting info for each server.
    # For every line of input,
    #     If this line introduces a new server,
    #         Create a new empty server structure in the hash.
    #     If this line introduces a new partition,
    #         Add a new partition structure in the current server.
    #     If this line has interesting information,
    #         Assert it into the current server and/or partition.
    When you're done, you have a breakdown of each server and its interesting information. An example hash is dumped below.
    $VAR1 = {
        'server1' => {
            'part1' => { total => 300, free => 100, used => 200 },
        },
        'server2' => {
            'part1' => { total => 200, free => 100, used => 100 },
            'part2' => { total => 400, free => 100, used => 300 },
        },
    };
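    For illustration, here is a minimal sketch of the loop those comments describe; the Host:/partition patterns and the sample data are assumptions, not the poster's actual file layout:

    use strict;
    use warnings;
    use Data::Dumper;

    my %servers;
    my $server;

    while ( my $line = <DATA> ) {
        chomp $line;

        # Hypothetical patterns -- adjust them to the real file layout.
        if ( $line =~ /^\s*Host:\s*(\S+)/ ) {                           # new server section
            $server = $1;
            $servers{$server} = {};
        }
        elsif ( $line =~ m{^\s*(/dev/\S+)\s+(\d+)\s+(\d+)\s+(\d+)} ) {  # partition line
            my ( $part, $total, $used, $free ) = ( $1, $2, $3, $4 );
            $servers{$server}{$part} = { total => $total, used => $used, free => $free };
        }
    }

    print Dumper( \%servers );

    __DATA__
    Host: server1
    /dev/part1  300  200  100
    Host: server2
    /dev/part1  200  100  100
    /dev/part2  400  300  100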

    --
    [ e d @ h a l l e y . c c ]

Re: couple of file content manipulation questions
by BrowserUk (Patriarch) on May 23, 2003 at 15:50 UTC

    When parsing files that contain sections like this, you can make life a lot easier by setting $/ (see $INPUT_RECORD_SEPARATOR) to a string that will allow you to read in a whole section at a time. This is considerably easier than trying to parse line by line and remember where you are.

    In this case, each section appears to be separated by two lines of underscores, so by setting $/="_\n_";, each read using <FILE> will grab an entire section at a time. You can then parse each chunk in one go, using the /s modifier on your regexes where needed. As each server has two consecutive sections delimited in the same way, you need to skip the second section you are not interested in by performing an extra read within the loop and discarding the result.

    Some code, that needs lots of work, to get you started

    Output

    D:\Perl\test>260369
    sis4.snap.synapticcorp.com: Total:263270140 Used:198358572 Free: 64911568
    lemon.snap.synapticcorp.com: Total: 3678032 Used: 1827809 Free: 1808477
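    As a rough illustration of that record-separator approach (the file name, summary labels, and regexes below are assumptions, not the code that produced the output above):

    use strict;
    use warnings;

    # Assumptions: the report is in df_report.txt, sections are separated by two
    # lines of underscores, and the summary lines are labelled TOTAL/USED/AVAILABLE.
    local $/ = "_\n_";                  # slurp one whole section per read

    open my $fh, '<', 'df_report.txt' or die "Can't open df_report.txt: $!";

    while ( my $section = <$fh> ) {
        # Hypothetical hostname pattern -- adjust to the real section headers.
        my ($host) = $section =~ /([\w.-]+\.com):/;
        next unless defined $host;

        my ($total) = $section =~ /TOTAL\s+(\d+)/;
        my ($used)  = $section =~ /USED\s+(\d+)/;
        my ($free)  = $section =~ /AVAILABLE\s+(\d+)/;

        print "$host: Total: $total Used: $used Free: $free\n"
            if defined $total && defined $used && defined $free;

        <$fh>;    # discard the second, uninteresting section for this server
    }

    close $fh;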

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
Re: couple of file content manipulation questions
by pzbagel (Chaplain) on May 23, 2003 at 15:48 UTC

    The output of df uses fixed-width fields. You can use pack/unpack to get at the data. Read the pack/unpack tutorial on how this works. Basically, you calculate the field widths and feed them to unpack, which splits up the string for you. Then all your loop needs to do is recognize header and footer lines and handle them as special cases. Everything else is simply unpack()ed. No messy regexes.

    Based on your data, I would do something like:

    1. read line of dashes
    2. read header line and parse with regex(unpack could work here)
    3. read lines till I saw /^Filesystem/
    4. read lines and send through unpack until a blank line
    5. process footer lines with regex or unpack
    6. read next line of dashes and back to the top

    I notice that your data may be coming from different sources with different field widths. This may just be an artifact of the cutting and pasting. But if it is so, you can actually use the header line of your data (from each machine) with index() to calculate the field widths and feed those to unpack.
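    A minimal sketch of steps 3-5 using an unpack template; the template widths and the sample lines below are made-up assumptions, and for real data the widths would be measured from (or computed against) the actual header line, as suggested above:

    use strict;
    use warnings;

    # Assumed template: the widths here were invented for the sample lines below.
    my $template = 'A12 A8 A8 A8 A5 A*';

    while ( my $line = <DATA> ) {
        chomp $line;
        next if $line =~ /^\s*Filesystem/;    # skip the header line
        last if $line =~ /^\s*$/;             # a blank line ends the partition block

        my ( $fs, $total, $used, $avail, $pct, $mount ) = unpack $template, $line;
        s/^\s+|\s+$//g for $fs, $total, $used, $avail, $mount;   # trim the padding

        print "$fs mounted on $mount: total=$total used=$used available=$avail\n";
    }

    __DATA__
    Filesystem   1K-blks    Used   Avail Use% Mounted on
    /dev/sda1        300     200     100 66% /
    /dev/sdb1        400     300     100 75% /home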

    HTH

Re: couple of file content manipulation questions
by Coplan (Pilgrim) on May 23, 2003 at 15:26 UTC
    I'll only provide the methodology that I would use. I won't provide any code, as I don't have the means to test anything right now.

    I would first look at your file and look for any patterns. Here's a good start: each server's output apparently ends with a line starting "AVAILABLE", followed by a blank line and then two break lines. You could use those break lines to split the servers apart from each other. Then you could take the data for each server separately and skip any break line or heading line (a "next if /.../" is good in loops for this purpose). Otherwise, each line looks to be a tab-separated list, so you can use that to split each line and store the data how you see fit.

    As for the TOTAL, USED and AVAILABLE lines, that's something that I would set your loop up to search for. If it sees TOTAL on a line, grab the data associated with it, store it. Same goes for the Used line and the AVAILABLE line.

    One caveat: NEVER assume the same number of tabs in a tab-separated list (or spaces in a space-separated one, etc.). Make sure you use the one-or-more quantifier (+) in your regexp. Otherwise that first line, which might have two tabs in it, might not parse correctly.
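    For example, splitting on "one or more whitespace characters" rather than on a single tab copes with the variable separators; the sample line here is made up:

    use strict;
    use warnings;

    # Hypothetical partition line -- real lines may mix tabs and runs of spaces.
    my $line = "/dev/hda1\t \t19228276   14737848\t3513680  81%\t/";

    # split /\t/ would produce empty fields here; \s+ handles the mixed separators.
    my @fields = split /\s+/, $line;

    print join( '|', @fields ), "\n";   # /dev/hda1|19228276|14737848|3513680|81%|/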

    Another Caveat: In some instances, it might be worthwhile to parse the heading lines and grab the headings as your hash keys. First of all, it keeps your data associated with the proper heading. Second of all, I see that on the sis8 server, the heading is different. This would cause problems if you didn't see this up front. You would likely need to nest some of your hashes in order to store the data in a logical way considering there are multiple lines of filesystems for each server.

    Hope that helps you think about the world of regexp and data file parsing. It's really a whole different frame of mind.

    --Coplan

Re: couple of file content manipulation questions
by arthas (Hermit) on May 23, 2003 at 15:50 UTC
    Hi!

    I took a few minutes to create the following code. Although it's a bit confused ( ;-) ), it parses everything you need and prints it out in a format I decided on. Instead of printing, you can store the data in a hash or other structure you like and do something with it later.

    A couple of caveats:
    • The regular expressions I inserted only work for the first server. Since the other two are a bit different in structure, they will have to be modified to get their data as well.
    • This program assumes the file has a fairly fixed structure; no error handling, sorry. ;-)

    Anyhow, I hope it can be useful as a hint for you.

    Michele.
Re: couple of file content manipulation questions
by Anonymous Monk on May 23, 2003 at 14:53 UTC
Re: couple of file content manipulation questions
by Anonymous Monk on May 23, 2003 at 14:55 UTC
    And this is the code that I've got so far :)
Re: couple of file content manipulation questions
by hacker (Priest) on May 24, 2003 at 13:38 UTC
    Would Filesys::Df, or Filesys::DiskFree help you out here? A small working example I just whipped up (with a bit of "sauce" for commifying the output):
    use strict;
    use Filesys::Df;

    my $fobj = df("/");

    print commify($fobj->{bavail} * 1024) . " bytes of "
        . commify($fobj->{blocks} * 1024) . " bytes free\n";

    sub commify {
        my $text = reverse $_[0];
        $text =~ s/(\d{3})(?=\d)(?!\d*\.)/$1,/g;
        return scalar reverse $text;
    }

    That being said, if you're not using one of these modules, I hope you're using open() or system() in list mode, and not backticks, to get this info. Something like:

    use strict;
    use warnings;

    my @filesystems = @ARGV;                     # filesystems to report on
    open( my $df_fh, '-|', 'df', @filesystems )  # list-form pipe open, no shell
        or die "Can't run df: $!";
    my @df = <$df_fh>;
    close($df_fh);
    chomp(@df);