kryptonite has asked for the wisdom of the Perl Monks concerning the following question:

Hello. I need to parse a file a little differently than usual and am a little flummoxed as to an approach. My file has information grouped in sections by database output. Each section is introduced by a "DATABASE: <DBASENAME>" line, followed by that database's output.

I'd like to be able to parse each section in order, then add a line just below the DATABASE: line that indicates some piece of information found in that section like "WIDGETS FOUND: 13," after which I would proceed to the next section. Does anyone have any suggestions at all as to how I might approach this?

TIA!

Re: File parsing query
by Joost (Canon) on Feb 16, 2005 at 20:21 UTC
      Thanks for the quick reply. I have a similar start to that, but my problem lies in doing this: once I find the "DATABASE:" (the current one) I have to track widgets until the next occurrence of "DATABASE:" and then post the quantity in the line below the current "DATABASE:" line. That's where I'm stuck. I need to pause the parse and go back, then go forward again. I hope this makes sense...
      I think the OP wanted to output the count of widgets at the top of the database section which was counted. That is, something more like this:
      my $widgetsFound = 0;
      my $databaseName;
      my $databaseSection;
      while( <> ) {
          if( /^DATABASE:/ ) {
              if( $databaseName ) {
                  print $databaseName;
                  print "Widgets found: $widgetsFound\n";
                  print $databaseSection;
              }
              $databaseName = $_;
              $databaseSection = "";
              $widgetsFound = 0;
          } else {
              $databaseSection .= $_;
          }
          if( $something ) {
              $widgetsFound++;
          }
      }
      # print the last database section that was parsed
      if( $databaseName ) {
          print $databaseName;
          print "Widgets found: $widgetsFound\n";
          print $databaseSection;
      }
      Updated: to correct errors noticed by Animator (who answered the original question first as well :) ).

        Some important notes:

        • $widgetsFound is not initialised at the start, which can lead to an undefined value being printed (if there are no widgets in the first database section).
        • The last database section is always printed, even if the file is empty, which leads to three undefined values being printed.

        An unimportant note: I posted similar code as a reply to the OP's node before you did :)

Re: File parsing query
by Animator (Hermit) on Feb 16, 2005 at 20:40 UTC

    This should work (I think: untested code)

    my $modified_data = "";
    my $counter = 0;
    my $db_line;       # stays undefined until the first 'database:'-line is seen
    my $text = "";

    open (FH, "<", "some_file") or die "Cannot read some_file: $!";
    while (<FH>) {
        # This will only match if 'DATABASE:' appears at the start of the line,
        # or is prefixed with whitespace. /i since the case of the word
        # probably doesn't matter.
        if (m/^\s*DATABASE: /i) {
            if (defined $db_line) {
                # The text to insert right after the 'database:'-line.
                my $insert = "WIDGETS FOUND: $counter\n";
                $modified_data .= $db_line . $insert . $text;
            }
            else {
                # Keep any lines that come before the first 'database:'-line.
                $modified_data .= $text;
            }
            $counter = 0;
            $text = "";
            $db_line = $_;
            next;
        }
        # When should the counter be increased?
        if (m/WIDGET/) {
            $counter++;
        }
        $text .= $_;
    }
    close (FH);

    open (FH, ">", "some_file") or die "Cannot write some_file: $!";
    print FH $modified_data;
    if (defined $db_line) {
        # The text to insert right after the last 'database:'-line.
        my $insert = "WIDGETS FOUND: $counter\n";
        print FH $db_line . $insert . $text;
    }
    else {
        # No 'database:'-line at all: write the file back unchanged.
        print FH $text;
    }
    close (FH);

    Update: this approach holds all the data in memory. If the file is too large for that, create a temp file and change the $modified_data .= ...; line to print TEMP ...;

    And then you can rename/remove the old file and rename/copy the temp file.
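
    A minimal sketch of that temp-file variant, for illustration only (untested against the OP's real data). It buffers one section at a time rather than the whole file. The helper name annotate_file, the use of File::Temp, and the /WIDGET/ test are assumptions, not something the OP specified:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: rewrite $path, adding a "WIDGETS FOUND:" line after
# each DATABASE: line. Only one section is held in memory at a time.
sub annotate_file {
    my ($path) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    # Create the temp file next to the original so rename() stays on one filesystem.
    my ($out, $tmp_path) = tempfile(DIR => dirname($path), UNLINK => 0);

    my ($db_line, $text, $counter) = (undef, '', 0);

    # Write out the section collected so far, then reset the buffers.
    my $flush = sub {
        if (defined $db_line) {
            print {$out} $db_line, "WIDGETS FOUND: $counter\n", $text;
        }
        else {
            print {$out} $text;    # lines before the first DATABASE: line
        }
        ($text, $counter) = ('', 0);
    };

    while (my $line = <$in>) {
        if ($line =~ /^\s*DATABASE: /i) {
            $flush->();            # the previous section is complete
            $db_line = $line;
            next;
        }
        $counter++ if $line =~ /WIDGET/;    # assumed definition of a widget
        $text .= $line;
    }
    $flush->();                    # do not forget the last section

    close $in;
    close $out or die "Cannot close $tmp_path: $!";
    rename $tmp_path, $path or die "Cannot rename $tmp_path to $path: $!";
    return;
}
```

    Calling annotate_file("some_file") would then replace the file in one step at the end, so a crash halfway through leaves the original untouched.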

    Update2: forgot to print the last database-section...

Re: File parsing query
by injunjoel (Priest) on Feb 16, 2005 at 20:44 UTC
    Greetings all,
    Right off the bat I would think of using seek and the $. special variable (current line number: see Perl Special Variables Quick Reference) to keep track of where you have been and where you want to print to.

    -InjunJoel

    Update!
    Though it seemed simple at first, upon actually attempting to code something using seek and $. I agree with Animator. What I got was 40+ lines of buggy code before I finally stopped. Disregard my posting above.
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo

      First note: inserting data in the middle of a file in place is impossible (overwriting existing bytes is possible).

      Second note: In my post above I did not use $. (I did think about it though), because the OP speaks about widgets, so I guess that a widget is a special kind of line, but I can be wrong ofc...

      Seek and tell in this case would be hard to use.

      It would be possible to use it when the following steps are followed:

      • A new temp file is created,
      • The original file is opened for reading,
      • File is read line by line
      • If line matches the Database-string:
        • Print to temp-file
        • Print a fixed amount of (empty) bytes to the file, store the starting address of this data in a var (by using tell)
        • Go to the previous point where empty bytes were added (with seek) and overwrite them with the correct data.
        • Jump back to the place where we were. (using seek)
      • If it didn't match, check for widget etc and print to temp file.
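
      The steps above can be sketched roughly as follows. This is an illustration under assumptions, not tested on real data: the helper name annotate_with_seek, the six-digit field width, and the /WIDGET/ test are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, reserving a fixed-width
# "WIDGETS FOUND:" line after each DATABASE: line and filling it in later.
sub annotate_with_seek {
    my ($in_path, $out_path) = @_;
    my $fmt = "WIDGETS FOUND: %6d\n";    # assumed width: counts up to 999999

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    my ($slot, $counter);    # $slot is the offset of the current placeholder

    # Seek back to the placeholder, overwrite it, then return to where we were.
    my $fill = sub {
        return unless defined $slot;
        die "count $counter does not fit the reserved bytes\n"
            if $counter > 999_999;
        my $here = tell $out;
        seek $out, $slot, 0 or die "seek failed: $!";
        printf {$out} $fmt, $counter;
        seek $out, $here, 0 or die "seek failed: $!";
    };

    while (my $line = <$in>) {
        if ($line =~ /^\s*DATABASE: /i) {
            $fill->();                 # finish the previous section
            print {$out} $line;
            $slot    = tell $out;      # remember where the placeholder starts
            $counter = 0;
            printf {$out} $fmt, 0;     # reserve the bytes
            next;
        }
        $counter++ if defined $slot && $line =~ /WIDGET/;    # assumed widget test
        print {$out} $line;
    }
    $fill->();                         # fill in the last section
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

      Because the placeholder and the final line are produced by the same format string, the overwrite never changes the length of the file. The price is a padded count in the output.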

      Some remarks on this approach: how many bytes should be reserved for the string?

      • Preferably it would be exactly the number of bytes that are needed. This could be accomplished by storing the number as an integer instead of ASCII, for example.
      • If too few bytes are reserved, then vital information is overwritten.
      • If too many bytes are reserved, then the file is filled with empty data.
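
      One way to keep the reserved size predictable is to pad the count to a fixed width and refuse anything that would not fit. A small sketch follows; it uses the ASCII-padded variant rather than a binary integer, and the helper name fixed_count_line and its digit-count parameter are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: build a count line that always has the same length
# for a given number of digits, so it can safely overwrite a placeholder.
sub fixed_count_line {
    my ($count, $digits) = @_;
    # Too few reserved bytes would overwrite the data that follows, so bail out.
    die "count $count needs more than $digits digits\n"
        if length($count) > $digits;
    # %*d takes the field width from the argument list and pads with spaces.
    return sprintf "WIDGETS FOUND: %*d\n", $digits, $count;
}
```

      With this, fixed_count_line(0, 6) and fixed_count_line(13, 6) have the same length, which is what the seek-and-overwrite step relies on.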

      Using seek and tell would be possible if and only if a fixed number of bytes is added, so you can open a new (temp) file, write data to it while you read, add some empty bytes (which will be overwritten at a later point) and do the ma

Re: File parsing query
by tphyahoo (Vicar) on Feb 17, 2005 at 09:16 UTC
    Maybe P::RD (Parse::RecDescent). Unfortunately, this is hard to learn how to use (I am a total beginner myself) but it might be a case where if you learn it once doing this type of thing in the future becomes easier.

    If you post some sample input, and the desired sample output, some P::RD master may be able to help you out.