
This works, but I have another question: if there is another field, say "Desc", whose value is itself space delimited, how can I put the entire value into a variable?
Example:
[
ID: 456
Desc: This is a test job
Start: /complete/success.1 /complete/success.2 /tmp/file.3
Done: /complete/success.3 /complete/success.4
]
The code currently splits all by space so only the first word is in the value.
E.g.
my ($desc) = @{$record{'Desc:'}};
$desc will only contain the word "This" and not the full value of "This is a test job".
What is the best way to solve this?

Re^3: Parsing a file and finding the dependencies in it
by Marshall (Canon) on Jul 06, 2011 at 19:16 UTC
    There is nothing wrong with making Desc: a special case for the splitting. I show some code below...

    In this special situation, you can just test for /^Desc:/. The technique is to limit the number of things returned from the split, in this case to 2 things. Doing that requires that we take care of one more detail: a chomp() is needed.

    When we let split() do its default thing, a chomp() is not needed because the trailing \n is removed: the default split is on any sequence of the 5 whitespace characters (space, \n, \f, \r, \t). If we tell split() to stop working after it has 2 things, then we have to do manually what it would have done to the last thing.
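
    To make the difference concrete, here is a minimal sketch using the Desc: line from the question. The variable names are just for illustration; nothing here comes from the original parser.

    ```perl
    use strict;
    use warnings;

    my $line = "Desc: This is a test job\n";

    # Default split: breaks on every run of whitespace, so the trailing
    # newline disappears along with the other separators.
    my ($tag, @words) = split ' ', $line;      # @words holds 5 separate words

    # Limit of 2: split stops after the first separator, so the second
    # field keeps its internal spaces and also the trailing newline.
    my ($tag2, $rest) = split /\s+/, $line, 2; # $rest is "This is a test job\n"

    # chomp first, then the limited split gives the clean full value.
    chomp(my $clean = $line);
    my ($tag3, $desc) = split /\s+/, $clean, 2;
    print "[$desc]\n";                         # prints [This is a test job]
    ```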

    I set up %record so that it is a Hash of Array, each value is a reference to an anonymous array of data. That is true even for a single value like the id number. "Same-ness" is a good thing in programming. So, I would do the same for the description string.

    Then the question is: what do you do with this description once the record is complete? You could, say, put another dimension on the hash which has IDs as the key. However, there is something to be said for keeping things simple. You could just make another hash that is keyed on IDs with the string as the value. Some purists might shudder in horror, but again simplicity has virtues!

        # ........ snip
        if (my $num = /\[/.../\]/)
        {
            if (/^Desc:/)
            {
                chomp;
                my ($desc, $string) = split(/\s+/,$_,2);
                $record{$desc} = [$string];
                next;
            }
            my ($tag, @values) = split;
            @{$record{$tag}} = @values;
        #........ snip

    OR....perhaps...

            if (/^Desc:/)
            {
                chomp;
                my ($desc, $string) = split(/\s+/,$_,2);
                $record{$desc} = [$string]; # same as @{$record{$desc}} = ($string);
            }
            else
            {
                my ($tag, @values) = split;
                @{$record{$tag}} = @values;
        #....snip...
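
    For anyone who wants something to run, here is a rough self-contained sketch of the same idea, including the "another hash keyed on IDs" suggestion. It assumes each field sits on its own line between the brackets, and the name %desc_for is made up for this example; the snipped parts of the real parser may well look different.

    ```perl
    use strict;
    use warnings;

    # Sample record from the question, one field per line (assumed layout).
    my @lines = (
        "[\n",
        "ID: 456\n",
        "Desc: This is a test job\n",
        "Start: /complete/success.1 /complete/success.2 /tmp/file.3\n",
        "Done: /complete/success.3 /complete/success.4\n",
        "]\n",
    );

    my %record;     # Hash of Array: tag => [values]
    my %desc_for;   # hypothetical side hash: id => description string

    for (@lines) {
        next unless /\[/ ... /\]/;   # flip-flop: only lines inside [ ... ]
        if (/^\]/) {                 # end of record: file the description by id
            my ($id)   = @{ $record{'ID:'} };
            my ($desc) = @{ $record{'Desc:'} };
            $desc_for{$id} = $desc;
            %record = ();            # start clean for the next record
            next;
        }
        next if /^\[/;               # opening bracket carries no data
        if (/^Desc:/) {              # special case: keep the value whole
            chomp;
            my ($tag, $string) = split /\s+/, $_, 2;
            $record{$tag} = [$string];
            next;
        }
        my ($tag, @values) = split;  # every other field: one value per word
        $record{$tag} = [@values];
    }

    print "$_ => $desc_for{$_}\n" for sort keys %desc_for;  # 456 => This is a test job
    ```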
      I didn't realize or check to see if you could limit what is returned from split. Thanks for that, that works perfectly fine.
      The "flip-flop" implementation that Marshall referred to was new to me as well, so much to learn!

      Can anyone help to explain what this line does?

      print map{" $_\n"}grep{!$seen{$_}++}priorFiles($file);

      I've read the perldoc on the map function and I think the grep{!$seen{$_}++}priorFiles($file) portion extracts unique elements and the priorFiles subroutine returns the "Start" files? Could someone explain it please?

      Also, I have been trying to figure out how I would be able to tell if an "ID" or "Desc" depends on another "ID" or "Desc", such as showing that ID 456 depends on ID 423, which basically entails looking up the input or "Start" files to see where (which "ID") they came from.
        Yes, grep{!$seen{$_}++} just removes duplicates from the list. Perl grep is a filtering operation and more powerful than command line grep. It passes an input element to the output if the last statement of the grep block evaluates to "true".

        This grep code checks the "truthness" of the seen hash entry for $_. The ! makes it a "not". So this is true if we have not seen a value before. The ++ is a post-increment, which happens after the value has been tested. If the key does not already exist, Perl creates it, and allows the undef initial value to be used in the increment. The resulting value is 1 (0+1). If the key already exists it just gets incremented.
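
        Here is the idiom on its own, with some made-up file names, so you can watch the counts build up in %seen:

        ```perl
        use strict;
        use warnings;

        my @files = qw(/tmp/a /tmp/b /tmp/a /tmp/c /tmp/b);   # made-up sample data

        my %seen;
        # First sighting: $seen{$_} is undef (false), so !undef is true and the
        # element passes; the post-increment then bumps the count to 1.
        # Later sightings: the count is already >= 1, so the element is dropped.
        my @unique = grep { !$seen{$_}++ } @files;

        print map { " $_\n" } @unique;   # one line each for /tmp/a, /tmp/b, /tmp/c
        # %seen now holds the counts: /tmp/a => 2, /tmp/b => 2, /tmp/c => 1
        ```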

        The list returned from priorFiles is every possible file that could have affected a particular output file. It contains dupes because some of the input files will share at least a partial ancestry. priorFiles() is a bit tricky as it calls itself. This is a recursive function and may be a bit mind-bending if you haven't seen one before.
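
        If the recursion is hard to picture, this stripped-down stand-in may help. It is not the actual priorFiles() from earlier in the thread: the table %inputs_of, the sub name and the file names are all invented for illustration, and it assumes the data has no cycles.

        ```perl
        use strict;
        use warnings;

        # Hypothetical lookup table: output file => [input files it was built from].
        my %inputs_of = (
            '/complete/success.3' => ['/complete/success.1', '/complete/success.2'],
            '/complete/success.4' => ['/complete/success.1'],
            '/final/report'       => ['/complete/success.3', '/complete/success.4'],
        );

        # Stand-in for priorFiles(): returns every file in the ancestry of
        # $file, duplicates included. Assumes there are no cycles in the data.
        sub prior_files_sketch {
            my ($file) = @_;
            my @direct = @{ $inputs_of{$file} || [] };   # no entry means a raw input
            return (@direct, map { prior_files_sketch($_) } @direct);
        }

        my @all = prior_files_sketch('/final/report');   # success.1 shows up twice

        my %seen;
        my @unique = grep { !$seen{$_}++ } @all;         # same dedupe as above
        print map { " $_\n" } @unique;
        ```

        Here /complete/success.1 feeds both success.3 and success.4, so it is reached by two paths, which is exactly why the grep is needed.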

        I think you are on the way now. Play with the code, insert prints to watch what it does.

Re^3: Parsing a file and finding the dependencies in it
by Anonymous Monk on Jul 06, 2011 at 15:10 UTC

    It is a custom format.

    I've heard that often, but I'll take your word for it :)

    The problem is not parsing the file, but finding the dependencies as explained above. Thanks.

    I hate to be contrary :) but yes, the problem is parsing.

    First you build a data structure (parse), then you walk the data structure looking for dependencies.

    If parsing wasn't a problem, surely you would have shared your parser, or at least, the data structure it creates?

    Your response to Marshall's node Re: Parsing a file and finding the dependencies in it firmly confirms that parsing was indeed a problem.

      Yes, I agree it is a parsing issue and that I indeed was incorrect. I've been using Marshall's parser to check the dependencies but just need to get it to pull out the full value field as I noted in http://www.perlmonks.org/?node_id=912984
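
      For the "which ID depends on which ID" part, one possible way to walk the parsed data is sketched below. Everything here is an assumption: the %jobs layout is not what Marshall's parser produces, and the files listed for ID 423 are invented so that there is something for ID 456 to depend on. The idea is just to index every "Done" file by the ID that produces it, then look up each "Start" file in that index.

      ```perl
      use strict;
      use warnings;

      # Hypothetical parsed data: id => { Start => [inputs], Done => [outputs] }.
      my %jobs = (
          423 => {
              Start => ['/tmp/file.1'],
              Done  => ['/complete/success.1', '/complete/success.2'],
          },
          456 => {
              Start => ['/complete/success.1', '/complete/success.2', '/tmp/file.3'],
              Done  => ['/complete/success.3', '/complete/success.4'],
          },
      );

      # Index: output file => id of the job that produces it.
      my %made_by;
      for my $id (keys %jobs) {
          $made_by{$_} = $id for @{ $jobs{$id}{Done} };
      }

      # For each job, collect the distinct ids that produce its Start files.
      my %depends_on;
      for my $id (sort keys %jobs) {
          my %seen;
          for my $file (@{ $jobs{$id}{Start} }) {
              my $producer = $made_by{$file};
              next unless defined $producer;   # file comes from outside any job
              next if $seen{$producer}++;      # already recorded this producer
              push @{ $depends_on{$id} }, $producer;
          }
      }

      for my $id (sort keys %depends_on) {
          print "ID $id depends on ID $_\n" for @{ $depends_on{$id} };
      }
      # prints: ID 456 depends on ID 423
      ```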