in reply to Re: Help extracting pattern of data
in thread Help extracting pattern of data

No, the reason I thought I need not decode it is that, from the JSON output, all I need to do is find every occurrence of the fields I mentioned above (for example, "latestTime":) and get its value from the entire file.

If decoding it would be more appropriate I am ok with that too.

I tried something naive, but could you please help me improve it?

    my $content = <_DATA_> #which i extract from a file, so I would give the file handle
    my $out = $content =~ m/ "latestTime": (.*) , /;
    print $out;

I thought I could extract the data like this, but I was wrong. Kindly help.

Re^3: Help extracting pattern of data
by davido (Cardinal) on Jan 27, 2014 at 04:04 UTC

    The DATA handle can be read using: <DATA> ...the point being that it's not <_DATA_>, it's <DATA>. Then at the end of your script, it is not _DATA_ (with a single underscore on either side), it's __DATA__, with two underscores on each side. At first I thought maybe you just made a typo, but I'm seeing you do it repeatedly, so it's worth mentioning.

    As for decoding the JSON first, YES. It doesn't need to be a pattern matching problem (which is almost always much more difficult than people expect) when there are perfectly good JSON parsing modules that will return a structure that you can simply traverse, iterate over, or manipulate however you wish, easily.

    Here's how I would start:

        use strict;
        use warnings;
        use JSON;
        use Data::Dumper;

        my $j = JSON->new;
        my $structure = $j->decode( do { local $/ = undef; <DATA> } );
        print Dumper $structure;

        __DATA__
        JSON here.....

    Then after getting a look at how my data is structured, I could plan how I want to work with it.

    If JSON and JSON::XS seem too heavy for you, try JSON::Tiny, in which case: my $structure = JSON::Tiny->new->decode( ..... );
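    Once the shape of the decoded data is known, a single field is an ordinary hash lookup. The following is only a minimal sketch under some assumptions: it uses JSON::PP (bundled with Perl since 5.14) in place of JSON, reads from a string instead of the DATA handle so that it is self-contained, and uses a shortened sample document whose field names and values are taken from the Dumper output posted in this thread.

```perl
use strict;
use warnings;
use JSON::PP;    # assumption: JSON::PP stands in for JSON; decode() is used the same way

# Shortened sample document (assumed layout, values taken from the thread).
my $json_text = <<'END_JSON';
{
  "earliestTime": "1390792081000",
  "latestTime":   "1390795661000",
  "eCompressed":  15164826,
  "bytesSeries":  [ ["1390795661000", 1179, 4940, 1016, 3428],
                    ["1390792141000", 0, 0] ]
}
END_JSON

my $structure = JSON::PP->new->decode($json_text);

# A top-level field is a plain hash lookup.
my $latest = $structure->{latestTime};
print "latestTime: $latest\n";

# bytesSeries is an array of arrays: a timestamp followed by counters.
for my $sample ( @{ $structure->{bytesSeries} } ) {
    my ( $timestamp, @counters ) = @$sample;
    print "$timestamp: @counters\n";
}
```

    Swapping in the DATA handle only changes how $json_text is filled; the lookups stay the same.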


    Dave

      Thank you, Dave, for correcting me. I appreciate it.

      I tried it out, and the structure after decoding looks like this:
          $VAR1 = {
              'eNewConnec' => 0,
              'peakConnec' => 0,
              'eCompressed' => 15164826,
              'iUncompressed' => 639536,
              'bytesSeries' => [
                  [ '1390795661000', 1179, 4940, 1016, 3428 ],
                  [ '1390792141000', 0, 0 ],
                  [ '1390792121000', 0, 0 ],
                  [ '1390792101000', 0, 0 ],
                  [ '1390792081000', 0, 0 ]
              ],
              'earliestTime' => '1390792081000',
              'eClosedConn' => 0,
              'latestTime' => '1390795661000',
              'iNewConn' => 0,
              'eUncompressed' => 23533344,
              'iCompressed' => 228561
          };

      My data is incremental so the same content with different values get appended to __DATA__ over time. I need to grep all occurrences of, say, earliestTime from this structure and get its value in a variable.

      Thank you, davido. I can extract the data from the structure now, but how can I iterate through it? Assume that the structure will have multiple occurrences of latestTime and that I want to print all of them. I use this line to get the data:
      my $time = $structure->{latestTime};
        My data is incremental so the same content with different values get appended to __DATA__ over time.

        I'm guessing a bit here, but it sounds like you will wind up with an Array-of-Hashes structure (see perldsc). If so, and building on davido's JSON code, try something like (untested):

            my @fields = qw(latestTime eCompressed eUncompressed iCompressed iUncompressed);

            for my $i (0 .. $#$structure) {
                my $hr_record = $structure->[$i];
                print qq{record $i: };
                for my $field (@fields) {
                    print qq{$field is $hr_record->{$field}, };
                }
                print qq{\n};
            }
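        The loop above assumes the file decodes to one array of hashes. If records are instead simply appended one after another (several JSON objects back to back, separated only by whitespace), a single decode call will reject the file. A possible sketch for that case, assuming JSON::PP and its incremental parser: incr_parse in list context returns every complete object it finds. The second record's values are invented purely for illustration.

```perl
use strict;
use warnings;
use JSON::PP;

# Two concatenated JSON objects, as an append-only file might contain.
# The first record uses values from the thread; the second is made up.
my $stream = <<'END_STREAM';
{"latestTime":"1390795661000","eCompressed":15164826}
{"latestTime":"1390795721000","eCompressed":15170000}
END_STREAM

# In list context, incr_parse extracts as many objects as it can find.
my @records = JSON::PP->new->incr_parse($stream);

# Collect one field from every record.
my @latest_times = map { $_->{latestTime} } @records;

for my $i ( 0 .. $#records ) {
    print "record $i: latestTime is $latest_times[$i]\n";
}
```

        Each element of @records is a hash reference, so the field loop shown earlier can be applied to it unchanged.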
Re^3: Help extracting pattern of data
by AnomalousMonk (Archbishop) on Jan 27, 2014 at 04:13 UTC

    Unless the data file you're dealing with is (or is likely to become) quite large (many megabytes), I still think just decoding to a hash would be best.

    But if you're committed to regexes (and assuming the entire file content has been slurped into $content), try something like (untested):
        my @latestTimes = $content =~ m{ "latestTime" \s* : \s* \K \d+ }xmsg;
    if you have Perl 5.10+ (for the  \K operator) or else
        my @latestTimes = $content =~ m{ "latestTime" \s* : \s* (\d+) }xmsg;

    Then just repeat for your other fields:  eCompressed eUncompressed etc.
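    One caveat: the Dumper output posted in this thread shows latestTime as a quoted string, so the raw JSON may well contain "latestTime": "1390795661000", which \d+ alone would not reach. A rough sketch that loops over the field names and tolerates an optional opening quote, with a made-up two-record $content standing in for the slurped file (the second record's values are invented):

```perl
use strict;
use warnings;

# Stand-in for the slurped file: two appended records (second one invented).
my $content = join "\n",
    '{"earliestTime":"1390792081000","latestTime":"1390795661000","eCompressed":15164826}',
    '{"earliestTime":"1390792081000","latestTime":"1390795721000","eCompressed":15170000}';

my %values;
for my $field (qw(latestTime earliestTime eCompressed)) {
    # "? allows the value to be either a quoted string or a bare number.
    my @found = $content =~ m{ "\Q$field\E" \s* : \s* "? (\d+) }xmsg;
    $values{$field} = \@found;
}

for my $field ( sort keys %values ) {
    print "$field: @{ $values{$field} }\n";
}
```

    This still treats the JSON as plain text, so it will miss anything the pattern does not anticipate (escaped quotes, non-numeric values, and so on).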

    Update: But see davido's ++reply above.