in reply to Help extracting pattern of data

... how can I use regular expression ...

I don't understand why you wouldn't just decode what looks like standard JSON data to a hash and loop through the hash for the needed data. The use of regexes, although possible, seems inappropriate. Is there a reason not to decode the data?

Re^2: Help extracting pattern of data
by spikeinc (Acolyte) on Jan 27, 2014 at 03:52 UTC

    No. The reason I thought I need not decode it is that, from the JSON output, all I need is every occurrence of the fields I mentioned above (for example, "latestTime":) and its value from the entire file.

    If decoding it would be more appropriate I am ok with that too.

    I tried something naive, but could you please help improve it?

    my $content = <_DATA_> #which i extract from a file, so I would give the file handle
    my $out = $content =~ m/ "latestTime": (.*) , /;
    print $out;

    I was thinking I could extract the data like this but I am wrong. Kindly help

      The DATA handle can be read using: <DATA> ...the point being that it's not <_DATA_>, it's <DATA>. Then at the end of your script, it is not _DATA_ (with a single underscore on either side), it's __DATA__, with two underscores on each side. At first I thought maybe you just made a typo, but I'm seeing you do it repeatedly, so it's worth mentioning.

      As for decoding the JSON first, YES. It doesn't need to be a pattern matching problem (which is almost always much more difficult than people expect) when there are perfectly good JSON parsing modules that will return a structure that you can simply traverse, iterate over, or manipulate however you wish, easily.

      Here's how I would start:

      use strict;
      use warnings;
      use JSON;
      use Data::Dumper;

      my $j = JSON->new;
      my $structure = $j->decode( do { local $/ = undef; <DATA> } );
      print Dumper $structure;

      __DATA__
      JSON here.....

      Then after getting a look at how my data is structured, I could plan how I want to work with it.
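Once the dump shows the shape of the data, walking it is mostly a matter of hash lookups and array dereferences. The following is a minimal sketch, not the poster's code: the sample record and its values are hypothetical, and JSON::PP (a core module with the same `decode` interface as JSON) is used only so the example runs without extra installs.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sample record; field names and values are assumptions.
my $json_text = '{ "latestTime": "1390795661000", "iNewConn": 0, '
              . '"bytesSeries": [ ["1390795661000", 1179, 4940], ["1390792141000", 0, 0] ] }';

my $structure = JSON::PP->new->decode($json_text);

# Top-level fields are plain hash lookups on the decoded reference.
my $latest = $structure->{latestTime};
print "latestTime: $latest\n";

# Nested arrays come back as array references; dereference them to loop.
my @series_times;
for my $row ( @{ $structure->{bytesSeries} } ) {
    my ( $timestamp, @counts ) = @$row;
    push @series_times, $timestamp;
    print "$timestamp: @counts\n";
}
```

The same pattern extends to any other key the dump reveals.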

      If JSON and JSON::XS seem too heavy for you, try JSON::Tiny, in which case: my $structure = JSON::Tiny->new->decode( ..... );


      Dave

        Thank you, Dave, for correcting me. I appreciate it.

        I tried it out, and the structure after decoding looks like this:
        $VAR1 = {
            'eNewConnec' => 0,
            'peakConnec' => 0,
            'eCompressed' => 15164826,
            'iUncompressed' => 639536,
            'bytesSeries' => [
                [ '1390795661000', 1179, 4940, 1016, 3428 ],
                [ '1390792141000', 0, 0 ],
                [ '1390792121000', 0, 0 ],
                [ '1390792101000', 0, 0 ],
                [ '1390792081000', 0, 0 ]
            ],
            'earliestTime' => '1390792081000',
            'eClosedConn' => 0,
            'latestTime' => '1390795661000',
            'iNewConn' => 0,
            'eUncompressed' => 23533344,
            'iCompressed' => 228561
        };

        My data is incremental, so the same content with different values gets appended to __DATA__ over time. I need to grep all occurrences of, say, earliestTime from this structure and get its value in a variable.

        Thank you, davido. I can extract the data from the structure now, but how can I iterate through it? Assume that the structure will have multiple occurrences of latestTime and that I want to print all of them. I use this line to get the data:
        my $time = $structure->{latestTime};
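If the file really is a run of JSON objects appended one after another (an assumption; the thread does not show the raw file), a single `decode` call will not accept it, but the incremental parser can. The sketch below uses `incr_parse` in list context, which returns every complete object it finds when the objects are separated only by whitespace. The sample records are hypothetical, and JSON::PP stands in for JSON so the example runs on a stock Perl.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical file content: two records appended over time.
my $content = join "\n",
    '{ "earliestTime": "1390792081000", "latestTime": "1390795661000" }',
    '{ "earliestTime": "1390795661000", "latestTime": "1390799241000" }';

# In list context, incr_parse returns one decoded structure per object.
my @records = JSON::PP->new->incr_parse($content);

# Each record is a hash reference, so the single-record lookup
# simply moves inside a loop (here, a map).
my @latest_times = map { $_->{latestTime} } @records;

print "$_\n" for @latest_times;
```

Swapping `latestTime` for `earliestTime` or any other key collects that field instead.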

      Unless the data file you're dealing with is (or is likely to become) quite large (many megabytes), I still think just decoding to a hash would be best.

      But if you're committed to regexes (and assuming the entire file content has been slurped into $content), try something like (untested):
          my @latestTimes = $content =~ m{ "latestTime" \s* : \s* \K \d+ }xmsg;
      if you have Perl 5.10+ (for the  \K operator) or else
          my @latestTimes = $content =~ m{ "latestTime" \s* : \s* (\d+) }xmsg;

      Then just repeat for your other fields:  eCompressed eUncompressed etc.
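Repeating the match for each field can be wrapped in a small helper. This is a sketch under stated assumptions rather than tested production code: `field_values` is a hypothetical helper name, the sample content is invented, and the optional `"?` is added on the assumption that some values (such as the timestamps) are quoted strings in the raw JSON.

```perl
use strict;
use warnings;

# Hypothetical slurped file content: the same record appended twice.
my $content = join "\n",
    '{ "earliestTime": "1390792081000", "latestTime": "1390795661000", "eCompressed": 15164826 }',
    '{ "earliestTime": "1390795661000", "latestTime": "1390799241000", "eCompressed": 15170000 }';

# Hypothetical helper: return every numeric value recorded for one field.
sub field_values {
    my ( $text, $field ) = @_;
    # \Q...\E escapes the field name; "? accepts a quoted or a bare number.
    # In list context, /g returns the contents of every capture.
    return $text =~ m{ " \Q$field\E " \s* : \s* "? (\d+) }xmsg;
}

my %values;
for my $field (qw(latestTime eCompressed)) {
    $values{$field} = [ field_values( $content, $field ) ];
}

print "$_: @{ $values{$_} }\n" for sort keys %values;
```

Because the pattern anchors on the quoted key, `"latestTime"` does not accidentally match inside `"earliestTime"`.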

      Update: But see davido's ++reply above.