spikeinc has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a set of JSON data from which I want to extract specific fields and print them as a table ordered by time. In the _DATA_ below (which is JSON), I don't think I need to decode the JSON, because the information I need is already in an extractable format. I do need to export every occurrence of the fields below and sort the results by the latest time. I can push each value into arrays and sort them, but how can I use regular expressions to loop through this data and extract the fields below for every occurrence? Kindly help.

Latest Time = 1390789741000
Ecompressed = 19955876
Euncompressed = 29837736
ICompressed = 248395
IUncompressed = 656440

_DATA_
{"ingressClosedConnections":1,"bytesSeries":[[1390789541000,1163,4588,972,3076],[1390789521000,282,1088,261,1280],[1390789501000,1177,4636,1005,3124],[1390789481000,1117,4492,937,2708],[1390789461000,409,1632,366,1472],[1390789441000,1182,4636,989,3124],[1390789421000,1104,4140,945,2980],[1390789401000,389,1584,325,1168],[1390786321000,572,1968,572,2260],[1390786301000,1118,4588,957,3332],[1390786281000,902,3804,681,1936],[1390786261000,605,2016,622,2308],[1390786241000,1179,4636,976,3380],[1390786221000,980,4252,726,2032],[1390786201000,497,1824,506,1844],[1390786181000,1101,4140,926,3236],[1390786161000,1043,4300,766,2080]],"eCompressed":19955876,"peakConnections":1,"iNewConnections":1,"eUncompressed":29837736,"iUncompressed":656440,"iCompressed":248395,"latestTime":1390789741000,"egressClosedConnections":1,"connectionSeries":[[1390789741000,0,0],[1390789721000,0,0],[1390789701000,0,0],[1390789681000,0,0],[1390789661000,0,0],[1390789641000,0,0],[1390789621000,0,0],[1390789601000,0,0],[1390789581000,0,0],[1390789561000,0,0],[1390789541000,0,0],[1390789521000,0,0],[1390789501000,0,0],[1390789481000,0,0],[1390789461000,0,0],[1390789441000,0,0],[1390789421000,0,0],[1390789401000,0,0],[1390789381000,0,0],[1390786321000,1,0],[1390786301000,1,0],[1390786281000,1,0],[1390786261000,1,0],[1390786241000,1,0],[1390786221000,1,0],[1390786201000,1,0],[1390786181000,1,0]
,"eCompressed":19955876,"peakConnections":1,"iNewConnections":1,"eUncompressed":29837736,"iUncompressed":656440,"iCompressed":248395,"latestTime":1390789741000,"egressClosedConnections":1,"connectionSeries":[[1390789741000,0,0],[1390789721000,0,0],[1390789701000,0,0],[1390789681000,0,0],[1390789661000,0,0], .....
,"eCompressed":19955876,"peakConnections":1,"iNewConnections":1,"eUncompressed":29837736,"iUncompressed":656440,"iCompressed":248395,"latestTime":1390789741000,"egressClosedConnections":1,"connectionSeries":[[1390789741000,0,0],[1390789721000,0,0],[1390789701000,0,0],[1390789681000,0,0],[1390789661000,0,0], ....
,"eCompressed":19955876,"peakConnections":1,"iNewConnections":1,"eUncompressed":29837736,"iUncompressed":656440,"iCompressed":248395,"latestTime":1390789741000,"egressClosedConnections":1,"connectionSeries":[[1390789741000,0,0],[1390789721000,0,0],[1390789701000,0,0],[1390789681000,0,0],[1390789661000,0,0],

Replies are listed 'Best First'.
Re: Help extracting pattern of data
by AnomalousMonk (Archbishop) on Jan 27, 2014 at 03:36 UTC
    ... how can I use regular expression ...

    I don't understand why you wouldn't just decode what looks like standard JSON data to a hash and loop through the hash for the needed data. The use of regexes, although possible, seems inappropriate. Is there a reason not to decode the data?

      No. The reason I thought I need not decode it is that all I need from the JSON output is every occurrence of the fields I mentioned above (for example, "latestTime":) and its value, across the entire file.

      If decoding it would be more appropriate I am ok with that too.

      I tried something naive, but could you please help me improve it?

      my $content = <_DATA_> #which i extract from a file, so I would give the file handle
      my $out = $content =~ m/ "latestTime": (.*) , /;
      print $out;

      I was thinking I could extract the data like this, but I am wrong. Kindly help.

        The DATA handle can be read using: <DATA> ...the point being that it's not <_DATA_>, it's <DATA>. Then at the end of your script, it is not _DATA_ (with a single underscore on either side), it's __DATA__, with two underscores on each side. At first I thought maybe you just made a typo, but I'm seeing you do it repeatedly, so it's worth mentioning.

        As for decoding the JSON first, YES. It doesn't need to be a pattern matching problem (which is almost always much more difficult than people expect) when there are perfectly good JSON parsing modules that will return a structure that you can simply traverse, iterate over, or manipulate however you wish, easily.

        Here's how I would start:

        use strict;
        use warnings;
        use JSON;
        use Data::Dumper;

        my $j = JSON->new;
        my $structure = $j->decode( do { local $/ = undef; <DATA> } );
        print Dumper $structure;

        __DATA__
        JSON here.....

        Then after getting a look at how my data is structured, I could plan how I want to work with it.
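        Once the structure is known, the extraction and sort could look something like the sketch below. It assumes the decoded data is an array of records, each carrying the five fields from the question; the real top-level layout cannot be confirmed from the truncated sample. The first record's small values are hypothetical placeholders, added only so the sort has something to reorder. JSON::PP is used because it ships with Perl 5.14+; decode_json from the JSON module should behave the same way here.

```perl
use strict;
use warnings;
use JSON::PP;   # core since Perl 5.14; exports decode_json by default

# Assumed input shape: an array of records. The first record is a
# hypothetical placeholder; the second uses the values from the question.
my $json = q{[
  {"latestTime":1390789721000,"eCompressed":100,"eUncompressed":200,"iCompressed":10,"iUncompressed":20},
  {"latestTime":1390789741000,"eCompressed":19955876,"eUncompressed":29837736,"iCompressed":248395,"iUncompressed":656440}
]};

my @fields  = qw(latestTime eCompressed eUncompressed iCompressed iUncompressed);
my $records = decode_json($json);

# Newest record first: numeric sort on latestTime, descending.
my @sorted = sort { $b->{latestTime} <=> $a->{latestTime} } @$records;

# Tab-separated table: a header row, then one row per record (hash slice).
print join("\t", @fields), "\n";
print join("\t", @{$_}{@fields}), "\n" for @sorted;
```

        If the records turn out to be nested under some key rather than sitting in a top-level array, only the line that builds $records should need to change.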

        If JSON and JSON::XS seem too heavy for you, try JSON::Tiny, in which case: my $structure = JSON::Tiny->new->decode( ..... );


        Dave

        Unless the data file you're dealing with is (or is likely to become) quite large (many megabytes), I still think just decoding to a hash would be best.

        But if you're committed to regexes (and assuming the entire file content has been slurped into $content), try something like (untested):
            my @latestTimes = $content =~ m{ "latestTime" \s* : \s* \K \d+ }xmsg;
        if you have Perl 5.10+ (for the \K operator), or else
            my @latestTimes = $content =~ m{ "latestTime" \s* : \s* (\d+) }xmsg;

        Then just repeat for your other fields:  eCompressed eUncompressed etc.
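        A minimal sketch of that repetition, looping over the field names instead of writing one regex per field. The %values hash and the shortened $content string are illustrative assumptions; the string stands in for the slurped file and reuses one set of field values from the sample data. The capturing form is used, so it does not depend on \K.

```perl
use strict;
use warnings;

# Stand-in for the slurped file; values copied from the sample data.
my $content = '{"eCompressed":19955876,"peakConnections":1,"iNewConnections":1,'
            . '"eUncompressed":29837736,"iUncompressed":656440,"iCompressed":248395,'
            . '"latestTime":1390789741000,"egressClosedConnections":1}';

my @fields = qw(latestTime eCompressed eUncompressed iCompressed iUncompressed);

# For each field, collect every captured value in order of appearance.
my %values;
for my $field (@fields) {
    @{ $values{$field} } = $content =~ m{ "\Q$field\E" \s* : \s* (\d+) }xmsg;
}

printf "%-14s = %s\n", $_, join(', ', @{ $values{$_} }) for @fields;
```

        Lining the lists up by index to form rows only works if every record contains all five fields; a record missing one would shift the later values, which is one more argument for decoding the JSON instead.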

        Update: But see davido's ++reply above.