Parsing file in Perl post processing

gbwien has asked for the wisdom of the Perl Monks concerning the following question:

Re: Parsing file in Perl post processing by NetWallah (Canon) on Sep 08, 2015 at 22:38 UTC
A Perl programmer would prefer a hash to represent the parsed data, rather than an array. Assuming that this is what you want, here is a one-liner to get you started:

`>perl -E "my %h = map {split ':',$_} split /\s+/, $ARGV[0]; say qq|$_\t$h{$_}\n| for sort keys %h" "THREAD_ID:1bf1d698 CDR_TY 40815144127 DEL_TIME:240815144127 OA_ADDR:5.0.OTSDC PRE_TRANS_OA:5.0.OTSDC DA_ADDR:1.1.966555696176 PRE_TRANS_DA:1.1.966555696176"`

The data does have some inconsistencies that are not handled by the regex; this is just to get you started. You will need to develop the regex to handle the pathological data.

Update: If you run into trouble parsing the pathological data, please post the code you tried here, and explain your problems. Monks here will gladly explain and help correct code, provided you display some effort.

Software efficiency halves every 18 months, thus compensating for Moore's Law.
Re: Parsing file in Perl post processing by Laurent_R (Canon) on Sep 09, 2015 at 08:11 UTC
The immediate idea would be to split the data on spaces, but that does not work entirely because your two last "fields" have embedded spaces:

`SEG_NUM:1 of 1 DLV_ATT:0 END_POINT:ESME FINAL_STATE:DELIVERED REG_DEL:1`

So I would probably try to first process these two last fields with a regular expression, something like:

`@endfields = /(SEG_NUM.+?)\s+(END_POINT.+?)/;`

then remove them from the string, use something like `split /\s+/` on the rest of the string, and finally reassemble the array in the proper order.

Update: I did not originally notice, but it appears that at least two other fields have embedded spaces:

`DEST_IDNT:Syniverse A2P I_ERR:0.0 PPS_ID: PPS_PROFILE:AO Submission - OA charged`

So splitting on spaces becomes harder to use, at least for about the last half of the original string. Although I don't like the idea too much, perhaps a long regex with each field key is the only solution, at least for the eight fields or so.
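A minimal sketch of the "extract first, then split" idea described above. It is not the poster's code: the shortened sample line, the `%field` hash, and the assumption that `SEG_NUM` always has the shape "N of M" are illustrative choices.

```perl
use strict;
use warnings;

# Shortened sample built from fields quoted in the thread.
my $line = 'MSG_LEN:22 SEG_NUM:1 of 1 DLV_ATT:0 END_POINT:ESME FINAL_STATE:DELIVERED REG_DEL:1';

my %field;

# Step 1: pull out the field whose value contains spaces and delete it
# from the line. The "N of M" shape is an assumption based on the sample.
if ( $line =~ s/\bSEG_NUM:(\d+ of \d+)\s*// ) {
    $field{SEG_NUM} = $1;
}

# Step 2: what remains has no embedded spaces, so split on whitespace,
# then split each token at its first colon.
for my $token ( split ' ', $line ) {
    my ( $key, $value ) = split /:/, $token, 2;
    $field{$key} = defined $value ? $value : '';
}

print "$_ => '$field{$_}'\n" for sort keys %field;
```

A hash does not remember the original field order. If the order matters, the keys could also be pushed onto an array as they are found.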
Re: Parsing file in Perl post processing by GotToBTru (Prior) on Sep 09, 2015 at 13:31 UTC
I am thinking there are no embedded spaces in the field names, judging by the otherwise consistent use of underscores. But lines 12 and 13 of the example output do confuse things. Can you clarify?

This ~~almost~~ works ~~(can't get value of last key)~~:

    use strict;
    use warnings;
    use Data::Dumper;

    my $string = 'THREAD_ID:1bf1d698 CDR_TYPE:AO SUB_TIME:240815144127 DEL_TIME:240815144127 OA_ADDR:5.0.OTSDC PRE_TRANS_OA:5.0.OTSDC DA_ADDR:1.1.966555696176 PRE_TRANS_DA:1.1.966555696176 ORIG_LOCN:10.100.80.7/7220 ORIG_IDNT:OTS A2P DEST_LOCN:173.209.195.44/8341 DEST_IDNT:Syniverse A2P I_ERR:0.0 PPS_ID: PPS_PROFILE:AO Submission - OA charged PPS_ERR:1.0 O_ERR:0.0 SILO: MSG_LEN:22 SEG_NUM:1 of 1 DLV_ATT:0 END_POINT:ESME FINAL_STATE:DELIVERED REG_DEL:1';

    # Collect every key: a run of capitals/underscores followed by a colon.
    my (@keys) = ($string =~ m/([A-Z_]+):/g);

    # A value ends where the next key starts, or at the end of the string.
    my $z = qr{(?:[A-Z_]+:|$)};

    # For each key, capture the text between "KEY:" and the next key.
    my %hash = map { $_, ($string =~ m/$_:(.+?)\s*$z/)} @keys;
    print Dumper(\%hash);

Output:

    $VAR1 = {
      'PPS_ID' => ' ',
      'THREAD_ID' => '1bf1d698 ',
      'DEST_IDNT' => 'Syniverse A2P ',
      'CDR_TYPE' => 'AO ',
      'ORIG_LOCN' => '10.100.80.7/7220 ',
      'REG_DEL' => '1',
      'SILO' => ' ',
      'DEST_LOCN' => '173.209.195.44/8341 ',
      'O_ERR' => '0.0 ',
      'OA_ADDR' => '5.0.OTSDC ',
      'PRE_TRANS_DA' => '1.1.966555696176 ',
      'PPS_PROFILE' => 'AO Submission - OA charged ',
      'I_ERR' => '0.0 ',
      'DLV_ATT' => '0 ',
      'ORIG_IDNT' => 'OTS A2P ',
      'DA_ADDR' => '1.1.966555696176 ',
      'MSG_LEN' => '22 ',
      'FINAL_STATE' => 'DELIVERED ',
      'SEG_NUM' => '1 of 1 ',
      'SUB_TIME' => '240815144127 ',
      'DEL_TIME' => '240815144127 ',
      'PPS_ERR' => '1.0 ',
      'END_POINT' => 'ESME ',
      'PRE_TRANS_OA' => '5.0.OTSDC '
    };

Update: with help of MidLifeXis, corrected regex in map to work even for last key:value pair in list.

Update 2: changed final \s in map to \s*. Thanks to NetWallah and poj.

Dum Spiro Spero
Re^2: Parsing file in Perl post processing by NetWallah (Canon) on Sep 09, 2015 at 17:23 UTC
Great solution (++). Adding a '+' to the regex eliminates trailing spaces in the values:

`my %hash = map { $_, ($string =~ m/$_:(.+?)\s+$z/)} @keys;`
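A possible alternative, not from the thread: because `\s+` requires at least one whitespace character, a final pair with nothing after it may fail to match. Trimming the values after the hash is built sidesteps that. A minimal sketch, assuming a hash whose values carry trailing spaces like those in the Data::Dumper output earlier in the thread:

```perl
use strict;
use warnings;

# A few values copied from the Data::Dumper output shown in the thread.
my %hash = (
    THREAD_ID => '1bf1d698 ',
    PPS_ID    => ' ',
    SEG_NUM   => '1 of 1 ',
    REG_DEL   => '1',
);

# values() returns aliases to the hash values, so s/// edits them in place.
s/\s+\z// for values %hash;

print "$_ => '$hash{$_}'\n" for sort keys %hash;
```

Only trailing whitespace is removed, so spaces inside a value such as "1 of 1" are kept.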
Re^2: Parsing file in Perl post processing by gbwien (Sexton) on Sep 12, 2015 at 22:51 UTC
Sorry, with my limited exposure to Perl I am trying to understand what you are doing in these lines of code:

`my $z = qr{(?:[A-Z_]+:|$)}; my %hash = map { $_, ($string =~ m/$_:(.+?)\s*$z/)} @keys`

How does qr work? Could you please explain what you are doing? How is the hash created? I don't understand map and $_, and the $string part.

Thanks, Tom
Re^3: Parsing file in Perl post processing by GotToBTru (Prior) on Sep 13, 2015 at 09:15 UTC
By using qr{} I am telling Perl the string inside will be used in a regex. See http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators. The most common reason to do this is to save time when you are using the same regex over and over; I use it because it often seems to make sure Perl interprets the regex in the way I expect.

map {block} @array returns a list created by executing {block} once for each element of @array, each time assigning one value to $_.

`@a=(1,2,3); @a_plus_1 = map { $_ + 1 } @a;`

The block, which in this example is $_ + 1, will be executed 3 times, once for each value in @a. It will put the results also in an array, so the values in @a_plus_1 will be (2,3,4).

In my solution, I have the map return two values separated by a comma. This is one way to define a hash. You can see this using the debugger:

    perl -d -e 1
    Loading DB routines from perl5db.pl version 1.28
    Editor support available.
    Enter h or `h h' for help, or `man perldebug' for more help.
    main::(-e:1): 1
    DB<1> %h=(1,2,3,4)
    DB<2> print $h{1}
    2
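A small standalone sketch of the two ideas explained above, using made-up data: `qr{}` compiling a pattern once for reuse, and `map` returning two values per element to build a hash. The names `$key_re`, `$text`, `@nums`, and `%square` are illustrative only.

```perl
use strict;
use warnings;

# qr{} compiles the pattern once; it can then be interpolated
# into other regexes as often as needed.
my $key_re = qr{[A-Z_]+:};
my $text   = 'OA_ADDR:5.0.OTSDC DLV_ATT:0';
my @keys   = $text =~ /($key_re)/g;    # every "KEY:" in the text

# map runs the block once per element with $_ set to that element.
# Returning two values each time yields a flat key, value, key, value
# list, which is exactly what a hash assignment expects.
my @nums   = ( 1, 2, 3 );
my %square = map { ( $_, $_ * $_ ) } @nums;

print "keys found: @keys\n";
print "$_ squared is $square{$_}\n" for sort keys %square;
```

The parentheses inside the block make it unambiguous to Perl that the braces are a code block returning a two-element list rather than an anonymous hash.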
Re^4: Parsing file in Perl post processing by gbwien (Sexton) on Sep 14, 2015 at 14:11 UTC
Re^5: Parsing file in Perl post processing by GotToBTru (Prior) on Sep 14, 2015 at 14:52 UTC
Re: Parsing file in Perl post processing by u65 (Chaplain) on Sep 09, 2015 at 10:48 UTC
Regarding the data format, it looks to me to be a string of keys (\w+\:) followed by their values consisting of all characters up to the next key, and the value may be empty. How any subkeys are to be assembled with their parent keys in the output is another matter that needs defining.

UPDATE: Unless we are told otherwise, I see we have been given the subkey arrangement in the example output.
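The format described above (a key, then everything up to the next key, possibly empty) can be sketched as a single-pass match with a lookahead. This is an illustration rather than code from the thread: the shortened sample line and the `%field` hash are assumptions, and it presumes no value contains a word immediately followed by a colon.

```perl
use strict;
use warnings;

# Shortened sample built from fields quoted in the thread.
my $line = 'DEST_IDNT:Syniverse A2P I_ERR:0.0 PPS_ID: PPS_PROFILE:AO Submission - OA charged PPS_ERR:1.0 SEG_NUM:1 of 1 REG_DEL:1';

my %field;

# (\w+):        the key and its colon
# (.*?)         the value, as short as possible (may be empty)
# \s*           whitespace separating it from what follows
# (?=\w+:|\z)   stop just before the next key or at end of string
while ( $line =~ /(\w+):(.*?)\s*(?=\w+:|\z)/g ) {
    $field{$1} = $2;
}

print "$_ => '$field{$_}'\n" for sort keys %field;
```

Because the lookahead does not consume the next key, each iteration resumes exactly at the following key, and values with embedded spaces are captured whole without trailing whitespace.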