Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I have a communication log like the one below:

Message:
11 started at: 2018-06-29 16:20:07

Transmit:
ATV1[0D]

Transmit:
[01]10179311000=[03]

Receive:
[01]20179321157>[02]0000006880140000000000000000000000000000000000688014000000
000000006880140000000000000000000040000000004000000000400000000040000000000000
000000000000000000[03][00]

Transmit:
[01]10179312000>[03]

Receive:
[01]20179331157?[02]0000006880140000000000000000000000000000000000688014000000
000000006880140000000000000000000040000000004000000000400000000040000000000000
000000000000000000[03][00]

Transmit:
[01]10179313000?[03]

Receive:
[01]201793411578[02]0000006880140000000000000000000000000000000000688014000000
000000006880140000000000000000000040000000004000000000400000000040000000000000
000000000000000000[03][00]

Transmit:
[01]101793140008[03]

Receive:
[01]201793511579[02]0000006880140000000000000000000000000000000000688014000000
000000006880140000000000000000000040000000004000000000400000000040000000000000
000000000000000000[03][00]

Transmit:
[01]101793150009[03]

Receive:
[01]001793611578[02]0000006880140000000000000000000000000000000000688014000000
000000006880140000000000000000000040000000004000000000400000000040000000000000
000000000000000000[03][00]

Message:
reading of spontaneous buffer not ordered

Message:
Periodic Buffer: Start: 2018-06-29 13:15:00 End: 2018-06-29 16:10:00 Periods: 36 Dec: 8 Points: 15 bytes collected: 5564 estimated: 5564

Message:
amount of bytes collected ok, data accepted

Message:
11 ended at: 2018-06-29 16:20:46

The log has 3 kinds of section: Message, Transmit and Receive. Every section consists of 3 parts: a head (like Message:, Transmit:, etc.), data (a mix of hex characters, ASCII codes and returns; no data line can be longer than 80 bytes, so one received record may be split over 2 or 3 lines), and a tail (only a return).

My job is to retrieve all the data in the Receive sections, remove every trailing return from the data, and convert the ASCII characters to a hex string, like below:

Receive:
[01]201793511579[02]0000006880140000000000000000000000000000000000688014000000
000000006880140000000000000000000040000000004000000000400000000040000000000000
000000000000000000[03][00]   #old

01 32 30 31...........30 03 00   #new string has only one line, is all made of hex strings and has no []

What I do is create 2 temporary arrays to deal with it:

my $start_str = "Receive";
my $end_str   = "\n";
my @first_loop = grep { /$start_str/ .. /^$end_str$/ ? 1 : 0; } @logs;
my @second_loop;
my $data = "";
my $k = 0;
my $j = 0;
for (@first_loop) {    # first loop is to remove return, head and tail
    do { $k = 1, next } if $_ =~ /$start_str/;
    do { $k = 0; push @second_loop, $data; $data = ""; next; } if $_ =~ /^$end_str$/;
    do { $data .= $_; chomp $data; next } if $k == 1;
}
my @strings;
my @refined_logs;
my $hex_str;
for (@second_loop) {    # second loop is to convert and remove []
    my $data_str = $_;
    chomp $data_str;
    for (split(//, $data_str)) {
        do { $j = 1, next } if $_ =~ /\[/;
        do { $j = 0; push @strings, $hex_str; $hex_str = ""; next; } if $_ =~ /\]/;
        do { $hex_str .= $_; next } if $j == 1;
        push @strings, sprintf("%02X", ord($_));
    }
    my $last_str = join(" ", @strings);
    @strings = ();
    push @refined_logs, $last_str;
}

It works, but I'd like to use only map plus grep to deal with it, like @output_array = map {} map {} grep {} grep {} @input_array. I think that's tidier and easier to understand. Could you enlighten me with several examples? Or are map and grep not a good fit in this scenario? Thanks for your help!

Replies are listed 'Best First'.
Re: how use map and grep or several loops?
by kcott (Archbishop) on Jul 05, 2018 at 07:19 UTC

    If you're dealing with log files, they're often large and copying all their data into arrays, and then creating more arrays from that data, is generally not a good idea: it's likely to be very slow and you could run into memory issues.

    Instead, read the files one line at a time or, in your case here, one paragraph at a time. The following line (which you'll see in the script I've provided below) turns on paragraph mode:

    local $/ = '';
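
As a rough illustration of what paragraph mode does, the sketch below reads a small in-memory string one blank-line-separated record at a time. The sample text and the variable names ($log, @records) are made up for the demo and are not taken from the real log.

```perl
use strict;
use warnings;

# Hypothetical sample: three records separated by blank lines.
my $log = "Message:\nstart\n\nReceive: \n[01]AB\nCD[03]\n\nMessage:\nend\n";

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$log or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '';                  # paragraph mode, restored when the block ends
    while (my $record = <$fh>) {
        chomp $record;              # in paragraph mode this strips all trailing newlines
        push @records, $record;
    }
}
close $fh;

printf "%d records\n", scalar @records;
print "[$_]\n" for @records;
```

Each record keeps its internal line breaks; only the blank-line separator is removed, so a multi-line Receive section arrives as a single string.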

    I'll also just comment on all those do statements. Is there a reason you coded it that way? Instead of

    do { STATEMENTS } if CONDITION;

    why not write

    if (CONDITION) { STATEMENTS }

    Anyway, I truncated your data quite substantially but left the main features: head, multiline data and tail. The data you posted had a space after every "Receive:" — I've left that in but you should check if it's there in the original data (it could be an artefact of copy/paste, HTML rendering, etc.). Here's the script to process it:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    {
        local $/ = '';
        my $wanted = "Receive: \n";
        my $re = qr{(\[\d\d\]|.)};

        while (<DATA>) {
            chomp;
            next unless substr($_, 0, length $wanted, '') eq $wanted;
            #print "|$_|\n"; # For demo only - see current $_ value
            my @hex;
            while (/$re/g) {
                push @hex, length $1 == 1 ? sprintf '%02X', ord $1 : substr($1, 1, 2);
            }
            print "@hex\n";
        }
    }

    __DATA__
    Message:
    11 started at: 2018-06-29 16:20:07

    Transmit:
    ATV1[0D]

    Transmit:
    [01]10179311000=[03]

    Receive: 
    [01]20179321157>[02]00
    00
    00[03][00]

    Transmit:
    [01]10179312000>[03]

    Receive: 
    [01]20179331157?[02]00
    00
    00[03][00]

    Message:
    amount of bytes collected ok, data accepted

    Message:
    11 ended at: 2018-06-29 16:20:46

    Output:

    01 32 30 31 37 39 33 32 31 31 35 37 3E 02 30 30 30 30 30 30 03 00
    01 32 30 31 37 39 33 33 31 31 35 37 3F 02 30 30 30 30 30 30 03 00

    If you uncomment that demo print line, you'll probably find it a bit easier to compare the data being processed and the data being output. It changes the output to this:

    |[01]20179321157>[02]00
    00
    00[03][00]|
    01 32 30 31 37 39 33 32 31 31 35 37 3E 02 30 30 30 30 30 30 03 00
    |[01]20179331157?[02]00
    00
    00[03][00]|
    01 32 30 31 37 39 33 33 31 31 35 37 3F 02 30 30 30 30 30 30 03 00
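
The tokenizing step in the script above can be tried in isolation. This is only a sketch: to_hex() is a hypothetical helper name, and the input string is a shortened, made-up record rather than real log data.

```perl
use strict;
use warnings;

# to_hex() is a hypothetical helper, not part of the script above.
sub to_hex {
    my ($data) = @_;
    my @hex;
    # Match either a bracketed two-digit code or any single character.
    # '.' does not match "\n", so embedded line breaks are skipped.
    while ($data =~ /(\[\d\d\]|.)/g) {
        my $token = $1;
        push @hex, length($token) == 1
            ? sprintf('%02X', ord $token)   # plain character: ASCII code in hex
            : substr($token, 1, 2);         # "[NN]": keep the two digits as they are
    }
    return "@hex";
}

print to_hex("[01]2>\n[02]00[03]"), "\n";   # prints: 01 32 3E 02 30 30 03
```

Note that \d\d only recognises codes made of decimal digits. A code such as [0D], which appears in one of the Transmit lines, would be converted character by character; if such codes can occur in Receive data, a class like [0-9A-F]{2} may be a better fit.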

    — Ken

      Thanks kcott!

      The reason I prefer map/grep to a while loop for something like this is that I like the feel map/grep gives me. Imagine you are driving on a freeway: if the road has only one exit, you can concentrate on driving; otherwise you have to watch every road sign. The performance you sacrifice is worth it, I think. ;)

Re: how use map and grep or several loops?
by johngg (Canon) on Jul 05, 2018 at 12:09 UTC

    Some might disagree with you regarding the clarity of the grep and map approach and a solution using loops of various flavours might be easier to maintain in the long run. However, here's a solution for you which wraps everything in a do block to contain the localization of paragraph mode.

    use 5.022;
    use warnings;

    open my $logFH, q{<}, \ <<__EOD__ or die $!;
    Message:
    11 started at: 2018-06-29 16:20:07

    Transmit:
    ATV1[0D]

    Transmit:
    [01]10179311000=[03]

    Receive:
    [01]20179321157>[02]0000006880140000000000000000000000000000000000688014000000
    000000006880140000000000000000000040000000004000000000400000000040000000000000
    000000000000000000[03][00]

    Transmit:
    [01]10179312000>[03]

    Receive:
    [01]20179331157?[02]0000006880140000000000000000000000000000000000688014000000
    000000006880140000000000000000000040000000004000000000400000000040000000000000
    000000000000000000[03][00]

    Transmit:
    [01]10179313000?[03]

    Receive:
    [01]201793411578[02]0000006880140000000000000000000000000000000000688014000000
    000000006880140000000000000000000040000000004000000000400000000040000000000000
    000000000000000000[03][00]

    Transmit:
    [01]101793140008[03]

    Receive:
    [01]201793511579[02]0000006880140000000000000000000000000000000000688014000000
    000000006880140000000000000000000040000000004000000000400000000040000000000000
    000000000000000000[03][00]

    Transmit:
    [01]101793150009[03]

    Receive:
    [01]001793611578[02]0000006880140000000000000000000000000000000000688014000000
    000000006880140000000000000000000040000000004000000000400000000040000000000000
    000000000000000000[03][00]

    Message:
    reading of spontaneous buffer not ordered

    Message:
    Periodic Buffer: Start: 2018-06-29 13:15:00 End: 2018-06-29 16:10:00 Periods: 36 Dec: 8 Points: 15 bytes collected: 5564 estimated: 5564

    Message:
    amount of bytes collected ok, data accepted

    Message:
    11 ended at: 2018-06-29 16:20:46

    __EOD__

    my @records = do {
        local $/ = q{};
        map {
            join q{ },
            map {
                m{\[\d\d\]}
                    ? substr $_, 1, 2
                    : map { sprintf q{%02x}, ord } split m{};
            } @$_;
        }
        map {
            s{^Receive:}{};
            s{\s+}{}g;
            [ split m{(\[\d\d\])} ];
        }
        grep { m{^Receive:} }
        <$logFH>;
    };

    say qq{$_\n} for @records;

    I don't show the output here as it will wrap horribly. Despite giving you this solution I would recommend sticking with looping constructs.

    Update: Got rid of the middle map by moving the split into the first map.

    Update 2: The first update would have made more sense if I'd included the original code so here it is :-

    map {
        join q{ },
        map {
            m{\[\d\d\]}
                ? substr $_, 1, 2
                : map { sprintf q{%02x}, ord } split m{};
        } @$_;
    }
    map { [ split m{(\[\d\d\])} ] }
    map {
        s{^Receive:}{};
        s{\s+}{}g;
        $_;
    }
    grep { m{^Receive:} }
    <$logFH>;
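
The pipeline relies on split keeping its delimiters when the pattern contains a capturing group. A minimal sketch of that step on its own, using a shortened, made-up record ($data, @tokens and $hex are names chosen for the demo):

```perl
use strict;
use warnings;

# Hypothetical, shortened record with whitespace already removed.
my $data = '[01]2>[02]00[03][00]';

# The capturing group makes split return the "[NN]" delimiters as well
# as the text between them. Empty fields appear where two codes are
# adjacent and before a leading code.
my @tokens = split m{(\[\d\d\])}, $data;
# @tokens is ('', '[01]', '2>', '[02]', '00', '[03]', '', '[00]')

my $hex = join q{ },
    map {
        m{\[\d\d\]}
            ? substr($_, 1, 2)                              # code: keep the digits
            : map { sprintf q{%02x}, ord } split m{}, $_;   # text: one hex pair per char
    } @tokens;

print "$hex\n";   # prints: 01 32 3e 02 30 30 03 00
```

The empty fields are harmless here because splitting an empty string yields an empty list, so they contribute nothing to the joined result.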

    Cheers,

    JohnGG

      Many thanks johngg!

      Although you recommend the loop way for something like this, I prefer the map/grep way. It's like a FIFO queue, so you can easily see how the original data is transformed into the data I want.