in reply to Parsing log file with regular-expression match nested brackets [RESOLVED]

Finally I came up with a solution that works for me.

I do not know if it will help anyone in the future, but I always post my solutions in case someone can benefit from them.

Based on experimentation, I observed that this regex produces the following:

my @matches = $pdus =~ /[^{}]+ | \{ (?: (?R) | [^{}]+ )+ \} /gx;

__END__
$VAR1 = [
          '[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: UDT',
          '{
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line2: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line3: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line4: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line5: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line6: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line7: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line8: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line9: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line10 data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line11: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line12: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line13: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line14: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line15: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line16: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line17: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line18: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line19: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line 20 {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line21 data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line22 data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line23 data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line24: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line25: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line26 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line27: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line29 data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line30: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line32: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line33: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line34: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line35: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line36: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line38: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line39: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line40: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line42 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line43: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line44: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line45: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line46: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line47: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line48: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line49: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line50: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line51: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line52: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line53: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx:
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line55: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line56: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line57: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line58: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line59: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line60 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line61: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line63: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line65 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line66 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line67: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line69: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line70: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line72 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line73: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line74: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line75: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line76: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line77: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line78: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line79: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line80: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line81: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line82: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx:
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line84: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line85: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line86: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line87: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line88: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line89: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line90: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line91: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line92: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line93: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line94: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line95: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line98: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line99: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line100: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line104: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line105: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line106: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line107: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line110: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }'
        ];

And this regex produces:

my @matches = $pdus =~ /\{ (?: (?R) | [^{}]+ )+ \}/gx;

__END__
$VAR1 = [
          '{
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line2: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line3: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line4: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line5: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line6: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line7: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line8: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line9: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line10 data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line11: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line12: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line13: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line14: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line15: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line16: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line17: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line18: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line19: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line 20 {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line21 data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line22 data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line23 data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line24: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line25: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line26 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line27: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line29 data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line30: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line32: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line33: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line34: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line35: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line36: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line38: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line39: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line40: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line42 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line43: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line44: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line45: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line46: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line47: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line48: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line49: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line50: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line51: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line52: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line53: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx:
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line55: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line56: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line57: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line58: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line59: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line60 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line61: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line63: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line65 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line66 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line67: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line69: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line70: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line72 = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line73: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line74: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line75: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line76: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line77: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line78: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line79: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line80: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line81: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line82: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx:
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line84: data = {
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line85: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line86: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line87: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line88: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line89: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line90: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line91: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line92: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line93: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line94: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line95: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line98: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line99: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line100: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line104: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line105: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line106: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line107: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: line110: data
[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: }'
        ];

The difference between the two regexes is that the first splits the string into pieces, returning the text before each outermost curly bracket as well as the bracketed block itself, while the second returns only the bracketed blocks and ignores everything outside them.
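To make the difference easier to see, here is a minimal sketch that runs both patterns on a short string. The sample text and the variable names ($sample, @with_prefix, @braces_only) are made up for illustration and are not taken from the real log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample, much shorter than the real log.
my $sample = "head A { x { y } z } head B { w }";

# First pattern: also returns the text outside the outermost braces.
my @with_prefix = $sample =~ /[^{}]+ | \{ (?: (?R) | [^{}]+ )+ \} /gx;

# Second pattern: returns only the balanced brace groups.
my @braces_only = $sample =~ /\{ (?: (?R) | [^{}]+ )+ \}/gx;

print scalar(@with_prefix), " pieces with the first pattern\n";
print "  [$_]\n" for @with_prefix;
print scalar(@braces_only), " pieces with the second pattern\n";
print "  [$_]\n" for @braces_only;
```

With this sample the first pattern yields four pieces (two headers and two blocks), and the second yields only the two blocks. The nested "{ y }" stays inside its parent block in both cases, because (?R) recurses into the whole pattern.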

Thanks to poj for his interesting idea of using the part of the string before the curly bracket as a key; it was really useful for me. The timestamps in that part of the string let me tell the hash keys apart and sort them.
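As a rough sketch of that idea, assuming every header is followed by exactly one block: the pieces are paired into a hash and the keys are sorted by their timestamp. The helper stamp(), the hash name %block_for, and the shortened sample are hypothetical. A plain sort compares the keys as strings, so the sketch rearranges the day/month/year date into a sortable form first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened input: two header lines, each followed by a block.
my $sample = join "\n",
    '[ 12] 25/2/2017-19:02:06.996 proces_name thanos-Tx: UDT',
    '{ line2: data }',
    '[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: UDT',
    '{ line2: data { line3: data } }';

my @matches = $sample =~ /[^{}]+ | \{ (?: (?R) | [^{}]+ )+ \} /gx;
s/^\s+|\s+$//g for @matches;

# Whitespace-only pieces become empty after trimming; drop them.
@matches = grep { length } @matches;

# The list must alternate header, block, header, block ...
die "Unpaired header or block\n" if @matches % 2;
my %block_for = @matches;

# Turn "D/M/YYYY-HH:MM:SS.mmm" into a zero-padded, sortable string.
sub stamp {
    my ($key) = @_;
    my ( $d, $m, $y, $time ) = $key =~ m{(\d+)/(\d+)/(\d+)-([\d:.]+)}
        or return '';
    return sprintf '%04d%02d%02d-%s', $y, $m, $d, $time;
}

my @sorted = sort { stamp($a) cmp stamp($b) } keys %block_for;
print "$_\n" for @sorted;
```

The check on the number of pieces matters because assigning a list with an odd number of elements to a hash only produces a warning, and the last key would silently get an undefined value.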

So the final solution to my problem is:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

sub fileParse {
    my ( @files ) = @_;
    my $refHash = {};

    open(my $fh, '<' , $files[0])
        or die "Could not open file '$files[0]' $!";
    my $pdus = do { local $/; <$fh> };
    close $fh
        or die "Could not close file '$files[0]' $!";
    chomp( $pdus );

    # my @matches = $pdus =~ /\{ (?: (?R) | [^{}]+ )+ \}/gx;
    my @matches = $pdus =~ /[^{}]+ | \{ (?: (?R) | [^{}]+ )+ \} /gx;

    # Trim leading and trailing white spaces
    s/^\s+|\s+$//g for (@matches);

    %{$refHash} = @matches;
    return $refHash;
}

my $final = fileParse( @ARGV );
my @sorted = ();
push @sorted, $_ for (sort keys %{$final});
print Dumper \@sorted;

__END__
$VAR1 = [
          '[ 11] 25/2/2017-19:02:06.980 proces_name thanos-Rx: UDT',
          '[ 12] 25/2/2017-19:02:06.996 proces_name thanos-Tx: UDT',
          '[ 13] 25/2/2017-19:02:07.185 proces_name thanos-Rx: UDT',
          '[ 14] 25/2/2017-19:02:07.227 proces_name thanos-Tx: UDT',
          '[ 17] 25/2/2017-19:02:07.413 proces_name thanos-Tx: UDT',
          '[ 18] 25/2/2017-19:02:09.794 proces_name thanos-Rx: UDT'
        ];

The final output comes from a test file with multiple entries similar to the one I posted in the question. Because the output is so long I am not posting the whole file, but it can be replicated by copying and pasting the same entry several times and altering the timestamps slightly to differentiate them.

I hope this helps someone else in the future.

Seeking for Perl wisdom...on the process of learning...not there...yet!