Extract a small part of a long sentence using regular expressions

swatzz has asked for the wisdom of the Perl Monks concerning the following question:

Re: Extract a small part of a long sentence using regular expressions by toolic (Bishop) on Dec 02, 2014 at 13:26 UTC
Your regex doesn't work because you are looking for a number at the beginning of the line, but your line starts with a "[" character. You could grab all the contents between the parens, then split on comma:

```perl
use warnings;
use strict;
use Data::Dumper;

my @actionData;
while (my $line = <DATA>) {
    # capture everything between the literal parentheses after "action"
    if ( $line =~ /action\(([^)]+)\)/ ) {
        # keep only the non-zero values
        my @nums = grep { $_ != 0 } split /,/, $1;
        push @actionData, @nums;
    }
}
print Dumper(\@actionData);

__DATA__
[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)
```

outputs:

```
$VAR1 = [
          62,
          1,
          5,
          53,
          9,
          190
        ];
```
Re^2: Extract a small part of a long sentence using regular expressions by swatzz (Novice) on Dec 02, 2014 at 13:34 UTC
Oh, and more importantly, thanks for the help with regular expressions. I hope someday I can figure this thing out right!
Re^2: Extract a small part of a long sentence using regular expressions by swatzz (Novice) on Dec 02, 2014 at 13:32 UTC
Thank you toolic! Using grep never occurred to me!! Cheers!
Re: Extract a small part of a long sentence using regular expressions by choroba (Cardinal) on Dec 02, 2014 at 13:33 UTC
You can use a slightly more sophisticated regex to capture the numbers and commas in the parentheses. Then split them on commas, grep for non-zeroes, and use a hash slice to populate the key-value pairs in one step:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

while (<DATA>) {
    next unless /action \( ([0-9,]+ ) \) /x;
    my @args = grep $_, split /,/, $1;
    my %hash;
    # hash slice: assign as many keys from 'a' .. 'z' as there are values
    @hash{ ('a' .. 'z')[0 .. $#args] } = @args;
    print Dumper \%hash;
}

__DATA__
blah blah
[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)
```

Output:

```
$VAR1 = {
          'e' => '9',
          'c' => '5',
          'a' => '62',
          'b' => '1',
          'd' => '53',
          'f' => '190'
        };
```

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Extract a small part of a long sentence using regular expressions by Anonymous Monk on Dec 02, 2014 at 16:06 UTC
Thank you! I tried your method but somehow two things go wrong. 1. I am unable to sort the keys in the right order. 2. I keep getting two hash dumps, both containing the same key-value pairs, only in different orders!
Re^3: Extract a small part of a long sentence using regular expressions by choroba (Cardinal) on Dec 02, 2014 at 16:15 UTC
Hash keys are not stored in any particular order. Check `$Data::Dumper::Sortkeys` in Data::Dumper. Your data probably contains two matching lines. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
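A minimal sketch of the `Sortkeys` suggestion. The hash contents are copied in by hand from the output shown earlier in the thread rather than parsed, and the variable names `$dump` and `@ordered` are only illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

# Ask Data::Dumper to emit hash keys in sorted order
# instead of Perl's internal (unpredictable) order.
$Data::Dumper::Sortkeys = 1;

# Hand-copied sample data, not parsed from a log line.
my %hash = ( a => 62, b => 1, c => 5, d => 53, e => 9, f => 190 );

my $dump = Dumper(\%hash);
print $dump;

# In your own loops, sort the keys explicitly.
my @ordered = sort keys %hash;
print "$_ => $hash{$_}\n" for @ordered;
```

Sorting only affects how the hash is displayed or traversed; the hash itself stays unordered.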
Re^4: Extract a small part of a long sentence using regular expressions by swatzz (Novice) on Dec 02, 2014 at 16:36 UTC
Re^2: Extract a small part of a long sentence using regular expressions by fionbarr (Friar) on Dec 02, 2014 at 21:01 UTC
Hi... elegant solution, but please: where does the array figure in the hash? I see the `%hash` declaration, but how is the array sigil used?
Re^3: Extract a small part of a long sentence using regular expressions by choroba (Cardinal) on Dec 02, 2014 at 21:30 UTC
It's called a hash slice. See Slices in perldata. The @ sigil just means plural, like -s in English; it doesn't necessarily mean "array". لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^3: Extract a small part of a long sentence using regular expressions by Anonymous Monk on Dec 02, 2014 at 21:38 UTC
```perl
{ 'hashes', 'are', 'curly', 'ones' }
[ 'arrays', 'are', 'square' ]
( 'lists are round' )

my %foo;
@foo{ 'hash', 'slices' } = ( 'use', 'curly braces' );

my @bar;
@bar[ 1,2,3,4 ] = ( 'array', 'slices', 'use', 'square brackets' );
```

See also References quick reference
Re: Extract a small part of a long sentence using regular expressions by QM (Parson) on Dec 02, 2014 at 13:34 UTC
As long as the target is a parenthesized list of integers, this will grab the list:

Update: Fixed the regex to capture correctly by putting parens around the list inside the literal parens, and ignoring captures on the internal group.

```perl
# First, just grab the list
if (my ($list) = $line =~ /\((\d+(?:,\d+)*)\)/) {
    # split the list by commas, assuming no whitespace
    my @list = split ',', $list;

    # initialise the magic alpha incrementer key
    my $key = 'a';
    my %hash;
    for my $value (@list) {
        next unless $value;
        $hash{$key} = $value;
        # increment magically
        ++$key;
    }
    do_something_with(%hash);
}
```

Then the question is whether you need to do something with `%hash` for each line, or accumulate these across the whole file. If it's file level, move the `my %hash;` to before the `if`, and the `do_something_with(%hash)` after the `if` block.

Also, `do_something_with(%hash)` might be better as a hash reference:

```perl
do_something_with(\%hash);
```

-QM -- Quantum Mechanics: The dreams stuff is made of
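A rough sketch of the file-level variant described above. It makes two assumptions the post does not spell out: the `$key` counter is kept outside the loop along with `%hash`, and `do_something_with` is a hypothetical stand-in that just formats the hash. The two input lines are made up in the shape of the sample log line.

```perl
use strict;
use warnings;

# Hypothetical stand-in for whatever processing is needed;
# it takes a hash reference and returns a printable summary.
sub do_something_with {
    my ($hash_ref) = @_;
    return join ' ', map { "$_=$hash_ref->{$_}" } sort keys %$hash_ref;
}

# Made-up lines in the same shape as the sample in the thread.
my @lines = (
    '[AHB_REPORTER][INFO]: action(62,1,0,0)',
    '[AHB_REPORTER][INFO]: action(0,5,53)',
);

# File-level accumulation: %hash and $key are declared once, outside the loop.
my %hash;
my $key = 'a';
for my $line (@lines) {
    # list assignment yields 0 in scalar context when nothing matched
    my ($list) = $line =~ /\((\d+(?:,\d+)*)\)/ or next;
    for my $value (split /,/, $list) {
        next unless $value;          # skip zeroes
        $hash{ $key++ } = $value;    # magic string increment: a, b, c, ...
    }
}

my $summary = do_something_with(\%hash);
print "$summary\n";
```

Passing `\%hash` keeps the hash intact as one argument instead of flattening it into a list of keys and values.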
Re^2: Extract a small part of a long sentence using regular expressions by AnomalousMonk (Archbishop) on Dec 02, 2014 at 17:40 UTC
`if (my $list = $line =~ /\(\d+(,\d+)*\)/) { ... }`

The problem with this is that it only captures the match success status in the `$list` scalar:

```
c:\@Work\Perl>perl -wMstrict -le
"my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';
 if (my $list = $line =~ /\(\d+(,\d+)*\)/) {
   print qq{'$list'};
   }
"
'1'
```

Because of the way something like `(,\d+)*` works, changing `$list` to an array `@list` isn't much better:

```
c:\@Work\Perl>perl -wMstrict -le
"my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';
 if (my @list = $line =~ /\(\d+(,\d+)*\)/) {
   print qq{(@list)};
   }
"
(,190)
```

(This works the same with or without a `/g` modifier on the `m//` match.)

To extract all digit groups, you could do something like:

```
c:\@Work\Perl>perl -wMstrict -MData::Dump; -le
"my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';
 if (my @list = $line =~ m{ (?: \G , | action\( ) (\d+) }xmsg) {
   printf qq{'$_' } for @list;
   print '';
   my %hash = do { my $k = 'a'; map { $_ ? ($k++ => $_) : () } @list };
   dd \%hash;
   }
"
'62' '1' '0' '0' '0' '0' '5' '53' '9' '0' '190'
{ a => 62, b => 1, c => 5, d => 53, e => 9, f => 190 }
```

(Add `\s` whitespace flavoring to taste.) (Update: The `\G ,` pattern assumes that a `,` (comma) never occurs at the beginning of `$line`.)

Update: If you want to get a bit fancy, do it all in one swell foop and then just test if the hash has anything in it:

```
c:\@Work\Perl>perl -wMstrict -MData::Dump -le
"my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';
 ;;
 my %hash = do {
   my $k = 'a';
   map { $_ ? ($k++ => $_) : () }
   $line =~ m{ (?: \G , | action [(] ) \K \d+ }xmsg;
   };
 ;;
 if (%hash) {
   dd \%hash;
   }
 else {
   print 'no got';
   }
"
{ a => 62, b => 1, c => 5, d => 53, e => 9, f => 190 }
```

(The `\K` regex operator comes with Perl versions 5.10+. If your version pre-dates 5.10, let me know and I'll supply a simple fix.)
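The context issue described in this reply can be seen in isolation with a shortened, made-up input string. This is only a sketch; the variable names are illustrative and not taken from the thread.

```perl
use strict;
use warnings;

my $line = 'action(62,1,0)';

# Scalar context: the match returns only success (1) or failure ('').
my $status = $line =~ /\((\d+)/;

# List context: the parentheses around $first make the match
# return its capture groups instead.
my ($first) = $line =~ /\((\d+)/;

# A quantified capture group keeps only its last repetition.
my ($last) = $line =~ /\(\d+(,\d+)*\)/;

# With /g in list context, every match of the group is returned.
my @all = $line =~ /(\d+)/g;

print "status=$status first=$first last=$last all=@all\n";
```

This prints `status=1 first=62 last=,0 all=62 1 0`, which mirrors the `'1'` and `(,190)` results shown above.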
Re^3: Extract a small part of a long sentence using regular expressions by swatzz (Novice) on Dec 03, 2014 at 07:45 UTC
This worked like a charm! Thank you. I have now learnt what a 'named backreference' is and what it can do and also how magical the incrementer `my $k = 'a';` can be!
Re^2: Extract a small part of a long sentence using regular expressions by Anonymous Monk on Dec 02, 2014 at 16:00 UTC
Thank you! The fact that one small thing in Perl can be figured out in so many different ways gives me the creeps! This is a very interesting approach and I must admit I had not thought of this... Just one question though: this magic alpha incrementer, I don't get it. Is it like a normal counter where we say `$count = 1;` and then increment it, or is this something different??
Re^3: Extract a small part of a long sentence using regular expressions by Myrddin Wyllt (Hermit) on Dec 02, 2014 at 23:22 UTC
It is like a normal incrementer but it works on string variables, which is the 'magical' part. The variable must have been used only in string context since it was set, must match the pattern `/^[a-zA-Z]*[0-9]*$/`, and must not be the null string. It's pretty much designed for cases like this. If you have more than 26 keys, it will go from 'z' to 'aa' and so on. The autodecrement operator (--) ISN'T magical, and I don't think the incrementer works on Unicode, but it's still pretty cool.
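A small sketch of the behaviour described above, using a few starting strings chosen for illustration. The last part shows that decrementing a string just treats it as a number.

```perl
use strict;
use warnings;

# Each value is incremented as a string because it has only been
# used in string context and matches /^[a-zA-Z]*[0-9]*\z/.
my @results;
for my $start (qw(a z Az zz a9)) {
    my $s = $start;    # copy, so the original stays available for the report
    $s++;
    push @results, "$start becomes $s";
}
print "$_\n" for @results;

# Autodecrement is not magical: the string is evaluated as a number (0).
my $d = 'b';
{
    no warnings 'numeric';    # 'b' is not numeric; silence the warning
    $d--;
}
print "b-- gives $d\n";
```

The increments give `b`, `aa`, `Ba`, `aaa` and `b0`, while the decrement gives `-1`.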
Re: Extract a small part of a long sentence using regular expressions by karlgoethebier (Abbot) on Dec 02, 2014 at 15:07 UTC
Eclectic TIMTOWTDI:

```perl
use Data::Dump;
use strict;
use warnings;

my $line = qq([AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null));

my $key = q(a);

my %hash = map { $key++ => $_ } grep { $_ != 0 } &{
    sub {
        $line =~ /action\(([^)]+)\)/;
        split /,/, $1;
    }
};

dd \%hash;

__END__
{ a => 62, b => 1, c => 5, d => 53, e => 9, f => 190 }
```

Regards, Karl

«The Crux of the Biscuit is the Apostrophe»
Re^2: Extract a small part of a long sentence using regular expressions by Anonymous Monk on Dec 02, 2014 at 15:53 UTC
Thanks Karl! Might be a dumb question (with Perl I am always in the figuring-out stage!!) but what does `&{ sub { blah } }` do exactly??
Re^3: Extract a small part of a long sentence using regular expressions by choroba (Cardinal) on Dec 02, 2014 at 16:02 UTC
It's a dereference. I'd rather (if ever) write it as:

```perl
my %hash = map { $key++ => $_ } grep { $_ != 0 } sub {
    $line =~ /action\(([^)]+)\)/;
    split /,/, $1;
}->();
```

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
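To make the "dereference" remark concrete, here is a sketch that stores the anonymous sub in a variable first and then calls it three ways. The name `$extract` and the shortened input are illustrative assumptions, not code from the thread.

```perl
use strict;
use warnings;

my $line = 'action(62,1,0,5)';

# An anonymous sub is a code reference stored in a scalar.
my $extract = sub {
    my ($text) = @_;
    my ($list) = $text =~ /action\(([^)]+)\)/ or return;
    return split /,/, $list;
};

# Three equivalent ways to call (dereference) the code reference.
my @via_arrow     = $extract->($line);
my @via_ampersand = &$extract($line);
my @via_block     = &{ $extract }($line);

print "@via_arrow\n";
```

Note that calling `&$extract` or `&{ ... }` with no parentheses at all passes the caller's current `@_` through, which is one reason the arrow form is often preferred.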
Re^4: Extract a small part of a long sentence using regular expressions by karlgoethebier (Abbot) on Dec 02, 2014 at 17:19 UTC
Re^5: Extract a small part of a long sentence using regular expressions by choroba (Cardinal) on Dec 02, 2014 at 17:44 UTC
Re^4: Extract a small part of a long sentence using regular expressions by Anonymous Monk on Dec 02, 2014 at 16:07 UTC
Re^3: Extract a small part of a long sentence using regular expressions by karlgoethebier (Abbot) on Dec 02, 2014 at 17:28 UTC
"Might be a dumb question..." Sorry - my fault. I should have mentioned this. Please see perlsub as well as anonymous functions. Regards, Karl P.S.: There are no dumb questions. Just dumb answers ;-) «The Crux of the Biscuit is the Apostrophe»	[reply]