swatzz has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I am not exactly new to Perl and have been writing programs quite often, but I am ever so often stumped by regular expressions in Perl. I would really appreciate it if someone could give me some hints regarding the following problem.

I have a text file to parse for only one match case, which occurs multiple times in the file. The match case is a long sentence (string!), but I need to extract only a part of it as output.

Here is my code:

open FILE, "<", "XYZ.txt" or die $!;
my @actionData = ();
my $i = 0;
while (my $line = <FILE>) {
    if ( $line =~ /action/ ) {
        push(@actionData, (split (/^\d+/, $line)));
        $i++;
    }
}
The $line (the long sentence!) is:
[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)
I need to extract just (62,1,0,0,0,0,5,53,9,0,190) and store all the non-zero numbers as key-value pairs:
a => 62 b => 1 c => 5 d => 53
and so on... Any hints would be really welcome! And thank you all in advance.

PS: I have been using PerlMonks for over a year now and it has really helped me with lots of doubts that needed clarification, but this is my first post here! I hope I have framed the question understandably enough and in the right format!!

Re: Extract a small part of a long sentence using regular expressions
by toolic (Bishop) on Dec 02, 2014 at 13:26 UTC
    Your regex doesn't work because you are looking for a number at the beginning of the line, but your line starts with a "[" character. You could grab all the contents between the parens, then split on comma:
    use warnings;
    use strict;
    use Data::Dumper;

    my @actionData;
    while (my $line = <DATA>) {
        if ( $line =~ /action\(([^)]+)\)/ ) {
            my @nums = grep { $_ != 0 } split /,/, $1;
            push @actionData, @nums;
        }
    }
    print Dumper(\@actionData);

    __DATA__
    [AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)

    outputs:

    $VAR1 = [ 62, 1, 5, 53, 9, 190 ];
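The reason the original split did nothing can be checked directly. The following is a minimal sketch, assuming the sample line from the question; it only shows that `/^\d+/` never matches a line beginning with "[", so `split` hands back the whole line as a single field.

```perl
use strict;
use warnings;

my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';

# /^\d+/ needs digits at the very start of the string. This line starts
# with "[", so the pattern never matches and split returns one field:
# the unchanged line.
my @fields = split /^\d+/, $line;

print scalar(@fields), " field(s)\n";
print "field is the whole line\n" if $fields[0] eq $line;
```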
      Thank you toolic! Using grep never occurred to me!!
      Oh, more importantly, thanks for the help on regular expressions. I hope someday I can figure this thing out right!

      Cheers!

Re: Extract a small part of a long sentence using regular expressions
by choroba (Cardinal) on Dec 02, 2014 at 13:33 UTC
    You can use a bit more sophisticated regex to capture the numbers and commas in the parentheses. Then split them on commas, grep for non-zeroes, and use a hash slice to populate the key-value pairs in one step:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    while (<DATA>) {
        next unless /action \( ([0-9,]+ ) \) /x;
        my @args = grep $_, split /,/, $1;
        my %hash;
        @hash{ ('a' .. 'z')[0 .. $#args] } = @args;
        print Dumper \%hash;
    }

    __DATA__
    blah blah
    [AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)

    Output:

    $VAR1 = { 'e' => '9', 'c' => '5', 'a' => '62', 'b' => '1', 'd' => '53', 'f' => '190' };
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thank you! I tried your method but somehow 2 things go wrong.

      1. I am unable to sort the keys in the right order.

      2. I keep getting 2 hash dumpers, both containing the same key-value pairs only in different orders!

        1. Hash keys are not sorted. Check $Data::Dumper::Sortkeys in Data::Dumper.
        2. Your data probably contain two matching lines.
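A minimal sketch of the first point, assuming the hash from the output above: setting `$Data::Dumper::Sortkeys` to a true value makes `Dumper` emit the keys in sorted order. The hash literal here is typed in by hand for illustration rather than parsed from the file.

```perl
use strict;
use warnings;
use Data::Dumper;

# Ask Data::Dumper to emit hash keys in sorted order.
$Data::Dumper::Sortkeys = 1;

# Hand-written copy of the hash from the output above (illustration only).
my %hash = ( e => 9, c => 5, a => 62, b => 1, d => 53, f => 190 );

my $dump = Dumper(\%hash);
print $dump;
```

Without the setting, the key order is whatever Perl's hash implementation happens to produce, which can differ from run to run.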
      Hi... elegant solution, but please: where does the array come into the hash? I see the %hash declaration, but how is the array sigil used?
        It's called a hash slice. See Slices in perldata. The @ sigil just means plural, like -s in English; it doesn't necessarily mean "array".

        { 'hashes', 'are', 'curly', 'ones' }

        [ 'arrays', 'are', 'square' ]

        ( 'lists are round' )

        my %foo; @foo{ 'hash', 'slices' } = ( 'use', 'curly braces' );

        my @bar; @bar[ 1,2,3,4 ] = ( 'array', 'slices', 'use', 'square brackets' );

        See also References quick reference

Re: Extract a small part of a long sentence using regular expressions
by QM (Parson) on Dec 02, 2014 at 13:34 UTC
    As long as the target is a parenthesized list of integers, this will grab the list:

    Update: Fixed the regex to capture correctly by putting parens around the list inside the literal parens, and ignoring captures on the internal group.

    # First, just grab the list
    if (my ($list) = $line =~ /\((\d+(?:,\d+)*)\)/) {
        # split the list by commas, assuming no whitespace
        my @list = split ',', $list;
        # initialise the magic alpha incrementer key
        my $key = 'a';
        my %hash;
        for my $value (@list) {
            next unless $value;
            $hash{$key} = $value;
            # increment magically
            ++$key;
        }
        do_something_with(%hash);
    }

    Then the question is whether you need to do something with %hash for each line, or accumulate these across the whole file. If it's file level, move the my %hash; to before the if, and the do_something_with(%hash) after the if block.
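A rough sketch of the file-level variant, under two assumptions that are not in the post above: the lines come from a hypothetical in-memory array instead of a file, and `$key` is moved outside the loop together with `%hash`, so that keys from later lines continue the sequence instead of overwriting earlier ones.

```perl
use strict;
use warnings;

# Hypothetical sample input: two matching lines and one that does not match.
my @lines = (
    '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/a.c',
    'some unrelated line',
    '[AHB_REPORTER][INFO]: action(0,7,0,3)D:/XYZ/b.c',
);

my %hash;        # declared once, before the loop
my $key = 'a';   # also outside, so later lines continue at the next letter

for my $line (@lines) {
    if (my ($list) = $line =~ /\((\d+(?:,\d+)*)\)/) {
        for my $value (split /,/, $list) {
            next unless $value;    # skip zeros
            $hash{$key} = $value;
            ++$key;
        }
    }
}

# The "do something" step now runs once, after all lines are processed.
print "$_ => $hash{$_}\n" for sort keys %hash;
```

If `$key` stayed inside the `if` block, every matching line would start again at 'a' and replace the values stored for the previous line.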

    Also, do_something_with(%hash) might be better as a hash reference:

    do_something_with(\%hash);

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      if (my $list = $line =~ /\(\d+(,\d+)*\)/) { ... }

      The problem with this is that it only captures the match success status in the $list scalar:

      c:\@Work\Perl>perl -wMstrict -le
      "my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';
       if (my $list = $line =~ /\(\d+(,\d+)*\)/) {
         print qq{'$list'};
       }
      "
      '1'
      Because a repeated group like (,\d+)* keeps only what its last iteration matched, changing $list to an array @list isn't much better:
      c:\@Work\Perl>perl -wMstrict -le
      "my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';
       if (my @list = $line =~ /\(\d+(,\d+)*\)/) {
         print qq{(@list)};
       }
      "
      (,190)
      (This works the same with or without a /g modifier on the m// match.)

      To extract all digit groups, you could do something like:

      c:\@Work\Perl>perl -wMstrict -MData::Dump -le
      "my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';
       if (my @list = $line =~ m{ (?: \G , | action\( ) (\d+) }xmsg) {
         printf qq{'$_' } for @list;
         print '';
         my %hash = do { my $k = 'a'; map { $_ ? ($k++ => $_) : () } @list };
         dd \%hash;
       }
      "
      '62' '1' '0' '0' '0' '0' '5' '53' '9' '0' '190'
      { a => 62, b => 1, c => 5, d => 53, e => 9, f => 190 }
      (Add \s* whitespace flavoring to taste.) (Update: The \G , pattern assumes that a , (comma) never occurs at the beginning of $line.)

      Update: If you want to get a bit fancy, do it all in one swell foop and then just test if the hash has anything in it:

      c:\@Work\Perl>perl -wMstrict -MData::Dump -le
      "my $line = '[AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null)';
       ;;
       my %hash = do {
         my $k = 'a';
         map { $_ ? ($k++ => $_) : () }
         $line =~ m{ (?: \G , | action [(] ) \K \d+ }xmsg;
       };
       ;;
       if (%hash) { dd \%hash; }
       else { print 'no got'; }
      "
      { a => 62, b => 1, c => 5, d => 53, e => 9, f => 190 }
      (The \K regex operator comes with Perl versions 5.10+. If your version pre-dates 5.10, let me know and I'll supply a simple fix.)

        This worked like a charm! Thank you. I have now learnt what a 'named backreference' is and what it can do, and also how magical the incrementer my $k = 'a'; can be!
      Thank you! The fact that one small thing in Perl can be figured out in so many different ways gives me the creeps! This is a very interesting approach and I must admit I had not thought of this...

      Just one question though: this magic alpha incrementer, I don't get it. Is it like a normal counter where we say $count = 1; and then increment it, or is this something different??

        It is like a normal incrementer but it works on string variables, which is the 'magical' part. The variable must have been used only in string contexts since it was set, and must match the pattern:

        /^[a-zA-Z]*[0-9]*$/

        and not be the null string. It's pretty much designed for cases like this. If you have more than 26 keys, it will go from 'z' to 'aa' and so on. The autodecrement operator (--) ISN'T magical, and I don't think the incrementer works on Unicode, but it's still pretty cool.
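A small sketch of the behaviour described above, using only core Perl. The starting values are arbitrary examples; each is copied into a fresh variable so that it has only ever been used as a string before the increment.

```perl
use strict;
use warnings;

my %after;
for my $start ('a', 'z', 'Az', 'zz', 'a9') {
    my $k = $start;    # fresh copy, never used in numeric context
    $k++;              # magical string increment with carry
    $after{$start} = $k;
    print "$start -> $k\n";
}

# The decrement operator is not magical: the string is treated as a
# number (0 here), so the result is -1 rather than 'a'.
my $d = 'b';
{
    no warnings 'numeric';    # silence the "isn't numeric" warning
    $d--;
}
print "b-- gives $d\n";
```

For the question's use case this means that a 27th non-zero value would simply get the key 'aa'.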

Re: Extract a small part of a long sentence using regular expressions
by karlgoethebier (Abbot) on Dec 02, 2014 at 15:07 UTC

    Eclectic TIMTOWTDI:

    use Data::Dump;
    use strict;
    use warnings;

    my $line = qq([AHB_REPORTER][INFO]: action(62,1,0,0,0,0,5,53,9,0,190)D:/XYZ/reg/Tests/Mcu/A_test.cCALL: (null));
    my $key = q(a);
    my %hash = map { $key++ => $_ } grep { $_ != 0 } &{
        sub {
            $line =~ /action\(([^)]+)\)/;
            split /,/, $1;
        }
    };
    dd \%hash;

    __END__
    { a => 62, b => 1, c => 5, d => 53, e => 9, f => 190 }

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Thanks Karl! Might be a dumb question (with Perl I am always in the figuring-out stage!!) but what does &{ sub { blah } } do exactly??
        It's a dereference, I'd rather (if ever) write it as
        my %hash = map { $key++ => $_ } grep { $_ != 0 } sub {
            $line =~ /action\(([^)]+)\)/;
            split /,/, $1;
        }->();
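To make the dereference less mysterious, here is a minimal sketch that stores the anonymous sub in a variable first and then calls it in three equivalent ways. The variable names and the shortened sample line are made up for illustration, and the sub takes the line as an argument instead of closing over `$line`.

```perl
use strict;
use warnings;

my $line = 'action(62,1,0,0,0,0,5,53,9,0,190)';

# An anonymous sub is just a code reference stored in a scalar.
my $code = sub {
    my ($text) = @_;
    return $text =~ /action\(([^)]+)\)/ ? split(/,/, $1) : ();
};

my @via_block = &{ $code }($line);   # block dereference, then call
my @via_amp   = &$code($line);       # the same without the braces
my @via_arrow = $code->($line);      # arrow syntax

print "@via_arrow\n";
```

In the replies above, the `sub { ... }` expression takes the place of `$code`, so the sub is created and called in a single expression.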
        "Might be a dumb question..."

        Sorry - my fault. I should have mentioned this. Please see perlsub as well as anonymous functions.

        Regards, Karl

        P.S.: There are no dumb questions. Just dumb answers ;-)
