PerlJam2015 has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to start off by saying this is literally my first post on this website, so if there are some conventions I'm not following that are immediately and clearly irritating you, kindly inform me so that I might edit this post to follow the "monastery's" standards.

I'm learning Perl on my own with Simon Cozens's free version of "Beginning Perl", available online. It's pretty dated, and truth be told I've already begun to get a little frustrated with it, since one of the exercises in chapter 3 required knowledge of the chomp function before he introduced it.

It's my first language, and I'm learning it over my winter break in the short period before classes resume at my college. I'm a chemical engineering major with an interest in de novo protein design, which will require a data pipeline involving at least three languages, Perl or Python being the midway point between C and R (a CPR data pipeline, go figure). Frankly, I have fallen in love with the language's ways of invoking variables and especially the regular expressions (I can feel the power!), but I'm in a rut.

I'm not asking anyone to do my homework for me. There is no instructor. I'm just genuinely confused to the point of complete and utter stagnation. I've already spent hours looking at the documentation (it mentions grep, which confused me even more) and a few posts from 2005. (One had a solution involving a subroutine, which normally I'd be ready to jump right into and use, but I'd like to figure out how to do this with an until loop or something.)

So getting right into it... (TLDR)

Earlier in the chapter it notes that when using a Regular Expression with grouping, the sucker "eats up" hits and stores them in incrementally changing variables of $1, $2... Immediately it showed an example like the following

EDIT: Edited parts in bold
use warnings;
use strict;
$_ = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';
my $pattern = <STDIN>;
if (/$pattern/){

followed by a

    print "words $pattern\n";
    print "\$1 morewords\n";
    print "\$2 morewords\n";
    print "\$seewhereimgoing\n";
}

( https://drive.google.com/viewerng/viewer?url=http://blob.perl.org/books/beginning-perl/3145_Chap05.pdf for those interested) Now, as soon as I saw that I realized "There's probably a way to set some $i so that I can just
if /$pattern/{ print "$$i\n"; $i++; }
EDIT For all the $n

But for the life of me, I figure I'm reading the question wrong because it seems (from the way the author describes it) that there's a list generated that has the set of all these that I could just print or refer to in order to make this process a lot less head-scratching.

"3. When we use groups, the // operator returns a list of all the text strings that have been matched. Modify our example program matchtest2.plx, so that it produces its output from this list, rather than using special variables."

Am I completely insane here? Is there a magic  $wizardextract that stores the series of  $1, $2,.. $lots that I managed to overlook?

EDIT: Specifically, if the original data being looked at is hardcoded, and your regular expression is  <STDIN> how could you write your program such that it prints out ALL the  $1, $2, $n

I appreciate the heck out of any non-subroutine tips on this, and odds are if the consensus is "lol just go to chapter 6 (subroutines) and go back" I probably will.

-Mark

UPDATE:

So now, I've got this...

$_ = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';
print "Enter a regular expression: ";
my $pattern = <STDIN>;
chomp $pattern;
if (my @matches = /$pattern/) {
    for my $i (0..@-) {
        print "$i: $matches[$i]\n";
    }
}

Thanks to choroba for the means of capturing them in an array, and Corion for the @- series!

Now, the goal is to be able to print out all the variables that match all the groupings in a regular expression. For the example I'll be using ([a-z]+?)(.*?)([a-z]+?). Right now the output is...

Enter a regular expression: ([a-z]+?)(.*?)([a-z]+?)
0: s
1:
2: i
Use of uninitialized value in concatenation (.) or string at exp3.pl line 14, <STDIN> line 1.
3:
Use of uninitialized value in concatenation (.) or string at exp3.pl line 14, <STDIN> line 1.
4:

The goal is achieved, but now I'm confused as to what the error message means and how to deal with it. I'm working on that now.

Re-update:

Thanks AnomalousMonk for the hint / warning. Now it's time for me to meditate, I suppose.

Replies are listed 'Best First'.
Re: exist backreference variable list?
by choroba (Cardinal) on Jan 09, 2015 at 11:16 UTC
    Hi PerlJam2015, welcome to the Monastery!

    You can store the matches in an array:

    $_ = 'abcd';
    if (my @matches = /(.)(.)/) {
        for my $i (0, 1) {
            print "$i: $matches[$i]\n";
        }
    }
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
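A minimal sketch of how the array approach above might generalize to any number of groups. It assumes the hard-coded sentence from the question and fixes the pattern in a variable instead of reading it from <STDIN>, so that it runs unattended; the names $text and $shown are only illustrative.

```perl
use strict;
use warnings;

# Hard-coded data from the question. The pattern would normally be read
# from <STDIN> and chomped; it is fixed here (an assumption) for the sketch.
my $text    = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';
my $pattern = '([a-z]+?)(.*?)([a-z]+?)';

# In list context a successful match returns the captured groups
# (the same values as $1, $2, ...); a failed match returns an empty list.
my @matches = $text =~ /$pattern/;

# $#matches is the last valid index, so the loop visits every capture
# and never runs past the end of the array.
for my $i (0 .. $#matches) {
    my $shown = defined $matches[$i] ? $matches[$i] : '(undef)';
    print 'group ', $i + 1, ": '$shown'\n";
}
```

One caveat: if the pattern contains no capturing parentheses at all, a successful list-context match returns the single value 1 rather than any captured text.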
Re: exist backreference variable list?
by Discipulus (Canon) on Jan 09, 2015 at 11:14 UTC
    Welcome to Perl and to the Monastery, PerlJam2015!

    First: don't waste time with a very outdated book! There are some good, recent books about Perl. Some years ago I was in the same situation as you: a first programming language and only very empirical computer experience. The problem you are experiencing is "how to assimilate the syntax", and Perl is a bit hard to approach, like an Easter lunch, but as your stomach (mind?) adapts, you get more and more hungry and you'll see a perlish way to get the job done.

    Because of this, let me suggest some different reading. There is a good book that illustrates Perl's panorama very effectively: the 'Perl Cookbook'. This may seem like nonsense, because that book is somewhat aged too. But it is perfect Perl 5, even if not so modern, and the cookbook approach, separated into meaningful chapters, was invaluable for me.

    You should also get your copy of the free Modern Perl book, or read it online. This book is a little harder than the Cookbook, but it is full of gems and explanations of concepts you don't find in other books (defaults, context, coercion...), and it is modern: that is, it explains the new features of the language. Use the two books together, and don't forget the official documentation for the function reference, tutorials, core module docs...

    Now something about your question.
    Yes, when there are grouping parens in a regular expression, Perl automatically puts each resulting part in the special variables $<digits> ($1, $2, ...), and yes, you can get all the results into an array:
    perl -e "$ARGV[0] =~ /(a)(l)(l)/; print qq($1 $2 $3) " alltogether
    #out:a l l
    perl -e "@array = $ARGV[0] =~ /(a)(l)(l)/; print qq(@array) " alltogether
    #out:a l l


    Regexes can do many, many things; they are a language inside another language. Most of the time you'll need only their basic features.

    A last tip for learning good Perl: pay attention to idioms, the proven and safe ways to do a task in Perl. Perl has a motto that states: 'There is more than one way to do it' (aka TIMTOWTDI, or Tim Toady for close friends). As you learn more of the language you will tend to write perlish, idiomatic Perl. Some idioms are illustrated here in the Tutorials section of the Monastery, others in the Modern Perl book mentioned above.

    HtH
    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: exist backreference variable list?
by ww (Archbishop) on Jan 09, 2015 at 13:15 UTC

    Jumping first to your question about the existence of a $wizardextract -- well, yes, there is: namely, the use of elements enclosed by simple parentheses, e.g. /(foo) bar (bat)/, inside the regex (more on this below).

    perl -e "my $str='foo bar bat'; if ($str =~ /(foo) bar (bat)/ ) {print \"\$1: $1; \$2: $2)\"; }"
    $1: foo; $2: bat)
    Note 1: 'Doze quoting
    Note 2: Other uses of parens exist but are outside the scope of your question.

    The parens INSIDE the regex -- that is, between the forward slashes -- tell Perl to capture the elements they enclose (when matched) to the special $n variables. The only parens I found in the first half dozen pages of the text you referenced (TL, DRIA) are those required in the if (...) clause which may be a source of your misunderstanding.

    Switching topics: you object to the author invoking chomp before introducing it. That's perhaps pedagogically sound, but it also misses (missed?) a critical lesson about learning Perl!

    You can use your command line and Perl's own documentation to understand (or at least get an introduction to) Perl functions. Here's an example:

    perldoc -f chomp
        chomp VARIABLE
        chomp( LIST )
        chomp   This safer version of "chop" removes any trailing string that
                corresponds to the current value of $/ (also known as
                $INPUT_RECORD_SEPARATOR in the "English" module). It returns the
                total number of characters removed from all its arguments. It's
                often used to remove the newline from the end of an input record
                ....

    HTH

    Update: fixed last closing code tag's missing < before the slash-c. TY, many times over, to choroba & hippo

      Thank you for your response, ww. The book in chapter 5 in fact covers how to write the regular expression to search data with grouping, and later  print it. I realize now how vague I was being in the question and have changed it to reflect that the regular expression will not be hardcoded, but be <STDIN>

      Switching topics: you object to the author invoking chomp before introducing it. That's perhaps pedagogically sound, but it also misses (missed?) a critical lesson about learning Perl!

      To be specific he requests, in chapter 3 that you

      3. Store your important phone numbers in a hash. Write a program to look up numbers by the person's name.

      I assumed values from <STDIN> would just match, and up to that point chomp isn't mentioned, nor is the fact that reading a line from <STDIN> leaves a \n at the end of every entry. I was bashing my head against the wall when the hash wasn't matching the entry, even though (so far as I could tell) it matched verbatim. It was only after poring over the nodes here for a while that I discovered the chomp function and what it did.

      That being said, I'm sort of glad it happened. It forced me to reach out and do some (?=) as you mentioned.
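The chomp pitfall described above can be sketched as follows. The phone book contents are made up for illustration, and the typed line is simulated with a string literal (an assumption) rather than read from <STDIN>, so the sketch runs without input.

```perl
use strict;
use warnings;

# Hypothetical phone book; the names and numbers are invented.
my %phone = (alice => '555-0100', bob => '555-0199');

# A line read with <STDIN> keeps its trailing newline; simulated here.
my $typed = "alice\n";

# "alice\n" is not the same hash key as "alice", so the lookup fails.
my $before = exists $phone{$typed} ? 'found' : 'not found';

chomp $typed;    # strip the trailing newline in place

# Now the key matches exactly.
my $after = exists $phone{$typed} ? 'found' : 'not found';

print "before chomp: $before\n";
print "after chomp:  $after ($phone{$typed})\n";
```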

Re: exist backreference variable list?
by Corion (Patriarch) on Jan 09, 2015 at 13:33 UTC

    Personally, I haven't encountered a situation where I needed them, but perlvar also shows @- and @+, which contain the start and end positions, respectively, of the captured patterns in the target string. You can iterate over 0..@- to enumerate them all.

Re: exist backreference variable list?
by AnomalousMonk (Archbishop) on Jan 10, 2015 at 00:26 UTC

    for my $i (0..@-) {
        print "$i: $matches[$i]\n";
    }
    ...
    Use of uninitialized value in concatenation ...
    ...
    ... I'm confused as to what the error message means and how to deal with it.

    If a regex matches against a string, the zeroth index of the  @- array always holds the overall match offset in the string, and higher indices (1, 2, ...) hold capture group match offsets, if there are any capture groups. (Note that if there is an overall match, all capture groups will be represented in  @- even if they did not match!) See  @- in perlvar. So in a regex that has two capture group matches, there are three elements in the array, and you want to iterate over the range  0 .. $#- to see all match offsets (or maybe over  1 .. $#- to see just capture group match offsets). Iterating over  0 .. @- gives you an extraneous, undefined element beyond the end of the  @- array. (Evaluating an array in scalar context yields the number of elements of the array, not the highest array index, given by $#array.)
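A small sketch of the 1 .. $#- iteration described above, assuming the string 'abcd' and the two-group pattern from earlier in the thread. It pairs @- with @+ (the end offsets, also documented in perlvar) to recover each captured substring; the names $text, $captured and @report are only illustrative.

```perl
use strict;
use warnings;

my $text = 'abcd';
my @report;

if ($text =~ /(.)(.)/) {
    # Index 0 of @- is the whole match; 1 .. $#- are the capture groups.
    # $#- is the highest valid index, unlike scalar(@-), which is one more.
    for my $i (1 .. $#-) {
        # A group that took no part in the match has an undefined offset.
        next unless defined $-[$i];

        # @+ holds the end offsets, so end minus start is the length.
        my $captured = substr $text, $-[$i], $+[$i] - $-[$i];
        push @report, "group $i at offset " . $-[$i] . ": '$captured'";
    }
}

print "$_\n" for @report;
```

For simply printing the captures, the array returned by the match in list context is usually the more direct route; the offsets matter mainly when the positions themselves are needed.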


    Give a man a fish:  <%-(-(-(-<

Re: exist backreference variable list?
by AnomalousMonk (Archbishop) on Jan 09, 2015 at 23:53 UTC

    I find regular expressions in general to be the least intuitive, most surprising computer "language" I've encountered (technically, they're a Domain-Specific Embedded Language, a DSEL) that is actually intended to be practical and useful rather than merely obscure; for an example of the latter, see Brainfuck, the source code of which looks remarkably like a traditional "line noise" regex definition.

    My favorite example of this counter-intuitiveness is the result of matching the simple regex  /(b*)/ against the string  'aaaaabbb':
        'aaaaabbb' =~ /(b*)/;
    What will be matched and captured to  $1 and where will the match occur? Knowing that matching is, by default, "greedy" and matches as much as possible, one's first thought might be as mine has often been, that it will match/capture  'bbb' at offset 5 in the string. Contemplation of the "Leftmost, Longest" rule for regex matching would seem to support this initial idea: offset 5 is the leftmost position at which the most  'b' characters are found — all of them in fact.

    A simple experiment shows we are deceived:

    c:\@Work\Perl\monks>perl -wMstrict -le "print qq{matched '$1' at offset $-[1]} if 'aaaaabbb' =~ /(b*)/;"
    matched '' at offset 0
    (The  @- array holds the offset of the start of each corresponding capture group match. See the Variables related to regular expressions section of perlvar.)
    I leave it to you, gentle PerlJam2015, to ponder why this regex actually matches an empty string (no  'b' at all) located as far from any  'b' as it could possibly be. Also consider the simplest way one might alter the regex so as to actually capture something like what we were expecting from a location near where we were expecting it.
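For readers who would like to check their answer to the puzzle above, here is one possible sketch. It compares /(b*)/ with /(b+)/; choosing + is just one way to alter the regex, not necessarily the one the author had in mind, and the %found hash is only an illustrative container.

```perl
use strict;
use warnings;

my $text = 'aaaaabbb';
my %found;

# b* is allowed to match zero characters, so the very first attempt,
# at offset 0, already succeeds with an empty capture.
if ($text =~ /(b*)/) {
    $found{star} = [ $1, $-[1] ];
}

# b+ needs at least one 'b', so the attempts at offsets 0 through 4 fail
# and the first success is at offset 5, where the greedy match takes 'bbb'.
if ($text =~ /(b+)/) {
    $found{plus} = [ $1, $-[1] ];
}

for my $name (qw(star plus)) {
    my ($capture, $offset) = @{ $found{$name} };
    print "$name: matched '$capture' at offset $offset\n";
}
```

The point is that "leftmost" wins over "longest": the engine reports the first position at which the pattern can succeed at all, and only then does greediness decide how much to take.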

    I do not show you these things in order to discourage you, but rather to steel you against the frustrations and perplexities that inevitably accompany the study and use of regular expressions.


    Give a man a fish:  <%-(-(-(-<

Re: exist backreference variable list?
by Anonymous Monk on Jan 09, 2015 at 12:09 UTC
    Hi, Mark, I frankly don't understand what you're asking. Can you please elaborate?
    Earlier in the chapter it notes that when using a Regular Expression with grouping, the sucker "eats up" hits and stores them in incrementally changing variables of $1, $2...
    It seems to me the book didn't make the way capture groups (parens) work in Perl's regexes clear? Perhaps it's just not a good book?