costas has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to find an example of a certain regular expression on the Net but can't seem to find one.

I understand that there is a method of capturing matched text using $1, $2, $3, etc.

For example, I want to capture the number "99" from this string:
$variable = "-----;=99;helloworld";

What is the regex syntax to capture the "99" part of the string and put it in $1, so that...
$1 = "99"

Re: Reg ex question
by Corion (Patriarch) on May 18, 2001 at 17:57 UTC

    Your question is a bit broad, as you don't tell us how your data is structured. To become accustomed to regular expressions, I recommend ActiveState (http://www.activestate.com) and their Komodo GUI, which has a regex explorer.

    For the sake of the node, let's assume that your data is CSV data (see Text::xSV, also on CPAN), and that you don't want (for some obscure reason) to use Text::xSV.

    Next, I'll construct a regular expression that skips n semicolon-delimited fields and then tries to match =nnn; :

    $line =~ m/(?:        # Collect heading fields
                 [^;]+    # delimited by semicolons
                 ;
               ){1}       # In your case, there is only one field before the interesting field
               =          # followed by a "="
               (\d+)      # and followed by our numbers, stuff them into $1
               ;          # and delimited by another semicolon
              /x;
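    A minimal sketch generalizing the same idea, assuming the number of leading fields to skip is known in advance. The sub name nth_field_number is hypothetical (not from any module), and unlike the pattern above it anchors at the start of the line with ^ so the count is taken from the beginning. The count is interpolated into the {n} quantifier:

```perl
use strict;
use warnings;

# Hypothetical helper: skip $skip semicolon-terminated fields from the
# start of $line, then capture the digits of a "=nnn;" field.
# Returns undef when the line does not have that shape.
sub nth_field_number {
    my ($line, $skip) = @_;
    return $line =~ m/^
                      (?: [^;]+ ; ){$skip}   # skip the leading fields
                      =                      # a literal "="
                      (\d+)                  # the digits we want (capture group 1)
                      ;                      # closing semicolon
                     /x ? $1 : undef;
}

my $n = nth_field_number("-----;=99;helloworld", 1);
print "$n\n";    # prints 99
```

    Keep variables like $1 out of the comments inside a /x pattern: the pattern is interpolated before the comments are stripped.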

    Another good approach would have been to split() the line on ";" and then munge the second element of that list afterwards.
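    A minimal sketch of the split() approach, assuming the number always sits in the second semicolon-separated field and is written as =nnn:

```perl
use strict;
use warnings;

my $variable = "-----;=99;helloworld";

# split on ";" gives ("-----", "=99", "helloworld")
my @fields = split /;/, $variable;

# The second element is "=99"; strip the leading "=" to get the number.
my ($number) = $fields[1] =~ /^=(\d+)$/;

print "$number\n";    # prints 99
```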

Re: Reg ex question
by jeroenes (Priest) on May 18, 2001 at 17:52 UTC
    Generally:
    $variable =~ m/;=(\d+);/;
    print $1;    # holds 99 now
    So whatever the parentheses match gets assigned to variables with a number as the name ($1, $2, ...). They stay available until the next successful match or substitution. You can read all about it in perlop and perlre.

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)
    Update: chromatic pointed out that it is worth remarking that $1 is only set when the string matches. I totally agree with him.
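    A small sketch of chromatic's point, using made-up strings: a failed match does not reset $1, so it still holds whatever the last successful match captured. Testing the match before reading $1 avoids picking up a stale value:

```perl
use strict;
use warnings;

my $good = "-----;=99;helloworld";
my $bad  = "no number here";

$good =~ /;=(\d+);/;                           # succeeds: $1 is now "99"
my $matched = ($bad =~ /;=(\d+);/) ? 1 : 0;    # fails ...
my $stale   = $1;                              # ... but $1 still holds "99"

# Safer: only use $1 inside a test of the match itself.
my $number;
if ($bad =~ /;=(\d+);/) {
    $number = $1;
}
print defined $number ? "got $number\n" : "no match\n";    # prints "no match"
```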

Re: Reg ex question
by davorg (Chancellor) on May 18, 2001 at 17:49 UTC

    I assume that you really want to match two digits.

    $variable = "-----;=99;helloworld";
    if ($variable =~ /(\d\d)/) {
        # $1 contains 99
    }

    The parentheses in the regex mark the parts that you want to capture.

    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

Re: Reg ex question
by tachyon (Chancellor) on May 18, 2001 at 19:02 UTC
    # you have been shown this, but I have added another 2 digits
    $variable = "-----;=99;helloworld88";
    $variable =~ m/(\d\d)/;
    print "$1\n";    # 99 -> *the first 2 digit string* in $variable will now be in $1
    # Note the match will stop at the first \d\d sequence, looking left to right

    # we can get all the \d\d sequences with this regex;
    # the key is the /g at the end, which stands for global
    print "got a $1\n" while $variable =~ m/(\d\d)/g;

    # Often we might then go on to assign $1 to a variable, i.e. my $number = $1;
    # You can do this in one step with the expression
    ($number) = $variable =~ m/(\d\d)/;
    print "\$number is $number\n";

    # you can also capture the values of $1, $2, $3 all at once into a list
    $record = "John Smith 919 909 900";
    ($first, $last, $phone) = $record =~ m/(\w+)\s+(\w+)\s+([\d\s]+)/;
    print "$first,$last,$phone\n";

    # or an array
    @stuff = $record =~ m/(\w+)\s+(\w+)\s+([\d\s]+)/;

    # popped this in here for educational purposes:
    # $" is the list separator used when an array is interpolated
    # into a double-quoted string; the default is one space
    $" = ' : ';
    print "@stuff";

    Hope this helps

    tachyon
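    A short sketch of one more /g variant, using the same string as above: in list context m//g returns the captures from every match at once, so no while loop is needed when you just want the list:

```perl
use strict;
use warnings;

my $variable = "-----;=99;helloworld88";

# List context + /g: every (\d\d) capture from every match, in order.
my @pairs = $variable =~ m/(\d\d)/g;

print "@pairs\n";    # prints "99 88"
```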
Re: Reg ex question
by Anonymous Monk on May 19, 2001 at 07:18 UTC
    Say you want to match '99' in 1990. You can use: $variable =~ /\d+(99)\d+/

    This matches the 99 in 1990 and stores it in $1, because that part of the regexp is enclosed in parentheses, which capture it. Similarly:

    /\d+(\d+\d+)\d+/ will store the 2nd and 3rd digits in $1

    $var =~ /hello\s+(.*)/ will store whatever is after hello in $1

    If $var = "hello world"; then $1 = "world".
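    A small sketch of the hello example; the second input string is made up to show that (.*) takes everything after the whitespace up to the end of the line (. does not match a newline unless /s is used):

```perl
use strict;
use warnings;

my @after;
for my $var ("hello world", "hello   big wide world") {
    # \s+ eats all the whitespace after "hello"; (.*) captures the rest
    push @after, $1 if $var =~ /hello\s+(.*)/;
}
print join("|", @after), "\n";    # prints "world|big wide world"
```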
      /\d+(\d+\d+)\d+/ will store the 2nd and 3rd digits in $1 only on a 4-digit number, and if you are only dealing with 4-digit numbers then the regex can be written more simply as /\d(\d\d)\d/. The + sign means one or more occurrences of the preceding item. What you will find is that if you have a number of more than 4 digits, your regex will match the third-last and second-last digits. This is because Perl's regexes are "greedy", so the first \d+ will suck up as many digits as it can before backtracking to allow the rest to match. Again, this could be written more simply as /\d+(\d\d)\d/.
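      A quick check of the claims above, comparing the original pattern with the simplified /\d+(\d\d)\d/ on a 4-digit and a 5-digit number (the test numbers are arbitrary):

```perl
use strict;
use warnings;

my %got;
for my $num (qw(1990 12345)) {
    # Greedy \d+ backtracks just enough for the rest of the pattern to match.
    my ($loose)  = $num =~ /\d+(\d+\d+)\d+/;
    my ($simple) = $num =~ /\d+(\d\d)\d/;
    $got{$num} = [$loose, $simple];
    print "$num: $loose $simple\n";
}
# prints "1990: 99 99" and "12345: 34 34"
```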

      larryk $less->{'chars'} = `"time in the pub" | more`; # :-D

Re: Reg ex question
by markjugg (Curate) on May 19, 2001 at 00:54 UTC
    Parentheses are used to capture pieces for backreferences, with the contents of the first set going into $1, the second into $2, and so on. In this case:
    $variable =~ m/-----;=(99);helloworld/;
    Most likely you want something more general, but this shows you how to capture "99" for use in the $1 backreference.

    -mark