Reg ex question

costas has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Reg ex question by Corion (Patriarch) on May 18, 2001 at 17:57 UTC
Your question is a bit broad, as you don't tell us how your data is structured. To become accustomed to regular expressions, I recommend the Komodo GUI from http://www.activestate.com, which has a regex explorer. For the sake of the node, let's assume that your data is CSV data (see Text::xSV, also on CPAN), and that you don't want (for some obscure reason) to use Text::xSV. What I'm going to do now is construct a regular expression that skips n semicolon-delimited fields and then tries to match `=nnn;`:

`$line =~ m/(?:      # Collect heading fields
             [^;]+  # delimited by semicolons
             ;
           ){1}     # In your case, there is only one field before the interesting field
           =        # followed by a "="
           (\d+)    # and followed by our numbers, stuff them into $1
           ;        # and delimited by another semicolon
          /x;`

Another good approach would be to `split()` the line on `";"` and then munge the second element of that list afterwards.
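The `split()` alternative mentioned above could look roughly like the following. This is only a sketch: the sample line is borrowed from the other replies in this thread, and the assumption that the interesting field is always the second one and always looks like `=digits` is mine, not the original poster's.

```perl
use strict;
use warnings;

# Sample line borrowed from other replies in this thread (an assumption
# about what the real data looks like).
my $line = "-----;=99;helloworld";

# Split on semicolons; the second field (index 1) should be "=99".
my @fields = split /;/, $line;

# Strip the leading "=" and keep the digits, but only if the field
# really has the expected shape.
my $number;
if (defined $fields[1] && $fields[1] =~ /^=(\d+)$/) {
    $number = $1;
}

print defined $number ? "number is $number\n" : "no match\n";
```

Anchoring the pattern with `^` and `$` makes the check stricter than the regex-only version, which may or may not be what you want.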
Re: Reg ex question by jeroenes (Priest) on May 18, 2001 at 17:52 UTC
Generally:

`$variable =~ m/;=(\d+);/;
print $1;    # holds 99 now`

So everything between parens gets assigned to variables with a number as the name. They stay available until the next successful match or substitution. You can read all about it in perlop and perlre.

Hope this helps,
Jeroen
"We are not alone"(FZ)

Update: chromatic pointed out to me that it would be good to remark that $1 is only set when the string matches. I totally agree with him.
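Since $1 is only set when the string matches, it is usually safer to test the match before using the capture. A minimal sketch, where the second sample line is a hypothetical non-matching input added for illustration:

```perl
use strict;
use warnings;

# The first line has the ";=digits;" shape; the second (hypothetical) one does not.
my @lines = ("-----;=99;helloworld", "-----;helloworld");

my @found;
for my $variable (@lines) {
    if ($variable =~ m/;=(\d+);/) {
        push @found, $1;    # safe: the match just succeeded
    }
    else {
        print "no match in '$variable'\n";
    }
}

print "captured: @found\n";
```

Without the `if`, a failed match would leave $1 holding whatever an earlier successful match put there, which is an easy bug to miss.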
Re: Reg ex question by davorg (Chancellor) on May 18, 2001 at 17:49 UTC
I assume that you really want to match two digits.

`$variable = "-----;=99;helloworld";
if ($variable =~ /(\d\d)/) {
    # $1 contains 99
}`

The parentheses in the regex mark the parts that you want to capture.

--
<http://www.dave.org.uk>
"Perl makes the fun jobs fun and the boring jobs bearable" - me
Re: Reg ex question by tachyon (Chancellor) on May 18, 2001 at 19:02 UTC
`# you have been shown this, but I have added another 2 digits
$variable = "-----;=99;helloworld88";
$variable =~ m/(\d\d)/;
print "$1\n";
# 99 -> the first 2 digit string in $variable will now be in $1
# Note match will stop with the first \d\d sequence looking L->R

# we can get all the \d\d sequences with this regex
# the key is the /g at the end which stands for global
print "got a $1\n" while $variable =~ m/(\d\d)/g;

# Often we might then go on to assign $1 to a variable, i.e.
my $number = $1;

# You can do this in one step with the expression
($number) = $variable =~ m/(\d\d)/;
print "\$number is $number\n";

# you can also capture the values of $1,$2,$3 all at once into a list
$record = "John Smith 919 909 900";
($first,$last,$phone) = $record =~ m/(\w+)\s+(\w+)\s+([\d\s]+)/;
print "$first,$last,$phone\n";

# or an array
@stuff = $record =~ m/(\w+)\s+(\w+)\s+([\d\s]+)/;

# popped this in here for educational purposes
# $" is the list separator, used when an array is interpolated
# into a double-quoted string; default is one space
$" = ' : ';
print "@stuff";`

Hope this helps

tachyon
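The `/g` loop and the list assignment shown above can also be combined: in list context, a `/g` match returns every captured substring in one go. A small sketch using the same sample string (the variable name `@pairs` is just for illustration):

```perl
use strict;
use warnings;

my $variable = "-----;=99;helloworld88";

# In list context, a /g match returns all captures at once,
# so no explicit loop over $1 is needed.
my @pairs = $variable =~ m/(\d\d)/g;

print "found: @pairs\n";    # found: 99 88
```

If nothing matches, the list is simply empty, so `@pairs` can be tested directly in a condition.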
Re: Reg ex question by Anonymous Monk on May 19, 2001 at 07:18 UTC
Say you want to match '99' in 1990. You can use `$variable =~ /\d+(99)\d+/`, which will match the 99 in 1990 and store it in $1, because the parentheses capture that part of the match. Likewise, `/\d+(\d+\d+)\d+/` will store the 2nd and 3rd digits in $1, and `$var =~ /hello\s+(.*)/` will store whatever is after hello in $1: if $var = "hello world"; then $1 = "world".
Re: Re: Reg ex question by larryk (Friar) on May 19, 2001 at 14:25 UTC
`/\d+(\d+\d+)\d+/` will store the 2nd and 3rd digits in $1 only on a 4-digit number, and if you are only dealing with 4-digit numbers then the regex can be more simply written as `/\d(\d\d)\d/`. The `+` sign means one or more occurrences of the preceding thing. What you will find is that if you have a number of more than 4 digits, your regex will capture the third-last and second-last digits. This is because Perl's regexes are "greedy", so the first `\d+` will suck up as many digits as it can before backtracking to allow the rest to match. Again, this could be more simply written as `/\d+(\d\d)\d/`.

larryk
$less->{'chars'} = `"time in the pub" | more`; # :-D
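The greedy behaviour described above is easy to check for yourself. The sketch below runs both patterns against a 4-digit and a 6-digit number; the inputs and the hash names are made up for the demonstration.

```perl
use strict;
use warnings;

my (%greedy, %simple);
for my $n ('1990', '123456') {
    # Original pattern: the leading \d+ grabs as much as it can, then
    # backtracks just far enough for the rest of the pattern to match.
    ($greedy{$n}) = $n =~ /\d+(\d+\d+)\d+/;

    # Simpler equivalent suggested above.
    ($simple{$n}) = $n =~ /\d+(\d\d)\d/;

    print "$n: original captures $greedy{$n}, simplified captures $simple{$n}\n";
}
```

Both patterns capture 99 from 1990 and 45 from 123456, i.e. the third-last and second-last digits rather than the 2nd and 3rd.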
Re: Reg ex question by markjugg (Curate) on May 19, 2001 at 00:54 UTC
Parens are used to capture pieces for backreferences, with the contents of the first set going into $1, the second into $2, and so on. In this case: `$variable =~ m/-----;=(99);helloworld/;` Most likely you want something more general, but this shows you how to capture "99" for use in the $1 backreference. -mark