in reply to Regular Expression, Catching Variables
Generally, the right tool for regularly-delimited data like this is 'split'. In your case, you'd probably want to use a regex to get rid of the content you don't want (i.e., the parenthesized bits) and then use 'split', e.g.:

    $line =~ s/\([^)]+\)//g;
    my @results = split /,/, $line;
    print "$_: $results[$_]\n" for 0 .. $#results;

Regexes are usually used for data that's more of a challenge (i.e., does not follow any regular pattern). Having said that, and since you've mentioned that you're doing this as a learning experience, here are a couple of suggestions:
Unless you have a specific reason for doing so, try to avoid using the '*' quantifier in captures (parentheses): it's likely to mislead you, either by matching nothing or by matching too much, so that the remaining captures end up empty or undefined.
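To make that pitfall concrete, here's a minimal sketch. The input strings and variable names are made up for illustration; they are not from your data.

```perl
use strict;
use warnings;

# Hypothetical input with three comma-separated fields.
my $input = 'a,b,c';

# Greedy '.*' in the first capture runs past the first comma.
my ($greedy_first, $greedy_rest) = $input =~ /^(.*),(.*)$/;
print "greedy:  [$greedy_first] [$greedy_rest]\n";

# '[^,]*' cannot cross a comma, so the first field stops where you expect.
my ($first, $rest) = $input =~ /^([^,]*),(.*)$/;
print "bounded: [$first] [$rest]\n";

# '*' also lets a capture "succeed" while matching nothing at all.
my ($digits) = 'abc' =~ /^(\d*)/;
print "empty but successful: [$digits]\n";
```

The first match hands back 'a,b' and 'c' rather than 'a' and 'b,c', and the last one succeeds with an empty string, which is easy to mistake for real data.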
A useful technique for capturing data followed by some delimiter is to capture a string of what I call "inverted delimiters":
    $string = "abc,def;ghi";
    $string =~ /^([^,]+),([^;]+);(.+)$/;
I used that technique in the first snippet, to say "replace all '('s followed by any number of non-')'s, followed by a ')'".
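Here's a small runnable sketch of both uses, in case it helps to see the results. The `$line` value and the variable names are my own assumptions, chosen only for the demo.

```perl
use strict;
use warnings;

my $string = 'abc,def;ghi';

# Each capture is "one or more characters that are NOT the next delimiter".
# In list context, a successful match returns the captures in order.
my ($first, $second, $third) = $string =~ /^([^,]+),([^;]+);(.+)$/
    or die "no match\n";
print "$first | $second | $third\n";

# The same idea in a substitution: '(' then non-')' characters then ')'.
# Hypothetical line with parenthesized bits to strip.
my $line = 'abc(x),def(y z),ghi';
(my $clean = $line) =~ s/\([^)]+\)//g;
print "$clean\n";
```

Copying into `$clean` before substituting keeps the original `$line` intact, which is handy while you're experimenting.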
Last of all, you need to have a capture (parenthesis set in your regex) for every variable you expect to create. This is, of course, part of the pain of using a regex for a long, complicated line - and one of the reasons to try to automate the whole thing. You have four captures, and therefore, only four variables.
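A quick way to see this for yourself is to count what a list-context match returns. This is only a sketch; the record and the variable names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical record with five fields.
my $record = 'one,two,three,four,five';

# Only four captures, so only four values come back.
my @caps = $record =~ /^([^,]+),([^,]+),([^,]+),([^,]+)/;
print "captures returned: ", scalar(@caps), "\n";

# Asking for a fifth variable doesn't create a fifth value: it stays undef.
my ($f1, $f2, $f3, $f4, $f5) = @caps;
print "fifth is ", (defined $f5 ? $f5 : '(undef)'), "\n";
```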
Here's another technique that you may find useful for future reference: you can build a regex out of "pieces" each of which represents a field. The "work" part of this technique is in constructing one or more definitions of what a field is.
    # Capture a 'non-comma/non-open-paren' string, optionally
    # followed by parens (not captured), optionally followed by a comma
    my $s = '([^,(]+)(?:\([^)]+\))?,?';
    # Regex consists of 11 of these
    my $re = $s x 11;
    my @out = $line =~ /^$re$/;
    print "$_: $out[$_]\n" for 0 .. $#out;
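If you want to try the "pieces" idea without your full 11-field line, here's a cut-down sketch with three fields. The sample line and the names `$field` and `$count` are assumptions of mine, not anything from your data.

```perl
use strict;
use warnings;

# Hypothetical three-field line; some fields carry a parenthesized suffix.
my $line = 'alpha(1),beta,gamma(3)';

# One field: captured text, optional uncaptured parens, optional comma.
my $field = '([^,(]+)(?:\([^)]+\))?,?';

# Repeat the piece once per expected field.
my $count = 3;
my $re    = $field x $count;

my @out = $line =~ /^$re$/;
print "$_: $out[$_]\n" for 0 .. $#out;
```

Note that the repeat count has to match the real number of fields. As far as I can tell, too few pieces makes the match fail, while too many can still match by splitting a field in two, since the comma is optional.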
This is not, as you've probably guessed by now, an uncommon problem. :)
Re^2: Regular Expression, Catching Variables
by ack (Deacon) on Jun 23, 2009 at 19:17 UTC