
Generally, the right tool for regularly delimited data like this is 'split'. In your case, you'd probably want to use a regex to get rid of the content you don't want (i.e., the parenthesized bits) and then use 'split', e.g.:

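# Strip each parenthesized chunk, then split what's left on commas
$line =~ s/\([^)]+\)//g;
my @results = split /,/, $line;

# Print every field alongside its index
print "$_: $results[$_]\n" for 0 .. $#results;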
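For a self-contained version you can run as-is, here is a minimal sketch of the same strip-then-split approach. The sample record in $line is hypothetical (your actual data isn't reproduced here), and the copy into $clean is just so the original line stays intact:

```perl
use strict;
use warnings;

# Hypothetical sample record, not the original poster's data
my $line = 'alpha(1),beta,gamma(x y),delta';

# Remove each parenthesized chunk: '(' + one or more non-')' chars + ')'
(my $clean = $line) =~ s/\([^)]+\)//g;

# What remains is plain comma-separated data, so split does the rest
my @results = split /,/, $clean;

print "$_: $results[$_]\n" for 0 .. $#results;
```

With that sample, the four fields come out as alpha, beta, gamma and delta.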

Regexes are usually reserved for data that's more of a challenge, i.e., data that doesn't follow a simple, regular pattern. Having said that, and since you've mentioned that you're doing this as a learning experience, here are a few suggestions:

Unless you have a specific reason to, try to avoid the '*' quantifier inside captures (parentheses). Because '*' is greedy and is also satisfied by zero characters, it can mislead you in both directions: a capture may succeed while matching nothing, or it may match too much, so that the remaining captures end up empty or undefined.
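A small sketch of both failure modes, using a made-up string rather than your data. Each match "succeeds", but the captures don't hold what a casual reading of the regex suggests:

```perl
use strict;
use warnings;

my $s = 'abc,def,ghi';

# Greedy '*': the first capture takes everything up to the LAST comma
my ($g1, $g2) = $s =~ /^(.*),(.*)$/;
print "greedy:   [$g1] [$g2]\n";

# Back-to-back '*' captures: the first eats the whole string,
# leaving the second to match zero characters
my ($t1, $t2) = $s =~ /^(.*)(.*)$/;
print "too much: [$t1] [$t2]\n";

# '*' is satisfied by zero characters, so a missing field still "matches"
my ($e) = ',def' =~ /^([^,]*),/;
print "empty:    [$e]\n";
```

This prints [abc,def] [ghi], then [abc,def,ghi] [], then [].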

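A useful technique for capturing data that's followed by a delimiter is to capture a run of what I call "inverted delimiters", i.e., a negated character class such as [^,]+, which matches everything up to (but not including) the next comma: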

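my $string = "abc,def;ghi";
# [^,]+ runs up to the comma, [^;]+ up to the semicolon, .+ takes the rest
my ($first, $second, $rest) = $string =~ /^([^,]+),([^;]+);(.+)$/;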

I used that technique in the first snippet, to say "replace every '(' followed by one or more non-')' characters, followed by a ')'".
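To see why the inverted delimiter matters there, here is a sketch comparing it with a greedy '.+' on a made-up string containing two parenthesized groups:

```perl
use strict;
use warnings;

my $text = 'a(1),b(2),c';

# Greedy .+ runs from the first '(' to the LAST ')', swallowing ',b' too
(my $greedy = $text) =~ s/\(.+\)//g;

# [^)]+ cannot cross a ')', so each group is removed on its own
(my $inverted = $text) =~ s/\([^)]+\)//g;

print "greedy:   $greedy\n";
print "inverted: $inverted\n";
```

The greedy version leaves a,c (the b field is lost), while the inverted-delimiter version leaves a,b,c.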

Last of all, you need a capture (a set of parentheses in your regex) for every variable you expect to fill. This is, of course, part of the pain of using a regex on a long, complicated line, and one of the reasons to build the regex programmatically rather than writing it all out by hand. Your regex has only four captures, so it can fill only four variables.
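A quick sketch of that rule, with a hypothetical five-field record: a list assignment gets exactly one value per capture group, and any extra variables on the left are simply left undefined.

```perl
use strict;
use warnings;

my $record = 'red,green,blue,cyan,magenta';

# Only two capture groups, so the match returns only two values;
# the third variable in the list is left undefined
my ($c1, $c2, $c3) = $record =~ /^([^,]+),([^,]+)/;
print defined $c3 ? "c3 = $c3\n" : "c3 is undef\n";

# One capture group per field: five groups, five values
my @all = $record =~ /^([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)$/;
print scalar(@all), " fields: @all\n";
```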

Here's another technique that you may find useful for future reference: you can build a regex out of "pieces", each of which represents a field. The real work in this technique is writing one or more definitions of what a field looks like.

# Capture a 'non-comma/non-open-paren' string, optionally
# followed by parens (not captured), optionally followed by a comma
my $s = '([^,(]+)(?:\([^)]+\))?,?';
# Regex consists of 11 of these
my $re = $s x 11;

my @out = $line =~ /^$re$/;

print "$_: $out[$_]\n" for 0 .. $#out;
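Here is a runnable sketch of the same piece-by-piece idea. The record, the field count $n_fields and the variable names are hypothetical. One deliberate variation: the pieces are joined with literal commas instead of ending each one in ',?', and the result is compiled once with qr//:

```perl
use strict;
use warnings;

# Hypothetical 4-field record; set $n_fields to match the real data
my $line     = 'alpha(1),beta,gamma(x y),delta';
my $n_fields = 4;

# One field: capture a non-comma/non-paren run, then skip an optional (...) group
my $field = '([^,(]+)(?:\([^)]+\))?';

# Require a literal comma between fields, then compile the whole thing once
my $pattern = join ',', ($field) x $n_fields;
my $re      = qr/^$pattern$/;

my @out = $line =~ $re
    or die "line does not have exactly $n_fields fields\n";

print "$_: $out[$_]\n" for 0 .. $#out;
```

The reason for the variation: with ',?' between pieces, a line with fewer fields than the regex expects can still match, because the engine is free to split one field across two pieces (e.g. 'abc' matching as 'ab' and 'c'). Requiring the comma makes such a line fail outright instead.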

This is not, as you've probably guessed by now, an uncommon problem. :)

--
"Language shapes the way we think, and determines what we can think about."
-- B. L. Whorf

In reply to Re: Regular Expression, Catching Variables by oko1
in thread Regular Expression, Catching Variables by lev
