Re: Simple RegEx Substring Extraction from a Delimited Text Record

For parsing a delimited text format like the one you've described, use split. Even if the next person who has to maintain your code is a "regex expert", anyone who knows Perl will expect split in a case like this. As for your example, I find the regex very confusing. It's certainly not suitable for obtaining anything except the second field, and it requires the second field to be of a particular form. If you wanted all the fields from a regex, you'd need something like:

/([^!]*)!([^!]*)!([^!]*)!([^!]*)!([^!]*)/

which is surely a lot less readable or maintainable than

split /!/
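As a rough illustration of the split approach, here is a minimal sketch. The record contents are made up for the example (the original data isn't shown here); only the "!" delimiter and the INT=<digits> form of the second field come from the discussion.

```perl
use strict;
use warnings;

# Hypothetical record: five fields separated by "!" (contents are invented).
my $rec = 'HDR!INT=42!alpha!beta!gamma';

# split takes a pattern and a string and returns the fields as a list.
my @fields = split /!/, $rec;

# Pull the digits out of the second field, if it has the INT=<digits> form.
my ($int) = $fields[1] =~ /^INT=([0-9]+)$/;

print "field count: ", scalar(@fields), "\n";
print "int: $int\n";
```

One thing to keep in mind: split drops trailing empty fields by default, so if empty fields at the end of a record matter, pass a negative limit, as in `split /!/, $rec, -1`.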

If you just want the numbers between INT= and the following !, you might do something more like:

 
my ($int) = $rec =~ /!INT=([0-9]+)!/;
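The placement of the parentheses matters when capturing this way: a match returns its captures only in list context, while in scalar context it returns just a true/false result. A small sketch, using a made-up record, shows the difference:

```perl
use strict;
use warnings;

# Hypothetical record; only the INT=<digits> field matters here.
my $rec = 'HDR!INT=42!alpha';

# Scalar context: the match returns a success flag, not the captured text.
my $matched = ($rec =~ /!INT=([0-9]+)!/);

# List context (parentheses around $int): the match returns its captures.
my ($int) = $rec =~ /!INT=([0-9]+)!/;

print "matched: $matched\n";
print "int: $int\n";
```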

Note that doesn't guarantee that the value will come from the second field, but neither did your example. If you wanted to do that, you might use:

 
my ($int) = $rec =~ /^[^!]*!INT=([0-9]+)!/;
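To see what the anchoring buys you, here is a sketch comparing the two patterns on a made-up record where INT= sits in the third field rather than the second:

```perl
use strict;
use warnings;

# Hypothetical record where INT= appears in the third field, not the second.
my $rec = 'HDR!alpha!INT=7!beta';

# Unanchored: matches INT=<digits> wherever it is surrounded by "!".
my ($loose) = $rec =~ /!INT=([0-9]+)!/;

# Anchored: ^[^!]*! consumes exactly the first field, so INT= must begin
# the second field or the match fails.
my ($strict) = $rec =~ /^[^!]*!INT=([0-9]+)!/;

print "loose: ",  (defined($loose)  ? $loose  : 'no match'), "\n";
print "strict: ", (defined($strict) ? $strict : 'no match'), "\n";
```

The unanchored pattern picks up the 7 from the third field, while the anchored one leaves $strict undefined because the second field is not of the INT= form.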

One final note, you should be using print rather than printf in your examples.
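One common reason for that advice, offered here as an assumption since your exact calls aren't quoted, is that printf treats its first argument as a format string, so any % in the data gets interpreted. The sketch below uses sprintf (which formats the same way) and a made-up value so the result can be inspected:

```perl
use strict;
use warnings;

# Hypothetical field value that happens to contain a percent sequence.
my $text = "discount=50%s";

# sprintf, like printf, reads its first argument as a format, so "%s" is
# taken as a conversion with no argument supplied for it. Warnings are
# silenced here only to keep the demonstration quiet.
my $formatted = do { no warnings; sprintf($text) };

print "as format: $formatted\n";
print "as data:   $text\n";
```

With print the string goes out exactly as it is; if formatting is genuinely needed, passing the data as an argument (`printf "%s\n", $text`) avoids the problem.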


Replies are listed 'Best First'.

Re^2: Simple RegEx Substring Extraction from a Delimited Text Record
by Anonymous Monk on Mar 15, 2006 at 07:52 UTC

Interestingly, in Perl 6 I would think it natural to begin with something such as:

grammar foo_db {
  rule key { <[A-Z]> }
  rule scalar { <[^,]>+ }
  rule list { [<scalar> ,]+ <scalar> }
  rule term { <key> = [<list> | <scalar>] }
  rule record { [<term> <[|]>]* <term> }
}

-- to exactly specify the records of the database, rather than either loosely accept the data or make it hard to quickly determine how loosely (or, perhaps, wrongly) the data is taken.

Although I like to do this in Perl 5 too, it's not quite so easy there to take a verifying regex and make it accept only certain keys, or fail on invalid values. Since that isn't as easy, terse, or maintainable, split is preferred, and ultimately very strenuous testing is preferred.

Please don't take that last paragraph as an indictment.
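For what it's worth, a split-based validator in Perl 5 might look roughly like the sketch below. It assumes the record shape the grammar above suggests (single-letter keys, "=" between key and value, "," between list items, "|" between terms); the allowed-key set and the parse_record name are made up for illustration.

```perl
use strict;
use warnings;

# Keys this hypothetical database accepts; anything else is rejected.
my %allowed = map { $_ => 1 } qw(A B C);

# Parse "K=v|K=v1,v2|..." into a hash of array refs, dying on invalid input.
sub parse_record {
    my ($rec) = @_;
    my %out;
    for my $term (split /\|/, $rec, -1) {
        # Each term must be one uppercase letter, "=", and a non-empty value.
        my ($key, $value) = $term =~ /^([A-Z])=(.+)$/
            or die "malformed term '$term'\n";
        die "unknown key '$key'\n" unless $allowed{$key};
        # A negative limit keeps trailing empty items so they can be caught.
        my @items = split /,/, $value, -1;
        die "empty value in '$term'\n" if grep { $_ eq '' } @items;
        $out{$key} = \@items;
    }
    return \%out;
}

my $parsed = parse_record('A=1|B=x,y,z');
print "B has ", scalar @{ $parsed->{B} }, " values\n";

# A record with a key outside the allowed set is rejected.
my $ok = eval { parse_record('A=1|Q=2'); 1 };
print "bad record rejected\n" unless $ok;
```

It is noticeably longer than the grammar, which is rather the point being made above, but each rejection has its own message and the rules stay easy to test one at a time.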
