regex problem

Otogi has asked for the wisdom of the Perl Monks concerning the following question:

Line[ ]above\n
(Description[ ]"(.*)"\n)?
Line[ ]below\n

Is there any way to differentiate between the following two target strings:

Line above
Description "xyz"
Line below
 
Line above
Line below

The problem is that the lower string returns a blank capture, and the upper one also returns a blank capture if "xyz" is not there. Right now I bracket the whole line to know whether the line actually exists, but that's dumb; there must be a better way. Any thoughts? I hope I was clear; please ask if you need clarification on any point.

Thank you.


Re: regex problem by GrandFather (Saint) on Feb 27, 2006 at 22:02 UTC
It's not clear to me what you are trying to achieve, but maybe the following will point you in the right direction:

use strict;
use warnings;

while (<DATA>) {
    next if ! (/Line above/ .. /Line below/);
    next if ! /Description\s+"([^"]*)"/;
    print "Found $1 in line $.\n";
}

__DATA__
Line above
Description "xyz"
Line below
Line above
Line below

Prints:

Found xyz in line 2

DWIM is Perl's answer to Gödel
Re: regex problem by diotalevi (Canon) on Feb 27, 2006 at 22:05 UTC
Assuming the pattern matched, $1 will be true if there was a description line. $2 will contain the description. As an optimization, you can get a faster pattern match by not putting your spaces into character classes. That just slows the engine down. ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
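A minimal sketch of that test, assuming the OP's pattern with the `[ ]` classes written as plain spaces. The `classify` helper and its return strings are made up for illustration; `defined $1` is used rather than plain truth so the check does not depend on what the line contains.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a target string using the OP's pattern.
sub classify {
    my ($text) = @_;
    # $1 is the whole optional line, $2 is the text between the quotes.
    if ($text =~ /Line above\n(Description "(.*)"\n)?Line below\n/) {
        # $1 stays undefined when the optional group did not take part.
        return defined $1
            ? "description line present: '$2'"
            : "no description line";
    }
    return "no match";
}

print classify(qq{Line above\nDescription "xyz"\nLine below\n}), "\n";
print classify(qq{Line above\nDescription ""\nLine below\n}), "\n";
print classify(qq{Line above\nLine below\n}), "\n";
```

The second call is the case the OP was worried about: the line exists but the quoted text is empty, so $2 is an empty string while $1 is still defined.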
Re^2: regex problem by QM (Parson) on Feb 27, 2006 at 23:18 UTC
"As an optimization, you can get a faster pattern match by not putting your spaces into character classes. That just slows the engine down."

I smell premature optimization... You're correct. However, there is not enough information from the OP to determine if that's a good idea. -QM -- Quantum Mechanics: The dreams stuff is made of
Re^3: regex problem by diotalevi (Canon) on Feb 27, 2006 at 23:20 UTC
Someone recently gave me a back-of-the-envelope (BOTE) estimate that `[ ]` is 10x slower. It hamstrings the screamingly fast Boyer-Moore literal string matching part of the regex engine: `/xxx[ ]xxx/` compiles down to `exact("xxx"), anyof(" "), exact("xxx")`, where it was originally `exact("xxx xxx")`. You've changed the regex from one with a seven-character literal to one that contains two three-character literals. That changes how the engine matches, and it changes how quickly the BM part of the engine can either discard the string entirely or find suitable candidates. It also causes more overhead because three ops have to be executed instead of just one. I'm asserting that `[ ]` is pretty to look at but has serious consequences for the regexp's performance. I wouldn't ordinarily choose the pretty version in that case. I rarely ever use /x, though, so I don't need to escape my spaces. ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
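One way to check the estimate on your own perl is to time both forms with the core Benchmark module. This is only a sketch: the `@lines` data and the `count_matches` helper are invented for the test, and the result depends on the perl version, since other versions of the engine may compile the two patterns to the same program.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: mostly non-matching lines plus one hit.
my @lines = (('yyy yyy filler text') x 99, 'some xxx xxx here');

# Count how many lines a compiled pattern matches.
sub count_matches {
    my ($re) = @_;
    return scalar grep { $_ =~ $re } @lines;
}

my $class   = qr/xxx[ ]xxx/;   # space inside a character class
my $literal = qr/xxx xxx/;     # plain literal space

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    class   => sub { count_matches($class) },
    literal => sub { count_matches($literal) },
});
```

To see what the pattern actually compiles to, `perl -Mre=debug -e '/xxx[ ]xxx/'` dumps the compiled program to STDERR.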
Re^4: regex problem by QM (Parson) on Feb 27, 2006 at 23:38 UTC
Re^5: regex problem by diotalevi (Canon) on Feb 27, 2006 at 23:40 UTC
Re: regex problem by ambrus (Abbot) on Feb 28, 2006 at 11:53 UTC
I don't think it's dumb to use a bracket just to see which branch of a regexp succeeded, or at least I do that too. For example, in the glob_to_re sub of my cgrep script, I use empty brackets for this reason. (Update: I've used ampersands as a regexp delimiter in a stupid moment, so look for `m&` if you can't find it.)
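A small sketch of the empty-brackets idea applied to the OP's strings. This is not the code from cgrep; the `which_branch` helper and the alternation are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: report which alternative of the pattern matched.
sub which_branch {
    my ($text) = @_;
    # Each alternative ends in an empty group (). After a successful match,
    # only the group in the alternative that matched is defined (as '').
    return unless $text =~ /Line above\n(?:Description "[^"]*"\n()|())Line below\n/;
    return defined $1 ? 'description' : 'bare';
}

for my $text (qq{Line above\nDescription ""\nLine below\n},
              qq{Line above\nLine below\n}) {
    print which_branch($text), "\n";
}
```

The markers capture nothing, so they only answer the question "did this branch take part in the match?" and leave the interesting text to other groups.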