how to find what's not there with a regex?

samizdat has asked for the wisdom of the Perl Monks concerning the following question:

Re: how to find what's not there with a regex? by pbeckingham (Parson) on Aug 24, 2005 at 13:42 UTC
How about this:

    #! /usr/bin/perl
    use strict;
    use warnings;

    while (<DATA>)
    {
      chomp;
      print "[$_]\n"
        for /\s*([^=]+\s+=\s+'[^']+'|\S+\s+=(?:\s+[^=]+)+(?:(?=\s+\S+\s+=)|$))/msg;
    }

    __DATA__
    xyz = 'some long exp' xyz = some long exp with no more than one space separating parts a = b c d = e
    drsubc = agauss(0, 1, 3)  delm1 = '0 + 0.045udistm1'  delm2 = '0 + 0.07udistm2'  delm3 = '0 + 0.07udistm3'  delm4 = '0 + 0.07udistm4'  delmt = '0 + 0.07udistmt'  delml = '0.16u + 0.43udistml'  delam = '0.32u + 0.86udistam'  dele1 = '0 + 0.25udiste1'  dele2 = '0 + 0.25udiste2'  delma = '0.16u + 0.6udistma'  pmsxt = 'npmsxt + 12.5udpmsxt'  tih = 0.35u  capct = '0.50u + 0.13uxdcapct'  capcti = '0.55u + 0.13uxdcapct'  m1t = '0.41u + 0.05uxdm1t'  m1ti = '0.36u + 0.05uxdm1t'  m2t = '0.48u + 0.057udm2t'  m3t = '0.48u + 0.057udm3t'  m4t = '0.48u + 0.057udm4t'  mtt = '0.48u + 0.057udmtt'  qtt = '0.242u + 0.0202udqtt'  htt = '0.242u + 0.0202udhtt'  mlt = '2.0u + 0.2udmlt'  amt = '4.0u + 0.4udamt'  e1t = '3.0u + 0.5ude1t'  e2t = '4.0u + 0.5uxde1mat'  mat = '4.0u + 0.4udmat'  m1m2t = '0.35u + 0.05u*dm1m2t'

Generates the output:

    [xyz = 'some long exp']
    [xyz = some long exp with no more than one space separating parts]
    [a = b c]
    [d = e]
    [drsubc = agauss(0, 1, 3) ]
    [delm1 = '0 + 0.045udistm1']
    [delm2 = '0 + 0.07udistm2']
    [delm3 = '0 + 0.07udistm3']
    [delm4 = '0 + 0.07udistm4']
    [delmt = '0 + 0.07udistmt']
    [delml = '0.16u + 0.43udistml']
    [delam = '0.32u + 0.86udistam']
    [dele1 = '0 + 0.25udiste1']
    [dele2 = '0 + 0.25udiste2']
    [delma = '0.16u + 0.6udistma']
    [pmsxt = 'npmsxt + 12.5udpmsxt']
    [tih = 0.35u ]
    [capct = '0.50u + 0.13uxdcapct']
    [capcti = '0.55u + 0.13uxdcapct']
    [m1t = '0.41u + 0.05uxdm1t']
    [m1ti = '0.36u + 0.05uxdm1t']
    [m2t = '0.48u + 0.057udm2t']
    [m3t = '0.48u + 0.057udm3t']
    [m4t = '0.48u + 0.057udm4t']
    [mtt = '0.48u + 0.057udmtt']
    [qtt = '0.242u + 0.0202udqtt']
    [htt = '0.242u + 0.0202udhtt']
    [mlt = '2.0u + 0.2udmlt']
    [amt = '4.0u + 0.4udamt']
    [e1t = '3.0u + 0.5ude1t']
    [e2t = '4.0u + 0.5uxde1mat']
    [mat = '4.0u + 0.4udmat']
    [m1m2t = '0.35u + 0.05u*dm1m2t']

pbeckingham - typist, perishable vertebrate.
Re^2: how to find what's not there with a regex? by samizdat (Vicar) on Aug 24, 2005 at 13:51 UTC
Almost. I think you're on the right track, because your solution's caught all but drsubc and tih correctly. Let me study what you've done, and thanks very much!
Re^3: how to find what's not there with a regex? by pbeckingham (Parson) on Aug 24, 2005 at 13:52 UTC
Fixed the drsubc and tih. pbeckingham - typist, perishable vertebrate.
Re: how to find what's not there with a regex? by Eimi Metamorphoumai (Deacon) on Aug 24, 2005 at 13:29 UTC
Please read How (Not) To Ask A Question. In particular, could you please post some sample data, along with exactly what parts you're trying to extract, and what the criteria are? Reread your question from the point of view of someone who doesn't already know what you want, and I think you'll see that you're leaving out pretty much everything we need to know.

Update: Now there's some data, but still no real specification of how your parameters are separated or what's really going on here. It looks like this may do what you want, but if not you'll have to step back for a moment and think about what you're doing.

    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use Data::Dumper;

    my %variables;
    undef $/;
    $_ = <DATA>;
    while (s/\s(\w+)\s=\s([^=]+)\s\z//){
        $variables{$1} = $2;
    }
    print Dumper(\%variables);
    __DATA__
    drsubc = agauss(0, 1, 3)  delm1 = '0 + 0.045udistm1'  delm2 = '0 + 0.07udistm2'  delm3 = '0 + 0.07udistm3'  delm4 = '0 + 0.07udistm4'  delmt = '0 + 0.07udistmt'  delml = '0.16u + 0.43udistml'  delam = '0.32u + 0.86udistam'  dele1 = '0 + 0.25udiste1'  dele2 = '0 + 0.25udiste2'  delma = '0.16u + 0.6udistma'  pmsxt = 'npmsxt + 12.5udpmsxt'  tih = 0.35u  capct = '0.50u + 0.13uxdcapct'  capcti = '0.55u + 0.13uxdcapct'  m1t = '0.41u + 0.05uxdm1t'  m1ti = '0.36u + 0.05uxdm1t'  m2t = '0.48u + 0.057udm2t'  m3t = '0.48u + 0.057udm3t'  m4t = '0.48u + 0.057udm4t'  mtt = '0.48u + 0.057udmtt'  qtt = '0.242u + 0.0202udqtt'  htt = '0.242u + 0.0202udhtt'  mlt = '2.0u + 0.2udmlt'  amt = '4.0u + 0.4udamt'  e1t = '3.0u + 0.5ude1t'  e2t = '4.0u + 0.5uxde1mat'  mat = '4.0u + 0.4udmat'  m1m2t = '0.35u + 0.05u*dm1m2t'
Re: how to find what's not there with a regex? by inman (Curate) on Aug 24, 2005 at 15:43 UTC
Reversing the initial input makes the regex easier, because each variable name then comes last and is easy to terminate with `\w+`. The resulting array needs reversing, and so does every item in it. `my $data = reverse <DATA>; my @answers = map {scalar reverse $_} reverse $data =~ /(.*?\s*?=\s.*?\w+)/g; print "$_\n" foreach (@answers);`
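A minimal sketch of the reversal idea on a short sample, in case it helps to see the steps separately. The name `parse_reversed` and the sample line are made up for illustration, and the pattern here is a simplified variation rather than the exact regex above; it assumes no value contains ` = `.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns "name = value" strings
# in their original order. Call it in list context.
sub parse_reversed {
    my ($line) = @_;

    # Scalar assignment puts reverse() in scalar context,
    # so this reverses the characters of the string.
    my $rev = reverse $line;

    # In the reversed text every assignment reads "eulav = eman",
    # so the name is simply the run of word characters after " = ".
    my @chunks = $rev =~ /(\S.*?\s=\s\w+)/g;

    # Un-reverse each chunk, then restore the original order.
    return reverse map { scalar reverse $_ } @chunks;
}

my $sample = "drsubc = agauss(0, 1, 3)   delm1 = '0 + 0.045udistm1'   tih = 0.35u";
print "$_\n" for parse_reversed($sample);
```

Anchoring on the name works because a name is only `\w+`, while a value may hold spaces, commas and quotes.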
Re^2: how to find what's not there with a regex? by samizdat (Vicar) on Aug 24, 2005 at 16:38 UTC
That's brilliant. You're absolutely right, that makes it much simpler!!!
Re: how to find what's not there with a regex? by ikegami (Patriarch) on Aug 24, 2005 at 13:50 UTC
This works with your data:

    while (<>) {
       chomp;
       while (
          /
             (\w+)       # Name ($1)
             \s*         # Spaces (optional)
             =           # Equal sign
             \s*         # Spaces (optional)
             (
                '        # Quote
                [^']*    # Non-quotes
                '        # Quote
             |           # -or-
                [^'\s]+  # Non-spaces|quotes
             )
          /xg
       ) {
          my ($name, $expr) = ($1, $2);
          $expr = substr($expr, 1, -1)
             if substr($expr, 0, 1) eq "'";
          print("var: $name, expr: $expr\n");
       }
    }

Updated to catch unquoted expressions.

Output:

    var: drsubc, expr: agauss(0,             <--- Doesn't work :(
    var: delm1, expr: 0 + 0.045udistm1
    var: delm2, expr: 0 + 0.07udistm2
    var: delm3, expr: 0 + 0.07udistm3
    var: delm4, expr: 0 + 0.07udistm4
    var: delmt, expr: 0 + 0.07udistmt
    var: delml, expr: 0.16u + 0.43udistml
    var: delam, expr: 0.32u + 0.86udistam
    var: dele1, expr: 0 + 0.25udiste1
    var: dele2, expr: 0 + 0.25udiste2
    var: delma, expr: 0.16u + 0.6udistma
    var: pmsxt, expr: npmsxt + 12.5udpmsxt
    var: tih, expr: 0.35u                    <--- Works :)
    var: capct, expr: 0.50u + 0.13uxdcapct
    var: capcti, expr: 0.55u + 0.13uxdcapct
    var: m1t, expr: 0.41u + 0.05uxdm1t
    var: m1ti, expr: 0.36u + 0.05uxdm1t
    var: m2t, expr: 0.48u + 0.057udm2t
    var: m3t, expr: 0.48u + 0.057udm3t
    var: m4t, expr: 0.48u + 0.057udm4t
    var: mtt, expr: 0.48u + 0.057udmtt
    var: qtt, expr: 0.242u + 0.0202udqtt
    var: htt, expr: 0.242u + 0.0202udhtt
    var: mlt, expr: 2.0u + 0.2udmlt
    var: amt, expr: 4.0u + 0.4udamt
    var: e1t, expr: 3.0u + 0.5ude1t
    var: e2t, expr: 4.0u + 0.5uxde1mat
    var: mat, expr: 4.0u + 0.4udmat
    var: m1m2t, expr: 0.35u + 0.05u*dm1m2t
Re^2: how to find what's not there with a regex? by samizdat (Vicar) on Aug 24, 2005 at 13:54 UTC
That works with the quoted variant, ikegami, but not the unquoted variant, like the first function. How do I say 'anything including spaces up to the first occurrence of more than one space in a row'?
Re^3: how to find what's not there with a regex? by ikegami (Patriarch) on Aug 24, 2005 at 14:40 UTC
What follows is a solution which requires the minimum knowledge of the format. It works with the two special cases. Sorry, I must be tired today.

    while (<>) {
       chomp;
       while (
          /
             (\w+)                      # An identifier.
             \s* = \s*                  # Equal with opt spaces.
             (
                (?:
                   (?! \s+ \w+ \s* = )  # Stop if we see the next formula.
                   .                    # A character.
                )+
             )
          /xg
       ) {
          my ($name, $expr) = ($1, $2);
          $expr = substr($expr, 1, -1)
             if substr($expr, 0, 1) eq "'";
          print("var: $name, expr: $expr\n");
       }
    }

Output:

    var: drsubc, expr: agauss(0, 1, 3)       <- Works
    var: delm1, expr: 0 + 0.045udistm1
    var: delm2, expr: 0 + 0.07udistm2
    var: delm3, expr: 0 + 0.07udistm3
    var: delm4, expr: 0 + 0.07udistm4
    var: delmt, expr: 0 + 0.07udistmt
    var: delml, expr: 0.16u + 0.43udistml
    var: delam, expr: 0.32u + 0.86udistam
    var: dele1, expr: 0 + 0.25udiste1
    var: dele2, expr: 0 + 0.25udiste2
    var: delma, expr: 0.16u + 0.6udistma
    var: pmsxt, expr: npmsxt + 12.5udpmsxt
    var: tih, expr: 0.35u                    <- Works
    var: capct, expr: 0.50u + 0.13uxdcapct
    var: capcti, expr: 0.55u + 0.13uxdcapct
    var: m1t, expr: 0.41u + 0.05uxdm1t
    var: m1ti, expr: 0.36u + 0.05uxdm1t
    var: m2t, expr: 0.48u + 0.057udm2t
    var: m3t, expr: 0.48u + 0.057udm3t
    var: m4t, expr: 0.48u + 0.057udm4t
    var: mtt, expr: 0.48u + 0.057udmtt
    var: qtt, expr: 0.242u + 0.0202udqtt
    var: htt, expr: 0.242u + 0.0202udhtt
    var: mlt, expr: 2.0u + 0.2udmlt
    var: amt, expr: 4.0u + 0.4udamt
    var: e1t, expr: 3.0u + 0.5ude1t
    var: e2t, expr: 4.0u + 0.5uxde1mat
    var: mat, expr: 4.0u + 0.4udmat
    var: m1m2t, expr: 0.35u + 0.05u*dm1m2t
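For anyone who wants to try the "stop when the next formula is in sight" idea in isolation, here is a small sketch that wraps it in a function. The name `parse_assignments` and the sample line are invented for illustration, and the quote stripping is a slightly stricter variation (it only strips when both ends are quotes).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of
# [name, expr] pairs found in $line.
sub parse_assignments {
    my ($line) = @_;
    my @pairs;
    while (
        $line =~ /
            (\w+)                       # name
            \s* = \s*                   # equals sign with optional spaces
            (
                (?:
                    (?! \s+ \w+ \s* = ) # stop before the next "name ="
                    .                   # otherwise consume one character
                )+
            )
        /xg
    ) {
        my ($name, $expr) = ($1, $2);
        $expr =~ s/\s+\z//;             # trim trailing spaces at end of line
        $expr =~ s/\A'(.*)'\z/$1/s;     # drop one pair of surrounding quotes
        push @pairs, [$name, $expr];
    }
    return @pairs;
}

my $sample = "drsubc = agauss(0, 1, 3)   delm1 = '0 + 0.045udistm1'   tih = 0.35u";
printf "var: %s, expr: %s\n", @$_ for parse_assignments($sample);
```

The negative lookahead is checked before every single character, so the value ends exactly where whitespace followed by `name =` begins, however many spaces separate the formulas.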
Re^4: how to find what's not there with a regex? by samizdat (Vicar) on Aug 24, 2005 at 14:58 UTC
Re^5: how to find what's not there with a regex? by ikegami (Patriarch) on Aug 24, 2005 at 15:15 UTC
Re^3: how to find what's not there with a regex? by ysth (Canon) on Aug 24, 2005 at 14:30 UTC
"How do I say 'anything including spaces up to the first occurrence of more than one space in a row'?" A literal translation (untested) would be `/(?>.*?(?=  ))/s`, where the lookahead contains two literal spaces.
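A quick way to try that idea on a sample string. This is only a sketch: `up_to_double_space` is a made-up name, the two spaces are written as ` {2}` so they stay visible, and a `\z` alternative is added (an assumption, not part of the suggestion above) so a string with no double space still matches.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the text from the
# start of $s up to, but not including, the first run of two spaces.
# If there is no such run, the whole string is returned.
sub up_to_double_space {
    my ($s) = @_;
    my ($head) = $s =~ /\A(.*?)(?= {2}|\z)/s;
    return $head;
}

print up_to_double_space("agauss(0, 1, 3)   delm1 = '0 + 0.045udistm1'"), "\n";
```

The lazy `.*?` grows one character at a time and the lookahead stops it at the first position where two spaces follow, so single spaces inside the value are kept.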
Re^4: how to find what's not there with a regex? by ikegami (Patriarch) on Aug 24, 2005 at 14:54 UTC
Re^2: how to find what's not there with a regex? by pbeckingham (Parson) on Aug 24, 2005 at 13:53 UTC
Sorry - this doesn't handle the non-quoted element. pbeckingham - typist, perishable vertebrate.
Re^3: how to find what's not there with a regex? by ikegami (Patriarch) on Aug 24, 2005 at 13:55 UTC
I fixed it while you were replying :)
Re: how to find what's not there with a regex? by BrowserUk (Patriarch) on Aug 24, 2005 at 13:57 UTC
Updated: Simplified.

    #! perl -slw
    use strict;

    while( <DATA> ) {
        print "$1 : ", $2||$3 while m[
            (\w+)               ## the name
            \s+=\s+             ## the =
            (?:                 ## Either
                ' ( [^']+ ) '   ## all the non-quotes between quotes
            |                   ## or
                (.*?)           ## the minimum
            )
            \s{2,}              ## absorb the two or more spaces
        ]gx;
    }

    =results

    P:\test>junk
    drsubc : agauss(0, 1, 3)
    delm1 : 0 + 0.045udistm1
    delm2 : 0 + 0.07udistm2
    delm3 : 0 + 0.07udistm3
    delm4 : 0 + 0.07udistm4
    delmt : 0 + 0.07udistmt
    delml : 0.16u + 0.43udistml
    delam : 0.32u + 0.86udistam
    dele1 : 0 + 0.25udiste1
    dele2 : 0 + 0.25udiste2
    delma : 0.16u + 0.6udistma
    pmsxt : npmsxt + 12.5udpmsxt
    tih : 0.35u
    capct : 0.50u + 0.13uxdcapct
    capcti : 0.55u + 0.13uxdcapct
    m1t : 0.41u + 0.05uxdm1t
    m1ti : 0.36u + 0.05uxdm1t
    m2t : 0.48u + 0.057udm2t
    m3t : 0.48u + 0.057udm3t
    m4t : 0.48u + 0.057udm4t
    mtt : 0.48u + 0.057udmtt
    qtt : 0.242u + 0.0202udqtt
    htt : 0.242u + 0.0202udhtt
    mlt : 2.0u + 0.2udmlt
    amt : 4.0u + 0.4udamt
    e1t : 3.0u + 0.5ude1t
    e2t : 4.0u + 0.5uxde1mat
    mat : 4.0u + 0.4udmat
    m1m2t : 0.35u + 0.05u*dm1m2t

    =cut

    __DATA__
    drsubc = agauss(0, 1, 3)  delm1 = '0 + 0.045udistm1'  delm2 = '0 + 0.07udistm2'  delm3 = '0 + 0.07udistm3'  delm4 = '0 + 0.07udistm4'  delmt = '0 + 0.07udistmt'  delml = '0.16u + 0.43udistml'  delam = '0.32u + 0.86udistam'  dele1 = '0 + 0.25udiste1'  dele2 = '0 + 0.25udiste2'  delma = '0.16u + 0.6udistma'  pmsxt = 'npmsxt + 12.5udpmsxt'  tih = 0.35u  capct = '0.50u + 0.13uxdcapct'  capcti = '0.55u + 0.13uxdcapct'  m1t = '0.41u + 0.05uxdm1t'  m1ti = '0.36u + 0.05uxdm1t'  m2t = '0.48u + 0.057udm2t'  m3t = '0.48u + 0.057udm3t'  m4t = '0.48u + 0.057udm4t'  mtt = '0.48u + 0.057udmtt'  qtt = '0.242u + 0.0202udqtt'  htt = '0.242u + 0.0202udhtt'  mlt = '2.0u + 0.2udmlt'  amt = '4.0u + 0.4udamt'  e1t = '3.0u + 0.5ude1t'  e2t = '4.0u + 0.5uxde1mat'  mat = '4.0u + 0.4udmat'  m1m2t = '0.35u + 0.05u*dm1m2t'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
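Two details of the quoted-or-minimal alternation are easy to trip over: a terminator of two or more spaces means the last assignment on a line needs trailing whitespace, and choosing between the two captures with `||` skips a quoted value of `0`. A possible variation, offered only as a sketch (the name `parse_alternation` and the sample are made up), accepts end of string as a terminator and tests with `defined`:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns [name, expr] pairs.
sub parse_alternation {
    my ($line) = @_;
    my @pairs;
    while (
        $line =~ /
            (\w+)               # the name
            \s+=\s+             # the equals sign
            (?:
                ' ([^']+) '     # either: text between single quotes
            |
                (.*?)           # or: as little as possible
            )
            (?: \s{2,} | \s*\z )  # two or more spaces, or end of string
        /xg
    ) {
        # $2 is undefined when the unquoted branch matched.
        push @pairs, [ $1, defined $2 ? $2 : $3 ];
    }
    return @pairs;
}

my $sample = "drsubc = agauss(0, 1, 3)  delm1 = '0 + 0.045udistm1'  tih = 0.35u";
print "$_->[0] : $_->[1]\n" for parse_alternation($sample);
```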
Re^2: how to find what's not there with a regex? by samizdat (Vicar) on Aug 24, 2005 at 14:16 UTC
`dnwrs = agauss('cnr_res/3',1,3)` An even loonier case... thanks, all, for the help. I think I'm going to have to go back to the original multiline source and see if these are more identifiable there.
Re^3: how to find what's not there with a regex? by BrowserUk (Patriarch) on Aug 24, 2005 at 15:05 UTC
Any other variations?

    #! perl -slw
    use strict;

    while( <DATA> ) {
        m[(\w+)\s+=\s+'?(.+)'?] and print "$1 : $2"
            for split /\s{2,}(?=\w+\s+=)/, $_;
    }

    __END__
    P:\test>junk
    drsubc : agauss(0, 1, 3)
    delm1 : 0 + 0.045udistm1'
    dnwrs : agauss('cnr_res/3',1,3)
    delm2 : 0 + 0.07udistm2'
    delm3 : 0 + 0.07udistm3'
    delm4 : 0 + 0.07udistm4'
    delmt : 0 + 0.07udistmt'
    delml : 0.16u + 0.43udistml'
    delam : 0.32u + 0.86udistam'
    dele1 : 0 + 0.25udiste1'
    dele2 : 0 + 0.25udiste2'
    delma : 0.16u + 0.6udistma'
    pmsxt : npmsxt + 12.5udpmsxt'
    tih : 0.35u
    capct : 0.50u + 0.13uxdcapct'
    capcti : 0.55u + 0.13uxdcapct'
    m1t : 0.41u + 0.05uxdm1t'
    m1ti : 0.36u + 0.05uxdm1t'
    m2t : 0.48u + 0.057udm2t'
    m3t : 0.48u + 0.057udm3t'
    m4t : 0.48u + 0.057udm4t'
    mtt : 0.48u + 0.057udmtt'
    qtt : 0.242u + 0.0202udqtt'
    htt : 0.242u + 0.0202udhtt'
    mlt : 2.0u + 0.2udmlt'
    amt : 4.0u + 0.4udamt'
    e1t : 3.0u + 0.5ude1t'
    e2t : 4.0u + 0.5uxde1mat'
    mat : 4.0u + 0.4udmat'
    m1m2t : 0.35u + 0.05u*dm1m2t'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
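The output above keeps the closing quote on quoted values, because the greedy `.+` swallows it before the optional `'?` gets a chance. A sketch of the same split-first approach that strips a matching outer pair of quotes afterwards; `split_assignments` and the sample line are invented for illustration, and it assumes a value never looks like `'a' + 'b'` (that would lose its outer quotes).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns [name, expr] pairs.
sub split_assignments {
    my ($line) = @_;
    my @pairs;

    # Split on runs of 2+ whitespace that are followed by "name =".
    for my $chunk (split /\s{2,}(?=\w+\s+=)/, $line) {
        next unless $chunk =~ /(\w+)\s+=\s+(.+?)\s*\z/s;
        my ($name, $expr) = ($1, $2);

        # Strip quotes only when the value both starts and ends with one.
        $expr =~ s/\A'(.*)'\z/$1/s;
        push @pairs, [$name, $expr];
    }
    return @pairs;
}

my $sample = "drsubc = agauss(0, 1, 3)  dnwrs = agauss('cnr_res/3',1,3)  delm1 = '0 + 0.045udistm1'";
print "$_->[0] : $_->[1]\n" for split_assignments($sample);
```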
Re^4: how to find what's not there with a regex? by samizdat (Vicar) on Aug 24, 2005 at 15:14 UTC
Re: how to find what's not there with a regex? by davidrw (Prior) on Aug 24, 2005 at 13:53 UTC
Maybe something like this (you could do %matches instead of @matches if desired, as well): `my @matches = $input =~ m/\b(\w+)\s+=( '.*?'|( \S+)+)/sg;` Match the LHS and then the equals sign, and then either a single-quoted string or a sequence of single_space-word sets.
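For the `%matches` variant, the pattern needs exactly two capturing groups per match so that the list comes back as name, value, name, value. A sketch under the assumption that assignments are separated by two or more spaces and that words inside an unquoted value are separated by exactly one space (`parse_to_hash` and the sample are made-up names; quotes are left on the values):

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns name => value pairs.
sub parse_to_hash {
    my ($input) = @_;

    # Group 1 is the name; group 2 is either a single-quoted string or
    # a run of words joined by single whitespace characters.
    my %matches = $input =~ m/\b(\w+)\s+=\s+('.*?'|\S+(?:\s\S+)*)/sg;
    return %matches;
}

my %m = parse_to_hash(
    "delm1 = '0 + 0.045udistm1'  tih = 0.35u  drsubc = agauss(0, 1, 3)  mat = '4.0u + 0.4udmat'"
);
printf "%s => %s\n", $_, $m{$_} for sort keys %m;
```

A hash loses the original order and keeps only the last value for a repeated name, so the array form is the safer choice if either matters.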
Re: how to find what's not there with a regex? by QM (Parson) on Aug 24, 2005 at 15:17 UTC
How to find what's not there with a regex? <facetious_mode> Don't look! </facetious_mode> Sorry, couldn't resist. Must be the lack of sleep ;) -QM -- Quantum Mechanics: The dreams stuff is made of
Re^2: how to find what's not there with a regex? by samizdat (Vicar) on Aug 29, 2005 at 16:05 UTC
no, that's how to not find what _is_ there... sorry, couldn't resist!

