In reply to: Need a Regular Expression that tests for words in different order and captures the values found
As a latecomer to this thread: ++ to the replies above on the syntax, as applied to the logic of (some of?) the OP's spec.
My problem is with the spec itself: "find a regular expression that finds 2 or 3 words on the same line (no matter what the order) and captures the values beside the matching words", which also includes this requirement: "to parse the file for lines containing "Fred Flintstone" "Barney Rubble" "Joe Rockhead"...."
```perl
#!/usr/bin/perl
use strict;
use warnings;
# 817539

# Line number first, then one lookahead per name. Each (?= ...) scans the
# whole line independently, so the names may appear in any order.
my $pattern = qr/
    ^(\d)
    (?= .* \bfred   \s+ (Flintstone) )
    (?= .* \bbarney \s+ (Rubble)     )
    (?= .* \bjoe    \s+ (Rockhead)   )
/ix;

while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line =~ /$pattern/ ) {
        my $lineno = $1;
        # Captures are numbered by group, not by position on the line,
        # so the company name is the same whatever order the names are in.
        my $company = join '_', $2, $3, $4, 'inc';
        print "$lineno $company \n";
    }
    elsif ( $line =~ /(\d).*/ ) {
        my $lineno = $1;
        print "$lineno, |$line|, does not match\n";
    }
}

__DATA__
1 bar Fred Flintstone Barney Rubble Joe Rockhead Alfred E Neuman
2 Joe Rockhead AE Neuman baz Fred Flintstone (does not contain name2)
3 Barney Rubble bat Fred Flintstone Joe Rockhead AE Neuman
4 Barney Rubble bat Joe Rockhead AE Neuman Fred Flintstone
5 Joe Jones bar Fred Flintstone Barney Rubble Alfred E Neuman(does not contain Name3)
6 Barney Jones Barney Rubble Joe Rockhead Fred Smith (does not contain Name1 OR Name2)
7 Barney Rubble Fred Flintstone Joe Rockhead
8 Joe Rockhead Fred Flintstone Barney Rubble
9 Joe Rockhead Alfred Flintstone Barney Rubble (has Alfred sted Fred)
0 Joe Rockhead Fred Smith Barney Rubble (has Smith sted Flintstone)
```
Output:
```
1 Flintstone_Rubble_Rockhead_inc
2, |2 Joe Rockhead AE Neuman baz Fred Flintstone (does not contain name2)|, does not match
3 Flintstone_Rubble_Rockhead_inc
4 Flintstone_Rubble_Rockhead_inc
5, |5 Joe Jones bar Fred Flintstone Barney Rubble Alfred E Neuman(does not contain Name3)|, does not match
6, |6 Barney Jones Barney Rubble Joe Rockhead Fred Smith (does not contain Name1 OR Name2)|, does not match
7 Flintstone_Rubble_Rockhead_inc
8 Flintstone_Rubble_Rockhead_inc
9, |9 Joe Rockhead Alfred Flintstone Barney Rubble (has Alfred sted Fred)|, does not match
0, |0 Joe Rockhead Fred Smith Barney Rubble (has Smith sted Flintstone)|, does not match
```
In other words, is the OP's spec, "2 or 3 words on the same line (no matter what the order)", sound and reasonable (especially in light of the restriction to "Rubble,..." in another part of the question)?
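To make a literal reading of "2 or 3 words" concrete, here is a minimal sketch (my assumption about what the OP meant, not code from the thread) that counts how many of the three full names occur on a line and accepts the line if at least a given minimum is found. The sub names names_found and matches_at_least are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed reading of the spec: a line qualifies if at least $min of
# these full names appear on it, in any order.
my @names = ( 'Fred Flintstone', 'Barney Rubble', 'Joe Rockhead' );

# Hypothetical helper: return the full names from @names found on $line.
sub names_found {
    my ($line) = @_;
    return grep {
        my ( $first, $last ) = split ' ', $_;
        # \b keeps "Alfred" from counting as "Fred"; \s+ allows extra spaces.
        $line =~ /\b\Q$first\E\s+\Q$last\E\b/i;
    } @names;
}

# Hypothetical helper: true if at least $min of the names are on $line.
sub matches_at_least {
    my ( $line, $min ) = @_;
    my @found = names_found($line);
    return @found >= $min;
}

for my $line (
    '1 bar Fred Flintstone Barney Rubble Joe Rockhead Alfred E Neuman',
    '5 Joe Jones bar Fred Flintstone Barney Rubble Alfred E Neuman',
    '9 Joe Rockhead Alfred Flintstone Barney Rubble',
  )
{
    my @found = names_found($line);
    printf "%d of 3 (%s): %s\n", scalar(@found), join( ', ', @found ), $line;
}
```

Applied to the __DATA__ lines in the script above, every one of them contains at least two of the three names, including the lines annotated as failures, so a literal "2 or 3" test would accept the whole sample.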
I raise the question because the OP appears to be trying to construct company names (for example, "Flintstone_Rubble_Rockhead_Inc") without regard to the fact that "Flintstone_Rubble_Rockhead_Inc" is not the same company as "Rubble_Flintstone_Rockhead_Inc".
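If the order of the names does matter, one way to keep it while still using lookaheads is to sort the captured surnames by where they start on the line. Perl's @- array holds the start offset of each capture group, so $-[2] is where "Flintstone" begins. This is a hedged sketch built on the pattern from the script above; the sub name company_in_line_order is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The lookahead pattern from the script above: each (?= ...) scans the
# whole line on its own, so the three names may appear in any order.
my $pattern = qr/
    ^(\d)
    (?= .* \bfred   \s+ (Flintstone) )
    (?= .* \bbarney \s+ (Rubble)     )
    (?= .* \bjoe    \s+ (Rockhead)   )
/ix;

# Hypothetical helper: join the surnames in the order they appear on the
# line. @- holds each group's start offset, so sorting on it restores the
# on-line order. If a name repeats, the greedy .* picks its last occurrence.
sub company_in_line_order {
    my ($line) = @_;
    return unless $line =~ $pattern;
    my @hits = ( [ $-[2], $2 ], [ $-[3], $3 ], [ $-[4], $4 ] );
    my @ordered = map { $_->[1] } sort { $a->[0] <=> $b->[0] } @hits;
    return join '_', @ordered, 'inc';
}

for my $line (
    '1 bar Fred Flintstone Barney Rubble Joe Rockhead Alfred E Neuman',
    '4 Barney Rubble bat Joe Rockhead AE Neuman Fred Flintstone',
    '8 Joe Rockhead Fred Flintstone Barney Rubble',
    '9 Joe Rockhead Alfred Flintstone Barney Rubble',
  )
{
    my $company = company_in_line_order($line);
    if   ( defined $company ) { print "$company\n" }
    else                      { print "no match: $line\n" }
}
```

With this change, lines 1, 4 and 8 yield three different company names (Flintstone_Rubble_Rockhead_inc, Rubble_Rockhead_Flintstone_inc and Rockhead_Flintstone_Rubble_inc) instead of one.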
BTW, Lain78's use of alternation addresses my question about the significance of the order of names, but veers off the One_True_Way (True_One_Way??) by failing to use strict; and use warnings;.
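For comparison, here is a minimal sketch of the alternation idea with strict and warnings in place. It is my own illustration, not Lain78's code. It uses a branch-reset group, (?|...), available since Perl 5.10, so every alternative stores its surname in the same capture group; a //g match in list context then returns the surnames left to right. The sub names surnames_in_order and company are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One alternation covering every full name. The branch-reset group (?|...)
# (Perl 5.10+) numbers the capture in each alternative as group 1, so the
# surname always lands in the same group.
my $name_re = qr/
    \b
    (?| fred   \s+ (Flintstone)
      | barney \s+ (Rubble)
      | joe    \s+ (Rockhead) )
    \b
/ix;

# In list context, m//g returns group 1 of every match, left to right.
sub surnames_in_order {
    my ($line) = @_;
    return $line =~ /$name_re/g;
}

# Hypothetical helper: require all three distinct surnames (first
# occurrence wins) and join them in the order they appeared.
sub company {
    my ($line) = @_;
    my %seen;
    my @uniq = grep { !$seen{ lc $_ }++ } surnames_in_order($line);
    return unless @uniq == 3;
    return join '_', @uniq, 'inc';
}

for my $line (
    '3 Barney Rubble bat Fred Flintstone Joe Rockhead AE Neuman',
    '8 Joe Rockhead Fred Flintstone Barney Rubble',
    '9 Joe Rockhead Alfred Flintstone Barney Rubble',
  )
{
    my $company = company($line);
    if   ( defined $company ) { print "$company\n" }
    else                      { print "no match: $line\n" }
}
```

Unlike the lookahead version in the script above, this keeps the first occurrence of each name, and the company name follows the order of the line, so "7 Barney Rubble Fred Flintstone Joe Rockhead" gives Rubble_Flintstone_Rockhead_inc.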
Update: Fixed missing i tag in 2nd quote; added "requirement" for clarity -- both in 2nd para.
Update 2: The OP's update in Re 4, "parsing settings from a config file", casts the problem very differently (assuming the inconsistent ordering in the config is inconsequential). It also highlights a point for anyone considering posing a question with the real need obfuscated in a scenario unrelated to their actual purpose: Don't. You may be wasting your own time and that of the Monks who try to help.