in reply to Searching Array To Hold RegEx Stack Is Order Dependant

The other way is to stitch the regexes together, so that you run a single regex per line, and then sort out which field(s) you've found when you get a match. Along the lines of:

use strict ;
use warnings ;

my %KLARF_REGEXP = (
  '1.X' => {
    'LOTID'    => qr/LotID "(.+)";/i,
    'DEVICEID' => qr/DeviceID "(\w+)";/i,
    'STEPID'   => qr/StepID "(.+)";/i,
    'SLOTID'   => qr/Slot (\d+);/i,
    'DEFECTS'  => qr/DefectList/i,
    'RESULT_T' => qr/ResultTimestamp (.+);/i,
    'WAFERID'  => qr/WaferID "(.+)";/i,
    'SETUPID'  => qr/SetupID (.+);/i,
    'OMARK'    => qr/OrientationMarkLocation (.+);/i,
    'DIEPITCH' => qr/DiePitch (.+);/i,
    'CENTER'   => qr/SampleCenterLocation (.+);/i,
  },
  '1.8' => {
    'LOTID'    => qr/LotRecord "(.+)"/i,
    'DEVICEID' => qr/DeviceID 1 \{"(\w+)"\}/i,
    'STEPID'   => qr/StepID 1 \{"(.+)"\}/i,
    'SLOTID'   => qr/Field SlotNumber 1 \{(\d+)\}/i,
    'DEFECTS'  => qr/DefectList/i,
    'WAFERID'  => qr/WaferRecord "(.+)"/i,
    'RESULT_T' => qr/Field ResultTimestamp \d \{(.+)\}/i,
    'SETUPID'  => qr/Field RecipeID 3 \{(.+)\}/i,
    'OMARK'    => qr/Field OrientationMarkLocation 1 \{(.+)\}/i,
    'DIEPITCH' => qr/Field DiePitch \d \{(.+)\}/i,
    'CENTER'   => qr/Field SampleCenterLocation \d \{(.+)\}/i,
  },
) ;

my $vers    = '1.8' ;
my $r_items = $KLARF_REGEXP{$vers} ;

# Stitch all the item regexes into one alternation; $1 is the whole matched item
my $r  = join('|', values(%$r_items)) ;
my $rx = qr/($r)/ ;

my %result = () ;

while (my $line = <DATA>) {
  ITEM:
  while ($line =~ m/$rx/g) {
    my $value = $1 ;
    $value =~ s/\s*\z// ;
    # Work out which item regex it was that matched
    foreach my $id (keys %$r_items) {
      if ($value =~ m/$r_items->{$id}/) {
        $result{$id} = $1 ;       # Worry about multiple values ?
        next ITEM ;
      } ;
    } ;
    die "Found '$value', but no match ...??" ;
  } ;
} ;

foreach my $id (keys %result) {
  print "$id = '", $result{$id} || '', "'\n" ;
} ;

__DATA__
gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg sljhg sljhg ljhjlhg sljg
LotRecord "Don't Look Back"
gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg sljhg sljhg ljhjlhg sljg
gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg sljhg sljhg ljhjlhg sljg
DeviceID 1 {"The Device"}
gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg sljhg sljhg ljhjlhg sljg
StepID 1 {"Staircase"}
Field SlotNumber 1 {497562}
DefectList
gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg sljhg sljhg ljhjlhg sljg
WaferRecord "Sandwich"
gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg sljhg sljhg ljhjlhg sljg
Field ResultTimestamp 7 {25-Oct-1952}
Field RecipeID 3 {First catch your rabbit}
Field OrientationMarkLocation 1 {This way up}
gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg sljhg sljhg ljhjlhg sljg
gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg sljhg sljhg ljhjlhg sljg
Field DiePitch 3 {Beetle} Field SampleCenterLocation 4 {Gravitas}
ghcgjhcgjhghjgd sljh dsg ;kj sh;kj shkj dhk sjdh

I can think of ways to improve the inner loop, which is iterating over the item regexes -- but I wouldn't worry about that unless this is still not fast enough.
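For what it's worth, one possible way to lose the inner loop (a sketch only, assuming Perl 5.10 or later) is to wrap each alternative in a named capture group, so that the match itself tells you which item fired. The cut-down %items table and the @lines sample below are made up for illustration; they just mimic the shape of the '1.8' table.

```perl
use strict;
use warnings;

# Hypothetical cut-down table, same shape as one version's entry in %KLARF_REGEXP
my %items = (
    LOTID   => qr/LotRecord "(.+)"/i,
    SLOTID  => qr/Field SlotNumber 1 \{(\d+)\}/i,
    DEFECTS => qr/DefectList/i,
);

# Wrap each item regex in a group named after its key: (?<LOTID>...)|(?<SLOTID>...)|...
my $alt = join '|', map { sprintf '(?<%s>%s)', $_, $items{$_} } sort keys %items;
my $rx  = qr/$alt/;

# Made-up sample input
my @lines = (
    'gjhfgljhgwdchgdjlwhgjhdgsljh sg hjhg',
    'LotRecord "Don\'t Look Back"',
    'Field SlotNumber 1 {497562} DefectList',
);

my %result;
for my $line (@lines) {
    while ($line =~ /$rx/g) {
        my ($id) = keys %+;     # %+ lists only the named groups that captured
        my $text = $+{$id};     # copy it before the next match resets %+
        # One re-match against the single regex we now know is the right one
        $result{$id} = $text =~ $items{$id} ? $1 : undef;
    }
}

print "$_ = '", $result{$_} // '', "'\n" for sort keys %result;
```

This still does a second match per hit to dig out the value, but only against one regex rather than trying each in turn. Whether it is actually faster on real files is something to measure, not assume.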


PS: incidentally, I'd look at the regexes and see if:

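The very rough code above makes a mess of dealing with the Field DiePitch line, because the (.+) in that regex is greedy: if anything else containing a } follows on the same line, the capture runs on to the last } instead of stopping at the first. A non-greedy (.+?) or a negated class ([^}]+) would avoid that.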
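To make the greedy problem concrete, here is a small sketch that runs the DiePitch regex three ways against a line where two fields sit together. The sample line is an assumption about the layout; the two alternative patterns are suggestions, not something taken from the original table.

```perl
use strict;
use warnings;

# Assumed layout: two fields sharing one line
my $line = 'Field DiePitch 3 {Beetle} Field SampleCenterLocation 4 {Gravitas}';

# As in the table: greedy .+ backs off only as far as the LAST } on the line
my ($greedy) = $line =~ /Field DiePitch \d \{(.+)\}/i;

# Non-greedy: stops at the first } it can
my ($lazy) = $line =~ /Field DiePitch \d \{(.+?)\}/i;

# Negated class: cannot run past a } at all
my ($negated) = $line =~ /Field DiePitch \d \{([^}]+)\}/i;

print "greedy:  '$greedy'\n";    # 'Beetle} Field SampleCenterLocation 4 {Gravitas'
print "lazy:    '$lazy'\n";      # 'Beetle'
print "negated: '$negated'\n";   # 'Beetle'
```

The negated class is usually the safer of the two, since it cannot be pushed past a } by backtracking, but it does assume the value itself never contains a }.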