I have taken a different approach from cdarke's hash-based one and used regular-expression matching instead. The regular expression is an alternation of the words found in $col1, and a global match against $col2 finds all matches. We are not interested in the text of the matches, just their number, which is what the my $matches = () = ... construct gives us.
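A minimal sketch of those two ideas on their own, using one row of the sample data (the variable names are illustrative). Here the alternation is built with join rather than the local $" trick used in the script, which should produce the same pattern. Note that the pattern has no word boundaries, so VEITSI also matches inside MATTOVEITSI and STANL inside STANLEY.

```perl
use strict;
use warnings;

my $col1 = 'VEITSI STANL 10-010';
my $col2 = 'MATTOVEITSI STANLEY 2-10-099 99E';

# Build an alternation of the quoted column-1 words: VEITSI|STANL|10\-010
my $alternation = join q{|}, map quotemeta, split m{\s+}, $col1;
my $rx = qr{$alternation};

# A list assignment in scalar context returns the number of elements on its
# right-hand side, i.e. the number of global matches found in $col2.
my $matches = () = $col2 =~ m{$rx}g;

# The same match in plain list context returns the matched text instead.
my @found = $col2 =~ m{$rx}g;

print "matches: $matches (@found)\n";    # matches: 2 (VEITSI STANL)
```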
Note that I'm reading your data from a HEREDOC and writing to a scalar variable (both through in-memory filehandles) only to keep everything inside the script on my system. Substitute normal files if you use any of this code.
```perl
use strict;
use warnings;

# Read the sample data from a HEREDOC through an in-memory filehandle.
open my $inFH, q{<}, \ <<EOF or die qq{open: << HEREDOC: $!\n};
SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST. VIIRAOSA (PK-1)
SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS
O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56
O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI
KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT SANDVIK
VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E
VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ STANLEY 0-11-921 (5KPL/PAK) PITT. 49
KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25
EOF

# Write the results to a scalar through another in-memory filehandle.
my $outFile;
open my $outFH, q{>}, \ $outFile or die qq{open: > \$outFile: $!\n};

while ( <$inFH> )
{
    chomp;
    my( $col1, $col2 ) = map uc, split m{;};

    # Words of column 1; their count is the third output field.
    my @col1Words = split m{\s+}, $col1;

    # Alternation of the column-1 words; quotemeta makes any regex
    # metacharacters in the data (e.g. "|") match literally.
    my $rxCol1 = do {
        local $" = q{|};
        qr{@{ [ map quotemeta, @col1Words ] }};
    };

    # Assigning to an empty list in scalar context yields the match count.
    my $matches = () = $col2 =~ m{$rxCol1}g;

    print $outFH join( q{;}, $col1, $col2, scalar @col1Words, $matches ), qq{\n};
}

close $inFH  or die qq{close: << HEREDOC: $!\n};
close $outFH or die qq{close: > \$outFile: $!\n};

print $outFile;
```
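As suggested above, the heredoc and the output scalar can be replaced by normal files. This is a minimal sketch under the assumption that you move the loop into a sub taking the two filehandles; compareColumns is a hypothetical name, not part of the script above, and the input and output file names are placeholders taken from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: the loop from the script above, working on any pair
# of filehandles (real files or in-memory scalars).
sub compareColumns {
    my( $inFH, $outFH ) = @_;
    while ( my $line = <$inFH> ) {
        chomp $line;
        my( $col1, $col2 ) = map uc, split m{;}, $line;
        my @col1Words = split m{\s+}, $col1;
        my $rxCol1 = do {
            local $" = q{|};
            qr{@{ [ map quotemeta, @col1Words ] }};
        };
        my $matches = () = $col2 =~ m{$rxCol1}g;
        print { $outFH } join( q{;}, $col1, $col2, scalar @col1Words, $matches ), qq{\n};
    }
    return;
}

# Usage with real files: perl script.pl input.txt output.txt
# (placeholder names; nothing is processed unless two arguments are given).
if ( @ARGV == 2 ) {
    my( $inFile, $outFile ) = @ARGV;
    open my $inFH,  q{<}, $inFile  or die qq{open: < $inFile: $!\n};
    open my $outFH, q{>}, $outFile or die qq{open: > $outFile: $!\n};
    compareColumns( $inFH, $outFH );
    close $inFH  or die qq{close: < $inFile: $!\n};
    close $outFH or die qq{close: > $outFile: $!\n};
}
```

Because the sub only sees filehandles, the same code can still be fed in-memory data like the heredoc when you want to try it out.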
The output:

```
SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST. VIIRAOSA (PK-1);3;2
SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS;5;3
O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56;3;0
O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI;3;0
KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT SANDVIK;4;3
VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E;3;2
VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ STANLEY 0-11-921 (5KPL/PAK) PITT. 49;5;2
KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25;4;1
```
I hope this is helpful.
Cheers,
JohnGG
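Update: I just noticed the bit about pipes in your data, so I added quotemeta to the regex construction. Without it, a | inside a word would act as an extra alternation branch instead of matching literally.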
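A small sketch of why the quotemeta matters. The word AB|C and the text being searched are made-up values, since the sample data above contains no pipes. Without quoting, AB|C|XYZ is read as three branches, so a lone C counts as a match; with quoting, AB\|C is matched only as a whole.

```perl
use strict;
use warnings;

# Hypothetical column-1 words, one of which contains a pipe character.
my @words = ( 'AB|C', 'XYZ' );

my $unquoted = join q{|}, @words;                   # AB|C|XYZ
my $quoted   = join q{|}, map quotemeta, @words;    # AB\|C|XYZ

my $text = 'CAT AB|C';

# Count global matches using the empty-list assignment idiom.
my $rawCount    = () = $text =~ m{$unquoted}g;
my $quotedCount = () = $text =~ m{$quoted}g;

print "unquoted: $rawCount, quoted: $quotedCount\n";    # unquoted: 3, quoted: 1
```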