
I have taken a different approach from cdarke's hash-based one and have used regular expression matching instead. The regular expression is an alternation of the words found in $col1, and a global match against $col2 finds every occurrence of any of them. We are not interested in the text of the matches, just their number, which is what the my $matches = () = ... construct provides.
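To show the counting idiom on its own, here is a minimal sketch using one row of your data. The variable $alternation is my own name for this illustration and a plain join stands in for the $" trick; otherwise it follows the same idea.

```perl
use strict;
use warnings;

# One row taken from the sample data.
my $col1 = q{VEITSI STANL 10-010};
my $col2 = q{MATTOVEITSI STANLEY 2-10-099 99E};

# Join the escaped words with "|" to form the alternation.
my $alternation = join q{|}, map { quotemeta } split m{\s+}, $col1;
my $rxCol1      = qr{$alternation};

# In list context a global match returns every match. Assigning that
# list to the empty list () and the result to a scalar gives the
# number of matches rather than their text.
my $matches = () = $col2 =~ m{$rxCol1}g;

print qq{$matches\n};    # prints 2 (VEITSI and STANL)
```

The count of 2 agrees with the corresponding line of the output.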

Note that I read your data from a HEREDOC and write to a variable only to keep everything inside the script on my system. Just substitute normal files if you use any of this code.
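If you do swap in real files, the open calls might look something like the sketch below. The processFile helper, its callback argument and the file names in the closing comment are all made up for illustration; only the three-argument open and the checked close follow the style of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: read $inFile line by line, pass each chomped
# line to the $perLine callback and write its return value to $outFile.
sub processFile
{
    my( $inFile, $outFile, $perLine ) = @_;

    open my $inFH, q{<}, $inFile
        or die qq{open: < $inFile: $!\n};
    open my $outFH, q{>}, $outFile
        or die qq{open: > $outFile: $!\n};

    while( my $line = <$inFH> )
    {
        chomp $line;
        print $outFH $perLine->( $line ), qq{\n};
    }

    close $inFH  or die qq{close: < $inFile: $!\n};
    close $outFH or die qq{close: > $outFile: $!\n};
    return;
}

# Example call with made-up file names:
# processFile( q{columns.txt}, q{counted.txt}, sub { uc $_[ 0 ] } );
```

The callback is where the per-line splitting and counting would go.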

use strict;
use warnings;

open my $inFH, q{<}, \ <<EOF or die qq{open: << HEREDOC: $!\n};
SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST.           VIIRAOSA (PK-1)
SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN  VALME HOL0125624 TP 9501    E-SS
O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56
O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI
KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT          SANDVIK
VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E
VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ  STANLEY 0-11-921    (5KPL/PAK)  PITT. 49
KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25
EOF

my $outFile;
open my $outFH, q{>}, \ $outFile
   or die qq{open: > \\\$outFile: $!\n};

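while( <$inFH> )
{
    chomp;
    # Upper-case both columns so that matching ignores case.
    my( $col1, $col2 ) = map uc, split m{;};
    my @col1Words = split m{\s+}, $col1;
    # Alternation of the escaped words from $col1, e.g. SUUTIN|STAMM|2\/60AST
    my $rxCol1 = do
       {
           local $" = q{|};
           qr{@{ [ map quotemeta, @col1Words ] }}
       };
    # Assigning the global match to () and then to a scalar counts the matches.
    my $matches = () = $col2 =~ m{$rxCol1}g;
    print $outFH
       join( q{;}, $col1, $col2, scalar @col1Words, $matches ),
       qq{\n};
}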

close $inFH or die qq{close: << HEREDOC: $!\n};
close $outFH or die qq{close: > \\\$outFile: $!\n};

print $outFile;

The output.

SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST.           VIIRAOSA (PK-1);3;2
SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN  VALME HOL0125624 TP 9501    E-SS;5;3
O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56;3;0
O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI;3;0
KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT          SANDVIK;4;3
VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E;3;2
VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ  STANLEY 0-11-921    (5KPL/PAK)  PITT. 49;5;2
KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25;4;1

I hope this is helpful.

Cheers,

JohnGG

Update: I just noticed the bit about pipes in your data, so I added a quotemeta to the regex.
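To illustrate why the quotemeta matters, here is a small sketch with made-up values in which one word of the first column contains a literal pipe. Without escaping, that pipe becomes an extra alternation bar and the count changes.

```perl
use strict;
use warnings;

# Made-up values: the first word of $col1 contains a literal pipe.
my $col1 = q{A|B C};
my $col2 = q{A B A|B};

my @words = split m{\s+}, $col1;    # ( 'A|B', 'C' )

# Unescaped, the pattern is A|B|C: the data's pipe acts as alternation.
my $rawAlt    = join q{|}, @words;
# Escaped, the pattern is A\|B|C: the data's pipe is matched literally.
my $quotedAlt = join q{|}, map { quotemeta } @words;

my $rawCount    = () = $col2 =~ m{$rawAlt}g;
my $quotedCount = () = $col2 =~ m{$quotedAlt}g;

print qq{raw: $rawCount, quoted: $quotedCount\n};    # raw: 4, quoted: 1
```

The unescaped pattern counts every lone A and B, while the escaped one finds only the single literal A|B.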

In reply to Re^3: comparing columns and printing a result by johngg
in thread comparing columns and printing a result by slartsa
