I have taken a different approach to the hash based one of cdarke and have used regular expression matching instead. The regular expression is an alternation of the words found in $col1 and doing a global match against $col2 will find all matches. We are not interested in the text of the matches, just the number which is what the my $matches = () = ... construct achieves.

Note that I'm only reading your data from a HEREDOC and writing to a variable just to keep everything inside the script on my system. Just substitute normal files if you use some of this code.

use strict; use warnings; open my $inFH, q{<}, \ <<EOF or die qq{open: << HEREDOC: $!\n}; SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST. VIIRAOSA (PK-1 +) SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-S +S O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56 O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT SA +NDVIK VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ STANLEY 0-11-9 +21 (5KPL/PAK) PITT. 49 KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25 EOF my $outFile; open my $outFH, q{>}, \ $outFile or die qq{open: > \ $outFile: $!\n}; while( <$inFH> ) { chomp; my( $col1, $col2 ) = map uc, split m{;}; my $rxCol1 = do { local $" = q{|}; qr{@{ [ map quotemeta, split m{\s+}, $col1 ] }} }; my @col1Words = split m{\s+}, $col1; my $matches = () = $col2 =~ m{$rxCol1}g; print $outFH join( q{;}, $col1, $col2, scalar @col1Words, $matches ), qq{\n}; } close $inFH or die qq{close: << HEREDOC: $!\n}; close $outFH or die qq{close: > \ $outFile: $!\n}; print $outFile;

The output.

SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST. VIIRAOSA (PK-1 +);3;2 SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-S +S;5;3 O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56;3;0 O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI;3;0 KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT SA +NDVIK;4;3 VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E;3;2 VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ STANLEY 0-11-9 +21 (5KPL/PAK) PITT. 49;5;2 KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25;4;1

I hope this is helpful.

Cheers,

JohnGG

Update: I just noticed the bit about pipes in your data so added a quotemeta in the regex.


In reply to Re^3: comparing columns and printing a result by johngg
in thread comparing columns and printing a result by slartsa

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.