Re: comparing columns and printing a result

I am having problems understanding the expected results. For example, how can you get one hit from the last line?

KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25

So far as I can see, none of the "words" in the first column appear in the second. But it does depend on what you mean by a "word". Can you please explain the match criteria?

Update: This is what I came up with:

#!/usr/bin/perl 
use strict;
use warnings;  

my $readfile = 'blah.csv';
my $writefile = 'bleh.csv';

# Note that you were opening $writefile for READ
open my $fh, "<", $readfile or die "Unable to open $readfile: $!";
open my $wfh, ">", $writefile or die "Unable to open $writefile: $!";

foreach (<$fh>) { 
    
    $_ = uc $_;
    chomp;
    
    my ($col1, $col2) = split /;/;
    my @col1_words = split /\s+/, $col1;
    my @col2_words = split /\s+/, $col2;
    
    my %hash;
    @hash{@col1_words} = undef;
    
    my $found = 0;
    for my $word (@col2_words) {
        $found++ if exists $hash{$word}
    }
    
    print $wfh "$_;".@col1_words.";$found\n";
    
}

close ($fh);
close ($wfh);

Which produces:

SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST.           VIIRAOSA (PK-1);3;2
SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN  VALME HOL0125624 TP 9501    E-SS;5;3
O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56;3;0
O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI;3;0
KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT          SANDVIK;4;2
VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E;3;0
VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ  STANLEY 0-11-921    (5KPL/PAK)  PITT. 49;5;0
KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25;4;0

Further update: corrected a typo.

Re^2: comparing columns and printing a result by slartsa (Initiate) on Jan 26, 2009 at 14:15 UTC
I'm sorry, I didn't explain this well. Each word is to be searched for within the other column as a substring, not as an exact word. So here `KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25` you can see that "KAAVAIN" is part of the word "TASOKAAVAIN", so it would count as an occurrence. I tested your code and it appears to find occurrences only if the exact word is found.
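A minimal sketch of the substring criterion described above, assuming a hit means a second-column word that contains at least one first-column word. The helper name `count_substring_hits` is made up for illustration. It uses the built-in `index` rather than a regex, so characters in the data are never treated as pattern syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts the second-column words that contain
# at least one first-column word as a substring.
sub count_substring_hits {
    my ($col1, $col2) = @_;
    my @col1_words = split ' ', uc $col1;
    my @col2_words = split ' ', uc $col2;

    my $found = 0;
    for my $col2_word (@col2_words) {
        # index() returns -1 when the substring is absent
        if (grep { index($col2_word, $_) >= 0 } @col1_words) {
            $found++;
        }
    }
    return $found;
}

# "KAAVAIN" is found inside "TASOKAAVAIN", so this prints 1
print count_substring_hits('KAAVAIN TIZIT 620-20 H10',
                           'TASOKAAVAIN SANDVIK 620-25'), "\n";
```

With this criterion the last line of the sample data gets the one hit asked about at the top of the thread.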
Re^3: comparing columns and printing a result by cdarke (Prior) on Jan 26, 2009 at 14:27 UTC
OK, except I don't see 2 hits on the second line, I see 3:

`SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS`

TP, 9501, and E-SS. This is my version 2:

#!/usr/bin/perl
use strict;
use warnings;

my $readfile = 'blah.csv';
my $writefile = 'bleh.csv';

open my $fh, "<", $readfile or die "Unable to open $readfile: $!";
open my $wfh, ">", $writefile or die "Unable to open $writefile: $!";

foreach (<$fh>) {

    $_ = uc $_;
    chomp;

    my ($col1, $col2) = split /;/;
    my @col1_words = split /\s+/, $col1;
    my @col2_words = split /\s+/, $col2;

    my $found = 0;
    # Build an alternation of all the first-column words
    my $pattern = join ('|', @col1_words);

    for my $col2_word (@col2_words) {
        $found++ if $col2_word =~ /$pattern/;
    }

    print $wfh "$_;".@col1_words.";$found\n";

}

close ($fh);
close ($wfh);
Re^4: comparing columns and printing a result by slartsa (Initiate) on Jan 26, 2009 at 14:42 UTC
Yeah, there were three of them, my mistake, I'm sorry! Oh no, I have pipes within my list, so running your program gives me an error, but I can always substitute them. Thank you very much! I didn't really ask for a complete program, but apparently my code was so screwed up that it was the easiest approach... I need to look at your code a bit further so I might understand it some day. Thank you!
Re^5: comparing columns and printing a result by cdarke (Prior) on Jan 26, 2009 at 15:57 UTC
Re^4: comparing columns and printing a result by slartsa (Initiate) on Jan 27, 2009 at 07:25 UTC
Oops, a problem: earlier I said there would be pipes within the text, but the program also seems to have problems when there are unclosed brackets or a "+" in the text, and they occur rather often :( I understand your code now, but I don't know how I could fix this issue. I could easily just replace those characters, but I need to keep the original file as is.
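One way to handle this without touching the input file is to escape each word before it goes into the pattern. The following is a minimal sketch, assuming the counting logic of version 2 is otherwise what is wanted. `quotemeta` is a Perl built-in; the helper name `count_hits` and the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same counting logic as version 2, but every
# first-column word is escaped with quotemeta so that "|", "(", "+"
# and similar characters match literally instead of acting as
# regex syntax.
sub count_hits {
    my ($col1, $col2) = @_;

    # Drop empty fields; an empty alternative would match every word
    my @col1_words = grep { length } split /\s+/, uc $col1;
    my @col2_words = split /\s+/, uc $col2;
    return 0 unless @col1_words;

    my $pattern = join '|', map { quotemeta($_) } @col1_words;
    my $re = qr/$pattern/;

    my $found = 0;
    for my $col2_word (@col2_words) {
        $found++ if $col2_word =~ $re;
    }
    return $found;
}

# The unclosed bracket and the "+" no longer cause a regex error
print count_hits('VIIRAOSA (PK-1 A+B', 'SUUTIN VIIRAOSA (PK-1)'), "\n";
```

Only the pattern is escaped, so the lines written to the output file keep their original characters.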