I'm sorry, I didn't explain this well. The word is to be searched for within the other column as a substring, not as an exact word. For example, here
KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25
you can see that "KAAVAIN" is part of the word "TASOKAAVAIN", so it would count as an occurrence. I tested your code, and it appears to find occurrences only when the exact word is found.
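The difference between the two kinds of match can be sketched with Perl's built-in index, which returns the position of a substring inside a string, or -1 if it is absent. The sub names exact_match and substring_match are hypothetical, chosen only for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true only if the two words are identical
sub exact_match {
    my ($needle, $word) = @_;
    return $word eq $needle;
}

# Hypothetical helper: true if $needle occurs anywhere inside $word
sub substring_match {
    my ($needle, $word) = @_;
    return index($word, $needle) >= 0;
}

for my $word (qw(TASOKAAVAIN KAAVAIN SANDVIK)) {
    printf "%-12s exact=%d substring=%d\n", $word,
        exact_match('KAAVAIN', $word)     ? 1 : 0,
        substring_match('KAAVAIN', $word) ? 1 : 0;
}
```

With these words, TASOKAAVAIN counts only under substring_match. Because index does no regex interpretation at all, it also ignores special characters in the data, at the cost of one call per pair of words.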

Re^3: comparing columns and printing a result
by cdarke (Prior) on Jan 26, 2009 at 14:27 UTC
    OK, except I don't see 2 hits on the second line; I see 3:
    SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS
    TP, 9501, and E-SS.

    This is my version 2:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $readfile  = 'blah.csv';
    my $writefile = 'bleh.csv';

    open my $fh,  "<", $readfile  or die "Unable to open $readfile: $!";
    open my $wfh, ">", $writefile or die "Unable to open $writefile: $!";

    foreach (<$fh>) {
        $_ = uc $_;
        chomp;
        my ($col1, $col2) = split /;/;
        my @col1_words = split /\s+/, $col1;
        my @col2_words = split /\s+/, $col2;

        # Count the column-2 words that contain any column-1 word
        my $found   = 0;
        my $pattern = join ('|', @col1_words);    # word|word|word
        for my $col2_word (@col2_words) {
            $found++ if $col2_word =~ /$pattern/;
        }

        # Line, number of column-1 words, number of hits
        print $wfh "$_;" . @col1_words . ";$found\n";
    }
    close ($fh);
    close ($wfh);
      Yeah, there were three of them, my mistake, sorry! Oh noes, I have pipes within my list, so running your program gives me an error, but I can always substitute them. Thank you very much! I didn't really ask for a complete program, but apparently my code was so screwed up that it was the easiest approach... I need to take a closer look at your code so I might understand it some day. Thank you!
        The 'pipes' are part of the regular expression OR syntax. Using join in this way is a common trick to generate a list of alternatives: word|word|word|word. It should not be over-used, because a long list of alternatives can be slow: I assumed that the number of alternatives was similar to your examples.
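        A minimal sketch of the join trick applied to the second sample line, assuming the words contain no regex metacharacters (true for this line, since '-' is literal outside a character class). count_hits is a hypothetical name, and splitting on ' ' instead of /\s+/ is a swapped-in choice that skips leading whitespace:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the sample line below contains a non-ASCII letter

# Hypothetical helper: count the words of $col2 that contain at least one
# word of $col1, using a single alternation such as SPRAY|TP|9501.
# Assumes the words contain no regex metacharacters.
sub count_hits {
    my ($col1, $col2) = @_;
    my @col1_words = split ' ', $col1;    # ' ' skips leading whitespace
    my @col2_words = split ' ', $col2;
    return 0 unless @col1_words;          # no words: nothing to search for
    my $pattern = join '|', @col1_words;  # word|word|word
    my $re = qr/$pattern/;                # compile the alternation once
    return scalar grep { $_ =~ $re } @col2_words;
}

my $line = 'SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS';
my ($col1, $col2) = split /;/, $line;
print count_hits($col1, $col2), "\n";
```

        For this line it reports the three hits TP, 9501 and E-SS.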
        A pipe in your data does not clash with the join as such. However, any special RE character from the data that ends up inside the RE has to be 'escaped' (there are several ways of doing that: \Q...\E around the interpolated text, or quotemeta; the escaped pattern can then be precompiled with qr).
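        A sketch of the escaping, applied only to the pattern built in memory. The sample words are made up to contain |, +, an unclosed ( and an unclosed [; build_pattern is a hypothetical helper:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one regex that matches any of the given words
# literally. quotemeta backslash-escapes every non-word character, so
# '|', '+', '(' and '[' from the data lose their special meaning.
sub build_pattern {
    my @words = @_;
    my $alternation = join '|', map { quotemeta($_) } @words;
    return qr/$alternation/;
}

# Made-up words containing regex metacharacters
my $re = build_pattern('A|B', 'C++', '(D', 'E[1');

for my $candidate ('XA|BX', 'C++17', 'ABC', 'X(DX', 'E[12') {
    printf "%-6s %s\n", $candidate, $candidate =~ $re ? 'hit' : 'miss';
}
```

        When matching one word at a time, the same effect is available inline as /\Q$word\E/.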
      Oops, another problem: earlier I mentioned pipes within the text, but the program also fails when it finds unclosed brackets or "+", and they occur rather often :( I understand your code now, but I don't know how to fix this issue. I could easily just replace those characters, but I need to keep the original file as is.
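      One way to handle this without touching the data, sketched under the assumption that the output format stays "line;word count;hits": escape only the copy of the words that goes into the pattern, and match against an upper-cased copy so the line written out keeps its original spelling. process_line is a hypothetical name, and the second sample line is made up to contain + and an unclosed bracket:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "original line;number of column-1 words;hits".
# Matching is done on an upper-cased copy, so $line itself is never changed.
sub process_line {
    my ($line) = @_;
    my ($col1, $col2) = split /;/, uc $line;
    $col1 = '' unless defined $col1;    # empty line
    $col2 = '' unless defined $col2;    # line without a ';'
    my @col1_words = split ' ', $col1;
    my @col2_words = split ' ', $col2;

    my $found = 0;
    if (@col1_words) {
        # quotemeta makes '|', '+', '(' and '[' match literally
        my $alternation = join '|', map { quotemeta($_) } @col1_words;
        my $re = qr/$alternation/;
        $found = grep { $_ =~ $re } @col2_words;    # count in scalar context
    }
    return join ';', $line, scalar(@col1_words), $found;
}

for my $line ('KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25',
              'Bolt M6+ (A2;BOLT M6+ (A2) ZINC') {    # second line is made up
    print process_line($line), "\n";
}
```

      In version 2's loop, the body could then shrink to chomp followed by print $wfh process_line($_), "\n"; the input file is only read, never rewritten.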