in reply to Re^2: comparing columns and printing a result
in thread comparing columns and printing a result

OK, except I don't see 2 hits on the second line, I see 3:
SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS
TP, 9501, and E-SS.

This is my version 2:
#!/usr/bin/perl
use strict;
use warnings;

my $readfile  = 'blah.csv';
my $writefile = 'bleh.csv';

open my $fh,  "<", $readfile  or die "Unable to open $readfile: $!";
open my $wfh, ">", $writefile or die "Unable to open $writefile: $!";

foreach (<$fh>) {
    $_ = uc $_;
    chomp;
    my ($col1, $col2) = split /;/;
    my @col1_words = split /\s+/, $col1;
    my @col2_words = split /\s+/, $col2;
    my $found = 0;
    my $pattern = join ('|', @col1_words);
    for my $col2_word (@col2_words) {
        $found++ if $col2_word =~ /$pattern/;
    }
    print $wfh "$_;".@col1_words.";$found\n";
}

close ($fh);
close ($wfh);

Re^4: comparing columns and printing a result
by slartsa (Initiate) on Jan 26, 2009 at 14:42 UTC
    Yeah, there were three of them, my mistake, I'm sorry! Oh noes, I have pipes within my list, so running your program gives me an error, but I can always substitute them. Thank you very much! I didn't really ask for a complete program, but apparently my code was so screwed up that this was the easiest approach... I need to take a closer look at your code so I might understand it some day. Thank you!
      The 'pipes' are part of the regular expression OR (alternation) syntax. Using join in this way is a common trick to generate a list of alternatives: word|word|word|word. It should not be overused, because a long list of alternatives can be slow; I assumed that the number of alternatives was similar to your examples.
      Having this character in your data should not cause a problem by itself. However, any special RE character from the data that finds itself inside the RE has to be 'escaped' (there are several ways of doing that, including \Q...\E and quotemeta, optionally combined with qr to precompile the result). The escaping has to be applied to each word before the join, otherwise the '|' separators themselves get escaped as well.
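A minimal sketch of the escaping described above, assuming the words are escaped one by one with quotemeta before they are joined. The sample word lists are made up for illustration and are not taken from the original data.

```perl
use strict;
use warnings;

# Hypothetical sample words, some containing regex metacharacters.
my @col1_words = ('TP', '9501', 'E-SS', 'A|B', '(X', 'C+');
my @col2_words = ('SUUTIN', 'TP', '9501', 'E-SS', '(X', 'C+');

# quotemeta backslash-escapes every ASCII non-word character, so each
# word is matched literally once it is interpolated into the pattern.
# Only the '|' added by join keeps its special meaning.
my $pattern = join '|', map { quotemeta } @col1_words;
my $re      = qr/$pattern/;

# grep in scalar context returns the number of matching elements.
my $found = grep { $_ =~ $re } @col2_words;
print "$found\n";    # prints 5
```

Without the quotemeta step, the '(X' word alone would make the pattern fail to compile, which is the kind of error reported in this thread.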
Re^4: comparing columns and printing a result
by slartsa (Initiate) on Jan 27, 2009 at 07:25 UTC
    Oops, problem: earlier I said there would be pipes within the text, but the program also seems to have problems when there are unclosed brackets or a "+" in the text, and they occur rather often :( I understand your code now, but I don't know how I could fix this issue. I could easily just replace those characters, but I need to keep the original file as is.
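One possible way around this without touching the input data, sketched under the assumption that a plain substring test is all that is needed: index() does no pattern matching, so brackets, "+" and "|" are treated as ordinary characters. The count_hits helper and the sample line are hypothetical, not part of the program posted above.

```perl
use strict;
use warnings;

# Count the words of column 2 that contain any word of column 1 as a
# literal substring. No regex is built from the data, so nothing in
# the data needs escaping or replacing.
sub count_hits {
    my ($line) = @_;
    my ($col1, $col2) = split /;/, uc($line);
    my @col1_words = grep { length } split /\s+/, $col1;
    my @col2_words = grep { length } split /\s+/, $col2;

    my $found = 0;
    for my $col2_word (@col2_words) {
        # index() returns -1 when the substring is absent.
        $found++ if grep { index($col2_word, $_) >= 0 } @col1_words;
    }
    return (scalar @col1_words, $found);
}

# Hypothetical input line containing regex metacharacters.
my $line = 'nozzle (tp 9501 e+ss|x;nozzle holder (tp 9501 e+ss|x';
my ($words, $found) = count_hits($line);
print "$line;$words;$found\n";
```

The sketch assumes every line contains a ";" separator. The grep { length } filters are an addition that drops empty strings produced by leading whitespace, which would otherwise match every word.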