in reply to Re: comparing columns and printing a result
in thread comparing columns and printing a result

This is an example of the source file:
SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST. VIIRAOSA (PK-1)
SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS
O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56
O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI
KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT SANDVIK
VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E
VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ STANLEY 0-11-921 (5KPL/PAK) PITT. 49
KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25
This is supposed to be the goal:
SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST. VIIRAOSA (PK-1);3;2
SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS;5;2
O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56;3;0
O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI;3;0
KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT SANDVIK;4;3
VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E;3;2
VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ STANLEY 0-11-921 (5KPL/PAK) PITT. 49;5;2
KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25;4;1
The structure is: "word1 word2 word3;bla hword1bla word2h blah;3;2", where 3 is the number of words in the 1st column and 2 is the number of those words that found a match in the 2nd column.

Ok, I made changes to the code, which now looks like this:
#!perl
use strict;
use warnings;

my $readfile = 'blah.csv';
my $writefile = 'bleh.csv';
open my $fh, "<", $readfile;
my $row = <$fh>;
my $found = 0;
my @cols;
my (@col1,@col2);
my @words = split(/\s+/,@col1);
open my $wfh, ">", $writefile or die "yikes: $!";
while (<$fh>) {
    tr /a-z/A-Z/;
    chomp;
    my @cols = split /\;/;
    push @col1, $cols[0];
    push @col2, $cols[1];
    my @words = @words + 1;
    if ( @col2 =~ m/$words[$_](\d+)/ ) {
        $found++;
    }
    print $wfh "$row';'@words';'$found";
    $found = 0;
    @words = 0;
}
Now I get this report:
Applying pattern match (m//) to @array will act on scalar(@array) at C:\blah\vertailu2.pl line 27.
Argument "LSAIDHA 2FA SFF ;ASD 2FA AASDA" isn't numeric in array element at C:\blah\vertailu2.pl line 27, <$fh>
Argument "3FASFL FAAL;DAOIADJAD" isn't numeric in array element at C:\blah\vertailu2.pl line 27, <$fh> line
Argument "ASFD ADD AD7A ALUYAD;ADLIHADBA A DADASFD DADD" isn't numeric in array element at C:\blah\vertailu2.pl line 27, <$fh> line 4.
I'm guessing I should somehow tell the program to handle both numbers and letters, but I don't know how to do that.

Re^3: comparing columns and printing a result
by johngg (Canon) on Jan 26, 2009 at 16:16 UTC

    I have taken a different approach from cdarke's hash-based one and have used regular expression matching instead. The regular expression is an alternation of the words found in $col1, and a global match of it against $col2 finds every occurrence. We are not interested in the text of the matches, just in how many there are, which is what the my $matches = () = ... construct gives us.
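
The counting construct can be looked at in isolation. The following is a minimal sketch, assuming two made-up column values (they are not taken from the thread's data): it builds the alternation from the words of the first column and counts the global matches in the second.

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen only for illustration.
my $col1 = 'RED GREEN BLUE';
my $col2 = 'DARK RED AND LIGHT BLUE';

# Build the alternation RED|GREEN|BLUE from the words of column 1.
my $alternation = join q{|}, map { quotemeta $_ } split m{\s+}, $col1;
my $rx = qr{$alternation};

# Assigning to the empty list () puts the global match in list context;
# evaluating that list assignment in scalar context yields the number
# of items the match returned, i.e. the number of matches.
my $matches = () = $col2 =~ m{$rx}g;

print "$matches\n";    # prints 2 (RED and BLUE)
```

The matched strings themselves are discarded; only their count ends up in $matches.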

    Note that I'm only reading your data from a HEREDOC and writing to a variable just to keep everything inside the script on my system. Just substitute normal files if you use some of this code.

    use strict;
    use warnings;

    # Read the sample data from an in-memory HEREDOC instead of a file.
    open my $inFH, q{<}, \ <<EOF or die qq{open: << HEREDOC: $!\n};
    SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST. VIIRAOSA (PK-1)
    SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS
    O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56
    O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI
    KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT SANDVIK
    VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E
    VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ STANLEY 0-11-921 (5KPL/PAK) PITT. 49
    KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25
    EOF

    # Collect the output in a scalar instead of writing a file.
    my $outFile;
    open my $outFH, q{>}, \ $outFile or die qq{open: > \ $outFile: $!\n};

    while( <$inFH> )
    {
        chomp;
        my( $col1, $col2 ) = map uc, split m{;};

        # Alternation of the quoted words in column 1.
        my $rxCol1 = do {
            local $" = q{|};
            qr{@{ [ map quotemeta, split m{\s+}, $col1 ] }}
        };
        my @col1Words = split m{\s+}, $col1;

        # Count the global matches in column 2 without keeping them.
        my $matches = () = $col2 =~ m{$rxCol1}g;
        print $outFH
            join( q{;}, $col1, $col2, scalar @col1Words, $matches ),
            qq{\n};
    }

    close $inFH or die qq{close: << HEREDOC: $!\n};
    close $outFH or die qq{close: > \ $outFile: $!\n};

    print $outFile;

    The output.

    SUUTIN STAMM 2/60AST;SUUTIN STAMM 2,0/60 AST. VIIRAOSA (PK-1);3;2
    SUUTINKÄRKI SPRAY TP 9501 E-SS;SUUTIN VALME HOL0125624 TP 9501 E-SS;5;3
    O-RENGAS 790X12 FPM;UPPOPUMPPU PUMPEX KV56;3;0
    O-RENGAS 99X7 FPM;RULLAUSPÄÄ B NORMAALI;3;0
    KÄSISAHA SANDV 2600-22-XT L=22IN;KÄSISAHA 22 IN 2600-22-XT SANDVIK;4;3
    VEITSI STANL 10-010;MATTOVEITSI STANLEY 2-10-099 99E;3;2
    VEITSENTERÄ STANL 11-916 L=62MM SUORA;MATTOVEITSENTERÄ STANLEY 0-11-921 (5KPL/PAK) PITT. 49;5;2
    KAAVAIN TIZIT 620-20 H10;TASOKAAVAIN SANDVIK 620-25;4;1
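
For use with normal files rather than a HEREDOC and a scalar, the same loop might be wrapped roughly as follows. This is only a sketch under some assumptions: the sub name annotate_file is hypothetical, the two file names are expected on the command line, lines without a second column are skipped, and split q{ } is swapped in for split m{\s+} so that leading whitespace cannot produce an empty word.

```perl
use strict;
use warnings;

# Hypothetical helper: read "col1;col2" records from $inFile and write
# "col1;col2;words-in-col1;matches-in-col2" records to $outFile.
sub annotate_file {
    my( $inFile, $outFile ) = @_;

    open my $inFH, q{<}, $inFile
        or die qq{open: < $inFile: $!\n};
    open my $outFH, q{>}, $outFile
        or die qq{open: > $outFile: $!\n};

    while ( my $line = <$inFH> ) {
        chomp $line;
        my( $col1, $col2 ) = map { uc $_ } split m{;}, $line;

        # Assumption: records without a second column are ignored.
        next unless defined $col2;

        # split q{ } drops leading whitespace, so no empty words.
        my @col1Words = split q{ }, $col1;
        next unless @col1Words;

        # Alternation of the quoted words, then count global matches.
        my $alternation = join q{|}, map { quotemeta $_ } @col1Words;
        my $matches = () = $col2 =~ m{$alternation}g;

        print {$outFH}
            join( q{;}, $col1, $col2, scalar @col1Words, $matches ),
            qq{\n};
    }

    close $inFH  or die qq{close: < $inFile: $!\n};
    close $outFH or die qq{close: > $outFile: $!\n};
    return;
}

# Only runs when both file names are given on the command line.
annotate_file( @ARGV ) if @ARGV == 2;
```

Called as, say, perl script.pl blah.csv bleh.csv (the file names from the question), it would write the annotated records to the second file.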

    I hope this is helpful.

    Cheers,

    JohnGG

    Update: I just noticed the bit about pipes in your data so added a quotemeta in the regex.
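
A small illustration of why the quotemeta matters, using made-up words that contain regex metacharacters (they are assumptions, not values from the thread): without quoting, a pipe inside a word splits it into two alternatives and a dot matches any character, so the count is inflated.

```perl
use strict;
use warnings;

# Hypothetical words containing regex metacharacters.
my @words = ( 'A|B', '2.0' );
my $text  = 'A B A|B 2X0 2.0';

# Unquoted: the pattern is A|B|2.0, so "A" and "B" match on their
# own and "2.0" also matches "2X0".
my $raw      = join q{|}, @words;
my $rawCount = () = $text =~ m{$raw}g;

# Quoted: the pattern is A\|B|2\.0, so only the literal words match.
my $safe      = join q{|}, map { quotemeta $_ } @words;
my $safeCount = () = $text =~ m{$safe}g;

print "raw: $rawCount, quoted: $safeCount\n";    # prints "raw: 6, quoted: 2"
```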