
Re^14: Addional "year" matching functionality in word matching script

by bms9nmh (Novice)
on Jun 30, 2016 at 15:15 UTC


in reply to Re^13: Addional "year" matching functionality in word matching script
in thread Addional "year" matching functionality in word matching script

OK, I've worked out what the initial part of the script does, and I've made some additions that I'll post separately once I understand the final part. I just need some help with that part before I try to put everything I've learned together. I've added comments in the code below marking the bits I'm confused about. This is the last section of the script, which does the matching.
@titlewords = @new; #switch the @new array back to the name @titlewords now that the exceptions are in place
my $desired = 5; # Desired matching number of words
my $matched = 0; # Why is this set to 0? How does it change during the comparison
foreach my $csv2 (keys %csv2hash) {
    my $count = 0; #Again why is this set to 0 at this point? I can see that it's used later and compared to $desired, but how does it increase in size past 0 during the operation?
    my $value = $csv2hash{$csv2}; # How does this represent the value? There doesn't seem to be any code which counts the words here?
    foreach my $word (@titlewords) {
        my @matches = ( $value=~/\b$word\b/ig );
        my $numIncsv2 = scalar(@matches);
        @matches = ( $title=~/\b$word\b/ig );
        my $numIncsv1 = scalar(@matches);
        ++$count if $value =~ /\b$word\b/i;
        if ($count >= $desired || ($numIncsv1 >= $desired && $numIncsv2 >= $desired)) {
            $count = $desired+1;
            last;
        }
    }
    if ($count >= $desired) {
        print "$csv2\n";
        ++$matched;
    }
}
print "$_\n\n" if $matched;
}
close CSV1;
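On the `$value` question, a minimal sketch may help. It assumes, as in the setup code of the original post, that `%csv2hash` maps each full csv2 line to the title extracted from it by a regex. Under that assumption `$value` is already the title text, and nothing is counted at that point. The helper name `extract_title` and the sample line are illustrative and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the second comma-separated field (the title),
# using the same regex as the original script.
sub extract_title {
    my ($line) = @_;
    my ($title) = $line =~ /^.+?,\s*([^,]+?),/;
    return $title;
}

# Illustrative csv2-style line (id, title, url, number).
my $line = '12278788, DOCTOR WHO RADIO TIMES, http://www.example.co.uk, 12';

# The key is the whole line; the value is just the title.
my %csv2hash = ( $line => extract_title($line) );

foreach my $csv2 (keys %csv2hash) {
    my $value = $csv2hash{$csv2};    # the title string, not a count
    print "key:   $csv2\n";
    print "value: $value\n";
}
```

The word counting only happens later, inside the inner `foreach my $word` loop, where each title word is tested against `$value`.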

Replies are listed 'Best First'.
Re^15: Addional "year" matching functionality in word matching script
by Cow1337killr (Monk) on Jul 01, 2016 at 02:30 UTC

    Here is the program from your original post, unchanged except for numerous added print statements.

    #!/usr/bin/perl
    # match5.pl    perl match5.pl    Test the entire program.
    # From http://www.perlmonks.org/?node_id=1166649
    use strict;
    use warnings;

    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
    print 'The program has started.', "\n";  # This code is for testing.

    my @csv2 = ();
    open CSV2, "<csv2" or die;
    @csv2=<CSV2>;
    close CSV2;

    my %csv2hash = ();
    for (@csv2) {
      chomp;
      my ($title) = $_ =~ /^.+?,\s*([^,]+?),/; #/ match the title
      $csv2hash{$_} = $title;
    }

    open CSV1, "<csv1" or die;
    while (<CSV1>) {
      chomp;
      my ($title) = $_ =~ /^.+?,\s*([^,]+?),/; #/ match the title
      my %words;
      $words{$_}++ for split /\s+/, $title; #/ get words
      ## Collect unique words
      my @titlewords = keys(%words);
      my @new; #add exception words which shouldn't be matched
      foreach my $t (@titlewords){
        push(@new, $t) if $t !~ /^(rare|vol|volume|issue|double|magazine|mag)$/i;
      }
      print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
      print '@new: ', join(", ", @new), "\n";  # This code is for testing.
      @titlewords = @new;
      my $desired = 5;
      my $matched = 0;
      foreach my $csv2 (keys %csv2hash) {
        print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
        print 'xxxxxxxxxxxxxxxxxxxxxxxx At the top of the foreach my $csv2 (keys %csv2hash) { outer loop xxxxxxxxxxxxxxxxxxxxxxxx', "\n";  # This code is for testing.
        my $count = 0;
        my $value = $csv2hash{$csv2};
        print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
        print '$value: ', $value, "\n";  # This code is for testing.
        foreach my $word (@titlewords) {
          print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
          print 'xxxxxxxxxxxxxxxxxxxxxxxx At the top of the foreach my $word (@titlewords) { inner loop xxxxxxxxxxxxxxxxxxxxxxxx', "\n";  # This code is for testing.
          my @matches = ( $value=~/\b$word\b/ig );
          print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
          print '@matches: ', join(", ", @matches), "\n";  # This code is for testing.
          my $numIncsv2 = scalar(@matches);
          print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
          print '$numIncsv2: ', $numIncsv2, "\n";  # This code is for testing.
          @matches = ( $title=~/\b$word\b/ig );
          print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
          print '@matches: ', join(", ", @matches), "\n";  # This code is for testing.
          my $numIncsv1 = scalar(@matches);
          print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
          print '$numIncsv1: ', $numIncsv1, "\n";  # This code is for testing.
          ++$count if $value =~ /\b$word\b/i;
          print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
          print '$count: ', $count, "\n";  # This code is for testing.
          if ($count >= $desired || ($numIncsv1 >= $desired && $numIncsv2 >= $desired)) {
            $count = $desired+1;
            print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
            print '$count: ', $count, "\n";  # This code is for testing.
            last;
          }
        }
        if ($count >= $desired) {
          print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
          print "$csv2\n";
          ++$matched;
        }
      }
      print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
      print "$_\n\n" if $matched;
    }
    close CSV1;

    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
    print 'The program has ended.', "\n";  # This code is for testing.
    __END__

    Here is the input file named csv2.

    12278788, TV & SATELLITE WEEK 11 MAY GILLIAN ANDERSON DOCTOR WHO NOT RADIO TIMES , http://www.example.co.uk, 12

    Here is the input file named csv1.

    2523021356, RARE TV RADIO TIMES MAGAZINE DOCTOR WHO THE THREE 3 DOCTORS DR JON PERTWEE, http://www.example.co.uk, 12

    Here is the output.

    Feel free to ask further questions.
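If the full trace is too noisy, the counting step can be looked at in isolation. The sketch below is a simplified stand-in, not the original script. `count_shared_words` is a hypothetical helper, the two titles are taken from the sample csv1 and csv2 rows, and `\Q...\E` is an addition that stops regex metacharacters in a word from being interpreted. It leaves out the `$numIncsv1`/`$numIncsv2` check and the early `last`. The point is that `$count` starts at 0 for every csv2 row and goes up by one each time a csv1 title word is found in that row's title.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many distinct words of $title also
# appear, as whole words, in $value.
sub count_shared_words {
    my ($title, $value) = @_;

    # Unique words of the csv1 title, minus the script's exception words.
    my %seen;
    my @titlewords = grep { !$seen{$_}++ }
                     grep { !/^(rare|vol|volume|issue|double|magazine|mag)$/i }
                     split /\s+/, $title;

    my $count = 0;    # fresh counter for this one comparison
    foreach my $word (@titlewords) {
        # \Q...\E quotes regex metacharacters; an addition in this sketch.
        ++$count if $value =~ /\b\Q$word\E\b/i;
    }
    return $count;
}

my $title = 'RARE TV RADIO TIMES MAGAZINE DOCTOR WHO THE THREE 3 DOCTORS DR JON PERTWEE';
my $value = 'TV & SATELLITE WEEK 11 MAY GILLIAN ANDERSON DOCTOR WHO NOT RADIO TIMES';

my $desired = 5;
my $count   = count_shared_words($title, $value);
print "shared words: $count\n";
print "match\n" if $count >= $desired;
```

With these two titles the shared words are TV, RADIO, TIMES, DOCTOR and WHO, so the count reaches 5 and the rows are treated as a match. `$matched` works the same way one level up: it is reset to 0 for each csv1 line and incremented once for every csv2 row that reaches `$desired`.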

      edit**
        Actually it is starting to make more sense the longer I look at it.
