
OK, cheers. This is my last question for a while, while I try to learn some of this stuff. I'm currently watching some YouTube videos on hash functions, but what does the {$_} bit do in the $csv2hash{$_} = $title part?

Re^13: Addional "year" matching functionality in word matching script
by Cow1337killr (Monk) on Jun 28, 2016 at 20:36 UTC

    See http://perlmaven.com/the-default-variable-of-perl, for example. (There are many other webpages that one can visit to get similar tutorials. This one was at the top of the Google search results.)

    Also, you can Google "perl $_".

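    The short answer is that it is a special variable in Perl.

    Let us say it is a very special variable in Perl: it is the default variable, which many built-in functions and loops use when you do not name a variable explicitly.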
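    A minimal sketch of $_ as the default variable. The array @lines and its contents are made-up sample data, not from your program. The for loop, chomp, and the pattern match all use $_ because no other variable is named.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data (hypothetical, for illustration only).
my @lines = ("apple\n", "banana\n", "cherry\n");
my @seen;

for (@lines) {          # no loop variable named, so each element is aliased to $_
    chomp;              # same as chomp($_)
    next unless /an/;   # same as $_ =~ /an/
    push @seen, $_;     # $_ holds the chomped element
}

print "$_\n" for @seen;   # print with an explicit $_, one element per line
```

    Because the for loop aliases $_ to each element, chomp changes the array itself. The same thing happens in the for (@csv2) loop in your program.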

    For example, it comes into play when reading files: in while (<CSV1>) { ... }, each line read from the file is put into $_.
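    A sketch of the file-reading case. To keep it self-contained, it reads from an in-memory filehandle opened on a string, which is an assumption standing in for the real csv1 file; the two records are the sample lines from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a real file: an in-memory filehandle opened on a string.
my $data = "2523021356, RARE TV RADIO TIMES MAGAZINE DOCTOR WHO THE THREE 3 DOCTORS DR JON PERTWEE, http://www.example.co.uk, 12\n"
         . "12278788, TV & SATELLITE WEEK 11 MAY GILLIAN ANDERSON DOCTOR WHO NOT RADIO TIMES , http://www.example.co.uk, 12\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @ids;
while (<$fh>) {            # short for: while (defined($_ = <$fh>))
    chomp;                 # removes the newline from $_
    my ($id) = /^(\d+),/;  # matches against $_ and captures the leading number
    push @ids, $id;
}
close $fh;

print "@ids\n";
```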

    Some beginners jump through hoops just to avoid using it.

    In the case of your program, the record that is read from the file ends up in $_. In the loop for (@csv2) { ... }, $_ is set to each line of csv2 in turn, so $csv2hash{$_} = $title uses the whole line as the hash key and stores the title as its value. I had to mimic that behavior in my test program. I put the data (i.e., the one record) into a variable called $record just because I like the descriptive name $record. I could have named it $milkshake, but I didn't. Then I said, "Oh, the program expects this data to be in the special variable $_," so I put $record into $_. Otherwise, the rest of the code is from your program: I took a section of your code out, made another program from it, and tested it to make sure that it does what I think it does.
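    A minimal sketch of what $csv2hash{$_} = $title builds, using the sample csv2 record from this thread. The array is filled directly instead of with @csv2 = <CSV2>; that is an assumption made to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = "12278788, TV & SATELLITE WEEK 11 MAY GILLIAN ANDERSON DOCTOR WHO NOT RADIO TIMES , http://www.example.co.uk, 12\n";
my @csv2 = ($record);   # stands in for @csv2 = <CSV2>;

my %csv2hash;
for (@csv2) {                                  # each line in turn is aliased to $_
    chomp;                                     # strip the newline from $_
    my ($title) = $_ =~ /^.+?,\s*([^,]+?),/;  # capture the second field (the title)
    $csv2hash{$_} = $title;                    # key: the whole line; value: the title
}

for my $key (keys %csv2hash) {
    print "key:   [$key]\n";
    print "value: [$csv2hash{$key}]\n";       # brackets make the trailing space visible
}
```

    So the key is the whole record and the value is just the title. With this record the title keeps a trailing space, because the csv2 line has a space before the comma that follows the title.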

      OK, I've deciphered what the initial part of the script does, and I've added some things to it, which I will post separately once I understand this last bit of the script. I just need some help with the last bit before I try to put everything I've learned together. I've put some comments in the code below about the bits I'm confused about. This is the last bit of the script, which does the match.
          @titlewords = @new; #switch the @new array back to the name @titlewords now that the exceptions are in place
          my $desired = 5; # Desired matching number of words
          my $matched = 0; # Why is this set to 0? How does it change during the comparison
          foreach my $csv2 (keys %csv2hash) {
              my $count = 0; #Again why is this set to 0 at this point? I can see that it's used later and compared to $desired, but how does it increase in size past 0 during the operation?
              my $value = $csv2hash{$csv2}; # How does this represent the value? There doesn't seem to be any code which counts the words here?
              foreach my $word (@titlewords) {
                  my @matches = ( $value=~/\b$word\b/ig );
                  my $numIncsv2 = scalar(@matches);
                  @matches = ( $title=~/\b$word\b/ig );
                  my $numIncsv1 = scalar(@matches);
                  ++$count if $value =~ /\b$word\b/i;
                  if ($count >= $desired || ($numIncsv1 >= $desired && $numIncsv2 >= $desired)) {
                      $count = $desired+1;
                      last;
                  }
              }
              if ($count >= $desired) {
                  print "$csv2\n";
                  ++$matched;
              }
          }
          print "$_\n\n" if $matched;
      }
      close CSV1;
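      A small sketch of how $count grows, using the two sample titles from this thread. It keeps only the counting part of the inner loop and drops the $numIncsv1/$numIncsv2 shortcut and the early last; @hits is a hypothetical extra variable added only to show which words matched. It also uses \Q...\E so that a word containing regex metacharacters is matched literally, which the original code does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Titles from the sample csv1 and csv2 records in this thread.
my $title = 'RARE TV RADIO TIMES MAGAZINE DOCTOR WHO THE THREE 3 DOCTORS DR JON PERTWEE';
my $value = 'TV & SATELLITE WEEK 11 MAY GILLIAN ANDERSON DOCTOR WHO NOT RADIO TIMES ';

# Unique words from the csv1 title, minus the exception words.
my %words;
$words{$_}++ for split /\s+/, $title;
my @titlewords = grep { !/^(rare|vol|volume|issue|double|magazine|mag)$/i } keys %words;

my $count = 0;        # starts at 0 for each csv2 title...
my @hits;             # hypothetical: records which words matched
foreach my $word (@titlewords) {
    if ($value =~ /\b\Q$word\E\b/i) {   # ...and goes up by 1 for every shared word
        ++$count;
        push @hits, $word;
    }
}

print "count = $count (", join(', ', sort @hits), ")\n";
```

      $count is reset to 0 for each csv2 title and only changes when ++$count runs for a title word found in $value. Fetching $value = $csv2hash{$csv2} counts nothing; it just retrieves the title that the first loop stored under the key $csv2. $matched works the same way one level up: it starts at 0 for each csv1 line, ++$matched runs for every csv2 line whose $count reached $desired, and print "$_\n\n" if $matched; prints the csv1 line only when at least one csv2 line matched.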

        Here is the program from your original post, unchanged except for numerous print statements.

        #!/usr/bin/perl
        # match5.pl
        # perl match5.pl
        # Test the entire program.
        # From http://www.perlmonks.org/?node_id=1166649
        use strict;
        use warnings;
        print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
        print 'The program has started.', "\n";  # This code is for testing.
        my @csv2 = ();
        open CSV2, "<csv2" or die;
        @csv2=<CSV2>;
        close CSV2;

        my %csv2hash = ();
        for (@csv2) {
            chomp;
            my ($title) = $_ =~ /^.+?,\s*([^,]+?),/; #/ match the title
            $csv2hash{$_} = $title;
        }

        open CSV1, "<csv1" or die;
        while (<CSV1>) {
            chomp;
            my ($title) = $_ =~ /^.+?,\s*([^,]+?),/; #/ match the title
            my %words;
            $words{$_}++ for split /\s+/, $title; #/ get words
            ## Collect unique words
            my @titlewords = keys(%words);
            my @new; #add exception words which shouldn't be matched
            foreach my $t (@titlewords){
                push(@new, $t) if $t !~ /^(rare|vol|volume|issue|double|magazine|mag)$/i;
            }
            print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
            print '@new: ', join(", ", @new), "\n";  # This code is for testing.
            @titlewords = @new;
            my $desired = 5;
            my $matched = 0;
            foreach my $csv2 (keys %csv2hash) {
                print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                print 'xxxxxxxxxxxxxxxxxxxxxxxx At the top of the foreach my $csv2 (keys %csv2hash) { outer loop xxxxxxxxxxxxxxxxxxxxxxxx', "\n";  # This code is for testing.
                my $count = 0;
                my $value = $csv2hash{$csv2};
                print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                print '$value: ', $value, "\n";  # This code is for testing.
                foreach my $word (@titlewords) {
                    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                    print 'xxxxxxxxxxxxxxxxxxxxxxxx At the top of the foreach my $word (@titlewords) { inner loop xxxxxxxxxxxxxxxxxxxxxxxx', "\n";  # This code is for testing.
                    my @matches = ( $value=~/\b$word\b/ig );
                    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                    print '@matches: ', join(", ", @matches), "\n";  # This code is for testing.
                    my $numIncsv2 = scalar(@matches);
                    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                    print '$numIncsv2: ', $numIncsv2, "\n";  # This code is for testing.
                    @matches = ( $title=~/\b$word\b/ig );
                    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                    print '@matches: ', join(", ", @matches), "\n";  # This code is for testing.
                    my $numIncsv1 = scalar(@matches);
                    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                    print '$numIncsv1: ', $numIncsv1, "\n";  # This code is for testing.
                    ++$count if $value =~ /\b$word\b/i;
                    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                    print '$count: ', $count, "\n";  # This code is for testing.
                    if ($count >= $desired || ($numIncsv1 >= $desired && $numIncsv2 >= $desired)) {
                        $count = $desired+1;
                        print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                        print '$count: ', $count, "\n";  # This code is for testing.
                        last;
                    }
                }
                if ($count >= $desired) {
                    print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
                    print "$csv2\n";
                    ++$matched;
                }
            }
            print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
            print "$_\n\n" if $matched;
        }
        close CSV1;
        print "File: ", __FILE__, " Line: ", __LINE__, "\n";  # This code is for testing.
        print 'The program has ended.', "\n";  # This code is for testing.
        __END__
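        The paired test prints above (a File/Line print followed by a label and value) could be folded into one helper. This is a hypothetical debug() sub, not part of the original program; it relies on the built-in caller, which in list context returns the package, file name, and line number of the call site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): prints the caller's file
# and line plus a labelled value, replacing each pair of test prints.
sub debug {
    my ($label, $value) = @_;
    my (undef, $file, $line) = caller;         # file name and line number of the call site
    $value = 'undef' unless defined $value;    # avoid "uninitialized" warnings
    my $msg = "File: $file Line: $line\n$label: $value\n";
    print $msg;
    return $msg;                               # also returned, so it can be checked
}

my $count = 3;
debug('$count', $count);                 # label in single quotes, so it stays literal
my @matches = ('WHO', 'who');
debug('@matches', join(', ', @matches)); # arrays can be joined into one string first
```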

        Here is the input file named csv2.

        12278788, TV & SATELLITE WEEK 11 MAY GILLIAN ANDERSON DOCTOR WHO NOT RADIO TIMES , http://www.example.co.uk, 12

        Here is the input file named csv1.

        2523021356, RARE TV RADIO TIMES MAGAZINE DOCTOR WHO THE THREE 3 DOCTORS DR JON PERTWEE, http://www.example.co.uk, 12

        Here is the output.

        Feel free to ask further questions.