%csv2hash is a hash.

The programmer is using the whole record as the key of the hash %csv2hash. The value stored under that key is the title.

The test program below demonstrates just that small task for one record. The original program (i.e., the one written by whoever wrote this program originally) populates %csv2hash with all of the records in the file named csv2.

    #!/usr/bin/perl

    # csv2hash.pl  perl csv2hash.pl  A test program.
    # From http://www.perlmonks.org/?node_id=1166649

    use strict;
    use warnings;

    my %csv2hash = ();

    my $record = q[12278788, TV & SATELLITE WEEK 11 MAY 2013 GILLIAN ANDERSON DOCTOR WHO NOT RADIO TIMES , http://www.example.co.uk, 12];

    $_ = $record;

    my ($title) = $_ =~ /^.+?,\s*([^,]+?),/; #/ match the title
    $csv2hash{$_} = $title;

    use Data::Dumper;
    print Dumper(%csv2hash);

    __END__
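
For the full file, the same two statements would simply run once per record. Here is a minimal sketch of that, assuming one record per line. The sub name load_records and the sample rows are made up for illustration, and an in-memory filehandle stands in for the real csv2 file:

```perl
use strict;
use warnings;

# Hypothetical helper: builds a hash of (whole record => title) from CSV text.
sub load_records {
    my ($text) = @_;
    my %csv2hash;
    # Opening a reference to a scalar gives an in-memory filehandle.
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    while (<$fh>) {                          # each line lands in $_
        chomp;                               # strips the newline from $_
        my ($title) = /^.+?,\s*([^,]+?),/;   # second comma-separated field
        next unless defined $title;          # skip lines that do not match
        $csv2hash{$_} = $title;              # key: whole record, value: title
    }
    close $fh;
    return %csv2hash;
}

# Made-up sample data, not taken from the real csv2 file.
my $sample = join "\n",
    '1, FIRST SAMPLE TITLE , http://www.example.co.uk, 3',
    '2, SECOND SAMPLE TITLE , http://www.example.co.uk, 4';

my %h = load_records($sample);
print "$_ => $h{$_}\n" for sort keys %h;
```

Note that the captured title keeps any space that sits before the next comma, because the pattern only stops at the comma itself.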

Re^12: Addional "year" matching functionality in word matching script
by bms9nmh (Novice) on Jun 28, 2016 at 16:47 UTC
    Ok cheers, last question for a while while I try and learn some of this stuff, I'm currently watching some youtube videos on hash function, but what does the {$_} bit do in the  $csv2hash{$_} = $title part?

      See http://perlmaven.com/the-default-variable-of-perl, for example. (There are many other webpages that one can visit to get similar tutorials. This one was at the top of the Google search results.)

      Also, you can Google "perl $_".

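      The short answer is that $_ is a special variable in Perl: the default variable that many built-in functions and operators use when you do not name a variable yourself.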

      Let us say, it is a special special variable in Perl.

      For example, it can come into play when reading files.

      Some beginners jump through hoops just to avoid using it.

      In the case of your program, the record that is read from the file ends up in $_. I had to mimic that behavior in my test program. I put the data (i.e., the one record) into a variable called $record just because I like the descriptive name $record. I could have named it $milkshake, but I didn't. Then I said, "Oh, the program expects this data to be in the special variable $_," so I put $record into $_. Otherwise, the rest of the code is from your program. I just took a section of your code out, made another program from it, and tested it to make sure that it does what I think it does.
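
To make the role of $_ concrete, here is a small sketch that reads the same made-up data twice: once relying on $_ and once with a named variable. The sub names and the sample text are hypothetical and not part of your program:

```perl
use strict;
use warnings;

my $text = "alpha,1\nbeta,2\n";   # made-up sample data

# Implicit version: while (<$fh>) stores each line in $_,
# and chomp and the regex match both operate on $_ by default.
sub first_fields_implicit {
    my ($data) = @_;
    my @fields;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (<$fh>) {
        chomp;
        push @fields, $1 if /^([^,]+),/;
    }
    close $fh;
    return @fields;
}

# Explicit version: the same loop with a named variable instead of $_.
sub first_fields_explicit {
    my ($data) = @_;
    my @fields;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @fields, $1 if $line =~ /^([^,]+),/;
    }
    close $fh;
    return @fields;
}

print join('|', first_fields_implicit($text)), "\n";   # prints alpha|beta
print join('|', first_fields_explicit($text)), "\n";   # prints alpha|beta
```

Seen that way, $csv2hash{$_} just means "the entry in %csv2hash whose key is whatever is currently in $_", which in your program is the record that was just read.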

        Ok, I've deciphered what the initial part of the script does, and I've added some stuff to it which I will post separately once I understand this last bit of the script. I just need some help with the last bit before I try and put everything I've learned together. I've put some comments in the code below about bits I'm confused about. This is the last bit of the script which does the match.
        @titlewords = @new; #switch the @new array back to the name @titlewords now that the exceptions are in place
        my $desired = 5; # Desired matching number of words
        my $matched = 0; # Why is this set to 0? How does it change during the comparison
        foreach my $csv2 (keys %csv2hash) {
            my $count = 0; #Again why is this set to 0 at this point? I can see that it's used later and compared to $desired, but how does it increase in size past 0 during the operation?
            my $value = $csv2hash{$csv2}; # How does this represent the value? There doesn't seem to be any code which counts the words here?
            foreach my $word (@titlewords) {
                my @matches = ( $value=~/\b$word\b/ig );
                my $numIncsv2 = scalar(@matches);
                @matches = ( $title=~/\b$word\b/ig );
                my $numIncsv1 = scalar(@matches);
                ++$count if $value =~ /\b$word\b/i;
                if ($count >= $desired || ($numIncsv1 >= $desired && $numIncsv2 >= $desired)) {
                    $count = $desired+1;
                    last;
                }
            }
            if ($count >= $desired) {
                print "$csv2\n";
                ++$matched;
            }
        }
        print "$_\n\n" if $matched;
        }
        close CSV1;
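
One way to see how $count and $matched move off zero is to run a cut-down version of the loop on made-up data. This is only a sketch under assumptions: the occurrence-count test with $numIncsv1 and $numIncsv2 is left out, $desired is lowered to 2, and the helper count_title_words and the sample records are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: counts how many of @words appear as whole words in $value.
sub count_title_words {
    my ($value, @words) = @_;
    my $count = 0;                                # start fresh for each record
    for my $word (@words) {
        # \Q...\E quotes regex metacharacters in the word; the original
        # script interpolates $word directly.
        ++$count if $value =~ /\b\Q$word\E\b/i;   # +1 for each word found
    }
    return $count;
}

# Made-up records: key is the whole record, value is its title.
my %csv2hash = (
    'rec A' => 'DOCTOR WHO RADIO TIMES',
    'rec B' => 'GARDENING WEEKLY',
);
my @titlewords = qw(doctor who times);
my $desired = 2;     # lowered from 5 so the tiny sample can match
my $matched = 0;     # nothing has matched yet

for my $csv2 (sort keys %csv2hash) {
    my $value = $csv2hash{$csv2};   # hash lookup: the title stored under this key
    my $count = count_title_words($value, @titlewords);
    if ($count >= $desired) {
        print "$csv2\n";
        ++$matched;                 # one more record reached the threshold
    }
}
print "matched: $matched\n";
```

Both counters start at 0 because nothing has been found yet. The ++ lines are what raise them: $count goes up by one for each title word found in the current record, and $matched goes up by one for each record whose $count reaches $desired. The line my $value = $csv2hash{$csv2} does no counting at all; it only fetches the title that was stored under that key earlier.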