http://qs1969.pair.com?node_id=1166651


in reply to Addional "year" matching functionality in word matching script

Consider extracting everything that looks like a year number, and changing the matching logic to first check whether the years are equal and only then fall back on the five-words comparison. Something like:

    my $year_left = ...;
    my $year_right = ...;
    my $have_years = ($year_left and $year_right);
    my $equal_years = ($year_left == $year_right);
    my $five_words_result = ...; # you already have this above

    my $final_result;
    if( ! $have_years ) {
        $final_result = $five_words_result;
    } elsif( $equal_years ) {
        $final_result = $five_words_result;
    } elsif( ! $equal_years ) {
        $final_result = undef; # without matching years, things are never equal
    } else {
        die "This should never happen. left=$year_left, right=$year_right, have_years=$have_years, equal_years=$equal_years";
    };

Maybe it would be good to put this logic, together with the logic for finding five or more matching words, into its own function. Consider passing to that function the title on the left side and the title on the right side, and possibly already the extracted year numbers.
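A minimal sketch of such a function, under some assumptions: a "year" is a four-digit number starting with 19 or 20, and "five words" means at least five distinct whitespace-separated words shared by both titles. The names titles_match, find_year and count_common_words are hypothetical, and count_common_words is only a stand-in for the word-matching logic already in your script. It returns 0 rather than undef for "no match".

```perl
use strict;
use warnings;

# Hypothetical helper: the first thing in the string that looks like a year.
sub find_year {
    my ($str) = @_;
    return $str =~ /\b((?:19|20)\d\d)\b/ ? $1 : undef;
}

# Stand-in for the existing five-words logic (an assumption, not the
# original code): counts distinct words, ignoring case, found in both titles.
sub count_common_words {
    my ($left, $right) = @_;
    my %seen;
    $seen{ lc $_ } = 1 for split ' ', $left;
    my %common;
    $common{ lc $_ } = 1 for grep { $seen{ lc $_ } } split ' ', $right;
    return scalar keys %common;
}

sub titles_match {
    my ($left, $right) = @_;
    my $year_left  = find_year($left);
    my $year_right = find_year($right);

    # If both titles carry a year, differing years rule out a match.
    if (defined $year_left and defined $year_right) {
        return 0 if $year_left != $year_right;
    }

    # No usable pair of years, or equal years: fall back on the words.
    return count_common_words($left, $right) >= 5 ? 1 : 0;
}
```

Keeping the year check and the word check in one function means the calling loop only has to ask a single question per pair of titles.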

Re^2: Addional "year" matching functionality in word matching script
by bms9nmh (Novice) on Jun 27, 2016 at 11:31 UTC
    Thanks for the response. I should mention that I'm a noob to Perl; my original code came from copying other scripts and from help from others. Where the code says = ...; is that what I would actually write in the script, or do I need to put something else in that space? Sorry, I'm a bit lost! I usually use bash and only started using Perl recently!

      perlsyn and perlfunc will get you the definition of stuff you don't understand. perldebtut will get you started with the Perl debugger, which can be a really useful tool to learn how stuff works; you can try out commands interactively. Anything else, ask! We were all noobs once.


      Sorry - yes, the ... parts are logic that I left out.

      For finding the year within a title, you already have similar logic and it's not too hard to write code that finds things looking like a year within another string.

      For the five-matching-words check, you already have that logic in your script; you just need to assign its result to a variable before doing further checks, such as whether years are available and whether they match.

Re^2: Addional "year" matching functionality in word matching script
by bms9nmh (Novice) on Jun 27, 2016 at 14:31 UTC
    edit: sorry just realized you answered this in previous reply

      Yes, I chose $year_left as the name for the variable holding the year number on the left side of the comparison, and $year_right for the right side of the comparison.

      You can look at a string and guess if it contains a four digit year by using the following code for example:

          my $str = 'this is some text 1989 blah';
          my $year;
          $year = $1 if( $str =~ /\b((?:19|20)\d\d)\b/ );

      The regular expression looks at whether the string contains at least one number with four digits that starts with 19 or 20, and sets $year to the first such number. You could put that code into a subroutine as follows to allow for easy reuse:

          sub find_year {
              my( $str ) = @_;
              my $year;
              $year = $1 if( $str =~ /\b((?:19|20)\d\d)\b/ );
              return $year
          }
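A short usage sketch for that subroutine, with made-up sample titles. It shows the two cases worth knowing about: the function returns undef when nothing looks like a year, and it returns only the first candidate when a string contains several.

```perl
use strict;
use warnings;

sub find_year {
    my ($str) = @_;
    my $year;
    $year = $1 if $str =~ /\b((?:19|20)\d\d)\b/;
    return $year;
}

# Sample titles are invented for illustration only.
my @titles = (
    'Batman 1989 remastered',
    'No year here',
    'Odyssey 2001 and 2010',
);

for my $title (@titles) {
    my $year = find_year($title);
    # Check with defined(), because "no year" comes back as undef.
    print "$title => ", (defined $year ? $year : 'no year found'), "\n";
}
```

Because of the \b anchors, digits embedded in a longer number (for example 120195) are not treated as a year.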
        Ok, I've been trying to break things down with the info you have provided, initially with a simpler problem so I can work out what's going on. I'm trying to print the year from the title in a file called csv3 (which contains 1989 in the title) using the following code, but it isn't printing anything. What am I doing wrong?
            #!/bin/perl
            open CSV3, "<csv3" or die;
            while (<CSV3>) {
                chomp;
                my ($title) = $_ =~ /^.+?,\s*([^,]+?),/; #/ match the title
                my %words;
                $words{$_}++ for split /\s+/, $title; #/ get words
                ## Collect unique words
                # my @titlewords = keys(%words);
                my @titlewords = keys(%words);
                #print "$title"
            }

            sub find_year {
                my( $str ) = @_;
                my $year;
                $year = $1 if( $str =~ /\b((?:19|20)\d\d)\b/ );
                return $year
            }

            &find_year ($title);
        I would have thought that &find_year ($title); would take the title and apply the subroutine to that string, picking out the 1989?
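Two things probably explain the silence. First, $title is declared with my inside the while loop, so it no longer exists by the time find_year is called after the loop. Second, the value returned by find_year is thrown away rather than printed. A minimal sketch of one way to fix both is below. The sample data is made up and read from an in-memory filehandle so that the example runs on its own; the layout of the real csv3 file is an assumption based on the regular expression in the script.

```perl
use strict;
use warnings;

sub find_year {
    my ($str) = @_;
    my $year;
    $year = $1 if $str =~ /\b((?:19|20)\d\d)\b/;
    return $year;
}

# Made-up sample standing in for the csv3 file. With the real file, use:
#   open my $fh, '<', 'csv3' or die "Cannot open csv3: $!";
my $sample = "1, Some Film Title 1989, extra\n2, Title Without Year, extra\n";
open my $fh, '<', \$sample or die "Cannot open sample data: $!";

my @years;
while (my $line = <$fh>) {
    chomp $line;
    my ($title) = $line =~ /^.+?,\s*([^,]+?),/;   # second field is the title
    next unless defined $title;

    # $title only exists inside this loop, so call find_year here
    # and keep the value it returns.
    my $year = find_year($title);
    push @years, $year;
    print( defined($year) ? "$title: $year\n" : "$title: no year found\n" );
}
close $fh;
```

Adding use strict; and use warnings; at the top of the original script would have reported the out-of-scope $title at compile time.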