in reply to Regex word boundries
You're trying to match /\bmy term 3\b/ against each "word" (group of non-space characters) of the line, one word at a time.
Even if the line contains my term 3,
'my' =~ /\bmy term 3\b/ will be false,
'term' =~ /\bmy term 3\b/ will be false, and
'3' =~ /\bmy term 3\b/ will be false.
Just eliminate the foreach my $word (@word_array) loop, leaving its body in place.
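To illustrate the difference, here is a minimal sketch that counts matches both ways. The sample line and the variable names are made up for the example; they are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $line = 'results for my term 3 are shown here';
my $term = 'my term 3';

# Word by word: no single word can contain a multi-word term.
my $per_word_hits = 0;
for my $word (split ' ', $line) {
    $per_word_hits++ if $word =~ /\b\Q$term\E\b/;
}

# Against the whole line: the term is found.
my $line_hits = () = $line =~ /\b\Q$term\E\b/g;

print "per-word hits:   $per_word_hits\n";   # 0
print "whole-line hits: $line_hits\n";       # 1
```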
Other problems:
$word_count isn't reset to 0 for each term as it should be.
You don't check if the two arguments were provided.
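One possible way to check the arguments, as a sketch only. The helper name parse_args and the usage message are assumptions; GetOptionsFromArray is used instead of GetOptions so the example can be run with a fixed argument list rather than @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical helper: returns both file names or dies with a usage message.
sub parse_args {
    my @args = @_;
    my ($abstracts_file, $terms_file);

    GetOptionsFromArray(
        \@args,
        'pathway_abstracts=s' => \$abstracts_file,
        'terms_file=s'        => \$terms_file,
    ) or die("Bad options\n");

    # Both options are required.
    defined($abstracts_file) && defined($terms_file)
        or die("usage: $0 --pathway_abstracts=FILE --terms_file=FILE\n");

    return ($abstracts_file, $terms_file);
}

# Example call with made-up file names; a real script would pass @ARGV.
my ($abstracts_file, $terms_file) = parse_args(
    '--pathway_abstracts=abstracts.txt',
    '--terms_file=terms.txt',
);
print "$abstracts_file $terms_file\n";
```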
Some style tips for better readability:
@array_1 and @array_2 are meaningless names.
$pathway_abstracts and $terms are not much better. There's not even a hint that these are merely the names of the files that contain the information instead of the information itself.
for (my $j = 0; $j < @array_2; $j++)
is less readable than
for my $j (0 .. $#array_2)
In this case, you don't even need $j, so I'd recommend
for my $term (@array_2)
Same for the $i loop.
for my $line (@array_1)
The second chomp does nothing. The first chomp already did the deed. Remember, $key in for my $key (@array) is an alias to the array element. Any change to $key will affect the array element to which it is linked.
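A small sketch of the aliasing, using a made-up array:

```perl
use strict;
use warnings;

# Hypothetical data standing in for lines read from a file.
my @lines = ("alpha\n", "beta\n");

# $line is an alias for each element, so chomp changes @lines itself.
for my $line (@lines) {
    chomp($line);
}

# The newlines are gone from the array; a second chomp would find nothing to remove.
print "[$_]\n" for @lines;
```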
The placement of chomp is odd. I'd move it to where the array is read in. Change
my @array_2 = <IN2>;
to
chomp( my @array_2 = <IN2> );
Why have two loops iterating over @array_2? Move
$term_score{$key} = 0;
into the second loop.
$x = $x + $y;
can be written much more simply as
$x += $y;
scalar isn't needed when already in scalar context.
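A short sketch of where scalar is redundant and where it still matters (the array contents are made up):

```perl
use strict;
use warnings;

my @terms = ('kinase', 'receptor', 'ligand');   # hypothetical terms

# Assignment to a scalar already imposes scalar context.
my $explicit = scalar(@terms);
my $implicit = @terms;

# So does a numeric comparison, as in a C-style loop condition.
my $is_small = @terms < 10;

# print takes a list, so scalar() is needed here to get the count.
print "Number of terms: ", scalar(@terms), "\n";
```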
Simplified solution:
    #!/usr/bin/perl -w

    use strict;
    use warnings;

    use Getopt::Long qw( GetOptions );

    # Get user input.
    my $abstracts_file;
    my $terms_file;
    GetOptions(
        "pathway_abstracts=s" => \$abstracts_file,
        "terms_file=s"        => \$terms_file,
    );

    # ...Needs error checking here...

    # Load pathway abstract.
    my $file;
    {
        open(my $fh, '<', $abstracts_file)
            or die("Unable to open abstracts file \"$abstracts_file\": $!\n");
        local $/;
        $file = <$fh>;
    }

    # Load terms into array.
    my @terms;
    {
        open(my $fh, '<', $terms_file)
            or die("Unable to open terms file \"$terms_file\": $!\n");
        chomp( @terms = <$fh> );
    }

    print("Term\t| ");
    print("Number\t| ");
    print("Frequency\n");
    print("----------------------------------------------------\n");

    # Find out how many words are in abstracts.
    # (A match is used rather than split: "() = split" implies a limit of 1.)
    my $word_count = () = $file =~ /\S+/g;

    for my $term (@terms) {
        # Count the number of times the search term matches.
        my $score = () = $file =~ /\b\Q$term\E\b/g;

        my $freq = $score / $word_count;

        print("$term\t$score\t$freq\n");
    }
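The counting relies on the "= () =" idiom: a global match in list context returns every match, and assigning that list to () and then to a scalar yields the number of matches. \Q...\E escapes any regex metacharacters in the term. A minimal sketch with made-up text:

```perl
use strict;
use warnings;

my $text = 'the cat saw the other cat';   # hypothetical sample text
my $term = 'the';

# Count whole-word occurrences of the term ("other" is not counted).
my $score = () = $text =~ /\b\Q$term\E\b/g;

# Count runs of non-space characters, i.e. words.
my $word_count = () = $text =~ /\S+/g;

my $freq = $score / $word_count;

printf("%s\t%d\t%.3f\n", $term, $score, $freq);
```

Here $score is 2 and $word_count is 6.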
Update: Added more tips.
Update: Added solution.