in reply to count number of overlapping words in a document
I'm not quite sure from your code exactly what you are aiming at, but I'd approach the task with a hash keyed on the words in the first file. From there I'd construct a regex with a capturing alternation of the hash keys, surrounded by word boundaries to avoid false hits (so that "cat" does not match inside "catastrophe"). I'd then slurp the whole of the second file into a single scalar and do a global regex match, incrementing the hash value for whichever word is captured in $1 each time a match is found.
$ perl -Mstrict -Mwarnings -E '
open my $wordsFH, q{<}, \ <<EOF or die $!;
cat
dog
EOF

my %words = map { chomp; $_ => 0 } <$wordsFH>;
my $rxWords = do {
    local $" = q{ | };
    qr{(?x) \b ( @{ [ keys %words ] } ) \b };
};
say qq{Regex is $rxWords};

open my $textFH, q{<}, \ <<EOF or die $!;
The cat scattered doggerel words over the poor dog as it doggedly ignored the catastrophe the cat was causing
EOF

my $text = do { local $/; <$textFH>; };
$words{ $1 } ++ while $text =~ m{$rxWords}g;

say qq{$_ => $words{ $_ }} for sort keys %words;'
Regex is (?^u:(?x) \b ( dog | cat ) \b )
cat => 2
dog => 1
$
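If you would rather have this as a script than a one-liner, something along these lines might be a starting point. It is only a sketch: the sub name count_words is my own invention, and I am assuming that your word list could contain regex metacharacters, so each key is passed through quotemeta before the alternation is built. Sorting the keys longest first is just a precaution so that a shorter word can never shadow a longer one that starts the same way. Bear in mind that \b only behaves as expected when a word begins and ends with a word character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of the one-liner above): count whole-word
# occurrences of each word in @$words within the string $text.
sub count_words {
    my ($words, $text) = @_;

    # Seed every word with zero so that words never seen still get reported.
    my %count = map { $_ => 0 } @$words;

    # An empty alternation would match at every word boundary, so bail out.
    return \%count unless %count;

    # Escape metacharacters and put longer words first in the alternation.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b }
        keys %count;
    my $rx = qr{\b($alternation)\b};

    # Each successful global match captures the word in $1.
    $count{$1}++ while $text =~ m{$rx}g;

    return \%count;
}

my $text = 'The cat scattered doggerel words over the poor dog '
         . 'as it doggedly ignored the catastrophe the cat was causing';

my $count = count_words([qw(cat dog)], $text);
say "$_ => $count->{$_}" for sort keys %$count;
```

In real use you would read the word list and the text from your two files rather than hard-coding them, for instance by slurping the second file with a localised $/ as in the one-liner.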
I hope this is helpful but ask further if I have misunderstood or anything is unclear.
Cheers,
JohnGG