in reply to Word density

Here are two basic regexps. One that's inclusive:

    my $word = qr/
       [[:alpha:]]         # Start with a letter.
       (?:
          [[:^space:]]*    # Hyphens, apostrophes, etc.
          [[:alpha:]]      # Don't end on a punctuation mark.
       )?                  # Catch single-letter words.
    /x;

One that's restrictive:

    my $word = qr/
       [[:alpha:]]         # Start with a letter.
       (?:
          [[:alpha:]'-]*   # Allowed characters.
          [[:alpha:]]      # Don't end on a punctuation mark.
       )?                  # Catch single-letter words.
    /x;
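To see how the two patterns differ in practice, here is a small sketch that runs both against the same string. The names $inclusive and $restrictive and the sample text are mine, chosen only for illustration, and every POSIX class is written in its bracketed form ([[:alpha:]], [[:^space:]]).

```perl
use strict;
use warnings;

# Hypothetical names for the inclusive and restrictive patterns.
my $inclusive = qr/
   [[:alpha:]]
   (?:
      [[:^space:]]*
      [[:alpha:]]
   )?
/x;

my $restrictive = qr/
   [[:alpha:]]
   (?:
      [[:alpha:]'-]*
      [[:alpha:]]
   )?
/x;

# Sample text, made up for this comparison.
my $text = "The butcher's well-known e.g. x2y knife, is it?";

# In list context, a /g match returns every captured word.
my @inc = $text =~ /($inclusive)/g;
my @res = $text =~ /($restrictive)/g;

print "inclusive:   @inc\n";
print "restrictive: @res\n";
```

Both keep "butcher's" and "well-known" whole and drop trailing punctuation. The inclusive pattern treats "e.g" and "x2y" as single words, while the restrictive one breaks them into separate letters because "." and "2" are not in its allowed set.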

Here's how you use them:

    my %hash;
    my $last1;
    my $last2;
    while ($content =~ /($word)/g) {
       my $word = $1;
       ++$hash{ $word };
       ++$hash{ "$last1 $word" }        if defined $last1;
       ++$hash{ "$last2 $last1 $word" } if defined $last2;
       $last2 = $last1;
       $last1 = $word;
    }
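If you want to try the loop end to end, the following is a minimal sketch that feeds it a sample string and then prints every word or phrase seen at least twice. The sample text, the report format, and the threshold of 2 are my own assumptions, not part of the original question.

```perl
use strict;
use warnings;

# Restrictive word pattern, written compactly.
my $word = qr/ [[:alpha:]] (?: [[:alpha:]'-]* [[:alpha:]] )? /x;

# Sample input, for illustration only.
my $content = "Three blind mice. Three blind mice. "
            . "See how they run. See how they run.";

my %hash;
my ($last1, $last2);
while ($content =~ /($word)/g) {
   my $w = $1;
   ++$hash{ $w };                                        # single words
   ++$hash{ "$last1 $w" }        if defined $last1;      # two-word phrases
   ++$hash{ "$last2 $last1 $w" } if defined $last2;      # three-word phrases
   ($last2, $last1) = ($last1, $w);
}

# Report by descending count, then alphabetically.
for my $phrase (sort { $hash{$b} <=> $hash{$a} || $a cmp $b } keys %hash) {
   next if $hash{$phrase} < 2;       # assumed threshold
   printf "%d  %s\n", $hash{$phrase}, $phrase;
}
```

Note that the loop ignores sentence boundaries, so "mice Three" is counted as a two-word phrase (once) alongside "Three blind mice" (twice). Counting is also case-sensitive; lowercase $w with lc first if that is not what you want.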

Update: Instead of just returning the data, I've updated my code to actually process it.

Re^2: Word density
by Anonymous Monk on Mar 19, 2006 at 19:20 UTC
    For your second sample code, is there an error somewhere? When I run it, my arrays are returned as ARRAY(0x18740bc), etc.

    Your second code is much easier to understand, thank you!

      No, that's correct. What was the second code sample (shown below) returned each pair of words as a reference to an array, and printing a reference directly gives output such as ARRAY(0x18740bc). You might want to take a peek at Data::Dumper to easily display structures. In any case, I've since modified my post so that the code does what you want.

      my @words;
      while ($content =~ /($word)/g) {
         push(@words, $1);
      }

      my @words_bi;
      my @words_tri;
      foreach (0..$#words) {
         next if $_ < 1;
         push(@words_bi,  [ @words[$_-1 .. $_] ] );
         next if $_ < 2;
         push(@words_tri, [ @words[$_-2 .. $_] ] );
      }
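      As a rough sketch of how to display what @words_bi holds, you can either dereference each element yourself or hand the whole structure to Data::Dumper. The three-word sample list and the @lines variable are made up for this example.

```perl
use strict;
use warnings;
use Data::Dumper;

my @words = qw(Three blind mice);   # sample data, for illustration only

# Build the word pairs the same way as the second code sample.
my @words_bi;
foreach (0..$#words) {
   next if $_ < 1;
   push(@words_bi, [ @words[$_-1 .. $_] ]);
}

# Option 1: each element is an array reference, so dereference it
# with @$_ before joining the words into a printable string.
my @lines = map { join(' ', @$_) } @words_bi;
print "$_\n" for @lines;

# Option 2: let Data::Dumper print the nested structure.
$Data::Dumper::Indent = 1;          # more compact indentation
print Dumper(\@words_bi);
```

      Printing "$_" for an element of @words_bi stringifies the reference itself, which is where the ARRAY(0x...) text comes from.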
        The code I am running is
        #!/usr/bin/perl
        use warnings;
        use strict;

        my $content = qq(Three blind mice. Three blind mice. See how they run.
        See how they run. The butcher's wife came after them with a knife,
        three blind mice.);

        my @words = split(/\s+/, $content);
        my @words_bi;

        foreach (@words) {
           print "$_\n\n";
        }

        foreach (0..$#words) {
           next if $_ < 1;
           push(@words_bi, [ @words[$_-1 .. $_] ] );
        }

        foreach (@words_bi) {
           print "$_\n\n";
        }
        Which is resulting in
        C:\Documents and Settings\admin\Desktop>perl words.pl
        Three
        blind
        mice.
        Three
        blind
        mice.
        See
        how
        they
        run.
        See
        how
        they
        run.
        The
        butcher's
        wife
        came
        after
        them
        with
        a
        knife,
        three
        blind
        mice.
        ARRAY(0x225a20)
        ARRAY(0x1853c64)
        ARRAY(0x184780c)
        ARRAY(0x18477dc)
        ARRAY(0x18477ac)
        ARRAY(0x184777c)
        ARRAY(0x18475c0)
        ARRAY(0x1847590)
        ARRAY(0x1847560)
        ARRAY(0x1847530)
        ARRAY(0x1847500)
        ARRAY(0x18474d0)
        ARRAY(0x18474a0)
        ARRAY(0x1847470)
        ARRAY(0x1847440)
        ARRAY(0x1847410)
        ARRAY(0x18473e0)
        ARRAY(0x18473b0)
        ARRAY(0x1847380)
        ARRAY(0x1847350)
        ARRAY(0x1847320)
        ARRAY(0x18472f0)
        ARRAY(0x18472c0)
        ARRAY(0x1847290)
        ARRAY(0x184720c)
        C:\Documents and Settings\admin\Desktop>
        Thanks!