downer has asked for the wisdom of the Perl Monks concerning the following question:

This is a quick and easy one. I am trying to grab all the words of an HTML page, in order. Of course, rather than use my own heuristics, I'd like to use an established package. Here is my code:
#$contents = contents of HTML page
my $parsed = $scrubber->scrub($contents);
my $splitter = new Lingua::EN::Splitter;
my @words = $splitter->words($parsed);
foreach my $x (@words) {
    print "$x\n";
}
The print statement just gives ARRAY(0x1864870). Same if I try to print in quotes. What am I doing wrong?

Re: Lingua Splitter
by FunkyMonk (Bishop) on Nov 04, 2007 at 22:01 UTC
    This year-old bug report says qq{The "words" and "paragraphs" methods return references but the documentation portrays them as returning lists}, so try
    my @words = @{ $splitter->words($parsed) };

    or

    my $words = $splitter->words($parsed);
    foreach my $x (@$words) {
        print "$x\n";
    }
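
To see why the original loop printed ARRAY(0x...), here is a minimal sketch that does not need the module installed. It assumes a hypothetical sub, words_ref, standing in for the splitter's words method; like the method described in the bug report, it returns an array reference rather than a list. The sample text and the whitespace split are illustrative only and are not how Lingua::EN::Splitter tokenizes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $splitter->words(...): it returns a
# reference to an array, not a list (an assumption for illustration).
sub words_ref {
    my ($text) = @_;
    return [ split ' ', $text ];
}

my $text = 'grab all the words';

# Assigning the reference to an array gives a one-element array whose
# only element is the reference itself.
my @wrong = words_ref($text);
print scalar(@wrong), " element: $wrong[0]\n";   # something like ARRAY(0x...)

# Dereferencing with @{ ... } copies the words into a real array.
my @words = @{ words_ref($text) };
print "$_\n" for @words;
```

The first print shows the symptom from the question: the array holds a single reference, so interpolating it gives the ARRAY(0x...) string. The second loop prints one word per line once the reference is dereferenced.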