Many of us have written (and posted) code to build a "prefix" hash (a.k.a. a trie), but I wanted a quick command line to do this for word tokens over a set of short phrases. Given that Data::Dumper output would suffice, the solution was pretty short:
perl -MData::Dumper -lne '$i=\%h; for(split){$$i{$_}{N}++; $i=$$i{$_}} END{print Dumper(\%h)}'
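Spelled out as a script with strict and warnings, the one-liner amounts to roughly the following. This is a sketch: the sample phrases are made up and stand in for the input lines the one-liner reads via -n, and $Data::Dumper::Sortkeys is set only to make the dump order stable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump

# Made-up sample phrases; the one-liner reads its lines from STDIN/files.
my @phrases = (
    'TOP STORIES FOR WEDNESDAY',
    'TOP STORIES FOR TUESDAY',
    'STOCK INDEXES NYSE',
);

my %trie;                       # root of the prefix hash

for my $line (@phrases) {
    my $node = \%trie;          # each phrase starts at the root
    for my $word (split ' ', $line) {   # same tokens as a bare split
        $node->{$word}{N}++;    # count phrases that reach this word
        $node = $node->{$word}; # descend into the (autovivified) child
    }
}

print Dumper(\%trie);
```

As in the one-liner, each node's count shares a hash with that node's child words, so a token that is literally N would collide with it.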

Re: One-liner to build a Trie
by graff (Chancellor) on Oct 30, 2007 at 06:00 UTC
    To elaborate a bit: the task was to look for frequent patterns in the headlines of news articles, in hopes of identifying repetitive daily (hourly, weekly) reports that tend to have formulaic content, i.e. to isolate groups of headlines like:
        TOP STORIES FOR WEDNESDAY, JUNE 14
        TOP STORIES FOR TUESDAY, JULY 27
        ...
        STOCK INDEXES, NYSE 11:00 GMT
        STOCK INDEXES, NASDAQ 13:00 GMT
        ...
        BASEBALL: FINAL SCORES
        ...
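    Those are made-up examples, but you get the idea: find bunches of headlines in a multi-year archive that start with the same words. Of course, the same code with a different split regex (e.g. split //, which yields single characters) would create the more typical character-based trie, provided the count is kept under a key that can never be a token, such as the empty string; with N as the count key, any line containing a capital N would collide with it.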
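For the headline task, the useful step after building the trie is walking it and reporting every prefix shared by at least some number of lines. A minimal sketch, assuming made-up headlines and an arbitrary threshold ($min_count); build_trie and frequent_prefixes are hypothetical helper names, not anything from the post. Unlike the one-liner, each node here keeps its count and its children in separate slots (n and kids, names chosen for this sketch), so no token, including the character N in character mode, can collide with the count.

```perl
use strict;
use warnings;

# Hypothetical helper: build a trie from @lines using the given tokenizer.
# Each node is { n => count, kids => { token => child_node } }, so the count
# can never collide with a token (unlike the one-liner's {N} key).
sub build_trie {
    my ($tokenize, @lines) = @_;
    my $root = { n => 0, kids => {} };
    for my $line (@lines) {
        my $node = $root;
        $node->{n}++;
        for my $tok ($tokenize->($line)) {
            # create the child on first sight, then descend into it
            $node = $node->{kids}{$tok} //= { n => 0, kids => {} };
            $node->{n}++;
        }
    }
    return $root;
}

# Hypothetical helper: depth-first walk returning every prefix shared by at
# least $min lines, as [ [tokens...], count ] pairs in sorted order.
sub frequent_prefixes {
    my ($node, $min, @path) = @_;
    my @found;
    for my $tok (sort keys %{ $node->{kids} }) {
        my $child = $node->{kids}{$tok};
        next if $child->{n} < $min;    # counts never grow deeper down, so prune
        push @found, [ [ @path, $tok ], $child->{n} ];
        push @found, frequent_prefixes($child, $min, @path, $tok);
    }
    return @found;
}

# Made-up headlines, echoing the examples above.
my @headlines = (
    'TOP STORIES FOR WEDNESDAY, JUNE 14',
    'TOP STORIES FOR TUESDAY, JULY 27',
    'STOCK INDEXES, NYSE 11:00 GMT',
    'STOCK INDEXES, NASDAQ 13:00 GMT',
    'BASEBALL: FINAL SCORES',
);

my $min_count = 2;    # arbitrary threshold for "frequent"

# Word mode, as in the one-liner; pass sub { split //, $_[0] } for characters.
my $words = build_trie(sub { split ' ', $_[0] }, @headlines);
for my $hit (frequent_prefixes($words, $min_count)) {
    my ($tokens, $count) = @$hit;
    printf "%d  %s\n", $count, join ' ', @$tokens;
}
```

With these sample headlines it prints the five shared prefixes STOCK, STOCK INDEXES, (comma included, since tokens are whitespace-delimited), TOP, TOP STORIES and TOP STORIES FOR, each with a count of 2. Pruning as soon as a count drops below the threshold is safe because a child can never be reached by more lines than its parent.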