One-liner to build a Trie

Many of us have written (and posted) code to build a "prefix" hash (a.k.a. a trie index), but I wanted a quick command line to do this for word tokens over a set of short phrases. Given that Data::Dumper output would suffice, the solution was pretty short. (Updated to fix the link to CPAN.)

perl -MData::Dumper -lne '$i=\%h; for(split){$$i{$_}{N}++; $i=$$i{$_}}
 END{print Dumper(\%h)}'
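The one-liner relies on autovivification. `$$i{$_}{N}++` creates the child hash for a word the first time that word is seen at that position and bumps its count. `$i=$$i{$_}` then steps down into that child, so the next word is stored one level deeper. The sketch below is a hedged expansion of the same logic into a script under `strict` and `warnings`. It replaces the `-n` input loop with a fixed list of sample phrases. The helper name `add_phrase` and the phrases themselves are illustrative assumptions, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Each node is a hash whose keys are the next word and whose values are child nodes.
# Each child node also carries a count under the key 'N' (as in the one-liner),
# recording how many phrases pass through that word at that position.
my %trie;

# Hypothetical helper: insert one phrase, word by word.
sub add_phrase {
    my ($root, $phrase) = @_;
    my $node = $root;                 # every phrase starts at the root
    for my $word (split ' ', $phrase) {
        $node->{$word}{N}++;          # autovivify the child, bump its count
        $node = $node->{$word};       # descend one level
    }
}

# Sample input (assumed); the one-liner reads lines from STDIN or files instead.
add_phrase(\%trie, $_) for (
    'TOP STORIES FOR WEDNESDAY',
    'TOP STORIES FOR TUESDAY',
    'STOCK INDEXES, NYSE',
);

$Data::Dumper::Sortkeys = 1;          # stable key order for readable output
print Dumper(\%trie);
```

Run as-is, it dumps a nested hash in which `$trie{TOP}{N}` is 2 and `$trie{TOP}{STORIES}{FOR}{WEDNESDAY}{N}` is 1. Because counts are stored alongside child words, a phrase that contains the literal word `N` collides with the counter. A key that `split ' '` can never return, such as the empty string, would avoid that collision.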

Re: One-liner to build a Trie by graff (Chancellor) on Oct 30, 2007 at 06:00 UTC
To elaborate a bit: the task was to look for frequent patterns in the headlines of news articles. The hope was to identify repetitive daily (or hourly, or weekly) reports that tend to have formulaic content, i.e., to isolate groups of headlines like:

TOP STORIES FOR WEDNESDAY, JUNE 14
TOP STORIES FOR TUESDAY, JULY 27
...
STOCK INDEXES, NYSE 11:00 GMT
STOCK INDEXES, NASDAQ 13:00 GMT
...
BASEBALL: FINAL SCORES ...
...

Those are made-up examples, but you get the idea: find bunches of headlines in a multi-year archive that start with the same words. Of course, the same code (with a different split regex) would create the more typical character-based trie structure.
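The reply notes that a different split regex turns the word trie into a character trie. A hedged sketch of both variants follows. It adds a walk that lists every prefix shared by at least a minimum number of headlines, which are the "bunches" described above. The helper names `build_trie` and `frequent_prefixes`, the threshold of 2, and the sample headlines (copied from the made-up examples) are illustrative assumptions, not code from the thread. Unlike the one-liner, this sketch stores counts under the empty-string key rather than `N`. In a character trie the letter N is itself a token, and neither `split ' '` nor `split //` ever returns an empty token.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $COUNT = '';   # counter key: split never yields an empty token

# Build a trie from @phrases, tokenizing each one with $splitter (a code ref).
sub build_trie {
    my ($splitter, @phrases) = @_;
    my %root;
    for my $phrase (@phrases) {
        my $node = \%root;                    # every phrase starts at the root
        for my $tok ($splitter->($phrase)) {
            $node->{$tok}{$COUNT}++;          # autovivify the child, bump its count
            $node = $node->{$tok};            # descend one level
        }
    }
    return \%root;
}

# Hypothetical helper: collect [prefix, count] pairs for every prefix shared
# by at least $min phrases. $sep rejoins tokens (' ' for words, '' for chars).
sub frequent_prefixes {
    my ($node, $min, $sep, @path) = @_;
    my @found;
    for my $tok (sort keys %$node) {
        next if $tok eq $COUNT;               # the counter is not a child
        my $child = $node->{$tok};
        next if $child->{$COUNT} < $min;      # counts only shrink deeper down, so prune
        my @prefix = (@path, $tok);
        push @found, [ join($sep, @prefix), $child->{$COUNT} ];
        push @found, frequent_prefixes($child, $min, $sep, @prefix);
    }
    return @found;
}

my @headlines = (
    'TOP STORIES FOR WEDNESDAY, JUNE 14',
    'TOP STORIES FOR TUESDAY, JULY 27',
    'STOCK INDEXES, NYSE 11:00 GMT',
    'STOCK INDEXES, NASDAQ 13:00 GMT',
    'BASEBALL: FINAL SCORES',
);

# Word trie: split on whitespace, as the one-liner does.
my $words = build_trie(sub { split ' ', $_[0] }, @headlines);
printf "%3d  %s\n", $_->[1], $_->[0] for frequent_prefixes($words, 2, ' ');

# Character trie: the same code with a different split.
my $chars = build_trie(sub { split //, $_[0] }, @headlines);
my @char_prefixes = frequent_prefixes($chars, 2, '');
```

With a threshold of 2, the word trie reports `STOCK`, `STOCK INDEXES,`, `TOP`, `TOP STORIES` and `TOP STORIES FOR`, each with a count of 2. The comma stays attached to `INDEXES,` because the split is on whitespace only. The character trie also finds the longer shared prefix `STOCK INDEXES, N`, which a word split cannot see.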
