I'd be inclined to use a hash whose values are pairs: a count, and a hash of the words that follow (each nested the same way). The following works for generating data sets of arbitrary order:

use strict;
use warnings;
use Data::Dump::Streamer;

use constant ORDER => 3;

my %chains;

while (<DATA>) {
    my @words = split;
    my @chain;    # sliding window of the last ORDER words on this line

    for (@words) {
        push @chain, $_;
        next if @chain < ORDER;
        shift @chain if @chain > ORDER;

        # Walk the window, bumping the count at each level.
        # Each node is [ count, { following word => node } ].
        my $root;

        for my $index (0 .. ORDER - 1) {
            if (defined $root) {
                ++$root->[1]{$chain[$index]}[0];
                $root = $root->[1]{$chain[$index]};
            } else {
                ++$chains{$chain[$index]}[0];
                $root = $chains{$chain[$index]};
            }
        }
    }
}

Dump (\%chains);

__DATA__
I will not eat them in a bar.
I will not eat them in a car.
I will not eat them Sam I am.
I will not eat green eggs and ham!

Prints:

$HASH1 = {
  eat   => [ 4, { green => [ 1, { eggs => [ 1 ] } ],
                  them  => [ 3, { in => [ 2 ], Sam => [ 1 ] } ] } ],
  eggs  => [ 1, { and => [ 1, { "ham!" => [ 1 ] } ] } ],
  green => [ 1, { eggs => [ 1, { and => [ 1 ] } ] } ],
  I     => [ 4, { will => [ 4, { not => [ 4 ] } ] } ],
  in    => [ 2, { a => [ 2, { "bar." => [ 1 ], "car." => [ 1 ] } ] } ],
  not   => [ 4, { eat => [ 4, { green => [ 1 ], them => [ 3 ] } ] } ],
  Sam   => [ 1, { I => [ 1, { "am." => [ 1 ] } ] } ],
  them  => [ 3, { in => [ 2, { a => [ 2 ] } ],
                  Sam => [ 1, { I => [ 1 ] } ] } ],
  will  => [ 4, { not => [ 4, { eat => [ 4 ] } ] } ]
};
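
To get at the counts afterwards you walk the same [ count, { word => node } ] shape from the top. Here is a minimal sketch of that, assuming the structure above; the helper name followers is made up for illustration, and %chains is a small hand-copied slice of the dump rather than the full data set:

```perl
use strict;
use warnings;

# Hand-copied slice of the dump above (the "eat" branch only).
# Each node is [ count, { following word => node } ].
my %chains = (
    eat => [ 4, {
        green => [ 1, { eggs => [ 1 ] } ],
        them  => [ 3, { in => [ 2 ], Sam => [ 1 ] } ],
    } ],
);

# Hypothetical helper: given a prefix of words, return a hash ref of
# { following word => count }. Returns an empty hash ref when the prefix
# was never seen or has no recorded followers.
sub followers {
    my ($chains, @prefix) = @_;
    return {} unless @prefix;

    my $first = shift @prefix;
    my $node  = $chains->{$first} or return {};

    for my $word (@prefix) {
        # Fetch the child hash first so a leaf node is not autovivified.
        my $children = $node->[1] or return {};
        $node = $children->{$word} or return {};
    }

    my $children = $node->[1] or return {};
    my %counts;
    $counts{$_} = $children->{$_}[0] for keys %$children;
    return \%counts;
}

my $next = followers(\%chains, qw(eat them));
print "$_: $next->{$_}\n" for sort keys %$next;
```

The counts returned for a prefix are what you would use as weights if you wanted to pick the next word at random.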

DWIM is Perl's answer to Gödel

In reply to Re: Trying to select the best data structure by GrandFather
in thread Trying to select the best data structure by chinamox
