> ..where I would even start.

Hello Speed_Freak, your question confuses me: too much data, no code at all on your part, no expected results, and I do not really understand these subgroups or the goal.

But since you are asking where to start: "know your data" is a good suggestion, and another good saying goes: when you know your data deeply, the algorithm is simply a matter of implementation.

So where to start? ordering => array and indexing => hash

I mean that when you process your data you split each line into elements and fill a data structure that suits your needs. So the basis is a simple loop that consumes lines of data:

    use strict;
    use warnings;

    while (<DATA>) {
        chomp;
        my @ele = split ' ', $_;   # one element per whitespace-separated word
        # ... fill your data structure from @ele here ...
    }

Now that you have @ele you need to coerce it to your logic: supposing you need to store which ID ( $ele[0] ) has $ele[1] + $ele[2], you can use the "$ele[1] $ele[2]" pair as the key of a hash and push the IDs as values onto an anonymous array:

    use strict;
    use warnings;

    my %res;
    while (<DATA>) {
        chomp;
        my @ele = split ' ', $_;
        # key: the two words after the ID; value: array ref of matching IDs
        push @{ $res{"$ele[1] $ele[2]"} }, $ele[0];
    }

    __DATA__
    1 monkey cow hammer nail
    2 monkey sheep hammer nail
    3 dog cat hammer nail
    4 monkey cow hammer nail

This leads you to a data structure like: ("dog cat", [3], "monkey sheep", [2], "monkey cow", [1, 4])
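If you want to inspect the structure yourself, a minimal sketch with the core Data::Dumper module could look like this. The @lines array is only a stand-in for the __DATA__ section so the snippet is self-contained, and the two Dumper settings are just a matter of taste:

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for the __DATA__ section of the example above.
my @lines = (
    '1 monkey cow hammer nail',
    '2 monkey sheep hammer nail',
    '3 dog cat hammer nail',
    '4 monkey cow hammer nail',
);

my %res;
for my $line (@lines) {
    my @ele = split ' ', $line;
    # Same indexing as above: word pair as key, IDs collected in an array ref.
    push @{ $res{"$ele[1] $ele[2]"} }, $ele[0];
}

$Data::Dumper::Sortkeys = 1;   # sorted keys, so the dump is stable between runs
$Data::Dumper::Indent   = 1;   # compact indentation
print Dumper(\%res);
```

Hash keys come back in no particular order, which is why sorting them in the dump makes it easier to read and compare.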

If you just need to know which IDs have monkey, loop over the keys of the hash searching for the pattern monkey, as in:

    foreach my $key (keys %res) {
        if ($key =~ /monkey/) {
            print "monkey [occurrence in $key] found in IDs: ",
                (join ', ', @{ $res{$key} }), "\n";
        }
    }
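If looking up a single word turns out to be the common case, one possible variation (my assumption, not something your data requires) is to index by word instead of by pair, so that no loop over the keys is needed. The name %by_word and the @lines array are made up for this sketch:

```perl
use strict;
use warnings;

# Stand-in for the __DATA__ section of the example above.
my @lines = (
    '1 monkey cow hammer nail',
    '2 monkey sheep hammer nail',
    '3 dog cat hammer nail',
    '4 monkey cow hammer nail',
);

# %by_word (hypothetical name) maps each single word to the IDs that have it.
my %by_word;
for my $line (@lines) {
    my ($id, @words) = split ' ', $line;
    # Only the first two words after the ID are indexed, as in the pair example.
    push @{ $by_word{$_} }, $id for @words[0, 1];
}

# A missing word yields an empty list instead of an undef dereference.
my $ids = $by_word{monkey} || [];
print "monkey found in IDs: ", join(', ', @$ids), "\n";
# prints: monkey found in IDs: 1, 2, 4
```

The trade-off is that the pairing information is lost: here you know ID 1 has monkey, but not that it came together with cow. Which index you want depends, again, on knowing your data.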

This is my "where to start".

L*

PS: perldsc and "Using Perl for Statistics: Data Processing and Statistical Computing" (2004) as further reading suggestions.


There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

In reply to Re: Would Perl be a good choice for this? by Discipulus
in thread Would Perl be a good choice for this? by Speed_Freak
