here's a question for the best of you ...
maybe this is an algorithm question, i'm not sure.
i've got a pattern that may change slightly from one
situation to another; let's call it a title.

this title can be passed from one person to another or can change slightly and still be the same thing.

i've got to run a match on that title over a period
of time and run a calculation based on a data-element
at that period of time.
I'm using a multi-dimensional hash of arrays.

here is the code for that statement, which should give a better understanding of the data structure and what i'm trying to do ...

sub Compress {
    foreach my $KEYmonth (sort keys %{$_[0]}) {
        foreach my $KEYcat (sort keys %{$_[0]{$KEYmonth}}) {    # cat = category
            foreach my $KEYsubcat (sort keys %{$_[0]{$KEYmonth}{$KEYcat}}) {
                my $i = 0;
                foreach my $value (@{$_[0]{$KEYmonth}{$KEYcat}{$KEYsubcat}}) {
                    # add each month's value to the running year-to-date total
                    $OUTPUT{'YTD'}{$KEYcat}{$KEYsubcat}[$i++] += $value;
                }
            }
        }
    }
}

i do this over a number of months for many different
categories, and sub-categories. Unfortunately, in one (or two)
of the situations the subcategory-keys are variable enough
to prevent a positive match.

the data could be:
 "INITIAL.LASTNAME(x)(LONG NAME TITLE)" one month and:
 "INTL.SOMEOTHERNAME(x)(LONG NAME TITLE)" or:
 "INITIAL.LASTNAME(x)(LONG NAME. TITLE)" the next.

that's a mighty long setup for the question, but i'm
trying to make it both specific to my problem and generalizable
to the world (sort of like something you could put into a textbook
example if it was well answered, eh merlyn ? *jk* ;)

the question is then: how would one go about designing a
matching process that could calculate a confidence interval for NEAR-MATCHES
and, in the event of finding a close match, use it (and its key) for the
calculation?
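
to make the question concrete, here is a minimal sketch of one possible approach, not a definitive answer. It assumes the keys look like NAME(x)(TITLE), that the TITLE part is the stable piece (the name can change when the title passes to another person), and that a plain edit-distance ratio is an acceptable stand-in for "confidence" (it is a similarity score between 0 and 1, not a statistical confidence interval). The names comparable, edit_distance, similarity and best_match, and the 0.8 threshold, are all hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: reduce a key of the assumed form NAME(x)(TITLE)
# to an upper-cased title with punctuation and extra spaces removed.
# Keys with any other shape are returned unchanged.
sub comparable {
    my ($key) = @_;
    my ($title) = $key =~ /^[^(]*\([^)]*\)\(([^)]*)\)\s*$/
        or return $key;
    $title = uc $title;
    $title =~ s/[^A-Z0-9]+/ /g;    # punctuation and whitespace runs become one space
    $title =~ s/^ | $//g;          # trim leading/trailing space
    return $title;
}

# Levenshtein edit distance, computed row by row with dynamic programming.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Similarity in [0, 1]: 1 means identical strings.
sub similarity {
    my ($s, $t) = @_;
    my $longest = max(length($s), length($t));
    return 1 if $longest == 0;
    return 1 - edit_distance($s, $t) / $longest;
}

# Return (key, score) for the closest known key, or an empty list
# when even the closest one scores below the threshold.
sub best_match {
    my ($candidate, $known_keys, $threshold) = @_;
    my $wanted = comparable($candidate);
    my ($best_key, $best_score) = (undef, -1);
    for my $key (@$known_keys) {
        my $score = similarity($wanted, comparable($key));
        ($best_key, $best_score) = ($key, $score) if $score > $best_score;
    }
    return $best_score >= $threshold ? ($best_key, $best_score) : ();
}

# Usage: look up this month's key among the keys already seen.
my @known = ('INITIAL.LASTNAME(x)(LONG NAME TITLE)', 'A.OTHER(x)(SHORT JOB)');
my ($key, $score) = best_match('INTL.SOMEOTHERNAME(x)(LONG NAME. TITLE)', \@known, 0.8);
print defined $key ? "matched $key (score $score)\n" : "no match\n";
```

in Compress, the returned key could be used in place of $KEYsubcat when adding into %OUTPUT, and a candidate that comes back with no match would start a new entry (or be flagged for a human to look at). the threshold would have to be tuned against real data; too low and unrelated titles get merged.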

this is intended to be a fun interesting exercise, i've tried to
write a NEAR-MATCH function before (and failed miserably).
I'm looking to learn some process or method for tackling
this sort of problem (which i can only assume others will have had experience with)
but i might not necessarily do it (i'm allowed to tell them 'it can't be done in
situation x').


In reply to pattern matching with heuristics by Buckaroo Buddha
