OK, so here is a problem I am working on. I have a list of statute numbers, each with a charge level, charge degree, and criminal category; the list is ordered by statute. I am looking for the smallest grouping of a set of charges (the shortest statute prefix) that will uniquely identify the set. For example,
```
__DATA__
0024.118.3A F T TFF
0024.118.3B F T TFF
0024.118.3C F T TFF
0024.118.3D F T TFF
0024.118.4  F F TFF
```

should (and does) produce the following groups:

```
0024FT containing 0024.118.3A,B,C,D  TFF
0024FF containing 0024.118.4         TFF
```

(Coincidentally, it is the degree that causes the change in grouping here.)
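Because the goal is to look up a case type for statutes as they are reported, here is a minimal sketch of how a table of groups like the two above might be applied. It assumes the groups are stored in a hash keyed by prefix plus level/degree (the same `0024FT` style of key); `%case_type_for` and `classify` are hypothetical names, and `0024.118.3E` is a made-up statute used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical group table in the shape of the example results:
# key = statute prefix . level/degree, value = case type.
my %case_type_for = (
    '0024FT' => 'TFF',
    '0024FF' => 'TFF',
);

# Return the case type for a reported statute and level/degree by trying the
# statute's prefixes from longest to shortest (never shorter than the
# four-character chapter); returns undef when no group matches.
sub classify {
    my ($statute, $levdeg) = @_;
    for my $len (reverse 4 .. length $statute) {
        my $key = substr($statute, 0, $len) . $levdeg;
        return $case_type_for{$key} if exists $case_type_for{$key};
    }
    return;
}

my $type = classify('0024.118.3E', 'FT');
print defined $type ? $type : 'unclassified', "\n";    # TFF
```

Trying the longest prefix first means that, if the table ever contains nested groups, the most specific one wins; with disjoint groups the order does not matter.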
Update: cleaned up the code to make it more readable and set it up to run as a complete example.

Update: Thanks to all for your comments. I have a working solution now. Go Monks ...

The problem I'm having is with grouping when statutes have different degrees and/or categories. For example,

```
__DATA__
0039.04    N N OTH
0039.205.2 F T OTH
0039.205.6 F T TFF
```

should produce

```
0039NN       containing 0039.04     OTH
0039.205.2FT containing 0039.205.2  OTH
0039.205.6FT containing 0039.205.6  TFF
```

where the change in case type changes the grouping, but instead the code produces

```
0039NN       containing 0039.04     OTH
0039.205.2FT containing 0039.205.2  OTH
0039FT       containing 0039.205.6  TFF
```
This is no good, because 0039FT implies that the two statute/case type pairs overlap (the prefix 0039 would also match 0039.205.2, an OTH charge) when they don't.

I need to somehow refer back to previous groups but only within a specific subset of the statute groups.
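Put another way, a candidate group is only safe if every statute in the whole list that starts with its prefix and has the same level/degree falls into one category, including statutes that were already classified earlier. A minimal sketch of that check, assuming records are `[statute, levdeg, category]` array refs as produced by the `split` in the posted code; `categories_for` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted, distinct categories of all records
# whose statute starts with $prefix and whose level/degree equals $levdeg.
# Records are [statute, levdeg, category] array refs.
sub categories_for {
    my ($prefix, $levdeg, $records) = @_;
    my %seen;
    for my $rec (@$records) {
        next unless index($rec->[0], $prefix) == 0;    # literal prefix match
        next unless $rec->[1] eq $levdeg;
        $seen{ $rec->[2] } = 1;
    }
    return sort keys %seen;
}

my @records = (
    [ '0039.04',    'NN', 'OTH' ],
    [ '0039.205.2', 'FT', 'OTH' ],
    [ '0039.205.6', 'FT', 'TFF' ],
);

# '0039' with FT covers two categories, so 0039FT is not a valid group.
print join(',', categories_for('0039', 'FT', \@records)), "\n";          # OTH,TFF
print join(',', categories_for('0039.205.6', 'FT', \@records)), "\n";    # TFF
```

Because `index(...) == 0` is a literal prefix test, the dots in a prefix such as `0039.205` are not treated as regex wildcards.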

Update:
The grouping criteria are fairly straightforward. Statutes are not always reported exactly as they are written in the books. Since I am only interested in the criminal case types and not the statutes themselves, I would like to find the smallest portion of the statute that I can use to determine the case type. For example, all statutes that begin with 0024 and are third-degree felonies can be placed in the TFF case type. There is a lower limit on the group size: a group must contain at least the first four characters (the chapter) of the statute being classified, and it may contain all of the characters. So, for my first example, 0024 is the smallest group I can find. On the other hand, for statutes in chapter 0039, there is no way to categorize the statute without using the complete statute value, i.e., 0039.205.2 => OTH or 0039.205.6 => TFF.
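The rule described above (the shortest prefix of at least four characters, not ending on a `.`, whose matches with the same level/degree all share one category) could be sketched as follows. This is only a sketch under those assumptions, not the working solution mentioned in the update; `minimal_groups` and `MIN_PREFIX` are hypothetical names. Unlike the posted loop, it compares each candidate prefix against the entire list, so a prefix that would also capture an earlier, already classified statute is rejected.

```perl
use strict;
use warnings;

use constant MIN_PREFIX => 4;    # a group must contain at least the chapter

# Hypothetical sketch: for each statute, find the shortest prefix (at least
# MIN_PREFIX characters, not ending in '.') such that every record in the
# WHOLE list with that prefix and the same level/degree has one category.
# Records are [statute, levdeg, category] array refs. Returns a hash ref of
# group key (prefix . levdeg) => { category => ..., statutes => [...] }.
sub minimal_groups {
    my @records = @_;
    my (%groups, %placed);
  RECORD:
    for my $rec (@records) {
        my ($statute, $levdeg, $category) = @$rec;
        next RECORD if $placed{ $statute . $levdeg };
        for my $len (MIN_PREFIX .. length $statute) {
            my $prefix = substr($statute, 0, $len);
            next if substr($prefix, -1) eq '.';    # never end on a separator

            # all records with this prefix and levdeg, earlier ones included
            my @members = grep {
                index($_->[0], $prefix) == 0 && $_->[1] eq $levdeg
            } @records;
            my %cats;
            $cats{ $_->[2] } = 1 for @members;
            next if scalar(keys %cats) > 1;        # ambiguous: lengthen prefix

            $groups{ $prefix . $levdeg } = {
                category => $category,
                statutes => [ map { $_->[0] } @members ],
            };
            $placed{ $_->[0] . $levdeg } = 1 for @members;
            next RECORD;
        }
    }
    return \%groups;
}

my @records = map { [ split ' ' ] } (
    '0024.118.3A FT TFF', '0024.118.3B FT TFF', '0024.118.3C FT TFF',
    '0024.118.3D FT TFF', '0024.118.4 FF TFF',  '0039.04 NN OTH',
    '0039.205.2 FT OTH',  '0039.205.6 FT TFF',
);

my $groups = minimal_groups(@records);
for my $key (sort keys %$groups) {
    printf "%-14s %-4s %s\n", $key, $groups->{$key}{category},
        join(' ', @{ $groups->{$key}{statutes} });
}
```

Comparing every candidate against the whole list is quadratic; on a list sorted by statute, the scan could be limited to the contiguous run of records that share the candidate prefix.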

Here is the code I have so far. Any suggestions will be most appreciated.

Note: the code has been rewritten so that it should run as a complete example (and be clearer to read).
```perl
#!/usr/bin/perl

# Compiler directives and includes
use strict;
use warnings;
use diagnostics;
use Data::Dumper;

# Global declarations

# Constants and pragmas
use constant MINSTATLEN03 => 4;    # minimum group length (the chapter)

#main()
{
    my $db01    = '';
    my %statgrp = ();
    my $table   = 'pgmsvcs.statgrp_ld';
    my $rptfh   = '';
    my $rtncd   = 0;

    $rtncd = FindStatGrps(\%statgrp, $table);
    exit;
}    #end main()

sub FindStatGrps {
    my $statgrp = shift;
    my $table   = shift;
    my $rtncd   = 0;

    # each record is [statute, levdeg, category]
    my @statlst01 = ();
    push @statlst01, [ split /\s+/ ] while (<DATA>);

    my %grplst      = ();
    my %placedstats = ();

  STATUTE: foreach my $idx (0 .. $#statlst01) {
        my $statute = $statlst01[$idx][0] . $statlst01[$idx][1];
        next STATUTE if (exists($placedstats{$statute}));    # skip if statute
                                                            # already classified
      GROUP: for my $i (MINSTATLEN03 .. length($statlst01[$idx][0])) {
            next if substr($statute, $i - 1, 1) eq '.';    # skip if subgroup ends
                                                          # on a subfield separator (.)
            my $grpformatch = substr($statute, 0, $i);
            my $levdeg      = $statlst01[$idx][1];

            # initialize category bins
            my %srsgrp = (
                CM   => [], NCM => [], SO  => [], ROB  => [], OTH => [],
                BURG => [], TFF => [], WC  => [], PROP => [], DRG => [],
                MISD => [], DNC => [],
            );

            my $j = $idx;
            do {    # load individual stats into their respective bins
                # does the individual stat belong to the group?
                # (\Q...\E makes the dots in the prefix match literally)
                if ($statlst01[$j]->[0] =~ /^\Q$grpformatch\E/) {
                    if ($statlst01[$j]->[1] eq $levdeg) {
                        push @{ $srsgrp{ $statlst01[$j]->[2] } }, $statlst01[$j];
                    }
                }
                else {                  # if statute does not match, stop looking
                    $j = @statlst01;    # since list is ordered
                }
            } while (++$j < @statlst01);

            # check if more than one bin is occupied
            my $nrgrps = 0;
            (scalar @{ $srsgrp{$_} } > 0) && $nrgrps++ foreach (keys %srsgrp);

            if ($nrgrps > 1) {        # 2 or more bins are occupied
                next GROUP;           # increase group size and try again
            }
            elsif ($nrgrps == 1) {    # only 1 bin occupied -- good
                my $srscat = undef;   # determine bin name
                ($srscat || ((scalar @{ $srsgrp{$_} } > 0) && ($srscat = $_)))
                    foreach (keys %srsgrp);
                my $grp = $grpformatch . $levdeg;    # define group key
                $statgrp->{$grp} = [];
                foreach (@{ $srsgrp{$srscat} }) {    # save individual statute data
                    push @{ $statgrp->{$grp} }, $_;  # & update list of already
                    $placedstats{ $_->[0] . $_->[1] } = 1;    # classified statutes
                }
                next STATUTE;         # group found, go to next statute
            }
            else {                    # zero bins are occupied -- error
                die "error!!";        # at least one bin should be occupied
            }
        }
    }
    print Dumper($statgrp);
    return $rtncd;
}    #end FindStatGrps()

__DATA__
0024.118.3A FT TFF
0024.118.3B FT TFF
0024.118.3C FT TFF
0024.118.3D FT TFF
0024.118.4 FF TFF
0039.04 NN OTH
0039.205.2 FT OTH
0039.205.6 FT TFF
409.176.12A FT TFF
409.176.12B FT TFF
409.176.12C FT TFF
409.176.12D FT OTH
409.176.12E FT OTH
```
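If I am reading the loop correctly, the wrong 0039FT group comes from the inner `do`/`while` starting at `my $j = $idx;`: when 0039.205.6 is being classified, the scan only runs forward, so 0039.205.2 (already placed, OTH) never lands in a bin and the prefix 0039 looks unambiguous. By the same reasoning, 409.176.12D and 409.176.12E would end up in a 409.1FT group. One possible adjustment, assuming the list is sorted as plain strings (so all statutes sharing a prefix are contiguous), is to start the scan at the first record of the run matching `$grpformatch`, e.g. `my ($j) = prefix_range(\@statlst01, $idx, $grpformatch);`. `prefix_range` is a hypothetical helper, not part of the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: return (first, last), the index range of the contiguous
# run of records around $idx whose statute starts with $prefix. This relies on
# the list being sorted as plain strings, so all matches sit next to each other.
sub prefix_range {
    my ($list, $idx, $prefix) = @_;
    my ($first, $last) = ($idx, $idx);
    $first-- while $first > 0 && index($list->[ $first - 1 ][0], $prefix) == 0;
    $last++  while $last < $#$list && index($list->[ $last + 1 ][0], $prefix) == 0;
    return ($first, $last);
}

my @statlst01 = map { [ split ' ' ] } (
    '0039.04 NN OTH', '0039.205.2 FT OTH', '0039.205.6 FT TFF',
    '409.176.12A FT TFF',
);

# Classifying 0039.205.6 (index 2) with prefix '0039' should also look at the
# earlier records at indexes 0 and 1.
my ($first, $last) = prefix_range(\@statlst01, 2, '0039');
print "$first..$last\n";    # 0..2
```

The early exit in the posted `do`/`while` still works with this change, because the run of matching records ends at the first statute that does not share the prefix.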

PJ
use strict; use warnings; use diagnostics; (if needed)

In reply to Creating Minimal Subgroups in a List of Characters by periapt
