
OK, so here is a problem I am working on. I have a list of statute numbers, each with a charge level, charge degree and criminal category, ordered by statute. I am looking for the shortest statute prefix that uniquely identifies a set of charges. For example:

__DATA__
0024.118.3A F T TFF     
0024.118.3B F T TFF
0024.118.3C F T TFF
0024.118.3D F T TFF
0024.118.4  F F TFF

should (and does) produce the following groups

0024FT containing 0024.118.3A,B,C,D  TFF
0024FF containing 0024.118.4         TFF (coincidentally)
the degree causes the change in grouping

Update: cleaned up code to make it more readable and configured it to run as a complete example

Update: Thanks to all for your comments. I have a working solution now. Go Monks ...

The problem I'm having is in grouping when the statute has different degrees and/or categories. For example:

__DATA__
0039.04    N N OTH
0039.205.2 F T OTH
0039.205.6 F T TFF

should produce

0039NN       containing 0039.04    OTH
0039.205.2FT containing 0039.205.2 OTH
0039.205.6FT containing 0039.205.6 TFF
where the change in casetype changes the grouping

but instead the code produces

0039NN       containing 0039.04    OTH
0039.205.2FT containing 0039.205.2 OTH
0039FT       containing 0039.205.6 TFF

This is no good, as it implies an overlap between the two statute/casetype pairs when there isn't one: the key 0039FT would also match 0039.205.2, which belongs to OTH, not TFF.

I need to somehow refer back to previous groups but only within a specific subset of the statute groups.
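
A minimal sketch of how the overlap can be detected, assuming rows of the form [statute, level/degree, category]. The helper name categories_for is hypothetical and not part of the code in this post; it simply lists every category that a candidate group key would cover.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct categories of all rows whose
# statute starts with $prefix and whose level/degree equals $levdeg.
sub categories_for {
    my ($rows, $prefix, $levdeg) = @_;
    my %seen;
    $seen{ $_->[2] } = 1
        for grep { $_->[1] eq $levdeg && index($_->[0], $prefix) == 0 } @$rows;
    my @cats = sort keys %seen;
    return @cats;
}

# Rows taken from the chapter 0039 example above.
my @rows = (
    [ '0039.04',    'NN', 'OTH' ],
    [ '0039.205.2', 'FT', 'OTH' ],
    [ '0039.205.6', 'FT', 'TFF' ],
);

# Compare the key the code produces with the key that is wanted.
for my $key ( [ '0039', 'FT' ], [ '0039.205.6', 'FT' ] ) {
    my @cats = categories_for(\@rows, @$key);
    printf "%s%s matches: %s%s\n", @$key, join(',', @cats),
        @cats > 1 ? '  (ambiguous)' : '';
}
```

With these three rows, the key 0039FT is reported as covering both OTH and TFF, while 0039.205.6FT covers only TFF. A key is only safe if it covers exactly one category across the whole list, including statutes that were already placed in an earlier group.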

Update:
The grouping criteria are fairly straightforward. Statutes are not always reported exactly as they are written in the books. Since I am only interested in the criminal case types and not the statutes themselves, I would like to find the smallest portion of the statute that I can use to determine the case type. For example, all statutes that begin with 0024 and are third degree felonies can be placed in the TFF case type. There is a lower limit on the group size: the group key must contain at least the first four characters (the chapter) of the statute being classified, and it may contain all of the characters. So, for my first example, 0024 is the smallest group I can find. On the other hand, for statutes in chapter 0039, there is no way to categorize the statute without using the complete statute value, i.e. 0039.205.2 => OTH or 0039.205.6 => TFF.
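
The rule above can be sketched as a prefix search that checks each candidate against the whole list rather than only the statutes not yet placed. This is one possible approach under stated assumptions, not necessarily the working solution mentioned in the update. The names shortest_prefix and MIN_PREFIX_LEN are hypothetical, and rows are assumed to be [statute, level/degree, category].

```perl
use strict;
use warnings;

use constant MIN_PREFIX_LEN => 4;    # the chapter, per the rule above

# Hypothetical helper: return the shortest prefix of $statute (at least
# MIN_PREFIX_LEN characters, not ending in '.') such that no row with the
# same prefix and the same level/degree has a different category.
# Returns undef if no such prefix exists.
sub shortest_prefix {
    my ($rows, $statute, $levdeg, $cat) = @_;
    for my $len (MIN_PREFIX_LEN .. length $statute) {
        my $prefix = substr $statute, 0, $len;
        next if substr($prefix, -1) eq '.';    # do not end on a separator
        my $conflicts = grep {
               $_->[1] eq $levdeg
            && $_->[2] ne $cat
            && index($_->[0], $prefix) == 0
        } @$rows;
        return $prefix unless $conflicts;
    }
    return undef;
}

# Rows taken from the two examples in this post.
my @rows = map { [ split ' ', $_ ] } (
    '0024.118.3A FT TFF',
    '0024.118.3B FT TFF',
    '0024.118.4  FF TFF',
    '0039.04     NN OTH',
    '0039.205.2  FT OTH',
    '0039.205.6  FT TFF',
);

my %group;    # group key => category
for my $row (@rows) {
    my ($statute, $levdeg, $cat) = @$row;
    my $prefix = shortest_prefix(\@rows, $statute, $levdeg, $cat);
    defined $prefix or die "no unambiguous prefix for $statute $levdeg\n";
    $group{ $prefix . $levdeg } = $cat;
}
printf "%-14s => %s\n", $_, $group{$_} for sort keys %group;
```

On these rows the sketch yields the keys 0024FF, 0024FT, 0039NN, 0039.205.2FT and 0039.205.6FT, which matches the grouping described as the desired result. It rescans the list for every candidate prefix, so on a long list it would be worth limiting the scan to the current chapter, since the list is ordered.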

Here is the code I have so far. Any suggestions will be most appreciated.

Note: the code has been rewritten so that it runs as a complete example (and is clearer to read).

#!/usr/bin/perl
# Compiler directives and Includes
use strict;
use warnings;
use diagnostics;
use Data::Dumper;

# Global declarations;

# Constants and pragmas
use constant MINSTATLEN03 => 4;   # minimum group length (the chapter)

#main()
{
    my $db01    = '';
    my %statgrp = ();
    my $table   = 'pgmsvcs.statgrp_ld';
    my $rptfh   = '';
    my $rtncd   = 0;
    
    $rtncd = FindStatGrps(\%statgrp, $table);
    exit;
} #end main()

sub FindStatGrps{
my $statgrp = shift;
my $table   = shift;

my $rtncd     = 0;
my @statlst01 = ();

push @statlst01, [ split /\s+/ ] while(<DATA>);

my %grplst = ();
my %placedstats = ();

STATUTE: foreach my $idx (0..$#statlst01){

  my $statute = $statlst01[$idx][0].$statlst01[$idx][1];
  
  next STATUTE if(exists($placedstats{$statute})); # skip if statute
                                                   # already classified

  GROUP: for my $i (MINSTATLEN03..length($statlst01[$idx][0])){
  
    next if substr($statute,$i-1,1) eq '.'; # skip if subgrp ends on
                                            # a subfield separator (.)
    my $grpformatch = substr($statute,0,$i);
    my $levdeg = $statlst01[$idx][1];

                                            # initialize category bins
    my %srsgrp = (CM=>[],NCM=>[] ,SO=>[],ROB=>[],OTH=>[],BURG=>[],
                  TFF=>[],WC=>[],PROP=>[],DRG=>[],MISD=>[],
                  DNC=>[]);

    my $j = $idx;
    do{                     # load indiv stats into respective bins
                            # does indiv stat belong to group?
      if($statlst01[$j]->[0] =~ /^\Q$grpformatch\E/){  # \Q..\E: match '.' literally
        if($statlst01[$j]->[1] eq $levdeg){
          push @{$srsgrp{$statlst01[$j]->[2]}}, $statlst01[$j];
        }
      }else{                # if statute does not match, stop looking
          $j = @statlst01;  # since list is ordered
      }
    }while(++$j < @statlst01);
                                                        
                            # check if more than one bin is occupied
    my $nrgrps = 0;
    (scalar @{$srsgrp{$_}} > 0) && $nrgrps++ foreach (keys %srsgrp);

    if($nrgrps > 1){        # 2 or more bins are occupied
      next GROUP;           # lengthen the group and try again
    }
    elsif($nrgrps == 1){    # only 1 bin occupied -- good
      # determine bin name: the one category whose bin is occupied
      my ($srscat) = grep { @{$srsgrp{$_}} > 0 } keys %srsgrp;

      my $grp = $grpformatch.$levdeg;        # define group key
      $statgrp->{$grp} = [];

      foreach (@{$srsgrp{$srscat}}){       # save indiv statute data
        push @{$statgrp->{$grp}}, $_;      # & updt list of already
        $placedstats{$_->[0].$_->[1]} = 1; # classified statutes
      }
      next STATUTE;         # group found, go to next statute
    }
    else{                   # zero bins are occupied -- error
      die "error!!";        # at least one bin should be occupied
    }
  }
}
print Dumper($statgrp);
return $rtncd;
} #end FindStatGrps()

__DATA__
0024.118.3A FT TFF
0024.118.3B FT TFF
0024.118.3C FT TFF
0024.118.3D FT TFF
0024.118.4  FF TFF
0039.04     NN OTH
0039.205.2  FT OTH
0039.205.6  FT TFF
409.176.12A FT TFF
409.176.12B FT TFF
409.176.12C FT TFF
409.176.12D FT OTH
409.176.12E FT OTH

PJ
use strict; use warnings; use diagnostics; (if needed)

In reply to Creating Minimal Subgroups in a List of Characters by periapt
