Re: Regular Expressions

Please use <code> tags around your data and code so that it's properly formatted. Here's the sample data that you posted (I removed empty lines for space):

Header Line One
***-***  0     0         ***-MBO  0    0
2TO-T/V  0     0         2TO-T/O  0    0
POC-CNU  1285  0         POC-A/M  0    15567
Header Line Two
***-***  0     0         ***-MBO  0    0
2TO-T/V  0     0         2TO-T/O  0    0
POC-CNU  1285  0         POC-A/M  0    15567

Here's sample code (untested) that does what you described: it stores each line in a hash keyed by its first 7 characters, grouped under the most recent header line, and warns about duplicate keys:

use strict;
use warnings;

my $data={};
my $file;
while(<>) {
  chomp;
  # Skip blank lines
  next if /^\s*$/;
  if (/^Header/) {
    $data->{$_}={};  # Create a new first-level hash.
    $file=$_;
    next;
  }
  # m{...} delimiters are used because the character class contains a literal "/".
  if (m{^([a-zA-Z0-9*/]{3}-\S{3})\s+}) {
    my $key=$1;
    if ($file) {
      # Check for duplicates.
      if (exists($data->{$file}->{$key})) {
        warn "Duplicate key $key in $file: $_\n";
        next;
      }
      $data->{$file}->{$key}=$_;
    }
    else {
      warn "Line found before a header line: $_\n";
    }
  } else {
    # Reject improper lines
    warn "Badly formatted line found, ignoring: $_\n";
  }
}
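
The key pattern is the part most likely to need adjusting, so it can help to try it on its own first. Here is a minimal sketch that runs it against the sample lines; extract_key is a hypothetical helper name I made up for this illustration, and the pattern is written with m{...} delimiters so that the "/" inside the character class needs no escaping:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): returns the
# 7-character key at the start of a data line, or undef if the
# line does not look like a data line.
sub extract_key {
    my ($line) = @_;
    return $line =~ m{^([a-zA-Z0-9*/]{3}-\S{3})\s+} ? $1 : undef;
}

for my $line (
    '***-***  0     0         ***-MBO  0    0',
    '2TO-T/V  0     0         2TO-T/O  0    0',
    'POC-CNU  1285  0         POC-A/M  0    15567',
    'Header Line One',
) {
    my $key = extract_key($line);
    print defined $key ? "key: $key\n" : "no key: $line\n";
}
```

The three data lines should each yield a key and the header line should not. Note that the pattern only captures the key in the first column; the second key on each line (for example ***-MBO) stays inside the stored string.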

This stores the data in a structure like this:

$data->{Header Line One}->
        {***-***} -> "***-*** 0 0 ..etc"
        {2TO-T/V} -> "..."
        ...
      ->{Header Line Two}->
        ....

This may not be precisely what you want, but it should give you an idea of one way of doing it.
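
Reading the data back out takes two nested loops, one over the headers and one over the keys under each header. The following is only a sketch: the $data hash is built by hand here as a stand-in for what the read loop would produce, and dump_data is a hypothetical helper name. Hashes do not remember insertion order, so the keys are sorted to get stable output:

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash filled by the read loop;
# the values are lines taken from the sample data.
my $data = {
    'Header Line One' => {
        '2TO-T/V' => '2TO-T/V  0     0         2TO-T/O  0    0',
        'POC-CNU' => 'POC-CNU  1285  0         POC-A/M  0    15567',
    },
    'Header Line Two' => {
        'POC-CNU' => 'POC-CNU  1285  0         POC-A/M  0    15567',
    },
};

# Hypothetical helper: returns one "header | key | line" string
# per stored entry, sorted by header and then by key.
sub dump_data {
    my ($d) = @_;
    my @out;
    for my $header (sort keys %$d) {
        for my $key (sort keys %{ $d->{$header} }) {
            push @out, join ' | ', $header, $key, $d->{$header}{$key};
        }
    }
    return @out;
}

print "$_\n" for dump_data($data);

# A single line can also be looked up directly by header and key.
print $data->{'Header Line Two'}{'POC-CNU'}, "\n";
```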

--ZZamboni
