Re^5: gathering of some elements of a list

Now we're talking!

Having your output requirements makes things a whole lot easier. For example, you can write a "top-down" program that reports success or failure, depending on whether the computed results match the expected results:

#!/usr/bin/perl

use strict;
use warnings;

# Data
my $p_expected_output = [
    '0601|3|NORM|2|ALLO|XLF753|U|0045|0050|',
    '0603|5|NORM|2|ALLO|ADR2CG||0430|0438|',
    '0604|6|NORM|2|ALLO|AF681VC|i U|0500|0510|',
    '0605|7|NORM|2|ALLO|AF651PQ|i|0515|0523|',
    '0606|8|NORM|2|ALLO|AF713BR|i|0445|0453|',
    '0607|9|NORM|2|ALLO|AFR100M|i|0520|0533|',
    '0609|11|NORM|2|ALLO|GJT775|i E|2300|2315|',
    '0610|12|NORM|2|ALLO|AF661WN|i|0450|0500|0500|',
];


my $p_input = [
    '0601 3      NORM 2  ALLO XLF753         U 0045 0050 ',
    '0603 5      NORM 2  ALLO ADR2CG           0430 0438 ',
    '0604 6      NORM 2  ALLO AF681VC  i     U 0500 0510 ',
    '0605 7      NORM 2  ALLO AF651PQ  i       0515 0523 ',
    '0606 8      NORM 2  ALLO AF713BR  i       0445 0453 ',
    '0607 9      NORM 2  ALLO AFR100M  i       0520 0533 ',
    '0609 11     NORM 2  ALLO GJT775   i E     2300 2315 ',
    '0610 12     NORM 2  ALLO AF661WN  i       0450 0500 0500 ',
];


# Main program
my $p_output = generate_output($p_input);

if (arrays_match($p_output, $p_expected_output)) {
    print "Success!\n";
} else {
    print "Generated output does NOT match expected output :-(\n";
}

(Note that I added '|' to the end of the last string in the expected-output array, so that line conforms to the same pattern as the others.)
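For reference, every expected record follows one rule: its fields joined with '|', plus a single trailing '|'. Here is a minimal sketch of that rule, using a hypothetical make_record helper and the field values from the 0603 and 0610 lines above. An absent i/E/U flag field is simply the empty string, which is what produces the '||' in the 0603 record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join the fields with '|' and append one trailing '|'.
sub make_record {
    my @fields = @_;
    return join('|', @fields) . '|';
}

# The 0603 line has no i/E/U flags, so that field is the empty string.
print make_record('0603', '5', 'NORM', '2', 'ALLO', 'ADR2CG', '', '0430', '0438'), "\n";
# 0603|5|NORM|2|ALLO|ADR2CG||0430|0438|

print make_record('0610', '12', 'NORM', '2', 'ALLO', 'AF661WN', 'i', '0450', '0500', '0500'), "\n";
# 0610|12|NORM|2|ALLO|AF661WN|i|0450|0500|0500|
```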

Now all that's left is to write the subroutines arrays_match and generate_output. Here is one way:

# Subroutines
sub arrays_match {
    my ($p1, $p2) = @_;
    if (@$p1 != @$p2) {
        # Lengths don't match
        printf "Length mismatch: Array1 has %d lines, Array2 has %d\n",
            scalar(@$p1), scalar(@$p2);
        return 0;
    }
    for my $i (0 .. $#$p1) {
        if ($p1->[$i] ne $p2->[$i]) {
            # Elements don't match
            printf "Mismatch at line %d:\n", $i + 1;
            printf "  Array1:  '%s'\n", $p1->[$i];
            printf "  Array2:  '%s'\n", $p2->[$i];
            return 0;
        }
    }
    return 1;    # Arrays match
}


sub generate_output {
    my ($parray) = @_;
    my @output = ( );
    foreach my $line (@$parray) {
        push @output, transform_line($line);
    }
    return [ @output ];
}    


sub transform_line {
    my $line = shift;
    $line =~ s/\s+$//;    # Trim trailing whitespace

    # Assume the line is 6 words, followed (optionally) by any
    # combination of { i, E, U } (with optional space around them),
    # followed (optionally) by more words.
    #
    if ($line !~ s/^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*//) {
        die "Didn't get expected 6 words in line:\n$line\n";
    }
    my $output = join('|', $1, $2, $3, $4, $5, $6);

    # Every part of this pattern is optional, so it always matches (possibly
    # the empty string); with no flags present, an empty field is emitted.
    if ($line =~ s/((i\s*)?(E\s*)?(U\s*)?)//) {
        # Split on whitespace, and then put the optional { i, E, U }
        # back together, with a single space between each.
        my $optarg = join(' ', split(/\s+/, $1));
        $output .= "|$optarg";
    }
    while ($line =~ s/^\s*(\S+)//) {
        my $word = $1;
        $output .= "|$word";
    }
    $output .= "|";

    return $output;
}

Above, I've put some diagnostics in arrays_match to show where the expected and computed arrays differ (if they do).
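If you'd rather not maintain a hand-written comparison, the core Test::More module provides is_deeply, which compares nested data structures and, on failure, reports the first place they differ. This is a sketch of that swap-in, not what the code above does; the one-element arrays are stand-ins for the real $p_output and $p_expected_output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Stand-ins for the computed and expected arrays from the main program.
my $p_output          = [ '0601|3|NORM|2|ALLO|XLF753|U|0045|0050|' ];
my $p_expected_output = [ '0601|3|NORM|2|ALLO|XLF753|U|0045|0050|' ];

# Prints "ok 1 - ..." on success; on failure, the diagnostics show the
# first differing element of the two structures.
is_deeply($p_output, $p_expected_output, 'generated output matches expected output');

done_testing();
```

In the real program you would pass generate_output($p_input) and $p_expected_output instead of the stand-ins.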

I've also created a subroutine, transform_line, called by generate_output, which contains the entire parsing algorithm. I think separating it out like that makes the logic easier to see, and it also makes it easier to rewrite (if your requirements should change). Keeping the algorithm as general as possible also lets it handle data beyond your original sample, as long as each line fits the pattern it assumes: six words, then optional i/E/U flags, then any further words.
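To illustrate how easily just the parsing step can be swapped out, here is an alternative sketch of transform_line that uses split instead of successive substitutions. transform_line_split is a hypothetical name, and the function is not equivalent in every case: it accepts the i/E/U flags in any order, but only as separate words, whereas the original regex requires the order i, E, U and also accepts glued flags such as "iE".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical alternative to transform_line: split into words, take the
# first six as fixed fields, collect any run of single-letter i/E/U flags
# (in any order), and treat the remaining words as trailing fields.
sub transform_line_split {
    my ($line) = @_;

    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my @words = split ' ', $line;
    die "Didn't get expected 6 words in line:\n$line\n" if @words < 6;

    my @fixed = splice(@words, 0, 6);
    my @flags;
    push @flags, shift @words while @words && $words[0] =~ /^[iEU]$/;

    # With no flags, join(' ', @flags) is '', which yields the '||' field.
    return join('|', @fixed, join(' ', @flags), @words) . '|';
}

print transform_line_split('0604 6      NORM 2  ALLO AF681VC  i     U 0500 0510 '), "\n";
# 0604|6|NORM|2|ALLO|AF681VC|i U|0500|0510|
print transform_line_split('0603 5      NORM 2  ALLO ADR2CG           0430 0438 '), "\n";
# 0603|5|NORM|2|ALLO|ADR2CG||0430|0438|
```

Because only transform_line changes, generate_output and the success/failure check in the main program can stay exactly as they are.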

