in reply to Regex problem

You only seem to be interested in the data that matches a single age followed by three decimal numbers. The regex in the following code looks for this pattern and extracts it into an array. Once you have it in this form you can manipulate it as you like.
while (<FILE>) {
    my @data;
    if (@data = /\b(\d+)\b\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)/) {
        print "@data\n";
    }
}
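A match in list context returns its capture groups, and that is why assigning to @data works as the `if` condition. The sketch below runs the same pattern against a made-up line. The sample text is an assumption, because the real data was not posted in this reply. It also shows that only the first data set on a line is captured.

```perl
use strict;
use warnings;

# Hypothetical sample line: two data sets (an age, then three decimals) side by side.
# The layout is assumed; the original data file is not shown here.
my $line = "  12   1.4   0.7   0.7        57   1.1   0.5   0.6\n";

# List-context match: on success it returns the four captured groups.
my @data = $line =~ /\b(\d+)\b\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)/;

print "@data\n";    # prints "12 1.4 0.7 0.7": the second set (57 ...) is not reached
```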

Update: I was on the right track, but I hadn't spent enough time on the problem. I have fixed the code to handle more than one data set per line and added a rather ugly bit to match the ' and over' part of the data:

my %info;
while (<>) {
    # scalar-context //g steps through every age/number group on the line
    while (/\b(\d+)\b(?: and over)?\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)/g) {
        $info{$1} = [$2, $3, $4];
    }
}
foreach (sort { $a <=> $b } keys %info) {
    print "$_ ", join(' ', @{ $info{$_} }), "\n";
}
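In scalar context, `m//g` resumes matching where the previous match stopped, so the inner `while` visits every data set on a line instead of only the first. Here is a small sketch that feeds two hypothetical lines through that loop in place of `<>`. The column layout is an assumption.

```perl
use strict;
use warnings;

# Hypothetical input lines standing in for the file read with <>;
# the layout (age followed by three decimals, two sets per line) is assumed.
my @lines = (
    "  12   1.4   0.7   0.7        57   1.1   0.5   0.6\n",
    "  13   1.3   0.6   0.7        90 and over   0.9   0.3   0.6\n",
);

my %info;
for (@lines) {
    # Each successful match advances pos($_), so the next iteration
    # picks up the following data set on the same line.
    while (/\b(\d+)\b(?: and over)?\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)/g) {
        $info{$1} = [ $2, $3, $4 ];
    }
}

# Numeric sort on the age keys.
for my $age (sort { $a <=> $b } keys %info) {
    print "$age ", join(' ', @{ $info{$age} }), "\n";
}
```

Because `(?: and over)` is matched but not captured, the "90 and over" row is stored under the plain key 90.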

Re^2: Regex problem
by Roy Johnson (Monsignor) on Oct 18, 2005 at 16:14 UTC
    Except there are some tricks: the data is often formatted with two data sets on the same line, and he does seem to want "90 and over" to show up. I got pretty good results with this:
    my (@age, @number_of_males, @number_of_females);
    while (<FILE>) {
        if (/\b\d\d?(?:\d+\.\d+\s*){3}\s/) {
            s/under\s+/</gi;
            s/\s+and over/+/gi;
            my @ar = split;
            while (@ar > 3) {
                (my ($age, undef, $males, $females), @ar) = @ar;
                next if $age =~ /\d\D\d/;
                push @age, $age;
                push @number_of_males, $males;
                push @number_of_females, $females;
            }
        }
    }
    print "$age[$_], $number_of_males[$_], $number_of_females[$_]\n"
        for sort { $age[$a] <=> $age[$b] } 0..$#age;
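This is a variation on the loop above. It takes each four-field data set off the front of @ar with `splice` rather than the `(my (...), @ar) = @ar` list assignment, which some may find easier to read. The input line is hypothetical, the line filter is left out for brevity, and the shorter array names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical input line; the layout (an age followed by three decimals,
# two data sets per line) is assumed, not taken from the real file.
my @lines = ("Under 5   6.5   3.3   3.2     90 and over   0.9   0.3   0.6\n");

my (@age, @males, @females);
for (@lines) {
    # Same normalisation as the original: "Under 5" -> "<5", "90 and over" -> "90+",
    # so every age becomes a single whitespace-free token.
    s/under\s+/</gi;
    s/\s+and over/+/gi;
    my @ar = split;    # whitespace-separated fields of $_
    while (@ar >= 4) {
        # splice removes one data set (age, skipped decimal, males, females) from the front
        my ($age, undef, $m, $f) = splice @ar, 0, 4;
        next if $age =~ /\d\D\d/;    # skip tokens like "1.4" or "5-9" that are not one age
        push @age,     $age;
        push @males,   $m;
        push @females, $f;
    }
}

{
    # "<5" and "90+" are not plain numbers; as in the original they
    # numify to 0 and 90, here without warnings.
    no warnings 'numeric';
    for my $i (sort { $age[$a] <=> $age[$b] } 0 .. $#age) {
        print "$age[$i], $males[$i], $females[$i]\n";
    }
}
```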

    Caution: Contents may have been coded under pressure.