Dear Monks,

I have a flat file that takes the basic form of: <background>

Table 3 Mid-1983 Population Estimates: England and Wales; single ye +ar of age and sex; estimated resident population, + revised in light of the results of the 2001 Census1 + England and Wales Thousands Age Persons Males Females Persons Males Fema +les All ages 49,617.0 24,133.3 25,483.7 0-4 3,116.4 1,597.3 1,519.1 50-54 2,748.0 1,367. +5 1,380.5 0 622.7 319.1 303.6 50 530.6 265.9 264.6 1 618.0 316.8 301.2 51 548.1 274.9 273.2 2 628.4 321.6 306.8 52 558.8 278.5 280.3 3 634.9 325.3 309.6 53 559.3 276.0 283.3 4 612.4 314.4 298.0 54 551.2 272.2 279.1 5-9 2,921.0 1,499.9 1,421.2 55-59 2,783.0 1,363. +8 1,419.2 5 563.2 289.3 273.9 55 546.8 269.9 276.9 6 553.7 284.6 269.1 56 554.3 273.4 280.8 7 577.9 296.6 281.3 57 561.9 276.2 285.7 8 602.2 309.1 293.1 58 558.4 272.5 286.0 9 624.1 320.3 303.8 59 561.6 271.8 289.8 10-14 3,670.5 1,882.5 1,787.9 60-64 2,823.6 1,33 +5.9 1,487.7 10 671.3 345.2 326.1 60 563.6 270.4 293.2 11 713.6 366.6 347.0 61 590.2 282.0 308.2 12 759.4 388.9 370.5 62 611.9 289.2 322.7 13 749.5 384.5 365.0 63 634.3 297.7 336.6 14 776.7 397.3 379.4 64 423.6 196.6 227.1 15-19 4,120.8 2,111.9 2,008.9 65-69 2,283.7 1,03 +0.8 1,252.8 15 782.0 400.5 381.5 65 401.3 184.1 217.2 16 811.2 417.5 393.7 66 442.2 202.2 240.0 17 829.3 425.8 403.5 67 467.6 212.7 254.9 18 849.1 435.1 414.0 68 489.5 218.7 270.7 19 849.2 433.0 416.1 69 483.2 213.2 270.0 20-24 3,922.3 1,976.0 1,946.3 70-74 2,136.5 903. +8 1,232.7 20 829.9 419.4 410.5 70 465.5 203.5 262.0 21 812.0 408.8 403.2 71 442.5 190.8 251.7 22 790.4 398.3 392.2 72 423.8 179.3 244.5 23 756.1 381.0 375.1 73 411.4 171.0 240.4 24 733.9 368.6 365.3 74 393.3 159.2 234.1 25-29 3,412.7 1,718.1 1,694.6 75-79 1,580.0 592. +1 987.9 25 716.0 359.4 356.6 75 370.0 146.5 223.4 26 694.5 348.2 346.4 76 340.0 131.3 208.7 27 676.6 341.3 335.3 77 314.3 117.0 197.4 28 657.0 331.7 325.2 78 292.2 105.2 187.0 29 668.7 337.5 331.2 79 263.4 92.1 171.3 30-34 3,403.3 1,710.6 1,692.7 80-84 933.7 289.4 + 644.2 30 666.9 336.3 330.6 80 239.5 80.7 158.8 31 656.1 329.8 326.3 81 211.2 68.2 143.0 32 669.9 337.3 332.6 82 186.6 57.2 129.4 33 692.6 347.5 345.1 83 163.1 47.1 116.0 34 717.8 359.7 358.1 84 133.2 36.2 97.0 35-39 3,580.0 1,796.0 1,784.0 85-89 408.3 99.3 + 309.0 35 770.2 384.8 385.4 85 112.6 29.3 83.3 36 842.3 422.4 419.9 86 96.6 24.0 72.6 37 660.4 331.2 329.2 87 80.7 19.2 61.6 38 651.7 327.6 324.1 88 66.1 15.2 50.9 39 655.3 329.9 325.3 89 52.3 11.6 40.7 40-44 2,862.5 1,443.2 1,419.3 90 and over 163.2 +31.8 131.3 40 625.6 315.8 309.8 41 572.3 288.7 283.6 42 530.6 267.3 263.2 Under 16 10,490.0 5,380.2 + 5,109.7 43 565.1 285.2 279.8 44 569.0 286.1 282.9 Under 18 12,130.5 6,223.6 + 5,906.9 45-49 2,747.7 1,383.3 1,364.3 16-44 20,519.6 10, +355.4 10,164.2 45 567.8 286.3 281.5 46 558.8 281.7 277.0 45-64/59* 9,614.5 5,450.5 + 4,164.0 47 551.6 277.5 274.2 48 542.6 273.2 269.5 65/60** 8,993.0 2,947.2 +6,045.7 49 526.8 264.7 262.2 and over

I want to take the number of females and males for each age. The code that I have produced to do this is like:

my @files; my $line; my $file; while (<*.txt>) { next if $_ eq 'out.txt'; $file = $_; open (FILE, "$file"); my $line_number = 0; my $run; LINE: while (<FILE>){ my $line = $_; $line_number = $line_number +1; $run = 1; if ($line_number == 1){ $run = 1 if $line !~ m/^Table\s/; # The function of this line wa +s eliminated in order to allow the first line to be blanks. ie now a +ll text files are run last LINE; } else { $run = 1; } last LINE; } if ($run == 1){ push (@files, $file); } else { #Do nothing } } print "@files"; my $line_B; my $FH; my $batch; my @age; my @number_of_males; my @number_of_females; my $out = "out.txt"; open (OUT, "+>$out"); my $year; foreach my $filename ( @files ) { if ($filename =~ /^(\d{4})/){ $year = $1; } open ( FILE, "<", $filename ) or die( "Couldn't open $filename: $!" ); my $batch = "\n"; INNER: while ((<FILE>)) { s/^$//; s/\d{1,3}-\d{1,3}\b.{1,300}$//; s/Under\b.{1,500}//; s/revised in light of the results of the 2001 Census1//; s/England\band\bWales//; s/Thousands//; s/Age\bPersons\bMales\bFemales\bPersons\bMales\bFemales//; s/^\D.*$//; s/(A-Z|a-z)//; s/^\D.*$//; s/These//; s/\sare//; s/\sfinal//; s/\srevised//; s/\sestimates.//; s/\sThey//; s/\sreplace//; s/\sthe//; s/\sinterim//; s/\srevised//; s/\spopulation//; s/\sestimates//; s/\sthat//; s/\swere//; s/\spublished//; s/\son//; s/\s10//; s/\sOctober\s2002\sat\snational\slevel\sfor\sEngland\sand\sWales\. +//; s/and\sover//; s/Table\b3\bMid\-\d\d\d\d\bPopulation\bEstimates\:\bEngland\band\ +bWales;\bsingle\byear\bof\bage\band\bsex\;\bestimated\bresident\bpopu +lation\,//; s/revised\bin\blight\bof\bthe\bresults\bof\bthe\b2001\bCensus1//; s/Thousands//; s/Age\bPersons\bMales\bFemales\bPersons\bMales\bFemales//; s/^$//; s/(\d{1,4}\.\d)(\d\.\d{1,4})/$1\t$2/; s/^1\s*$//; s/65\/60.*//; if ($_ =~ /([\d|\.]*)\t([\d|\.]*)\t([\d|\.]*)\t([\d|\.]*)\t([\d|\. +]*)\t([\d|\.]*)\t([\d|\.]*)\t([\d|\.]*)\t([\d|\.]*)/){ # I think tha +t the main problem is here push (@age, $1); push (@number_of_males, $3); push (@number_of_females, $4); push (@age, $6); push (@number_of_males, $8); push (@number_of_females, $9); } $batch .= $_; last INNER if eof; ##|| m/^\s*go\s*$/i; } print "\n@age\n\n"; print "\n@number_of_males\n\n"; print "\n@number_of_females\n\n"; my $counter; for($counter = 0; $counter <= @age; $counter++) { if ($age[$counter]) { print OUT $year."\t"; print OUT $age[$counter]; print OUT "\t1\t"; print OUT $number_of_males[$counter]; print OUT "\n"; print OUT $year."\t"; print OUT $age[$counter]; print OUT "\t2\t"; print OUT $number_of_females[$counter]; print OUT "\n"; } } undef(@age); undef(@number_of_males); undef(@number_of_females); print $batch; sleep 2; }

The problem is that I don't have figures returned for the ages 0, 42, 44, 46, 48 and > 90. Can anyone please suggest changes that will rectify this or create easier to follow code.

In reply to Regex problem by Win

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.