
The text file contains a summary of tabular data like the one below. I want to extract it and store it in an AoA (array of arrays).

Summary of This Bill Period Charges
Account No.  Wahtel Number  One Time Charges  Monthly Charges  Call Charges  Value Added Services  Roaming Charges  Discounts  Taxes  Total Charges
1022289833  544.91  0.00  0.00  0.00  544.91
1022289744  8008102935  0.00  199.00  70.50  3.20  0.00  -9.70  27.09  290.09
1022290146  8008102942  0.00  199.00  63.80  0.00  0.00  -3.80  26.70  285.70
1022290145  8008102930  0.00  199.00  207.80  3.20  1.20  -120.00  30.04  321.24
1022289844  8008102943  0.00  199.00  5.50  9.00  0.00  0.00  21.98  235.48
1022290156  8008102954  0.00  199.00  283.40  0.40  11.20  -51.80  45.53  487.73
1022290048  8008102949  0.00  199.00  0.00  0.00  0.00  0.00  20.50  219.50
1022290051  8008102950  0.00  199.00  0.00  0.00  0.00  0.00  20.49  219.49
1022290246  8008102956  0.00  199.00  96.20  3.00  16.80  -1.70  32.26  345.56
1022290050  8008102933  0.00  199.00  316.80  0.00  0.00  -13.80  51.70  553.70
1022290151  8008102939  0.00  199.00  162.40  0.00  0.00  -0.90  37.12  397.62
1022289947  8008102952  0.00  199.00  350.40  6.00  0.00  -44.90  52.57  563.07
1022289843  8008102931  0.00  199.00  5.10  0.00  0.00  -2.10  20.81  222.81
1022290248  8008102947  0.00  199.00  231.20  3.00  0.00  -0.20  44.60  477.60
1022290249  8008102945  0.00  199.00  37.70  0.00  0.00  -8.70  23.49  251.49
1022290245  8008102932  0.00  199.00  96.70  0.00  0.00  -4.70  29.97  320.97
1022289946  8008102944  0.00  199.00  0.00  0.00  0.00  0.00  20.50  219.50
1022290153  8008102948  0.00  199.00  74.90  0.00  0.00  -1.40  28.09  300.59
1022290046  8008102958  0.00  199.00  0.00  0.00  0.00  0.00  20.50  219.50
1022290150  8008102946  0.00  199.00  0.00  0.00  0.00  0.00  20.50  219.50
1022290247  8008102957  0.00  199.00  251.85  85.80  0.00  -30.40  52.15  558.40
1022290149  8008102941  0.00  199.00  0.00  0.00  0.00  0.00  20.50  219.50
1022290047  8008102936  0.00  199.00  188.80  0.00  5.20  -14.30  39.00  417.70
1022290052  8008102959  0.00  199.00  31.40  0.00  0.00  -11.90  22.51  241.01
1022290154  8008102953  0.00  199.00  0.00  0.00  0.00  0.00  20.50  219.50
1022290155  8008102951  0.00  199.00  98.30  3.00  1.40  -33.80  27.61  295.51
1022290045  8008102934  0.00  199.00  40.00  0.00  0.00  -9.00  23.71  253.71
1022290152  8008102938  0.00  199.00  17.90  0.00  0.00  -6.90  21.63  231.63
1022290049  8008102955  0.00  199.00  18.00  0.00  0.00  0.00  22.36  239.36
1022290147  8008102940  0.00  199.00  352.10  0.00  0.00  -2.10  56.55  605.55
1022386743  8008102960  0.00  179.74  0.00  0.00  0.00  0.00  18.52  198.26
1022426128  9550675471  0.00  192.58  168.90  0.00  0.00  -0.90  37.17  397.75
1022487768  8008203930  0.00  154.06  0.00  0.00  0.00  0.00  15.87  169.93
1022487767  8008203925  0.00  154.06  59.20  0.00  0.00  -9.40  21.00  224.86
1022487588  8008203934  0.00  154.06  100.10  10.10  0.00  -82.10  18.76  200.92
1022487587  8008203926  0.00  154.06  0.00  0.00  0.00  0.00  15.87  169.93
1022487769  8008203928  0.00  154.06  0.00  0.00  0.00  0.00  15.87  169.93
1022487586  8008203927  0.00  154.06  0.00  0.00  0.00  0.00  15.87  169.93
1022487434  8008203931  0.00  154.06  0.00  0.00  0.00  0.00  15.87  169.93
1022487766  8008203929  0.00  154.06  0.00  0.00  0.00  0.00  15.87  169.93
1022487770  8008203936  0.00  154.06  0.00  0.00  0.00  0.00  15.87  169.93

Re^3: parsing text file
by Utilitarian (Vicar) on Jun 03, 2011 at 13:13 UTC
    First open the file. Then, while there are lines left in it, split each line into an array and add that array to your global AoA.
    The relevant commands are open, while, split and push.
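
Those four steps can be sketched roughly as follows. This is only an illustration, not the original poster's code: it assumes whitespace-separated fields, reads two sample rows from an in-memory string instead of the real bill file, and the names $sample and @aoa are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: two sample rows stand in for the real file. To read a real
# file, pass its path to open instead of \$sample.
my $sample = join "\n",
    '1022289744  8008102935  0.00  199.00  70.50  3.20  0.00  -9.70  27.09  290.09',
    '1022290146  8008102942  0.00  199.00  63.80  0.00  0.00  -3.80  26.70  285.70',
    '';

my @aoa;    # the array of arrays

# open: three-argument form with a lexical filehandle
# (opening a reference to a scalar reads from that string)
open my $fh, '<', \$sample or die "cannot open sample data: $!";

# while: read one line at a time
while ( my $line = <$fh> ) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines

    # split: the ' ' pattern splits on runs of whitespace
    # and ignores leading whitespace
    my @fields = split ' ', $line;

    # push: store a reference to this row's array
    push @aoa, \@fields;
}
close $fh;

# Each row is an array reference, addressed as $aoa[row][column].
print "row 0, total charges: $aoa[0][9]\n";
print "rows read: ", scalar(@aoa), "\n";
```

Because @fields is declared inside the loop, every row gets its own array, so the references pushed onto @aoa do not overwrite each other.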

    We don't do your job, though it looks as though MidLifeXis is doing your manager's.

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

      Sorry that I could not post the whole code.

      Here is what I've tried so far.

      The problem is that I'm unable to match the numbers inside the data. Please suggest the best way to do this.

      #!/usr/bin/perl
      use strict;

      # Array to hold the text file data
      my @chunks;

      open(CONF, $txtFile) || die "cannot find the text file\n";
      while(<CONF>){
          chomp;
          if (/^#/){
              # Must be a comment, skip it
              next;
          } elsif (/^\s*$/) {
              # Only contains whitespace, skip it
              # could be blank lines
              next;
          } elsif (/^M/){
              # Contains dos/mac control characters
              my @lines = split /^M/, $_;
              for ( my $i = 0 ; $i <= $#lines ; $i++ ){
                  push(@chunks, $lines[$i]);
              }
          } else {
              # assumed to be a normal data line
              # trim trailing and leading spaces for parsing the data later
              $_ =~ s/^\s+|\s+$//g;
              push(@chunks, $_);
          }
          #print "Found data for ", scalar(@chunks), " lines in $csv\n\n";
      }
      close(CONF);

      # Get the Array Index where the 'Summary of This Bill Period Charges' text is located
      my $index = indexArray('Summary of This Bill Period Charges', @chunks);

      # skip next 4 lines
      $index = $index + 4;

      foreach ( $index .. @chunks ) {
          if ( $data =~m/(\d+)\s{2,}(\d+)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}/ ) {
              print $5;
          }
      }

      # Thanks to a post which gave me this snippet
      sub indexArray{
          my ($text, @data) = @_;
          for( 1..@data ) { ;
              if ( $data[$_] =~ m/$text/ig ) { ;
                  return $_-1;
              }
          }
          -1
      }
        Okay

        A number of things.

        One: $data has not been defined. I think you want to look at the current element of @chunks inside the loop, that is, $chunks[$_].

        Two: The indices for @chunks go from 0 to @chunks-1 (that is, $#chunks), so a range ending at @chunks runs one past the last element. Be careful!

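        Three: In your regular expression, you have captures within captures. Capture groups are numbered by the position of their opening parenthesis, so (abc(def)(ghi))(xyz) matched against abcdefghixyz captures the following:

        $1 = abcdefghi
        $2 = def
        $3 = ghi
        $4 = xyz

        In your pattern, $5 is the inner group (\d|,), which is repeated with *. It holds only the last character that group matched, and it is undef when the group matched zero times. So, most of the time, your $5 returns undef.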

        Four: I am uncertain if you know what it is that you are matching. See the following code and results. It will show what you are matching, and what you could be matching if you used a simpler regex. I don't know if this is what you want, but it should get you started in the right direction.

        Code

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @chunks = <DATA>;

        foreach my $i ( 0 .. @chunks-1 ) {
            my $data = $chunks[$i];
            no warnings;
            if ( $data =~m/(\d+)\s{2,}(\d+)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}/) {
                print "<$1> <$2> <$3> <$4> <$5> <$6> <$7> <$8>\n";
            }
        }

        print "\n\n";

        foreach my $i ( 0 .. @chunks-1 ) {
            my $data = $chunks[$i];
            no warnings;
            if ( $data =~m/(\d+)\s+(\d+)\s+(-?\d*,?\d*\.?\d*)\s+(-?\d*,?\d*\.?\d*)\s+/) {
                print "<$1> <$2> <$3> <$4>\n";
            }
        }

        __DATA__
        1022289744  8008102935  221.00  199.00  70.50  3.20  0.00  -9.70  27.09  290.09
        1022290146  8008102942  0.00  199.00  63.80  0.00  0.00  -3.80  26.70  285.70
        1022290145  8008102930  0.00  199.00  207.80  3.20  1.20  -120.00  30.04  321.24
        1022289844  8008102943  0.00  199.00  5.50  9.00  0.00  0.00  21.98  235.48
        1022290156  8008102954  0.00  199.00  283.40  0.40  11.20  -51.80  45.53  487.73
        1022290048  8008102949  0.00  199.00  0.00  0.00  0.00  0.00  20.50  219.50
        Results
        <1022289744> <8008102935> <221.00> <2> <1> <199.00> <1> <9>
        <1022290146> <8008102942> <0.00> <0> <> <199.00> <1> <9>
        <1022290145> <8008102930> <0.00> <0> <> <199.00> <1> <9>
        <1022289844> <8008102943> <0.00> <0> <> <199.00> <1> <9>
        <1022290156> <8008102954> <0.00> <0> <> <199.00> <1> <9>
        <1022290048> <8008102949> <0.00> <0> <> <199.00> <1> <9>


        <1022289744> <8008102935> <221.00> <199.00>
        <1022290146> <8008102942> <0.00> <199.00>
        <1022290145> <8008102930> <0.00> <199.00>
        <1022289844> <8008102943> <0.00> <199.00>
        <1022290156> <8008102954> <0.00> <199.00>
        <1022290048> <8008102949> <0.00> <199.00>
        Good luck!
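
Points Two and Three can also be checked with a short standalone script. This is a sketch under assumptions: the sample line is copied from the data above with two-space separators, all variable names are illustrative, and the pattern using non-capturing (?:...) groups is just one possible way to keep the group numbers flat, not necessarily what the original poster needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Point Two: valid indices run from 0 to $#chunks (the same as @chunks - 1).
my @chunks  = ( 'title', 'header', 'data' );
my @indices = ( 0 .. $#chunks );
print "indices: @indices\n";

# Point Three: groups are numbered by the position of their opening parenthesis.
my @nested = 'abcdefghixyz' =~ /(abc(def)(ghi))(xyz)/;
print "nested: @nested\n";

# A repeated inner group is undef when it repeats zero times. Here the
# third group, (\d|,), never matches, because the first digit is taken
# by the second group and the next character is the decimal point.
my @groups = '0.00' =~ /^((\d|-)?(\d|,)*\.?\d*)$/;
print "inner group defined? ", ( defined $groups[2] ? "yes" : "no" ), "\n";

# One alternative: (?:...) groups without capturing, so only the four
# outer groups are numbered.
my $line   = '1022290146  8008102942  0.00  199.00  63.80';
my $amount = qr/-?(?:\d|,)*\.?\d*/;
my @row    = $line =~ /(\d+)\s{2,}(\d+)\s{2,}($amount)\s{2,}($amount)\s{2,}/;
print "row: @row\n";
```

Matching in list context returns the captured groups directly, which makes it easy to push each row onto an AoA without referring to $1, $2 and so on by number.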