
Sorry that I could not post the whole code.

Here is what I've tried.

The problem is that I'm unable to match the numbers inside the data. Please suggest the best way to do this.

#!/usr/bin/perl
use strict;

# Array to hold the text file data
my @chunks;

open(CONF, $txtFile) || die "cannot find the text file\n";
while(<CONF>){
    chomp;
    if (/^#/){
        # Must be a comment, skip it
        next;
    } elsif (/^\s*$/) {
        # Only contains whitespace, skip it
        # could be blank lines
        next;
    } elsif (/^M/){
        # Contains dos/mac control characters
        my @lines = split /^M/, $_;
        for ( my $i = 0 ; $i <= $#lines ; $i++ ){
            push(@chunks, $lines[$i]);
        }
    } else {
        # assumed to be a normal data line
        # trim trailing and leading spaces for parsing the data later
        $_ =~ s/^\s+|\s+$//g;
        push(@chunks, $_);
    }
    #print "Found data for ", scalar(@chunks), " lines in $csv\n\n";
}
close(CONF);

# Get the Array Index where the 'Summary of This Bill Period Charges' text is located
my $index = indexArray('Summary of This Bill Period Charges', @chunks);

# skip next 4 lines
$index = $index + 4;

foreach ( $index .. @chunks ) {
    if ( $data =~m/(\d+)\s{2,}(\d+)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}/ ) {
        print $5;
    }
}

# Thanks to a post which gave me this snippet
sub indexArray{
    my ($text, @data) = @_;
    for( 1..@data ) {
        ;
        if ( $data[$_] =~ m/$text/ig ) {
            ;
            return $_-1;
        }
    }
    -1
}

Re^5: parsing text file
by Sandy (Curate) on Jun 06, 2011 at 20:29 UTC
    Okay

    A number of things.

    One: $data has not been defined. I think you want to look at $chunks[$_], since $_ is the index your foreach loop steps through.

    Two: The indices for @chunks go from 0 to @chunks-1, but your loop runs from $index to @chunks, which is one past the last element. So be careful!

    Three: In your regular expression, you have captures within captures. Groups are numbered in the order of their opening parentheses, so (abc(def)(ghi))(xyz) will capture the following:

    $1 = abcdefghi
    $2 = def
    $3 = ghi
    $4 = xyz

    So $5 is not the field you expect, and most of the time it is undef.

    Four: I am uncertain if you know what it is that you are matching. See the following code and results. It will show what you are matching, and what you could be matching if you used a simpler regex. I don't know if this is what you want, but it should get you started in the right direction.

    Code

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @chunks = <DATA>;

    # Your original regex, printing every capture group
    foreach my $i ( 0 .. @chunks-1 ) {
        my $data = $chunks[$i];
        no warnings;
        if ( $data =~m/(\d+)\s{2,}(\d+)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}/) { #/ ) {
            print "<$1> <$2> <$3> <$4> <$5> <$6> <$7> <$8>\n";
        }
    }
    print "\n\n";

    # A simpler regex with no nested captures
    foreach my $i ( 0 .. @chunks-1 ) {
        my $data = $chunks[$i];
        no warnings;
        if ( $data =~m/(\d+)\s+(\d+)\s+(-?\d*,?\d*\.?\d*)\s+(-?\d*,?\d*\.?\d*)\s+/) { #/ ) {
            print "<$1> <$2> <$3> <$4>\n";
        }
    }
    __DATA__
    1022289744    8008102935    221.00    199.00    70.50    3.20    0.00    -9.70    27.09    290.09
    1022290146    8008102942    0.00    199.00    63.80    0.00    0.00    -3.80    26.70    285.70
    1022290145    8008102930    0.00    199.00    207.80    3.20    1.20    -120.00    30.04    321.24
    1022289844    8008102943    0.00    199.00    5.50    9.00    0.00    0.00    21.98    235.48
    1022290156    8008102954    0.00    199.00    283.40    0.40    11.20    -51.80    45.53    487.73
    1022290048    8008102949    0.00    199.00    0.00    0.00    0.00    0.00    20.50    219.50
    Results
    <1022289744> <8008102935> <221.00> <2> <1> <199.00> <1> <9>
    <1022290146> <8008102942> <0.00> <0> <> <199.00> <1> <9>
    <1022290145> <8008102930> <0.00> <0> <> <199.00> <1> <9>
    <1022289844> <8008102943> <0.00> <0> <> <199.00> <1> <9>
    <1022290156> <8008102954> <0.00> <0> <> <199.00> <1> <9>
    <1022290048> <8008102949> <0.00> <0> <> <199.00> <1> <9>


    <1022289744> <8008102935> <221.00> <199.00>
    <1022290146> <8008102942> <0.00> <199.00>
    <1022290145> <8008102930> <0.00> <199.00>
    <1022289844> <8008102943> <0.00> <199.00>
    <1022290156> <8008102954> <0.00> <199.00>
    <1022290048> <8008102949> <0.00> <199.00>
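
    If you would rather keep the shape of your original pattern, one option is to make the inner groups non-capturing with (?:...), so that only the four fields get numbers. This is only a sketch: the sample line and its column spacing are assumptions, and $num is a name I made up for the number sub-pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Capture groups are numbered by the position of their opening parenthesis.
my @nested = 'abcdefghixyz' =~ /(abc(def)(ghi))(xyz)/;
print join(' | ', @nested), "\n";

# Assumed sample line; the real file's column spacing may differ.
my $line = '1022289744    8008102935    221.00    199.00    70.50';

# (?:...) groups without capturing, so it does not use up a group number.
# $num is a hypothetical name for the "signed number with commas" pattern.
my $num = qr/-?(?:\d|,)*\.?\d*/;

# In list context a successful match returns ($1, $2, $3, $4).
my @fields = $line =~ /(\d+)\s{2,}(\d+)\s{2,}($num)\s{2,}($num)\s{2,}/;
print join(' | ', @fields), "\n";
```

    With the inner groups non-capturing, the second amount is $4 rather than $6, which is probably closer to what you had in mind when you printed $5.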
    Good luck!
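
    P.S. On the index bounds: here is a minimal sketch of a lookup and loop that stay inside the array. index_of is a hypothetical replacement for your indexArray, and the @chunks contents are made up, so adjust the number of skipped lines to your real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the 0-based
# index of the first element containing $text, or -1 if none does.
sub index_of {
    my ($text, @data) = @_;
    for my $i ( 0 .. $#data ) {    # $#data is the last valid index
        return $i if $data[$i] =~ /\Q$text\E/i;
    }
    return -1;
}

# Made-up lines standing in for the bill file.
my @chunks = (
    'Header',
    'Summary of This Bill Period Charges',
    'row one',
    'row two',
);

my $index = index_of('Summary of This Bill Period Charges', @chunks);

my @rows;
if ( $index >= 0 ) {
    # Start after the heading and stop at the last valid index.
    for my $i ( $index + 1 .. $#chunks ) {
        push @rows, $chunks[$i];
    }
}
print "$_\n" for @rows;
```

    The \Q...\E around $text keeps any regex metacharacters in the search string from being interpreted, and checking for -1 avoids looping from a bogus start position when the heading is missing.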