Re^4: parsing text file

Sorry that I could not post the whole code.

Here is what I've tried.

The problem is that I'm unable to match the numbers inside the data. Please suggest the best way to do this.

#!/usr/bin/perl
use strict;

# Array to hold the text file data
my @chunks;

open(CONF, $txtFile) || die "cannot find the text file\n";

while(<CONF>){
    chomp;
    if (/^#/){
        # Must be a comment, skip it
        next;
    }
    elsif (/^\s*$/) {
        # Only contains whitespace, skip it
        # could be blank lines
        next;
    }
    elsif (/^M/){
        # Contains dos/mac control characters
        my @lines = split /^M/, $_;
        for ( my $i = 0 ; $i <= $#lines ; $i++ ){
            push(@chunks, $lines[$i]);
        }
    }
    else {
        # assumed to be a normal data line
        # trim trailing and leading spaces for parsing the data later
        $_ =~ s/^\s+|\s+$//g;
        push(@chunks, $_);
    }

    #print "Found data for ", scalar(@chunks), " lines in $csv\n\n";
}

close(CONF);

# Get the array index where the 'Summary of This Bill Period Charges'
# text is located
my $index = indexArray('Summary of This Bill Period Charges', @chunks);

# skip next 4 lines
$index = $index + 4;

foreach ( $index .. @chunks ) {
    if ( $data =~ m/(\d+)\s{2,}(\d+)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}/ ) {
        print $5;
    }
}

# Thanks to a post which gave me this snippet
sub indexArray{
    my ($text, @data) = @_;

    for( 1..@data ) {
        if ( $data[$_] =~ m/$text/ig ) {
            return $_-1;
        }
    }

    -1
}
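
A side note on the ^M branch above: if ^M was typed as the two characters ^ and M, then /^M/ matches lines that begin with a capital M, not lines containing carriage returns (it may have been a literal control character that was lost when posting). A minimal sketch, assuming the intent is to turn DOS (\r\n) and old Mac (\r) line endings into separate, trimmed lines; the helper name clean_lines and the sample input are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split raw lines on carriage returns, drop comments
# and blank lines, and trim leading/trailing whitespace from the rest.
sub clean_lines {
    my @raw = @_;          # copy, so chomp below does not touch the caller's data
    my @chunks;
    for my $line (@raw) {
        chomp $line;
        # \r separates old Mac-style records; a trailing \r (DOS ending)
        # only leaves an empty trailing field, which split drops
        for my $piece (split /\r/, $line) {
            next if $piece =~ /^#/;       # comment line
            next if $piece =~ /^\s*$/;    # blank or whitespace-only
            $piece =~ s/^\s+|\s+$//g;     # trim both ends
            push @chunks, $piece;
        }
    }
    return @chunks;
}

my @chunks = clean_lines("# header\n", "  first line \r\n", "a\rb\n", "\n");
print "$_\n" for @chunks;
```

Reading with binmode or a :crlf layer is another option, but splitting on \r also covers files whose records use a bare \r.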

Re^5: parsing text file
by Sandy (Curate) on Jun 06, 2011 at 20:29 UTC

A number of things.

One: $data has not been defined. I think you want to look at the element of @chunks at the current loop index, e.g. $chunks[$_] inside your foreach loop.

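Two: The indices for @chunks go from 0 to @chunks-1 (that is, $#chunks), so ranges such as $index .. @chunks, and the 1..@data in indexArray, run one element past the end. Be careful!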
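
Points One and Two can be sketched together like this. It is a hedged rewrite, not the original poster's code: the sample @chunks contents are made up, the $index + 4 offset is kept from the original post, and this version of indexArray returns a 0-based index and matches the heading literally via \Q...\E.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 0-based index of the first element of @data containing $text
# (case-insensitive, special characters matched literally), or -1 if none.
sub indexArray {
    my ($text, @data) = @_;
    for my $i (0 .. $#data) {                     # valid indices: 0 .. $#data
        return $i if $data[$i] =~ /\Q$text\E/i;   # /g is not needed for a yes/no test
    }
    return -1;
}

# Made-up sample: heading at index 1, then three filler lines, then data
my @chunks = ('Header', 'Summary of This Bill Period Charges',
              'skip 1', 'skip 2', 'skip 3', 'data row 1', 'data row 2');

my $index = indexArray('Summary of This Bill Period Charges', @chunks);
die "heading not found\n" if $index < 0;

# Named loop variable, upper bound $#chunks, and $data fetched explicitly
my @rows;
for my $i ($index + 4 .. $#chunks) {
    my $data = $chunks[$i];
    push @rows, $data;
}
print "$_\n" for @rows;
```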

Three: In your regular expression, you have captures within captures. Note that with (abc(def)(ghi))(xyz), the captures are numbered as follows:

$1 = abcdefghi
$2 = def
$3 = ghi
$4 = xyz
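
So, most of the time, your $5 is undef. When it is defined, it holds only the last single character matched by the repeated group (\d|,)*.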
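
A small sketch of the numbering rule, plus one way to keep the original column pattern while stopping the inner groups from capturing: write them as (?:...), which groups without capturing. The sample line is shortened from the data in this post; this is an illustration, not a tested fix for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Groups are numbered by the position of their opening parenthesis,
# so the nested groups get numbers of their own.
my @caps = ('abcdefghixyz' =~ /(abc(def)(ghi))(xyz)/);
print join(' | ', @caps), "\n";    # abcdefghi | def | ghi | xyz

# Same column pattern as the original post, but the inner groups are
# (?:...), so only the four outer groups are numbered: the amounts
# land in $3 and $4 instead of being split across $3 .. $8.
my $line = '1022289744    8008102935    221.00    199.00    70.50';
my @cols;
if ($line =~ /(\d+)\s{2,}(\d+)\s{2,}((?:\d|-)?(?:\d|,)*\.?\d*)\s{2,}((?:\d|-)?(?:\d|,)*\.?\d*)\s{2,}/) {
    @cols = ($1, $2, $3, $4);
    print "<$1> <$2> <$3> <$4>\n";  # <1022289744> <8008102935> <221.00> <199.00>
}
```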

Four: I'm not sure you know what you are actually matching. The following code and results show what you are matching now, and what you could match with a simpler regex. I don't know if this is exactly what you want, but it should get you started in the right direction.

Code

#!/usr/bin/perl
use strict;
use warnings;

my @chunks = <DATA>;

foreach my $i ( 0 .. @chunks-1 ) {
    my $data = $chunks[$i];
    no warnings;
    if ( $data =~ m/(\d+)\s{2,}(\d+) \s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}((\d|-)?(\d|,)*\.?\d*)\s{2,}/ ) {
        print "<$1> <$2> <$3> <$4> <$5> <$6> <$7> <$8>\n"; 
    }
}
print "\n\n";
foreach my $i ( 0 .. @chunks-1 ) {
    my $data = $chunks[$i];
    no warnings;
    if ( $data =~ m/(\d+)\s+(\d+)\s+(-?\d*,?\d*\.?\d*)\s+(-?\d*,?\d*\.?\d*)\s+/ ) {
        print "<$1> <$2> <$3> <$4>\n"; 
    }
}

__DATA__
1022289744                 8008102935                 221.00             199.00                70.50                3.20               0.00              -9.70              27.09             290.09
1022290146                 8008102942                  0.00             199.00                63.80                0.00               0.00              -3.80              26.70             285.70
1022290145                 8008102930                  0.00             199.00              207.80                 3.20               1.20           -120.00               30.04             321.24
1022289844                 8008102943                  0.00             199.00                 5.50                9.00               0.00               0.00              21.98             235.48
1022290156                 8008102954                  0.00             199.00              283.40                 0.40             11.20              -51.80              45.53             487.73
1022290048                 8008102949                  0.00             199.00                 0.00                0.00               0.00               0.00              20.50             219.50

Results

<1022289744> <8008102935> <221.00> <2> <1> <199.00> <1> <9>
<1022290146> <8008102942> <0.00> <0> <> <199.00> <1> <9>
<1022290145> <8008102930> <0.00> <0> <> <199.00> <1> <9>
<1022289844> <8008102943> <0.00> <0> <> <199.00> <1> <9>
<1022290156> <8008102954> <0.00> <0> <> <199.00> <1> <9>
<1022290048> <8008102949> <0.00> <0> <> <199.00> <1> <9>


<1022289744> <8008102935> <221.00> <199.00>
<1022290146> <8008102942> <0.00> <199.00>
<1022290145> <8008102930> <0.00> <199.00>
<1022289844> <8008102943> <0.00> <199.00>
<1022290156> <8008102954> <0.00> <199.00>
<1022290048> <8008102949> <0.00> <199.00>
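
Because each record is a row of whitespace-separated columns, another option (a swapped-in technique, not the regex approach above) is to split each line on whitespace and index the fields. A minimal sketch, assuming every data row has exactly ten columns and starts with an all-digit field; the sample rows are copied from the data above with the spacing shortened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two rows from the __DATA__ section above, spacing shortened
my @chunks = (
    "1022289744   8008102935   221.00   199.00   70.50   3.20   0.00   -9.70   27.09   290.09\n",
    "1022290146   8008102942     0.00   199.00   63.80   0.00   0.00   -3.80   26.70   285.70\n",
);

my @records;
for my $line (@chunks) {
    # split ' ' is special: it ignores leading whitespace and splits on
    # runs of whitespace, so the trailing newline leaves no extra field
    my @fields = split ' ', $line;
    next unless @fields == 10;          # assumption: data rows have 10 columns
    next unless $fields[0] =~ /^\d+$/;  # assumption: first column is all digits
    push @records, \@fields;
}

for my $r (@records) {
    # First four columns, like the simpler regex above
    print join(' ', map { "<$_>" } @{$r}[0 .. 3]), "\n";
}
```

Splitting avoids having to describe every number format in a regex; the row checks are only a rough filter for non-data lines.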