david_lyon has asked for the wisdom of the Perl Monks concerning the following question:

Still very stuck on this one. If anyone can provide more code or help, it would be most appreciated. Thanks, Dave

Hi, I have a parsing issue that I was hoping someone could help me with. Thanks for your consideration,
Dave

Suppose I have a data file like the one below:

$start  $stop  $id1  $id2
1       100    1     2
101     200    1     2
201     300    1     0
301     400    1     0
401     500    0     2
501     600    0     2
601     700    0     0
701     800    1     1
801     900    1     2

I want to go down each of the columns $id1 and $id2, replacing the zeros '0' in a block of zeros with a number only if the numbers immediately before and after the block are the same. For example, the result would be:

1       100    1     2
101     200    1     2
201     300    1     2
301     400    1     2
401     500    1     2
501     600    1     2
601     700    1     0
701     800    1     1
801     900    1     2

Here is some code I started with:

    while (<>) {
        chomp;
        ($start, $stop, $id1, $id2) = split(/\t/, $_);
        $old_id1 = $id1;
        $old_id2 = $id2;
    }

Replies are listed 'Best First'.
Re: filling in the zeros...please help
by roboticus (Chancellor) on May 15, 2012 at 13:08 UTC

    david_lyon:

    Don't try to do it all in one pass; it'll just get too confusing. When you're just starting out, simpler is better. So use one pass to load the array, one pass to patch up the array, and then one to print the array. That way, you can more easily debug each pass. When you get more used to Perl, you can start merging passes when you want.

    # Read the data table
    my @table;
    while (<>) {
        chomp;
        my ($start, $stop, $id1, $id2) = split /\t/, $_;
        push @table, [ $start, $stop, $id1, $id2 ];
    }

    # Patch up the data
    for (my $i=1; $i < $#table; ++$i) {
        # check surrounding items and patch the data, as needed
        . . .
    }

    # Print the new table
    for my $ar (@table) {
        my ($start, $stop, $id1, $id2) = @$ar;
        print .....
    }

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Thank you, roboticus, that seems like what I am looking for. Can you give me an example of how to patch up the data? That's where I have been stuck for some time. Thanks for your help.

      # Patch up the data
      for (my $i=1; $i < $#table; ++$i) {
          # check surrounding items and patch the data, as needed
          . . .
      }
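
One way the patch pass could look is sketched below. This is not roboticus's own code: the helper name patch_table, the zero-based column indexes 2 and 3, and the inline sample rows are assumptions made for illustration. The sample rows are split on whitespace rather than tabs so that the example is self-contained. For each column, the loop looks for the first zero of a block, finds the last zero of that block, and fills the block only when the rows directly above and below it hold the same value.

```perl
use strict;
use warnings;

# Hypothetical helper: patch zero blocks in the given columns of an
# array-of-arrays table, in place. Column indexes are zero-based.
sub patch_table {
    my ($table, @cols) = @_;
    for my $c (@cols) {
        my $i = 1;
        while ($i < $#$table) {
            # First zero of a block that has a non-zero value above it?
            if ($table->[$i][$c] == 0 && $table->[$i - 1][$c] != 0) {
                my $j = $i;
                # Extend $j to the last zero of the block.
                $j++ while $j < $#$table && $table->[$j + 1][$c] == 0;
                # Fill only if a row exists below the block and it
                # matches the row above the block.
                if ($j < $#$table && $table->[$j + 1][$c] == $table->[$i - 1][$c]) {
                    $table->[$_][$c] = $table->[$i - 1][$c] for $i .. $j;
                }
                $i = $j + 1;
            }
            else {
                $i++;
            }
        }
    }
    return $table;
}

# Sample rows from the question, held in memory instead of read from a file.
my @table = map { [ split ' ' ] } (
    '1 100 1 2',   '101 200 1 2', '201 300 1 0',
    '301 400 1 0', '401 500 0 2', '501 600 0 2',
    '601 700 0 0', '701 800 1 1', '801 900 1 2',
);

patch_table(\@table, 2, 3);
print join("\t", @$_), "\n" for @table;
```

Zero blocks that touch the first or last row are left alone, because they have no value on one side to compare against.
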
Re: filling in the zeros...please help
by jaredor (Priest) on May 15, 2012 at 18:16 UTC

    If you can hold everything in memory, this should come close to doing what you want. It has the following assumptions/behaviors:

    • Each row has the same number of whitespace delimited fields (columns).
    • Zeroes starting a column or ending a column are meant to be zeroes after patching.

    #!/usr/bin/env perl
    use Modern::Perl;
    use Data::Dump qw(pp);

    my %cols;
    my @patch_cols = qw(2 3);

    while (<DATA>) {
        my @row = split;
        push @{$cols{$_}}, $row[$_] for (0..$#row);
    }

    for my $col (@patch_cols) {
        my @fwd = @{$cols{$col}};
        my @bwd = reverse @fwd;
        for my $i (1..$#fwd) {
            $fwd[$i] = $fwd[$i-1] if $fwd[$i] == 0;
            $bwd[$i] = $bwd[$i-1] if $bwd[$i] == 0;
        }
        @bwd = reverse @bwd;
        $fwd[$_] == $bwd[$_] and $cols{$col}->[$_] = $fwd[$_] for (0..$#fwd);
    }

    say pp(\%cols);
    say pp(\@patch_cols);

    __DATA__
    1 100 1 2
    101 200 1 2
    201 300 1 0
    301 400 1 0
    401 500 0 2
    501 600 0 2
    601 700 0 0
    701 800 1 1
    801 900 1 2

      Thanks very much, jaredor; it helped me a lot.
      
      
      Do you, or does anyone else, know how to output the results in the same format as the input file, i.e. tab-delimited?
      
      Thanks so much again
      Dave
      

        Dave,

        You have a hash of arrays; each array is the same length; each array represents a column; the hash key for an array is the number of the column (starting at zero) that it represents. Because of this nice regularity, you can use a for loop like above and print out each row (starting from 0) with a simple print statement. (Actually, it would be even simpler to have an array of arrays, but not that much simpler for it to be worthwhile for me to go back and edit code from my mobile device.)

        And since I'm not using a "real" keyboard at the moment, I'm not going to write out any lines of code. However, I probably shouldn't: You need to have the perl skills to do this relatively simple task. If you can't do this, then you probably don't understand what the above code is doing. You need to understand that first.
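
For readers following along, the row-by-row printing described above might be sketched as follows. This is not jaredor's code: the helper name rows_from_cols and the small sample hash are assumptions made for illustration. The sketch sorts the hash keys numerically to recover column order, then joins the element at each row index with tabs.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of column arrays (keyed by zero-based
# column number) back into a list of tab-delimited row strings.
sub rows_from_cols {
    my ($cols) = @_;
    my @keys = sort { $a <=> $b } keys %$cols;    # numeric column order
    return () unless @keys;
    my $last_row = $#{ $cols->{ $keys[0] } };     # every column has the same length
    return map {
        my $r = $_;
        join "\t", map { $cols->{$_}[$r] } @keys;
    } 0 .. $last_row;
}

# A small sample in the same shape as the hash built by the code above.
my %cols = (
    0 => [ 1,   101, 201 ],
    1 => [ 100, 200, 300 ],
    2 => [ 1,   1,   1 ],
    3 => [ 2,   2,   2 ],
);

print "$_\n" for rows_from_cols(\%cols);
```

The numeric sort matters once there are ten or more columns, since a plain string sort would put column 10 before column 2.
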

Re: filling in the zeros...please help
by Kenosis (Priest) on May 15, 2012 at 19:31 UTC

    Thank you, David, for posting the transformation rules; that helped. jaredor provided an excellent solution for you. Here's an option that uses a rules-embedded s///eg on the two columns as strings:

    use Modern::Perl;

    my ( @table, $id1, $id2, $i );

    push @table, [ (split) ] while <DATA>;

    do { $id1 .= $_->[2]; $id2 .= $_->[3] } for @table;

    $id1 =~ s/(?<=(.))(0+)(?=(.))/$1==$3?$1 x length $2:$2/eg;
    $id2 =~ s/(?<=(.))(0+)(?=(.))/$1==$3?$1 x length $2:$2/eg;

    $i = 0;
    say "$_->[0]\t$_->[1]\t" . substr( $id1, $i, 1 ) . "\t" . substr( $id2, $i++, 1 ) for @table;

    __DATA__
    1 100 1 2
    101 200 1 2
    201 300 1 0
    301 400 1 0
    401 500 0 2
    501 600 0 2
    601 700 0 0
    701 800 1 1
    801 900 1 2

    Output:

    1       100    1     2
    101     200    1     2
    201     300    1     2
    301     400    1     2
    401     500    1     2
    501     600    1     2
    601     700    1     0
    701     800    1     1
    801     900    1     2

    This solution captures the digits on either side of each zero block from within positive look-behind and positive look-ahead assertions, then checks them for sameness with a ternary expression whose result is substituted for the enclosed zero(s).
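
To see the substitution in isolation, here is a sketch that applies it to the two column strings from the sample data. The helper name fill_string is an assumption, the pattern is spelled out with the /x modifier so each part can carry a comment, and the neighbours are compared with eq rather than == so that non-digit characters would not trigger numeric warnings. Because each ID occupies exactly one character of the string, this approach assumes single-character IDs.

```perl
use strict;
use warnings;

# Hypothetical helper: fill zero blocks in a string in which every
# character is one ID. Assumes each ID is a single character.
sub fill_string {
    my ($ids) = @_;
    $ids =~ s{
        (?<=(.))    # capture 1: the character just before the zero block
        (0+)        # capture 2: the zero block itself
        (?=(.))     # capture 3: the character just after the zero block
    }{ $1 eq $3 ? $1 x length($2) : $2 }egx;
    return $ids;
}

print fill_string('111100011'), "\n";    # prints 111111111
print fill_string('220022012'), "\n";    # prints 222222012
```

A zero block at the very start or end of the string is left unchanged, because one of the two look-around assertions has no character to capture there.
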

      Thank you very much Kenosis. This is very useful and helped a lot.
Re: filling in the zeros...please help
by Kenosis (Priest) on May 15, 2012 at 14:52 UTC

    david lyon,

    roboticus provided an excellent framework. Can you help me understand or detail the rules for transforming the zeros into non-zero values? Perhaps a little pseudocode would help.

      
      Hi Kenosis,

      Thanks for your help. From the data file input example above, in column 3 we have:

      1,1,1,1,0,0,0,1,1
      where the 3 zeros can be transformed to 1,1,1 because they are surrounded by the same number, '1', i.e.:
      1,1,1,1,1,1,1,1,1

      In the next column we have:
      2,2,0,0,2,2,0,1,2
      Only the 2 zeros between the 2s can be changed, since they are surrounded by the same number, '2', i.e.:
      2,2,2,2,2,2,0,1,2
      The zero at the 7th element can't be changed, since it is surrounded by different numbers, '2' and '1'.
      
      
      The file has dozens of columns and hundreds of rows, and I have to apply this rule as I walk down each column.

      The rule is to fill in a zero block in a column only where the numbers immediately before the start and after the end of the zero block are the same.

      Thanks again for everyone's help; I would appreciate any code.
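
The rule as stated can be checked against the two example columns with a short single-pass sketch. The function name fill_zero_blocks is an assumption, and the IDs are assumed to be numeric (multi-digit values are fine). The function remembers the index of the most recent non-zero value; when the next non-zero value turns up after a gap and the two values are equal, the gap is filled.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of one column with qualifying zero
# blocks filled. Assumes the IDs are numeric.
sub fill_zero_blocks {
    my @v = @_;
    my $prev;    # index of the most recent non-zero value, if any
    for my $i (0 .. $#v) {
        next if $v[$i] == 0;
        # Zeros lie between $prev and $i; fill them if both ends agree.
        if (defined $prev && $i - $prev > 1 && $v[$prev] == $v[$i]) {
            @v[ $prev + 1 .. $i - 1 ] = ($v[$i]) x ($i - $prev - 1);
        }
        $prev = $i;
    }
    return @v;
}

print join(',', fill_zero_blocks(1, 1, 1, 1, 0, 0, 0, 1, 1)), "\n";    # 1,1,1,1,1,1,1,1,1
print join(',', fill_zero_blocks(2, 2, 0, 0, 2, 2, 0, 1, 2)), "\n";    # 2,2,2,2,2,2,0,1,2
```

For a file with dozens of columns, the same function could be called once per column after the file has been read into memory.
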
      
      
Re: filling in the zeros...please help
by Kenosis (Priest) on May 16, 2012 at 01:45 UTC

    You're most welcome, David. Am glad it helped.