david_lyon:
Don't try to do it all in one pass; it'll just get too confusing. When you're just starting out, simpler is better. So use one pass to load the array, one pass to patch it up, and a third to print it. That way, you can debug each pass more easily. When you get more used to Perl, you can start merging passes if you want.
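use strict;
use warnings;
# Read the data table (tab-delimited: start, stop, id1, id2)
my @table;
while (<>) {
    chomp;
    my ($start, $stop, $id1, $id2) = split /\t/, $_;
    push @table, [ $start, $stop, $id1, $id2 ];
}
# Patch up the data
for (my $i = 1; $i < $#table; ++$i) {
    # check surrounding items and patch the data, as needed:
    # fill a zero block in an ID column (index 2 or 3) only when the
    # values just before and just after the block are the same
    for my $col (2, 3) {
        next unless $table[$i][$col] == 0 && $table[$i-1][$col] != 0;
        my $j = $i;
        ++$j while $j <= $#table && $table[$j][$col] == 0;
        next unless $j <= $#table && $table[$j][$col] == $table[$i-1][$col];
        $table[$_][$col] = $table[$i-1][$col] for $i .. $j - 1;
    }
}
# Print the new table
for my $ar (@table) {
    my ($start, $stop, $id1, $id2) = @$ar;
    print join("\t", $start, $stop, $id1, $id2), "\n";
}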
...roboticus
When your only tool is a hammer, all problems look like your thumb.
If you can hold everything in memory, this should come close to doing what you want. It makes the following assumptions and has the following behaviors:
- Each row has the same number of whitespace delimited fields (columns).
- Zeroes starting a column or ending a column are meant to be zeroes after patching.
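#!/usr/bin/env perl
use Modern::Perl;
use Data::Dump qw(pp);
my %cols;                  # column number => array of that column's values
my @patch_cols = qw(2 3);  # columns whose zero blocks should be patched
# Read the rows and store them column by column
while (<DATA>) {
    my @row = split;
    push @{$cols{$_}}, $row[$_] for (0..$#row);
}
for my $col (@patch_cols) {
    # Fill each zero from above (@fwd) and from below (@bwd)
    my @fwd = @{$cols{$col}};
    my @bwd = reverse @fwd;
    for my $i (1..$#fwd) {
        $fwd[$i] = $fwd[$i-1] if $fwd[$i] == 0;
        $bwd[$i] = $bwd[$i-1] if $bwd[$i] == 0;
    }
    @bwd = reverse @bwd;
    # Keep a filled value only where both directions agree
    $fwd[$_] == $bwd[$_] and $cols{$col}->[$_] = $fwd[$_] for (0..$#fwd);
}
say pp(\%cols);
say pp(\@patch_cols);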
__DATA__
1 100 1 2
101 200 1 2
201 300 1 0
301 400 1 0
401 500 0 2
501 600 0 2
601 700 0 0
701 800 1 1
801 900 1 2
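The core of this approach is easiest to see on a single column in isolation. Below is a minimal sketch, assuming the column is passed as a plain list of numbers; `fill_by_agreement` is a hypothetical helper name, not part of the script above. It fills zeros from above and from below and keeps a filled value only where both directions agree, which happens exactly when a zero block is enclosed by the same value on both sides.

```perl
use strict;
use warnings;

# Hypothetical helper: the forward/backward-fill idea applied to one
# column, passed in as a list of numbers.
sub fill_by_agreement {
    my @col = @_;
    my @fwd = @col;            # forward fill: a zero takes the value above it
    my @bwd = reverse @col;    # backward fill: a zero takes the value below it
    for my $i (1 .. $#col) {
        $fwd[$i] = $fwd[$i-1] if $fwd[$i] == 0;
        $bwd[$i] = $bwd[$i-1] if $bwd[$i] == 0;
    }
    @bwd = reverse @bwd;
    # keep the filled value only where both directions agree
    return map { $fwd[$_] == $bwd[$_] ? $fwd[$_] : $col[$_] } 0 .. $#col;
}

print join(",", fill_by_agreement(2, 2, 0, 0, 2, 2, 0, 1, 2)), "\n";
```

For the second ID column this prints `2,2,2,2,2,2,0,1,2`: the lone zero between 2 and 1 receives different values from the two directions, so it is left alone. Zeros at the very start or end of a column also stay zero, because one direction never fills them.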
Thanks very much, jaredor; it helped me a lot.
Do you, or does anyone else, know how to output the results in the same format as the input file, i.e. tab-delimited?
Thanks so much again
Dave
Dave,
You have a hash of arrays; each array is the same length; each array represents a column; the hash key for an array is the number of the column (starting at zero) that it represents. Because of this nice regularity, you can use a for loop like the one above and print out each row (starting from 0) with a simple print statement. (Actually, it would be even simpler to have an array of arrays, but not so much simpler that it's worth going back and editing the code from my mobile device.)
Since I'm not using a "real" keyboard at the moment, I'm not going to write out any code. Besides, I probably shouldn't: you need the Perl skills to do this relatively simple task. If you can't do it, then you probably don't understand what the above code is doing, and you need to understand that first.
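For reference, here is one way the loop described above might look. It is a sketch, assuming `%cols` has the shape described (numeric keys starting at 0, all arrays the same length); `cols_to_tsv` and the two-row sample hash are hypothetical, not taken from jaredor's script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of column arrays (keys 0, 1, 2, ...)
# back into tab-delimited rows.
sub cols_to_tsv {
    my ($cols) = @_;
    my @keys  = sort { $a <=> $b } keys %$cols;   # numeric column order
    my $nrows = @{ $cols->{ $keys[0] } };         # every column has the same length
    my $out   = '';
    for my $r (0 .. $nrows - 1) {
        $out .= join("\t", map { $cols->{$_}[$r] } @keys) . "\n";
    }
    return $out;
}

# Small hypothetical sample in the same shape as %cols
my %cols = (
    0 => [1,   101],
    1 => [100, 200],
    2 => [1,   1],
    3 => [2,   2],
);
print cols_to_tsv(\%cols);
```

The numeric sort matters once a file has ten or more columns: a plain `sort` would compare the keys as strings and put column 10 before column 2.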
Thank you, David, for posting the transformation rules; that helped. jaredor provided an excellent solution for you. Here's an option that uses a rules-embedded s///eg on the two columns as strings:
use Modern::Perl;
my ( @table, $id1, $id2, $i );
push @table, [ (split) ] while <DATA>;
# Join each ID column into one string (assumes every ID is a single character)
do { $id1 .= $_->[2]; $id2 .= $_->[3] } for @table;
$id1 =~ s/(?<=(.))(0+)(?=(.))/$1==$3?$1 x length $2:$2/eg;
$id2 =~ s/(?<=(.))(0+)(?=(.))/$1==$3?$1 x length $2:$2/eg;
$i = 0;
say "$_->[0]\t$_->[1]\t"
. substr( $id1, $i, 1 ) . "\t"
. substr( $id2, $i++, 1 )
for @table;
__DATA__
1 100 1 2
101 200 1 2
201 300 1 0
301 400 1 0
401 500 0 2
501 600 0 2
601 700 0 0
701 800 1 1
801 900 1 2
Output:
1 100 1 2
101 200 1 2
201 300 1 2
301 400 1 2
401 500 1 2
501 600 1 2
601 700 1 0
701 800 1 1
801 900 1 2
This solution uses capture groups inside the positive look-behind and positive look-ahead assertions to grab the two digits enclosing each run of zeros; a ternary expression then checks them for sameness, and its result is substituted for the enclosed zero(s).
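To see the substitution on its own, here is a sketch that applies the same pattern to a single column string; `patch_column_string` is a hypothetical wrapper, not part of the code above. Like the original, it assumes every ID is a single character, because each look-around assertion inspects only one character on either side of the zero run.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution for one column string.
# Assumes every ID is a single character, as in the sample data.
sub patch_column_string {
    my ($s) = @_;
    # (?<=(.))  the character just before the zero run, captured in $1
    # (0+)      the zero run itself, in $2
    # (?=(.))   the character just after the run, captured in $3
    # /e evaluates the replacement: repeat $1 when both neighbours match,
    # otherwise put the zeros back unchanged
    $s =~ s/(?<=(.))(0+)(?=(.))/$1 == $3 ? $1 x length($2) : $2/eg;
    return $s;
}

print patch_column_string('220022012'), "\n";
```

This prints `222222012`. A run of zeros at the very start or end of the string cannot match, since one of the assertions has no character to capture, so those zeros are left as they are.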
Thank you very much, Kenosis. This is very useful and helped a lot.
david lyon,
roboticus provided an excellent framework. Can you help me understand the rules for transforming the zeros into non-zero values, or spell them out in more detail? Perhaps a little pseudocode would help.
Hi,
Kenosis, thanks for your help. From the example input data above, column 3 contains:
1,1,1,1,0,0,0,1,1
The three zeros can be transformed to 1,1,1 because they are surrounded by the same number, 1, i.e.:
1,1,1,1,1,1,1,1,1
In the next column we have:
2,2,0,0,2,2,0,1,2
Only the two zeros between the 2s can be changed, since they are surrounded by the same number, 2, i.e.:
2,2,2,2,2,2,0,1,2
The zero at the 7th element can't be changed, since it is surrounded by two different numbers, 2 and 1.
The file has dozens of columns and hundreds of rows and I have to apply this rule as I walk down each column.
The rule is to fill in a zero block in each column only where the numbers immediately before its start and after its end are the same.
Thanks again for everyone's help; I would appreciate any code.
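The rule as stated can also be applied directly by scanning a column for zero blocks. This is a minimal sketch, not any poster's code; `fill_zero_blocks` is a hypothetical name, and it assumes a column arrives as a list of numbers. Because it compares whole values rather than single characters, multi-digit IDs such as 12 work as well.

```perl
use strict;
use warnings;

# Hypothetical helper: fill every zero block whose neighbours immediately
# before and after are the same value; leave all other zeros alone.
sub fill_zero_blocks {
    my @v = @_;
    my $i = 0;
    while ($i <= $#v) {
        if ($v[$i] != 0) { $i++; next }
        my $start = $i;
        $i++ while $i <= $#v && $v[$i] == 0;   # $i is now just past the block
        my $end = $i - 1;
        # a block touching either end of the column has no partner, so skip it
        next if $start == 0 || $i > $#v;
        @v[$start .. $end] = ($v[$start - 1]) x ($end - $start + 1)
            if $v[$start - 1] == $v[$i];
    }
    return @v;
}

print join(",", fill_zero_blocks(2, 2, 0, 0, 2, 2, 0, 1, 2)), "\n";
```

This prints `2,2,2,2,2,2,0,1,2`, matching the expected result above. With dozens of columns, the same function could be called once per column after the data has been split into per-column lists.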
You're most welcome, David. I'm glad it helped.