Thanks for the code.
Although lines 1, 2 and 3 align well, there should not be a TAB at the beginning of line 0, and each group of words "text with spaces X" should
be aligned on the line containing the most TABs (max TAB). For example (ex 0), for the input (in 0)
text with spaces 1 text with spaces 2 text with
+spaces 3 text with spaces 4 line 0
text with spaces 1 text with spaces 2 text with spaces 3
+ text with spaces 4 line 1
text with spaces 1 text with spaces 2 text with spaces 3
+ text with spaces 4 line 2
text with spaces 1 text with spaces 2 te
+xt with spaces 3 text with spaces 4 line 3
There are 4 TAB ranges on each of the 4 lines: 3 TAB ranges between the "text with spaces" groups (1 and 2, 2 and 3, 3 and 4) and 1 TAB range between "text with spaces 4"
and "line X", X being 0, 1, 2, 3.
Here is a 4 * 4 array (ar 0) (TAB_range * line), that is, for each of the 4 lines, the number of TABs in each of the 4 TAB ranges
1 4 4 2
1 3 3 4
3 1 2 1
4 4 1 3
The max number of TABs for each of the columns is 4 4 4 4, so
there should be 4 TABs between "text with spaces 1" and "text with spaces 2", "text with spaces 2" and "text with spaces 3",
"text with spaces 3" and "text with spaces 4", and "text with spaces 4" and "line X", X being 0, 1, 2, 3,
and the output should be
text with spaces 1 text with spaces 2 te
+xt with spaces 3 text with spaces 4 lin
+e 0
text with spaces 1 text with spaces 2 te
+xt with spaces 3 text with spaces 4 lin
+e 1
text with spaces 1 text with spaces 2 te
+xt with spaces 3 text with spaces 4 lin
+e 2
text with spaces 1 text with spaces 2 te
+xt with spaces 3 text with spaces 4 lin
+e 3
The above example (ex 0) (in 0) is a bit unclear, as all TAB ranges contain 4 TABs. In the following example (ex 1) (in 1), temp9.txt, the TAB ranges (max TAB) have different
numbers of TABs between the "text with spaces X" groups
text with spaces 1 text with spaces 2 text with spac
+es 3 text with spaces 4 line 0
text with spaces 1 text with spaces 2 text with spaces 3
+ text with spaces 4 line 1
text with spaces 1 text with spaces 2 text with
+spaces 3 text with spaces 4 line 2
text with spaces 1 text with spaces 2 text with spaces 3 text
+ with spaces 4 line 3
As in the above example (ex 0), there are 4 TAB ranges on each of the 4 lines: 3 TAB ranges between the "text with spaces" groups (1 and 2, 2 and 3, 3 and 4) and 1 TAB range between
"text with spaces 4" and "line X", X being 0, 1, 2, 3.
Here is a 4 * 4 array (ar 1) (TAB_range * line), that is, for each of the 4 lines, the number of TABs in each of the 4 TAB ranges
1 3 3 2
1 3 5 3
2 4 2 1
1 1 1 2
The max number of TABs for each of the columns is 2 4 5 3, as opposed to 4 4 4 4 for the above array (ar 0),
and thus the output should be
text with spaces 1 text with spaces 2 text with spaces 3 text
+ with spaces 4 line 0
text with spaces 1 text with spaces 2 text with spaces 3 text
+ with spaces 4 line 1
text with spaces 1 text with spaces 2 text with spaces 3 text
+ with spaces 4 line 2
text with spaces 1 text with spaces 2 text with spaces 3 text
+ with spaces 4 line 3
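The column-wise maxima quoted for (ar 0) and (ar 1) can be checked with a short sketch. The sub name column_max is made up for this illustration; the numbers are copied from the two arrays above, with one row per line and one column per TAB range.

```perl
use strict;
use warnings;

# Rows are lines, columns are TAB ranges; values copied from (ar 0) and (ar 1).
my @ar0 = ( [ 1, 4, 4, 2 ], [ 1, 3, 3, 4 ], [ 3, 1, 2, 1 ], [ 4, 4, 1, 3 ] );
my @ar1 = ( [ 1, 3, 3, 2 ], [ 1, 3, 5, 3 ], [ 2, 4, 2, 1 ], [ 1, 1, 1, 2 ] );

# Return the largest value found in each column of a list of rows.
# (column_max is a hypothetical helper, not part of the original script.)
sub column_max {
    my @rows = @_;
    my @max;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            $max[$i] = $row->[$i]
                if !defined $max[$i] || $max[$i] < $row->[$i];
        }
    }
    return @max;
}

print join(' ', column_max(@ar0)), "\n";    # prints "4 4 4 4"
print join(' ', column_max(@ar1)), "\n";    # prints "2 4 5 3"
```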
I have printed the begin/end of each TAB range using print $-[0], ' ', $+[0], ' '; (line 20), which gives
0 1 19 20 38 41 59 62 80 82
18 19 37 40 58 63 81 84
18 20 38 42 60 62 80 81
18 19 37 38 56 57 75 77
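For reference, here is a minimal sketch of how $-[0] and $+[0] yield those begin/end pairs. The sub name tab_ranges and the sample line are invented for the illustration. Note that these are character offsets into the string, so a TAB counts as one position whatever width an editor displays it with.

```perl
use strict;
use warnings;

# Return [begin, end] offset pairs for every run of TABs in $line.
# (tab_ranges is a hypothetical helper name.)
sub tab_ranges {
    my ($line) = @_;
    my @ranges;
    while ($line =~ /\t+/g) {
        push @ranges, [ $-[0], $+[0] ];    # start and end of the whole match
    }
    return @ranges;
}

# Hypothetical sample line: one leading TAB, then a run of two TABs.
my @ranges = tab_ranges("\tab\t\tcd");
print join(' ', map { "@$_" } @ranges), "\n";    # prints "0 1 3 5"
```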
I have printed the maximum column numbers for both the TAB begin and the TAB end of each TAB range between the "text with spaces X" groups:
# printing max number of TABs for each of the columns begin ($max_b) and end ($max_e)
print "max begin\t: " ;
for ($x=0;$x<=$nbr_tab;$x++) {
    print $max_b[$x], ' ';
}
print "\nmax end\t\t: ";
for ($x=0;$x<=$nbr_tab;$x++) {
    print $max_e[$x], ' ';
}
This is for debugging only, and it outputs
max begin : 18 38 60 81 80
max end : 20 42 63 84 82
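The two debug lines can be reproduced from the four printed rows of begin/end pairs, as in the sketch below (the array name @printed is made up). It also suggests why there are five values instead of four: line 0 has five pairs, presumably because of its leading TAB, so its ranges are shifted by one index compared with the other lines.

```perl
use strict;
use warnings;

# Begin/end pairs copied from the four printed lines of offsets.
my @printed = (
    [ 0, 1, 19, 20, 38, 41, 59, 62, 80, 82 ],
    [ 18, 19, 37, 40, 58, 63, 81, 84 ],
    [ 18, 20, 38, 42, 60, 62, 80, 81 ],
    [ 18, 19, 37, 38, 56, 57, 75, 77 ],
);

my (@max_b, @max_e);
for my $row (@printed) {
    my @flat = @$row;
    my $idx  = 0;
    # Take the pairs two values at a time; the loop ends when @flat is empty.
    while (my ($begin, $end) = splice @flat, 0, 2) {
        $max_b[$idx] = $begin if !defined $max_b[$idx] || $max_b[$idx] < $begin;
        $max_e[$idx] = $end   if !defined $max_e[$idx] || $max_e[$idx] < $end;
        $idx++;
    }
}

print "max begin : @max_b\n";    # prints "max begin : 18 38 60 81 80"
print "max end   : @max_e\n";    # prints "max end   : 20 42 63 84 82"
```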
Here "text with spaces 1, 2, 3, 4" on all 4 lines 0, 1, 2, 3 have the same width, that is, 18, but in real life the groups of words have variable lengths, as in the following input.
Here lines 2, 3, 4 are taken from https://www.poetryfoundation.org/poems/55038/phrases (lack of inspiration after line 1)
Bob, the rabbit jump above the fence Jack, the cat hid under the po
+rch of the red house Rex, the dog ran after Jack the birds
+fly When the world is reduced to a single dark wood for our tw
+o pairs of dazzled eyes to a musical house for our clear understan
+ding then I shall find you
When we are very strong who draws back? very happy
+ who collapses from ridicule When we are very bad what can
+ they do to us.
The taste of ashes in the air the smell of wood sweating in the hea
+rth steeped flowers the devastation of paths
+ drizzle over the canals in the fields why not already pla
+ythings and incense?
Arousing a pleasant taste of Chinese ink a black powder gently rain
+s on my night I lower the jets of the chandelier throw myse
+lf on the bed and turning toward thedark I see you
+ O my daughters and queens!
This should output as follows (temp11.txt, formatted using Ubuntu Mousepad)
Bob, the rabbit jump above the fence Jack, the cat hid under th
+e porch of the red house Rex, the dog ran after Jack the bi
+rds fly When the world is reduced to a single dark wood fo
+r our two pairs of dazzled eyes to a musical house for our clear u
+nderstanding then I shall find you
When we are very strong who draws back?
+ very happy who collapses from ridicule When
+ we are very bad what can they do
+to us.
The taste of ashes in the air the smell of wood sweating in
+ the hearth steeped flowers the devastation of
+paths drizzle over the canals in the fields
+ why not already playthings and incense?
Arousing a pleasant taste of Chinese ink a black powder gently rain
+s on my night I lower the jets of the chandelier throw
+myself on the bed and turning toward thedark
+ I see you O my daughters and queens!
I would like the TABs to align at the rightmost/maximum column number (between $max_b[] and $max_e[]) for each TAB range between the "text with spaces X" groups.
Remember that the arrays @max_b and @max_e contain the beginning and ending column numbers of the TABs, found using
$max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0] ;
$max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0] ;
and I would like to insert the missing TABs (to align with the longest line) using
print $out_fh $_ =~ y/\t/\t/rs;
I have tried
print $out_fh $_ =~ y/\t/\t{3}/rs;
just to test the use of {} for making the output 3 TABs wide, but got the same output.
I have also tried
print $out_fh $_ =~ tr/\t/\t/rs;
I have read https://perldoc.perl.org/perlre on Perl regexes but can't figure out where to insert $max_b[$tab_index] and $max_e[$tab_index]
within y/\t/\t/rs or tr/\t/\t/rs.
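For what it is worth, tr/// (and its alias y///) is a character-by-character transliteration, not a regex, so a quantifier such as {3} is not interpreted there; that would explain why the output did not change. The sketch below uses an invented sample line to show what tr/\t/\t/rs does, and how a substitution with /e can compute its replacement, which is one place where a per-range value could be plugged in. It is an illustration only, not a solution to the alignment itself.

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier needs Perl 5.14 or later

my $line = "a\t\t\tb\tc";    # hypothetical sample line

# tr maps characters one-to-one; /s squeezes each run of identical
# translated characters into a single one, /r returns the new string.
my $squeezed = $line =~ tr/\t/\t/rs;    # "a\tb\tc"

# A substitution can compute its replacement with /e: here every run
# of TABs becomes exactly three TABs.
(my $three = $line) =~ s/\t+/"\t" x 3/ge;

# tr with an empty replacement list only counts the matching characters.
print "squeezed: ", ($squeezed =~ tr/\t//), " TABs\n";    # prints "squeezed: 2 TABs"
print "three   : ", ($three    =~ tr/\t//), " TABs\n";    # prints "three   : 6 TABs"
```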
I have changed the following lines in the code
my $infile = 'pm_11140114_tab_align_even.dat';
my $outfile = 'pm_11140114_tab_align_even.out';
to
my $infile = $ARGV[0];
my $outfile = $ARGV[1];
because I need to read the input and output file names from the command-line arguments $ARGV[0] and $ARGV[1],
and added
close $in_fh;
close $out_fh;
(I guess you forgot to close the files), as well as
my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);
for my arrays of TAB positions.
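Reading the two file names from @ARGV, with the opens checked by hand since autodie is commented out, might look like the sketch below. The sub names file_args and open_files are made up for the illustration.

```perl
use strict;
use warnings;

# Return (input, output) file names from an argument list, or die with
# a usage message when the count is wrong. (Hypothetical helper.)
sub file_args {
    my @args = @_;
    die "usage: $0 INFILE OUTFILE\n" unless @args == 2;
    return @args;
}

# Open both files and return the two handles, dying on failure.
# (Hypothetical helper; the caller is expected to close the handles.)
sub open_files {
    my ($infile, $outfile) = @_;
    open my $in_fh,  '<', $infile  or die "Cannot read $infile: $!";
    open my $out_fh, '>', $outfile or die "Cannot write $outfile: $!";
    return ($in_fh, $out_fh);
}

# Usage (not run here):
#   my ($in_fh, $out_fh) = open_files(file_args(@ARGV));
#   ...
#   close $in_fh;
#   close $out_fh or die "Cannot close output: $!";
```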
I also added a first
while(<$in_fh>) { ... }
at the beginning, to read the whole file before writing to the output file in the second
while(<$in_fh>) { ... }
in order to store the begin/end positions of each TAB range in the @max_b and @max_e arrays
before repositioning the file pointer at the beginning using
seek $in_fh,0,0;
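The two-pass pattern described above can be sketched on its own, using an in-memory file so that it runs without any input file. The data is invented; the first pass only counts TAB ranges and the second simply rereads the lines.

```perl
use strict;
use warnings;

# In-memory file standing in for the real input (hypothetical data).
my $data = "a\tb\t\tc\n\nd\te\n";
open my $in_fh, '<', \$data or die "Cannot open in-memory file: $!";

# Pass 1: find the largest number of TAB ranges on any line.
my $nbr_tab = 0;
while (my $line = <$in_fh>) {
    my $count = () = $line =~ /\t+/g;    # number of TAB runs on this line
    $nbr_tab = $count if $nbr_tab < $count;
}

# Rewind, then pass 2 reads the same lines again from the start.
seek $in_fh, 0, 0 or die "Cannot seek: $!";
my @second_pass = <$in_fh>;
close $in_fh;

print "$nbr_tab TAB ranges at most, ", scalar(@second_pass), " lines\n";
# prints "2 TAB ranges at most, 3 lines"
```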
I have commented out
# use 5.014;
# use warnings;
# use autodie;
because I get useless warnings:
Parentheses missing around "my" list at 02-00.pl line 9.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 9.
Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 9.
Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 13.
Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
Execution of 02-00.pl aborted due to compilation errors.
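These messages look like the ones Perl gives when several variables follow a single my without parentheses, in which case only the first one is declared and strict (which use 5.014 turns on) rejects the others. That version of 02-00.pl is not shown here, so this is a guess. A sketch of the parenthesised form follows, with an invented helper keep_max that also avoids comparing against an undefined array element.

```perl
use strict;
use warnings;

# With parentheses, "my" declares every variable in the list.
my ($max_tab, $nbr_tab, $valid_line) = (0, 0, 0);
my (@max_b, @max_e);

# Store $value at $idx when it is larger than what is already there;
# the defined() test avoids an "uninitialized value" warning.
# (keep_max is a hypothetical helper name.)
sub keep_max {
    my ($array, $idx, $value) = @_;
    $array->[$idx] = $value
        if !defined $array->[$idx] || $array->[$idx] < $value;
    return $array->[$idx];
}

keep_max(\@max_b, 0, 18);
keep_max(\@max_b, 0, 0);     # smaller, so 18 is kept
keep_max(\@max_e, 0, 19);
keep_max(\@max_e, 0, 20);    # larger, so 20 replaces 19

print "$max_b[0] $max_e[0]\n";    # prints "18 20"
```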
I don't know where to put $max_e[] (the rightmost/maximum column number) within
print $out_fh $_ =~ y/\t/\t/rs;
I am sorry for this, but I'm quite new to Perl and regexes and it's quite confusing.
Thanks in advance.
Here is the rewritten code:
# use 5.014;
# use warnings;
# use autodie;
my $infile  = $ARGV[0];
my $outfile = $ARGV[1];
my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);
# autodie is commented out, so the opens are checked by hand
open my $in_fh, '<', $infile or die "Cannot read $infile: $!";
open my $out_fh, '>', $outfile or die "Cannot write $outfile: $!";
# first pass: record the largest begin/end offset of each TAB range
while(<$in_fh>) {
    # print "$_\n";
    if (/[a-zA-Z0-9]/) {
        $valid_line++;
        $max_tab = 0;
        while (/\t+/g) {
            # print $-[0], ' ', $+[0], ' ';
            $max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0] ;
            $max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0] ;
            print $max_b[$max_tab], ' ', $max_e[$max_tab], ' ', $max_tab, ' ';
            $max_tab++;
        }
        # print "\n";
        $nbr_tab = $max_tab if $nbr_tab < $max_tab;
    }
}
# printing max number of TABs for each of the columns begin ($max_b) and end ($max_e) DEBUG
print "max begin\t: " ;
for ($x=0;$x<=$nbr_tab;$x++) {
    print $max_b[$x], ' ';
}
print "\nmax end\t\t: ";
for ($x=0;$x<=$nbr_tab;$x++) {
    print $max_e[$x], ' ';
}
# second pass: rewind and write each line to the output file
seek $in_fh,0,0;
while (<$in_fh>) {
    print $out_fh $_ =~ y/\t/\t/rs;
}
close $in_fh;
close $out_fh;
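To make the goal more concrete, here is a rough sketch of one way the alignment could be computed. It does not use @max_b/@max_e but field lengths instead, and it rests on several assumptions: tab stops every 8 columns, single-width characters, fields separated by runs of TABs, and leading TABs dropped. The names align_tabs and $TAB_WIDTH are made up, and the sample lines are not from the files above.

```perl
use strict;
use warnings;

my $TAB_WIDTH = 8;    # assumed tab-stop width

# Align TAB-separated fields by padding each field with enough TABs to
# reach the tab stop needed by the widest field in the same column.
sub align_tabs {
    my @lines = @_;
    my @rows = map {
        my $line = $_;
        $line =~ s/^\t+//;     # drop any leading TABs
        $line =~ s/\s+\z//;    # drop trailing whitespace and newline
        [ split /\t+/, $line ];
    } @lines;

    # Number of tab stops each column must span.
    my @stops;
    for my $row (@rows) {
        for my $i (0 .. $#$row - 1) {    # the last field needs no padding
            my $need = int(length($row->[$i]) / $TAB_WIDTH) + 1;
            $stops[$i] = $need if !defined $stops[$i] || $stops[$i] < $need;
        }
    }

    my @out;
    for my $row (@rows) {
        my $text = '';
        for my $i (0 .. $#$row) {
            $text .= $row->[$i];
            next if $i == $#$row;
            # TABs still needed after the stops already covered by the text
            $text .= "\t" x ($stops[$i] - int(length($row->[$i]) / $TAB_WIDTH));
        }
        push @out, $text;
    }
    return @out;
}

my @aligned = align_tabs("ab\t\t\tcd\tef\n", "\tabcdefghij\tc\t\tgh\n");
print "$_\n" for @aligned;
```

With these two sample lines, the second field starts at column 16 and the third at column 24 on both output lines when viewed with 8-column tab stops.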