Thanks for the code. Although lines 1, 2 and 3 align well, there should not be a TAB at the beginning of line 0, and each group of words (text with spaces X) should be aligned to the line that has the most TABs at that position (max TAB). For example (ex 0), for the input (in 0):
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3
There are 4 TAB ranges in each of the 4 lines: 3 TAB ranges between the groups text with spaces (1 and 2, 2 and 3, 3 and 4) and 1 TAB range between text with spaces 4 and line X, X being 0, 1, 2, 3. Here is a 4 * 4 array (ar 0) (TAB_range * line), that is, the number of TABs in each of the 4 TAB ranges for each of the 4 lines:
1 4 4 2
1 3 3 4
3 1 2 1
4 4 1 3
The maximum number of TABs in each column is 4 4 4 4, so there should be 4 TABs between text with spaces 1 and text with spaces 2, between text with spaces 2 and text with spaces 3, between text with spaces 3 and text with spaces 4, and between text with spaces 4 and line X (X being 0, 1, 2, 3). The output should be:
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3
The above example (ex 0) (in 0) is a bit unclear because every TAB range ends up with 4 TABs. In the following example (ex 1) (in 1), temp9.txt, the TAB ranges (max TAB) have different numbers of TABs between the groups of words text with spaces X:
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3
As in the example above (ex 0), there are 4 TAB ranges in each of the 4 lines: 3 TAB ranges between the groups text with spaces (1 and 2, 2 and 3, 3 and 4) and 1 TAB range between text with spaces 4 and line X, X being 0, 1, 2, 3. Here is a 4 * 4 array (ar 1) (TAB_range * line), that is, the number of TABs in each of the 4 TAB ranges for each of the 4 lines:
1 3 3 2
1 3 5 3
2 4 2 1
1 1 1 2
The maximum number of TABs in each column is 2 4 5 3, as opposed to 4 4 4 4 for the array above (ar 0), and thus the output should be:
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3
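The per-column maximum described above can be sketched in a few lines. This is only an illustration: column_max is a hypothetical helper name, and the rows are the TAB counts of (ar 1) typed in by hand rather than read from a file.

```perl
use strict;
use warnings;

# Return the per-column maximum of a list of rows (array references).
# column_max is a hypothetical helper, not part of the posted script.
sub column_max {
    my @rows = @_;
    my @max;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            $max[$i] = $row->[$i]
                if !defined $max[$i] || $row->[$i] > $max[$i];
        }
    }
    return @max;
}

# TAB counts of (ar 1): one row per line, one column per TAB range.
my @ar1 = ([1, 3, 3, 2], [1, 3, 5, 3], [2, 4, 2, 1], [1, 1, 1, 2]);
print join(' ', column_max(@ar1)), "\n";   # prints "2 4 5 3"
```

Feeding it the rows of (ar 0) instead gives 4 4 4 4.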
I have printed the begin/end of each TAB range using print $-[0], ' ', $+[0], ' '; (line 20):
0 1 19 20 38 41 59 62 80 82
18 19 37 40 58 63 81 84
18 20 38 42 60 62 80 81
18 19 37 38 56 57 75 77
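For reference, a minimal sketch of how $-[0] and $+[0] report each TAB range. tab_ranges is a hypothetical helper and the sample line is made up. Note that these values are character offsets into the string, so a TAB counts as one character whatever width an editor gives it on screen.

```perl
use strict;
use warnings;

# Return a [begin, end] pair of offsets for every run of TABs in a line.
# tab_ranges is a hypothetical helper, not part of the posted script.
sub tab_ranges {
    my ($line) = @_;
    my @ranges;
    while ($line =~ /\t+/g) {
        push @ranges, [ $-[0], $+[0] ];   # start and end offset of this match
    }
    return @ranges;
}

# Made-up sample: an 18-character field, one TAB, another field, three TABs.
my $line = "text with spaces 1\ttext with spaces 2\t\t\ttext";
print join(' ', map { "@$_" } tab_ranges($line)), "\n";   # prints "18 19 37 40"
```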
I have also printed the maximum begin and end column numbers for each of the TAB ranges between the groups of words text with spaces X:
# printing max number of TABs for each of the columns begin ($max_b) and end ($max_e)
print "max begin\t: " ;
for ($x=0;$x<=$nbr_tab;$x++) { print $max_b[$x], ' '; }
print "\nmax end\t\t: ";
for ($x=0;$x<=$nbr_tab;$x++) { print $max_e[$x], ' '; }
This is for debugging only and outputs:
max begin	: 18 38 60 81 80
max end		: 20 42 63 84 82
Here text with spaces 1, 2, 3, 4 all have the same width (18) on all 4 lines 0, 1, 2, 3, but in real life the groups of text have variable lengths, as in the following, where lines 2, 3, 4 are taken from https://www.poetryfoundation.org/poems/55038/phrases (lack of inspiration after line 1):
Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly
When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you
When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us.
The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense?
Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens!
It should output as follows (temp11.txt, formatted using Ubuntu Mousepad):
Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly
When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you
When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us.
The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense?
Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens!
I would like the TABs to align at the rightmost/maximum column number (between $max_b[] and $max_e[]) for each TAB range between the groups of words text with spaces X. Remember that the arrays @max_b and @max_e contain the beginning and ending column numbers of the TABs, found using
$max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0] ;
$max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0] ;
and to insert the missing TABs (to align with the longest line) using
print $out_fh $_ =~ y/\t/\t/rs;
I have tried
print $out_fh $_ =~ y/\t/\t{3}/rs;
just to test the use of {} for making the output 3 TABs wide, but I got the same output. I have also tried
print $out_fh $_ =~ tr/\t/\t/rs;
I have read https://perldoc.perl.org/perlre on Perl regex but can't figure out where to insert $max_b[$tab_index] and $max_e[$tab_index] within y/\t/\t/rs or tr/\t/\t/rs.
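One likely reason the {3} attempt changed nothing: y/// (also spelled tr///) is the transliteration operator described in perlop, not a regex, so {3} is read as three more literal characters in the replacement list and never acts as a quantifier. With the /s flag it squeezes each run of TABs down to one. A minimal sketch of the difference, assuming Perl 5.14 or later for the /r flag; squeeze_tabs, widen_tabs and $count are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# y/// with /s squeezes each run of TABs to a single TAB. A "{3}" in the
# replacement list would be three literal characters, not a quantifier.
sub squeeze_tabs {
    my ($line) = @_;
    return $line =~ y/\t/\t/rs;
}

# s/// with /e evaluates the replacement as code, so "x" can repeat the TAB.
# $count is a hypothetical parameter: the number of TABs wanted per run.
sub widen_tabs {
    my ($line, $count) = @_;
    return $line =~ s/\t+/"\t" x $count/ger;
}

my $line = "a\t\tb\tc";
print squeeze_tabs($line), "\n";    # a, b, c separated by one TAB each
print widen_tabs($line, 3), "\n";   # a, b, c separated by three TABs each
```

A per-column count could be fed to s/// in the same way, for example by incrementing an index inside the /e replacement, but that is left out here to keep the sketch short.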
I have changed the following lines in the code
my $infile = 'pm_11140114_tab_align_even.dat';
my $outfile = 'pm_11140114_tab_align_even.out';
to
my $infile = $ARGV[0];
my $outfile = $ARGV[1];
because I need to read the input and output filenames from the command-line arguments $ARGV[0] and $ARGV[1], and added
close $in_fh;
close $out_fh;
I guess you forgot to close the files. I also added
my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);
for my arrays of TABs. I also added
while(<$in_fh>) { ... }
at the beginning, to read the file before writing to the output file in the second
while(<$in_fh>) { ... }
in order to store the begin/end positions of each TAB range in the @max_b and @max_e arrays before repositioning the file pointer at the beginning using
seek $in_fh,0,0;
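The two-pass pattern (scan once, rewind with seek, read again) can be sketched as follows. This is an illustration only: longest_then_lines is a hypothetical helper, and an in-memory filehandle stands in for the real input file. One thing to keep in mind is that seek does not reset the line counter $.

```perl
use strict;
use warnings;

# Read a filehandle twice: first to find the longest line, then to report
# each line's length against that maximum.
sub longest_then_lines {
    my ($fh) = @_;
    my $longest = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $longest = length($line) if length($line) > $longest;
    }
    seek($fh, 0, 0) or die "seek failed: $!";   # rewind for the second pass
    my @report;
    while (my $line = <$fh>) {
        chomp $line;
        push @report, sprintf('%d/%d', length($line), $longest);
    }
    return @report;
}

# An in-memory filehandle stands in for the real input file here.
my $data = "ab\nabcd\nabc\n";
open(my $fh, '<', \$data) or die "open failed: $!";
print join(' ', longest_then_lines($fh)), "\n";   # prints "2/4 4/4 3/4"
close $fh;
```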
I have commented out
# use 5.014;
# use warnings;
# use autodie;
because I get what look like useless warnings:
Parentheses missing around "my" list at 02-00.pl line 9.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 9.
Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 9.
Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 13.
Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
Execution of 02-00.pl aborted due to compilation errors.
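For what it is worth, the "Global symbol ... requires explicit package name" lines are compile errors from strict (which use 5.014 switches on) rather than warnings, and the first message suggests a my list written without parentheses. A minimal sketch of declarations that should compile cleanly under strict and warnings, assuming the same variable names as the script; the sample line is made up.

```perl
use strict;
use warnings;

# Parentheses make "my" declare every variable in the list.
my ($max_tab, $nbr_tab, $valid_line) = (0, 0, 0);
my (@max_b, @max_e);

# Made-up sample line with two TAB ranges.
my $line = "a\tb\t\tc";
while ($line =~ /\t+/g) {
    # The defined check avoids "uninitialized value" warnings on first use.
    $max_b[$max_tab] = $-[0]
        if !defined $max_b[$max_tab] || $max_b[$max_tab] < $-[0];
    $max_e[$max_tab] = $+[0]
        if !defined $max_e[$max_tab] || $max_e[$max_tab] < $+[0];
    $max_tab++;
}
$nbr_tab = $max_tab if $nbr_tab < $max_tab;
print "@max_b | @max_e | $nbr_tab\n";   # prints "1 3 | 2 5 | 2"
```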
I don't know where to put ($max_e[]), the rightmost/maximum column number, within
print $out_fh $_ =~ y/\t/\t/rs;
I am sorry for this, but I'm quite new to Perl and regex and it's quite confusing. Thanks in advance. Here is the rewritten code:
# use 5.014;
# use warnings;
# use autodie;
$infile = $ARGV[0];
$outfile = $ARGV[1];
my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);
open my $in_fh, '<', $infile;
open my $out_fh, '>', $outfile;
while(<$in_fh>) {
    # print "$_\n";
    if (/[a-zA-Z0-9]/) {
        $valid_line++;
        $max_tab = 0;
        while (/\t+/g) {
            # print $-[0], ' ', $+[0], ' ';
            $max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0] ;
            $max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0] ;
            print $max_b[$max_tab], ' ', $max_e[$max_tab], ' ', $max_tab, ' ';
            $max_tab++;
        }
        # print "\n";
        $nbr_tab = $max_tab if $nbr_tab < $max_tab;
    }
}
# printing max number of TABs for each of the columns begin ($max_b) and end ($max_e) DEBUG
print "max begin\t: " ;
for ($x=0;$x<=$nbr_tab;$x++) { print $max_b[$x], ' '; }
print "\nmax end\t\t: ";
for ($x=0;$x<=$nbr_tab;$x++) { print $max_e[$x], ' '; }
seek $in_fh,0,0;
while (<$in_fh>) {
    print $out_fh $_ =~ y/\t/\t/rs;
}
close $in_fh;
close $out_fh;
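One possible way to get the alignment, sketched under several assumptions that are not in the script above: TAB stops every 8 screen columns, plain single-width characters, and alignment driven by the widest field in each column rather than by the @max_b/@max_e offsets. It also uses the smallest number of TABs that lines the columns up, which is not the same as the "max TAB count" rule in the examples. fields_of and align_with_tabs are hypothetical helper names.

```perl
use strict;
use warnings;

my $TAB_WIDTH = 8;   # assumption: the viewer puts a TAB stop every 8 columns

# Split a line into its fields, dropping the newline and any leading TABs.
sub fields_of {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    $line =~ s/^\t+//;
    return split /\t+/, $line;
}

# Re-join the fields of all lines so that every column starts at the same
# TAB stop, using only TABs as padding.
sub align_with_tabs {
    my @rows = map { [ fields_of($_) ] } @_;

    # TAB stops each column needs: room for its longest field plus one TAB.
    my @stops;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $need = int(length($row->[$i]) / $TAB_WIDTH) + 1;
            $stops[$i] = $need if !defined $stops[$i] || $need > $stops[$i];
        }
    }

    my @out;
    for my $row (@rows) {
        my $text = '';
        for my $i (0 .. $#$row) {
            $text .= $row->[$i];
            next if $i == $#$row;   # no padding after the last field
            # Each TAB advances to the next stop, so a field that already
            # fills $used stops needs $stops[$i] - $used TABs.
            my $used = int(length($row->[$i]) / $TAB_WIDTH);
            $text .= "\t" x ($stops[$i] - $used);
        }
        push @out, $text;
    }
    return @out;
}

# Made-up input: uneven fields, uneven TAB runs, and a leading TAB on line 0.
my @aligned = align_with_tabs(
    "\tBob, the rabbit\t\t\tjump\tabove the fence\n",
    "Jack, the cat\thid\t\tunder the porch\n",
    "Rex\tran\tafter Jack\n",
);
print "$_\n" for @aligned;
```

Because everything is measured in TAB stops, the result only lines up in a viewer whose TAB width matches $TAB_WIDTH.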

In reply to Re^2: misalined TABs using substr,LAST_MATCH_START/END,regex by perl_boy
in thread misalined TABs using substr,LAST_MATCH_START/END,regex by perl_boy
