in reply to Re: misalined TABs using substr,LAST_MATCH_START/END,regex
in thread misalined TABs using substr,LAST_MATCH_START/END,regex
Input (ex 0, in 0):

    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3

There are 4 TAB ranges in each of the 4 lines: 3 TAB ranges between the groups `text with spaces` (1 and 2, 2 and 3, 3 and 4), and 1 TAB range between `text with spaces 4` and `line X`, X being 0, 1, 2, 3. Here is a 4 * 4 array (ar 0) giving, for each of the 4 lines (one row per line), the number of TABs in each of its 4 TAB ranges:

    1 4 4 2
    1 3 3 4
    3 1 2 1
    4 4 1 3

The maximum number of TABs in each column is 4 4 4 4. So there should be 4 TABs in each of these places: between `text with spaces 1` and `text with spaces 2`, between `text with spaces 2` and `text with spaces 3`, between `text with spaces 3` and `text with spaces 4`, and between `text with spaces 4` and `line X` (X being 0, 1, 2, 3). The output should be:
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3

The above example (ex 0, in 0) is a bit unclear, because every TAB range ends up with 4 TABs. The next example (ex 1, in 1: temp9.txt) has a different number of TABs (max TAB) in each TAB range between the groups of words `text with spaces X`:

    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3

As in the above example (ex 0), each of the 4 lines has 4 TAB ranges. There are 3 ranges between `text with spaces` 1 and 2, 2 and 3, and 3 and 4, and 1 range between `text with spaces 4` and `line X`. Here is the 4 * 4 array (ar 1), one row per line, giving the number of TABs in each TAB range:

    1 3 3 2
    1 3 5 3
    2 4 2 1
    1 1 1 2

The maximum number of TABs in each column is 2 4 5 3, as opposed to 4 4 4 4 for the above array (ar 0). Thus the output should be:
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
    text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3

I have printed the beginning/end of each TAB range using `print $-[0], ' ', $+[0], ' ';` (line 20):

    0 1 19 20 38 41 59 62 80 82
    18 19 37 40 58 63 81 84
    18 20 38 42 60 62 80 81
    18 19 37 38 56 57 75 77

I have also printed the maximum column numbers of the TAB beginnings and ends for each TAB range between the groups of words `text with spaces X`:

    # printing max number of TABs for each of the columns begin ($max_b) and end ($max_e)
    print "max begin\t: ";
    for ($x=0;$x<=$nbr_tab;$x++) { print $max_b[$x], ' '; }
    print "\nmax end\t\t: ";
    for ($x=0;$x<=$nbr_tab;$x++) { print $max_e[$x], ' '; }

This is for debugging only, and it outputs:

    max begin : 18 38 60 81 80
    max end   : 20 42 63 84 82

Here `text with spaces 1`, 2, 3 and 4 have the same width (18 characters) in all 4 lines 0, 1, 2, 3. In real life, though, the groups of words have variable lengths. Take the following input, where lines 2, 3 and 4 are taken from https://www.poetryfoundation.org/poems/55038/phrases (lack of inspiration after line 1):
    Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly
    When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you
    When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us.
    The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense?
    Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens!

The output should be as follows (temp11.txt, formatted using Ubuntu Mousepad):

    Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly
    When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you
    When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us.
    The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense?
    Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens!

I would like the TABs to align at the rightmost/maximum column number (between $max_b[] and $max_e[]) for each TAB range between the groups of words `text with spaces X`. Remember, the arrays @max_b and @max_e contain the beginning and ending column numbers of the TABs, found using:
    $max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0];
    $max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0];

I would also like to insert the missing TABs (to align with the longest line) using:

    print $out_fh $_ =~ y/\t/\t/rs;

To test whether `{}` could make the output 3 TABs wide, I tried `print $out_fh $_ =~ y/\t/\t{3}/rs;`, but got the same output. I have also tried `print $out_fh $_ =~ tr/\t/\t/rs;`.

I have read https://perldoc.perl.org/perlre on Perl regexes, but can't figure out where to insert `$max_b[$tab_index]` and `$max_e[$tab_index]` within `y/\t/\t/rs` or `tr/\t/\t/rs`.
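The `{3}` experiment probably changed nothing for this reason. In `tr///` (alias `y///`), the search and replacement lists are plain character lists built at compile time. Variables are not interpolated, and `{3}` is just the three characters `{`, `3` and `}`. Because the replacement list is longer than the search list, the extra characters are ignored, so `y/\t/\t{3}/rs` still maps TAB to TAB and squeezes each run.

A count that varies per TAB range can instead come from `s///` with the `/e` modifier, which evaluates the replacement as Perl code for every match. The sketch below makes some assumptions:

- The target for each TAB range is the column maximum of the TAB counts, as in ar 0 and ar 1.
- The sub names `tab_counts`, `column_max` and `pad_tabs` are hypothetical.
- Every group of words in a column has the same width, as in ex 0 and ex 1. Without that, the output will not line up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number of TABs in each run of TABs on one line: "a\tb\t\t\tc" gives (1, 3)
sub tab_counts {
    my ($line) = @_;
    return map { length($_) } $line =~ /(\t+)/g;
}

# Column-wise maximum of the per-line counts (column k = k-th TAB run)
sub column_max {
    my @max;
    for my $counts (@_) {
        for my $k (0 .. $#$counts) {
            $max[$k] = $counts->[$k]
                if !defined $max[$k] || $counts->[$k] > $max[$k];
        }
    }
    return @max;
}

# Replace the k-th TAB run of a line with $max->[$k] TABs.
# tr/// cannot do this; s///e evaluates the replacement for every match.
sub pad_tabs {
    my ($line, $max) = @_;
    my $k = 0;
    $line =~ s/\t+/"\t" x $max->[$k++]/ge;
    return $line;
}

# Rebuild the ex 1 lines from the TAB counts in array ar 1 (one row per line)
my @ar1 = ([1, 3, 3, 2], [1, 3, 5, 3], [2, 4, 2, 1], [1, 1, 1, 2]);
my @lines;
for my $n (0 .. $#ar1) {
    my $line = '';
    for my $k (0 .. 3) {
        $line .= 'text with spaces ' . ($k + 1) . ("\t" x $ar1[$n][$k]);
    }
    push @lines, $line . "line $n";
}

my @max = column_max(map { [ tab_counts($_) ] } @lines);
print "max TABs per column: @max\n";    # max TABs per column: 2 4 5 3

# Every TAB range now holds its column maximum: 2, 4, 5 and 3 TABs
for my $line (@lines) {
    my $out = pad_tabs($line, \@max);
    print "$out\n";
}
```

Rebuilding the ex 1 lines from ar 1 reproduces the column maxima 2 4 5 3 stated above. The per-range `$max[...]` value ends up inside the replacement expression of `s///e`, not inside `tr///`.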
I replaced

    my $infile = 'pm_11140114_tab_align_even.dat';
    my $outfile = 'pm_11140114_tab_align_even.out';

with

    my $infile = $ARGV[0];
    my $outfile = $ARGV[1];

because I need to read the input and output file names from the command-line arguments $ARGV[0] and $ARGV[1]. I also added

    close $in_fh;
    close $out_fh;

because I guess you forgot to close the files.
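With `use autodie` commented out, a failed `open` goes unnoticed, and the read loop simply gets nothing. Here is a minimal sketch of the file handling described above. It takes the names from @ARGV and checks `open` and `close` explicitly.

`process_file` is a hypothetical name. The loop body keeps the `y/\t/\t/rs` squeeze only as a placeholder for the real TAB handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy $infile to $outfile line by line; the y/\t/\t/rs squeeze (each run of
# TABs becomes one TAB) is only a placeholder for the real TAB handling
sub process_file {
    my ($infile, $outfile) = @_;
    open my $in_fh,  '<', $infile  or die "Cannot open '$infile': $!\n";
    open my $out_fh, '>', $outfile or die "Cannot open '$outfile': $!\n";
    while (my $line = <$in_fh>) {
        print {$out_fh} $line =~ y/\t/\t/rs;
    }
    close $in_fh;
    close $out_fh or die "Cannot close '$outfile': $!\n";
    return;
}

# Usage: perl script.pl INFILE OUTFILE (names come from $ARGV[0] and $ARGV[1]);
# with no arguments nothing is done, with a wrong number a usage message is shown
if (@ARGV == 2) {
    process_file($ARGV[0], $ARGV[1]);
}
elsif (@ARGV) {
    die "usage: $0 INFILE OUTFILE\n";
}
```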
For my arrays of TABs, I also declared

    my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);

I added a first `while (<$in_fh>) { ... }` loop at the beginning. It reads the file before the second `while (<$in_fh>) { ... }` loop writes to the output file. The first loop stores the beginning/end of each TAB range in the @max_b and @max_e arrays. Then `seek $in_fh, 0, 0;` repositions the file pointer at the beginning of the file.

I have commented out
    # use 5.014;
    # use warnings;
    # use autodie;

because I got useless warnings:

    Parentheses missing around "my" list at 02-00.pl line 9.
    Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 9.
    Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 9.
    Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 13.
    Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
    Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
    Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
    Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
    Execution of 02-00.pl aborted due to compilation errors.

I don't know where to put `$max_e[]`, the rightmost/maximum column number, within

    print $out_fh $_ =~ y/\t/\t/rs;

I am sorry for this, but I'm quite new to Perl and regexes, and it's quite confusing. Thanks in advance. Here is the rewritten code:
    # use 5.014;
    # use warnings;
    # use autodie;

    $infile  = $ARGV[0];
    $outfile = $ARGV[1];

    my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);

    open my $in_fh,  '<', $infile;
    open my $out_fh, '>', $outfile;

    while (<$in_fh>) {
        # print "$_\n";
        if (/[a-zA-Z0-9]/) {
            $valid_line++;
            $max_tab = 0;
            while (/\t+/g) {
                # print $-[0], ' ', $+[0], ' ';
                $max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0];
                $max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0];
                print $max_b[$max_tab], ' ', $max_e[$max_tab], ' ', $max_tab, ' ';
                $max_tab++;
            }
            # print "\n";
            $nbr_tab = $max_tab if $nbr_tab < $max_tab;
        }
    }

    # printing max number of TABs for each of the columns begin ($max_b) and end ($max_e) DEBUG
    print "max begin\t: ";
    for ($x=0;$x<=$nbr_tab;$x++) { print $max_b[$x], ' '; }
    print "\nmax end\t\t: ";
    for ($x=0;$x<=$nbr_tab;$x++) { print $max_e[$x], ' '; }

    seek $in_fh,0,0;
    while (<$in_fh>) {
        print $out_fh $_ =~ y/\t/\t/rs;
    }
    close $in_fh;
    close $out_fh;
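The compile errors listed above are most likely caused by `use 5.014`. A `use VERSION` of 5.12 or later also switches on `strict`, so every variable needs a `my` declaration. The warning "Parentheses missing around "my" list" points at a declaration written like `my $a, $b` without parentheses, which declares only the first variable.

For the variable-width case (the poem lines), aligning on character offsets such as @max_b/@max_e is awkward, because the width of a TAB depends on the column where it starts. The sketch below swaps in a different technique, under `use strict` and `use warnings`:

1. Split each line on runs of TABs.
2. Find the widest field in every column.
3. Emit just enough TABs for each field to reach a common tab stop.

The sketch rests on these assumptions:

- Tab stops fall every 8 columns. `$TAB_WIDTH` is an assumption, and Mousepad's tab width may be set differently.
- Each character takes one display column.
- `align_tabs` is a hypothetical name.

It uses the fewest TABs that line the columns up. So it will not reproduce exact TAB counts such as the 2 4 5 3 of ex 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $TAB_WIDTH = 8;    # assumption: the viewer puts a tab stop every 8 columns

# Align TAB-separated fields so that field k starts at the same display column
# on every line. Takes and returns lines without trailing newlines.
sub align_tabs {
    # Split on runs of TABs; -1 keeps empty leading/trailing fields
    my @rows = map { [ split /\t+/, $_, -1 ] } @_;

    # Widest field in each column, counting one display column per character
    my @width;
    for my $row (@rows) {
        for my $k (0 .. $#$row) {
            my $len = length($row->[$k]);
            $width[$k] = $len if !defined $width[$k] || $len > $width[$k];
        }
    }

    # Column k+1 starts at the first tab stop after the widest field of column k
    my @start = (0);
    for my $k (0 .. $#width - 1) {
        push @start, (int(($start[$k] + $width[$k]) / $TAB_WIDTH) + 1) * $TAB_WIDTH;
    }

    my @out;
    for my $row (@rows) {
        my $line = '';
        for my $k (0 .. $#$row) {
            $line .= $row->[$k];
            last if $k == $#$row;    # no TABs after the last field
            # Each TAB jumps to the next multiple of $TAB_WIDTH
            my $pos = $start[$k] + length($row->[$k]);
            $line .= "\t" x ($start[$k + 1] / $TAB_WIDTH - int($pos / $TAB_WIDTH));
        }
        push @out, $line;
    }
    return @out;
}

# Demo: show the TABs as \t
for my $line (align_tabs("ab\tc", "abcdefghij\td")) {
    (my $shown = $line) =~ s/\t/\\t/g;
    print "$shown\n";    # ab\t\tc, then abcdefghij\td
}
```

To apply it to a file, read all lines with `chomp(my @lines = <$in_fh>);`, pass them to `align_tabs`, and print each result followed by `"\n"`. Keeping the lines in memory also makes the `seek` back to the start unnecessary.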
Re^3: misalined TABs using substr,LAST_MATCH_START/END,regex
by kcott (Archbishop) on Jan 12, 2022 at 01:11 UTC