perl_boy has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to align a text file so that TABs are aligned from line to line. For example, the following input file (those are TABs, not SPACEs)
foo	bar		baz	booz		qaaz			abc
foo		bar	baz		booz	qaaz	abc	123
foo		bar		baz		booz	qaaz		abc
SHOULD output to
foo		bar		baz		booz		qaaz		abc
foo		bar		baz		booz		qaaz		abc	123
foo		bar		baz		booz		qaaz		abc
I read the file twice. The first pass gets the maximum position of each TAB stop on each line and stores it in the @max array. If a line has an Nth TAB stop further right than on any previous line, then $max[$max_tab] is increased
$max[$max_tab] = $+[0] if $max[$max_tab] < $+[0] ;$max_tab++; at line 16
Then I go back to the beginning of the file using seek F0,0,0, insert extra TABs on the lines that are missing them, and then add the portion of the line's text after that TAB stop at line 30
print substr("\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t", 0, ($max[$max_tab++] - $+[0]) / 8), substr($_, $-[0], $+[0]);

However, in the output a portion of the previous text on each line gets repeated; that is, previous words get pushed onto later words (e.g. babaz, qabooz, abcqaaz), and the TABs don't align
foobar		babaz	booz		qabooz		qaaz			abcqaaz			abcabc
foobar	baz	baz		booz	qabooz	qaaz	abc	123qaaz	abc	123abc	123													123
foobar		bazbaz		booz	qaabooz	qaaz		abcqaaz		abcabc

I have printed where the TABs begin/end using $-[0] and $+[0] (LAST_MATCH_START/@- and LAST_MATCH_END/@+) at line 15
print $-[0], ' ', $+[0], ' ';
and get the right indexes
3 4 7 9 12 13 17 19 23 26 
3 5 8 9 12 14 18 19 23 24 27 28 
3 5 8 10 13 15 19 20 24 26 

I have read about the Perl substr function (lines 7 and 33)
https://www.geeksforgeeks.org/perl-substr-function/
https://perldoc.perl.org/functions/substr
https://www.tutorialspoint.com/perl/perl_substr.htm
https://squareperl.com/en/function/substr
and I give the right arguments. I'm trying to figure out what's wrong.
Thanks for your help.
Here is the code
$valid_line==0;
$nbr_line=$ARGV[1];

# format the first digit @_[0], length @_[1] wide with leading 0s
sub format
{
    return substr("00000000", 0, (@_[1] - length(@_[0]))) . @_[0];
}

open(F0, $ARGV[0]);
while(<F0>) {
    if (/[a-zA-Z0-9]/) {
        $valid_line++; $max_tab = 0;
        while (/\t+/g) {
            # print $-[0], ' ', $+[0], ' ';
            $max[$max_tab] = $+[0] if $max[$max_tab] < $+[0] ;$max_tab++;
        }
        # print "\n";
        $nbr_tab = $max_tab if $nbr_tab < $max_tab;
    }
}

$max_line=$nbr_line+$valid_line;
seek F0,0,0;

while(<F0>) {
    s/\r//;chop;
    $max_tab = 0;
    while (/[^\t]+/g) {
        print substr("\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t", 0, ($max[$max_tab++] - $+[0]) / 8), substr($_, $-[0], $+[0]);
    }
    print "\n";
}
close F0;
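
A likely cause of the repeated text, offered as a sketch rather than a confirmed diagnosis: the third argument of substr is a length, not an end offset, so passing $+[0] copies too many characters as soon as a match starts past column 0. The helper name fields_by_offsets is hypothetical and only isolates that one step.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of non-TAB characters in $line,
# extracted with substr() and the match offsets stored in @- and @+.
sub fields_by_offsets {
    my ($line) = @_;
    my @fields;
    while ($line =~ /[^\t]+/g) {
        # substr(STRING, OFFSET, LENGTH): the length is end minus start.
        push @fields, substr($line, $-[0], $+[0] - $-[0]);
    }
    return @fields;
}

print join('|', fields_by_offsets("foo\tbar\t\tbaz")), "\n";   # prints foo|bar|baz
```

With substr($line, $-[0], $+[0]) instead, the second field of the same line would come back as "bar" plus the following TABs and part of "baz", which matches the kind of doubling shown in the output above.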

Replies are listed 'Best First'.
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by choroba (Cardinal) on Jan 03, 2022 at 20:46 UTC
    I'd approach this differently. Instead of playing with magic variables that make the code hard to read, try to describe the algorithm in a simple way:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $input = << "EOF";
    foo\tbar\t\tbaz\tbooz\t\tqaaz\t\t\tabc
    foo\t\tbar\tbaz\t\tbooz\tqaaz\tabc\t123
    foo\t\tbar\t\tbaz\t\tbooz\tqaaz\t\tabc
    EOF

    open my $in, '<', \$input or die $!;

    my @tab_counts;
    while (<$in>) {
        my $i = 0;
        for my $tab_count (map length, /(\t+)/g) {
            $tab_counts[$i] = $tab_count if $tab_count > ($tab_counts[$i] || 0);
            ++$i;
        }
    }
    push @tab_counts, 0;  # No tab after the last field.

    seek $in, 0, 0;
    while (<$in>) {
        my $i = 0;
        print $_, "\t" x $tab_counts[$i++] for /\S+/g;
        print "\n";
    }

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by tybalt89 (Monsignor) on Jan 04, 2022 at 02:00 UTC

    Don't worry about the indexes, just fix the tabs.

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11140114
    use warnings;

    my $input = << "EOF";
    foo\tbar\t\tbaz\tbooz\t\tqaaz\t\t\tabc
    foo\t\tbar\tbaz\t\tbooz\tqaaz\tabc\t123
    foo\t\tbar\t\tbaz\t\tbooz\tqaaz\t\tabc
    EOF
    print "$input\n";

    open my $in, '<', \$input or die $!;
    my @tabs;
    while( <$in> ) {
        my $index = 0;
        $tabs[$index++] |= $& while /\t+/g;
    }
    seek $in, 0, 0;
    while( <$in> ) {
        my $index = 0;
        print s/\t+/$tabs[$index++]/gr;
    }
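
The |= in the first loop relies on the string form of Perl's bitwise OR: when both operands are strings, the result is as long as the longer one, so OR-ing runs of TABs keeps the widest run seen so far. A minimal sketch of just that step, assuming the bitwise feature is not enabled (it makes | always numeric); widest_run is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: OR together strings made only of TABs.
# Every byte is 0x09 and 0x09 | 0x09 == 0x09, while a missing byte counts
# as 0x00, so the result is simply the longest run.
sub widest_run {
    my @runs = @_;
    my $widest = '';
    $widest |= $_ for @runs;
    return $widest;
}

printf "%d\n", length(widest_run("\t", "\t\t\t", "\t\t"));   # prints 3
```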
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by kcott (Archbishop) on Jan 04, 2022 at 11:22 UTC

    G'day perl_boy,

    [A couple of notes on presentation: It's good that you've put code within <code>...</code> tags; please also do the same for data and program output (e.g. error messages) so that we can see a verbatim copy of what you're seeing — HTML can modify what you write, e.g. by collapsing whitespace into a single space, which can make a huge difference in many cases. Please also "linkify" URLs; in this case, changing URL to [URL] would have sufficed; again, this helps us to help you (see "What shortcuts can I use for linking to other information?" for more details about that).]

    You've shown your input data as having the same number of characters in each column (columns 1, 2 & 3 have 3 characters: foo, bar, baz; columns 4 & 5 have 4 characters: booz & qaaz; and so on). This could be a realistic representation of your data; for instance, order numbers, product codes, client IDs, and so on, are likely to have the same lengths. If this is the case, the following is a much simpler solution.

    #!/usr/bin/env perl

    use 5.014;
    use warnings;
    use autodie;

    my $infile = 'pm_11140114_tab_align_even.dat';
    my $outfile = 'pm_11140114_tab_align_even.out';

    {
        open my $in_fh, '<', $infile;
        open my $out_fh, '>', $outfile;

        while (<$in_fh>) {
            print $out_fh $_ =~ y/\t/\t/rs;
        }
    }

    pm_11140114_tab_align_even.dat:

    foo	bar		baz	booz		qaaz			abc
    foo		bar	baz		booz	qaaz	abc	123
    foo		bar		baz		booz	qaaz		abc

    pm_11140114_tab_align_even.out:

    foo	bar	baz	booz	qaaz	abc
    foo	bar	baz	booz	qaaz	abc	123
    foo	bar	baz	booz	qaaz	abc

    Note that this uses the /r option which was introduced in Perl 5.14: "perl5140delta: Non-destructive substitution". If you're using an older version of Perl, change use 5.014; to use strict; and the print statement will need to be split into two statements:

    y/\t/\t/s;
    print $out_fh $_;

    This gives exactly the same result.

    Your "SHOULD output to" shows two tabs between columns (except for "abc\t123" which I'm going to assume is just a typo). Because y///r and s///r can be chained, you can change

    print $out_fh $_ =~ y/\t/\t/rs;

    to

    print $out_fh $_ =~ y/\t/\t/rs =~ s/\t/\t\t/gr;

    Now, pm_11140114_tab_align_even.out will be:

    foo		bar		baz		booz		qaaz		abc
    foo		bar		baz		booz		qaaz		abc		123
    foo		bar		baz		booz		qaaz		abc

    For older Perls, you'll need to split the print statement into three statements:

    y/\t/\t/s;
    s/\t/\t\t/g;
    print $out_fh $_;

    Again, this gives exactly the same result.

    Please either advise whether the input data in your OP is representative or, if not, provide something more realistic so that we can provide better help.

    It would also be useful to know what you intend to do with the output; e.g. print to screen, write to a plain text file, use for CSV, generate an HTML table, etc. With this information, we may be able to provide different (better) advice.

    — Ken

      Or just:

      perl -pe 's/\t+/\t\t/g' infile >outfile
      Thanks for the code. Though lines 1, 2 and 3 align well, there should not be a TAB at the beginning of line 0, and each group of words (text with spaces X) should be aligned using the line that contains the most TABs (max TAB). For example (ex 0), for the input (in 0)
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3
      There are 4 TAB ranges in each of the 4 lines: 3 between the groups text with spaces (1 and 2, 2 and 3, 3 and 4) and 1 between text with spaces 4 and line X, X being 0, 1, 2, 3. Here is a 4 * 4 array (ar 0) (TAB_range * line), that is, for each of the 4 lines, the number of TABs in each of the 4 TAB ranges
      1 4 4 2
      1 3 3 4
      3 1 2 1
      4 4 1 3
      The max number of TABs for each of the columns is 4 4 4 4, so there should be 4 TABs between text with spaces 1 and text with spaces 2, between text with spaces 2 and text with spaces 3, between text with spaces 3 and text with spaces 4, and between text with spaces 4 and line X (X being 0, 1, 2, 3), and the output should be
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3
      The above example (ex 0) (in 0) is a bit unclear, as the widest TAB range in every column contains 4 TABs. In the following example (ex 1) (in 1), temp9.txt, the TAB ranges (max TAB) have different numbers of TABs between groups of words (text with spaces X)
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3
      As in the above example (ex 0), there are 4 TAB ranges in each of the 4 lines: 3 between the groups text with spaces (1 and 2, 2 and 3, 3 and 4) and 1 between text with spaces 4 and line X, X being 0, 1, 2, 3. Here is a 4 * 4 array (ar 1) (TAB_range * line), that is, for each of the 4 lines, the number of TABs in each of the 4 TAB ranges
      1 3 3 2
      1 3 5 3
      2 4 2 1
      1 1 1 2
      The max number of TABs for each of the columns is 2 4 5 3, as opposed to 4 4 4 4 for the above array (ar 0), and thus the output should be
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 0
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 1
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 2
      text with spaces 1 text with spaces 2 text with spaces 3 text with spaces 4 line 3
      I have printed the begin/end of each TAB range using print  $-[0], ' ', $+[0], ' '; (line 20)
      0 1 19 20 38 41 59 62 80 82
      18 19 37 40 58 63 81 84
      18 20 38 42 60 62 80 81
      18 19 37 38 56 57 75 77
      I have printed the maximum column numbers for both the TAB begin and the TAB end of each of the TAB ranges between groups of words (text with spaces X)
      # printing max number of TABs for each of the columns begin ($max_b) and end ($max_e)
      print "max begin\t: " ;
      for ($x=0;$x<=$nbr_tab;$x++) { print $max_b[$x], ' '; }
      print "\nmax end\t\t: ";
      for ($x=0;$x<=$nbr_tab;$x++) { print $max_e[$x], ' '; }
      This is for debugging only, and it outputs
      max begin : 18 38 60 81 80
      max end : 20 42 63 84 82
      Here text with spaces 1, 2, 3, 4 all have the same width (18) on all 4 lines 0, 1, 2, 3, but in real life the text has variable length, as in the following, where lines 2, 3, 4 are taken from https://www.poetryfoundation.org/poems/55038/phrases (lack of inspiration after line 1)
      Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you
      When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us.
      The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense?
      Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens!
      It should output as follows (temp11.txt, formatted using Ubuntu Mousepad)
      Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you
      When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us.
      The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense?
      Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens!
      I would like TABs to align at the rightmost/maximum column number (between $max_b[] and $max_e[]) for each TAB range between groups of words (text with spaces X). Remember that the arrays @max_b and @max_e contain the beginning and ending column numbers of the TABs, found using
      $max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0] ;
      $max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0] ;
      and I would like to insert the missing TABs (to align with the longest line) using
      print $out_fh $_ =~ y/\t/\t/rs;
      I have tried
      print $out_fh $_ =~ y/\t/\t{3}/rs;
      just to test the use of {} for making the output 3 TABs wide, but got the same output. I have also tried
      print $out_fh $_ =~ tr/\t/\t/rs;
      I have read https://perldoc.perl.org/perlre on Perl regex but can't figure out where to insert $max_b[$tab_index] and $max_e[$tab_index] within y/\t/\t/rs or tr/\t/\t/rs
      I have changed the following lines in the code
      my $infile = 'pm_11140114_tab_align_even.dat';
      my $outfile = 'pm_11140114_tab_align_even.out';
      to
      my $infile = $ARGV[0];
      my $outfile = $ARGV[1];
      because I need to read the input and output filenames from the command line arguments $ARGV[0] and $ARGV[1], and added
      close $in_fh;
      close $out_fh;
      I guess you forgot to close the files
      my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);
      for my array of TABs. I also added
      while(<$in_fh>) { ... }
      at the beginning, to read the file before writing to the output file in the second
      while(<$in_fh>) { ... }
      in order to get the positions of the begin/end of each TAB range into the $max_b and $max_e arrays before repositioning the file pointer at the beginning using
      seek $in_fh,0,0;
      I have commented out
      # use 5.014;
      # use warnings;
      # use autodie;
      because I get useless warnings
      Parentheses missing around "my" list at 02-00.pl line 9.
      Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 9.
      Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 9.
      Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 13.
      Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
      Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
      Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
      Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
      Execution of 02-00.pl aborted due to compilation errors.
      I don't know where to put ($max_e[]), the rightmost/maximum column number, within
      print $out_fh $_ =~ y/\t/\t/rs;
      I am sorry for this, but I'm quite new to Perl and regex and it's quite confusing. Thanks in advance. Here is the re-written code
      # use 5.014;
      # use warnings;
      # use autodie;

      $infile = $ARGV[0];
      $outfile = $ARGV[1];
      my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);

      open my $in_fh, '<', $infile;
      open my $out_fh, '>', $outfile;

      while(<$in_fh>) {
          # print "$_\n";
          if (/[a-zA-Z0-9]/) {
              $valid_line++;
              $max_tab = 0;
              while (/\t+/g) {
                  # print $-[0], ' ', $+[0], ' ';
                  $max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0] ;
                  $max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0] ;
                  print $max_b[$max_tab], ' ', $max_e[$max_tab], ' ', $max_tab, ' ';
                  $max_tab++;
              }
              # print "\n";
              $nbr_tab = $max_tab if $nbr_tab < $max_tab;
          }
      }

      # printing max number of TABs for each of the columns begin ($max_b) and end ($max_e) DEBUG
      print "max begin\t: " ;
      for ($x=0;$x<=$nbr_tab;$x++) { print $max_b[$x], ' '; }
      print "\nmax end\t\t: ";
      for ($x=0;$x<=$nbr_tab;$x++) { print $max_e[$x], ' '; }

      seek $in_fh,0,0;
      while (<$in_fh>) {
          print $out_fh $_ =~ y/\t/\t/rs;
      }
      close $in_fh;
      close $out_fh;

        Much of what you wrote in reply to my post seems to have no bearing whatsoever on my post. For instance, "there should not be a TAB in the beggining ...": none of the output I showed had a tab at the beginning of any line. I'm going to ignore all such content. Please ensure you're replying to the correct post; and, for clarity, add some indication showing to what your response refers (e.g. You wrote X; I think Y).

        Where and how you get your filenames is entirely up to you. I used hard-coded filenames for demo purposes only. I often use a prefix of pm_NODE-ID_ for demo files: it provides unique names as well as a reference back to the associated PM node. If you're reading from @ARGV, you should include some sanity checking; in this instance, check that @ARGV has two elements, with the first being a valid file. Also take a look at Getopt::Long.
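
A minimal sketch of the kind of sanity check suggested above, assuming exactly two arguments where the first must be an existing, readable file. The name check_args and the wording of the messages are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return an error message for a bad argument list,
# or nothing (undef in scalar context) when the arguments look usable.
sub check_args {
    my @args = @_;
    return "usage: $0 INPUT_FILE OUTPUT_FILE" unless @args == 2;
    return "not a readable file: $args[0]"    unless -f $args[0] && -r _;
    return;
}

# Typical use at the top of a script:
#   my $error = check_args(@ARGV);
#   die "$error\n" if defined $error;
```

Returning a message instead of dying inside the helper keeps the check easy to exercise on its own; a script that needs named options would probably be better served by Getopt::Long.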

        "I have tried print $out_fh $_ =~ y/\t/\t{3}/rs;"

        That's not how transliteration works. See y/// and consider:

        $ perl -E 'my $x = "A\tB\t\tC"; say $x; say $x =~ y/\tABC/\t{3}/rs;'
        A	B		C
        {	3	}
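
Because y/// maps single characters and has no repetition syntax, a fixed-width replacement is more naturally written with s///. A sketch, assuming every run of TABs should become exactly $n TABs; set_tab_runs is a hypothetical name, and the copy-then-substitute form is used so it also runs on Perls older than 5.14.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each run of one or more TABs with $n TABs.
sub set_tab_runs {
    my ($line, $n) = @_;
    my $tabs = "\t" x $n;                  # the x operator repeats a string
    (my $copy = $line) =~ s/\t+/$tabs/g;   # substitute on a copy, not on $line
    return $copy;
}

print set_tab_runs("A\tB\t\tC", 3), "\n";
```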
        "I guess you forgot to close files"

        No, I certainly did not forget to do that. I declared, and used, lexical filehandles in the smallest scope possible (the anonymous block). Perl automatically closes files at the end of that scope.
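
A small sketch of the scoping behaviour described above, assuming a temporary file from File::Temp; write_then_read is a hypothetical name. The output handle is never closed explicitly, yet the text can be read back once the block has ended.

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw(tempfile);

# Hypothetical demo: write inside a bare block, read back after it.
sub write_then_read {
    my ($text) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;                     # only the file name is needed here
    {
        open my $out_fh, '>', $path;
        print {$out_fh} $text;
    }   # $out_fh goes out of scope here, so Perl flushes and closes it
    open my $in_fh, '<', $path;
    local $/;                          # slurp mode: read the whole file
    return scalar <$in_fh>;
}

print write_then_read("foo\tbar\n");
```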

        I also didn't forget to check for I/O exceptions. Again, Perl does this for me via the autodie pragma.

        I have commented

        # use 5.014;
        # use warnings;
        # use autodie;

        because I have useless warnings ...

        That's a very bad move and I strongly recommend that you do not do this. Parentheses missing around ... is the only warning; all the rest are errors (note the Execution of 02-00.pl aborted due to compilation errors. as the last line). Furthermore, none of those messages are "useless"!

        As you're not checking for I/O exceptions, you should definitely use the autodie pragma and let Perl do it for you.

        — Ken

Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by Marshall (Canon) on Jan 05, 2022 at 03:31 UTC
    Just for fun, I present code for a different, but related problem.
    Usually the idea is to compress as many columns onto the page as possible. This means removing tabs instead of adding them, except when necessary for column alignment, and allowing for at least one blank space between columns.

    My code using tabs within code tags doesn't render exactly right on Perl Monks and I don't know why.

    use strict;
    use warnings;
    print "123456789\n";
    print "\t1\n";
    __END__
    123456789
    	1
    The "1" should be underneath the 9, but with my browser it is not.

    Be that presentation problem as it may, this code adjusts the tabs correctly for the given input, at least as viewed with my program editor. I think I handled the "off by one" situation correctly, but mileage varies.

    use strict;
    use warnings;
    use Data::Dump qw(dump dd);
    $|=1;

    my $input2 = << "EOF";
    foo	bar	baz	booz	qaaz	abc
    foo	bar	baz	booz	qaaz	abc	123
    foo	bar	thisis15chars15	booz	qaaz	abc
    EOF

    use constant {TAB_SPACES =>8}; # normal default is 8

    #######
    # Table 2 is more complex - reduce tabs when possible,
    # add tabs when needed.
    #
    # As each line is read, the maximum required width of each column
    # is calculated.
    #
    # Table is stored in @table2 without separators.
    #
    # Reformatted table is output assuming TAB_SPACES
    #
    open my $input2_fh, "<", \$input2 or die "$!";

    print "********\nTable2 input in raw form:\n********\n";

    my @table2;
    my @max_chars;
    while (<$input2_fh>)
    {
        print;
        chomp;
        my $i=0;
        my @tokens;
        foreach my $field (@tokens = split /\t+/,$_)
        {
            $max_chars[$i] //= 0;
            $max_chars[$i] = length $field if (length $field > $max_chars[$i]);
            $i++;
        }
        push (@table2,[@tokens]);
    }
    print "\nData dump of Table2:\n";
    dd \@table2;

    print "\n******\nReformatted Table:\n*****\n";

    foreach my $row_ref (@table2)
    {
        my $i = 0;
        my @line = @$row_ref;
        while (defined (my $field = shift @line))
        {
            my $alignment_spaces = $max_chars[$i]-length($field);
            my $n_tabs = int($alignment_spaces/TAB_SPACES)+1;
            print "".$field, (@line) ? "\t" x $n_tabs : "\n";
            $i++;
        }
    }
    __END__
    ********
    Table2 input in raw form:
    ********
    foo	bar	baz	booz	qaaz	abc
    foo	bar	baz	booz	qaaz	abc	123
    foo	bar	thisis15chars15	booz	qaaz	abc

    Data dump of Table2:
    [
      ["foo", "bar", "baz", "booz", "qaaz", "abc"],
      ["foo", "bar", "baz", "booz", "qaaz", "abc", 123],
      ["foo", "bar", "thisis15chars15", "booz", "qaaz", "abc"],
    ]

    ******
    Reformatted Table:
    *****
    foo	bar	baz		booz	qaaz	abc
    foo	bar	baz		booz	qaaz	abc	123
    foo	bar	thisis15chars15	booz	qaaz	abc
    When run on my computer, all of the "booz" column lines up.
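
One possible refinement, not part of the post above: the TAB count can be derived from tab stops rather than from the difference in field lengths. As far as I can tell, with int($alignment_spaces/TAB_SPACES)+1 a 7-character and a 9-character field in the same column each get one TAB and end on different stops (8 and 16). The sketch below assumes 8-column tab stops and that every field starts on a tab stop; tabs_after and TAB_WIDTH are hypothetical names.

```perl
use strict;
use warnings;

use constant TAB_WIDTH => 8;   # assumed tab stop spacing

# Hypothetical helper: number of TABs to print after a field of length
# $len so that the next column starts on the first tab stop past the
# widest field ($max_len), assuming the field itself starts on a tab stop.
sub tabs_after {
    my ($len, $max_len) = @_;
    my $column_stops = int($max_len / TAB_WIDTH) + 1;   # stops the column spans
    return $column_stops - int($len / TAB_WIDTH);       # minus stops already passed
}

printf "%d %d\n", tabs_after(7, 9), tabs_after(9, 9);   # prints "2 1"
```

For the data in the post (widest field 15 characters) this gives the same counts as the original formula: two TABs after a 3-character field and one after the 15-character field.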
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by perl_boy (Novice) on Jan 21, 2022 at 15:15 UTC
    Thanks for your help with regex.
    The program has been finished and some features have been added, namely reference line numbers at the beginning/end of each line in the output file. TABs have been changed to SPACEs for portability with Unicode.
    The program reads the input file to collect the maximum line reference number (max_line), the number of lines containing text (regex /[a-zA-Z0-9]/) in the input file (valid_line), the number of groups of words read from the input file for the max array (nbr_max_tab) and the respective TAB stop positions in the max array.


    variable name               variable type    line number in program        description
    max_line                    scalar number    {16 27}                       the maximum line reference number when writing to output file by adding nbr_line to valid_line
    max_tab                     scalar number    {3 8 10 11 13 22 25}          current index in the max array
    nbr_line                    scalar number    {3 16 23 29}                  current line number when writing line reference numbers
    nbr_max_tab                 scalar number    {3 13}                        number of index numbers (size) in the max array
    valid_line                  scalar number    {3 8 16}                      number of lines containing text (regex /[a-zA-Z0-9]/) in the input file
    max                         array number     {3 10 25}                     array containing the maximum TAB stop column number for each group of words read from the input file
    ARGV                        array misc       {1 3 4 23 27}                 array containing command line arguments passed to the program, those are
        0  input file to read from
        1  output file to write to
        2  number of 0s to prepend to line numbers when writing line reference numbers
        3  starting number for line numbers when writing line reference numbers
        4  number of SPACEs between reference numbers {2 3} and line content ($_) when writing to output file (1)

    LAST_MATCH_START/@-/$-[0]   array number     25                            column where TABs start within line and between group of words
    LAST_MATCH_END/@+/$+[0]     array number     {10 25}                       column where TABs end within line and between group of words
    $_                          scalar misc      25                            last line read from the input file
    {F0 F1}                     scalar pointer   {4 6 17 19 23 25 27 28 32}    position in input/output files




    The input.txt and output.txt files were viewed with Ubuntu Mousepad (http://www.xfce.org/), which has the vertical scrolling that vi lacks. The input has been purposely misaligned and contains empty lines.
    input.txt
    Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you
    When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us.
    The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense?
    Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens!

    In the output, the line reference numbers appear at the beginning/end of each line and the groups of words are well aligned
    output.txt
    00 Bob, the rabbit jump above the fence Jack, the cat hid under the porch of the red house Rex, the dog ran after Jack the birds fly When the world is reduced to a single dark wood for our two pairs of dazzled eyes to a musical house for our clear understanding then I shall find you {00 .. 03}
    01 When we are very strong who draws back? very happy who collapses from ridicule When we are very bad what can they do to us. {00 .. 03}
    02 The taste of ashes in the air the smell of wood sweating in the hearth steeped flowers the devastation of paths drizzle over the canals in the fields why not already playthings and incense? {00 .. 03}
    03 Arousing a pleasant taste of Chinese ink a black powder gently rains on my night I lower the jets of the chandelier throw myself on the bed and turning toward thedark I see you O my daughters and queens! {00 .. 03}

    shell command to call the program with the right arguments
    usage format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0\'s_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>
    ex: perl format-pre-post-nbr-SPACE.pl input.txt output-0.txt 2 0 8

    code

    die "usage format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0\'s_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>\n" if $#ARGV < 4;

    $valid_line=$nbr_max_tab=0;$max[0]=0;$nbr_line=$ARGV[3];
    open(F0, $ARGV[0]); open(F1, ">$ARGV[1]");

    while(<F0>) {
        if (/[a-zA-Z0-9]/) {
            $valid_line++; $max_tab=1;
            while (/\t+/g) {
                $max[$max_tab] = $+[0] if $max[$max_tab] < $+[0] || $max[$max_tab] eq "";
                $max_tab++;
            }
            $nbr_max_tab = $max_tab if $nbr_max_tab < $max_tab;
        }
    }
    $valid_line--;$max_line=$nbr_line+$valid_line;
    seek F0,0,0;

    while(<F0>) {
        s/\r//;chop;
        if (/[a-zA-Z0-9]/) {
            $max_tab=1;
            print F1 "0" x ($ARGV[2] - length($nbr_line)), $nbr_line, " " x $ARGV[4];
            while (/[^\t]+/g) {
                print F1 substr($_, $-[0], ($+[0] - $-[0])), " " x ($max[$max_tab++] - ($+[0] - $-[0]));
            }
            print F1 " " x $ARGV[4], "{", "0" x ($ARGV[2] - length($ARGV[3])), $ARGV[3], " .. ", "0" x ($ARGV[2] - length($max_line)), $max_line, "}";
            print F1 "\n";
            $nbr_line++;
        }
    }
    close F0;close F1;

    group of words : text NOT containing TABs /[^\t]/
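
For comparison, a sketch of a different way to pad with spaces, which is not how the program above does it: instead of using TAB stop positions, it measures the widest group of words in each column and pads with sprintf's %-*s. The name align_with_spaces is hypothetical, and the lines are assumed to be chomped already.

```perl
use strict;
use warnings;

# Hypothetical helper: split each line on runs of TABs and pad every
# field except the last with spaces to the widest entry in its column.
# $gap is the number of extra spaces placed between columns.
sub align_with_spaces {
    my ($gap, @lines) = @_;
    my @rows = map { [ split /\t+/, $_ ] } @lines;

    # First pass: widest field per column.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }

    # Second pass: left-justify each field to its column width.
    return map {
        my $row = $_;
        join ' ' x $gap,
            map { $_ == $#$row ? $row->[$_] : sprintf('%-*s', $width[$_], $row->[$_]) }
            0 .. $#$row;
    } @rows;
}

print "$_\n" for align_with_spaces(2, "a\tbbb\tc", "aaaa\tb\tcc");
```

Leading zeros for the line reference numbers can be produced the same way with sprintf('%0*d', $width, $number), which avoids the manual "0" x (...) arithmetic.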