Misaligned TABs using substr, LAST_MATCH

#!/usr/bin/perl
use warnings;
use strict;

my $input = << "EOF";
foo\tbar\t\tbaz\tbooz\t\tqaaz\t\t\tabc
foo\t\tbar\tbaz\t\tbooz\tqaaz\tabc\t123
foo\t\tbar\t\tbaz\t\tbooz\tqaaz\t\tabc
EOF

open my $in, '<', \$input or die $!;

my @tab_counts;
while (<$in>) {
    my $i = 0;
    for my $tab_count (map length, /(\t+)/g) {
        $tab_counts[$i] = $tab_count if $tab_count > ($tab_counts[$i] || 0);
        ++$i;
    }
}
push @tab_counts, 0;  # No tab after the last field.

seek $in, 0, 0;
while (<$in>) {
    my $i = 0;
    print $_, "\t" x $tab_counts[$i++] for /\S+/g;
    print "\n";
}



Don't worry about the indexes, just fix the tabs.

#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11140114
use warnings;

my $input = << "EOF";
foo\tbar\t\tbaz\tbooz\t\tqaaz\t\t\tabc
foo\t\tbar\tbaz\t\tbooz\tqaaz\tabc\t123
foo\t\tbar\t\tbaz\t\tbooz\tqaaz\t\tabc
EOF
print "$input\n";

open my $in, '<', \$input or die $!;
my @tabs;
while( <$in> )
  {
  my $index = 0;
  $tabs[$index++] |= $& while /\t+/g;
  }
seek $in, 0, 0;
while( <$in> )
  {
  my $index = 0;
  print s/\t+/$tabs[$index++]/gr;
  }
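The |= in the first loop relies on Perl's bitwise string OR: when both operands are strings, | works byte by byte and pads the shorter operand with NUL bytes, so OR-ing runs of tabs keeps the longest run. A minimal sketch of just that step follows. The variable name $widest is made up for illustration, and it assumes the bitwise feature is not enabled (under that feature | is always numeric and |. would be needed instead).

```perl
use strict;
use warnings;

# Hypothetical illustration: OR-ing tab runs of different lengths.
my $widest = "";
$widest |= $_ for "\t", "\t\t\t", "\t\t";

print length($widest), "\n";    # prints 3
```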


G'day perl_boy,

[A couple of notes on presentation: It's good that you've put code within <code>...</code> tags; please also do the same for data and program output (e.g. error messages) so that we can see a verbatim copy of what you're seeing — HTML can modify what you write, e.g. by collapsing whitespace into a single space, which can make a huge difference in many cases. Please also "linkify" URLs; in this case, changing URL to [URL] would have sufficed; again, this helps us to help you (see "What shortcuts can I use for linking to other information?" for more details about that).]

You've shown your input data as having the same number of characters in each column (columns 1, 2 & 3 have 3 characters: foo, bar, baz; columns 4 & 5 have 4 characters: booz & qaaz; and so on). This could be a realistic representation of your data; for instance, order numbers, product codes, client IDs, and so on, are likely to have the same lengths. If this is the case, the following is a much simpler solution.

#!/usr/bin/env perl

use 5.014;
use warnings;
use autodie;

my $infile = 'pm_11140114_tab_align_even.dat';
my $outfile = 'pm_11140114_tab_align_even.out';

{
    open my $in_fh, '<', $infile;
    open my $out_fh, '>', $outfile;

    while (<$in_fh>) {
        print $out_fh $_ =~ y/\t/\t/rs;
    }
}

pm_11140114_tab_align_even.dat:

foo     bar             baz     booz            qaaz                    abc
foo             bar     baz             booz    qaaz    abc     123
foo             bar             baz             booz    qaaz            abc

pm_11140114_tab_align_even.out:

foo     bar     baz     booz    qaaz    abc
foo     bar     baz     booz    qaaz    abc     123
foo     bar     baz     booz    qaaz    abc

Note that this uses the /r option which was introduced in Perl 5.14: "perl5140delta: Non-destructive substitution". If you're using an older version of Perl, change use 5.014; to use strict; and the print statement will need to be split into two statements:

        y/\t/\t/s;
        print $out_fh $_;

This gives exactly the same result.
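To see what the /s (squeeze) modifier does without any files involved, here is a small sketch on an in-memory string. The sample data and variable names are invented for illustration; /r needs Perl 5.14 or later.

```perl
use 5.014;
use warnings;

# Made-up sample line: fields separated by runs of one to three tabs.
my $line = "foo\tbar\t\tbaz\t\t\tabc";

# y/\t/\t/ maps each tab to itself; /s squeezes each run of tabs
# down to a single tab; /r returns the result and leaves $line alone.
my $squeezed = $line =~ y/\t/\t/rs;

say join '|', split /\t/, $squeezed;    # foo|bar|baz|abc
```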

Your "SHOULD output to" shows two tabs between columns (except for "abc\t123" which I'm going to assume is just a typo). Because y///r and s///r can be chained, you can change

print $out_fh $_ =~ y/\t/\t/rs;

to

print $out_fh $_ =~ y/\t/\t/rs =~ s/\t/\t\t/gr;

Now, pm_11140114_tab_align_even.out will be:

foo             bar             baz             booz            qaaz            abc
foo             bar             baz             booz            qaaz            abc             123
foo             bar             baz             booz            qaaz            abc

For older Perls, you'll need to split the print statement into three statements:

        y/\t/\t/s;
        s/\t/\t\t/g;
        print $out_fh $_;

Again, this gives exactly the same result.

Please either advise whether the input data in your OP is representative or, if not, provide something more realistic so that we can provide better help.

It would also be useful to know what you intend to do with the output; e.g. print to screen, write to a plain text file, use for CSV, generate an HTML table, etc. With this information, we may be able to provide different (better) advice.

— Ken


Or just:

perl -pe 's/\t+/\t\t/g'  infile  >outfile


text with spaces X

    text with spaces 1    text with spaces 2                text with spaces 3                text with spaces 4        line 0
text with spaces 1    text with spaces 2            text with spaces 3            text with spaces 4                line 1
text with spaces 1            text with spaces 2    text with spaces 3        text with spaces 4    line 2
text with spaces 1                text with spaces 2                text with spaces 3    text with spaces 4            line 3

text with spaces

text with spaces 4

1 4 4 2
1 3 3 4
3 1 2 1
4 4 1 3

4 4 4 4

text with spaces 1

text with spaces 2

text with spaces 3

text with spaces 4

line X

text with spaces 1                text with spaces 2                text with spaces 3                text with spaces 4                line 0
text with spaces 1                text with spaces 2                text with spaces 3                text with spaces 4                line 1
text with spaces 1                text with spaces 2                text with spaces 3                text with spaces 4                line 2
text with spaces 1                text with spaces 2                text with spaces 3                text with spaces 4                line 3

text with spaces X

    text with spaces 1    text with spaces 2            text with spaces 3            text with spaces 4        line 0
text with spaces 1    text with spaces 2            text with spaces 3                    text with spaces 4            line 1
text with spaces 1        text with spaces 2                text with spaces 3        text with spaces 4    line 2
text with spaces 1    text with spaces 2    text with spaces 3    text with spaces 4        line 3

text with spaces

text with spaces 4

1 3 3 2
1 3 5 3
2 4 2 1
1 1 1 2

2 4 5 3

4 4 4 4

text with spaces 1    text with spaces 2    text with spaces 3    text with spaces 4    line 0
text with spaces 1    text with spaces 2    text with spaces 3    text with spaces 4    line 1
text with spaces 1    text with spaces 2    text with spaces 3    text with spaces 4    line 2
text with spaces 1    text with spaces 2    text with spaces 3    text with spaces 4    line 3

print $-[0], ' ', $+[0], ' ';

0 1 19 20 38 41 59 62 80 82
18 19 37 40 58 63 81 84
18 20 38 42 60 62 80 81
18 19 37 38 56 57 75 77
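For reference, $-[0] and $+[0] hold the start offset and the end offset (one past the last character) of the most recent successful match, so a /\t+/g loop can report where each run of tabs begins and ends. A minimal sketch on a made-up line, not on the input used above:

```perl
use strict;
use warnings;

# Made-up sample: one tab after "foo", two tabs after "bar".
my $line = "foo\tbar\t\tbaz";

my @spans;
while ($line =~ /\t+/g) {
    # $-[0] is where the run of tabs starts, $+[0] is one past its end.
    push @spans, "$-[0]-$+[0]";
}

print "@spans\n";    # prints 3-4 7-9
```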

text with spaces X

# printing max number of TABs for each of the columns begin ($max_b) and end ($max_e)
print "max begin\t: " ;
for ($x=0;$x<=$nbr_tab;$x++) {
    print $max_b[$x], ' ';
}
print "\nmax end\t\t: ";
for ($x=0;$x<=$nbr_tab;$x++) {
    print $max_e[$x], ' ';
}

max begin    : 18 38 60 81 80  
max end        : 20 42 63 84 82

text with spaces

https://www.poetryfoundation.org/poems/55038/phrases

Bob, the rabbit jump above the fence    Jack, the cat hid under the porch of the red house        Rex, the dog ran after Jack    the birds fly        When the world is reduced to a single dark wood for our two pairs of dazzled eyes    to a musical house for our clear understanding        then I shall find you
When we are very strong        who draws back?        very happy        who collapses from ridicule    When we are very bad        what can they do to us.
The taste of ashes in the air    the smell of wood sweating in the hearth        steeped flowers                the devastation of paths        drizzle over the canals in the fields        why not already playthings and incense?
Arousing a pleasant taste of Chinese ink    a black powder gently rains on my night    I lower the jets of the chandelier        throw myself on the bed        and turning toward thedark        I see you        O my daughters and queens!

Bob, the rabbit jump above the fence        Jack, the cat hid under the porch of the red house    Rex, the dog ran after Jack        the birds fly            When the world is reduced to a single dark wood for our two pairs of dazzled eyes    to a musical house for our clear understanding    then I shall find you
When we are very strong                who draws back?                        very happy                who collapses from ridicule    When we are very bad                                    what can they do to us.
The taste of ashes in the air            the smell of wood sweating in the hearth        steeped flowers                the devastation of paths    drizzle over the canals in the fields                            why not already playthings and incense?
Arousing a pleasant taste of Chinese ink    a black powder gently rains on my night            I lower the jets of the chandelier    throw myself on the bed        and turning toward thedark                                I see you                    O my daughters and queens!

text with spaces X

$max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0] ;
$max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0] ;

print $out_fh $_ =~ y/\t/\t/rs;

print $out_fh $_ =~ y/\t/\t{3}/rs;

{}

print $out_fh $_ =~ tr/\t/\t/rs;

https://perldoc.perl.org/perlre

$max_b[$tab_index]

$max_e[$tab_index]

y/\t/\t/rs

tr/\t/\t/rs

my $infile = 'pm_11140114_tab_align_even.dat';
my $outfile = 'pm_11140114_tab_align_even.out';

my $infile =  $ARGV[0];
my $outfile =  $ARGV[1];

$ARGV[0] $ARGV[1]

close $in_fh;
close $out_fh;

my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);

while(<$in_fh>) { ... }

while(<$in_fh>) { ... }

seek $in_fh,0,0;

# use 5.014;
# use warnings;
# use autodie;

Parentheses missing around "my" list at 02-00.pl line 9.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 9.
Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 9.
Global symbol "$valid_line" requires explicit package name (did you forget to declare "my $valid_line"?) at 02-00.pl line 13.
Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
Global symbol "@max" requires explicit package name (did you forget to declare "my @max"?) at 02-00.pl line 17.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
Global symbol "$nbr_tab" requires explicit package name (did you forget to declare "my $nbr_tab"?) at 02-00.pl line 20.
Execution of 02-00.pl aborted due to compilation errors.

print $out_fh $_ =~ y/\t/\t/rs;

# use 5.014;
# use warnings;
# use autodie;

$infile =  $ARGV[0];
$outfile =  $ARGV[1];

my ($max_tab,$nbr_tab,$valid_line,@max_b,@max_e);

open my $in_fh, '<', $infile;
open my $out_fh, '>', $outfile;

while(<$in_fh>) {
# print "$_\n";
    if (/[a-zA-Z0-9]/) {
        $valid_line++;
        $max_tab = 0;
        while (/\t+/g) {
#            print  $-[0], ' ', $+[0], ' ';
            $max_b[$max_tab] = $-[0] if $max_b[$max_tab] < $-[0] ;
            $max_e[$max_tab] = $+[0] if $max_e[$max_tab] < $+[0] ;

            print $max_b[$max_tab], ' ', $max_e[$max_tab], ' ', $max_tab, ' ';
            $max_tab++; 
        }
#        print "\n";
        $nbr_tab = $max_tab if $nbr_tab < $max_tab;
    }
}


# printing max number of TABs for each of the columns begin ($max_b) and end ($max_e) DEBUG
print "max begin\t: " ;
for ($x=0;$x<=$nbr_tab;$x++) {
    print $max_b[$x], ' ';
}
print "\nmax end\t\t: ";
for ($x=0;$x<=$nbr_tab;$x++) {
    print $max_e[$x], ' ';
}

seek $in_fh,0,0;

    while (<$in_fh>) {
        print $out_fh $_ =~ y/\t/\t/rs;
    }
close $in_fh;
close $out_fh;


Much of what you wrote in reply to my post seems to have no bearing whatsoever on my post. For instance, "there should not be a TAB in the beggining ...": none of the output I showed had a tab at the beginning of any line. I'm going to ignore all such content. Please ensure you're replying to the correct post; and, for clarity, add some indication showing to what your response refers (e.g. You wrote X; I think Y).

Where and how you get your filenames is entirely up to you. I used hard-coded filenames for demo purposes only. I often use a prefix of pm_NODE-ID_ for demo files: it provides unique names as well as a reference back to the associated PM node. If you're reading from @ARGV, you should include some sanity checking; in this instance, check that @ARGV has two elements, with the first being a valid file. Also take a look at Getopt::Long.
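As a rough sketch of the kind of sanity checking meant here: the sub name check_args and its messages are hypothetical, not something from this thread, and the check only covers "exactly two arguments, the first a readable plain file".

```perl
use strict;
use warnings;

# Hypothetical helper: returns an error message, or undef if the
# arguments look usable (exactly two, the first a readable plain file).
sub check_args {
    my @args = @_;
    return "usage: $0 INFILE OUTFILE" unless @args == 2;
    return "'$args[0]' is not a readable file" unless -f $args[0] && -r _;
    return;
}

my $error = check_args(@ARGV);
print defined $error ? "error: $error\n" : "arguments look fine\n";
```

In a real script the message would more likely be passed to die than printed.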

"I have tried print $out_fh $_ =~ y/\t/\t{3}/rs;"

That's not how transliteration works. See y/// and consider:

$ perl -E 'my $x = "A\tB\t\tC"; say $x; say $x =~ y/\tABC/\t{3}/rs;'
A       B               C
{       3       }
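In that example the search list \tABC is mapped character by character onto the replacement list \t{3}, which is why A, B and C come out as {, 3 and }. If the aim was three tabs between columns, a substitution can do it, because the replacement side of s/// is a string rather than a character list. A hedged sketch on made-up data (the variable names are illustrative only):

```perl
use 5.014;
use warnings;

my $x = "A\tB\t\tC";

# Replace every run of tabs with exactly three tabs.
my $three = $x =~ s/\t+/\t\t\t/gr;

# /e evaluates the replacement as code, so the count can be a variable.
my $count = 3;
my $same  = $x =~ s/\t+/"\t" x $count/ger;

say $three eq $same ? "same" : "different";    # same
```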

"I guess you forgot to close files"

No, I certainly did not forget to do that. I declared, and used, lexical filehandles in the smallest scope possible (the anonymous block). Perl automatically closes files at the end of that scope.

I also didn't forget to check for I/O exceptions. Again, Perl does this for me via the autodie pragma.
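Both points can be checked with a small sketch. It assumes a throwaway file created with File::Temp; the variable names are invented for the demonstration.

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw(tempfile);

# Throwaway file for the demonstration; removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

{
    open my $out_fh, '>', $path;
    print {$out_fh} "foo\tbar\n";
}   # $out_fh goes out of scope here, so Perl flushes and closes the file

open my $in_fh, '<', $path;
my $line = <$in_fh>;
close $in_fh;
print $line;

# With autodie, a failed open throws an exception instead of returning false.
my $opened = eval { open my $fh, '<', "$path.does-not-exist"; 1 };
print "open failed as expected\n" unless $opened;
```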

I have commented

# use 5.014; # use warnings; # use autodie;

because I have useless warnings ...

That's a very bad move and I strongly recommend that you do not do this. Parentheses missing around ... is the only warning; all the rest are errors (note the Execution of 02-00.pl aborted due to compilation errors. as the last line). Furthermore, none of those messages are "useless"!

As you're not checking for I/O exceptions, you should definitely use the autodie pragma and let Perl do it for you.
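For what it's worth, the Parentheses missing around "my" list warning is what Perl emits for something like my $x, $y; which declares only $x. Below is a small sketch of the declaration and tracking part that compiles cleanly under strict and warnings. The sample lines and the name $tab_index are assumptions for illustration, and only the end offsets are tracked.

```perl
use strict;
use warnings;

# Declare every variable, and initialise the ones used in comparisons.
my $nbr_tab = 0;
my @max_e;

for my $line ("a\tb\t\tc\n", "dd\t\te\tf\n") {
    my $tab_index = 0;
    while ($line =~ /\t+/g) {
        $max_e[$tab_index] //= 0;    # avoid comparing against undef
        $max_e[$tab_index] = $+[0] if $max_e[$tab_index] < $+[0];
        $tab_index++;
    }
    $nbr_tab = $tab_index if $nbr_tab < $tab_index;
}

print "$nbr_tab: @max_e\n";    # prints 2: 4 6
```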

— Ken


My code using tabs within code tags doesn't render exactly right on Perl Monks and I don't know why.

use strict;
use warnings;

print "123456789\n";
print "\t1\n";

__END__
123456789
    1

Be that presentation problem as it may, this code adjusts the tabs correctly for the given input, at least as viewed with my program editor. I think I handled the "off by one" situation correctly, but your mileage may vary.

use strict;
use warnings;
use Data::Dump qw(dump dd);

$|=1;


my $input2 = << "EOF";
foo    bar        baz    booz        qaaz            abc
foo        bar    baz        booz    qaaz    abc    123
foo        bar        thisis15chars15        booz    qaaz        abc
EOF

use constant {TAB_SPACES =>8};   # normal default is 8


#######
# Table 2 is more complex - reduce tabs when possible, 
#                           add tabs when needed.
#
# As each line is read, the maximum required width of each column
# is calculated.
#
# Table is stored in @table2 without separators.
#
# Reformatted table is output assuming TAB_SPACES
#

open my $input2_fh, "<", \$input2 or die "$!";
print "********\nTable2 input in raw form:\n********\n";
my @table2;
my @max_chars;
while (<$input2_fh>)
{
    print;
    chomp;

    my $i=0;
    my @tokens;
    foreach my $field (@tokens = split /\t+/,$_)
    {
        $max_chars[$i] //= 0;
        $max_chars[$i] = length $field if (length $field > $max_chars[$i]);
        $i++;
    }
    push (@table2,[@tokens]);
}
print "\nData dump of Table2:\n";
dd \@table2;

print "\n******\nReformatted Table:\n*****\n";

foreach my $row_ref (@table2)
{
    my $i = 0;
    my @line = @$row_ref;

    while (defined (my $field = shift @line))
    {      
       my $alignment_spaces = $max_chars[$i]-length($field);
       my $n_tabs = int($alignment_spaces/TAB_SPACES)+1;
       
       print "".$field, (@line) ? "\t" x $n_tabs : "\n";
       $i++;   
    }
}
__END__
********
Table2 input in raw form:
********
foo    bar        baz    booz        qaaz            abc
foo        bar    baz        booz    qaaz    abc    123
foo        bar        thisis15chars15        booz    qaaz        abc

Data dump of Table2:
[
  ["foo", "bar", "baz", "booz", "qaaz", "abc"],
  ["foo", "bar", "baz", "booz", "qaaz", "abc", 123],
  ["foo", "bar", "thisis15chars15", "booz", "qaaz", "abc"],
]

******
Reformatted Table:
*****
foo    bar    baz        booz    qaaz    abc
foo    bar    baz        booz    qaaz    abc    123
foo    bar    thisis15chars15    booz    qaaz    abc
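One possible way to make the off-by-one case explicit is to count tab stops rather than leftover characters: a field of length L ends in tab stop int(L / 8), and the next column has to start one stop past the widest field. The sketch below assumes 8-column tab stops and uses a made-up sub name, tabs_needed. It agrees with the int($alignment_spaces/TAB_SPACES)+1 formula on the data above, but differs when lengths straddle a stop, for instance a 7-character field in a column whose widest field has 8 characters.

```perl
use strict;
use warnings;

use constant TAB_SPACES => 8;    # assumed tab-stop width

# Hypothetical helper: tabs to print after a field of length $len so
# that the next column starts one tab stop past the widest field.
sub tabs_needed {
    my ($len, $max_len) = @_;
    return int($max_len / TAB_SPACES) + 1 - int($len / TAB_SPACES);
}

printf "len %2d, widest %2d: %d tab(s)\n", $_->[0], $_->[1], tabs_needed(@$_)
    for [3, 15], [15, 15], [7, 8], [8, 8];
```

With these inputs the sub returns 2, 1, 2 and 1 tabs respectively.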


/[a-zA-Z0-9]/

group of words

variable name	variable type	line number in program	description
max_line	scalar number	{16 27}	the maximum line reference number when writing to output file by adding nbr_line to valid_line
max_tab	scalar number	{3 8 10 11 13 22 25}	current index in the max array
nbr_line	scalar number	{3 16 23 29}	current line number when writing line reference numbers
nbr_max_tab	scalar number	{3 13}	number of index numbers (size) in the max array
valid_line	scalar number	{3 8 16}	number of lines containing text (regex `/[a-zA-Z0-9]/`) in the input file
max	array number	{3 10 25}	array containing the maximum TAB stop column number for each group of words read from the input file
ARGV	array misc	{1 3 4 23 27}	array containing command line arguments passed to the program: 0 = input file to read from; 1 = output file to write to; 2 = number of 0s to prepend to line numbers when writing line reference numbers; 3 = starting number for line numbers when writing line reference numbers; 4 = number of SPACEs between reference numbers {2 3} and line content ($_) when writing to output file (1)
LAST_MATCH_START/@-/$-[0]	array number	25	column where TABs start within line and between group of words
LAST_MATCH_END/@+/$+[0]	array number	{10 25}	column where TABs end within line and between group of words
$_	scalar misc	25	last line read from the input file
{F0 F1}	scalar pointer	{4 6 17 19 23 25 27 28 32}	position in input/output files

http://www.xfce.org/


Bob, the rabbit jump above the fence        Jack, the cat hid under the porch of the red house        Rex, the dog ran after Jack        the birds fly        When the world is reduced to a single dark wood for our two pairs of dazzled eyes        to a musical house for our clear understanding        then I shall find you



When we are very strong        who draws back?        very happy        who collapses from ridicule        When we are very bad        what can they do to us.
The taste of ashes in the air        the smell of wood sweating in the hearth        steeped flowers        the devastation of paths        drizzle over the canals in the fields        why not already playthings and incense?

Arousing a pleasant taste of Chinese ink        a black powder gently rains on my night        I lower the jets of the chandelier        throw myself on the bed        and turning toward thedark        I see you        O my daughters and queens!

group of words

00        Bob, the rabbit jump above the fence      Jack, the cat hid 
+under the porch of the red house                                     
+   Rex, the dog ran after Jack                                       
+                                                     the birds fly   
+                                                                     
+                                                           When the w
+orld is reduced to a single dark wood for our two pairs of dazzled ey
+es                                                                   
+                                                                     
+to a musical house for our clear understanding                       
+                                                                     
+                                                                     
+                                                          then I shal
+l find you             {00 .. 03}
01        When we are very strong                   who draws back?   
+                                                                     
+   very happy                                                        
+                                                     who collapses fr
+om ridicule                                                          
+                                                           When we ar
+e very bad                                                           
+                                                                     
+                                                                     
+what can they do to us.                                              
+                                                                     
+                                                                     
+                                                                     
+                       {00 .. 03}
02        The taste of ashes in the air             the smell of wood 
+sweating in the hearth                                               
+   steeped flowers                                                   
+                                                     the devastation 
+of paths                                                             
+                                                           drizzle ov
+er the canals in the fields                                          
+                                                                     
+                                                                     
+why not already playthings and incense?                              
+                                                                     
+                                                                     
+                                                                     
+                       {00 .. 03}
03        Arousing a pleasant taste of Chinese ink  a black powder gen
+tly rains on my night                                                
+   I lower the jets of the chandelier                                
+                                                     throw myself on 
+the bed                                                              
+                                                           and turnin
+g toward thedark                                                     
+                                                                     
+                                                                     
+I see you                                                            
+                                                                     
+                                                                     
+                                                          O my daught
+ers and queens!        {00 .. 03}

usage format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0\'s_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>

perl format-pre-post-nbr-SPACE.pl input.txt output-0.txt 2 0 8

die "usage format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0\'s_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>\n" if $#ARGV < 4;

$valid_line=$nbr_max_tab=0;$max[0]=0;$nbr_line=$ARGV[3];
open(F0, $ARGV[0]); open(F1, ">$ARGV[1]");

while(<F0>) {
    if (/[a-zA-Z0-9]/) {
        $valid_line++; $max_tab=1;
        while (/\t+/g) {
            $max[$max_tab] = $+[0] if $max[$max_tab] < $+[0] || $max[$max_tab] eq "";
            $max_tab++;
        }
        $nbr_max_tab = $max_tab if $nbr_max_tab < $max_tab;
    }
}
$valid_line--;$max_line=$nbr_line+$valid_line;
seek F0,0,0;

while(<F0>) {
    s/\r//;chop;
    if (/[a-zA-Z0-9]/) {
        $max_tab=1;
        print F1 "0" x ($ARGV[2] - length($nbr_line)), $nbr_line, " " x $ARGV[4];
        while (/[^\t]+/g) {
            print F1 substr($_,  $-[0], ($+[0] - $-[0])), " " x ($max[$max_tab++] - ($+[0] - $-[0]));
        }
        print F1 " " x $ARGV[4], "{", "0" x ($ARGV[2] - length($ARGV[3])), $ARGV[3], " .. ", "0" x ($ARGV[2] - length($max_line)), $max_line, "}";
        print F1 "\n";
        $nbr_line++;
    }
}
close F0;close F1;
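The same two-pass idea (measure first, then pad with spaces) can also be sketched with split and sprintf instead of the @- and @+ offsets. This is not the program above, only an illustration: the sub name align_columns, its $gap parameter and the sample data are assumptions, and the line numbering is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a gap width and chomped lines whose fields
# are separated by runs of tabs; returns the lines with every column
# padded with spaces to its widest field plus $gap.
sub align_columns {
    my ($gap, @lines) = @_;
    my @rows = map { [ split /\t+/ ] } @lines;

    # Pass 1: widest field in each column.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }

    # Pass 2: left-justify every field except the last one in a row.
    return map {
        my $row  = $_;
        my $last = $#$row;
        join '', map {
            $_ < $last
                ? sprintf('%-*s', $width[$_] + $gap, $row->[$_])
                : $row->[$_]
        } 0 .. $last;
    } @rows;
}

print "$_\n" for align_columns(2, "foo\tbar\t\tbaz", "quux\t\tb\tc");
```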

group of words

/[^\t]/
