perl_boy has asked for the wisdom of the Perl Monks concerning the following question:
foo bar baz booz qaaz abc
foo bar baz booz qaaz abc 123
foo bar baz booz qaaz abc

SHOULD output to
foo bar baz booz qaaz abc
foo bar baz booz qaaz abc 123
foo bar baz booz qaaz abc

I read the file twice. The first pass records the maximum position of each TAB stop across all lines in the @max array: if a line has an Nth TAB stop further right than on any previous line, $max[$max_tab] is updated. The second pass prints each field with
print substr("\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t", 0, ($max[$max_tab++] - $+[0]) / 8), substr($_, $-[0], $+[0]);
foobar babaz booz qabooz qaaz abcqaaz abcabc
foobar baz baz booz qabooz qaaz abc 123qaaz abc 123abc 123 123
foobar bazbaz booz qaabooz qaaz abcqaaz abcabc
and get the right indexes from
print $-[0], ' ', $+[0], ' ';
3 4 7 9 12 13 17 19 23 26
3 5 8 9 12 14 18 19 23 24 27 28
3 5 8 10 13 15 19 20 24 26
https://www.geeksforgeeks.org/perl-substr-function/
https://perldoc.perl.org/functions/substr
https://www.tutorialspoint.com/perl/perl_substr.htm
https://squareperl.com/en/function/substr

As far as I can tell from these pages, I give substr the right arguments. I'm trying to figure out what's wrong.
$valid_line==0;
$nbr_line=$ARGV[1];

# format the first digit @_[0], length @_[1] wide with leading 0s
sub format {
    return substr("00000000", 0, (@_[1] - length(@_[0]))) . @_[0];
}

open(F0, $ARGV[0]);
while(<F0>) {
    if (/[a-zA-Z0-9]/) {
        $valid_line++;
        $max_tab = 0;
        while (/\t+/g) {
            # print $-[0], ' ', $+[0], ' ';
            $max[$max_tab] = $+[0] if $max[$max_tab] < $+[0];
            $max_tab++;
        }
        # print "\n";
        $nbr_tab = $max_tab if $nbr_tab < $max_tab;
    }
}
$max_line=$nbr_line+$valid_line;
seek F0,0,0;
while(<F0>) {
    s/\r//;
    chop;
    $max_tab = 0;
    while (/[^\t]+/g) {
        print substr("\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t", 0, ($max[$max_tab++] - $+[0]) / 8), substr($_, $-[0], $+[0]);
    }
    print "\n";
}
close F0;
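One thing worth checking against the perldoc page linked above: substr takes EXPR, OFFSET, LENGTH, whereas $+[0] is the end offset of the match, not its length. A minimal sketch of the difference, using a made-up three-field line rather than the real input:

```perl
use strict;
use warnings;

my $line = "foo\tbar\tbaz";    # made-up sample, single tabs between fields

my (@wrong, @right);
while ($line =~ /[^\t]+/g) {
    # $-[0] is the start offset of the match, $+[0] the offset just past its end.
    push @wrong, substr($line, $-[0], $+[0]);            # end offset passed as LENGTH
    push @right, substr($line, $-[0], $+[0] - $-[0]);    # LENGTH = end - start
}

print join('|', @wrong), "\n";
print join('|', @right), "\n";
```

With the end offset used as the length, every field after the first drags along text that follows it, which looks consistent with the doubled-up words in the output shown earlier.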
Replies are listed 'Best First'.
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by choroba (Cardinal) on Jan 03, 2022 at 20:46 UTC
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by tybalt89 (Monsignor) on Jan 04, 2022 at 02:00 UTC
Don't worry about the indexes, just fix the tabs.
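As a rough sketch of that idea (not tybalt89's code), one can ignore match offsets altogether: split each line into fields, find the widest field per column, and emit just enough tabs to reach a common tab stop. The helper name align_tabs, the 8-column tab stop, and the sample lines are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: align tab-separated columns for a given tab width.
sub align_tabs {
    my ($tab, @lines) = @_;
    my @rows = map { [ split /\t+/, $_ ] } @lines;

    # First pass: widest field in each column.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }

    # Second pass: after every field except the last, add enough tabs to
    # reach the first tab stop past the widest field of that column.
    my @out;
    for my $row (@rows) {
        my $line = '';
        for my $i (0 .. $#$row) {
            $line .= $row->[$i];
            next if $i == $#$row;
            my $stops = int($width[$i] / $tab) + 1;        # column width in tab stops
            my $used  = int(length($row->[$i]) / $tab);    # stops the field already fills
            $line .= "\t" x ($stops - $used);
        }
        push @out, $line;
    }
    return @out;
}

my @aligned = align_tabs(8, "foo\t\tbar\tbaz", "a much longer field\tbar\t\t\tbaz");
print "$_\n" for @aligned;
```

Because every column ends on a multiple of the tab width, each field starts at a tab stop regardless of how many tabs the input had between fields.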
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by kcott (Archbishop) on Jan 04, 2022 at 11:22 UTC
G'day perl_boy,

[A couple of notes on presentation: It's good that you've put code within <code>...</code> tags; please also do the same for data and program output (e.g. error messages) so that we can see a verbatim copy of what you're seeing. HTML can modify what you write, e.g. by collapsing whitespace into a single space, which can make a huge difference in many cases. Please also "linkify" URLs; in this case, changing URL to [URL] would have sufficed; again, this helps us to help you (see "What shortcuts can I use for linking to other information?" for more details about that).]

You've shown your input data as having the same number of characters in each column (columns 1, 2 & 3 have 3 characters: foo, bar, baz; columns 4 & 5 have 4 characters: booz & qaaz; and so on). This could be a realistic representation of your data; for instance, order numbers, product codes, client IDs, and so on, are likely to have the same lengths. If this is the case, the following is a much simpler solution.
pm_11140114_tab_align_even.dat:
pm_11140114_tab_align_even.out:
Note that this uses the /r option which was introduced in Perl 5.14: "perl5140delta: Non-destructive substitution". If you're using an older version of Perl, change use 5.014; to use strict; and the print statement will need to be split into two statements:
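A minimal sketch of the two equivalent forms, assuming the statement in question squeezes each run of tabs to a single tab with y/\t/\t/s (the sample line and variable names are made up):

```perl
use 5.014;       # also enables strict
use warnings;

my $line = "foo\t\t\tbar\tbaz\t\tbooz\n";    # made-up sample line

# Perl 5.14+: /r returns the modified copy and leaves $line untouched;
# /s squeezes each run of tabs down to one tab.
my $squeezed = $line =~ y/\t/\t/rs;
print $squeezed;

# Older Perls: transliterate a copy in place, then print it.
(my $copy = $line) =~ y/\t/\t/s;
print $copy;
```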
This gives exactly the same result. Your "SHOULD output to" shows two tabs between columns (except for "abc\t123" which I'm going to assume is just a typo). Because y///r and s///r can be chained, you can change
to
Now, pm_11140114_tab_align_even.out will be:
For older Perls, you'll need to split the print statement into three statements:
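A hedged sketch of the chained form next to the three-statement form, again assuming the squeeze is y/\t/\t/s and the second step doubles each remaining tab (sample line and variable names are made up):

```perl
use 5.014;
use warnings;

my $line = "foo\t\t\tbar\tbaz\t\tbooz\n";    # made-up sample line

# Chained /r: squeeze tab runs to one tab, then turn each tab into two.
my $chained = $line =~ y/\t/\t/rs =~ s/\t/\t\t/gr;
print $chained;

# Without /r: transliterate, substitute, print.
my $copy = $line;
$copy =~ y/\t/\t/s;
$copy =~ s/\t/\t\t/g;
print $copy;
```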
Again, this gives exactly the same result.

Please either advise whether the input data in your OP is representative or, if not, provide something more realistic such that we can provide better help. It would also be useful to know what you intend to do with the output; e.g. print to screen, write to a plain text file, use for CSV, generate an HTML table, etc. With this information, we may be able to provide different (better) advice.

-- Ken
by tybalt89 (Monsignor) on Jan 04, 2022 at 15:57 UTC
Or just:
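One single-step way to get the same two-tab result, offered only as a guess at the kind of shortcut meant here, is a substitution over each run of tabs (sample line is made up):

```perl
use 5.014;
use warnings;

my $line = "foo\t\t\tbar\tbaz\t\tbooz\n";    # made-up sample line

# Replace every run of one or more tabs with exactly two tabs.
my $fixed = $line =~ s/\t+/\t\t/gr;
print $fixed;
```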
by perl_boy (Novice) on Jan 11, 2022 at 20:56 UTC
There are 4 TAB ranges in each of the 4 lines: 3 TAB ranges between text with spaces (1 and 2, 2 and 3, 3 and 4) and 1 TAB range between text with spaces 4 and line X, X being 0, 1, 2, 3.

Here is a 4 * 4 array (ar 0) (TAB_range * line), that is, the 4 TAB ranges for each of the 4 lines.

The max is 4 for each of the columns, so the max number of TABs for each of the columns is 4 4 4 4.

There should be 4 TABs between text with spaces 1 and text with spaces 2, text with spaces 2 and text with spaces 3, text with spaces 3 and text with spaces 4, and text with spaces 4 and line X (X being 0, 1, 2, 3), and the output should be

The above example (ex 0) (in 0) is a bit unclear, as all TAB ranges contain 4 TABs. In the following example (ex 1) (in 1), temp9.txt, the TAB ranges (max TAB) have a different number of TABs between groups of words (text with spaces X).

As in the above example (ex 0), there are 4 TAB ranges in each of the 4 lines: 3 TAB ranges between text with spaces (1 and 2, 2 and 3, 3 and 4) and 1 TAB range between text with spaces 4 and line X, X being 0, 1, 2, 3.
Here is a 4 * 4 array (ar 1) (TAB_range * line), that is, the 4 TAB ranges for each of the 4 lines.

The max number of TABs for each of the columns is 2 4 5 3, as opposed to 4 4 4 4 for the above array (ar 0), and thus the output should be

I have printed the begin/end of each TAB range using print $-[0], ' ', $+[0], ' '; (line 20).

I have printed the maximum column numbers for both TAB begin and end for each of the TAB ranges between groups of words (text with spaces X), for debugging only, and it outputs

Here text with spaces 1, 2, 3, 4 for all 4 lines 0, 1, 2, 3 have the same width, that is, 18, but in real life text with spaces has variable length, as in

Here lines 2, 3, 4, taken from https://www.poetryfoundation.org/poems/55038/phrases (lack of inspiration after line 1), should output as follows

temp11.txt, formatted using Ubuntu Mousepad

I would like TABs to align at the rightmost/maximum column number (between $max_b[] and $max_e[]) for each TAB range between groups of words (text with spaces X). Remember, arrays @max_b and @max_e contain the beginning and ending column numbers of TABs, found using

and insert missing TABs (to align with the longest line) using

I have tried

just to test the use of {} for making the output 3 tabs wide, but got the same output.
I have also tried

I have read https://perldoc.perl.org/perlre on Perl regex but can't figure out where to insert $max_b[$tab_index] and $max_e[$tab_index] within y/\t/\t/rs or tr/\t/\t/rs.

I have changed the following lines in the code

for

because I need to read the input and output filenames from the command line arguments $ARGV[0] and $ARGV[1], and added

I guess you forgot to close files.

For my array of TABs I also added

at the beginning, to read the file before writing to the output file in the second

in order to get the positions of the begin/end of each TAB range inside the $max_b and $max_e arrays before repositioning the file pointer at the beginning using

I have commented

because I have useless warnings.

I don't know where to put ($max_e[]), the rightmost/maximum column number, within

I am sorry for this, but I'm quite new to Perl and regex and it's quite confusing. Thanks in advance. Here is the re-written code
by kcott (Archbishop) on Jan 12, 2022 at 01:11 UTC
Much of what you wrote in reply to my post seems to have no bearing whatsoever on my post. For instance, "there should not be a TAB in the beggining ...": none of the output I showed had a tab at the beginning of any line. I'm going to ignore all such content. Please ensure you're replying to the correct post; and, for clarity, add some indication showing to what your response refers (e.g. You wrote X; I think Y).

Where and how you get your filenames is entirely up to you. I used hard-coded filenames for demo purposes only. I often use a prefix of pm_NODE-ID_ for demo files: it provides unique names as well as a reference back to the associated PM node. If you're reading from @ARGV, you should include some sanity checking; in this instance, check that @ARGV has two elements, with the first being a valid file. Also take a look at Getopt::Long.

"I have tried print $out_fh $_ =~ y/\t/\t{3}/rs;"

That's not how transliteration works. See y/// and consider:
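A small sketch of the point, using a made-up input string: y/// maps characters one-for-one and has no quantifiers, so "\t{3}" is simply the four characters TAB, {, 3, }, of which only the first is ever used when the search list has one character.

```perl
use 5.014;
use warnings;

my $in = "a\t\tb";    # made-up sample: two tabs between a and b

my $tr;
{
    # Newer Perls may warn that the replacement list is longer than
    # the search list, so warnings are silenced for this one statement.
    no warnings;
    $tr = $in =~ y/\t/\t{3}/rs;    # behaves like y/\t/\t/rs
}

# To turn each run of tabs into exactly three tabs, s/// is the tool.
my $sub = $in =~ s/\t+/\t\t\t/gr;

print length($tr), " ", length($sub), "\n";
```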
"I guess you forgot to close files" No, I certainly did not forget to do that. I declared, and used, lexical filehandles in the smallest scope possible (the anonymous block). Perl automatically closes files at the end of that scope. I also didn't forget to check for I/O exceptions. Again, Perl does this for me via the autodie pragma.
That's a very bad move and I strongly recommend that you do not do this. Parentheses missing around ... is the only warning; all the rest are errors (note the Execution of 02-00.pl aborted due to compilation errors. as the last line). Furthermore, none of those messages are "useless"! As you're not checking for I/O exceptions, you should definitely use the autodie pragma and let Perl do it for you.

-- Ken
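To illustrate the points about lexical filehandles and autodie, here is a minimal sketch. The scratch file from File::Temp and the variable names are assumptions for the demo, not the filenames used in this thread:

```perl
use strict;
use warnings;
use autodie;                       # failed open/close now throw exceptions
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);    # scratch file for the demo

{
    open my $out_fh, '>', $path;   # no "or die" needed: autodie covers it
    print $out_fh "foo\tbar\n";
}   # $out_fh goes out of scope here, so Perl closes the file

{
    open my $in_fh, '<', $path;
    print while <$in_fh>;
}   # likewise closed automatically

# A failed open raises an exception instead of being silently ignored.
my $ok = eval { open my $fh, '<', "$path.does-not-exist"; 1 };
print "open failed as expected\n" unless $ok;
```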
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by Marshall (Canon) on Jan 05, 2022 at 03:31 UTC
Usually the idea is to compress as many columns onto the page as possible. This means removing tabs instead of adding them, except when necessary for column alignment, and allowing for at least one blank space between columns. My code using tabs within code tags doesn't render exactly right on Perl Monks and I don't know why. The "1" should be underneath the 9, but with my browser it is not.
Be that presentation problem as it may, this code adjusts the tabs correctly for the given input, at least as viewed with my program editor. I think I handled the "off by one" situation correctly, mileage varies. When run on my computer, all of the "booz" column lines up.
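Since tab rendering differs between browsers and editors, one way to check alignment independently of the display is to expand tabs to spaces and compare visual columns. A minimal sketch with the core Text::Tabs module, assuming 8-column tab stops and made-up sample lines:

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand() and $tabstop

$tabstop = 8;      # assumed tab width

# Made-up lines: "booz" should start in the same visual column in both.
my @lines = ("foo\t\tbooz", "foobarbaz\tbooz");

# expand() replaces tabs with spaces, so index() yields the visual column.
my @columns = map { index($_, 'booz') } expand(@lines);
print "booz starts at column $_\n" for @columns;
```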
Re: misalined TABs using substr,LAST_MATCH_START/END,regex
by perl_boy (Novice) on Jan 21, 2022 at 15:15 UTC
The program is finished and some features have been added: reference line numbers at the beginning and end of each line in the output file. TABs have been changed to SPACEs for portability with Unicode. The program reads the input file to collect the maximum line reference number (max_line), the number of lines containing text (regex /[a-zA-Z0-9]/) in the input file (valid_line), the number of groups of words read from the input file for the max array (nbr_max_tab), and the respective TAB stop positions in the max array.
The input.txt and output.txt files were edited with Ubuntu Mousepad (http://www.xfce.org/), which has vertical scrolling, which vi lacks. The input has been purposely misaligned and contains empty lines.

input.txt
In the output, the line reference numbers at the beginning and end and the groups of words are well aligned.

output.txt
Shell command to call the program with the right arguments:

usage: format-pre-post-nbr-SPACE.pl <INPUT_FILE> <OUTPUT_FILE> <NUMBER_OF_0\'s_IN_NUMBERS> <STARTING_NUMBER> <NUMBER_OF_SPACES_BETWEEN_NUMBER_AND_LINE>
ex: perl format-pre-post-nbr-SPACE.pl input.txt output-0.txt 2 0 8

code
group of words : text NOT containing TABs /[^\t]/
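A minimal sketch of the behaviour described in this post, not the program itself: fields are padded with spaces to the widest entry in each column, and a zero-padded line number is placed at both ends of each line. The helper name format_lines, its argument order, and the sample lines are assumptions; sprintf with a `*` width stands in for the substr-based zero padding used earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: $digits = width of the zero-padded number,
# $start = first line number, $gap = spaces between number and text.
sub format_lines {
    my ($digits, $start, $gap, @lines) = @_;
    my @rows = map { [ split /\t+/, $_ ] } @lines;

    # Widest group of words in each column.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }

    my $n = $start;
    my @out;
    for my $row (@rows) {
        # Left-justify every cell to its column width; short rows get blanks.
        my @cells = map { sprintf('%-*s', $width[$_], $row->[$_] // '') } 0 .. $#width;
        my $num   = sprintf('%0*d', $digits, $n++);
        push @out, join(' ' x $gap, $num, join(' ', @cells), $num);
    }
    return @out;
}

my @formatted = format_lines(2, 0, 2, "foo\tbar", "longer\t\tx");
print "$_\n" for @formatted;
```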