Re: rectangularizing input to become array
by haukex (Archbishop) on Feb 27, 2019 at 06:58 UTC
I'm attempting to present an SSCCE.
Note that an SSCCE is all about making things clear and easy for the people trying to help: no extra editing of the inputs, no guessing which input produces which output, no guessing whether commented-out code is relevant to the question, and so on. For me, an ideal question would contain three of PerlMonks' [download] links that I can right-click and do "Save As" on: the input file, the output file, and the source code, such that I can just download them and run perl script.pl input.txt | diff - output.txt and see what's going on. There are also a couple of other options, such as embedding the input in the source in the form of a __DATA__ section or just a variable, and embedding the output in the source and using e.g. Test::More to check if it matches.
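As a rough illustration of the "embed everything" option, here is a minimal sketch in which both the input and the expected output live in the script as variables. The function trim_lines and the sample data are made up for this example and are not taken from the original question.

```perl
use warnings;
use strict;
use Test::More;

# Input embedded as a variable, so nothing has to be downloaded.
my $input = "abcdef\nabcdefg\nabcde\n";

# Expected output embedded right next to it.
my @expected = ( 'abcd', 'abcd', 'abcd' );

# Hypothetical stand-in for the code under discussion:
# keep at most $max characters of every line.
sub trim_lines {
    my ( $text, $max ) = @_;
    return [ map { substr( $_, 0, $max ) } split /\n/, $text ];
}

my $got = trim_lines( $input, 4 );
is_deeply $got, \@expected, 'lines are trimmed to four characters';
done_testing;
```

Anyone can then run the script as-is and see from the test output whether the actual and expected results agree.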
In your question, you've included a link to GitHub, which might go down sometime in the future (plus it's a couple more clicks), and also, in your <code> tags you've included the command-line invocations that I'd have to trim, and that aren't really necessary here to show others how to run the script or what's going on. Also, since the question is about whitespace, the dd invocation you commented out is actually more useful than a plain print here.
I read somewhere that a good name for a default directory in perl development is 'lib'.
Yes, but mostly for .pm files - the lib directory is usually what would get added to @INC such that use and require (and in some cases do) can find the files. Typically, the content of such directories is not what would get modified by a user or get modified during the run of a script.
specified by $width and $length.
Personally, I find those variable names a little confusing: You've got an array of strings, and $length sounds like it refers to the strings' length, but it seems like that's what $width is for. I might suggest $width/$height, $length/$height, $length/$rows, or $cols/$rows.
What I would like is for these inputs to get trimmed to an array of the size specified by $width and $length . Spaces added/substituted on right if necessary to pad each vector to the same size.
You could use substr for the trimming and sprintf for the output with padding.
use warnings;
use strict;
use Data::Dump;
my $input = <<'(END INPUT)';
abcdef
abcdefg
abcde
 bcdefgh
 bcd	
(END INPUT)
my @lines = split /\n/, $input;
dd \@lines;
my $out = make_rectangular( \@lines, 4, 6 );
dd $out;
sub make_rectangular {
    my ( $lines, $maxrows, $maxlength ) = @_;
    my @out;
    my $rowcount=1;
    for my $line (@$lines) {
        my $trimmed = substr $line, 0, $maxlength;
        push @out, sprintf "%-*s", $maxlength, $trimmed;
        last if ++$rowcount>$maxrows;
    }
    return \@out;
}
__END__
["abcdef", "abcdefg", "abcde", " bcdefgh", " bcd\t"]
["abcdef", "abcdef", "abcde ", " bcdef"]
(Note I've assumed you don't want to modify the input array here.) Of course TIMTOWTDI, I could've mashed the code into a single map statement, but I hope this is a little clearer.
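For comparison, the single-map variant mentioned above might look roughly like the following sketch. The name make_rectangular_map is made up here; the sketch assumes the same arguments as make_rectangular and likewise leaves the input array untouched.

```perl
use warnings;
use strict;

# Same idea as make_rectangular, written as one map over an array slice.
sub make_rectangular_map {
    my ( $lines, $maxrows, $maxlength ) = @_;

    # Never ask for more rows than the input actually has.
    my $last = ( $maxrows < @$lines ? $maxrows : scalar @$lines ) - 1;

    return [
        map { sprintf( "%-*s", $maxlength, substr( $_, 0, $maxlength ) ) }
            @{$lines}[ 0 .. $last ]
    ];
}

my @lines = ( "abcdef", "abcdefg", "abcde", " bcdefgh", " bcd\t" );
my $rect  = make_rectangular_map( \@lines, 4, 6 );

# Brackets make the padding visible.
print "[$_]\n" for @$rect;
```

The slice replaces the explicit row counter, which is shorter but arguably harder to read than the loop.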
BTW, I'm not sure what you want to do with the tab character in your input file? This code counts it as a single character.
Also, what character is it that renders an entire space dark?
Do you mean the block drawing character U+2588, "█"? That would be "\N{U+2588}" or "\N{FULL BLOCK}". I might suggest U+2420, "␠" ("\N{U+2420}" or "\N{SYMBOL FOR SPACE}"). Note that for the \N{CHARNAME} variant, you may have to add use charnames ':full'; to your script, depending on your Perl version (newer versions load it automatically).
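A small sketch of how either character could be used to make spaces visible when printing rows. The sample rows are assumptions for illustration, and the \N{U+...} form is used so that no use charnames line is needed.

```perl
use warnings;
use strict;

# Emit UTF-8 so the symbols survive the trip to the terminal.
binmode STDOUT, ':encoding(UTF-8)';

my @rows = ( "abcde ", " bcdef" );

for my $row (@rows) {
    # Work on a copy so the original strings stay untouched.
    ( my $visible = $row ) =~ s/ /\N{U+2420}/g;
    print "$visible\n";
}

# The same substitution with U+2588 (FULL BLOCK) instead.
( my $blocked = $rows[0] ) =~ s/ /\N{U+2588}/g;
print "$blocked\n";
```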
BTW, I'm not sure what you want to do with the tab character in your input file? This code counts it as a single character.
I'm making every effort to quote haukex fairly, but I will re-order for thematic and write-up reasons. I threw the tab character in to be a possible problem. I think I deal with it with:
$input=~ s/\t/ /g;
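If the tab should instead advance to the next tab stop, as it would on a terminal, the core module Text::Tabs may be an alternative. This is only a sketch: the tab width of 4 and the sample strings are assumptions, not something from the thread.

```perl
use warnings;
use strict;
use Text::Tabs;    # core module; exports expand() and $tabstop

$tabstop = 4;      # assumed tab width for this sketch

# The tab follows four characters here, so it fills a whole tab stop.
my $wide = expand(" bcd\tx");

# Here it follows two characters, so only two spaces are needed.
my $narrow = expand("ab\tc");

printf "[%s] has %d characters\n", $_, length $_ for $wide, $narrow;
```

Unlike a fixed substitution, the number of spaces inserted depends on the column at which the tab occurs.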
I'm also trying to make the write-up as austere as it can be in terms of using vertical space, so I will continue in readmore tags. I think I get more eyes if people don't have to scroll down to continue finding good content, and the thread might read more about the solutions as opposed to the problem. I haven't even gotten to the third one yet.
using e.g. Test::More to check if it matches
What I seek to do is pass the first test...then others....
Many thanks and kind regards from America,
So there's the base directory of the script. I wouldn't want output there. ... would you rather put such a thing in our one and only subdirectory or split input and output into separate directories?
Here's an idea for how to handle a script with a library.pm file or two that goes with it:
- Say /home/user is the base directory.
- I put my script in e.g. /home/user/myscript.
- Libraries (.pm files) could go in the same directory or in /home/user/myscript/lib; that doesn't really make a difference for small scripts, but if you've got a lot of .pm files then a lib dir is a good idea.
- Ideally, /home/user/myscript is also a git working copy - in which case input and output data doesn't really belong in that directory anyway, as otherwise it'd have to be added to .gitignore.
- The script can be made to not worry about which directory it is located in using code like this:
use FindBin;
use lib $FindBin::Bin;
Or, if there's a lib subdirectory, using the following (platform-independent) code:
use FindBin;
use File::Spec::Functions qw/catdir/;
use lib catdir($FindBin::Bin, 'lib');
- You can put your input data in e.g. /home/user/mydata, cd to that directory, and run your script with e.g. perl ../myscript/script.pl input.txt, and it should generate its output in the current directory.
- If it's a script you use a lot, and you don't want to type out its path all the time, you could add it to your PATH. For example, on a couple of my boxes, I have lines like this in my ~/.profile: test -d "$HOME/myscript" && PATH="$HOME/myscript:$PATH" (the script needs to be chmod u+x for this to work).
Should I go update that on the original post?
I think in this case you don't need to, it's just for future reference, thanks.
in your code tags you've included the command-line invocations that I'd have to trim
I tend to think that it provides context ... Might pre tags work here?
Yes you're right - I didn't mean to make it sound like it's not a good idea, context can certainly be useful in some cases - the main point was not to put it in the same <code> tag as the code, to make downloading easier. <pre> tags have the issue that HTML and PerlMonks special characters have to be escaped (as you can see your <pre> tag has been rendered with links in it), so two separate sets of <code> tags work. Or, here's how I might have written that post (note you can use <code> tags in paragraphs as well):
Here is the script 3.rm.pl, which I run via ./3.rm.pl:
#!/usr/bin/perl -w
use 5.011;
...
And here is the output:
["abcdef", "abcdefg", "abcde", " bcdefgh", " bcd "]
["abcdef", "abcdef", "abcde ", " bcdef"]
...
Also, command lines like cat or perl script.pl are simple enough that we usually don't need to see them, it only becomes important when there are additional arguments involved. (And for some questions, it can be relevant whether a script was invoked as ./script.pl or perl script.pl, but that's not too often.)
What I seek to do is pass the first test...then others....
Sometimes it can be very useful to write the tests first, as it forces one to think about the API and what the output should ideally look like.
Can't use string ("abcdef") as an ARRAY ref while "strict refs" in use at ./3.rm.pl line 59.
getsubset expects an array of arrays, but $out is just an array of strings. Assuming you want each character to be a "column", you could do $out = [ map { [split //] } @$out ]; after $out = make_rectangular(..., or integrate it directly into the push in your make_rectangular like so: push @out, [ split //, sprintf "%-*s", $maxlength, $trimmed ]; - either of those changes makes your test pass. (Note you should call done_testing; after your tests.)
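To make the difference between the two shapes concrete, here is a short sketch of the first variant. The rows are hard-coded to the output shown earlier in the thread rather than produced by make_rectangular, and getsubset itself is not reproduced.

```perl
use warnings;
use strict;

# Rows as make_rectangular returns them: plain strings.
my $out = [ "abcdef", "abcdef", "abcde ", " bcdef" ];

# Split each row into single characters to get an array of arrays.
my $grid = [ map { [ split // ] } @$out ];

# A single "cell" is now addressable by row and column.
print "row 2, column 5: '$grid->[2][5]'\n";
print "rows: ", scalar @$grid, ", columns: ", scalar @{ $grid->[0] }, "\n";
```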
Re: rectangularizing input to become array
by Athanasius (Archbishop) on Feb 27, 2019 at 07:31 UTC
my @lines = $path_to_file->slurp;
almost certainly doesn’t do what you think it does.
It did not. I've been whittling this SSCCE down from a river of mojibake and woe, my soaked and freezing body wondering where my skills to deal with such environments have been. Well, getting the logic from Path::Tiny wrong was one thing that had me beat. I was reading this as:
@lines = $file->lines;
which, I believe, would produce different results. It's very difficult to diagnose path and file input problems from the net, but you and haukex have done exactly that. Thank you.
What can the OP do about misapprehension? (Open question) I would like to introduce a little bit of code to test whether I have these data represented correctly. I frequently find that I'm off by a pair of square brackets or quotes and commas. I'll use readmore tags for output then new source for the caller.
Thanks all for comments,
2019-03-01 Athanasius changed one set of pre tags to code tags
Re: rectangularizing input to become array
by johngg (Canon) on Feb 27, 2019 at 11:38 UTC
Note also, as an alternative to substr, that pack with the A template will either truncate or pad to the right with spaces.
johngg@shiraz:~/perl/Monks$ perl -Mstrict -Mwarnings -MData::Dumper -e '
open my $inFH, q{<}, \ <<__EOD__ or die $!;
abcdef
abcdefg
abcde
 bcdefgh
bcd
__EOD__
chomp( my @lines = <$inFH> );
close $inFH or die $!;
my $width = 6;
my $height = 4;
my $raRect = makeRect( \ @lines, $width, $height );
print Data::Dumper->Dumpxs( [ $raRect ], [ qw{ raRect } ] );
sub makeRect
{
    my( $raLines, $width, $height ) = @_;
    my @rect;
    push @rect, pack qq{A$width}, shift @{ $raLines } for 1 .. $height;
    return \ @rect;
}'
$raRect = [
'abcdef',
'abcdef',
'abcde ',
' bcdef'
];
I hope this is of interest.
pack with the A template
The many and various uses of pack. I'm glad to have another useful example. What's the A template?
The replicated code follows between readmore tags: invocation, output, then source.
Can you say a few words about this line of code (I've never seen this before in Data::Dumper)?
print Data::Dumper->Dumpxs( [ $raRect ], [ qw{ raRect } ] );
Thanks.
Unlike Dumper(), Data::Dumper->Dump() and ->Dumpxs() are not exported by Data::Dumper. However, for a little extra effort they do provide clearer output, especially if examining multiple data structures. They allow you to distinguish arrays and references to arrays, ditto for hashes. Many Monks recommend and use the more modern and flexible Data::Dump module but Data::Dumper has the advantage of being in core from way back so is useful if maintaining elderly servers in a closed environment running, say, Perl 5.8 or even earlier. Here is a simple example.
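The following sketch shows the naming feature in isolation. The variables @list and %size are invented for illustration; a plain name in the second argument dumps a scalar holding the reference, while a name with a leading * dumps the array or hash itself.

```perl
use warnings;
use strict;
use Data::Dumper;

my @list = ( 'abcdef', 'abcde ' );
my %size = ( width => 6 );

# A plain name yields a scalar that holds a reference ...
my $as_ref = Data::Dumper->Dump( [ \@list ], [ 'list' ] );

# ... while a leading "*" names the underlying array or hash itself.
my $as_named = Data::Dumper->Dump( [ \@list, \%size ], [ qw{ *list *size } ] );

print $as_ref, $as_named;
```

The first dump starts with $list = [ and the second with @list = ( followed by %size = (, so each structure is labelled with its own name instead of $VAR1, $VAR2 and so on.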
I hope this is helpful.
What's the A template?
As documented in pack, it's a "text (ASCII) string, will be space padded." I showed the differences between some of those pack templates here.
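A quick sketch of that behaviour, with the lowercase a template shown next to it for contrast. The width of 6 and the sample strings are assumptions chosen to match the example earlier in the thread.

```perl
use warnings;
use strict;

my $width = 6;

my $long  = pack "A$width", "abcdefg";    # too long: truncated to 6
my $short = pack "A$width", "abcde";      # too short: padded with a space
my $nul   = pack "a$width", "abcde";      # "a" pads with a NUL byte instead

# Show the NUL padding as a visible \0 in the printout.
( my $shown = $nul ) =~ s/\0/\\0/g;

print "A long:  [$long]\n";
print "A short: [$short]\n";
print "a short: [$shown]\n";
```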
Can you say a few words about this line of code (I've never seen this before in Data::Dumper)?
print Data::Dumper->Dumpxs( [ $raRect ], [ qw{ raRect } ] );
Simplifying a lot, Dumpxs is just another name for Dump. (See Where is Data::Dumper->Dumpx?)