in reply to rectangularizing input to become array

I'm attempting to present an SSCCE.

Note that an SSCCE is all about making things clear and easy for the people trying to help: no extra editing of the inputs, no guessing which input produces which output, no guessing whether commented-out code is relevant to the question, and so on. For me, an ideal question would contain three of PerlMonks' [download] links that I can right-click and "Save As": the input file, the output file, and the source code, so that I can just download them, run perl script.pl input.txt | diff - output.txt, and see what's going on. There are also a couple of other options, such as embedding the input in the source as a __DATA__ section or just a variable, and embedding the expected output in the source and using e.g. Test::More to check whether it matches.
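As a rough sketch of that last option, here is a self-checking script with both the input and the expected output embedded in it. The function to_upper is made up purely for illustration and stands in for whatever code the question is about.

```perl
use warnings;
use strict;
use Test::More;

# Hypothetical function under test: upper-cases every line it is given.
sub to_upper {
    my ($lines) = @_;
    return [ map { uc } @$lines ];
}

# The input is embedded as a variable, so nobody has to download a file.
my $input = <<'END_INPUT';
foo
bar baz
END_INPUT
my @input = split /\n/, $input;

# The expected output is embedded too, so the script checks itself.
my @expected = ( 'FOO', 'BAR BAZ' );

is_deeply to_upper( \@input ), \@expected, 'output matches the expected lines';
done_testing;
```

Anyone can save this as a single file and run it with perl, and the test output shows immediately whether the result matches.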

In your question, you've included a link to GitHub, which might go down sometime in the future (plus it's a couple more clicks), and also, in your <code> tags you've included the command-line invocations that I'd have to trim, and that aren't really necessary here to show others how to run the script or what's going on. Also, since the question is about whitespace, the dd invocation you commented out is actually more useful than a plain print here.

I read somewhere that a good name for a default directory in perl development is 'lib'.

Yes, but mostly for .pm files - the lib directory is usually what would get added to @INC such that use and require (and in some cases do) can find the files. Typically, the content of such directories is not what would get modified by a user or get modified during the run of a script.

specified by $width and $length.

Personally, I find those variable names a little confusing: You've got an array of strings, and $length sounds like it refers to the strings' length, but it seems like that's what $width is for. I might suggest $width/$height, $length/$height, $length/$rows, or $cols/$rows.

What I would like is for these inputs to get trimmed to an array of the size specified by $width and $length . Spaces added/substituted on right if necessary to pad each vector to the same size.

You could use substr for the trimming and sprintf for the output with padding.

use warnings;
use strict;
use Data::Dump;

my $input = <<'(END INPUT)';
abcdef
abcdefg
abcde
 bcdefgh
 bcd	
(END INPUT)

my @lines = split /\n/, $input;
dd \@lines;
my $out = make_rectangular( \@lines, 4, 6 );
dd $out;

sub make_rectangular {
    my ( $lines, $maxrows, $maxlength ) = @_;
    my @out;
    my $rowcount=1;
    for my $line (@$lines) {
        my $trimmed = substr $line, 0, $maxlength;
        push @out, sprintf "%-*s", $maxlength, $trimmed;
        last if ++$rowcount>$maxrows;
    }
    return \@out;
}

__END__

["abcdef", "abcdefg", "abcde", " bcdefgh", " bcd\t"]
["abcdef", "abcdef", "abcde ", " bcdef"]

(Note I've assumed you want to not modify the input array here.) Of course TIMTOWTDI, I could've mashed the code into a single map statement, but I hope this is a little more clear.
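For comparison, this is roughly what the single-map variant might look like. The name make_rectangular_map is made up for this sketch, and it is only meant to give the same result for inputs like the example. I haven't tried to match every edge case of the loop version.

```perl
use warnings;
use strict;

# Sketch of the same trimming and padding done with an array slice and one map.
sub make_rectangular_map {
    my ( $lines, $maxrows, $maxlength ) = @_;
    # Highest index to keep: whichever is smaller, the last wanted row
    # or the last row that actually exists.
    my $last = $maxrows - 1 < $#$lines ? $maxrows - 1 : $#$lines;
    return [
        map { sprintf "%-*s", $maxlength, substr( $_, 0, $maxlength ) }
            @{$lines}[ 0 .. $last ]
    ];
}

my $out = make_rectangular_map(
    [ "abcdef", "abcdefg", "abcde", " bcdefgh", " bcd\t" ], 4, 6 );
print "[$_]\n" for @$out;
```

It is shorter, but the row limit is now hidden in the slice, which is one reason the explicit loop may be easier to read.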

BTW, I'm not sure what you want to do with the tab character in your input file? This code counts it as a single character.
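To illustrate the difference, here is a small sketch that compares the raw length of the last input line with its length after tab expansion. It assumes the core module Text::Tabs and its default tab stop of 8 columns; whether expanding tabs is what you actually want is a separate question.

```perl
use warnings;
use strict;
use Text::Tabs;    # exports expand() by default

my $line = " bcd\t";

# As far as length() and substr() are concerned, the tab is one character.
print length($line), "\n";

# expand() replaces each tab with spaces up to the next tab stop.
my $expanded = expand($line);
print length($expanded), "\n";
```

With the default tab stop, the expanded line is padded out to column 8, so trimming it to a width of 6 afterwards gives a different result than trimming the raw line.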

Also, what character is it that renders an entire space dark?

Do you mean the block drawing character U+2588, "█"? That would be "\N{U+2588}" or "\N{FULL BLOCK}". I might suggest U+2420, "␠" ("\N{U+2420}" or "\N{SYMBOL FOR SPACE}"). Note that for the \N{CHARNAME} variant, you may have to add use charnames ':full'; to your script, depending on your Perl version (Perl 5.16 and newer load it automatically).
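Here is a short sketch of how such a character could be used to make the padding visible. It assumes the terminal understands UTF-8, and the variable names are just for illustration.

```perl
use warnings;
use strict;
use charnames ':full';    # only required on older Perls, harmless on newer ones

# Encode wide characters as UTF-8 on output to avoid "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';

my $padded = sprintf "%-6s", " bcd";

# Work on a copy, replacing every space with the visible space symbol.
( my $visible = $padded ) =~ s/ /\N{SYMBOL FOR SPACE}/g;
print "$visible\n";
```

This only changes what gets printed; $padded itself still contains ordinary spaces.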

Re^2: rectangularizing input to become array
by Aldebaran (Curate) on Feb 28, 2019 at 22:25 UTC
    BTW, I'm not sure what you want to do with the tab character in your input file? This code counts it as a single character.

    I'm making every effort to quote haukex fairly, but I will re-order for thematic and write-up reasons. I threw the tab character in to be a possible problem. I think I deal with it with:

    $input=~ s/\t/ /g;

    I'm also trying to make the write-up as austere as it can be in terms of using vertical space, so I will continue in readmore tags. I think I get more eyes if people don't have to scroll down to continue finding good content, and the thread might read more about the solutions as opposed to the problem. I haven't even gotten to the third one yet.

    using e.g. Test::More to check if it matches

    What I seek to do is pass the first test...then others....

    Many thanks and best regards from America,
      So there's the base directory of the script. I wouldn't want output there. ... would you rather put such a thing in our one and only subdirectory or split input and output into separate directories?

      Here's an idea for how to handle a script with a library.pm file or two that goes with it:

      • Say /home/user is the base directory.
      • I put my script in e.g. /home/user/myscript.
        • Libraries (.pms) could go in the same directory, or in /home/user/myscript/lib, that doesn't really make a difference for small scripts - if you've got a lot of .pm files then a lib dir is a good idea.
        • Ideally, /home/user/myscript is also a git working copy - in which case input and output data doesn't really belong in that directory anyway, as otherwise it'd have to be added to .gitignore.
      • The script can be made to not worry about which directory it is located in using code like this:
        use FindBin; use lib $FindBin::Bin;
        Or, if there's a lib subdirectory, using the following (platform-independent) code:
        use FindBin; use File::Spec::Functions qw/catdir/; use lib catdir($FindBin::Bin, 'lib');
      • You can put your input data in e.g. /home/user/mydata, cd to that directory, and run your script with e.g. perl ../myscript/script.pl input.txt, and it should generate its output in the current directory.
      • If it's a script you use a lot, and you don't want to type out its path all the time, you could add it to your PATH. For example, on a couple of my boxes, I have lines like this in my ~/.profile: test -d "$HOME/myscript" && PATH="$HOME/myscript:$PATH" (the script needs to be chmod u+x for this to work).
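Putting those points together, a script laid out this way might start something like the sketch below. The file name output.txt is only a placeholder; the point is that modules are found relative to the script, while output is resolved against the current directory.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use FindBin;
use File::Spec::Functions qw/catdir catfile/;
use lib catdir( $FindBin::Bin, 'lib' );    # modules are found next to the script
use Cwd qw/getcwd/;

# Placeholder output name, resolved against the directory the user ran
# the script from, not against the directory the script lives in.
my $outfile = catfile( getcwd(), 'output.txt' );

print "script directory: $FindBin::Bin\n";
print "would write to:   $outfile\n";
```

Run from /home/user/mydata as perl ../myscript/script.pl, the first line should report the myscript directory and the second a path inside mydata.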
      Should I go update that on the original post?

      I think in this case you don't need to, it's just for future reference, thanks.

      in your code tags you've included the command-line invocations that I'd have to trim
      I tend to think that it provides context ... Might pre tags work here?

      Yes you're right - I didn't mean to make it sound like it's not a good idea, context can certainly be useful in some cases - the main point was not to put it in the same <code> tag as the code, to make downloading easier. <pre> tags have the issue that HTML and PerlMonks special characters have to be escaped (as you can see your <pre> tag has been rendered with links in it), so two separate sets of <code> tags work. Or, here's how I might have written that post (note you can use <code> tags in paragraphs as well):

      Here is the script 3.rm.pl, which I run via ./3.rm.pl:

      #!/usr/bin/perl -w
      use 5.011;
      ...

      And here is the output:

      ["abcdef", "abcdefg", "abcde", " bcdefgh", " bcd "]
      ["abcdef", "abcdef", "abcde ", " bcdef"]
      ...

      Also, command lines like cat or perl script.pl are simple enough that we usually don't need to see them, it only becomes important when there are additional arguments involved. (And for some questions, it can be relevant whether a script was invoked as ./script.pl or perl script.pl, but that's not too often.)

      What I seek to do is pass the first test...then others....

      Sometimes it can be very useful to write the tests first, as it forces one to think about the API and what the output should ideally look like.

      Can't use string ("abcdef") as an ARRAY ref while "strict refs" in use at ./3.rm.pl line 59.

      getsubset expects an array of arrays, but $out is just an array of strings. Assuming you want each character to be a "column", you could do $out = [ map { [split //] } @$out ]; after $out = make_rectangular(..., or integrate it directly into the push in your make_rectangular, like so: push @out, [ split //, sprintf "%-*s", $maxlength, $trimmed ]; - either of those changes makes your test pass. (Note you should call done_testing; after your tests.)
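As a standalone sketch of that conversion, with a couple of Test::More checks and the done_testing call at the end: the sample rows here are hard-coded for illustration rather than taken from your script.

```perl
use warnings;
use strict;
use Test::More;

# Rows as a make_rectangular-style function would return them: plain strings.
my $out = [ "abcdef", "abcde ", " bcdef" ];

# Turn each string into an array of single characters (one per "column").
my $aoa = [ map { [ split // ] } @$out ];

is scalar(@$aoa), 3, 'three rows';
is_deeply $aoa->[1], [ 'a', 'b', 'c', 'd', 'e', ' ' ],
    'the padding space is kept as its own column';

# Tells Test::More that all tests have run, so no plan is needed up front.
done_testing;
```

After this, something like $aoa->[2][0] gives the single character in row 2, column 0, which is the kind of access that an array of plain strings can't provide.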

        Again I'm gonna try to keep my response short vertically by putting the bulk of it in readmore tags. If you like this tack, hey, throw a hermit an upvote.

        This indicates a lot of successes, including my subroutines for printing 2-d arrays being vindicated as not being broken. On github here: sscce

        I think this is an elegant solution. Thank you for your generous comments.

        I believe this has passed its first test. Output, then source, just one code tag, buyer beware:

        (Note you should call done_testing; after your tests.)

        Copy that, so, yahoo, right? I'll write a few more tests....