Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

I'm attempting to present an SSCCE. I realized that I can use directories on github, and I read somewhere that a good name for a default directory in perl development is 'lib'. I have a calling script which uses Path::Tiny to find an input file in 'lib'. The following link contains the entire source and input file: SSCCE (1.1 kB).

Update: In the time between posting the problem and receiving solutions, laden with tests, this workspace has grown to just under 5k. The first script, 1.sscce.pl, should still "work," but one can see how much distance we traveled by looking at 5.sscce.pl.

But one need not chase an external link to comment on the substance of my question. I'm a Fortran then a C guy, so I'm always tempted to go to external loops to set values in arrays. Likewise, the C technique of doing one char at a time is effective, but not extensible the way I desire. I do not seek solutions here that look like Fortran or C. The Perl solution is elegant and attainable.

Caller is pretty plain, but I'll put code and initial output between readmore tags:

$ pwd
/home/bob/Desktop/SSCCE-master/sscce
$ ls
1.sscce.pl  lib
$ cat 1.sscce.pl
#!/usr/bin/perl -w
use 5.011;
use Path::Tiny;
use lib "lib";
use Data::Dump;

my $path_to_file = path( "lib", "1.txt" );

my $guts = $path_to_file->slurp;
say "return1 is ";
say "$guts";
my @lines = $path_to_file->slurp;
say "return2 is ";
say "@lines";
my $ref_lines = \@lines;
my $width = 6;
my $length = 4;

my $ref_array = make_rectangular1( $guts, $width, $length );
#dd $ref_array;

sub make_rectangular1 {
  my ( $guts, $width, $length ) = @_;
  my $ref_array = "nothing yet";
  return $ref_array;
}

sub make_rectangular2 {
  my ( $ref_array, $width, $length ) = @_;
  my @new_array = @$ref_array;
  say "new array is ";
  say "@new_array";
  #my $ref_array = "nothing yet";
  return $ref_array;
}
__END__
$

Output is pretty clear:

$ ./1.sscce.pl
return1 is
abcdef
abcdefg
abcde
 bcdefgh
 bcd
return2 is
abcdef
abcdefg
abcde
 bcdefgh
 bcd
$

What I would like is for these inputs to get trimmed to an array of the size specified by $width and $length . Spaces added/substituted on right if necessary to pad each vector to the same size. There's a sub stubbed out for each of the slurp possibilities.

Also, what character is it that renders an entire space dark?

Thanks for your comment,

Re: rectangularizing input to become array
by haukex (Archbishop) on Feb 27, 2019 at 06:58 UTC
    I'm attempting to present an SSCCE.

    Note that an SSCCE is all about making things clear and easy for the people trying to help: no extra editing of the inputs, no guessing which input produces which output, no guessing whether commented-out code is relevant to the question, and so on. For me, an ideal question would contain three of PerlMonks' [download] links that I can right-click and do "Save As" on: the input file, the output file, and the source code, such that I can just download them and run perl script.pl input.txt | diff - output.txt and see what's going on. There are also a couple other options, such as embedding the input in the source in the form of a __DATA__ section or just a variable, and embedding the output in the source and using e.g. Test::More to check if it matches.

    In your question, you've included a link to GitHub, which might go down sometime in the future (plus it's a couple more clicks), and also, in your <code> tags you've included the command-line invocations that I'd have to trim, and that aren't really necessary here to show others how to run the script or what's going on. Also, since the question is about whitespace, the dd invocation you commented out is actually more useful than a plain print here.

    I read somewhere that a good name for a default directory in perl development is 'lib'.

    Yes, but mostly for .pm files - the lib directory is usually what would get added to @INC such that use and require (and in some cases do) can find the files. Typically, the content of such directories is not what would get modified by a user or get modified during the run of a script.

    specified by $width and $length.

    Personally, I find those variable names a little confusing: You've got an array of strings, and $length sounds like it refers to the strings' length, but it seems like that's what $width is for. I might suggest $width/$height, $length/$height, $length/$rows, or $cols/$rows.

    What I would like is for these inputs to get trimmed to an array of the size specified by $width and $length . Spaces added/substituted on right if necessary to pad each vector to the same size.

    You could use substr for the trimming and sprintf for the output with padding.

    use warnings;
    use strict;
    use Data::Dump;

    my $input = <<'(END INPUT)';
    abcdef
    abcdefg
    abcde
     bcdefgh
     bcd	
    (END INPUT)

    my @lines = split /\n/, $input;
    dd \@lines;

    my $out = make_rectangular( \@lines, 4, 6 );
    dd $out;

    sub make_rectangular {
        my ( $lines, $maxrows, $maxlength ) = @_;
        my @out;
        my $rowcount=1;
        for my $line (@$lines) {
            my $trimmed = substr $line, 0, $maxlength;
            push @out, sprintf "%-*s", $maxlength, $trimmed;
            last if ++$rowcount>$maxrows;
        }
        return \@out;
    }

    __END__

    ["abcdef", "abcdefg", "abcde", " bcdefgh", " bcd\t"]
    ["abcdef", "abcdef", "abcde ", " bcdef"]

    (Note I've assumed you want to not modify the input array here.) Of course TIMTOWTDI, I could've mashed the code into a single map statement, but I hope this is a little more clear.
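For comparison, the single-map variant mentioned above might look something like the following sketch. The name make_rectangular_map is invented here for illustration; the substr/sprintf logic is the same as in the loop version, and the input array is likewise left unmodified.

```perl
use warnings;
use strict;

# Hypothetical single-map variant of make_rectangular
# (the sub name is made up for this sketch).
sub make_rectangular_map {
    my ( $lines, $maxrows, $maxlength ) = @_;

    # Index of the last row to keep: at most $maxrows rows,
    # and never past the end of the input.
    my $last = $#$lines < $maxrows - 1 ? $#$lines : $maxrows - 1;

    return [
        map { sprintf "%-*s", $maxlength, substr( $_, 0, $maxlength ) }
            @{$lines}[ 0 .. $last ]
    ];
}

my $out = make_rectangular_map(
    [ "abcdef", "abcdefg", "abcde", " bcdefgh", " bcd\t" ], 4, 6 );
print "[$_]\n" for @$out;
```

The square brackets in the print are only there to make trailing padding visible.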

    BTW, I'm not sure what you want to do with the tab character in your input file? This code counts it as a single character.

    Also, what character is it that renders an entire space dark?

    Do you mean the block drawing character U+2588, "█"? That would be "\N{U+2588}" or "\N{FULL BLOCK}". I might suggest U+2420, "␠" ("\N{U+2420}" or "\N{SYMBOL FOR SPACE}", see also). Note that for the \N{CHARNAME} variant, you may have to add use charnames ':full'; to your script, depending on your Perl version (newer versions load it automatically).
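As a small sketch of how U+2420 could be used to make padding visible when printing the rows: the helper name show_spaces is invented for this example, and it assumes the terminal accepts UTF-8 output.

```perl
use warnings;
use strict;

# Hypothetical helper (name invented for this sketch): replace every
# space with U+2420 SYMBOL FOR SPACE so that padding shows up on screen.
sub show_spaces {
    my ($str) = @_;
    ( my $visible = $str ) =~ s/ /\N{U+2420}/g;
    return $visible;
}

binmode STDOUT, ':encoding(UTF-8)';    # avoids "Wide character" warnings

print show_spaces("abcde "), "\n";
print show_spaces(" bcdef"), "\n";
```

The \N{U+XXXX} form is used here because it does not depend on the charnames pragma.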

      BTW, I'm not sure what you want to do with the tab character in your input file? This code counts it as a single character.

      I'm making every effort to quote haukex fairly, but I will re-order for thematic and write-up reasons. I threw the tab character in to be a possible problem. I think I deal with it with:

      $input=~ s/\t/ /g;

      I'm also trying to make the write-up as austere as it can be in terms of using vertical space, so I will continue in readmore tags. I think I get more eyes if people don't have to scroll down to continue finding good content, and the thread might read more about the solutions as opposed to the problem. I haven't even gotten to the third one yet.

      using e.g. Test::More to check if it matches

      What I seek to do is pass the first test...then others....

      Many thanks and best regards from America,
        So there's the base directory of the script. I wouldn't want output there. ... would you rather put such a thing in our one and only subdirectory or split input and output into separate directories?

        Here's an idea for how to handle a script with a library.pm file or two that goes with it:

        • Say /home/user is the base directory.
        • I put my script in e.g. /home/user/myscript.
          • Libraries (.pms) could go in the same directory, or in /home/user/myscript/lib, that doesn't really make a difference for small scripts - if you've got a lot of .pm files then a lib dir is a good idea.
          • Ideally, /home/user/myscript is also a git working copy - in which case input and output data doesn't really belong in that directory anyway, as otherwise it'd have to be added to .gitignore.
        • The script can be made to not worry about which directory it is located in using code like this:
          use FindBin; use lib $FindBin::Bin;
          Or, if there's a lib subdirectory, using the following (platform-independent) code:
          use FindBin; use File::Spec::Functions qw/catdir/; use lib catdir($FindBin::Bin, 'lib');
        • You can put your input data in e.g. /home/user/mydata, cd to that directory, and run your script with e.g. perl ../myscript/script.pl input.txt, and it should generate its output in the current directory.
        • If it's a script you use a lot, and you don't want to type out its path all the time, you could add it to your PATH. For example, on a couple of my boxes, I have lines like this in my ~/.profile: test -d "$HOME/myscript" && PATH="$HOME/myscript:$PATH" (the script needs to be chmod u+x for this to work).
        Should I go update that on the original post?

        I think in this case you don't need to, it's just for future reference, thanks.

        in your code tags you've included the command-line invocations that I'd have to trim
        I tend to think that it provides context ... Might pre tags work here?

        Yes you're right - I didn't mean to make it sound like it's not a good idea, context can certainly be useful in some cases - the main point was not to put it in the same <code> tag as the code, to make downloading easier. <pre> tags have the issue that HTML and PerlMonks special characters have to be escaped (as you can see your <pre> tag has been rendered with links in it), so two separate sets of <code> tags work. Or, here's how I might have written that post (note you can use <code> tags in paragraphs as well):

        Here is the script 3.rm.pl, which I run via ./3.rm.pl:

        #!/usr/bin/perl -w
        use 5.011;
        ...

        And here is the output:

        ["abcdef", "abcdefg", "abcde", " bcdefgh", " bcd "]
        ["abcdef", "abcdef", "abcde ", " bcdef"]
        ...

        Also, command lines like cat or perl script.pl are simple enough that we usually don't need to see them, it only becomes important when there are additional arguments involved. (And for some questions, it can be relevant whether a script was invoked as ./script.pl or perl script.pl, but that's not too often.)

        What I seek to do is pass the first test...then others....

        Sometimes it can be very useful to write the tests first, as it forces one to think about the API and what the output should ideally look like.

        Can't use string ("abcdef") as an ARRAY ref while "strict refs" in use at ./3.rm.pl line 59.

        getsubset expects an array of arrays, but $out is just an array of strings. Assuming you want each character to be a "column", you could do $out = [ map { [split //] } @$out ]; after the $out = make_rectangular(...) call, or integrate it directly into the push in your make_rectangular like so: push @out, [ split //, sprintf "%-*s", $maxlength, $trimmed ]; - either of those changes makes your test pass. (Note you should call done_testing; after your tests.)
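Put together, the second suggestion plus a few tests could be sketched as follows. The sub name make_rectangular_aoa and the particular test cases are assumptions made for this example; getsubset is not shown in the thread, so it is not called here.

```perl
use warnings;
use strict;
use Test::More;

# Variant of make_rectangular that returns an array of arrays, one
# single-character element per column (sub name invented for this sketch).
sub make_rectangular_aoa {
    my ( $lines, $maxrows, $maxlength ) = @_;
    my @out;
    my $rowcount = 1;
    for my $line (@$lines) {
        my $trimmed = substr $line, 0, $maxlength;
        push @out, [ split //, sprintf "%-*s", $maxlength, $trimmed ];
        last if ++$rowcount > $maxrows;
    }
    return \@out;
}

my $out = make_rectangular_aoa(
    [ "abcdef", "abcdefg", "abcde", " bcdefgh", " bcd\t" ], 4, 6 );

is( scalar @$out, 4, 'four rows are returned' );
is_deeply( $out->[1], [qw/a b c d e f/],       'long row is trimmed' );
is_deeply( $out->[2], [ qw/a b c d e/, ' ' ],  'short row is padded' );
is_deeply( $out->[3], [ ' ', qw/b c d e f/ ],  'leading space is kept' );

done_testing;
```

Writing the is_deeply lines before the sub exists is one way to pin down the expected shape of the data first.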

Re: rectangularizing input to become array
by Athanasius (Archbishop) on Feb 27, 2019 at 07:31 UTC

    Hello Aldebaran,

    Just a couple of points in haukex’s excellent answer that I would like to emphasise:

    1. The most important data missing from your SSCCE is your desired output. “A picture is worth a thousand words.”
    2. This line:
      my @lines = $path_to_file->slurp;
      almost certainly doesn’t do what you think it does. The documentation for Path::Tiny::slurp says that it “Reads file contents into a scalar.” So after that line of code is executed, the array @lines contains a single entry, identical to the string previously assigned to $guts. To get an array of lines, you need to split the string, either on newlines as haukex showed:
      my $guts = $path_to_file->slurp; my @lines = split /\n/, $guts;
      or using the special multiline pattern documented in split:
      my $guts = $path_to_file->slurp; my @lines = split /^/, $guts;
      The latter preserves newlines in the input data, including blank lines at the end of the input file; the former does not.
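The difference between the two split forms can be checked with a short sketch. The string below is a made-up stand-in for slurped file contents that end in a blank line.

```perl
use warnings;
use strict;

# Stand-in for slurped file contents: two lines of text plus one blank line.
my $guts = "abcdef\nabcde\n\n";

# Splitting on newlines removes them and drops trailing empty fields.
my @no_newlines = split /\n/, $guts;

# Splitting on /^/ (treated as a multiline match) keeps each line's "\n",
# so the trailing blank line survives as its own element.
my @with_newlines = split /^/, $guts;

printf "split /\\n/ gives %d elements\n", scalar @no_newlines;
printf "split /^/   gives %d elements\n", scalar @with_newlines;
```

With this input the first form yields two elements and the second yields three.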

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      my @lines = $path_to_file->slurp; almost certainly doesn’t do what you think it does.

      It did not. I've been whittling this SSCCE down from a river of mojibake and woe, my soaked and freezing body wondering where my skills to deal with such environments have been. Well, getting the logic from Path::Tiny wrong was one thing that had me beat. I was reading this as:

      @lines = $file->lines;

      , which, I believe would produce different results. It's very difficult to diagnose path and file input problems from the net, but you and haukex have done exactly that. Thank you.

      What can the OP do about misapprehension? (Open question) I would like to introduce a little bit of code to test whether I have these data represented correctly. I frequently find that I'm off by a pair of square brackets or quotes and commas. I'll use readmore tags for output then new source for the caller.

      Thanks all for comments,

      2019-03-01 Athanasius changed one set of pre tags to code tags

Re: rectangularizing input to become array
by johngg (Canon) on Feb 27, 2019 at 11:38 UTC

    Note also, as an alternative to substr, that pack with the A template will either truncate or pad to the right with spaces.

    johngg@shiraz:~/perl/Monks$ perl -Mstrict -Mwarnings -MData::Dumper -e '
    open my $inFH, q{<}, \ <<__EOD__ or die $!;
    abcdef
    abcdefg
    abcde
     bcdefgh
     bcd
    __EOD__

    chomp( my @lines = <$inFH> );
    close $inFH or die $!;

    my $width = 6;
    my $height = 4;
    my $raRect = makeRect( \ @lines, $width, $height );
    print Data::Dumper->Dumpxs( [ $raRect ], [ qw{ raRect } ] );

    sub makeRect
    {
        my( $raLines, $width, $height ) = @_;
        my @rect;
        push @rect, pack qq{A$width}, shift @{ $raLines } for 1 .. $height;
        return \ @rect;
    }'
    $raRect = [
                'abcdef',
                'abcdef',
                'abcde ',
                ' bcdef'
              ];

    I hope this is of interest.

    Cheers,

    JohnGG

      pack with the A template

      The many and various uses of pack. I'm glad to have another useful example. What's the A template?

      Code replication between readmore tags, invocation, output, then source.

      Can you say a few words about this line of code (I've never seen this before in Data::Dumper)?

      print Data::Dumper->Dumpxs( [ $raRect ], [ qw{ raRect } ] );

      Thanks.

        Unlike Dumper(), Data::Dumper->Dump() and ->Dumpxs() are not exported by Data::Dumper. However, for a little extra effort they do provide clearer output, especially when examining multiple data structures. They allow you to distinguish arrays from references to arrays, and likewise for hashes. Many Monks recommend and use the more modern and flexible Data::Dump module, but Data::Dumper has the advantage of having been in core from way back, so it is useful when maintaining elderly servers in a closed environment running, say, Perl 5.8 or even earlier. Here is a simple example.

        I hope this is helpful.

        Cheers,

        JohnGG

        What's the A template?

        As documented in pack, it's a "text (ASCII) string, will be space padded." I showed the differences between some of those pack templates here.
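A brief sketch of how the A template behaves, next to the lowercase a template for contrast. The sub make_rect_copy is a hypothetical variant of makeRect that works on a copy instead of shifting lines off the caller's array; padding missing rows with spaces is an assumption of this sketch, not something the thread specifies.

```perl
use warnings;
use strict;

my $padded    = pack "A6", "abc";         # too short: padded with spaces
my $truncated = pack "A6", "abcdefgh";    # too long: cut to 6 characters
my $nul_pad   = pack "a6", "abc";         # lowercase a pads with NUL bytes

# Hypothetical non-destructive variant of makeRect (name invented here).
sub make_rect_copy {
    my ( $raLines, $width, $height ) = @_;
    my @rows = @$raLines;      # copy, so the caller's array is untouched
    $#rows = $height - 1;      # truncate or extend to exactly $height rows
    # Assumption: rows missing from the input become all-space rows.
    return [ map { pack "A$width", defined $_ ? $_ : q{} } @rows ];
}

my @lines = ( "abcdefg", "ab" );
print "[$_]\n" for @{ make_rect_copy( \@lines, 4, 3 ) };
```

Copying first also avoids the undef that shift would return once the input runs out of lines before $height is reached.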

        Can you say a few words about this line of code (I've never seen this before in Data::Dumper)? print Data::Dumper->Dumpxs( [ $raRect ], [ qw{ raRect } ] );

        Simplifying a lot, Dumpxs is just another name for Dump. (See Where is Data::Dumper->Dumpx?)
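To illustrate the named form of Dump discussed above, here is a minimal sketch. The variables @rect and %size are invented for the example and are not taken from the thread.

```perl
use warnings;
use strict;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable hash key order in the output

my @rect = ( 'abcdef', 'abcde ' );
my %size = ( width => 6, height => 2 );

# The second array ref supplies a name for each dumped value.
# A plain name dumps the reference as a scalar:   $size = { ... };
# A name starting with * dumps the dereferenced type: @rect = ( ... );
my $dump = Data::Dumper->Dump( [ \@rect, \%size ], [ qw{ *rect size } ] );
print $dump;
```

With the plain Dumper() function both values would instead be labelled $VAR1 and $VAR2.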