in reply to Re: rectangularizing input to become array

BTW, I'm not sure what you want to do with the tab character in your input file? This code counts it as a single character.

I'm making every effort to quote haukex fairly, but I will re-order for thematic and write-up reasons. I threw the tab character in to be a possible problem. I think I deal with it with:

$input=~ s/\t/ /g;
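
To make the tab question concrete, here is a minimal sketch comparing the substitution above with expanding tabs to tab stops. The sample line is hypothetical (it is not the thread's actual input), and Text::Tabs with its default 8-column tab stop is only one possible alternative, not something the thread settled on.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand() and $tabstop by default

my $line = "\tbcd";    # hypothetical sample line: one tab, then three letters

# The substitution from the post: every tab becomes exactly one space.
( my $one_space = $line ) =~ s/\t/ /g;

# Alternative: pad each tab out to the next tab stop.
my $expanded = expand($line);

printf "original : %d characters\n", length $line;
printf "one space: %d characters\n", length $one_space;
printf "expanded : %d characters\n", length $expanded;
```

With the one-space substitution the line keeps its length, so column positions match what a tab-counts-as-one-character approach would see; expanding to tab stops changes the length and therefore which characters survive truncation.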

I'm also trying to make the write-up as austere as it can be in terms of using vertical space, so I will continue in readmore tags. I think I get more eyes if people don't have to scroll down to continue finding good content, and the thread might read more about the solutions as opposed to the problem. I haven't even gotten to the third one yet.

Typically, the content of such directories is not what would get modified by a user or get modified during the run of a script.

I'm reminded of what someone told me the first time I got on the U-Bahn in Berlin with my bike: "Das hast Du voellig verkehrt getan." ("You did that completely wrong.") I figured out what signs he was pointing at and never made the same mistake. Were it only one and done with perl....

So there's the base directory of the script. I wouldn't want output there. I've now put a module in the lib folder that presumably would be done before release. It is certainly not in such a state now. Given that we need a "hard" output file, would you rather put such a thing on our one and only subdirectory or split input and output into separate directories? Won't that complicate diff'ing?

you've included a link to GitHub, which might go down sometime in the future

And they may change over time. For example, I added a module to lib/ which brought the size of the zip file from 1.1 to 3.1 k. Should I go update that on the original post?

For this subthread, I would like to stick to the SSCCE input you offered.

in your code tags you've included the command-line invocations that I'd have to trim

I tend to think that it provides context; where I might describe a situation completely "verkehrt," the computer commands inform the sleuths who can divine what I am actually asking my computer to do. Sometimes I can get too cute. Sometimes, I can't read my own input or output. We have to be somewhat ecumenical about the starts of scripts. Might pre tags work here? I'll put pre tags on the output and invocation line, and leave everything inside the code tags as an executable script.

$ ./3.rm.pl 
"abcdef", "abcdefg", "abcde", " bcdefgh", " bcd "
"abcdef", "abcdef", "abcde ", " bcdef"
inside first anonymous block
Can't use string ("abcdef") as an ARRAY ref while "strict refs" in use at ./3.rm.pl line 59.
$ cat 3.rm.pl 

#!/usr/bin/perl -w
use 5.011;
use Data::Dump;

my $input = <<'(END INPUT)';
abcdef
abcdefg
abcde
 bcdefgh
 bcd 
(END INPUT)

$input=~ s/\t/ /g;
my @lines = split /\n/, $input;
dd \@lines;
my $out = make_rectangular( \@lines, 4, 6 );
dd $out;

use Test::More;
{
  say "inside first anonymous block";
  my $subset = getsubset( $out, "R1" );
  is_deeply $subset, [ [ 'a' .. 'f' ] ];
  say "exiting first anonymous block";
}

sub make_rectangular {
  my ( $lines, $maxrows, $maxlength ) = @_;
  my @out;
  my $rowcount=1;
  for my $line (@$lines) {
    my $trimmed = substr $line, 0, $maxlength;
    push @out, sprintf "%-*s", $maxlength, $trimmed;
    last if ++$rowcount>$maxrows;
  }
  return \@out;
}

sub rangeparse {
  use Carp;
  local $_ = shift;
  my @o;    # [ row1,col1, row2,col2 ] (-1 = last row/col)
  if ( @o = /\AR([0-9]+|n)C([0-9]+|n):R([0-9]+|n)C([0-9]+|n)\z/ ) { }
  elsif (/\AR([0-9]+|n):R([0-9]+|n)\z/) { @o = ( $1, 1,  $2, -1 ) }
  elsif (/\AC([0-9]+|n):C([0-9]+|n)\z/) { @o = ( 1,  $1, -1, $2 ) }
  elsif (/\AR([0-9]+|n)C([0-9]+|n)\z/)  { @o = ( $1, $2, $1, $2 ) }
  elsif (/\AR([0-9]+|n)\z/)             { @o = ( $1, 1,  $1, -1 ) }
  elsif (/\AC([0-9]+|n)\z/)             { @o = ( 1,  $1, -1, $1 ) }
  else { croak "failed to parse '$_'" }
  $_ eq 'n' and $_ = -1 for @o;
  return \@o;
}

sub getsubset {
  use Carp;
  my ( $data, $range ) = @_;
  my $cols = @{ $$data[0] };
  @$_ == $cols or croak "data not rectangular" for @$data;
  $range = rangeparse($range) unless ref $range eq 'ARRAY';
  @$range == 4 or croak "bad size of range";
  my @max = ( 0 + @$data, $cols ) x 2;
  for my $i ( 0 .. 3 ) {
    $$range[$i] = $max[$i] if $$range[$i] < 0;
    croak "index $i out of range"
      if $$range[$i] < 1 || $$range[$i] > $max[$i];
  }
  croak "bad rows $$range[0]-$$range[2]" if $$range[0] > $$range[2];
  croak "bad cols $$range[1]-$$range[3]" if $$range[1] > $$range[3];
  my @cis = $$range[1] - 1 .. $$range[3] - 1;
  return [
    map {
      sub { \@_ }
        ->( @{ $$data[$_] }[@cis] )
    } $$range[0] - 1 .. $$range[2] - 1
  ];
}
__END__

The ultimate two routines are from Selecting Ranges of 2-Dimensional Data, and work fine with other data.

using e.g. Test::More to check if it matches

What I seek to do is pass the first test...then others....

Vielen Dank und schoenen Gruss aus Amiland (many thanks and kind regards from America),

Re^3: rectangularizing input to become array
by haukex (Archbishop) on Feb 28, 2019 at 23:08 UTC
    So there's the base directory of the script. I wouldn't want output there. ... would you rather put such a thing on our one and only subdirectory or split input and output into separate directories?

    Here's an idea for how to handle a script with a library.pm file or two that goes with it:

    • Say /home/user is the base directory.
    • I put my script in e.g. /home/user/myscript.
      • Libraries (.pms) could go in the same directory, or in /home/user/myscript/lib, that doesn't really make a difference for small scripts - if you've got a lot of .pm files then a lib dir is a good idea.
      • Ideally, /home/user/myscript is also a git working copy - in which case input and output data doesn't really belong in that directory anyway, as otherwise it'd have to be added to .gitignore.
    • The script can be made to not worry about which directory it is located in using code like this:
      use FindBin; use lib $FindBin::Bin;
      Or, if there's a lib subdirectory, using the following (platform-independent) code:
      use FindBin; use File::Spec::Functions qw/catdir/; use lib catdir($FindBin::Bin, 'lib');
    • You can put your input data in e.g. /home/user/mydata, cd to that directory, and run your script with e.g. perl ../myscript/script.pl input.txt, and it should generate its output in the current directory.
    • If it's a script you use a lot, and you don't want to type out its path all the time, you could add it to your PATH. For example, on a couple of my boxes, I have lines like this in my ~/.profile: test -d "$HOME/myscript" && PATH="$HOME/myscript:$PATH" (the script needs to be chmod u+x for this to work).
    Should I go update that on the original post?

    I think in this case you don't need to, it's just for future reference, thanks.

    in your code tags you've included the command-line invocations that I'd have to trim
    I tend to think that it provides context ... Might pre tags work here?

    Yes you're right - I didn't mean to make it sound like it's not a good idea, context can certainly be useful in some cases - the main point was not to put it in the same <code> tag as the code, to make downloading easier. <pre> tags have the issue that HTML and PerlMonks special characters have to be escaped (as you can see your <pre> tag has been rendered with links in it), so two separate sets of <code> tags work. Or, here's how I might have written that post (note you can use <code> tags in paragraphs as well):

    Here is the script 3.rm.pl, which I run via ./3.rm.pl:

    #!/usr/bin/perl -w
    use 5.011;
    ...

    And here is the output:

    ["abcdef", "abcdefg", "abcde", " bcdefgh", " bcd "]
    ["abcdef", "abcdef", "abcde ", " bcdef"]
    ...

    Also, command lines like cat or perl script.pl are simple enough that we usually don't need to see them, it only becomes important when there are additional arguments involved. (And for some questions, it can be relevant whether a script was invoked as ./script.pl or perl script.pl, but that's not too often.)

    What I seek to do is pass the first test...then others....

    Sometimes it can be very useful to write the tests first, as it forces one to think about the API and what the output should ideally look like.

    Can't use string ("abcdef") as an ARRAY ref while "strict refs" in use at ./3.rm.pl line 59.

    getsubset expects an array of arrays, but $out is just an array of strings. Assuming you want each character to be a "column", you could do $out = [ map { [split //] } @$out ]; after $out = make_rectangular(..., or integrate it directly into the push in your make_rectangular like so: push @out, [ split //, sprintf "%-*s", $maxlength, $trimmed ]; - either of those changes makes your test pass. (Note you should call done_testing; after your tests.)
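
To illustrate the second suggestion, here is a self-contained sketch of a make_rectangular that returns an array of arrays, checked with Test::More and ending in done_testing. The loop is lightly simplified compared with the posted sub (it counts rows via @out instead of a separate counter), the input lines are taken from the dumped output earlier in the thread, and the test descriptions are made up for illustration.

```perl
use strict;
use warnings;
use Test::More;

# Pad or truncate each line to $maxlength, keep at most $maxrows rows,
# and return an array of arrays with one character per column.
sub make_rectangular {
    my ( $lines, $maxrows, $maxlength ) = @_;
    my @out;
    for my $line (@$lines) {
        last if @out >= $maxrows;
        my $trimmed = substr $line, 0, $maxlength;
        push @out, [ split //, sprintf "%-*s", $maxlength, $trimmed ];
    }
    return \@out;
}

my @lines = ( "abcdef", "abcdefg", "abcde", " bcdefgh", " bcd " );
my $out   = make_rectangular( \@lines, 4, 6 );

is scalar(@$out), 4, 'four rows kept';
is_deeply $out->[0], [ 'a' .. 'f' ], 'first row split into characters';
is_deeply $out->[2], [ 'a' .. 'e', ' ' ], 'short row padded with a space';
is scalar( grep { @$_ == 6 } @$out ), 4, 'every row has six columns';

done_testing;
```

Because every row is now an array reference of equal length, the result has the shape that getsubset's "data not rectangular" check expects.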

      Again I'm gonna try to keep my response short vertically by putting the bulk of it in readmore tags. If you like this tack, hey, throw a hermit an upvote.

      This indicates a lot of successes, including my subroutines for printing 2-d arrays being vindicated as not being broken. On github here: sscce

      I think this is an elegant solution. Thank you for your generous comments.

        you could add it to your PATH
        I did all this, but what has it availed me?
        $ ./5.sscce.pl
        bash: ./5.sscce.pl: No such file or directory

        Note that ./script.pl tells the shell to only look in the current directory. If you run a shell command without the ./ or any other path name, the shell will look in your PATH for a file by that name and run it (it needs to have executable permissions).
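
The lookup described above can be imitated in a few lines of Perl. This is only a rough sketch of the idea, not how any particular shell implements it; find_in_path is a hypothetical helper name, and the default command name 'perl' is just a placeholder.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first executable file called $name found in the PATH
# directories, or nothing if there is none.
sub find_in_path {
    my ($name) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $name );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Placeholder default; pass any command name as the first argument.
my $name  = @ARGV ? $ARGV[0] : 'perl';
my $found = find_in_path($name);
print defined $found ? "$name -> $found\n" : "$name: not found in PATH\n";
```

Note that the current directory is only searched if it is itself listed in PATH, which is why a script in PATH is called by its bare name rather than with a leading ./.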

        use lib "lib"; ... I don't completely understand how and why people use it.

        Once you've added the script to your PATH and are able to call it from any directory, relative paths used to load libraries will no longer work. That's what my FindBin example does: it'll locate the directory where the script is, no matter what the current working directory is, and then use the lib directory relative to the script's location, not the current working directory.
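
A small sketch that makes the difference visible: it prints the script's own directory next to the current working directory, plus the lib directory that ends up in @INC. The directory name lib follows the example earlier in this thread; the output labels are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw/getcwd/;
use FindBin;
use File::Spec::Functions qw/catdir/;
use lib catdir( $FindBin::Bin, 'lib' );    # resolved relative to the script

# $FindBin::Bin is the directory the script lives in;
# getcwd() is wherever the script was started from.
print "script dir : $FindBin::Bin\n";
print "working dir: ", getcwd(), "\n";
print "lib dir    : $INC[0]\n";    # 'use lib' puts its directory first in @INC
```

Run from the script's own directory, the first two lines should match; run from a data directory, only the working directory should change, while the lib directory stays tied to the script's location.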

      I believe this has passed its first test. Output, then source, just one code tag, buyer beware:

      (Note you should call done_testing; after your tests.)

      Copy that, so, yahoo, right? I'll write a few more tests....