STDIN typeglob

Bod has asked for the wisdom of the Perl Monks concerning the following question:

Re: STDIN typeglob by hv (Prior) on Jun 11, 2023 at 19:40 UTC
One of the benefits of writing tests (and particularly of TDD) is that it can give you a signal about your interface: if something is difficult to write tests for, maybe it's the interface that should change. A module providing a function that reads from STDIN would be an example of that: perhaps it would be easier to test, _and_ provide a more powerful, flexible function, if the function were to accept the filehandle to read from as an argument instead. A typical way to use such a function to read from STDIN would be to pass a reference to the glob: `MyModule::function(\*STDIN);`

Typeglobs aren't really a "legacy left over from the days before Perl had references"; rather, they expose aspects of how Perl works internally. The introduction of references certainly reduced the number of situations where one needs to use globs, but because filehandles and directory handles don't have their own sigil to address them directly (the way `$STDIN, @STDIN, %STDIN, &STDIN` do), a glob reference as in the example above is still a perfectly fine way to access them.

Another use is getting clever with generated code, for example to auto-generate accessors for an object:

    for my $accessor (qw{ foo bar }) {
        my $method = sub {
            my($self) = @_;
            return $self->{$accessor};
        };
        # inject it as a named subroutine
        no strict 'refs';
        *$accessor = $method;
    }

This works due to one of the "magic" aspects of globs: if you assign a reference to a glob, it will store the thing referenced in the appropriate slot. In this case we are assigning a subroutine reference, so that loop creates subroutines "foo" and "bar" (almost) exactly as if we had defined them in the normal way like:

    sub foo {
        my($self) = @_;
        return $self->{foo};
    }
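As a rough illustration of the "accept the filehandle as an argument" suggestion, here is a minimal sketch. The function name `slurp_handle` and the sample input are hypothetical and not taken from any module discussed in this thread; the in-memory filehandle simply stands in for STDIN during a test.

```perl
use strict;
use warnings;

# Hypothetical function: reads everything from whatever handle it is given.
sub slurp_handle {
    my ($fh) = @_;
    local $/;                 # slurp mode: read up to end of file
    my $content = <$fh>;
    return defined $content ? $content : '';
}

# In a test, an in-memory filehandle stands in for STDIN.
my $fake_input = qq({"object":"event"}\n);
open my $test_fh, '<', \$fake_input
    or die "Cannot open in-memory handle: $!";
my $from_test = slurp_handle($test_fh);
close $test_fh;

# In production the caller would pass the real handle instead:
#   my $from_stdin = slurp_handle(\*STDIN);
```

Because the function never names STDIN itself, the test needs no glob tricks at all; the choice of input source moves to the caller.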
Re^2: STDIN typeglob by afoken (Chancellor) on Jun 12, 2023 at 09:35 UTC
"One of the modules I have recently released to CPAN reads JSON data from STDIN on a webserver."

"One of the benefits of writing tests (and particularly of TDD) is that it can give you a signal about your interface: if something is difficult to write tests for, maybe it's the interface that should change."

Reading from STDIN on a webserver sounds very much like CGI. That's not a problem as such, but there are many other interfaces where data from the web browser is not passed via STDIN. Even FastCGI, which is only a tiny step away from CGI, does not use that simple interface (but FCGI and CGI::Fast can do a lot to hide that fact). And when it comes to other interfaces to webservers, like mod_perl, STDIN is not used at all (again, there are compatibility layers like ModPerl::Registry).

In other words, passing a handle to the reading function might be a smarter solution. Perhaps your module should not fetch the data at all, but just accept the data as a scalar value. Both would also allow for easier testing.

Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^3: STDIN typeglob by eyepopslikeamosquito (Archbishop) on Jun 12, 2023 at 13:17 UTC
"Perhaps, your module should not fetch the data at all, but just accept the data as a scalar value. Both would also allow for easier testing."

That is exactly what I was thinking! In the hope it clarifies Bod's question, I think the module in question is Business::Stripe::Webhook, whose version `1.0` constructor is:

    sub new {
        my $class = shift;
        my %vars  = @_;

        $vars{'error'} = '';

        $vars{'reply'} = {
            'status'      => 'noaction',
            'sent_to'     => [ ],
            'sent_to_all' => 'false',
        };

        if (exists $ENV{'GATEWAY_INTERFACE'}) {
            read(STDIN, $vars{'payload'}, $ENV{'CONTENT_LENGTH'});
            $vars{'webhook'} = decode_json($vars{'payload'}) if $vars{'payload'};
            $vars{'error'}   = 'No payload data' unless $vars{'webhook'};
        } else {
            $vars{'error'} = 'Looks like this is not a web request!';
        }

        return bless \%vars, $class;
    }

Though I'm definitely not a Web programmer, from an interface and TDD point of view, I pulled a face the instant I saw the constructor using an environment variable to decide whether to read from `STDIN` or not. It would be clearer and easier to test if this module simply accepted a `payload` property. That way, the module's tests can easily pass in all sorts of dodgy payloads to see how it handles bad input. That is, instead of trying to do everything in one module, use several smaller, more cohesive modules to get the job done.
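To make the "simply accept a `payload` property" idea concrete, here is a minimal sketch. It is not the actual Business::Stripe::Webhook code: the package name `My::Webhook::Sketch`, the `'Invalid JSON payload'` message, and the use of the core JSON::PP module are all assumptions made for illustration.

```perl
use strict;
use warnings;

package My::Webhook::Sketch;   # hypothetical name, not the CPAN module

use JSON::PP qw(decode_json);  # core module; the real module may use another backend

sub new {
    my ($class, %vars) = @_;
    $vars{error} = '';

    if (defined $vars{payload} && length $vars{payload}) {
        # decode_json dies on malformed input, so trap that and record an error
        $vars{webhook} = eval { decode_json($vars{payload}) };
        $vars{error}   = 'Invalid JSON payload' unless defined $vars{webhook};
    }
    else {
        $vars{error} = 'No payload data';
    }
    return bless \%vars, $class;
}

sub error { return $_[0]{error} }

package main;

# The caller decides where the payload comes from (STDIN, a file, a test string).
my $good = My::Webhook::Sketch->new(payload => '{"object":"event"}');
my $bad  = My::Webhook::Sketch->new(payload => '{"object":');
my $none = My::Webhook::Sketch->new();
```

With this shape, a test can feed in truncated or otherwise dodgy payloads directly, without setting `GATEWAY_INTERFACE` or touching `STDIN`.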
Re^4: STDIN typeglob by afoken (Chancellor) on Jun 12, 2023 at 19:53 UTC
Re^5: STDIN typeglob by Bod (Parson) on Jun 13, 2023 at 20:42 UTC
Re^4: STDIN typeglob by Bod (Parson) on Jun 13, 2023 at 20:58 UTC
Re^5: STDIN typeglob by eyepopslikeamosquito (Archbishop) on Jun 14, 2023 at 00:24 UTC
Re^2: STDIN typeglob by Bod (Parson) on Jun 11, 2023 at 21:48 UTC
"if something is difficult to write tests for, maybe it's the interface that should change"

It's not that it is difficult to write the tests; it is difficult to simulate the module being connected to the API when it is being tested and isn't connected to a live API.
Re^3: STDIN typeglob by hippo (Archbishop) on Jun 11, 2023 at 22:07 UTC
You could perhaps use one of the various mocking modules to accomplish this. I often use Test::MockModule but there are plenty of others to choose from too (see for example this tutorial for Test::MockObject). 🦛
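The mocking modules mentioned above automate the replacement of subroutines during a test. The sketch below does not use Test::MockModule or Test::MockObject; it shows the same idea by hand with `local` on a typeglob, which ties in with the topic of this thread. The package `My::Stripe::Client` and its methods are hypothetical stand-ins for code that would normally contact a live API.

```perl
use strict;
use warnings;

package My::Stripe::Client;    # hypothetical stand-in for code that talks to a live API

sub fetch_event {
    die "would contact the live API here\n";
}

sub event_type {
    my ($class, $id) = @_;
    my $event = $class->fetch_event($id);
    return $event->{type};
}

package main;

sub event_type_with_mock {
    my ($id) = @_;
    no warnings 'redefine';
    # Replace the sub for the duration of this scope only; the original
    # is restored automatically when the enclosing sub returns.
    local *My::Stripe::Client::fetch_event = sub {
        my ($class, $event_id) = @_;
        return { id => $event_id, type => 'invoice.paid' };
    };
    return My::Stripe::Client->event_type($id);
}

my $type = event_type_with_mock('evt_test');
```

A dedicated mocking module adds conveniences on top of this, but the scoping behaviour of `local` is what keeps the fake from leaking into other tests.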
Re: STDIN typeglob by pryrt (Abbot) on Jun 11, 2023 at 17:32 UTC
open's documentation has a whole section on duplicating filehandles, which shows a different perlish way of duplicating (and temporarily overriding) the standard filehandles like STDIN. But using typeglobs for such a thing doesn't seem inherently un-perlish to me (not that I'm an expert) -- though I cannot explain it to you in technical terms.

You can seek on the DATA filehandle, though since it's actually partway through the active file, you'll want to use tell on DATA before doing any input from DATA so you'll know where to seek to in order to find the beginning of the `__DATA__` section. There is some more on `__DATA__` in Special Literals.
Re: STDIN typeglob by kcott (Archbishop) on Jun 12, 2023 at 02:50 UTC
G'day Bod,

"... reads JSON data from `STDIN` on a webserver. ... I have had to simulate this."

There are a number of ways a program can read from `STDIN`:

    # By default
    $ cat > TabNL
    Tab	NL

    # By redirection
    $ cat -vet < TabNL
    Tab^INL$

    # By piping
    $ cat TabNL | cat -vet
    Tab^INL$

    # By using '-' as a special filename
    $ cat -vet -
    Tab	NL
    Tab^INL$

    # By others I didn't immediately think of

Which of those methods does your module use? Knowing this will allow us to better advise you on ways to perform the simulation. :-)

"Using code I found in an answer on SO ... `*STDIN = *DATA;` ..."

That's not really simulating `STDIN`. You're just rebadging an existing filehandle:

    $ perl -E '
    use strict;
    use warnings;
    say "Real \\*STDIN fileno: ", fileno(\*STDIN);
    say "Real \\*DATA fileno: ", fileno(\*DATA);
    *STDIN = *DATA;
    say "Fake \\*STDIN fileno: ", fileno(\*STDIN);
    __DATA__
    some data
    '
    Real \*STDIN fileno: 0
    Real \*DATA fileno: 3
    Fake \*STDIN fileno: 3

Beyond simulating the input, it would probably help to have some idea of what tests you intend to run.

Here's a test script that simulates JSON being piped to your application. It's subsequently decoded and compared with reference data via `is_deeply()` (presumably you'd have more useful tests here). Note how you can run the tests on multiple JSON files.

    ken@titan ~/tmp/pm_11152777_test_stdin/t
    $ cat test_json.t
    #!perl

    use strict;
    use warnings;
    use autodie;

    use Cwd 'abs_path';
    use File::Basename 'dirname';
    my $THISDIR;
    BEGIN { $THISDIR = dirname abs_path __FILE__ }

    use JSON::MaybeXS;
    use POSIX '_exit';
    use Test::More;

    my @file_bases = qw{test1 testA};

    plan tests => 0+@file_bases;

    for my $file_base (@file_bases) {
        my $json_data = '';

        my $child_pid = open my $from_kid, '-|';

        if ($child_pid) {
            # parent process (pipe from child):
            #     reads JSON from "effective" STDIN
            while (my $line = <$from_kid>) {
                $json_data .= $line;
            }
            waitpid $child_pid, 0;
        }
        else {
            # child process (pipe to parent):
            #     writes JSON to STDOUT
            my $json_file = "$THISDIR/../data/$file_base.json";
            open my $json_fh, '<', $json_file;
            while (my $line = <$json_fh>) {
                print $line;
            }
            _exit 0;
        }

        my $perl_data = decode_json($json_data);
        my $reference_data = do "$THISDIR/../data/$file_base.perl";

        is_deeply $perl_data, $reference_data, "Testing '$file_base'";
    }

Here's the test data:

    ken@titan ~/tmp/pm_11152777_test_stdin/data
    $ cat test1.json
    {
        "key1" : "val1",
        "key2" : [ "elem1", "elem2", "elem3" ],
        "key3" : { "name1" : "value1", "name2" : "value2" }
    }
    $ cat test1.perl
    {
        key1 => 'val1',
        key2 => [qw{elem1 elem2 elem3}],
        key3 => {name1 => 'value1', name2 => 'value2'},
    };
    $ cat testA.json
    {
        "keyA" : "valA",
        "keyB" : [ "elemA", "elemB", "elemC" ],
        "keyC" : { "nameA" : "valueA", "nameB" : "valueB" }
    }
    $ cat testA.perl
    {
        keyA => 'valA',
        keyB => [qw{elemA elemB elemC}],
        keyC => {nameA => 'valueA', nameB => 'valueB'},
    };

And here's an actual test run:

    ken@titan ~/tmp/pm_11152777_test_stdin
    $ prove -v t/test_json.t
    t/test_json.t ..
    1..2
    ok 1 - Testing 'test1'
    ok 2 - Testing 'testA'
    ok
    All tests successful.
    Files=1, Tests=2,  1 wallclock secs ( 0.01 usr  0.03 sys +  0.12 cusr  0.08 csys =  0.25 CPU)
    Result: PASS

— Ken
Re^2: STDIN typeglob by Bod (Parson) on Jun 13, 2023 at 21:06 UTC
"it would probably help to have some idea of what tests you intend to run"

Thanks kcott, eyepopslikeamosquito identified the `new` method in this comment - Re^3: STDIN typeglob

The test I am trying to run is like this, only with a bigger JSON object...

    #!perl
    use 5.006;
    use strict;
    use warnings;
    use Test::More;

    use Business::Stripe::Webhook;

    plan tests => 7;

    *STDIN = *DATA;

    my $webhook_fail = Business::Stripe::Webhook->new(
        'signing_secret' => 'whsec_...',
        'invoice-paid'   => \&pay_invoice,
    );

    ok( !$webhook_fail->success, "Didn't instantiate" );
    is( $webhook_fail->error, "Looks like this is not a web request!", "Not a web request" );

    # Pretend we are on a webserver
    $ENV{'GATEWAY_INTERFACE'}     = 'CGI/1.1';
    $ENV{'CONTENT_LENGTH'}        = 10024;
    $ENV{'HTTP_STRIPE_SIGNATURE'} = 't=ABCDEFGHIJ,v1=abcdefghij';

    my $webhook_pass1 = Business::Stripe::Webhook->new(
        'invoice-paid' => \&pay_invoice,
    );

    ok( $webhook_pass1->success, "Basic instantiation" );

    $webhook_pass1->process();

    my $webhook_fail2 = Business::Stripe::Webhook->new(
        signing_secret => 'whsec_...',
        'invoice-paid' => \&pay_invoice,
    );

    is( $webhook_fail2->error, 'No payload data', "No payload for signed instantiation" );

    $webhook_fail2->process();

    ok( !$webhook_fail2->success, "Signature error" );
    is( $webhook_fail2->error, 'Invalid Stripe Signature', "Invalid signature" );

    sub pay_invoice {
        is( $_[0]->{'object'}, 'event', "pay.invoice handled" );
    }

    __DATA__
    {
      "id": "evt_1NFK32EfkkexSbWLZb6LoEap",
      "object": "event",
      "api_version": "2020-08-27",
      "data": {
        "object": {
          "id": "in_1NFK30EfkkfpSbWLeMoI8HzB"
        }
      }
    }
Re^2: STDIN typeglob by Bod (Parson) on Jun 18, 2023 at 17:10 UTC
"Which of those methods does your module use?"

It reads `STDIN` like this... `read(STDIN, $vars{'payload'}, $ENV{'CONTENT_LENGTH'});`

But following advice given elsewhere, that is deprecated now, and it is up to the user to read `STDIN` or wherever else they want to get the data from. They then pass that to the constructor. The old behaviour is still there as a fallback - for now...
Re^3: STDIN typeglob by afoken (Chancellor) on Jun 18, 2023 at 18:35 UTC
"It reads STDIN [...] but following advice given elsewhere, that is deprecated now [...] That is still there as a fallback - for now..."

Why do you keep that around? Has your module already attracted a relevant number of users? If so, that's nice (stable API), and you should IMHO document a date after which you will remove reading STDIN. If not, just drop it completely.

Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^4: STDIN typeglob by Bod (Parson) on Jun 18, 2023 at 18:49 UTC
Re^5: STDIN typeglob by afoken (Chancellor) on Jun 18, 2023 at 19:57 UTC
Re: STDIN typeglob by ikegami (Patriarch) on Jun 11, 2023 at 22:35 UTC
`*STDIN = *DATA;` makes `*STDIN` the same glob as `*DATA`. That makes `*STDIN{IO}` the same as `*DATA{IO}`. And `<STDIN>` reads from `*STDIN{IO}` (i.e. the file handle associated with the name `STDIN`).
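The aliasing described above can be observed directly. The following is a small sketch that uses an in-memory filehandle instead of `DATA` (so it does not need a `__DATA__` section) and `local` so that the change to `*STDIN` is undone when the sub returns. The function name `first_line_via_stdin` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: aliases STDIN to an in-memory handle, then reads a line.
sub first_line_via_stdin {
    my ($text) = @_;
    open my $fh, '<', \$text
        or die "Cannot open in-memory handle: $!";

    # Alias the STDIN glob to the lexical handle's glob, for this scope only.
    local *STDIN = *$fh;

    # Both names now share one IO object, so <STDIN> reads from $fh.
    my $same_io = (*STDIN{IO} == *{$fh}{IO}) ? 1 : 0;
    my $line    = <STDIN>;

    return ($same_io, $line);
}

my ($same_io, $line) = first_line_via_stdin("line one\nline two\n");
```

Note that this only redirects reads made by Perl code in the same process; an in-memory handle has no file descriptor, so a child process would not see it as its standard input.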
Re: STDIN typeglob by Marshall (Canon) on Jun 11, 2023 at 19:09 UTC
I don't see anything wrong with your code, although I have heard that at some point bareword file handles are going to be deprecated.

I don't think "resetting" the __DATA__ input is what you want for multiple tests, although that is possible. The DATA file handle is a pre-opened file handle to your Perl script that is pre-seeked to the first byte of the line right after __DATA__. To cause a Perl program to read itself, you seek the DATA handle back to the beginning (byte 0) and then print all lines. If you want to just re-read the __DATA__ segment, you can use "tell" to find out the byte position where the DATA segment starts, save that number, and then seek to that byte number instead of to byte #0 for the re-read operation.

Another option is to use variables for the I/O. Note that you can open a scalar for "write" - I would NOT advise doing that with a DATA segment as you are liable to wind up scribbling over your Perl program!

Consider the following code:

    use strict;
    use warnings;

    my $data_set_name;
    my $another_data_set;

    open my $data2, '<', \$data_set_name or die "some message $!";
    print $_ while (<$data2>);
    print "\n";

    open my $data3, '<', \$another_data_set or die "some message $!";
    print $_ while (<$data3>);
    print "\n";

    print "NOW READING MYSELF...\n";
    seek(DATA,0,0);
    print $_ while (<DATA>);

    # Using BEGIN blocks allows potentially lengthy data
    # to appear at the end of the program file
    BEGIN{
    $data_set_name = <<END;
    asdf
    qerg
    5666
    END
    }

    BEGIN{
    $another_data_set = <<EOF;
    46464
    9187
    jjh
    EOF
    }
    __DATA__

Another option is to use Inline::Files. I have used that module before and there can be unexplained weirdness with it! For example, it won't co-exist with the above code, I think because Inline::Files plays some fancy games with BEGIN.

Added: You can write your code using a lexical file handle, `my $input_fh = \*STDIN`, and then of course set `$input_fh = $data2;`, etc.
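The "tell, then seek back" technique for re-reading can be sketched without an actual `__DATA__` section. In the example below an in-memory filehandle that has already been read partway stands in for `DATA`; the function name `read_twice` and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Read all remaining lines twice from the same handle by rewinding
# to the position recorded before the first read.
sub read_twice {
    my ($fh) = @_;
    my $start = tell $fh;                 # remember where the data begins
    my @first = <$fh>;
    seek($fh, $start, 0)                  # whence 0 means "from start of file"
        or die "seek failed: $!";
    my @second = <$fh>;
    return (\@first, \@second);
}

# The handle is positioned partway through its content, much as DATA
# is positioned partway through the script file.
my $content = "# header line\nrow 1\nrow 2\n";
open my $fh, '<', \$content
    or die "Cannot open in-memory handle: $!";
my $header = <$fh>;                       # consume everything before the data

my ($first, $second) = read_twice($fh);
```

With the real `DATA` handle the same pattern applies: call `tell(DATA)` before the first read and `seek(DATA, $saved_position, 0)` before each later pass.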
Re: STDIN typeglob by jwkrahn (Abbot) on Jun 11, 2023 at 22:56 UTC
There is nothing special about `STDIN` (or `stdin` for that matter). It is exactly the same as any other file handle opened read only. The only thing "special" about it is that it is `fileno` 0.

Naked blocks are fun! -- Randal L. Schwartz, Perl hacker