Bod has asked for the wisdom of the Perl Monks concerning the following question:

One of the modules I have recently released to CPAN reads JSON data from STDIN on a webserver. To write the tests I have had to simulate this.

Using code I found in an answer on SO, I have written code similar to this test:

use strict;
use warnings;

*STDIN = *DATA;

while (my $test = <STDIN>) {
    print "$test\n";
}

__DATA__
one
two
three
four

But I don't understand how or why it works...

I've read the documentation and still don't understand it.

My understanding of a typeglob is that it is a legacy left over from the days before Perl had references. It manipulates the scalar, array and hash of the same name simultaneously. Still, it shouldn't be used in modern Perl because it is generally bad practice to call arrays, hashes and scalars by the same name and references have replaced its necessity. References, thanks to my time in The Monastery, I understand reasonably well.

Is there a better way to do what the above code does?

Can you help me understand this use of a typeglob and typeglobs more generally?

Am I able to 'reset' the __DATA__ input so that I can use the same input for testing again or should I put multiple tests in different test files?

Re: STDIN typeglob
by hv (Prior) on Jun 11, 2023 at 19:40 UTC

    One of the benefits of writing tests (and particularly of TDD) is that it can give you a signal about your interface: if something is difficult to write tests for, maybe it's the interface that should change.

    A module providing a function that reads from STDIN would be an example of that: perhaps it would be easier to test, _and_ provide a more powerful, flexible function if the function were to accept the filehandle to read from as an argument instead.

    A typical way to use such a function to read from STDIN would be to pass a reference to the glob:

    MyModule::function(\*STDIN);

    Typeglobs aren't really a "legacy left over from the days before Perl had references", rather they expose aspects of how Perl works internally. The introduction of references certainly reduced the number of situations where one needs to use globs, but because filehandles and directory handles don't have their own sigil to address them directly (the way $STDIN, @STDIN, %STDIN, &STDIN do), a glob reference as in the example above is still a perfectly fine way to access them.
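
    A minimal sketch of that kind of interface, assuming a hypothetical helper named slurp_input (it is not part of any module discussed in this thread): the function reads from whatever handle it is given, so production code can pass \*STDIN while a test passes an in-memory handle opened on a scalar.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: reads everything from the handle it is given.
    sub slurp_input {
        my ($fh) = @_;
        local $/;                 # slurp mode: read to end of input
        my $content = <$fh>;
        return defined $content ? $content : '';
    }

    # In a test, an in-memory handle stands in for STDIN.
    my $fake_input = qq({"object":"event"}\n);
    open my $test_fh, '<', \$fake_input
        or die "Cannot open in-memory handle: $!";
    my $got = slurp_input($test_fh);
    close $test_fh;

    # In production the caller would instead write: slurp_input(\*STDIN);
    print $got;
    ```

    Because the handle is an argument, each test can open a fresh in-memory handle, which sidesteps the question of resetting the input.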

    Another example is for getting clever with generated code, for example to auto-generate accessors for an object:

    for my $accessor (qw{ foo bar }) {
        my $method = sub {
            my($self) = @_;
            return $self->{$accessor};
        };
        # inject it as a named subroutine
        no strict 'refs';
        *$accessor = $method;
    }

    This works due to one of the "magic" aspects of globs: if you assign a reference to a glob, it will store the thing referenced in the appropriate slot. In this case we are assigning a subroutine reference, so the loop creates subroutines "foo" and "bar" (almost) exactly as if we had defined them in the normal way like:

    sub foo {
        my($self) = @_;
        return $self->{foo};
    }

      One of the modules I have recently released to CPAN reads JSON data from STDIN on a webserver.

      One of the benefits of writing tests (and particularly of TDD) is that it can give you a signal about your interface: if something is difficult to write tests for, maybe it's the interface that should change.

      Reading from STDIN on a webserver sounds very much like CGI. That's not a problem as such, but there are many other interfaces where data from the web browser is not passed via STDIN. Even FastCGI, which is only a tiny step away from CGI, does not use that simple interface (but FCGI and CGI::Fast can do a lot to hide that fact). And when it comes to other interfaces to webservers, like mod_perl, STDIN is not used at all (again, there are compatibility layers like ModPerl::Registry).

      In other words, passing a handle to the reading function might be a smarter solution. Perhaps, your module should not fetch the data at all, but just accept the data as a scalar value. Both would also allow for easier testing.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Perhaps, your module should not fetch the data at all, but just accept the data as a scalar value. Both would also allow for easier testing.

        That is exactly what I was thinking!

        In the hope it clarifies Bod's question, I think the module in question is Business::Stripe::Webhook, whose version 1.0 constructor is:

        sub new {
            my $class = shift;
            my %vars = @_;

            $vars{'error'} = '';

            $vars{'reply'} = {
                'status' => 'noaction',
                'sent_to' => [ ],
                'sent_to_all' => 'false',
            };

            if (exists $ENV{'GATEWAY_INTERFACE'}) {
                read(STDIN, $vars{'payload'}, $ENV{'CONTENT_LENGTH'});
                $vars{'webhook'} = decode_json($vars{'payload'}) if $vars{'payload'};
                $vars{'error'} = 'No payload data' unless $vars{'webhook'};
            } else {
                $vars{'error'} = 'Looks like this is not a web request!';
            }

            return bless \%vars, $class;
        }

        Though I'm definitely not a Web programmer, from an interface and TDD point of view, I pulled a face the instant I saw the constructor using an environment variable to decide whether to read from STDIN or not.

        It seems clearer and easier to test if this module were to simply accept a payload property. That way, the module's tests can easily pass in all sorts of dodgy payloads to see how it handles bad input.

        That is, instead of trying to do everything in one module, use several, smaller, more cohesive modules to get the job done.
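
        A rough sketch of what that could look like, using a hypothetical package name My::Webhook rather than the real module's code: the constructor takes the raw JSON in a payload argument and never touches STDIN or %ENV, so a test can hand it good and bad payloads directly. JSON::PP is used here only because it ships with Perl, and the error strings are made up for illustration.

        ```perl
        use strict;
        use warnings;

        package My::Webhook;    # hypothetical name, for illustration only

        use JSON::PP qw(decode_json);

        sub new {
            my ($class, %vars) = @_;
            $vars{error} = '';

            if (defined $vars{payload} && length $vars{payload}) {
                # decode_json dies on malformed input, so trap that
                $vars{webhook} = eval { decode_json($vars{payload}) };
                $vars{error}   = 'Invalid payload data' unless $vars{webhook};
            }
            else {
                $vars{error} = 'No payload data';
            }

            return bless \%vars, $class;
        }

        package main;

        # The caller decides where the payload comes from.
        my $good = My::Webhook->new(payload => '{"object":"event"}');
        my $bad  = My::Webhook->new(payload => '{not json');
        my $none = My::Webhook->new();

        print "good: '$good->{error}'\n";
        print "bad:  '$bad->{error}'\n";
        print "none: '$none->{error}'\n";
        ```

        A CGI script using such a module would read STDIN itself and pass the result in, which keeps the web-server details out of the module.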

      if something is difficult to write tests for, maybe it's the interface that should change

      It's not that it is difficult to write the tests, it is difficult to simulate the module being connected to the API when it is being tested and isn't connected to a live API.

Re: STDIN typeglob
by pryrt (Abbot) on Jun 11, 2023 at 17:32 UTC
    open's documentation has a whole section on duplicating filehandles, which shows different perlish ways of duplicating (and temporarily overriding) the standard filehandles like STDIN. But using typeglobs for such a thing doesn't seem inherently un-perlish to me (not that I'm an expert) -- though I cannot explain it to you in technical terms.

    You can seek on the DATA filehandle, though since it's actually partway through the active file, you'll want to use tell on DATA before doing any input from DATA so you'll know where to seek to in order to find the beginning of the __DATA__ section. There is some more on __DATA__ in Special Literals.

Re: STDIN typeglob
by kcott (Archbishop) on Jun 12, 2023 at 02:50 UTC

    G'day Bod,

    "... reads JSON data from STDIN on a webserver. ... I have had to simulate this."

    There's a number of ways a program can read from STDIN:

    # By default
    $ cat > TabNL
    Tab NL

    # By redirection
    $ cat -vet < TabNL
    Tab^INL$

    # By piping
    $ cat TabNL | cat -vet
    Tab^INL$

    # By using '-' as a special filename
    $ cat -vet -
    Tab NL
    Tab^INL$

    # By others I didn't immediately think of

    Which of those methods does your module use? Knowing this will allow us to better advise you on ways to perform the simulation. :-)

    "Using code I found in an answer on SO ... *STDIN = *DATA; ..."

    That's not really simulating STDIN. You're just rebadging an existing filehandle:

    $ perl -E '
    use strict;
    use warnings;
    say "Real \\*STDIN fileno: ", fileno(\*STDIN);
    say "Real \\*DATA fileno: ", fileno(\*DATA);
    *STDIN = *DATA;
    say "Fake \\*STDIN fileno: ", fileno(\*STDIN);
    __DATA__
    some data
    '
    Real \*STDIN fileno: 0
    Real \*DATA fileno: 3
    Fake \*STDIN fileno: 3

    Beyond simulating the input, it would probably help to have some idea of what tests you intend to run.

    Here's a test script that simulates JSON being piped to your application. It's subsequently decoded and compared with reference data via is_deeply() (presumably you'd have more useful tests here). Note how you can run the tests on multiple JSON files.

    ken@titan ~/tmp/pm_11152777_test_stdin/t
    $ cat test_json.t
    #!perl

    use strict;
    use warnings;
    use autodie;

    use Cwd 'abs_path';
    use File::Basename 'dirname';
    my $THISDIR;
    BEGIN { $THISDIR = dirname abs_path __FILE__ }

    use JSON::MaybeXS;
    use POSIX '_exit';
    use Test::More;

    my @file_bases = qw{test1 testA};

    plan tests => 0+@file_bases;

    for my $file_base (@file_bases) {
        my $json_data = '';
        my $child_pid = open my $from_kid, '-|';

        if ($child_pid) {
            # parent process (pipe from child):
            # reads JSON from "effective" STDIN
            while (my $line = <$from_kid>) {
                $json_data .= $line;
            }
            waitpid $child_pid, 0;
        }
        else {
            # child process (pipe to parent):
            # writes JSON to STDOUT
            my $json_file = "$THISDIR/../data/$file_base.json";
            open my $json_fh, '<', $json_file;
            while (my $line = <$json_fh>) {
                print $line;
            }
            _exit 0;
        }

        my $perl_data = decode_json($json_data);
        my $reference_data = do "$THISDIR/../data/$file_base.perl";
        is_deeply $perl_data, $reference_data, "Testing '$file_base'";
    }

    Here's the test data:

    ken@titan ~/tmp/pm_11152777_test_stdin/data
    $ cat test1.json
    {
        "key1" : "val1",
        "key2" : [ "elem1", "elem2", "elem3" ],
        "key3" : { "name1" : "value1", "name2" : "value2" }
    }
    $ cat test1.perl
    {
        key1 => 'val1',
        key2 => [qw{elem1 elem2 elem3}],
        key3 => {name1 => 'value1', name2 => 'value2'},
    };
    $ cat testA.json
    {
        "keyA" : "valA",
        "keyB" : [ "elemA", "elemB", "elemC" ],
        "keyC" : { "nameA" : "valueA", "nameB" : "valueB" }
    }
    $ cat testA.perl
    {
        keyA => 'valA',
        keyB => [qw{elemA elemB elemC}],
        keyC => {nameA => 'valueA', nameB => 'valueB'},
    };

    And here's an actual test run:

    ken@titan ~/tmp/pm_11152777_test_stdin
    $ prove -v t/test_json.t
    t/test_json.t ..
    1..2
    ok 1 - Testing 'test1'
    ok 2 - Testing 'testA'
    ok
    All tests successful.
    Files=1, Tests=2, 1 wallclock secs ( 0.01 usr 0.03 sys + 0.12 cusr 0.08 csys = 0.25 CPU)
    Result: PASS

    — Ken

      it would probably help to have some idea of what tests you intend to run

      Thanks kcott,

      eyepopslikeamosquito identified the new method in this comment - Re^3: STDIN typeglob

      The test I am trying to run is like this only with a bigger JSON object...

      #!perl
      use 5.006;
      use strict;
      use warnings;
      use Test::More;

      use Business::Stripe::Webhook;

      plan tests => 7;

      *STDIN = *DATA;

      my $webhook_fail = Business::Stripe::Webhook->new(
          'signing_secret' => 'whsec_...',
          'invoice-paid' => \&pay_invoice,
      );

      ok( !$webhook_fail->success, "Didn't instantiate" );
      is( $webhook_fail->error, "Looks like this is not a web request!", "Not a web request" );

      # Pretend we are on a webserver
      $ENV{'GATEWAY_INTERFACE'} = 'CGI/1.1';
      $ENV{'CONTENT_LENGTH'} = 10024;
      $ENV{'HTTP_STRIPE_SIGNATURE'} = 't=ABCDEFGHIJ,v1=abcdefghij';

      my $webhook_pass1 = Business::Stripe::Webhook->new(
          'invoice-paid' => \&pay_invoice,
      );

      ok( $webhook_pass1->success, "Basic instantiation" );

      $webhook_pass1->process();

      my $webhook_fail2 = Business::Stripe::Webhook->new(
          signing_secret => 'whsec_...',
          'invoice-paid' => \&pay_invoice,
      );

      is( $webhook_fail2->error, 'No payload data', "No payload for signed instantiation" );

      $webhook_fail2->process();

      ok( !$webhook_fail2->success, "Signature error" );
      is( $webhook_fail2->error, 'Invalid Stripe Signature', "Invalid signature" );

      sub pay_invoice {
          is( $_[0]->{'object'}, 'event', "pay.invoice handled" );
      }

      __DATA__
      {
        "id": "evt_1NFK32EfkkexSbWLZb6LoEap",
        "object": "event",
        "api_version": "2020-08-27",
        "data": {
          "object": {
            "id": "in_1NFK30EfkkfpSbWLeMoI8HzB",
          }
        }
      }

      Which of those methods does your module use?

      It reads STDIN like this...

      read(STDIN, $vars{'payload'}, $ENV{'CONTENT_LENGTH'});

      But following advice given elsewhere, that is deprecated now and it is up to the user to read STDIN or wherever else they want to get the data from. They then pass that to the constructor. That is still there as a fallback - for now...

        It reads STDIN [...] but following advice given elsewhere, that is deprecated now [...] That is still there as a fallback - for now...

        Why do you keep that around? Has your module already attracted a relevant number of users? If so, that's nice (stable API), and you should IMHO document a date after which you will remove reading STDIN. If not, just drop it completely.

        Alexander

Re: STDIN typeglob
by ikegami (Patriarch) on Jun 11, 2023 at 22:35 UTC

    *STDIN = *DATA; makes *STDIN the same glob as *DATA. That makes *STDIN{IO} the same as *DATA{IO}. And <STDIN> reads from *STDIN{IO} (i.e. the file handle associated with the name STDIN).
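
    A small sketch of that behaviour, assuming an in-memory handle stands in for DATA so the snippet is self-contained: inside the block the name STDIN shares its IO slot with the lexical handle, and local puts the original STDIN glob back when the block exits.

    ```perl
    use strict;
    use warnings;
    use Scalar::Util qw(refaddr);

    my $input = "one\ntwo\n";
    open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";

    my (@lines, $same_io);
    {
        # Alias the STDIN glob to the glob behind $fh, for this block only.
        local *STDIN = *{$fh};

        # Both names now share a single IO slot.
        $same_io = refaddr(*STDIN{IO}) == refaddr(*{$fh}{IO});

        @lines = <STDIN>;    # reads from $input, not the real standard input
    }
    # Here *STDIN is the original glob again.

    print "same IO object: ", ($same_io ? "yes" : "no"), "\n";
    print "read ", scalar(@lines), " lines\n";
    ```

    Only Perl code that reads through the name STDIN sees the substitute; file descriptor 0 itself is untouched. Opening a new in-memory handle for each test gives every test fresh input.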

Re: STDIN typeglob
by Marshall (Canon) on Jun 11, 2023 at 19:09 UTC
    I don't see anything wrong with your code although I have heard that at some point bare word file handles are going to be deprecated.
    I don't think "resetting" the __DATA__ input is what you want for multiple tests although that is possible.

    The DATA file handle is a pre-opened file handle to your Perl script that is pre-seeked to the first byte of the line right after __DATA__. To cause a Perl program to read itself, you seek the DATA handle back to the beginning (byte 0) and then print all lines. If you want to just re-read the __DATA__ segment, you can use "tell" to find out the byte position where the DATA segment starts, save that number and then seek to that byte number instead of to byte #0 for the re-read operation.

    Another option is to use variables for the I/O. Note that you can open a scalar for "write" - I would NOT advise doing that with a DATA segment as you are liable to wind up scribbling over your Perl program!
    Consider the following code:

    use strict;
    use warnings;

    my $data_set_name;
    my $another_data_set;

    open my $data2, '<', \$data_set_name or die "some message $!";
    print $_ while (<$data2>);
    print "\n";

    open my $data3, '<', \$another_data_set or die "some message $!";
    print $_ while (<$data3>);
    print "\n";

    print "NOW READING MYSELF...\n";
    seek(DATA,0,0);
    print $_ while (<DATA>);

    # Using BEGIN blocks allows potentially lengthy data
    # to appear at the end of the program file
    BEGIN{
    $data_set_name = <<END;
    asdf
    qerg
    5666
    END
    }

    BEGIN{
    $another_data_set = <<EOF
    46464
    9187
    jjh
    EOF
    }
    __DATA__

    Another option is to use Inline::Files. I have used that module before and there can be unexplained weirdness with it! For example, it won't co-exist with the above code. I think because Inline::Files plays some fancy games with BEGIN.

    Added: You can write your code using a lexical file handle, my $input_fh = \*STDIN and then of course set $input_fh = $data2;, etc..

Re: STDIN typeglob
by jwkrahn (Abbot) on Jun 11, 2023 at 22:56 UTC

    There is nothing special about STDIN (or stdin for that matter). It is exactly the same as any other file handle opened read only. The only thing "special" about it is that it is fileno 0.

    Naked blocks are fun! -- Randal L. Schwartz, Perl hacker