cavac has asked for the wisdom of the Perl Monks concerning the following question:

I have the problem that I need to request data from an HTTPS API that can take a minute or two to return. I'm currently using LWP::UserAgent, which blocks my worker script (a cyclic executive).

Is there any non-blocking way (or module) to run those requests without forking, e.g. start the request and then poll for the result in my worker loop? I have a lot of open file and network handles which I don't want to mess up by forking.

I was thinking along the lines of this pseudo-code:

my $bla = Some::Module->new();
$bla->get("https://example.com/api1?bli=blub");
while(1) {
    ...
    if($bla->finished) {
        my $result = $bla->getResult();
        ...
        # Start next request
        ...
    }
}


Replies are listed 'Best First'.
Re: LWP::UserAgent non-blocking calls
by Corion (Patriarch) on Oct 22, 2024 at 07:15 UTC

    There is/was Coro::LWP, which integrated LWP with Coro, but so far I have not found a good way to make LWP::UserAgent-based scripts parallel without rewriting them.

    I am fond of using Future, but also had good success using Mojo::UserAgent. Both have APIs somewhat different from LWP::UserAgent though.

    With both systems, the logic would look very much like the approach you outlined above:

    # Using (my own) Future::HTTP, which provides an AnyEvent::HTTP-like API
    my $ua = Future::HTTP->new();

    # Fire off all requests at once; rate limiting etc. is left as an exercise
    my @outstanding;
    for my $url (@requests) {
        my $res = $ua->http_get($url)->then(sub {
            my( $body, $data ) = @_;
            # ... handle the response
            return $body;
        });
        push @outstanding, $res;
    };

    while( @outstanding ) {
        my $body = (shift @outstanding)->get;
        ...
    };

    With Mojolicious you can look at COWS::Crawler, which also implements something like the following API:

    my $crawler = COWS::Crawler->new();
    $crawler->submit_request({ method => 'GET', url => $url, info => { url => $url }});

    while( my ($page) = $crawler->next_page ) {
        my $body = $page->{res}->body;
        my $url  = $page->{req}->req->url;
        ...
    };
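    To show the Mojo::UserAgent route mentioned above without the crawler wrapper, a rough sketch using its promise API might look like this. This is not code from the original post; the URL list is just the OP's placeholder URL, and error handling is left out:

    use strict; use warnings;
    use Mojo::UserAgent;
    use Mojo::Promise;

    my $ua = Mojo::UserAgent->new;
    my @requests = ('https://example.com/api1?bli=blub');

    # Fire off all requests at once, each returning a Mojo::Promise
    my @promises = map {
        $ua->get_p($_)->then(sub {
            my ($tx) = @_;
            # result() dies on connection errors, which is fine for a sketch
            return $tx->result->body;
        });
    } @requests;

    # Collect everything; ->wait runs the event loop until all promises settle
    Mojo::Promise->all(@promises)->then(sub {
        my @bodies = map { $_->[0] } @_;
        # ... @bodies now holds one response body per request
    })->wait;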

    But all of these approaches need a real rewrite of the formerly sequential program logic into asynchronous execution.

      Maybe I'm doing something wrong, or I'm not understanding how Future::HTTP works, but I can't get it to work in the background while the worker does other stuff.

      It always returns false on $ua->is_async(). I can't change my event loop to a different type, and I don't see a way to poll Future::HTTP directly.


        I fell into the trap myself. Future::HTTP uses whatever other event loop you already have loaded. If you have no event loop loaded, it uses HTTP::Tiny, which is synchronous.

        The "correct" approach is to load the event loop first (use IO::Async; or whatever) and then load/use Future::HTTP, as in the sketch below.

        I should make this clearer in the documentation, or fix the usage so that this trap does not exist anymore.
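        A minimal sketch of that ordering, assuming the IO::Async backend (via Net::Async::HTTP) is installed and gets picked up once IO::Async is loaded first, as described above. The poll_http() wrapper and the URL are made up for illustration, not part of Future::HTTP:

        use strict; use warnings;
        # Load an event loop *before* Future::HTTP, so it does not fall back to HTTP::Tiny
        use IO::Async;
        use IO::Async::Loop;
        use Future::HTTP;

        my $loop = IO::Async::Loop->new;   # magic constructor: the shared loop instance
        my $ua   = Future::HTTP->new();

        my $pending = $ua->http_get('https://example.com/api1?bli=blub');

        # Call this once per worker cycle
        sub poll_http {
            $loop->loop_once(0);           # pump the event loop once, zero timeout, no blocking
            if( $pending && $pending->is_ready ) {
                my( $body, $headers ) = $pending->get;
                undef $pending;
                # ... handle $body here, maybe start the next request
            }
            return;
        }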

Re: LWP::UserAgent non-blocking calls
by ikegami (Patriarch) on Oct 22, 2024 at 09:57 UTC

    You could use threads.

    But there are ways of making LWP::UserAgent do parallel calls. I'd recommend LWP::Protocol::AnyEvent::http for that.

    But LWP is quite slow. Using libcurl is much faster, and it supports parallel requests. For a serious application, I'd use this.
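    For the record, a rough sketch of libcurl's multi interface via Net::Curl, which would also fit the poll-from-the-worker-loop pattern. The URL and the poll_curl() wrapper are invented for illustration, and error handling is left out:

    use strict; use warnings;
    use Net::Curl::Easy qw(:constants);
    use Net::Curl::Multi;

    my $multi = Net::Curl::Multi->new();

    # One slow request; the URL is just the OP's placeholder
    my $easy = Net::Curl::Easy->new();
    my $body = '';
    $easy->setopt( CURLOPT_URL, 'https://example.com/api1?bli=blub' );
    $easy->setopt( CURLOPT_WRITEFUNCTION, sub {
        my ( undef, $data ) = @_;          # collect the response body as it arrives
        $body .= $data;
        return length $data;
    } );
    $multi->add_handle( $easy );

    # Call this once per worker cycle
    sub poll_curl {
        my $running = $multi->perform();   # does what it can now, returns without waiting
        while( my ( $msg, $handle, $result ) = $multi->info_read() ) {
            $multi->remove_handle( $handle );
            # ... transfer finished: $result is the curl status, $body holds the payload
        }
        return;
    }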

      Speed and parallel requests are not a requirement in my case. Basically, I need to start a transaction on a credit/debit card payment terminal. Most have a sane API where you just start a transaction, then poll every few seconds to see if there is a result for that transaction (accepted/rejected). This one company made the idiotic decision to go for long polls (with a 60-second transaction timeout), like it's 1995.

      With a non-blocking HTTP client library, I can still simulate the standard behaviour. Marshall suggested HTTP::Async, which, at first glance, seems to fit the bill (but I haven't tested it yet).
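      If HTTP::Async does fit, the cyclic-executive version would presumably look roughly like this (untested sketch; the URL, the timeout value and the check_payment_result() name are placeholders):

      use strict; use warnings;
      use HTTP::Async;
      use HTTP::Request;

      # Allow for the 60 second long poll plus some slack
      my $async = HTTP::Async->new( timeout => 90 );

      # Kick off the long-poll request; this returns immediately
      $async->add( HTTP::Request->new( GET => 'https://example.com/api1?bli=blub' ) );

      # Call this once per worker cycle
      sub check_payment_result {
          $async->poke;                    # let HTTP::Async service its connections
          # next_response() does not wait; it returns undef if nothing is ready yet
          while( my $response = $async->next_response ) {
              # ... accepted/rejected handling, then start the next transaction
          }
          return;
      }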


        Speed and parallel requests are not a requirement in my case

        But you want parallelism, and the three options I mentioned above (threads, LWP::Protocol::AnyEvent::http and libcurl) provide that.

        You didn't say what you are doing in parallel. You didn't say what event loop you are using. So it's hard to evaluate which option will work and which is best. Provide more information, and we can provide a more detailed answer.

Re: LWP::UserAgent non-blocking calls
by bliako (Abbot) on Oct 22, 2024 at 08:04 UTC

    Sometimes breaking a big program into smaller programs, each focused on doing one thing, works nicely and copes better with long-running calls like this one. E.g. move this call into a separate script which does exactly that and, when it is done, saves the result in a Redis database that takes only milliseconds to check. Or use some kind of "bus" to connect them all, D-Bus for example.

    Edit: additionally, what LWP gets back is not a Perl data structure or object, so you will not need to de/serialise it when passing that data around.

      This is for a modular framework with a webserver and configurable workers. The workers automatically have database connections and Net::Clacks access.

      The workers are non-forking and single-threaded, and I prefer to keep it that way to simplify development and debugging. And no, using something like Redis is definitely not an option; a lot of my stuff runs on embedded cash registers that only sport 4-8 GB of RAM...

        This is for a modular framework with a webserver and configurable workers. The workers automatically have database connections and Net::Clacks access.

        That's great. You already have a database and a communication bus, exactly the functionality that Redis/D-Bus in my suggestion would have provided. The crucial question is why, if and when a worker blocks (as workers regularly do), the whole system blocks. I would decouple the workers from the "system", whatever that is, even more: use the bus to send requests to the LWP worker (edit: LWP worker script), which posts its response, when it arrives, to the bus+db, and poll the bus (non-blocking) at the supervisor. I admit it is easier for me to say all this than to implement it, or to adjust an already implemented system.

Re: LWP::UserAgent non-blocking calls
by Marshall (Canon) on Oct 22, 2024 at 09:46 UTC

Re: LWP::UserAgent non-blocking calls
by sectokia (Friar) on Oct 31, 2024 at 21:51 UTC

    My advice is to learn event-based programming and switch over to fully async operation based on callbacks and promises.

    For example, what you are doing is very easy to achieve with:

    AnyEvent, AnyEvent::HTTP and Promise.
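    For reference, the callback-style core of that approach with AnyEvent and AnyEvent::HTTP alone (leaving the promise layer out) might look like the sketch below. The URL and timeout are placeholders, and in this model the rest of the worker runs as watchers inside the event loop rather than in a cyclic executive:

    use strict; use warnings;
    use AnyEvent;
    use AnyEvent::HTTP;

    my $done = AnyEvent->condvar;

    # http_get returns immediately; the callback fires when the response arrives
    http_get 'https://example.com/api1?bli=blub',
        timeout => 90,
        sub {
            my ( $body, $headers ) = @_;
            if( $headers->{Status} =~ /^2/ ) {
                # ... handle $body, start the next transaction
            }
            $done->send;
        };

    # Meanwhile every other registered watcher/timer keeps running
    $done->recv;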

      I do a lot of "event based" programming and async stuff. But it is NOT always the best option. I work on a lot of stuff where things are required to happen in a very specific order to make sure data is consistent across multiple systems and complies with local financial law.

      Async operations have their uses, but they are not always the best solution.

Re: LWP::UserAgent non-blocking calls
by Bod (Parson) on Oct 23, 2024 at 16:38 UTC
    without forking

    No help with your question...sorry...

    But...I'm curious why forking is not an option

      Open file handles, open database transactions and (other) open network connections. For forking to be clean without side effects, I would need to close all of those, fork, and then reopen them in the parent process.

      My worker processes are cyclic executives that run many modules, each doing its own thing. The main loop is basically something like this:

      ...
      ##### MAIN TIMING LOOP #####
      sub run($self) {
          my $runok = 0;
          eval {
              # Let STDOUT/STDERR settle down first
              sleep(0.1);

              my $nextCycleTime = $self->{config}->{mincycletime} + time;
              while(1) {
                  my $workCount = $self->{worker}->run();
                  my $now = time;
                  if($now < $nextCycleTime) {
                      my $sleeptime = $nextCycleTime - $now;
                      #print "** Fast cycle ($sleeptime sec to spare), sleeping **\n";
                      sleep($sleeptime);
                      $nextCycleTime += $self->{config}->{mincycletime};
                      #print "** Wake-up call **\n";
                  } else {
                      #print "** Slow cycle **\n";
                      $nextCycleTime = $self->{config}->{mincycletime} + $now;
                  }
              }
              $runok = 1;
          };
          if(!$runok) {
              suicide('RUN FAILED', $EVAL_ERROR);
          }
          return;
      }
      ...

      #### WORKER CYCLE ####
      sub run($self) {
          my $workCount = 0;

          # Run cleanup functions in case the last cycle bailed out with croak
          foreach my $worker (@{$self->{cleanup}}) {
              my $module = $worker->{Module};
              my $funcname = $worker->{Function};
              #$workCount += $module->$funcname();
              $module->$funcname();
          }

          # Notify all registered workers about dead children
          while((my $child = shift @deadchildren)) {
              foreach my $worker (@{$self->{sigchld}}) {
                  my $module = $worker->{Module};
                  my $funcname = $worker->{Function};
                  $workCount++;
                  $module->$funcname($child);
              }
          }

          # Run all worker functions
          foreach my $worker (@{$self->{workers}}) {
              my $module = $worker->{Module};
              my $funcname = $worker->{Function};
              $workCount += $module->$funcname();
          }

          # Run cleanup functions
          foreach my $worker (@{$self->{cleanup}}) {
              my $module = $worker->{Module};
              my $funcname = $worker->{Function};
              #$workCount += $module->$funcname();
              $module->$funcname();
          }

          return $workCount;
      }
      ...

      It may not be very elegant, but it's simple, easy to debug and easy to balance worker loads by moving module configurations between different XML config files. Yeah, the stuff dynamically configures itself on startup.

      Yes, with strict limits on the configuration of a specific worker, I can support forking. But that makes it a pain to work with, which is why I have been going the full "use non-blocking stuff" route in my workers for the last decade. Most of the stuff is low-level protocols or web APIs that are already fire-and-check-back-later. I haven't encountered a web API that blocks for up to a minute in like 15 years, so I was unaware which modules were available to solve/circumvent that specific problem.
