Re: LWP::UserAgent non-blocking calls
by Corion (Patriarch) on Oct 22, 2024 at 07:15 UTC
|
There is/was Coro::LWP, which integrated LWP with Coro, but so far I have not found a good way to make LWP::UserAgent-based scripts parallel without rewriting them.
I am fond of using Future, but also had good success using Mojo::UserAgent. Both have APIs somewhat different from LWP::UserAgent though.
With both systems, the logic would look very much like the approach you outlined above:
# Using (my own) Future::HTTP, which provides an AnyEvent::HTTP-like API
use Future;
use Future::HTTP;
my $ua = Future::HTTP->new();
# Fire off all requests at once, rate limiting etc. is left as an exercise
my @outstanding;
for my $url (@requests) {
    my $res = $ua->http_get($url)->then(sub {
        my( $body, $data ) = @_;
        # ... handle the response
        # then() callbacks must return a Future, not a plain value
        return Future->done( $body );
    });
    push @outstanding, $res;
};
while( @outstanding ) {
    my $body = (shift @outstanding)->get;
    # ... use $body
};
With Mojolicious you can look at COWS::Crawler, which also implements something like the following API:
my $crawler = COWS::Crawler->new();
$crawler->submit_request({ method => 'GET', url => $url, info => { url => $url }} );
while( my ($page) = $crawler->next_page ) {
    my $body = $page->{res}->body;
    my $url  = $page->{req}->req->url;
    # ... process $body
};
But all of these approaches need a real rewrite of the formerly sequential program logic into asynchronous execution.
|
Maybe I'm doing something wrong, or I'm not understanding how Future::HTTP works, but I can't get it to work in the background while the worker does other stuff.
$ua->is_async() always returns false. I can't change my event loop to a different type, and I don't see a way to poll Future::HTTP directly.
|
I fell into the trap myself. Future::HTTP uses whatever other event loop you already have loaded. If you have no event loop loaded, it uses HTTP::Tiny, which is synchronous.
The "correct" approach is to load the event loop first (use IO::Async; or whatever) and then load/use Future::HTTP.
I should make this clearer in the documentation, or fix the usage so that this trap no longer exists.
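To make the load-order trap concrete, here is a hypothetical sketch of "use whatever event loop is already loaded" detection. It is not Future::HTTP's actual source; it only assumes that the backend is picked by checking %INC for event loop modules that have already been loaded, which is exactly why loading the loop afterwards leaves you on the synchronous fallback. The candidate list and pick_backend() are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical list of event loops, in order of preference. The keys are the
# file names that `use`/`require` record in %INC once a module is loaded.
my @candidates = (
    [ 'Mojo/IOLoop.pm' => 'Mojo::IOLoop' ],
    [ 'IO/Async.pm'    => 'IO::Async' ],
    [ 'AnyEvent.pm'    => 'AnyEvent' ],
);

sub pick_backend {
    for my $candidate (@candidates) {
        my ($file, $name) = @$candidate;
        return $name if $INC{$file};
    }
    # No event loop loaded (yet): fall back to a blocking client
    return 'HTTP::Tiny (synchronous)';
}

print 'Backend: ', pick_backend(), "\n";
```

Whatever the real module does internally, the safe order follows from the post: `use IO::Async;` (or your loop of choice) first, then `use Future::HTTP;`.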
Re: LWP::UserAgent non-blocking calls
by ikegami (Patriarch) on Oct 22, 2024 at 09:57 UTC
|
You could use threads.
But there are ways of making LWP::UserAgent do parallel calls. I'd recommend LWP::Protocol::AnyEvent::http for that.
But LWP is quite slow. Using libcurl is much faster, and it supports parallel requests. For a serious application, I'd use this.
|
Speed and parallel requests are not a requirement in my case. Basically, I need to start a transaction on a credit/debit card payment terminal. Most have a sane API where you just start a transaction, then poll every few seconds to see if there is a result for that transaction (accepted/rejected). This one company made the idiotic decision to go for long polls (with a 60-second transaction timeout), like it's 1995.
With a non-blocking HTTP client library, I can still simulate the standard behaviour. Marshall suggested HTTP::Async which, at first glance, seems to fit the bill (but I haven't tested it yet).
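The "start, then check back every cycle" behaviour can be sketched with core modules only. This is a hypothetical minimal client, not HTTP::Async's API: start_get() connects and sends the request (blocking, with a 5-second connect timeout, assuming the terminal sits on the local network so only the response is slow), and poll_get() is meant to be called once per worker cycle with a zero select() timeout, so a 60-second long poll never stalls the loop. It also assumes plain HTTP (no TLS) and a server that closes the connection after responding.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Hypothetical helper: connect, send a GET, and return a state hash to poll.
# Assumes plain HTTP and a fast connect; only the long-polled response is
# slow, so only the reading side is made non-blocking.
sub start_get {
    my ($host, $port, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 5,
    ) or die "connect to $host:$port failed: $@";
    # HTTP/1.0 + "Connection: close": the server closes the socket when the
    # response is complete, so EOF marks the end (no chunked decoding needed)
    print {$sock} "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n";
    $sock->blocking(0);
    return { sock => $sock, sel => IO::Select->new($sock), buf => '' };
}

# Call once per worker cycle. Never waits: returns nothing while the response
# is pending, and [status, body] once the server has closed the connection.
sub poll_get {
    my ($req) = @_;
    return $req->{result} if $req->{result};
    while ($req->{sel}->can_read(0)) {
        my $n = sysread($req->{sock}, my $chunk, 65536);
        if (!defined $n) {
            return if $!{EAGAIN} || $!{EWOULDBLOCK};    # spurious wakeup
            die "read failed: $!";
        }
        if ($n == 0) {                                  # EOF: response complete
            close $req->{sock};
            my ($head, $body) = split /\r\n\r\n/, $req->{buf}, 2;
            my ($status) = ($head // '') =~ m{^HTTP/\d\.\d\s+(\d{3})};
            return $req->{result} = [ $status, $body // '' ];
        }
        $req->{buf} .= $chunk;
    }
    return;    # nothing (more) to read right now
}
```

For HTTPS, redirects and keep-alive, a real client such as HTTP::Async is the better choice; the sketch only shows the shape: a cheap check per cycle instead of one blocking call.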
|
Speed and parallel requests are not a requirement in my case
But you want parallelism, and the three options I mentioned provide that. Here they are again, plus a fourth:
You didn't say what you are doing in parallel. You didn't say what event loop you are using. So it's hard to evaluate which option will work and which is best. Provide more information, and we can provide a more detailed answer.
Re: LWP::UserAgent non-blocking calls
by bliako (Abbot) on Oct 22, 2024 at 08:04 UTC
|
Sometimes breaking a big program into smaller programs, each focused on doing one thing, works nicely and copes better with future long calls. E.g., separate this call into a new script which does exactly that and, when it is done, saves the results in a Redis DB, which takes milliseconds to check. Or use some kind of "bus" to connect them all, D-Bus for example.
Edit: additionally, what LWP gets back is not a Perl data structure or object, so you will not need to de/serialise it when passing that data around.
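A minimal sketch of the "separate script" idea, with a plain pipe swapped in for Redis/D-Bus: the worker launches the helper with a list-form piped open and checks the pipe once per cycle via IO::Select with a zero timeout. It assumes a Unix-like system; start_helper() and poll_helper() are hypothetical names. The piped open does fork internally, but the child side exec()s the helper straight away, so the worker's own code does not continue in it.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: launch a separate process that performs the slow call
# and prints its result on STDOUT; the worker keeps the read end of the pipe.
sub start_helper {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "cannot start @cmd: $!";
    return { fh => $fh, sel => IO::Select->new($fh), out => '' };
}

# Call once per worker cycle; returns undef until the helper has exited
# (EOF on the pipe), then returns everything it printed.
sub poll_helper {
    my ($h) = @_;
    return $h->{result} if defined $h->{result};
    while ($h->{sel}->can_read(0)) {
        my $n = sysread($h->{fh}, my $chunk, 65536);
        die "read from helper failed: $!" unless defined $n;
        if ($n == 0) {
            close $h->{fh};               # also reaps the child; status in $?
            $h->{status} = $? >> 8;
            return $h->{result} = $h->{out};
        }
        $h->{out} .= $chunk;
    }
    return;
}
```

In a real setup the command would be the hypothetical LWP helper script; the test below uses a one-liner that sleeps briefly and prints a fake result.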
|
This is for a modular framework with a webserver and configurable workers. The workers automatically have database connections and Net::Clacks access.
The workers are non-forking and single-threaded, and I prefer to keep it that way to simplify development and debugging. And no, using something like Redis is definitely not an option; a lot of my stuff runs on embedded cash registers that only sport 4-8 gigs of RAM...
|
This is for a modular framework with a webserver and configurable workers. The workers automatically have database connections and Net::Clacks access.
That's great. You already have a database and a communication bus, exactly the functionality that Redis/D-Bus would have provided in my suggestion. The crucial question is why the whole system blocks if and when a worker blocks (as workers regularly do). I would decouple the workers from the "system", whatever that is, even more: use the bus to send requests to the LWP worker (edit: LWP worker script), which posts its response, when it comes, to the bus+db, and poll the bus (non-blocking) from the supervisor. I guess it is simpler for me to say this than to implement it or adjust an already implemented system.
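Net::Clacks' own API is not shown here; as a stand-in for "post the response to the bus+db", this hypothetical sketch uses a spool directory, which needs no extra daemon. The LWP worker script writes its result to a temp file and rename()s it into place (atomic within one filesystem, so the supervisor never sees a half-written file); the supervisor's per-cycle check is a cheap existence test that never blocks. post_result() and take_result() are invented names.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# LWP worker side: write to a temp name, then atomically rename into place
sub post_result {
    my ($dir, $id, $content) = @_;
    my $final = File::Spec->catfile($dir, "$id.result");
    my $tmp   = "$final.tmp.$$";
    open(my $fh, '>', $tmp) or die "open $tmp: $!";
    print {$fh} $content;
    close($fh) or die "close $tmp: $!";
    rename($tmp, $final) or die "rename $tmp: $!";
    return;
}

# Supervisor side: non-blocking check, safe to run every cycle. Returns the
# content and removes the file, or undef if nothing has arrived yet.
sub take_result {
    my ($dir, $id) = @_;
    my $final = File::Spec->catfile($dir, "$id.result");
    return unless -e $final;
    open(my $fh, '<', $final) or die "open $final: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    unlink $final or die "unlink $final: $!";
    return $content;
}

# Demo: both roles in one process
my $spool = tempdir(CLEANUP => 1);
my $early = take_result($spool, 'txn42');
print defined $early ? "ready\n" : "pending\n";
post_result($spool, 'txn42', '{"state":"accepted"}');
print take_result($spool, 'txn42'), "\n";
```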
Re: LWP::UserAgent non-blocking calls
by Marshall (Canon) on Oct 22, 2024 at 09:46 UTC
|
|
I still have to write tests for all recommendations and make my final decision, but HTTP::Async, at least from its POD, seems like it could fulfill my needs nicely.
Thanks!
Re: LWP::UserAgent non-blocking calls
by sectokia (Friar) on Oct 31, 2024 at 21:51 UTC
|
My advice is to learn event-based programming and switch over to a fully async operation based on callbacks and promises.
For example what you are doing is very easy to achieve with:
AnyEvent
AnyEvent::HTTP
Promise
|
I do a lot of "event based" programming and async stuff. But it is NOT always the best option. I work on a lot of stuff where things are required to happen in a very specific order to make sure data is consistent across multiple systems and complies with local financial law.
Async operations have their uses, but they are not always the best solution.
Re: LWP::UserAgent non-blocking calls
by Bod (Parson) on Oct 23, 2024 at 16:38 UTC
|
without forking
No help with your question...sorry...
But...I'm curious why forking is not an option
|
Open file handles, open database transactions and (other) open network connections. For forking to be clean without side effects, I would need to close all those, fork, and then reopen them in the parent process.
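One common mitigation (not a full solution, and not the poster's method) is to let the child leave via POSIX::_exit, so it never runs DESTROY methods or END blocks on the handles it inherited and never flushes its copies of the parent's output buffers. For DBI handles specifically, the AutoInactiveDestroy attribute addresses the same problem. This does nothing to make an inherited open transaction safe if the child actually uses it. Demo::Conn, $DESTROY_LOG and spawn_child() are hypothetical names for illustration.

```perl
use strict;
use warnings;
use POSIX ();

our $DESTROY_LOG;    # demo only: file that records which pids ran DESTROY

# Hypothetical stand-in for a connection whose destructor has side effects,
# e.g. a DB handle that would roll back or disconnect the shared session.
package Demo::Conn {
    sub new { my ($class) = @_; return bless {}, $class }
    sub DESTROY {
        return unless defined $main::DESTROY_LOG;
        open(my $fh, '>>', $main::DESTROY_LOG) or return;
        print {$fh} "$$\n";
        close $fh;
    }
}

# Fork, run $code in the child, and leave via POSIX::_exit so the child skips
# destructors and END blocks. Returns the child's pid; the caller reaps it
# (e.g. from a SIGCHLD handler).
sub spawn_child {
    my ($code) = @_;
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        my $ok = eval { $code->(); 1 };
        POSIX::_exit($ok ? 0 : 1);
    }
    return $pid;
}
```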
My worker processes are cyclic executives that run many modules, each doing its own thing. The main loop is basically something like this:
...
##### MAIN TIMING LOOP #####
# sleep(0.1) and sub-second cycle times rely on Time::HiRes
# (use Time::HiRes qw(sleep time);), presumably imported in the elided header;
# the builtin sleep() only takes whole seconds.
sub run($self) {
    my $runok = 0;
    eval {
        # Let STDOUT/STDERR settle down first
        sleep(0.1);
        my $nextCycleTime = $self->{config}->{mincycletime} + time;
        while(1) {
            my $workCount = $self->{worker}->run();
            my $now = time;
            if($now < $nextCycleTime) {
                my $sleeptime = $nextCycleTime - $now;
                #print "** Fast cycle ($sleeptime sec to spare), sleeping **\n";
                sleep($sleeptime);
                $nextCycleTime += $self->{config}->{mincycletime};
                #print "** Wake-up call **\n";
            } else {
                #print "** Slow cycle **\n";
                $nextCycleTime = $self->{config}->{mincycletime} + $now;
            }
        }
        $runok = 1;
    };
    if(!$runok) {
        suicide('RUN FAILED', $EVAL_ERROR);
    }
    return;
}
...
#### WORKER CYCLE ####
sub run($self) {
    my $workCount = 0;
    # Run cleanup functions in case the last cycle bailed out with croak
    foreach my $worker (@{$self->{cleanup}}) {
        my $module = $worker->{Module};
        my $funcname = $worker->{Function} ;
        #$workCount += $module->$funcname();
        $module->$funcname();
    }
    # Notify all registered workers about dead children
    while((my $child = shift @deadchildren)) {
        foreach my $worker (@{$self->{sigchld}}) {
            my $module = $worker->{Module};
            my $funcname = $worker->{Function} ;
            $workCount++;
            $module->$funcname($child);
        }
    }
    # Run all worker functions
    foreach my $worker (@{$self->{workers}}) {
        my $module = $worker->{Module};
        my $funcname = $worker->{Function} ;
        $workCount += $module->$funcname();
    }
    # Run cleanup functions
    foreach my $worker (@{$self->{cleanup}}) {
        my $module = $worker->{Module};
        my $funcname = $worker->{Function} ;
        #$workCount += $module->$funcname();
        $module->$funcname();
    }
    return $workCount;
}
...
It may not be very elegant, but it's simple, easy to debug and easy to balance worker loads by moving module configurations between different XML config files. Yeah, the stuff dynamically configures itself on startup.
Yes, depending on strict limits to the configuration of a specific worker, I can support forking. But this makes it a pain to work with, which is why I have been going the full "use non-blocking stuff" route in my workers for the last decade. Most of the stuff is low-level protocols or web APIs that are already fire-and-check-back-later. I haven't encountered a web API that blocks for up to a minute in like 15 years, so I was unaware which modules were available to solve/circumvent that specific problem.
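A worker module in this fire-and-check-back-later style might look like the hypothetical sketch below. It mirrors the { Module => ..., Function => ... } entries that the worker cycle iterates over: each call does a bounded amount of work, never waits, and returns how much work it did. Demo::PaymentPoller, start_transaction() and the check callback are invented for illustration; the callback must itself be non-blocking (for example a per-cycle poll of a pending HTTP request).

```perl
use strict;
use warnings;

package Demo::PaymentPoller {
    sub new {
        my ($class, %args) = @_;
        # check => coderef returning undef (still pending) or a final status
        return bless { check => $args{check}, pending => {}, done => {} }, $class;
    }
    sub start_transaction {
        my ($self, $id) = @_;
        $self->{pending}{$id} = 1;
        return;
    }
    # Worker function: one non-blocking check per pending transaction
    sub work {
        my ($self) = @_;
        my $count = 0;
        for my $id (sort keys %{ $self->{pending} }) {
            my $status = $self->{check}->($id);
            next unless defined $status;    # not finished: check again next cycle
            delete $self->{pending}{$id};
            $self->{done}{$id} = $status;
            $count++;
        }
        return $count;
    }
}

# Demo: a fake terminal that reports a result on the third check
my %checks;
my $poller = Demo::PaymentPoller->new(
    check => sub { my ($id) = @_; ++$checks{$id} >= 3 ? 'accepted' : undef },
);
$poller->start_transaction('txn1');
my @workers = ({ Module => $poller, Function => 'work' });
my $workCount = 0;
for my $cycle (1 .. 5) {    # five passes of a simplified worker cycle
    for my $worker (@workers) {
        my $funcname = $worker->{Function};
        $workCount += $worker->{Module}->$funcname();
    }
}
print "txn1: $poller->{done}{txn1} after $checks{txn1} checks\n";
```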