in reply to Fast efficient webpage section monitoring

Thank you all for your input, and for the words of wisdom related to the ethics of all this. It has me thinking, for sure.
From the top down:

Your Mother, despite your obvious disagreement with any automation of this process (and I do see your point, although this is not quite a matter of cheating for money, but for work, which is quite different), you've come forward with a very interesting proposal with Mechanize::Firefox, and I want to thank you for your open-mindedness.
Now that I think of it, the webpage with the available translations does change when a new one is available, with no reload required (nor any periodic refresh, as far as I can tell), so by looking into how the site achieves this I should be able to determine a better monitoring option, am I correct? And this will no doubt involve JavaScript, as you said from the start.
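If the page updates itself without a reload, the browser's developer tools (Network tab) should reveal the request the page's JavaScript makes; one could then poll that same endpoint directly. A minimal sketch, assuming the endpoint URL (passed here on the command line) has been identified -- the URL and polling interval are placeholders, not anything from the actual site:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Return the new digest if $body differs from the last seen digest,
# or undef when nothing changed.
sub body_changed {
    my ($last_digest, $body) = @_;
    my $d = md5_hex($body);
    return (!defined $last_digest || $d ne $last_digest) ? $d : undef;
}

# Network part: only runs when an endpoint URL is supplied.
if (my $url = $ARGV[0]) {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 10);
    my $digest;
    while (1) {
        my $res = $ua->get($url);
        if ($res->is_success) {
            if (my $new = body_changed($digest, $res->content)) {
                $digest = $new;
                print "changed at ", scalar localtime, "\n";
            }
        }
        sleep 5;    # placeholder interval; be polite to the server
    }
}
```

Hashing the raw body keeps the comparison cheap even if the response is large; a real version would parse the response and extract only the job list.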

Marshall, I did know there were APIs for clients, as translators get specific instructions not to attempt to message clients when they use the API. But I hadn't considered how this might mean the company could provide APIs for translators. As Your Mother pointed out, the system probably works just fine as far as they are concerned without one: whereas some of the clients might not use the service without an API, translators will accept whatever is available, I suppose. The only options they provide are an RSS feed, which is slower to update than the webpage (that doesn't seem to make sense to me, but I checked, and sometimes there isn't even time for the RSS feed to show a new translation before it's gone, whereas the webpage shows it), and an e-mail system, which as you can imagine is even slower than the RSS.
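For what it's worth, that feed-vs-page latency claim is easy to measure: fetch the feed, note when each item title first appears, and compare against the page. A quick-and-dirty sketch of pulling item titles out of an RSS 2.0 body -- the regex is deliberately naive (a real script should use XML::RSS or XML::Feed), and the feed structure shown is a generic assumption:

```perl
use strict;
use warnings;

# Naive extraction of <item> titles from an RSS 2.0 feed body.
# Good enough for a latency comparison; use a real parser in production.
sub rss_titles {
    my ($xml) = @_;
    my @titles;
    while ($xml =~ m{<item>.*?<title>(.*?)</title>}gs) {
        push @titles, $1;
    }
    return @titles;
}
```

Logging `time()` alongside the first sighting of each title, from both the feed and the page, would show exactly how far the RSS lags.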

BrowserUK, thank you for all the precious leads concerning the headers; I never knew there were so many potential items in an HTTP header! Unfortunately this one is disappointingly bare: response code, Content-Type, Date (which is almost always the request date, as the page is dynamic), Location, Server, and that's it!
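For anyone wanting to check their own target the same way, a HEAD request shows everything the server exposes without transferring the body. A small sketch -- the splitting helper is plain core Perl, and the live part (guarded behind a command-line URL) assumes LWP is installed:

```perl
use strict;
use warnings;

# Split a raw HTTP header block into "Name: value" lines, folding
# old-style continuation lines onto the previous header.
sub parse_header_block {
    my ($raw) = @_;
    my @lines;
    for my $line (split /\r?\n/, $raw) {
        last if $line eq '';                    # blank line ends the headers
        if ($line =~ /^[ \t]/ && @lines) {      # folded continuation line
            $line =~ s/^\s+//;
            $lines[-1] .= " $line";
        } else {
            push @lines, $line;
        }
    }
    return @lines;
}

# Live part: only runs when a URL is supplied.
if (my $url = $ARGV[0]) {
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->head($url);      # HEAD: headers only, no body
    print "$_\n" for parse_header_block($res->headers->as_string);
}
```

If headers like Last-Modified or ETag were present they would allow cheap conditional requests (If-Modified-Since / If-None-Match), but per the post above this server sends neither.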

flexvault, thank you for the idea, but I'm not sure how I could balance a timer against the requirement of getting the information from the site as fast as possible. Any "wait" is basically a window in which an update could be missed, right? But I do understand that if resources were really getting eaten up, I would have to introduce a timer to give the system time to "relax".
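One common compromise between "as fast as possible" and "don't hammer the server" is adaptive backoff: poll quickly while updates are arriving, and stretch the delay toward a ceiling while the page is quiet. A small sketch (the min/max bounds are arbitrary examples, not recommendations):

```perl
use strict;
use warnings;

# Exponential backoff: snap back to $min whenever a change is seen,
# otherwise double the delay up to a ceiling of $max seconds.
sub next_delay {
    my ($current, $changed, $min, $max) = @_;
    return $min if $changed;
    my $next = $current * 2;
    return $next > $max ? $max : $next;
}

# Typical use inside a polling loop:
#   my $delay = $min;
#   while (1) {
#       my $changed = check_the_page();   # hypothetical check
#       $delay = next_delay($delay, $changed, $min, $max);
#       sleep $delay;
#   }
```

This keeps the "hole" small exactly when it matters (updates are coming in) while still relaxing the load during idle stretches.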

Let me thank you all again; you are very helpful. I'd also like to say that I do hear and respect your objections; they're not lost on me.

Regards
Mark.

Re^2: Fast efficient webpage section monitoring
by Your Mother (Archbishop) on Apr 03, 2016 at 14:50 UTC

    Like Marshall, I am also impressed by your reply and attitude.

    I am a bit of a WWW::Mechanize expert, but not so with WWW::Mechanize::Firefox, which I am not sure I have ever even used. I have reached for WWW::Selenium, but it's been probably four years since I did much with it, and my impression is WMF will be more direct and maybe easier; Corion is likely to help you here if you get stuck on some point. That said, I'm reminded that the Selenium IDE will record your interactions and write them into scripts for you, though apparently they have split Perl off and you have to download it separately now. :( http://www.seleniumhq.org/download/
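    To give a feel for the shape of a WMF-based watcher, here is a minimal sketch. The section-extracting helper is plain Perl; the live part assumes WWW::Mechanize::Firefox is installed and Firefox is running with mozrepl, and the `available-jobs` id is entirely hypothetical -- substitute whatever actually wraps the translation list on the real page:

```perl
use strict;
use warnings;

# Pull the section of interest out of the full page HTML.
# The id "available-jobs" is a made-up placeholder.
sub extract_section {
    my ($html) = @_;
    return $html =~ m{<div id="available-jobs">(.*?)</div>}s ? $1 : '';
}

# Live part: requires Firefox + mozrepl + WWW::Mechanize::Firefox.
if (my $url = $ENV{MONITOR_URL}) {
    require WWW::Mechanize::Firefox;
    my $mech = WWW::Mechanize::Firefox->new();
    $mech->get($url);
    my $last = '';
    while (1) {
        # content() re-reads the live DOM, so AJAX updates are visible
        # without reloading the page ourselves.
        my $now = extract_section($mech->content);
        print "section changed\n" if $now ne $last;
        $last = $now;
        sleep 5;    # placeholder interval
    }
}
```

    The key point is that Firefox itself executes the site's JavaScript; the Perl side only inspects the resulting DOM, so no extra requests hit the server beyond what the page already makes.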

    In any case, this code won't be easy unless your Perl, your JS, and your HTML/HTTP chops are solid. Like most web programming, no single part of it is hard but the coagulation of a thousand points of failure makes it so.

    All modern browsers have excellent developer tools panels to help you see cause and effect while you whittle the problem to a solution. Stack Overflow is usually an excellent place to get JavaScript answers; though if they are at least slightly related to a Perl issue, they're usually welcome here.

    Update: fixed a link.

Re^2: Fast efficient webpage section monitoring
by Marshall (Canon) on Apr 03, 2016 at 12:48 UTC
    Great post. Glad to hear that you are seriously considering what is being said.

    I don't know much about AJAX, but I found this link with an explanation of what goes on. AJAX is "Asynchronous JavaScript and XML". Click on their demo button to see a dynamic graphic loaded into an existing page without a complete page reload.

    I've only done very simple playing with Mechanize::Firefox, but I was able to talk to Firefox from Perl. The idea seems to be to let the browser run the JavaScript and then monitor, through the interface, what has happened. I think you can get a callback when the part of the page you are interested in changes. That way you don't have to poll Firefox; just wait for something to happen.

    If you are just watching what Firefox is doing while displaying the page, then you aren't adding any more traffic than what the webpage generates on its own. The short AJAX message that updates part of the page will be considerably faster than a complete page reload, which I think is what you are doing now. So if done right, you should get faster answers while at the same time not generating excessive traffic to the site.

    I will defer to the Mechanize::Firefox experts, but I think this is possible. Sounds like you will have to understand the JavaScript in the page, but it's not clear to me how much JavaScript you will have to write yourself. Firefox does the "heavy lifting" and you just watch what it is doing.

    I do suspect that these folks will implement procedures, and possibly APIs, to help them manage a process that protects their brand from bad translations or undependable translators, etc. I worked for a while for a German company, and all of the engineers spoke English to one degree or another. The professional translators did an amazing job on the documents: the proper English translation wound up being about 30% shorter. A computer cannot do that; it's just too complicated. And the translator has to be a native English speaker.