
OP, please do use the URL at https://wiki.oceannetworks.ca/display/O2A/API+Reference that haukex pointed out.

Note:

If you do it right, you could get a Perl client listed in there. Also, see if it'll accept the query string via POST body; if it does, be sure to set the Content-Type header in the request to application/x-www-form-urlencoded. The reason: sending your special token in a GET query string gets it logged everywhere the URL is recorded (server access logs and the like), and https doesn't protect it from that. Endpoints will sometimes accept the same parameters via POST. Even if it's just plain http, sending the token via POST (if it's accepted) will at least keep your URL from getting logged everywhere with that token in it.
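As a minimal sketch of the POST idea, assuming the endpoint accepts form-encoded parameters at all (which needs checking against the API reference), something like the following could be tried with the core module HTTP::Tiny. The URL, the parameter names and the SEND_REQUEST environment variable are all placeholders made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical endpoint and parameter names, for illustration only;
# take the real ones from the API reference.
my $url    = 'https://example.org/api/endpoint';
my %params = ( method => 'get', token => 'YOUR_TOKEN_HERE' );

my $http = HTTP::Tiny->new;

# This encoded string travels in the request body, not in the URL.
my $body = $http->www_form_urlencode( \%params );
print "body: $body\n";

# post_form sends the body with
# Content-Type: application/x-www-form-urlencoded set for us.
# Guarded by a made-up environment variable so the script is safe
# to run offline.
if ( $ENV{SEND_REQUEST} ) {
    my $res = $http->post_form( $url, \%params );
    print "$res->{status} $res->{reason}\n";
    print $res->{content} if $res->{success};
}
```

If the endpoint only answers GET requests, the token stays in the URL and this approach does not apply.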

If you insist on parsing the HTML and it really is just a large simple table, take a look at HTML::TableExtract.
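For the table route, a rough sketch with HTML::TableExtract (a CPAN module, not core) might look like this. The HTML and the column names "Time" and "Value" are invented stand-ins; the real page will have its own headers.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up HTML standing in for the real page.
my $html = <<'HTML';
<html><body>
<table>
  <tr><th>Time</th><th>Value</th></tr>
  <tr><td>00:00</td><td>12.5</td></tr>
  <tr><td>01:00</td><td>12.7</td></tr>
</table>
</body></html>
HTML

# Select the table whose header row contains these column names.
my $te = HTML::TableExtract->new( headers => [ 'Time', 'Value' ] );
$te->parse($html);
$te->eof;    # tell the parser there is no more input

my @rows;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        push @rows, [@$row];              # keep a copy of each row
        print join( "\t", @$row ), "\n";
    }
}
```

For a large file on disk, parse_file (inherited from HTML::Parser) can be used in place of parse so the whole document need not be held in one string.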

Re^3: Parsing a large html with perl
by marto (Cardinal) on Jun 03, 2020 at 07:49 UTC

    Usually makes more sense to reply to OP if that is who you are addressing. Your advice assumes they have API access, which may not be the case. The Mojo solution provided can deal just as easily with a JSON response as the HTML.

Re^3: Parsing a large html with perl
by zesys (Novice) on Jun 04, 2020 at 05:16 UTC
    Thanks @perlfan. I will try your first suggestion. I admit, as a non-developer, I often find it a daunting task making sense of a JSON response.

      G'day zesys,

      Welcome to the Monastery.

      "... I often find it a daunting task making sense of a JSON response."

      You don't say what aspects of this you find daunting. Here's a few tips.

      JSON is often presented as a single string many hundreds or thousands of characters long. I typically find this impossible to read at a glance; no doubt, you do too. The solution is to format that string into a more humanly readable structure. I use "JSON Formatter and Validator" for this; if you don't like that one, there are many others available, so just search for something that better suits you.
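The same reformatting can also be done locally in Perl. This is a small sketch using the core module JSON::PP; the sample string is invented and only stands in for a real one-line response.

```perl
use strict;
use warnings;
use JSON::PP;

# Made-up sample data standing in for a real one-line API response.
my $raw = '{"station":"ABC","readings":[{"time":"00:00","value":12.5},{"time":"01:00","value":12.7}]}';

# pretty adds indentation and newlines; canonical sorts the keys so
# the output comes out the same on every run.
my $json   = JSON::PP->new->pretty->canonical;
my $pretty = $json->encode( $json->decode($raw) );

print $pretty;
```

The json_pp command that ships with JSON::PP does much the same from the shell, reading JSON on standard input.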

      Now that you have a readable structure, just think of each ':' as a '=>' and you have a Perl hashref. That's a slight oversimplification but, in nearly all cases, it will hold true.

      # JSON:
      {
          "string" : "value",
          "array"  : [ 1, 2, 3 ],
          "hash"   : { "key1" : "val1", "key2" : "val2" }
      }

      # Perl:
      {
          "string" => "value",
          "array"  => [ 1, 2, 3 ],
          "hash"   => { "key1" => "val1", "key2" => "val2" }
      }
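To see the correspondence in practice, here is a short sketch that decodes that same JSON with the core module JSON::PP and then reads values out of the resulting hashref. The variable names are arbitrary.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

my $json_text = '{ "string" : "value", "array" : [ 1, 2, 3 ], "hash" : { "key1" : "val1", "key2" : "val2" } }';

# decode_json returns a reference to an ordinary Perl data structure.
my $data = decode_json($json_text);

print "string:      $data->{string}\n";
print "array[1]:    $data->{array}[1]\n";       # indexes start at 0
print "hash->key2:  $data->{hash}{key2}\n";
print "array items: ", scalar @{ $data->{array} }, "\n";
print "hash keys:   ", join( ', ', sort keys %{ $data->{hash} } ), "\n";
```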

      The JSON syntax is actually very simple. It's described, clearly and succinctly, in "Introducing JSON".

      If you're not completely familiar with hashrefs, take a look at the Hashes section of "perlintro: Perl variable types". That section (indeed, the entirety of the perlintro page) is peppered with links to more detailed descriptions, additional information, and more advanced, related topics: don't be put off by the idea that this page is just an introduction for complete novices.

      There's also a few gotchas which may not be immediately obvious; in some cases, they're highly unintuitive. Here's a couple that have tripped me up in the past:

      • Valid JavaScript is not necessarily valid JSON. Strings in JSON must be delimited by double-quotes, so { "answer": 42 } is valid in both. These, however, are valid in JavaScript but not in JSON: { 'answer': 42 } and { answer: 42 }.
      • In Perl, the final element in a list may be optionally followed by a comma; in JSON, that final comma is not allowed. So, [ 1, 2, 3 ] is valid in both; however, [ 1, 2, 3, ] is valid Perl but invalid JSON.
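Both gotchas can be checked with a quick sketch like the one below, which feeds the examples above to JSON::PP (core) in its default, strict configuration and reports which ones it accepts.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new;    # strict parsing by default

my @samples = (
    q({ "answer": 42 }),     # double-quoted key
    q({ 'answer': 42 }),     # single-quoted key
    q({ answer: 42 }),       # bare key
    q([ 1, 2, 3 ]),          # no trailing comma
    q([ 1, 2, 3, ]),         # trailing comma
);

my %ok;
for my $text (@samples) {
    # decode dies on invalid input, so wrap it in eval.
    $ok{$text} = eval { $json->decode($text); 1 } ? 1 : 0;
    printf "%-18s %s\n", $text, $ok{$text} ? 'valid JSON' : 'rejected';
}
```

JSON::PP does have options to loosen the parser (allow_singlequote, allow_barekey, relaxed), but data accepted that way is no longer guaranteed to be readable by other JSON tools.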

      — Ken