Jacob_Kold has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to scrape this website: https://horsens.dk/Politik/PolitiskeUdvalg/47

The data in the table comes from an API. The page contains a div whose data-jsonurl attribute points to the endpoint that supplies the data:

 <div id="catalog-meetings" data-jsonurl="/-/api/MeetingApi/Meetings/{0748B00B-3E46-42BE-9D24-1061A2CD345B}"></div>
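
The data-jsonurl value is a site-relative path, so it has to be joined with the host before it can be fetched. Below is a minimal sketch, assuming the attribute is double-quoted exactly as in the snippet above. The sub name `json_url_from_html` is hypothetical, and a regex is a fragile way to read HTML; a real parser would be sturdier. The curly braces around the GUID are percent-encoded the same way as in the URL quoted in NetWallah's reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the data-jsonurl attribute out of the page HTML
# and return an absolute URL on the given host (undef if it is not found).
# Assumes a double-quoted attribute; this is a regex sketch, not a parser.
sub json_url_from_html {
    my ($html, $base) = @_;
    my ($path) = $html =~ /\bdata-jsonurl="([^"]+)"/ or return;
    # Percent-encode the braces around the GUID.
    $path =~ s/\{/%7B/g;
    $path =~ s/\}/%7D/g;
    $base =~ s{/+\z}{};    # avoid a double slash when joining
    return $base . $path;
}

my $div = '<div id="catalog-meetings" data-jsonurl="/-/api/MeetingApi/Meetings/{0748B00B-3E46-42BE-9D24-1061A2CD345B}"></div>';
print json_url_from_html($div, 'https://horsens.dk'), "\n";
# prints https://horsens.dk/-/api/MeetingApi/Meetings/%7B0748B00B-3E46-42BE-9D24-1061A2CD345B%7D
```

In practice `$div` would be the full page source, fetched first with any HTTP client.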

Is there a way to scrape this data with Perl?

Update (10 Sep 2019, by footpad): Added <p> tags and link to site.

Re: Data coming from API
by Corion (Patriarch) on Aug 15, 2019 at 12:28 UTC

    Have you accessed /-/api/MeetingApi/Meetings/{0748B00B-3E46-42BE-9D24-1061A2CD345B} to see what is served there? This is easily done using (for example) HTTP::Tiny.
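
A minimal sketch of that first look with HTTP::Tiny (a core module since Perl 5.14). The sub name `fetch_body` and the agent string are hypothetical. For https URLs HTTP::Tiny also needs IO::Socket::SSL and Net::SSLeay installed. The optional second argument lets you pass in a different client object, which is mainly useful for testing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: GET a URL and return the raw response body,
# dying with the status line on anything that is not a success.
sub fetch_body {
    my ($url, $http) = @_;
    $http //= HTTP::Tiny->new(agent => 'meetings-scraper/0.1');
    my $res = $http->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Usage: perl fetch.pl URL
if (@ARGV) {
    my $body = fetch_body($ARGV[0]);
    # Print only the beginning, enough to see what kind of data is served.
    print substr($body, 0, 500), "\n";
}
```

Running it with the API URL (braces encoded as %7B and %7D) shows whether the endpoint returns JSON, HTML or something else.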

    If you show us your existing code and tell us exactly where you are having a problem, we can give you advice better targeted at your actual problem.

Re: Data coming from API
by NetWallah (Canon) on Aug 15, 2019 at 17:57 UTC
    The data from
    https://horsens.dk/-/api/MeetingApi/Meetings/%7B0748B00B-3E46-42BE-9D24-1061A2CD345B%7D
    seems to be JSON, with some embedded HTML.
    {"comittee":{"committeeInfo":{"title":"Børne- og Uddannelses­udvalget","summary":"","text":"<h2>Medlemmer</h2>\n<table bordercolor=\"#000000\" border=\"1\" rules=\"rows\" frame=\"below\">\n <thead>\n <tr>\n <th>Navn&nbsp;</th>\n <th>E-mail&nbsp;</th>\n <th>Parti&nbsp;</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>Lone &Oslash;rsted (formand)&nbsp;</td>\n <td><span style=\"text-decoration: underline;\"><a hre... ... 20T14:00:00Z","year":2014,"month":1,"title":"20. januar 2014","url":"/Politik/PolitiskeUdvalg/53/181"}]},"filter":{"year":[{"value":"2019","text":"2019"},{"value":"2018","text":"2018"},{"value":"2017","text":"2017"},{"value":"2016","text":"2016"},{"value":"2015","text":"2015"},{"value":"2014","text":"2014"}],"month":null,"committee":null},"labels":{"yeardefaultoption":"År","monthdefaultoption":"Måned","all":"Alle","results":"resultater","committeedefaultoption":"Udvalg","title":"Titel"},"statistics":[{"title":"TimeSpan in ms","value":"1453"}]}
    You could use something like LWP to fetch the JSON directly, then use the JSON module to parse it.
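
Parsing could look roughly like the sketch below. It uses JSON::PP, which ships with Perl since 5.14 and exports the same `decode_json` function as the JSON module. In the excerpt above, the name of the key that holds the meeting list is elided. So this sketch does not rely on it. Instead it walks the whole decoded structure and collects every hash that has both a `title` and a `url` key, which is what the meeting entries in the excerpt look like. The sub name `collect_links` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: recursively walk decoded JSON and collect every
# hash that has both a "title" and a "url" key, in a stable order.
sub collect_links {
    my ($node, $found) = @_;
    $found //= [];
    if (ref $node eq 'HASH') {
        push @$found, { title => $node->{title}, url => $node->{url} }
            if defined $node->{title} && defined $node->{url};
        collect_links($node->{$_}, $found) for sort keys %$node;
    }
    elsif (ref $node eq 'ARRAY') {
        collect_links($_, $found) for @$node;
    }
    return $found;
}

# Usage: perl links.pl meetings.json   (a file holding the API response)
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    my $data = decode_json($bytes);         # expects UTF-8 encoded bytes
    binmode STDOUT, ':encoding(UTF-8)';     # titles may contain æ, ø, å
    printf "%s\t%s\n", $_->{title}, $_->{url} for @{ collect_links($data) };
}
```

The `url` values are site-relative, like `/Politik/PolitiskeUdvalg/53/181`, so they still need the host prepended. The `text` field under `committeeInfo` is a string of HTML and would need an HTML parser such as HTML::TreeBuilder rather than more JSON handling.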

                    "From there to here, from here to there, funny things are everywhere." -- Dr. Seuss