adambot has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying to use Perl to parse http://puzzledragonx.com/en/monster.asp?n=1645; however, I can't seem to get Mojo::DOM to work properly. Here is the code I currently have:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
use LWP::Curl;
use Mojo::DOM;
use Data::Dumper;

my $lwpcurl = LWP::Curl->new();
my $content = $lwpcurl->get('http://puzzledragonx.com/en/monster.asp?n=1645');
my $dom     = Mojo::DOM->new($content);
my $result  = $dom->at('#tableprofile > tbody > tr:nth-child(1) > td.data')->text;
print Data::Dumper::Dumper($result);
Without the ->text I get undef; with the ->text I get an error saying that you can't get text from an undefined item. I'm trying to use the CSS selector that I got from Chrome and Firefox. The Chrome one is above, and the Firefox selector is: html.js body div#wrapper div#main div#right div#content table tbody tr td.section div div#compareprofile table#tableprofile tbody tr td.data. Either way, I never get any results.

Replies are listed 'Best First'.
Re: CSS Selector in Perl
by Corion (Patriarch) on Dec 20, 2015 at 10:46 UTC

    Maybe the page uses Javascript to create some nodes?

    Try to look at the page as LWP::Curl sees it instead of using a browser.
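
    One way to do that is to write the fetched content to a file and read it in an editor. The following is only a sketch: dump_to_file and the file name monster.html are made-up names, and it assumes the content returned by LWP::Curl is a plain byte string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP::Curl or Mojo::DOM): save the
# fetched content so it can be inspected outside the browser.
sub dump_to_file {
    my ($content, $path) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return -s $path;    # size of the written file in bytes
}

# Possible usage with the fetch from the question (not run here):
#   my $content = LWP::Curl->new->get('http://puzzledragonx.com/en/monster.asp?n=1645');
#   dump_to_file($content, 'monster.html');
#   print "tableprofile not found\n" if index($content, 'tableprofile') < 0;
```

    If the saved file is empty, or does not mention tableprofile at all, the problem is in the fetch rather than in the selector.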

Re: CSS Selector in Perl
by Anonymous Monk on Dec 20, 2015 at 10:47 UTC

    Employ the Basic debugging checklist and break it down.

    See what at('#tableprofile') returns, then what '#tableprofile > tbody' returns....

    That's programming :)
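
    The step-by-step narrowing can also be automated. This is a rough sketch, not a definitive tool: first_failing_step is a hypothetical helper, the markup is made up rather than taken from the real page, and it only handles selectors joined with '>'.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Try ever longer prefixes of a '>'-joined selector. Returns the longest
# prefix that matched and the first prefix that did not (undef if all match).
sub first_failing_step {
    my ($dom, $selector) = @_;
    my @steps = split /\s*>\s*/, $selector;
    my $matched;
    for my $i (0 .. $#steps) {
        my $partial = join ' > ', @steps[0 .. $i];
        return ($matched, $partial) unless defined $dom->at($partial);
        $matched = $partial;
    }
    return ($matched, undef);
}

# Made-up sample markup. Note that the source contains no <tbody>.
my $dom = Mojo::DOM->new(<<'HTML');
<table id="tableprofile">
  <tr><td class="title">Name</td><td class="data">Example</td></tr>
</table>
HTML

my ($ok, $failed) = first_failing_step(
    $dom, '#tableprofile > tbody > tr:nth-child(1) > td.data');
print "last match:    ", $ok     // '(none)', "\n";
print "first failure: ", $failed // '(none)', "\n";
```

    With this sample the walk stops at the tbody step. Browser developer tools show the parsed DOM, where a tbody is added to tables even when the HTML source has none, so a selector copied from a browser can contain steps that are not in the source. Whether that applies to the real page is an assumption that the saved source would confirm or rule out.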

      #tableprofile returns nothing at all, so I must have a syntax error there, but after reading the CPAN page I'm not sure what the proper syntax is. Thanks for reminding me to break it down!
Re: CSS Selector in Perl
by kcott (Archbishop) on Dec 20, 2015 at 23:25 UTC

    G'day adambot,

    In CSS, #identifier indicates a unique identifier which, in HTML, would be represented by an attribute that looks like id="identifier".

    From http://www.w3.org/TR/css3-selectors/#id-selectors:

    Document languages may contain attributes that are declared to be of type ID. What makes attributes of type ID special is that no two such attributes can have the same value in a conformant document, regardless of the type of the elements that carry them; whatever the document language, an ID typed attribute can be used to uniquely identify its element. In HTML all ID attributes are named "id"; XML applications may name ID attributes differently, but the same restriction applies.

    I viewed the source of http://puzzledragonx.com/en/monster.asp?n=1645 and searched for 'tableprofile'. I stopped searching when I found a second id="tableprofile" (there might be more). The fact that '#tableprofile' should be a unique identifier, but isn't, may be the root cause of your problem.
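
    The same check can be scripted rather than done by eye. A minimal sketch, assuming the page source is already in a string: duplicate_ids is a hypothetical helper and the markup below is invented for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Count every id attribute in the document and keep those used more than once.
sub duplicate_ids {
    my ($html) = @_;
    my %count;
    $count{$_}++
        for Mojo::DOM->new($html)->find('[id]')->map(attr => 'id')->each;
    my %dupes = map { $_ => $count{$_} } grep { $count{$_} > 1 } keys %count;
    return \%dupes;
}

# Invented sample with one repeated id and one unique id.
my $sample = <<'HTML';
<div id="content">
  <table id="tableprofile"><tr><td class="data">first</td></tr></table>
  <table id="tableprofile"><tr><td class="data">second</td></tr></table>
</div>
HTML

my $dupes = duplicate_ids($sample);
print "$_ appears $dupes->{$_} times\n" for sort keys %$dupes;
```

    Any id this reports is one to avoid relying on in a selector.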

    I'm not a user of Mojo::DOM; however, I see from its DESCRIPTION:

    "... It will even try to interpret broken HTML ..."

    Accordingly, you may be able to find a workaround by formulating a selector which does not include '#tableprofile'. Do note: that's just a guess on my part!
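
    As an illustration of that guess, here is a sketch of a selector that uses only element names and the class. The markup is made up, with the id deliberately repeated; the real page may need a different selector.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Made-up markup with a repeated id, standing in for the real page.
my $page = Mojo::DOM->new(<<'HTML');
<table id="tableprofile"><tr><td class="title">Name</td><td class="data">Alpha</td></tr></table>
<table id="tableprofile"><tr><td class="title">Type</td><td class="data">Beta</td></tr></table>
HTML

# Select by element and class only: no id and no tbody in the selector.
my @values = $page->find('table tr td.data')->map('text')->each;

# at() returns undef when nothing matches, so check before calling ->text.
my $first = $page->at('td.data');
print defined($first) ? $first->text : 'no match', "\n";
print join(', ', @values), "\n";
```

    Checking the result of at() before calling ->text also avoids the error about calling a method on an undefined value.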

    Furthermore, I'd be questioning any other supposedly unique identifiers in selectors provided by a browser. A simple search of the HTML source (as I did) is a quick and easy way to do this.

    The full W3C specification is "Selectors Level 3" and is a fairly lengthy document. I rarely need to reference more than the "Summary Table of Selectors": there are plenty of links to more information if you need them.

    — Ken