Re: Extracting HTML content between the h tags

Re^2: Extracting HTML content between the h tags by vagabonding electron (Curate) on Aug 05, 2012 at 14:22 UTC
Thanks a lot! I did not know this syntax. One more question, if I may :-) In about 10 pages the last h2 tag is missing, so I used the following workaround: `my @solution_2 = $content->findvalues( './h2[4]/preceding-sibling::' ); unless ( @solution_2 ) { @solution_2 = $content->findvalues( '//hr/preceding-sibling::' ); }` I tried the same with your syntax: `@solution_2 = $content->findvalues( '//hr/preceding-sibling::p[preceding-sibling::h2[3]]' );` but I only get an uninitialized value. I understood the syntax as: "search the siblings, but stop if the tag in brackets appears". Is this correct? If so, what am I doing wrong in the attempt above? Thanks!
Re^3: Extracting HTML content between the h tags by Gangabass (Vicar) on Aug 06, 2012 at 01:01 UTC
According to your HTML, the `preceding-sibling` of `hr` is a `div` tag, not a `p` tag. This code will find all `p`s after the last `h2`: `$p->findnodes('//h2[4]/following-sibling::p');` Or (more flexible, since it does not depend on the number of `h2` tags): `$p->findnodes('//h2[last()]/following-sibling::p');`
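For anyone who wants to try this, here is a minimal sketch of the `last()` approach. It assumes HTML::TreeBuilder::XPath is installed, and the markup is hypothetical: the real page from the thread is not shown, so the sample only mimics the shape described above (several `h2` sections, then a `div` right before the `hr`). The variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup: the page from the thread is not shown, so this
# only mimics its described shape (h2 sections, a div, then an hr).
my $html = <<'HTML';
<html><body>
<h2>One</h2><p>a</p>
<h2>Two</h2><p>b</p>
<h2>Three</h2><p>c</p><p>d</p>
<div>footer</div>
<hr>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Text of every <p> sibling that follows the last <h2>; this does not
# depend on how many <h2> tags the page has.
my @after_last = $tree->findvalues('//h2[last()]/following-sibling::p');

# The element directly before the <hr>: on a reverse axis, [1] selects
# the nearest preceding sibling.
my ($before_hr) = $tree->findnodes('//hr/preceding-sibling::*[1]');
my $tag_before_hr = $before_hr->tag;

print join(', ', @after_last), "\n";
print "$tag_before_hr\n";

# Free the parsed tree once the values have been copied out.
$tree->delete;
```

On the question about the brackets: in XPath a predicate filters the nodes matched by the step, it does not stop the walk along the axis. So `p[preceding-sibling::h2[3]]` keeps only those `p` elements that have at least three `h2` siblings before them.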
Re^4: Extracting HTML content between the h tags by vagabonding electron (Curate) on Aug 06, 2012 at 15:31 UTC
Thank you very much!