Shout out to Web::Scraper. It takes some time to wrap your head around it, but it's pretty good for writing robust scrapers. Relying on the DOM itself makes your scraper very brittle, especially if this is an HTML source that you do not control.
In reply to Re: getting text from HTML by perlfan, in thread getting text from HTML by IB2017