But it is Perl. Or at least the calling structure is Perl. Now, if I had just posted
lynx -dump htmlDocument.html > htmlDocument.txt
I would certainly agree with you that it isn't Perl, though even that does what the OP requested. I realize that lynx isn't Perl, but neither are a lot of GNU/Linux and Unix system calls that are easier and shorter than the alternative "pure" Perl methods. I use whatever lets me get the job done in the shortest amount of time with the least trouble. For stripping HTML tags out of pages, lynx works better and quicker than any regex I've seen so far. Then, if formatting changes need to be made once the tags are stripped out, you can use Perl to modify the document as needed.
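To make the "lynx strips, Perl tidies" idea concrete, here is a rough sketch rather than the exact code from my earlier post. It assumes lynx is on your PATH; the sub names html_to_text and tidy_text are made up for illustration, and the cleanup rules are just one example of a post-strip formatting change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run lynx on a local HTML file and return the rendered plain text.
# Assumes a lynx binary is on PATH; -nolist drops the trailing link list.
sub html_to_text {
    my ($html_file) = @_;
    # List-form pipe open avoids passing the filename through a shell.
    open(my $lynx, '-|', 'lynx', '-dump', '-nolist', $html_file)
        or die "Cannot run lynx: $!";
    local $/;                      # slurp mode: read all output at once
    my $text = <$lynx> // '';
    close($lynx) or die "lynx exited with status $?";
    return $text;
}

# Hypothetical cleanup pass: trim trailing whitespace, squeeze blank-line runs.
sub tidy_text {
    my ($text) = @_;
    $text =~ s/[ \t]+$//mg;        # trailing spaces/tabs on each line
    $text =~ s/\n{3,}/\n\n/g;      # three or more newlines become one blank line
    return $text;
}

# Only do the work when run as a script with a filename argument.
if (!caller && @ARGV) {
    my $file = shift @ARGV;
    print tidy_text(html_to_text($file));
}
```

Saved under a name of your choosing, say strip.pl, it would be run as perl strip.pl htmlDocument.html > htmlDocument.txt. The tag stripping is still all lynx; Perl only handles the calling and the tidy-up afterwards.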
As I said in my original post - TIMTOWTDI ;-)
Daeve