OK, two things. Thanks very much for your input, it's really appreciated.

marius, I'm not sure that any of my regexes can get by without the /s modifier though. Why did you say that? It's almost my one concession to the absolute vagueness of HTML practice that I use it. You're right about keeperlength: I failed to cut and paste that from the actual code. And yes, you're right about a hash being better; that "by twos" thing is just habit on my part. Thanks.

davorg, yes, of course you're right. I think there's a psychological reason why people like me want to do it the "hard" way rather than using a module, but I've learnt something from this. I want to put on record my complete stupidity though, which will chime nicely with the "use a module, dummy" refrain.

My variable $html didn't revert at all. What happened is that I had an HTML file in which someone had foolishly put two titles, really far apart, so that when I did the

($pagetitle) = $html =~ /<TITLE>(.*)<\/TITLE>/sgi;

thing, it actually pulled out nearly the whole document!

The final output, as you can see, consisted of the title and then the document, but the "title", thanks to the somewhat random HTML, was itself nearly the whole document.
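For anyone following along, here's a minimal sketch of what bit me. The sample document is made up for illustration (it is not the real page), but it has the same flaw: two TITLE elements a long way apart. The modifiers are the same /sgi as in my code; $greedy and $lazy are just names I picked for the demo.

```perl
use strict;
use warnings;

# Hypothetical sample document (an assumption, not the real page):
# two TITLE elements with body content in between.
my $html = <<'END_HTML';
<HTML><HEAD><TITLE>Real Title</TITLE></HEAD>
<BODY>
<P>Lots of body text here.</P>
<TITLE>Stray Second Title</TITLE>
</BODY></HTML>
END_HTML

# Greedy: .* grabs as much as it can, so the match runs from the first
# <TITLE> to the LAST </TITLE>. The /s modifier lets . match newlines,
# which is what allows the match to span the whole body.
my ($greedy) = $html =~ /<TITLE>(.*)<\/TITLE>/sgi;

# Non-greedy: .*? stops at the FIRST </TITLE> it can reach.
# With /g in list context every match is returned; the one-element
# list on the left keeps only the first capture.
my ($lazy) = $html =~ /<TITLE>(.*?)<\/TITLE>/sgi;

print "greedy capture:\n[$greedy]\n\n";
print "non-greedy capture:\n[$lazy]\n";
```

With this sample, the greedy capture includes the body paragraph and the opening tag of the second title, while the non-greedy capture is just "Real Title".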

All I can do is apologise for wasting your time and try to get more sleep and be more sensible in future. And use

($pagetitle) = $html =~ /<TITLE>(.*?)<\/TITLE>/sgi;

instead... Your humble servant h17


In reply to Re: Re: Harvesting and Parsing HTML from other sites by hostile17
in thread Harvesting and Parsing HTML from other sites by hostile17
