In the example you give, a regular expression will probably do what you want, because it is very unlikely that a document will contain two TITLE elements. However, in a slightly different example, e.g., if we were looking for certain text in a CAPTION element, then the regular expression that works for your example might fail, if the text in question occurs between two of the elements in question but not within either of them. It is possible to work around that with a much more complicated regular expression, but it's hairy, and it will still fail if the element in question can be nested within itself, either directly or indirectly. In such cases, you really need to use a module that parses the SGML and hands you a DOM. HTML::TreeBuilder and XML::Twig make this sort of thing easy for HTML and XML respectively, and there are various alternatives to them as well. I don't know as much about SGML modules, since I've never worked much with SGML (except for legacy versions of HTML that were SGML-based), but you might check the CPAN.

Of course, if the example you gave is really all you want to do, then you may not need a parser, since the regex will probably be good enough.


Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.

In reply to Re: //s modifier by jonadab
in thread //s modifier by kettle

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.