Hi,

OK, this is the standard type of problem that Perl should be good at. I'm trying to find a way to solve it without using HTML::Parser. I've got a string that may or may not include some HTML formatting, and the string is broken up into sections by double dashes (think of breadcrumb links):

This is a -- string of -- words
<b>This is a -- string of -- words</b>
This <b>is a -- string</b> of -- words

So, the first one is plain text, the second is wrapped completely in a tag, and the third has a few words wrapped in the bold tag which spans the first and second sections.

Note that bold is just an example; it might be some other type of formatting, such as a style tag or a font change. All I know is that it has a start tag and an end tag.

Also, I don't know how many double-dash separators, if any, will be there.

Now, I need to break each string into links (split on the double dash), as a kind of "breadcrumb" trail; the examples above would each give three. Looking at just the first section of the third example: splitting on the double dash also separates the start tag from its end tag, so I'll need to supply the closing tag after the word "a":

This <b>is a -- string</b> of -- words
Would result in (for the first segment):
This <b>is a</b>
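
A rough sketch of one way to get that result: tokenize the string into tags, separators, and text, keep a stack of the tags that are currently open, close them at each separator, and reopen them at the start of the next segment. It assumes the tags are well nested, that every start tag has an end tag, and that no tag contains ">" or "--" inside an attribute value. The names split_breadcrumbs and close_tag are made up for illustration.

```perl
use strict;
use warnings;

# Build the matching end tag for a start tag, e.g. '<font color="red">' -> '</font>'.
sub close_tag {
    my ($start_tag) = @_;
    my ($name) = $start_tag =~ /^<\s*(\w+)/;
    return "</$name>";
}

# Split on runs of two or more dashes and re-balance any tags the split cuts through.
# Assumption: simple, well-nested tags only; this is not a full HTML parser.
sub split_breadcrumbs {
    my ($text) = @_;
    my @segments = ('');   # segments built so far; the last one is in progress
    my @open;              # stack of start tags that are currently open

    # Tokens: a tag, a dash separator, a run of ordinary text, or one stray character.
    while ($text =~ /\G(<[^>]*>|\s*--+\s*|[^<-]+|.)/gs) {
        my $tok = $1;
        if ($tok =~ m{^</}) {                 # end tag: the innermost tag is closed
            pop @open;
            $segments[-1] .= $tok;
        }
        elsif ($tok =~ m{^<\w}) {             # start tag: remember it
            push @open, $tok;
            $segments[-1] .= $tok;
        }
        elsif ($tok =~ /^\s*--+\s*$/) {       # separator: finish this segment
            $segments[-1] =~ s/\s+\z//;       # drop whitespace before the dashes
            $segments[-1] .= join '', map { close_tag($_) } reverse @open;
            push @segments, join '', @open;   # reopen the same tags in the next segment
        }
        else {                                # ordinary text
            $segments[-1] .= $tok;
        }
    }
    return @segments;
}

print "$_\n" for split_breadcrumbs('This <b>is a -- string</b> of -- words');
# This <b>is a</b>
# <b>string</b> of
# words
```

Reopening with the original start tag, rather than a bare tag name, keeps any attributes (a font colour, say) on the later segments as well.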
I can do this by tokenizing and keeping track of which words are inside the tag and which are not, but I'm wondering how others would attack this problem. Thanks,

In reply to A nice text processing question by moseley
