I have a string that can contain several tags, and an input list of tag names. Content that is not between tags shall always be kept. Content within a tag shall only be kept if the tag is active, i.e. it is in the given tag list. Attention: tags can be nested, and content inside an inactive tag is dropped even if an inner tag is active (see the third example).

my $str = 'word1 <tag0> word2 <tag1>word3 word4</tag1> word5 </tag0> word6 <tag2>word7 word8</tag2> word9 <tag3>word10</tag3> word11';

## Examples:
# @tags = ('tag0', 'tag1'): 'word1 word2 word3 word4 word5 word6 word9 word11'
# @tags = ('tag3'): 'word1 word6 word9 word10 word11'
# @tags = ('tag1', 'tag2', 'tag3'): 'word1 word6 word7 word8 word9 word10 word11'

The commented examples above show the desired result for each tag list.

I'm asking because I'm not sure how to solve this. It is neither XML nor HTML, just a string with tags. Would you recommend regular expressions or a module from CPAN? I would appreciate any advice. Thank you very much!

It would also be nice if the string were rejected when the tagging is invalid, e.g. if a start tag has no end tag. But that would be a bonus.
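
To make the requirement concrete, here is a minimal sketch of one possible approach: split the string into tags and text, then track the open tags on a stack. The name filter_tags is hypothetical and not taken from any module. The sketch assumes tag names match \w+, tags carry no attributes, and leftover whitespace may be collapsed to single spaces, as the examples suggest.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps text outside tags, and text inside tags only
# while every enclosing tag is active. Dies if the tagging is invalid.
sub filter_tags {
    my ($str, @active) = @_;
    my %active = map { $_ => 1 } @active;
    my @stack;         # names of the currently open tags
    my $hidden = 0;    # how many of the open tags are inactive
    my $out    = '';

    # The capture group makes split return the tags as well as the text.
    for my $token (split m{(</?\w+>)}, $str) {
        if ($token =~ m{^<(/?)(\w+)>$}) {
            my ($closing, $name) = ($1, $2);
            if ($closing) {
                my $open = pop @stack;
                die "unexpected </$name>\n"
                    unless defined $open && $open eq $name;
                $hidden-- unless $active{$name};
            }
            else {
                push @stack, $name;
                $hidden++ unless $active{$name};
            }
        }
        elsif (!$hidden) {
            $out .= $token;    # text is kept only if no inactive tag is open
        }
    }
    die "unclosed <$stack[-1]>\n" if @stack;

    # Assumption: collapse the whitespace left behind by removed tags.
    $out =~ s/\s+/ /g;
    $out =~ s/^ | $//g;
    return $out;
}

my $str = 'word1 <tag0> word2 <tag1>word3 word4</tag1> word5 </tag0> word6 '
        . '<tag2>word7 word8</tag2> word9 <tag3>word10</tag3> word11';
print filter_tags($str, 'tag3'), "\n";
```

Counting the inactive open tags, rather than checking only the innermost one, is what drops an active tag nested inside an inactive one. The two die calls cover the bonus case of a mismatched or missing end tag.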


In reply to Parsing string with tags by Dirk80
