I have a string which can contain several tags, and I have an input list of tag names. Content which is not between tags shall always be kept. Content within a tag shall only be kept if the tag is active, i.e. if it is in the given tag list. Note that tags can be nested.
my $str = 'word1 <tag0> word2 <tag1>word3 word4</tag1> word5 </tag0> word6 <tag2>word7 word8</tag2> word9 <tag3>word10</tag3> word11';

## Examples:
# @tags = ('tag0', 'tag1'): 'word1 word2 word3 word4 word5 word6 word9 word11'
# @tags = ('tag3'): 'word1 word6 word9 word10 word11'
# @tags = ('tag1', 'tag2', 'tag3'): 'word1 word6 word7 word8 word9 word10 word11'
The commented examples above show the desired result for each given tag list.
I'm asking because I'm not sure how to solve this. It is neither XML nor HTML; it's just a string with tags. Would you recommend regular expressions or a module from CPAN? I would be grateful for any advice. Thank you very much!
It would also be nice if the string were rejected when the tagging is invalid, e.g. if a start tag has no end tag. But that would be a bonus.
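One possible approach, sketched here only as an illustration, is to split the string into tag tokens and text and keep a stack of open tags. The helper name `filter_tags` is made up for this sketch. It assumes that tag names consist of word characters only (`\w+`), that tags have no attributes, that a `<` which does not start such a tag is plain text, and that whitespace in the result may be collapsed to single spaces, as in the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns the filtered string,
# or undef if the tagging is invalid (mismatched or unclosed tags).
sub filter_tags {
    my ($str, @active) = @_;
    my %active = map { $_ => 1 } @active;

    my @stack;            # names of the currently open tags
    my $suppressed = 0;   # how many of the open tags are NOT active
    my $out = '';

    # The capturing group makes split return the tags as tokens too.
    for my $tok (split /(<\/?\w+>)/, $str) {
        if ($tok =~ /^<(\w+)>$/) {              # start tag
            my $name = $1;
            push @stack, $name;
            $suppressed++ unless $active{$name};
        }
        elsif ($tok =~ /^<\/(\w+)>$/) {         # end tag
            my $name = $1;
            my $open = pop @stack;
            # An end tag must close the most recently opened tag.
            return unless defined($open) && $open eq $name;
            $suppressed-- unless $active{$name};
        }
        elsif (!$suppressed) {                  # text outside any inactive tag
            $out .= $tok;
        }
    }
    return if @stack;                           # a start tag was never closed

    # Assumption: whitespace is normalized as in the examples.
    $out =~ s/\s+/ /g;
    $out =~ s/^ | $//g;
    return $out;
}

my $str = 'word1 <tag0> word2 <tag1>word3 word4</tag1> word5 </tag0> word6 '
        . '<tag2>word7 word8</tag2> word9 <tag3>word10</tag3> word11';

my $result = filter_tags($str, 'tag3');
print defined $result ? "$result\n" : "invalid tagging\n";
# prints: word1 word6 word9 word10 word11
```

Counting the inactive open tags, rather than checking only the innermost one, is what makes nested content inside an inactive tag disappear even when the inner tag itself is active (the `tag1` inside `tag0` case). The same stack also covers the bonus: a mismatched end tag or a leftover open tag makes the function return undef.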
In reply to Parsing string with tags by Dirk80