rsriram has asked for the wisdom of the Perl Monks concerning the following question:

Hi. In a text file, the start and end of any list structure are marked by <bl>..</bl>. Each line between these tags should be tagged as <listitem>..</listitem>. I wrote the code below:

if (m|<bl>| .. m|</bl>|) {
    s/^(?!<bl>)|(?<=<bl>)/<listitem>/g;
    s-(?=</bl>)|(?<!</bl>)(?=\n)-<\/listitem>-g;
}

This works fine if the structure is plain, as below:

<bl>List item 1
List item 2
List item 3</bl>

But, I also encounter instances where <bl> appears within a <bl> like below:

<bl>List item 1
List item 2
<bl>List item a
List item b
List item c</bl>
List item 3</bl>

If I run my code on the above text, the conversion fails. I intend to get the following output:

<listitem>List item 1</listitem>
<listitem>List item 2</listitem>
<listitem>List item a</listitem>
<listitem>List item b</listitem>
<listitem>List item c</listitem></listitem>
<listitem>List item 3</listitem>

All the entries of the sublist should be tagged and later the end tag of item 2 should be inserted. Also after finishing the sublist, tagging for the main list should continue until </bl> for the main list is encountered. I could not work out a structure for the script to do this. Could someone help me with this?

Replies are listed 'Best First'.
Re: Help in using Regular Expressions
by cdarke (Prior) on Jun 12, 2008 at 07:37 UTC
    I came up with the following:
    use warnings;
    use strict;

    while (<DATA>) {
        s|^(?:<bl>)?(.*?)(?:</bl>)?$|<listitem>$1</listitem>|;
        print
    }
    __DATA__
    <bl>List item 1
    List item 2
    <bl>List item a
    List item b
    List item c</bl>
    List item 3</bl>
    It does not produce exactly the intended output, but:
    <listitem>List item 1</listitem>
    <listitem>List item 2</listitem>
    <listitem>List item a</listitem>
    <listitem>List item b</listitem>
    <listitem>List item c</listitem>
    <listitem>List item 3</listitem>
    Which, to me at least, looks more likely to be useful.
Re: Help in using Regular Expressions
by Anonymous Monk on Jun 12, 2008 at 07:29 UTC
    Here's an idea
    $_ = q~<bl>List item 1
    List item 2
    <bl>List item a
    List item b
    List item c</bl>
    List item 3</bl>
    ~;
    s~</?bl>~~g;
    s~(List.+?)$~<listitem>$1</listitem>~gm;
    print $_,$/;
    __END__
    <listitem>List item 1</listitem>
    <listitem>List item 2</listitem>
    <listitem>List item a</listitem>
    <listitem>List item b</listitem>
    <listitem>List item c</listitem>
    <listitem>List item 3</listitem>
Re: Help in using Regular Expressions
by Anonymous Monk on Jun 12, 2008 at 07:35 UTC
    Since lookahead and lookbehind assertions are zero-width, your substitution will never eliminate any <bl> tags; it will just insert listitem tags.
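    To make the zero-width point concrete, here is a small sketch (the variable names are mine, not the OP's). It runs the same kind of lookbehind substitution on one line and compares it with a substitution that actually consumes the tag:

```perl
use strict;
use warnings;

# A lookaround matches a position, not characters, so a substitution
# built only from lookarounds can insert text but never delete any.
my $line = "<bl>List item 1";

# Lookbehind only: the <bl> stays, <listitem> is inserted after it.
(my $inserted = $line) =~ s/(?<=<bl>)/<listitem>/;

# Matching <bl> as real characters: the tag is consumed and replaced.
(my $consumed = $line) =~ s/<bl>/<listitem>/;

print "$inserted\n";   # <bl><listitem>List item 1
print "$consumed\n";   # <listitem>List item 1
```

    So if the <bl> and </bl> markers should disappear from the output, they have to be part of what the pattern matches, or be removed in a separate step.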
Re: Help in using Regular Expressions
by pc88mxer (Vicar) on Jun 12, 2008 at 14:20 UTC
    All the entries of the sublist should be tagged and later the end tag of item 2 should be inserted. Also after finishing the sublist, tagging for the main list should continue ...
    I have a feeling that the OP wants this:
    <listitem>List item 1</listitem>
    <listitem>List item 2
      <listitem>List item a</listitem>
      <listitem>List item b</listitem>
      <listitem>List item c</listitem>
    </listitem>
    <listitem>List item 3</listitem>
    (Indentation added for emphasis.) Is this correct?
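    If that nested form is what is wanted, one possible approach is to track the nesting depth and look one line ahead. The following is only a sketch under some assumptions of mine: every line is inside a <bl>..</bl> block, <bl> appears only at the start of a line and </bl> only at the end, at most one of each per line, and a sublist belongs to the item on the line just before it. The name tag_list is made up for illustration.

```perl
use strict;
use warnings;

# tag_list: hypothetical helper. Takes chomped lines, returns tagged lines.
sub tag_list {
    my @lines = @_;
    my @out;
    my $depth = 0;
    for my $i (0 .. $#lines) {
        my $text   = $lines[$i];
        my $opens  = $text =~ s|^<bl>||;    # this line starts a (sub)list
        my $closes = $text =~ s|</bl>$||;   # this line ends a (sub)list
        $depth++ if $opens;
        my $pad = '  ' x ($depth - 1);
        # Assumption: if the next line opens a sublist, this item is its
        # parent, so its closing tag is held back until the sublist ends.
        my $parent = $i < $#lines && $lines[$i + 1] =~ /^<bl>/;
        push @out, $parent ? "$pad<listitem>$text"
                           : "$pad<listitem>$text</listitem>";
        if ($closes) {
            $depth--;
            # Leaving a sublist: close the parent item left open above.
            push @out, ('  ' x ($depth - 1)) . '</listitem>' if $depth > 0;
        }
    }
    return @out;
}

my @input = (
    '<bl>List item 1',
    'List item 2',
    '<bl>List item a',
    'List item b',
    'List item c</bl>',
    'List item 3</bl>',
);
print "$_\n" for tag_list(@input);
```

    On the sample data this prints the nested form shown above, with two spaces of indentation per level. It reads the whole list into memory so it can peek at the next line; a streaming version would need to buffer one line instead.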