rsriram has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a markup file in which there is an element like <ins cnt="#">. Every time I encounter this element, it has to be replaced with <TAG>, repeated the number of times specified in the attribute.

For example, if the tag is <ins cnt="4">, the output file should have <TAG><TAG><TAG><TAG>. The <ins> element appears several times in the input file. I am reading through the file and used a for loop like this:

$file =~ /<ins cnt="([^>]+)">/g;
for ($x=0; $x != $1; $x++) {
print F2 <TAG>;
}

But this is not producing the result I need. Can anyone help me with the syntax/logic for this replacement?

Re: Recursive insertion of tags
by gellyfish (Monsignor) on Jul 20, 2006 at 11:05 UTC

    You could do it with a simple substitution:

    $file = <<EOF;
    <ins cnt="4"> blah <ins cnt="2">
EOF
    $file =~ s/<ins cnt="(\d+)">/"<TAG>" x $1/egs;
    print $file;
    Note the /e modifier on the substitution: it makes Perl evaluate the right-hand side as code, so "<TAG>" x $1 builds a string of $1 copies of <TAG>.

    /J\
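    The same substitution can be wrapped in a small subroutine. This is a minimal sketch, assuming the whole input has already been read into one string; the name expand_ins is hypothetical, and use strict/use warnings are additions.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every <ins cnt="N"> with N copies of <TAG>.
sub expand_ins {
    my ($text) = @_;
    # /e evaluates the replacement as Perl code, so "<TAG>" x $1
    # repeats the string $1 times; /g handles every occurrence.
    $text =~ s/<ins cnt="(\d+)">/"<TAG>" x $1/eg;
    return $text;
}

my $input = qq{<ins cnt="4"> blah <ins cnt="2">\n};
print expand_ins($input);    # <TAG><TAG><TAG><TAG> blah <TAG><TAG>
```

    To apply it to a file, one option is to slurp the file first, e.g. my $text = do { local $/; <$in> }; where $in is an already-opened input handle, and then print expand_ins($text) to the output handle.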

Re: Recursive insertion of tags
by swkronenfeld (Hermit) on Jul 20, 2006 at 14:52 UTC
    gellyfish's code is the way to solve this problem, but I'll point out a couple of mistakes in your code to help you avoid them in the future.

    print F2 <TAG>;
    You need to put <TAG> in quotes. The line should be print F2 "<TAG>";

    Enabling warnings would have helped you catch this. Without quotes, <TAG> is the readline operator, so your program is attempting to read a line from the filehandle TAG and print it to the filehandle F2. With warnings on, running the script reports:
    # ./test.pl
    Name "main::TAG" used only once: possible typo at ./test.pl line 9.
    readline() on unopened filehandle TAG at ./test.pl line 9.
    readline() on unopened filehandle TAG at ./test.pl line 9.
    readline() on unopened filehandle TAG at ./test.pl line 9.
    readline() on unopened filehandle TAG at ./test.pl line 9.
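    A minimal sketch of the corrected line with warnings enabled. It assumes a fixed count of 4 in place of the captured $1, and it prints to an in-memory filehandle (opened on a reference to a scalar) rather than the asker's F2, so the result can be inspected; $count, $buffer and $out are illustrative names.

```perl
use strict;
use warnings;

my $count  = 4;    # stands in for the captured $1
my $buffer = '';

# Open a filehandle that writes into $buffer instead of a real file.
open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";

for (1 .. $count) {
    # Quoted, "<TAG>" is a plain string; unquoted, <TAG> would be
    # a readline on the (unopened) filehandle TAG.
    print {$out} "<TAG>";
}
close $out;

print "$buffer\n";    # <TAG><TAG><TAG><TAG>
```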

    Also, although it isn't broken in your example, your regular expression could be improved. You are capturing anything that isn't ">" and then using it in a numeric comparison, which will be a problem if you capture something non-numeric. A better idea would be to write your regex like this:
    if ($file =~ /<ins cnt="(\d+)">/) {
        for ($x = 0; $x < $1; $x++) {
            print F2 "<TAG>";
        }
    }
    What I changed:
    • Matching on digits only for the count.
    • If $file doesn't match the pattern, the code doesn't attempt to use $1 in a for loop.
    • I removed the g modifier from your regex, as I don't think you intended it in this case.
    • Nitpicking: I changed the for loop condition from $x != $1 to $x < $1. It does not matter for this example, but it's less likely to get caught in an infinite loop when you are doing more complex things (like possibly modifying $x inside your loop). Note that there are more Perlish ways of writing this, including print F2 "<TAG>" for (1 .. $1);
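    The question says the element appears several times in the input, and in that case the g modifier is useful after all, inside a while loop: in scalar context, m//g resumes matching where the previous match ended. A minimal sketch, assuming the input is already in $file and writing to an in-memory handle in place of F2 (the sample string and $output are illustrative). It also uses the for (1 .. $1) form from the last point, and shows that cnt="four" is simply skipped by \d+.

```perl
use strict;
use warnings;

my $file   = qq{<ins cnt="3"> text <ins cnt="four"> more <ins cnt="2">};
my $output = '';
open my $out, '>', \$output or die "Cannot open in-memory handle: $!";

# In scalar context, m//g resumes from where the previous match ended,
# so the while loop visits each <ins cnt="N"> in turn.
while ($file =~ /<ins cnt="(\d+)">/g) {
    # Statement-modifier form; cnt="four" never matches \d+.
    print {$out} "<TAG>" for 1 .. $1;
}
close $out;

print "$output\n";    # <TAG><TAG><TAG><TAG><TAG>
```

    Note that this emits only the tags and drops the text between them; to rewrite the whole document, the s///eg substitution from the first reply is the simpler choice.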