Recursive insertion of tags

rsriram has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a markup file containing an element like <ins cnt="#">. Every time I encounter this element, it has to be replaced with <TAG>, repeated the number of times specified in the attribute.

For example, if the tag is <ins cnt="4">, the output file should have <TAG><TAG><TAG><TAG>. The <ins> element appears several times in the input file. I am reading through the file and used a for loop like this:

$file =~ /<ins cnt="([^>]+)">/g;
for ($x=0; $x != $1; $x++) {
print F2 <TAG>;
}

But this is not producing the result I need. Can anyone help me with the syntax/logic for this replacement?

Re: Recursive insertion of tags by gellyfish (Monsignor) on Jul 20, 2006 at 11:05 UTC
You could do it in a simple substitution:

$file =<<EOF;
<ins cnt="4">
blah
<ins cnt="2">
EOF
$file =~ s/<ins cnt="(\d+)">/"<TAG>" x $1/egs;
print $file;

Note the /e modifier on the substitution, which permits the evaluation of code in the RHS.

/J\
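
As a minimal sketch of how the substitution above could be wrapped for reuse: the helper name `expand_ins` and the sample string are assumptions for illustration, not part of the original reply. The /s flag is left out here because the pattern contains no `.`, so it would have no effect.

```perl
use strict;
use warnings;

# expand_ins is a hypothetical helper name, not from the thread.
# It replaces every <ins cnt="N"> with N copies of <TAG>.
sub expand_ins {
    my ($text) = @_;
    # /e evaluates the replacement as Perl code; /g repeats it for every match.
    # "x" is the string repetition operator: "<TAG>" x 3 gives "<TAG><TAG><TAG>".
    $text =~ s/<ins cnt="(\d+)">/"<TAG>" x $1/eg;
    return $text;
}

# Hypothetical sample input, mirroring the here-document in the reply.
my $sample = qq{<ins cnt="4">\nblah\n<ins cnt="2">\n};
print expand_ins($sample);
```

Working on a copy of the string and returning it leaves the caller's variable untouched, which may or may not be what you want for a whole-file rewrite.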
Re: Recursive insertion of tags by swkronenfeld (Hermit) on Jul 20, 2006 at 14:52 UTC
gellyfish's code is the way to solve this problem, but I'll point out a couple of mistakes in your code to help you avoid them in the future.

print F2 <TAG>;

You want to put `<TAG>` in quotes. The line should be

print F2 "<TAG>";

This is something that using warnings would have helped you catch. Your program is attempting to read from the filehandle TAG and print the result to the filehandle F2.

# ./test.pl
Name "main::TAG" used only once: possible typo at ./test.pl line 9.
readline() on unopened filehandle TAG at ./test.pl line 9.
readline() on unopened filehandle TAG at ./test.pl line 9.
readline() on unopened filehandle TAG at ./test.pl line 9.
readline() on unopened filehandle TAG at ./test.pl line 9.

Also, although it isn't broken in your example, your regular expression could use some work. You are matching anything that isn't ">", and then using it in a numerical comparison. This will be a problem if you capture something non-numeric. A better idea would be to write your code like this:

if ($file =~ /<ins cnt="(\d+)">/) {
    for ($x=0; $x < $1; $x++) {
        print F2 "<TAG>";
    }
}

What I changed:

Matching on digits only for the count.

If $file doesn't match the pattern, your code doesn't attempt to use $1 in a for loop.

I removed the `g` modifier from your regex, as I don't think you intended it in this case.

Nitpicking: I changed the for loop condition from `$x != $1` to `$x < $1`. It does not matter for this example, but it's less likely to get caught in an infinite loop when you are doing more complex things (like possibly modifying $x inside your loop).

Note that there are more Perlish ways of writing this, including `print F2 "<TAG>" for(1 .. $1)`.
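
The `if` version above handles only the first `<ins>` in `$file`, while the question mentions several occurrences. One possible way to visit every match, sketched under some assumptions: the sample input is made up, and in-memory filehandles stand in for the real input file and for F2. Like the original loop, this writes only the tags and drops the surrounding text; to keep the rest of the markup, the substitution approach is simpler.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real script would open the markup file instead.
my $input  = qq{<ins cnt="3">\ntext\n<ins cnt="1">\n};
my $output = '';

# In-memory filehandles stand in for the input file and for F2.
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";

while (my $line = <$in>) {
    # In scalar context, /g resumes after the previous match, so this
    # inner loop visits every <ins> on the line, not only the first.
    while ($line =~ /<ins cnt="(\d+)">/g) {
        # The range 1 .. $1 is built once, before any tag is printed.
        print {$out} "<TAG>" for 1 .. $1;
    }
}

close $out or die "cannot close output: $!";
close $in;

print $output;
```

Here the `g` modifier is kept on purpose: combined with `while`, it steps through the matches one at a time, which is different from using it once in a standalone match.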