in reply to Continuing after replacing nth occurrence

I agree with GrandFather's advice not to parse HTML with regexes, in general. That said, it's sometimes helpful to understand why an approach didn't work, and what a way around the problem would have been.

So, the problem with your code is that the s/// operator is not aware of the incremental search position that you're trying to maintain with the /c option of the outer matches. I.e., it always starts from the beginning of the string (even if you attempt to save/restore the current position using $pos = pos($tbltext); before, and pos($tbltext) = $pos; after the substitution code).
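To make that concrete, here is a minimal sketch (the string and variable names are made up for illustration, they are not from your code). It shows that a plain s/// replaces the first match in the whole string, regardless of where an earlier m//g match left pos():

```perl
use strict;
use warnings;

# Hypothetical sample string, just for this demo
my $str = "aXbXcX";

# A scalar-context m//g match sets the search position right after the match
my $found      = $str =~ /b/g;
my $pos_before = pos($str);      # position just after the "b"

# A plain s/// does not continue from pos(); it searches from the start,
# so it hits the "X" in front of the "b", not the one after it
$str =~ s/X/-/;

# Modifying the string also resets the search position
my $pos_after = pos($str);

print "string:     $str\n";
print "pos before: $pos_before\n";
print "pos after:  ", (defined $pos_after ? $pos_after : "undef"), "\n";
```

If s/// honoured the search position, the result would be "aXb-cX"; instead the first X is the one that gets replaced.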

One way around the problem would be to move the column substitution code into a subroutine, which you then simply call in a normal repeated (/g) substitution:

my $tbltext = qq(<tr><td>a1</td><td>a2</td><td>a3</td><td>a4</td></tr><tr><td>b1</td><td>b2</td><td>b3</td><td>b4</td></tr><tr><td>c1</td><td>c2</td><td>c3</td><td>c4</td></tr>);

sub fix_column {
    my $s = shift;
    my $nth = shift;
    my $counter = 0;
    $s =~ s/(<td.*?\/td>)/
             ++$counter == $nth   # is this the nth cell?
               ? "${1}newtext"    # if yes, add "newtext"
               : $1               # otherwise, leave it
           /ge;
    return $s;
}

$tbltext =~ s/(<tr.*?>)(.*?)(<\/tr>)/$1.fix_column($2, 3).$3/ge;

print "$tbltext\n";

__END__
<tr><td>a1</td><td>a2</td><td>a3</td>newtext<td>a4</td></tr><tr><td>b1</td><td>b2</td><td>b3</td>newtext<td>b4</td></tr><tr><td>c1</td><td>c2</td><td>c3</td>newtext<td>c4</td></tr>

BTW, are you sure you want to insert the text in between the cells, not within?
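In case the text is meant to go within the cell, here is a rough sketch of how the idea could be adapted. The name fix_column_inside and the sample row are hypothetical, and the cell regex is just as simplistic as the one in the demo:

```perl
use strict;
use warnings;

# Hypothetical variant: append the text inside the nth cell
# instead of after its closing tag
sub fix_column_inside {
    my ($row, $nth) = @_;
    my $counter = 0;
    $row =~ s{(<td.*?>)(.*?)(</td>)}{
        ++$counter == $nth
          ? "$1$2newtext$3"   # nth cell: add "newtext" before </td>
          : "$1$2$3"          # any other cell: leave it unchanged
    }ge;
    return $row;
}

# Made-up sample row for the demo
my $row = '<td>a1</td><td>a2</td><td>a3</td><td>a4</td>';
my $out = fix_column_inside($row, 3);
print "$out\n";
```

The only real difference is that the opening tag, the content and the closing tag are captured separately, so the new text can be placed before the closing tag.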

(P.S.: I know, the <tr>...</tr> matching regex is not perfect... but I deliberately kept it simple for this demo.)

Re^2: Continuing after replacing nth occurrence
by knlst8 (Initiate) on Mar 10, 2009 at 11:17 UTC
    Thanks for your reply. You're right that I was wondering why my approach wouldn't work, in addition to needing to find a working approach. I think I understand now. And yes, I did intend to insert text in between cells. Odd, I know. I can't explain it without describing the whole project in detail, so I'm going to leave you wondering about that one. :)