in reply to s/// only replacing most of the time...

Have you considered that maybe you are doing this the hard way? At the very least HTML::Strip may make your life easier without having to make highly complex regexen.

Update: never mind...
I just realized you are adding the span tags yourself, so you probably want them to stay. Have you considered marking all the text positions you need first, and then working from the right-hand end of the string back toward the start, inserting your tags as you go? Each insertion then leaves the positions to its left untouched. This is pretty simple unless you have overlaps, and even then it would be easier than what you are doing now.
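To make the right-to-left idea concrete, here is a minimal sketch. The helper name insert_tags and the hand-built %tags hash are made up for illustration; it assumes you have already recorded, for each offset in the original string, the tags that belong there.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): insert tags at recorded
# offsets, working from the highest offset down so that the offsets still
# waiting to be processed are never shifted by an earlier insertion.
sub insert_tags {
    my ($text, $tags) = @_;    # $tags: { offset => [ tag strings ] }
    for my $pos (sort { $b <=> $a } keys %$tags) {
        # 4-argument substr with length 0 inserts without removing anything
        substr($text, $pos, 0, join('', @{ $tags->{$pos} }));
    }
    return $text;
}

# Offsets here are assumed for the example, not computed from patterns
my %tags = (
    0 => ['<SPAN CLASS=TTT>'],
    3 => ['</SPAN>', '<SPAN CLASS=DDD>'],
    6 => ['</SPAN>'],
);
my $marked = insert_tags('TTTDDDX', \%tags);
print "$marked\n";
# prints <SPAN CLASS=TTT>TTT</SPAN><SPAN CLASS=DDD>DDD</SPAN>X
```

Going from the right means no offset bookkeeping is needed at all, at the cost of building the whole result in memory before printing it.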

For instance, keep a list of tags and their positions in the original string, storing both the start and end tags. Then strip off everything in the original string up to the next tag position, print it, then print any tags that belong at that position, and move on to the next position... FAR easier than re-iterating over the string over and over again...

Here is an example

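$_ = 'TTTDDDTTTTTTX';
my @patterns = qw(TTT DDD TD TTTTTT);
my %tags;

# Record the opening and closing tag for every match, keyed by offset
for my $pat (@patterns) {
    while (/($pat)/g) {
        print "$pat = $-[0] $+[0]\n";
        push @{$tags{$-[0]}}, "<SPAN CLASS=$pat>";
        push @{$tags{$+[0]}}, "</SPAN>";
    }
}

my $currentpos = 0;
print "$_\n";

# Walk the offsets in order: print the text up to each one, then its tags
for my $pos (sort { $a <=> $b } keys %tags) {
    # 4-argument substr removes the chunk from $_ and returns it
    print substr($_, 0, ($pos - $currentpos), '');
    print join('', @{$tags{$pos}});
    $currentpos = $pos;
}
print $_, "\n";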

                - Ant
                - Some of my best work - (1 2 3)

Re^2: s/// only replacing most of the time...
by mdunnbass (Monk) on May 29, 2007 at 15:34 UTC
    I have given that some thought. I just haven't worked out in my head how to deal with overlaps. However, I do agree that the way I'm doing it now is really, really overblown and silly.

    One thing I noticed though, was the output of your code:

    TTT = 0 3
    TTT = 6 9
    TTT = 9 12
    DDD = 3 6
    TD = 2 4
    TTTTTT = 6 12

    It shows the TTT bit matching to 6-9 and 9-12. Ideally, I would need that to match 6-9, 7-10, 8-11, and 9-12. No biggie though. Let me play around with it a bit, and see how it works.

    Thanks
    Matt

      A little bit of smoke and mirrors, then...
      $_ = 'TTTDDDTTTTTTX';
      my @patterns = qw(TTT DDD TD TTTTTT);
      my %tags;
      for my $pat (@patterns) {
          my $pat2 = substr($pat,0,1).'(?='.substr($pat,1,length($pat)).')';
          # T(?=TT) only REALLY matches the first char, but that still works
          while (/($pat2)/g) {
              print "$pat = $-[0] $+[0]\n";
              push @{$tags{$-[0]}}, "<SPAN CLASS=$pat>";
              # instead of using $+[0] we use $-[0]+the length of the string
              # matched, since it is a fixed string, no problems.
              push @{$tags{$-[0]+length($pat)}}, "</SPAN>";
          }
      }
      my $currentpos = 0;
      print "$_\n";
      for my $pos (sort { $a <=> $b } keys %tags) {
          print substr($_,0,($pos-$currentpos),'');
          print join('',@{$tags{$pos}});
          $currentpos = $pos;
      }
      print $_,"\n";
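
The lookahead trick works here because every pattern is a fixed string, so the end offset can be computed from the pattern length. If the patterns were ever real regexes of varying length, one possible variation is to put the whole pattern inside the lookahead with a capture group and read the offsets from $-[1] and $+[1]. This is only a sketch; the helper name overlapping_spans is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, end] offsets for every match of $pat
# in $text, including matches that overlap one another.
sub overlapping_spans {
    my ($text, $pat) = @_;
    my @spans;
    # The lookahead itself consumes nothing, so after each success the
    # engine moves on to the next starting position. Group 1 records what
    # the pattern matches from the current position.
    while ($text =~ /(?=($pat))/g) {
        push @spans, [ $-[1], $+[1] ];
    }
    return @spans;
}

# Fixed string: finds 0-3 6-9 7-10 8-11 9-12
my @fixed = overlapping_spans('TTTDDDTTTTTTX', 'TTT');
print join(' ', map { "$_->[0]-$_->[1]" } @fixed), "\n";

# Variable length: finds 0-4 1-4 2-4
my @varied = overlapping_spans('TTTDDDTTTTTTX', qr/T+D/);
print join(' ', map { "$_->[0]-$_->[1]" } @varied), "\n";
```

The returned offsets can be pushed into %tags exactly as before, with $-[1] as the opening position and $+[1] as the closing one.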

                      - Ant