in reply to Regular Expression Doubt

$text = '1 &plus; 2 <maths> &plus; dfdf</maths>';
for ( split '\s', $text ) {
    if ( /<maths>/ ... /<\/maths>/ ) {
        next;
    }
    else {
        if (/&plus;/) {
            $text =~ s/$_/&thinsp;&plus;&thinsp;/;
        }
    }
}
# STDOUT: 1 &thinsp;&plus;&thinsp; 2 <maths> &plus; dfdf</maths>


Re^2: Regular Expression Doubt
by reasonablekeith (Deacon) on Apr 05, 2005 at 13:45 UTC
    You seem to be arbitrarily splitting on spaces. This will break unless there's a clear space either side of the +. A better option would be as follows.
    my $text   = '1&plus;2<maths> &plus; dfdf</maths>';
    my $output = "";
    while ($text =~ m/((<maths>.*?<\/maths>)|([^<]*))/gs) {
        if ($2) {
            $output .= $2;
        }
        else {
            my $segment = $3;
            $segment =~ s/&plus;/&thinsp;&plus;&thinsp;/g;
            $output .= $segment;
        }
    }
    print $output . "\n";
    The basic premise is to scoop up anything inside <maths> and push it onto the output untouched, or otherwise scoop up as much as can easily be determined not to be in <maths> (i.e. text with no angle brackets) and run the substitution on that before appending it to the output.
      my $text   = '1 &plus; 2 <br> <maths> &plus; dfdf</maths>';
      my $output = "";
      while ($text =~ m/((<maths>.*?<\/maths>)|([^<]*))/gs) {
          if ($2) {
              $output .= $2;
          }
          else {
              my $segment = $3;
              $segment =~ s/&plus;/&thinsp;&plus;&thinsp;/g;
              $output .= $segment;
          }
      }
      print $output . "\n";
      __END__
      1 &thinsp;&plus;&thinsp; 2 br> <maths> &plus; dfdf</maths>
      You lost a < here.
        dammit
        my $output = "";
        while ($text =~ m/((<maths>.*?<\/maths>)|&plus;|.)/gs) {
            if ($2) {
                $output .= $1;
            }
            else {
                my $segment = $1;
                $segment =~ s/&plus;/&thinsp;&plus;&thinsp;/g;
                $output .= $segment;
            }
        }
        print $output . "\n";
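For anyone who wants to check this version against the inputs used earlier in the thread, here is a minimal sketch that wraps the same regex in a sub. The name `space_plus` is made up for illustration, and `defined $2` is used instead of a truth test; that is my own choice, not part of the post above.

```perl
use strict;
use warnings;

# Hypothetical helper name; the regex is the one from the post above.
sub space_plus {
    my ($text) = @_;
    my $output = '';

    # Each iteration consumes either a whole <maths>...</maths> block,
    # a single &plus; entity, or exactly one other character.
    while ($text =~ m/((<maths>.*?<\/maths>)|&plus;|.)/gs) {
        if (defined $2) {
            # Inside <maths>: copy through untouched.
            $output .= $1;
        }
        else {
            # Outside <maths>: pad a matched &plus; with thin spaces.
            my $segment = $1;
            $segment =~ s/&plus;/&thinsp;&plus;&thinsp;/g;
            $output .= $segment;
        }
    }
    return $output;
}

print space_plus('1 &plus; 2 <br> <maths> &plus; dfdf</maths>'), "\n";
# prints: 1 &thinsp;&plus;&thinsp; 2 <br> <maths> &plus; dfdf</maths>

print space_plus('1&plus;2<maths> &plus; dfdf</maths>'), "\n";
# prints: 1&thinsp;&plus;&thinsp;2<maths> &plus; dfdf</maths>
```

Because the last alternative always consumes one character, the match is never empty, so a stray `<` such as the one in `<br>` is copied to the output rather than skipped.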
      # in rare cases with no spaces:
      ...
      $text =~ s/(?<!\s)(<maths>)/ $1/g;
      ...
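A small sketch of that padding step on its own, assuming the intent is to add a space before `<maths>` only when one is not already there. The sub name `pad_maths` is hypothetical. It uses a negative lookbehind, `(?<!\s)`; a lookahead `(?!\s)` placed directly in front of `<maths>` would always succeed, since the next character is `<`, and would pad every tag.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a space before <maths> unless the
# preceding character is already whitespace.
sub pad_maths {
    my ($text) = @_;
    # (?<!\s) is a zero-width negative lookbehind. It also succeeds at the
    # very start of the string, so a leading <maths> gets a space too.
    $text =~ s/(?<!\s)(<maths>)/ $1/g;
    return $text;
}

print pad_maths('1 &plus; 2<maths> &plus; dfdf</maths>'), "\n";
# prints: 1 &plus; 2 <maths> &plus; dfdf</maths>

print pad_maths('1 &plus; 2 <maths> &plus; dfdf</maths>'), "\n";
# prints: 1 &plus; 2 <maths> &plus; dfdf</maths>
```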