in reply to Re^3: Remove text between two Start and End Tags (Regex)
in thread Remove text between two Start and End Tags (Regex)

One more question related to this. Is there a way to find the value of a variable where it occurs outside a start and end tag, and then replace it with something new? The starting sentence would look like this:

"The increase in sensitivity of HIV - infected cells to <GENE> Fas </GENE> killing mapped to <GENE> vpu </GENE> , while nef , <GENE> vif </GENE> , <GENE> vpr </GENE> , and second exon of <GENE> tat</GENE> did not appear to contribute"

And end up looking like this:

"The increase in sensitivity of HIV - infected cells to <GENE> Fas </GENE> killing mapped to <GENE> vpu </GENE> , while <PGENE> nef </PGENE> , <GENE> vif </GENE> , <GENE> vpr </GENE> , and second exon of <GENE> tat</GENE> did not appear to contribute"

Notice the PGENE tags added around nef, which was matched against a variable and lies outside the GENE start and end tags.

I appreciate this might be slightly confusing, but at the moment I'm racking my brain trying to figure out a way to join it all back together correctly if I do this using lots of splits.

Thanks

Re^5: Remove text between two Start and End Tags (Regex)
by choroba (Cardinal) on Apr 19, 2011 at 21:26 UTC
    Lots of splits? Use just one:
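    my $s = 'vif The increase in sensitivity of HIV - infected cells to <GENE> vif Fas </GENE> killing mapped to <GENE> vpu </GENE> , while nef , vif, <GENE> vif </GENE> , <GENE> vpr </GENE> , and second exon of <GENE> tat</GENE> did not appear to contribute';
    # The capturing group keeps the tags in the list, so the text outside
    # the tags ends up at indices 0, 4, 8, ...
    my @ar = split m%(</?GENE>)%, $s;
    # $#ar (last index) keeps the loop inside the array even when the
    # string ends with </GENE> and split drops the trailing empty field.
    for my $i (0 .. $#ar / 4) {
        # \b and /g: tag every whole-word vif in the segment, not just the first.
        $ar[4*$i] =~ s%\bvif\b%<PGENE>vif</PGENE>%g;
    }
    $s = join q[], @ar;
    print "$s\n";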
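
    Since the question asks about matching against a variable, here is a minimal sketch of the same single-split idea wrapped in a sub. The name tag_outside_gene and the shortened test sentence are made up for illustration, and it assumes the GENE tags are well formed and never nested.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wrap every occurrence of $name
# that lies outside <GENE>...</GENE> in <PGENE> tags.
# Assumes well-formed, non-nested GENE tags.
sub tag_outside_gene {
    my ($text, $name) = @_;

    # The capturing group makes split return the tags too, so the list runs:
    # outside text, <GENE>, inside text, </GENE>, outside text, ...
    my @parts = split m{(</?GENE>)}, $text;

    # Text outside the tags sits at indices 0, 4, 8, ...
    for (my $i = 0; $i < @parts; $i += 4) {
        # \Q...\E quotes any regex metacharacters in the name; the
        # lookarounds stop it from matching inside a longer word.
        $parts[$i] =~ s{(?<!\w)\Q$name\E(?!\w)}{<PGENE> $name </PGENE>}g;
    }
    return join '', @parts;
}

my $sentence = 'mapped to <GENE> vpu </GENE> , while nef , <GENE> vif </GENE> , and <GENE> nef </GENE>';
print tag_outside_gene($sentence, 'nef'), "\n";
```

    With that input, the nef after "while" gets the PGENE tags, while the nef already inside GENE tags is left alone.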