in reply to How do I extract a text between some delimiters

If you can be sure that it will always be in the <NPx> .. </NP> format, you can use this pseudo-code:

Split the sentence on <NP\d> and </NP>; remove the tags from every second element (the text that was inside a noun phrase); join the elements again.

@parts = split(/<NP.*?>|<\/NP>/, $sentence);
$even = 0;
for $part (@parts) {
    if ($even) {
        $part =~ s/\/\w\w//g;
        $part .= '/NP';
    }
    $even = ! $even;
}
$result = join('', @parts);

Untested!

Re: Re: How do I extract a text between some delimiters
by fglock (Vicar) on Sep 17, 2002 at 15:43 UTC

    Ok. Just replace this

    $part =~ s/\/\w\w//g;

    by this

    $part =~ s/\/\w{2,3}//g;

    jkahn's solution below doesn't have this problem. Using his solution it would be:

    $part =~ s!/\S*!!g;
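
For anyone who wants to try this out, here is a minimal self-contained sketch that combines the split/join loop from the parent post with the `s!/\S*!!g` substitution. The sub name `strip_np_tags` is made up for illustration, and the split pattern assumes the opening tag is always `<NP` followed by digits; adjust it if your data differs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original posts): strips the per-word
# tags inside each <NPn> ... </NP> chunk and labels the chunk with /NP.
sub strip_np_tags {
    my ($sentence) = @_;

    # Split on the opening and closing markers. Pieces at odd indexes
    # (1, 3, 5, ...) are the text that was inside a noun phrase.
    # Assumption: opening tags look like <NP5>, <NP12>, etc.
    my @parts = split /<NP\d+>|<\/NP>/, $sentence;

    my $inside = 0;
    for my $part (@parts) {
        if ($inside) {
            $part =~ s!/\S*!!g;    # drop every /TAG, whatever its length
            $part .= '/NP';        # label the whole chunk
        }
        $inside = !$inside;
    }
    return join '', @parts;
}

my $input = '<S> What/WP was/VBD <NP5> the/DT monetary/JJ value/NN </NP> '
          . 'of/IN <NP6> the/DT Nobel/NNP Peace/NNP Prize/NNP </NP> '
          . 'in/IN <NP7> 1989/CD </NP> ?/./*end-of-sentence*</S>';

print strip_np_tags($input), "\n";
```

Because `\S*` consumes everything up to the next whitespace, this should also cope with tags such as `PRP$` that `\w{2,3}` would only partly remove. Note that the spaces around the removed markers are kept, so the result can contain double spaces.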
Re: Re: How do I extract a text between some delimiters
by Anonymous Monk on Sep 17, 2002 at 15:36 UTC
    Hi Monk
    I tested the code and it works well when the tag has only two characters, but a tag can also have three characters.

    The following was the output for one sentence:

    <S> What/WP was/VBD the monetary value /NP of/IN the NobelP PeaceP PrizeP /NP in/IN 1989 /NP ?/./*end-of-sentence*</S>

    The original input was:
    <S> What/WP was/VBD <NP5> the/DT monetary/JJ value/NN </NP> of/IN <NP6> the/DT Nobel/NNP Peace/NNP Prize/NNP </NP> in/IN <NP7> 1989/CD </NP> ?/./*end-of-sentence*</S>
    I tried to change the code, but I could not get it to work for both two and three characters. How do I match a tag of two OR three characters?
    Thanks