jck has asked for the wisdom of the Perl Monks concerning the following question:

I need to strip the paragraph tags from the beginning and end of a string. I thought this would work:
$testtext =~ s/(^<p>|<\/p>$)//;
It does replace the beginning tag, but not the ending tag. I know this should be incredibly simple, but I can't figure out what I'm doing wrong...
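A quick way to see what is happening: in scalar context, s/// returns the number of substitutions it made. The sketch below runs the original pattern on a made-up sample string, once as written and once with the /g modifier added:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input.
my $once = '<p>some text</p>';
my $all  = $once;

# As written: without /g, s/// stops after the first successful match,
# which here is the leading <p>.
my $n_once = ($once =~ s/(^<p>|<\/p>$)//);

# With /g, matching continues after the first match,
# so the trailing </p> is found and removed as well.
my $n_all = ($all =~ s/(^<p>|<\/p>$)//g);

print "without /g: $n_once substitution(s), result '$once'\n";
print "with /g:    $n_all substitution(s), result '$all'\n";
```

The count makes it clear that the pattern itself matches both tags; it is only the missing /g that limits it to one replacement.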

Re: simple replace question
by graff (Chancellor) on Jul 06, 2009 at 13:55 UTC
    If the string does not have <p> or </p> tags in the middle (or if it does and you want to get rid of those as well), you could do it like this:
    $testtext =~ s{</?p>}{}g;
    (but when removing these tags from the middle of a string, it might be better to replace them with some sort of whitespace, to avoid creating run-on words)
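Following up on the whitespace point, here is a minimal sketch, assuming a single space is an acceptable separator and using a made-up input string. It replaces each tag with a space and then tidies up the result:

```perl
use strict;
use warnings;

# Made-up sample with two paragraphs back to back.
my $html = '<p>first paragraph</p><p>second paragraph</p>';

(my $text = $html) =~ s{</?p>}{ }g;   # every <p> or </p> becomes a space
$text =~ s/ {2,}/ /g;                 # collapse the double space left between </p><p>
$text =~ s/^ +| +$//g;                # trim spaces at the start and end

print "$text\n";
```

Without the space replacement, the same input would come out as "first paragraphsecond paragraph".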
Re: simple replace question
by JavaFan (Canon) on Jul 06, 2009 at 13:47 UTC
    Without the /g modifier, s/// makes at most one substitution, so it removes <p> at the beginning or </p> at the end of the string, but not both.

    You might want to use the /g modifier. Or what I would do:

    $testtext =~ s/^<p>//;
    $testtext =~ s!</p>$!!;
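As a sketch of the two-substitution approach (the helper name strip_outer_p and the sample strings are made up for illustration), note that each pattern is anchored, so tags in the middle of the string are left alone:

```perl
use strict;
use warnings;

# Hypothetical helper: remove a leading <p> and a trailing </p>, if present.
sub strip_outer_p {
    my ($s) = @_;
    $s =~ s/^<p>//;     # anchored at the start of the string
    $s =~ s!</p>$!!;    # anchored at the end of the string
    return $s;
}

my $single = strip_outer_p('<p>one paragraph</p>');
my $double = strip_outer_p('<p>one</p><p>two</p>');

print "$single\n";
print "$double\n";
```

For the second string only the outermost tags go, leaving "one</p><p>two".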
      JavaFan - thanks! In your second line, the "!"s should be "/"s, right?
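The ! is not a typo: Perl's s/// accepts other delimiters in place of the slashes, and picking one that does not occur in the pattern saves escaping the / in </p>. A small check on a made-up string, comparing three spellings of the same substitution:

```perl
use strict;
use warnings;

my $in = '<p>text</p>';   # made-up sample

(my $slash = $in) =~ s/<\/p>$//;   # slash delimiters: the / in </p> must be escaped
(my $bang  = $in) =~ s!</p>$!!;    # ! delimiters, as in JavaFan's reply
(my $brace = $in) =~ s{</p>$}{};   # paired braces, as in graff's reply

print join(' | ', $slash, $bang, $brace), "\n";
```

All three leave "<p>text", so the choice of delimiter is purely about readability.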
Re: simple replace question
by poolpi (Hermit) on Jul 06, 2009 at 19:32 UTC

    If you need to deal with an HTML document:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TokeParser;

    # Simple example :
    my $doc = <<END;
    <html>
    <head>
    </head>
    <body>
    <p>FooOooo barbar baz</p>
    <p>Babar Foofoo zba zba</p>
    <p>oofOOoof zzbb aarr</p>
    </body>
    </html>
    END

    my $p = HTML::TokeParser->new( \$doc );
    while ( $p->get_tag("p") ) {
        my $text = $p->get_trimmed_text;
        print "Text: $text\n";
    }


    hth,
    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
Re: simple replace question
by morgon (Priest) on Jul 06, 2009 at 17:02 UTC
    Rather than capturing the tags, I would capture the content:

    $testtext =~ s|^<p>(.*?)</p>$|$1|;
      morgon,

      This looks like the best suggestion! Given the greediness of the regex, this approach would ignore any internal <p> tags? That's what I want: just to strip off the first and last tag.

      thanks -
      janaki
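On the internal-tag question, a small test is probably the easiest way to be sure. Strictly speaking, .*? is non-greedy; it is the ^ and $ anchors that force the match to run on to the final </p>, so inner tags end up inside $1 and are kept. One caveat: . does not match a newline unless the /s modifier is used, so a paragraph spanning several lines would be left untouched. The sample strings below are made up:

```perl
use strict;
use warnings;

my $inner = '<p>one</p><p>two</p>';
my $multi = "<p>line one\nline two</p>";

# The anchors force the lazy .*? to extend to the final </p>.
(my $inner_out = $inner) =~ s|^<p>(.*?)</p>$|$1|;

# Without /s, . cannot cross the newline: the match fails, nothing changes.
(my $multi_no_s = $multi) =~ s|^<p>(.*?)</p>$|$1|;

# With /s, . also matches newlines, so the outer tags are stripped.
(my $multi_s = $multi) =~ s|^<p>(.*?)</p>$|$1|s;

print "$inner_out\n$multi_no_s\n$multi_s\n";
```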
Re: simple replace question
by stevemayes (Scribe) on Jul 06, 2009 at 17:19 UTC
    Text::Trim's function is to strip whitespace off the ends of strings (although I think I really like morgon's approach).
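If the input may also carry whitespace around the tags (a trailing newline read from a file, for instance), one hedged option is to trim it first and then strip the outer tags. This sketch does not use Text::Trim itself; the helper name trim_p and the sample string are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: trim surrounding whitespace, then an outer <p>...</p>.
sub trim_p {
    my ($s) = @_;
    $s =~ s/\A\s+|\s+\z//g;           # trim leading and trailing whitespace
    $s =~ s{\A<p>(.*)</p>\z}{$1}s;    # strip the outer tags only if both are present
    return $s;
}

my $cleaned = trim_p("  <p>some <p>nested</p> text</p>\n");
print "[$cleaned]\n";
```

Using \A and \z rather than ^ and $ pins the match to the true start and end of the string.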
Re: simple replace question
by Marshall (Canon) on Jul 07, 2009 at 08:27 UTC
    There are a number of regexes that would work here. The basic idea is to replace anything that looks like <p> or <Xp> with nothing, using a global substitution (/g).
    #!/usr/bin/perl -w
    use strict;
    my $test = '<p>this is some<1p> paragraph</p>';
    $test =~ s/<.?p>//g;
    #s/<.??p>//g;   #also ok
    print "$test\n";
    __END__
    prints:
    this is some paragraph
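Since .? accepts any single character (or none), this pattern is looser than graff's </?p>: it also removes stray tags like the <1p> in the example, while a <p> that carries attributes is matched by neither. A small comparison on made-up sample strings:

```perl
use strict;
use warnings;

my @samples = ('<p>a</p>', '<1p>b<xp>', '<p class="x">c</p>');
my (%loose, %strict);

for my $s (@samples) {
    ($loose{$s}  = $s) =~ s/<.?p>//g;    # Marshall's pattern: any one char (or none) before p
    ($strict{$s} = $s) =~ s{</?p>}{}g;   # only exact <p> and </p>
    print "$s => loose: '$loose{$s}', strict: '$strict{$s}'\n";
}
```

Which behaviour is preferable depends on how messy the real input is.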