
Yes, I'm looking at the recommended methods, and that "^" was a typo.

However, part of my question was how to extract, in one statement, what's between the title tags.

The way I worked around it was:

$result =~ /(<title>.*<\/title>)/mgi;
my $newresult = $1;
$newresult =~ s/<title>//i;
$newresult =~ s/<\/title>//i;

Surely there's a simpler way?

Re^3: How to extract a pattern in Perl regex?
by marto (Cardinal) on May 01, 2020 at 09:27 UTC

    Using Mojo::DOM (pulling live data with Mojo::UserAgent):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature 'say';
    use Mojo::Util 'trim';
    use Mojo::UserAgent;

    # get perlmonks
    my $ua  = Mojo::UserAgent->new;
    my $dom = $ua->get('https://perlmonks.org')->res->dom;

    say 'Title: ' . trim( $dom->at('title')->text );
    say 'Image src: ' . trim( $dom->at('img')->attr->{'src'} );
    say 'Image alt: ' . trim( $dom->at('img')->attr->{'alt'} );

    Output:

    Title: PerlMonks - The Monastery Gates
    Image src: //promote.pair.com/i/pair-banner-current.gif
    Image alt: Beefy Boxes and Bandwidth Generously Provided by pair Networks

    Mojo::DOM makes parsing fun and simple.

      Mojo::DOM makes parsing fun and simple.

      Agreed, and ojo makes it even more fun ;-)

      $ perl -Mojo -e 'say g("https://perlmonks.org")->dom->at("title")->all_text=~s/^\s+|\s+$//gr'
      PerlMonks - The Monastery Gates

        ojo is very nice, but I more often find myself writing programs that just aren't practical as one-liners.

Re^3: How to extract a pattern in Perl regex?
by hippo (Archbishop) on May 01, 2020 at 09:10 UTC
    Surely there's a simpler way?

    Just capture what you want. Let's change the task to remove the elephant in the room: parsing HTML with a regex, which you now know you shouldn't do. Instead, suppose you want to extract everything between 'foo' and 'bar' and ignore all the rest. Here's the simple approach:

    use strict;
    use warnings;
    use Test::More tests => 1;

    my $in   = 'abcfooHellobarxyz';
    my $want = 'Hello';
    my ($have) = ($in =~ /foo(.*)bar/);
    is $have, $want, "Extracted $want";

    The only real caveat to this is to remember to use the /s modifier if the text you are extracting might contain \n.
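
    A minimal sketch of that caveat, assuming a made-up input string with a newline between 'foo' and 'bar' (the variable names are illustrative only). Without /s the dot stops at the newline, so the capture fails; with /s it spans the line break:

    ```perl
    use strict;
    use warnings;

    # Hypothetical input: the text between 'foo' and 'bar' contains a newline.
    my $in = "abcfooHello\nWorldbarxyz";

    # Without /s, '.' does not match "\n", so the pattern cannot reach 'bar'.
    my ($without_s) = $in =~ /foo(.*)bar/;

    # With /s, '.' matches any character, including "\n".
    my ($with_s) = $in =~ /foo(.*)bar/s;

    print defined $without_s ? "without /s: matched\n" : "without /s: no match\n";
    print defined $with_s    ? "with /s: matched\n"    : "with /s: no match\n";
    ```

    In list context a failed match returns an empty list, which is why $without_s ends up undefined rather than empty.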

      Thank you! Yes, this was the main part of what I was looking for. I remember going through a rather large Perl handbook, and it ended the Regex chapter (or started it) by saying that "there is so much to Regex that whole books are written on it." I really see why now.
      ... caveat ... is to remember to use the /s modifier if the text you are extracting might contain \n.

      Simpler still is to always use /s (along with /x and /m in a consistent /xms modifier tail) on every qr// m// s/// you write. Then the rule is simply "Dot matches all." Period.
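
      A rough sketch of what an /xms tail changes, using a made-up multi-line string (the names are hypothetical): /x lets the pattern be spaced out, /m lets ^ and $ match at line boundaries, and /s lets the dot cross newlines.

      ```perl
      use strict;
      use warnings;

      # Hypothetical input: 'foo' starts the second line, 'bar' ends the third.
      my $text = "first line\nfoo Hello\nWorld bar\nlast line";

      # /x: whitespace in the pattern is ignored, so it can be laid out readably.
      # /m: ^ and $ match at the start and end of each line, not only the string.
      # /s: '.' also matches "\n", so the capture may span lines.
      my ($have) = $text =~ m{ ^ foo \s+ (.*?) \s+ bar $ }xms;

      print defined $have ? "captured: [$have]\n" : "no match\n";
      ```

      Note that under /x a literal space in the pattern has to be written as \s, [ ] or an escaped space.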


      Give a man a fish:  <%-{-{-{-<

        We've had this conversation before. Let's agree to disagree. :-)

Re^3: How to extract a pattern in Perl regex? (updated)
by AnomalousMonk (Archbishop) on May 01, 2020 at 03:58 UTC

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $result = '<title>The Rain in Spain</tItLe>';
     my ($newresult) = $result =~ m{ <title> (.*?) </title> }xmsi;
     print qq{'$newresult'};
    "
    'The Rain in Spain'

    Update: Or, going a step further:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Data::Dump qw(dd);
     ;;
     my $result = 'yada <title>The Rain in Spain</tItLe> blah <TITLE>How Now Brown Cow</TitlE> foo';
     my @titles = $result =~ m{ (?i) <title> (.*?) </title> }xmsg;
     dd \@titles;
    "
    ["The Rain in Spain", "How Now Brown Cow"]


    Give a man a fish:  <%-{-{-{-<