monkeybus has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, it's been a long time since this kitchen table hobbyist asked you all a question.

Suppose I want to parse a text file and extract all URLs that begin with http:// and end with .mp3

How would I go about doing that?

It is just the regrep that is having me tear my hair out, what is left of it.

Thanks a lot, folks.

Replies are listed 'Best First'.
Re: Simple regrep question
by Corion (Patriarch) on Jun 26, 2016 at 07:46 UTC

    Maybe just look at HTML::LinkExtor?

    Depending on how your URLs are structured, maybe the following will also suffice:

    m!\b(http://.*?\.mp3)\b!

    But that would also match a URL followed by unrelated text that happens to mention .mp3, like

    http://example.com/ this is some text about my .mp3 files.
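    To see the over-match concretely, here is a minimal sketch that runs the pattern above against that sample line. The helper name first_loose_match is made up for illustration and is not part of the original suggestion.

```perl
use strict;
use warnings;

# Return the first match of the lazy "any character" pattern, or undef.
# (first_loose_match is a hypothetical helper name.)
sub first_loose_match {
    my ($text) = @_;
    return $text =~ m!\b(http://.*?\.mp3)\b! ? $1 : undef;
}

my $line  = 'http://example.com/ this is some text about my .mp3 files.';
my $match = first_loose_match($line);

# The lazy .*? still crosses spaces, so the match runs up to the first ".mp3".
print defined $match ? "Match: $match\n" : "No match\n";
```

    Here the captured text is everything from http:// up to and including ".mp3", spaces included, which is probably not what you want.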

      Easily fixed by excluding whitespace:

      #! perl
      use strict;
      use warnings;

      for ('Look here: http://example1.com/cool.mp3 for a great listen!',
           'http://example.com/ this is some text about my .mp3 files.')
      {
          print "Match: $1\n" if m!\b(http://\S*?\.mp3)\b!;
      }

      Output:

      18:26 >perl 1664_SoPW.pl
      Match: http://example1.com/cool.mp3

      18:26 >
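      Since the original question was about pulling every such URL out of a text file, the same whitespace-excluding pattern can be applied line by line with the /g modifier. This is only a sketch: the sub name extract_mp3_urls and the sample text are made up, and an in-memory filehandle stands in for a real file so the example is self-contained.

```perl
use strict;
use warnings;

# Collect every http://...mp3 URL from a filehandle, several per line if present.
# (extract_mp3_urls is a hypothetical helper name.)
sub extract_mp3_urls {
    my ($fh) = @_;
    my @urls;
    while (my $line = <$fh>) {
        # With /g in list context, the match returns every capture on this line.
        push @urls, $line =~ m!\b(http://\S*?\.mp3)\b!g;
    }
    return @urls;
}

# Hypothetical sample text; replace the in-memory handle with a real file as needed.
my $text = <<'END';
Look here: http://example1.com/cool.mp3 for a great listen!
http://example.com/ this is some text about my .mp3 files.
Two on one line: http://a.example/one.mp3 and http://b.example/two.mp3
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "$_\n" for extract_mp3_urls($fh);
close $fh;
```

      For a real file, open it by name instead of by reference to a string; the loop stays the same.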

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Simple regrep question
by ww (Archbishop) on Jun 26, 2016 at 12:34 UTC

    Your SOPW would be more in line with local standards if you presented your (newest||best) code rather than merely saying "(i)t is just the regrep that is having me tear my hair out...."

    If you present the failing code, we can dissect that, providing information that will serve you better in the future than info embedded in code you can copy.


    Questions containing the words "doesn't work" (or their moral equivalent) will usually get a downvote from me unless accompanied by:
    1. code
    2. verbatim error and/or warning messages
    3. a coherent explanation of what "doesn't work" actually means.
Re: Simple regrep question
by BillKSmith (Monsignor) on Jun 27, 2016 at 03:27 UTC
    I recommend using a "canned" regex to extract all http URLs and then using grep to select the ones ending in .mp3.

    use strict;
    use warnings;
    use Regexp::Common qw /URI/;

    my $string = "xxx http://perlmonks.com yyy http://foo.mp3 zzz";
    $_ = $string;
    my @urls = grep {/\.mp3$/} /($RE{URI}{HTTP})/g;
    local $" = "\n";
    print "@urls\n";

    OUTPUT:
    http://foo.mp3

    This approach should find all valid http URLs with no false matches while saving your hair.
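    If Regexp::Common is not installed, the same extract-then-filter idea can be sketched with core Perl only. Note the assumption: the loose pattern http://\S+ is just a stand-in and does not validate URLs the way $RE{URI}{HTTP} does, and the sub name mp3_urls_two_step is made up.

```perl
use strict;
use warnings;

# Step 1: grab anything that looks like an http URL (loose stand-in pattern,
#         NOT the validating $RE{URI}{HTTP} from Regexp::Common).
# Step 2: keep only the candidates that end in .mp3.
# (mp3_urls_two_step is a hypothetical helper name.)
sub mp3_urls_two_step {
    my ($string) = @_;
    my @candidates = $string =~ m!(http://\S+)!g;
    return grep { /\.mp3$/ } @candidates;
}

my $string = "xxx http://perlmonks.com yyy http://foo.mp3 zzz";
local $" = "\n";
my @urls = mp3_urls_two_step($string);
print "@urls\n";
```

    Because the stand-in pattern takes every non-whitespace character, a URL followed directly by punctuation (a trailing full stop, say) would fail the .mp3$ test, which is one reason to prefer the canned regex.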

    Bill