Hello Monks, I am trying to extract a list of URLs for all subdomains of the domain "foo.com" from a very messy DB dump. Here's my code and some sample data:

while (<DATA>) {
    my @urls = ( $_ =~ /(https?\:\/\/.*?\.foo\.com)/g );
    foreach my $url (@urls) {
        print "$url\n" if $1;
    }
}
__DATA__
http://www.foo.com/fishnuts
http://smtp.foo.com
https://www.foo.com/?(bunch-of-stuff):{}https://svn.foo.com/docs
https://yahoo.de/?search:{width}-https://www.foo.com
https://google.com
https://foo.com:(More-random-stuff)https://yahoo.de::http://pubdocs.foo.com/top/index.html

The desired result would be:

http://www.foo.com
http://smtp.foo.com
https://www.foo.com
https://svn.foo.com
https://www.foo.com
https://foo.com
http://pubdocs.foo.com

However, that regex does not work with the last two lines of data, as it also matches starting from the first "http". I tried this:

/(https?\:\/\/(?!http.)*?\.foo\.com)/g

But then I get the warning "matches null string many times in regex".

Thanks for any help!
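
One possible direction, offered as a minimal sketch rather than a definitive answer: the warning appears because the `*?` in the second attempt quantifies the zero-width lookahead `(?!http.)` itself instead of a character. The sketch below avoids the lookahead problem by allowing only hostname characters between the scheme and "foo.com", so a match cannot run across an unrelated URL. The helper name `extract_foo_urls`, the `@lines` array, and the trailing boundary check are assumptions made for illustration; they are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every base URL
# on foo.com, or on one of its subdomains, found in the given string.
sub extract_foo_urls {
    my ($text) = @_;
    return $text =~ m{
        (                         # capture only the base URL
            https?://             # scheme
            (?:[A-Za-z0-9-]+\.)*  # zero or more subdomain labels
            foo\.com              # the target domain
        )
        (?!\.?[A-Za-z0-9-])       # assumption: skip hosts like foo.community
    }gx;
}

# Sample data from the post, one record per element.
my @lines = (
    'http://www.foo.com/fishnuts',
    'http://smtp.foo.com',
    'https://www.foo.com/?(bunch-of-stuff):{}https://svn.foo.com/docs',
    'https://yahoo.de/?search:{width}-https://www.foo.com',
    'https://google.com',
    'https://foo.com:(More-random-stuff)https://yahoo.de::http://pubdocs.foo.com/top/index.html',
);

print "$_\n" for map { extract_foo_urls($_) } @lines;
```

Because the subdomain group is optional, the bare "https://foo.com" in the last record is matched as well, which the original pattern (requiring a dot before "foo") would skip. Hostnames containing characters outside letters, digits, and hyphens are not handled by this sketch.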

In reply to Regex: Extract base URL for a specific domain by alpha-lemming
