alpha-lemming has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks, I am trying to extract a list of URLs for all subdomains of the domain "foo.com" from a very messy DB dump. Here's my code and some sample data:
while (<DATA>) {
    my @urls = ( $_ =~ /(https?\:\/\/.*?\.foo\.com)/g );
    foreach my $url (@urls) {
        print "$url\n" if $1;
    }
}
__DATA__
http://www.foo.com/fishnuts
http://smtp.foo.com
https://www.foo.com/?(bunch-of-stuff):{}https://svn.foo.com/docs
https://yahoo.de/?search:{width}-https://www.foo.com
https://google.com
https://foo.com:(More-random-stuff)https://yahoo.de::http://pubdocs.foo.com/top/index.html
The desired result would be:
http://www.foo.com
http://smtp.foo.com
https://www.foo.com
https://svn.foo.com
https://www.foo.com
https://foo.com
http://pubdocs.foo.com
However, that regex does not work with the last two lines of data, as it also matches starting from the first "http". I tried this:
/(https?\:\/\/(?!http.)*?\.foo\.com)/g
But I get the warning "matches null string many times in regex".
Thanks for any help!
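One possible approach, offered as a sketch rather than as any of the replies' solutions: the second attempt fails because `(?!http.)*?` puts a quantifier on a zero-width lookahead, which is what triggers the "matches null string many times" warning. Instead of `.*?`, the host part can be limited to dot-separated labels, so a match cannot run from an unrelated "http" into a later foo.com URL. The sub name `extract_foo_urls` and the `@samples` array are hypothetical names introduced for illustration. The pattern assumes host labels consist only of word characters and hyphens.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: return every base URL whose
# host is foo.com or one of its subdomains. Call it in list context.
sub extract_foo_urls {
    my ($text) = @_;
    return $text =~ m{
        ( https?://            # scheme
          (?: [\w-]+ \. )*     # zero or more subdomain labels such as "www."
          foo\.com             # the target domain
        )
        (?! [\w-] )            # reject longer labels such as "foo.community"
    }gx;
}

# Sample lines taken from the question's data.
my @samples = (
    'http://www.foo.com/fishnuts',
    'http://smtp.foo.com',
    'https://www.foo.com/?(bunch-of-stuff):{}https://svn.foo.com/docs',
    'https://yahoo.de/?search:{width}-https://www.foo.com',
    'https://google.com',
    'https://foo.com:(More-random-stuff)https://yahoo.de::http://pubdocs.foo.com/top/index.html',
);

for my $line (@samples) {
    print "$_\n" for extract_foo_urls($line);
}
```

Run against the sample data, this should print the seven URLs listed as the desired result. Making the subdomain group optional is what lets the bare https://foo.com match. The sketch does not guard against hosts that merely begin with foo.com, such as foo.com.example.org.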
Re: Regex: Extract base URL for a specific domain
by hdb (Monsignor) on Aug 29, 2013 at 14:00 UTC
by alpha-lemming (Novice) on Aug 29, 2013 at 14:51 UTC

Re: Regex: Extract base URL for a specific domain
by MidLifeXis (Monsignor) on Aug 29, 2013 at 13:47 UTC

Re: Regex: Extract base URL for a specific domain
by daxim (Curate) on Aug 29, 2013 at 14:01 UTC
by alpha-lemming (Novice) on Aug 29, 2013 at 14:53 UTC