 
PerlMonks  

Re: Unable to split $ARGV[0] variable. Can it be done?

by McDarren (Abbot)
on Dec 10, 2012 at 15:50 UTC


in reply to [SOLVED]Unable to split $ARGV[0] variable. Can it be done?

..so that I can get the bare host name 'google.com'

um, google.com is not a hostname, it's a domain name.
Also, you start your example with 'www.google.com', and then you say you want 'google.com'
Is that correct, or was it a typo?

I'll assume you want to extract the Fully Qualified Domain Name.

..appreciate some advice and whether split is the right function to use or not?

Although you could get what you want with split, I wouldn't consider it the best tool here, especially if you're dealing with more complex URLs.
Personally, I'd use URI::Split:

use URI::Split qw/uri_split/;
my $url = 'http://www.google.com';
my ($proto, $fqdn) = uri_split($url);
print "Protocol:$proto Domain:$fqdn\n";
Prints:
Protocol:http Domain:www.google.com
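
If the URL can carry a port or user info, the second value returned by uri_split (the authority) will include them. Here is a minimal sketch, assuming the URI module is installed (URI::Split ships in the same distribution), that asks URI for the host instead and then drops a leading "www." to get the bare domain the original question asked for. The helper name bare_domain is hypothetical, and stripping "www." is only a simple heuristic, not a real registered-domain lookup.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return the host part of a URL, minus any leading "www."
sub bare_domain {
    my ($url) = @_;
    my $uri = URI->new($url);

    # Only server-based schemes (http, https, ftp, ...) provide host();
    # a scheme-less string such as 'google.com' does not.
    return unless $uri->can('host');

    my $host = $uri->host;
    return unless defined $host && length $host;

    $host =~ s/^www\.//i;    # heuristic: strip a leading "www."
    return $host;
}

print bare_domain('http://www.google.com'), "\n";
print bare_domain('https://www.google.com:8443/search?q=perl'), "\n";
```

Because the helper returns nothing for input without a scheme, the caller can tell "not a URL" apart from a valid host.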

Cheers,
Darren

Re^2: Unable to split $ARGV[0] variable. Can it be done?
by Doozer (Scribe) on Dec 10, 2012 at 16:03 UTC
    Sorry, yes, domain name was what I meant. No, it wasn't a typo. 'http://www.google.com' is passed in to the script and a 'get' request is made against that URL using LWP. If the get request fails, it then tries a different prefix, 'https://www.google.com' or 'http://google.com' for example. I want to split the domain name away from the prefix so I can chop and change the combinations as I please. It may be easier to have just the domain name passed in to the script and then the script can handle ALL of the prefixes itself.

    I appreciate all the responses and am currently working through the suggestions to see what I can work with.

      It may be easier to have just the domain name passed in to the script and then the script can handle ALL of the prefixes itself.

      Yeah, that sounds sensible.
      Here is an example of how you might implement that approach:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use LWP::UserAgent;

      # Try each domain with every protocol/sub-domain combination
      # and move on to the next domain as soon as one works.
      DOMAIN:
      while (my $domain = <DATA>) {
          chomp($domain);
          for my $protocol (qw/http https/) {
              next DOMAIN if test_url("$protocol://$domain");
              for my $sub (qw/www web/) {
                  next DOMAIN if test_url("$protocol://$sub.$domain");
              }
          }
          print "Couldn't get anything from $domain\n";
      }

      # Returns true if a GET request for $url succeeds, false otherwise.
      sub test_url {
          my $url = shift;
          print "Trying $url ...";
          my $ua = LWP::UserAgent->new(
              timeout  => 5,
              agent    => 'Mozilla/5.0',
              ssl_opts => { verify_hostname => 0 },   # skip certificate hostname checks
          );
          my $response = $ua->get($url);
          if ($response->is_success) {
              print "OK\n";
              return 1;
          }
          else {
              print "FAILED because " . $response->status_line . "\n";
              return;
          }
      }

      __DATA__
      google.com
      apple.com
      fred.com
      dschjksdbckjqh.com
      Output:
      Trying http://google.com ...OK
      Trying http://apple.com ...OK
      Trying http://fred.com ...OK
      Trying http://dschjksdbckjqh.com ...FAILED because 500 Can't connect to dschjksdbckjqh.com:80 (Bad hostname 'dschjksdbckjqh.com')
      Trying http://www.dschjksdbckjqh.com ...FAILED because 500 Can't connect to www.dschjksdbckjqh.com:80 (Bad hostname 'www.dschjksdbckjqh.com')
      Trying http://web.dschjksdbckjqh.com ...FAILED because 500 Can't connect to web.dschjksdbckjqh.com:80 (Bad hostname 'web.dschjksdbckjqh.com')
      Trying https://dschjksdbckjqh.com ...FAILED because 500 Can't connect to dschjksdbckjqh.com:443 (getaddrinfo: nodename nor servname provided, or not known)
      Trying https://www.dschjksdbckjqh.com ...FAILED because 500 Can't connect to www.dschjksdbckjqh.com:443 (getaddrinfo: nodename nor servname provided, or not known)
      Trying https://web.dschjksdbckjqh.com ...FAILED because 500 Can't connect to web.dschjksdbckjqh.com:443 (getaddrinfo: nodename nor servname provided, or not known)
      Couldn't get anything from dschjksdbckjqh.com
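
      If the script may still be handed a full URL rather than a bare domain, the input can be normalised first. The following is only a rough sketch in core Perl: the helper name candidate_urls is hypothetical, the regexes are a simple heuristic rather than a full URL parser, and it performs no network requests. It builds the URLs in the same order as the nested loops above.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a full URL or a bare domain and return
# the list of URLs to try (protocol first, then sub-domain prefix).
sub candidate_urls {
    my ($input) = @_;

    (my $domain = $input) =~ s{^[a-z][a-z0-9+.-]*://}{}i;   # drop "http://" etc.
    $domain =~ s{[/?#].*}{}s;                               # drop path, query, fragment
    $domain =~ s/^www\.//i;                                 # heuristic: drop leading "www."

    my @urls;
    for my $protocol (qw/http https/) {
        push @urls, "$protocol://$domain";
        push @urls, "$protocol://$_.$domain" for qw/www web/;
    }
    return @urls;
}

print "$_\n" for candidate_urls('http://www.google.com');
```

      Each returned URL could then be passed to test_url until one succeeds.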

      HTH,
      Darren
