in reply to Re^4: Grab input from the user and Open the file
in thread Grab input from the user and Open the file

Hi valerydolce,

Designing your own regex is certainly very good practice. For a production system I'd recommend existing modules, for example Regexp::Common to find URIs and URI to parse them. For example:

use warnings; use strict; my $str = <<'END_STR'; I am an example http://www.perlmonks.org/?parent=1176663;node_id=3333 +text that contains <https://perlmonks.pair.com/?node_id=1176651> two URIs END_STR use Regexp::Common qw/URI/; use URI; while ($str=~/$RE{URI}{-keep}/g) { my $uri = URI->new($1); print "$uri\n"; print " Scheme: ", $uri->scheme, "\n"; print " Host: ", $uri->host, "\n"; print " Path: ", $uri->path, "\n"; print " Query: ", $uri->query, "\n"; }

See the URI documentation for lots more ways to access the different parts of the URI. I did notice that unfortunately Regexp::Common apparently doesn't match the #fragment part of the URI, so here's an attempt at an alternate solution, using a regex based on the characters allowed in URIs from RFC 3986.

# NOTE this is based on a quick skim of RFC 3986 and may not be comple +te! my $url_re = qr{ # https://tools.ietf.org/html/rfc3986#section-2 # URI = scheme ":" hier-part ...; hier-part = "//" ... # scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) [A-Za-z][A-Za-z0-9+\-.]* :// # gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" # / "*" / "+" / "," / ";" / "=" # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" ( [:/?#\[\]@!\$&'()*+,;=A-Za-z0-9\-._~] # pct-encoded = "%" HEXDIG HEXDIG | %[0-9A-Fa-f]{2} )* }x; while ($str=~/($url_re)/g) { my $uri = URI->new($1); print "$uri\n"; }

Hope this helps,
-- Hauke D