I'm just learning Perl, and as an exercise I'm writing a small script that reads a file and outputs every URL it can find. I know there are modules for this; this is purely for learning purposes and to brush up on regular expressions. I came up with the following regular expression:
    my $re = qr(
        (
            (?:[a-z][a-z0-9+-.]*)
            ://
            (?:
                (?: [a-z0-9._~!:;=&'\$\(\)\*\+\-\,]+@ )?
                (?:
                    \[${ipv6}\]
                  | ${ipv4}
                  | [a-z0-9._~!;=&'\$\(\)\*\+\-\,]+
                )
            )
        )
    )xi;

    foreach ($ARGV[0]) {
        open my $fh, '<', $_ or die("Error opening file $_.\n");
        while (my $row = <$fh>) {
            chomp $row;
            next if $row eq "";
            if ($row =~ $re) {
                print "$1\n";
            }
        }
        close($fh);
    }
As you can see, I'm using qr to define the regular expression, because it's composed of other regular expressions defined elsewhere in the code (omitted here for brevity). The idea is that this gives me the most flexibility to refactor the script later and make it more general purpose.
The file is read line by line, each line is matched against $re, and the first URL found on that line is printed correctly. That's the issue: it only finds the first match, even when there are multiple URLs on the line. Typically, this is where I'd use the global flag, but apparently I can't use it with qr, as I get an error: Unknown regexp modifier "/g".
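To make the symptom concrete, here is a minimal sketch that behaves the same way. It uses a much looser stand-in pattern instead of my real $re, and the names $simple_re and first_url are made up just for this example:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the full $re; far looser than the real pattern.
my $simple_re = qr{ ( [a-z][a-z0-9+.-]* :// [^\s<>"]+ ) }xi;

# Mirrors the `if ($row =~ $re)` test in the script:
# returns the first URL captured on the line, or undef if there is none.
sub first_url {
    my ($row) = @_;
    return $row =~ $simple_re ? $1 : undef;
}

my $line = 'see http://example.com/a and https://example.org/b for details';
print first_url($line), "\n";   # prints http://example.com/a
```

The second URL on the line (https://example.org/b) is never printed, which is exactly what happens with the real script.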
I've been reading about this but haven't been able to figure out a way to search the entire line and capture all matches. I tried the s flag, tried different delimiters for qr in case that made any difference, and of course tried modifying $re to use quantifiers like + and *, all without success.
So I don't know whether I'm misunderstanding the problem I need to solve, or whether I just don't know enough about Perl to use it effectively. My guess is that declaring regular expressions with qr is not what I need for this particular case, but I'm not sure. Any ideas? Thank you!