Hello!

I'm just learning Perl and, as an exercise, I'm writing a small script that reads a file and outputs every URL it can find. I know there are modules for this; this is just for learning purposes and to brush up on regular expressions. I came up with the following regular expression:

my $re = qr(
    (                                  # capture group 1: the whole URL
        (?:[a-z][a-z0-9+.-]*)          # scheme
        ://
        (?:
            (?: [a-z0-9._~!:;=&'\$\(\)\*\+\-\,]+@ )?    # optional userinfo
            (?: \[${ipv6}\] | ${ipv4} | [a-z0-9._~!;=&'\$\(\)\*\+\-\,]+ )    # host
        )
    )
)xi;

foreach ($ARGV[0]) {
    open my $fh, '<', $_ or die("Error opening file $_: $!\n");
    while (my $row = <$fh>) {
        chomp $row;
        next if $row eq "";
        if ($row =~ $re) {
            print "$1\n";
        }
    }
    close($fh);
}

As you can see, I'm using qr to define the regular expression because it's composed of other regular expressions defined elsewhere in the code (omitted here for brevity). This should give me the flexibility to refactor the script later into something more general-purpose, or at least that's the idea.
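The composition approach itself works well with qr. Below is a minimal sketch of building a URL pattern out of smaller qr pieces. The pieces $scheme and $host are hypothetical, simplified stand-ins (the post's real $ipv4/$ipv6 pieces are omitted there), so this only illustrates the interpolation mechanics, not a complete URL grammar.

```perl
use strict;
use warnings;

# Hypothetical, simplified building blocks (not the post's real pieces).
my $scheme = qr{ [a-z] [a-z0-9+.-]* }xi;
my $host   = qr{ [a-z0-9.-]+ }xi;

# Interpolating a qr// object into another pattern embeds it as a
# self-contained group that keeps its own modifiers (here /i and /x),
# even though the outer pattern only uses /x.
my $url = qr{ ( $scheme :// $host ) }x;

for my $text ('ftp://files.example.net', 'not a url') {
    if ($text =~ $url) {
        print "matched: $1\n";
    } else {
        print "no match: $text\n";
    }
}
```

Because each piece carries its own flags, the outer pattern still matches case-insensitively without repeating /i.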

The file is read line by line and each line is matched against $re, which correctly prints the first URL it finds on that line. And that's the issue: it only finds the first match, even when there are multiple URLs on the line. Normally this is where I'd use the global flag, except that apparently I can't use it with qr; I get the error: Unknown regexp modifier "/g".
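The /g flag is not part of the compiled pattern; it is a modifier of the match operator (m// or s///), which is why qr rejects it. A minimal sketch of keeping $re as a qr object and putting /g on the match instead follows. The pattern is a simplified, hypothetical stand-in for the post's $re (it keeps the single capture group but drops the omitted $ipv4/$ipv6 pieces), and find_urls is a hypothetical helper name.

```perl
use strict;
use warnings;

# Simplified stand-in for the post's $re: one capture group around the URL.
my $re = qr{ ( [a-z][a-z0-9+.-]* :// [a-z0-9.-]+ ) }xi;

# Hypothetical helper: returns every URL found in one line of text.
sub find_urls {
    my ($line) = @_;
    my @found;
    # In scalar context, //g resumes from where the previous match
    # ended (pos($line)), so the loop runs once per match.
    while ($line =~ /$re/g) {
        push @found, $1;
    }
    return @found;
}

my $row = 'see http://example.com and https://perlmonks.org/ today';
print "$_\n" for find_urls($row);
```

In the original loop, the equivalent change is replacing `if ($row =~ $re) { print "$1\n"; }` with `while ($row =~ /$re/g) { print "$1\n"; }`.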

I've been reading about this but haven't been able to figure out how to search the entire line and capture all matches. I tried the s flag and different delimiters for qr, in case that made any difference, and of course tried modifying $re with quantifiers like + and *, but without any results.
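Those attempts could not have helped. /s only lets `.` match a newline; every `.` in the part of $re shown is inside a character class, where it is literal anyway, and each $row is chomped, so there is no newline to match. Wrapping the pattern in + or * produces one longer match, and a capture group inside a repeated group only remembers its last iteration. The sketch below illustrates that, and also shows the list-context form of //g, which returns the captures from every match at once. The simplified pattern and sample line are hypothetical.

```perl
use strict;
use warnings;

my $re   = qr{ ( [a-z][a-z0-9+.-]* :// [a-z0-9.-]+ ) }xi;
my $line = 'http://a.example http://b.example http://c.example';

# Repeating the group with + yields ONE match spanning all three URLs;
# the capture inside the repeated group keeps only the last iteration.
my ($last) = $line =~ /(?: $re \s* )+/x;
print "repeated group captured: $last\n";

# List-context //g returns the capture group from every match.
my @all = $line =~ /$re/g;
print "list-context //g captured: @all\n";
```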

So I don't know whether I'm misunderstanding the problem I need to solve or whether I just don't know enough about Perl to use it effectively. My guess is that declaring regular expressions with qr is not what I need for this particular case, but I'm just not sure. Any ideas? Thank you!
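qr is a reasonable fit here. Modifiers that change what the pattern means, such as /i, /x, /s and /m, are compiled into the qr object, while /g controls how the match or substitution operator iterates, so it goes on m// or s/// each time the object is used. A minimal sketch of reusing one qr object with different operators follows; the pattern and sample text are hypothetical.

```perl
use strict;
use warnings;

# Compiled once; /i and /x are stored in the object itself.
my $re   = qr{ ( [a-z][a-z0-9+.-]* :// [a-z0-9.-]+ ) }xi;
my $text = 'Mirror: FTP://mirror.example.org or http://example.com';

# Count matches: assign a list-context //g match to an empty list,
# then evaluate that assignment in scalar context.
my $count = () = $text =~ /$re/g;

# The same object inside s///g: replace every URL in a copy of $text.
(my $redacted = $text) =~ s/$re/<URL>/g;

print "$count URLs\n";
print "$redacted\n";
```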


In reply to Use global flag when declaring regular expressions with qr? by unmatched