Extracting the matching lines from an array of lines using the Perl function grep (as opposed to the program) is no more complicated than this:

my @matches = grep /PATTERN/, @lines;
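
To make that concrete, here is a minimal sketch. The sample lines and the pattern /USER/ are made up for illustration and are not taken from your data:

```perl
use strict;
use warnings;

# Hypothetical sample data; stands in for the lines read from your file.
my @lines = (
    "<!-- USER 1 - alice -->",
    "<p>some other markup</p>",
    "<!-- USER 2 - bob -->",
);

# In list context, grep returns every element of @lines that matches.
my @matches = grep /USER/, @lines;

print "$_\n" for @matches;
```

In scalar context grep returns the number of matching elements instead of the elements themselves.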

Now, since you will be extracting the usernames from these matches as well, you might as well do that while matching, as explained by Popcorn Dave.

Don't use "dot star" (.*) in your regex (although some regexes above do), because it will cause unnecessary backtracking. Dot matches anything but a newline by default, and the star means "zero or more of the preceding". So when the regex engine reaches "dot star" while trying to match a line, it first matches all the way to the end of the line; after that it gives back characters one at a time, as many as are needed for an overall match. Things get worse when "dot star" appears more than once in the regex.
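
A small sketch of that behaviour, using a made-up line with two tagged items (the tag names and variable names are only for illustration). It shows where "dot star" ends up after giving characters back, compared with a negated character class that cannot run past the next "<":

```perl
use strict;
use warnings;

# Hypothetical input with two tagged items on one line.
my $line = "<b>one</b> and <b>two</b>";

# Greedy dot star: runs to the end of the line, then backs off only as
# far as needed for a closing </b> to match, which is the last one.
my ($greedy) = $line =~ m{<b>(.*)</b>};

# Negated character class: stops at the first "<", so it ends at the
# first closing tag.
my ($tight) = $line =~ m{<b>([^<]*)</b>};

print "greedy: $greedy\n";   # greedy: one</b> and <b>two
print "tight:  $tight\n";    # tight:  one
```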

As far as the regex goes, it seems from your code that this will do just fine:

/<!-- USER \d+ - (\S+) -->/i

That is, match <!-- USER followed by a space, one or more digits, a space, a minus, a space, one or more occurrences of a non-whitespace character, a space, and finally -->. All of this case-insensitively.
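
If you want to check the regex against a few inputs before using it, something like this sketch will do. The test strings are invented; only the comment format comes from your code:

```perl
use strict;
use warnings;

# Hypothetical test strings, for illustration only.
my @tests = (
    "<!-- USER 42 - arien -->",      # matches, captures "arien"
    "<!-- user 7 - Samn -->",        # matches because of /i
    "<!-- USER 42 - two words -->",  # no match: \S+ cannot span the space
    "<!-- USER - nobody -->",        # no match: \d+ needs at least one digit
);

my @captured;
for my $str (@tests) {
    if ($str =~ /<!-- USER \d+ - (\S+) -->/i) {
        push @captured, $1;
        print "match ($1): $str\n";
    }
    else {
        print "no match: $str\n";
    }
}
```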

Although non-backtracking subpatterns admittedly will help you somewhat in making your code faster, I would not use them if they're not really needed: they would just obscure what is happening.

Putting it all together, you would end up with something like this:

my @users;
foreach (@lines) {
    /<!-- USER \d+ - (\S+) -->/i and push @users, $1;
}

You may see people doing the same thing like this:

my @users = map { /<!-- USER \d+ - (\S+) -->/i ? $1 : () } @lines;

What is happening here is that for each element of @lines you check whether the line matches your regex. If so, you add the value of $1 (the username) to the list that ends up in @users; if not, you add an empty list (i.e. nothing). This might come in handy when reading other people's code.
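
Here is a self-contained sketch that runs both versions side by side on made-up sample lines (the data and the names @looped and @mapped are assumptions for the example), so you can see that they produce the same list:

```perl
use strict;
use warnings;

# Hypothetical sample lines, for illustration only.
my @lines = (
    "<!-- USER 1 - alice -->",
    "<td>not a user line</td>",
    "<!-- USER 2 - bob -->",
);

# Loop version: push the capture whenever the line matches.
my @looped;
for my $line (@lines) {
    push @looped, $1 if $line =~ /<!-- USER \d+ - (\S+) -->/i;
}

# map version: a non-matching line contributes the empty list,
# so it simply vanishes from the result.
my @mapped = map { /<!-- USER \d+ - (\S+) -->/i ? $1 : () } @lines;

print "loop: @looped\n";   # loop: alice bob
print "map:  @mapped\n";   # map:  alice bob
```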

Hope this helps.

— Arien

Edit: Also, if you know that what you are looking for can only appear at the start of the line, you can speed things up by anchoring your regex (using ^) like this:

/^<!-- USER \d+ - (\S+) -->/i
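
Keep in mind that anchoring also changes what matches, so only do this if the comment really does start the line. A quick sketch with invented data, where the second line has text in front of the comment:

```perl
use strict;
use warnings;

# Hypothetical lines: the second one has text before the comment.
my @lines = (
    "<!-- USER 1 - alice -->",
    "  indented <!-- USER 2 - bob -->",
);

my @anywhere = map { /<!-- USER \d+ - (\S+) -->/i  ? $1 : () } @lines;
my @at_start = map { /^<!-- USER \d+ - (\S+) -->/i ? $1 : () } @lines;

print "unanchored: @anywhere\n";   # unanchored: alice bob
print "anchored:   @at_start\n";   # anchored:   alice
```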

In reply to Re: Regex simplification by Arien
in thread [untitled node, ID 192753] by Samn
