Having been bitten by a couple of my public attempts at regex construction, I decided to re-read the pod and japhy's book, and to take a (private) crack at any regex questions that came up here, in an attempt to improve my skills.

What's below was started by this [untitled node, ID 192753] SoPW, and I am posting it for two reasons.

  1. I would like feedback on my mechanism for deriving my regex. The idea is to do as little as necessary (laziness) and to use as few wildcard components as possible for best performance.
  2. There are half a dozen questions in the comments (look for the ?????) for which I would appreciate explanations or pointers.

#! perl -w
use strict;

my @site;
push @site, "<!-- USER $_ - donkey_pusher_$_ -->" for (1..10);

=pod

### Start with a typical sample
/<!-- USER 20 - donkey_pusher_6 -->/

### Escape anything that might cause a problem (Nothing to do here!)
/<!-- USER 20 - donkey_pusher_6 -->/

### Add some anchors if I KNOW that they are true
/^<!-- USER 20 - donkey_pusher_6 -->$/

### Bracket the bit(s) I want to keep.
/^<!-- USER 20 - (donkey_pusher_6) -->$/

### Substitute appropriate wildcards (Right term?) for the bits I know will change
### The start/end of html comments have to be fixed pretty much.
### The whitespace could vary, but the /x modifier ??should?? handle that nicely
/^<!-- USER \d+ - (\w+) -->$/

### Add any modifiers that might help.
### /x so each space will match any number or combination of whitespace.
### /o to compile for speed, ??not clear to me if this is necessary or advantageous
###    if there are no variables to be interpolated in the regex??
### /i in case "USER" might vary in case.
###    If it's not needed, don't! I think it is probably expensive.
/^<!-- USER \d+ - (\w+) -->$/xio

=cut

# qr// can have some speed benefits, ??when??
my $regex1 = qr/^<!-- USER \d+ - (\w+) -->$/iox;

# I had to remove the /x else it didn't match SOMETIMES????????
my $regex2 = qr/^<!-- USER \d+ - (\w+) -->$/io;

for (@site ) {
    ## Standalone with /x on qr// NEVER matches ?????????
    # next unless $_ =~ $regex1;

    # This FAILS to match also ???????????
    # next unless m/$regex1/;

    # Adding the /x to $regex2 this way works ok
    # next unless m/$regex2/x;

    next unless $_ =~ $regex2; # And this works.
    my $userid = $1;
    print $userid,$/;
}

# use the map trick
# using $regex1 with the /x modifier WORKS ok here!!
my @users = map { $regex1 ? $1 : () } @site;

my $doc=join "\n", @site; # Simulating slurp mode here!!!
study $doc; ## Could give big boost on long strings?

# Again, adding the /x modifier here....
my $regex3 = qr/^<!-- USER \d+ - (\w+) -->$/oi;
#my @users2 = $doc =~ /$regex3/mgc; # means this FAILS! ??
my @users2 = $doc =~ /$regex3/xmgc; # Adding here it works.

print 'results:', ~~@users, ' ', ~~@users2, $/;
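One possible reading of the /x failures flagged in the comments above, offered as a sketch and not a definitive answer: perlre describes /x as telling the regex parser to ignore most whitespace in the pattern, so a literal space written under /x does not match "any whitespace"; it matches nothing at all. If that is what is happening, the whitespace has to be spelled out explicitly, for example as \s+. The variable names below ($ignores_spaces, $explicit_spaces, @plain) are made up for this illustration and do not appear in the code above. The map here also performs an actual match before using $1, which the "map trick" line above does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample data as in the post.
my @site = map { "<!-- USER $_ - donkey_pusher_$_ -->" } 1 .. 10;

# Under /x the unescaped spaces in the pattern are ignored, so this is
# effectively ^<!--USER\d+-(\w+)-->$ and cannot match lines that
# really do contain spaces.
my $ignores_spaces = qr/^<!-- USER \d+ - (\w+) -->$/xi;

# Writing the whitespace as \s+ keeps the layout and comment benefits
# of the x flag while still matching one or more whitespace characters.
my $explicit_spaces = qr/
    ^ <!-- \s+ USER \s+ \d+ \s+ - \s+   # comment opener and user number
    (\w+)                               # the name to capture
    \s+ --> $                           # comment closer
/xi;

# Count how many lines the space-ignoring pattern matches.
my @plain = grep { $_ =~ $ignores_spaces } @site;

# Match first, then take $1 only when the match succeeded.
my @users = map { $_ =~ $explicit_spaces ? $1 : () } @site;

print scalar(@plain), " ", scalar(@users), "\n";   # prints "0 10"
print "$users[0]\n";                               # prints "donkey_pusher_1"
```

The flags given to qr// are compiled into the pattern object, which would be consistent with $regex1 still failing when interpolated as m/$regex1/, and with m/$regex2/x still matching, since the outer /x would not reach inside the already compiled $regex2.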

What's this about a "crooked mitre"? I'm good at woodwork!

In reply to Improving my regex skills and a few questions. by BrowserUk
