Ah, fellow monks, I am a depraved regex junkie. I just can't get enough, and I find myself coding them for the heck of it. Is there help for me? But I digress. Let us examine this code:
my ($ptid, $total, $used, $avail, $pct, $mp) =
    $element =~ m!^(/dev/.+[0-9]+)    # which partition         $ptid
                  \s+([0-9.]+[MGK])   # total size of partition $total
                  \s+([0-9.]+[MGK])   # used space              $used
                  \s+([0-9.]+[MGK])   # available space         $avail
                  \s+(\d{2})%         # percent usage           $pct
                  \s+(.*)$!x;         # mounting point          $mp
This regex is being used to match this data:
---- Sat Feb 3 12:01:01 EST 2001
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda7             904M  261M  598M  30% /
/dev/hda12            852M  378M  474M  44% /devel
/dev/hda10            9.8G  9.6G  256M  97% /home
/dev/hda9             1.8G  1.6G  225M  88% /home/dl
/dev/hda5             768M  751M   17M  98% /mnt/macos
/dev/hda8             3.9G  3.4G  304M  92% /usr
/dev/hda6             387M   93M  275M  25% /var
/dev/hdb5            1008M  591M  365M  62% /home/ftp
/dev/hdb6            1008M  209M  748M  22% /home/httpd
/dev/hdb9             1.5G  1.1G  358M  75% /mnt/build
/dev/hdb8             640M  456M  151M  75% /mnt/mp3
This output is appended by an hourly cron job, so the file could easily grow to several megabytes (or even dozens of megabytes) of text. Speed will therefore be an issue.
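
Since the log is read line by line, one cheap thing to try before touching the regex at all is a plain prefix test that skips the date and header lines. This is only a sketch: the in-memory `$log` sample and the `@rows` array are hypothetical stand-ins for the real file and whatever is done with each line, and whether the prefix test saves anything measurable would need benchmarking.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real, much larger log file.
my $log = <<'END';
---- Sat Feb 3 12:01:01 EST 2001
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda7             904M  261M  598M  30% /
/dev/hda10            9.8G  9.6G  256M  97% /home
END

# Open the string as a filehandle so the sketch runs without a real file.
open my $fh, '<', \$log or die "cannot open in-memory log: $!";

my @rows;
while (my $line = <$fh>) {
    chomp $line;
    # Only lines that start with /dev/ can match the regex, so skip the rest.
    next unless index($line, '/dev/') == 0;
    push @rows, $line;
}
close $fh;

print scalar(@rows), " partition lines\n";
```

Reading with `while (<$fh>)` keeps only one line in memory at a time, which matters more than the regex itself once the file reaches dozens of megabytes.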

So I'm looking at this and see a pretty specific regex. I thought of substituting \S+ for .*. However, on Unix (NT compatibility, obviously, is not an issue here) mount points can include awful characters like *, \n, \a, and so on. So, basically, I see two flaws in the expression: first, the use of .* (and .+); second, the part where \s+([0-9.]+[MGK]) is captured three times seems repetitive. Has anyone got some regex-tuning hints?
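
To make the two concerns concrete, here is a minimal sketch of one possible tightening, not a definitive answer. It assumes device names never contain whitespace (so `\S*` can replace `.+` there), keeps `.*` for the mount point because mount points may contain spaces, and factors the repeated size field into a precompiled `qr//`. The names `$size`, `$df_line` and `parse_df_line` are made up for illustration, and the widening of `\d{2}` to `\d{1,3}` is my own assumption so that values like 5% and 100% would also match.

```perl
use strict;
use warnings;

# One size field such as 904M or 9.8G, compiled once and reused three times.
my $size = qr/[0-9.]+[MGK]/;

my $df_line = qr{
    ^ (/dev/\S*[0-9])   # partition: no whitespace assumed, must end in a digit
    \s+ ($size)         # total size
    \s+ ($size)         # used space
    \s+ ($size)         # available space
    \s+ (\d{1,3}) %     # percent usage, widened from two digits (assumption)
    \s+ (.*) $          # mount point: may contain spaces, so .* stays
}x;

# Returns the six captured fields, or an empty list if the line does not match.
sub parse_df_line {
    my ($line) = @_;
    return $line =~ $df_line;
}

my @fields = parse_df_line('/dev/hda10 9.8G 9.6G 256M 97% /home');
print join('|', @fields), "\n";
```

The `.*` at the end is probably the least worrying part: it is anchored by `$` and has nothing after it to backtrack into, so the likelier gain is in replacing the leading `.+`.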

thanks,
brother dep.

--
i am not cool enough to have a signature.


In reply to Getting rid of (.*) from a not-quite-complex regex. by deprecated
