That regex isn't too bad. Your assumption that : is not allowed in a filename is wrong, though. On Unix, the only characters a filename cannot contain are the directory separator / and the NUL byte.

But apart from that, I have a few minor suggestions on your regex.

  1. Inside a character class [] a period does not need to be escaped.
  2. You should use /x. It makes your regex easier to read.
  3. \d is usually preferred to [0-9]. It is shorter and more idiomatic, though in Perl it also matches non-ASCII Unicode digits unless you add /a.
  4. You have an unnecessary set of parens in your regex: (?: ).
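Points 1 and 3 are easy to check for yourself. What follows is only a minimal sketch: the test strings are made up for illustration, and the /a modifier assumes Perl 5.14 or later. It shows that a period inside a character class is literal with or without the backslash, and that \d and [0-9] agree on ASCII digits but differ on other Unicode digits.

```perl
use strict;
use warnings;

# Point 1: inside a character class, an escaped and an unescaped period
# both mean a literal dot.
my $escaped   = 'a.b' =~ /^a[\.]b$/ ? 1 : 0;
my $unescaped = 'a.b' =~ /^a[.]b$/  ? 1 : 0;
my $wildcard  = 'axb' =~ /^a[.]b$/  ? 1 : 0;   # the dot is not a wildcard here

# Point 3: \d and [0-9] agree on ASCII digits...
my $ascii_d     = '42' =~ /^\d+$/    ? 1 : 0;
my $ascii_class = '42' =~ /^[0-9]+$/ ? 1 : 0;

# ...but \d also matches other Unicode digits unless /a is in effect.
my $arabic    = "\x{0664}\x{0662}";   # ARABIC-INDIC DIGIT FOUR, DIGIT TWO
my $uni_d     = $arabic =~ /^\d+$/    ? 1 : 0;
my $uni_d_a   = $arabic =~ /^\d+$/a   ? 1 : 0;
my $uni_class = $arabic =~ /^[0-9]+$/ ? 1 : 0;

print "escaped=$escaped unescaped=$unescaped wildcard=$wildcard\n";
print "ascii: \\d=$ascii_d [0-9]=$ascii_class\n";
print "non-ascii: \\d=$uni_d \\d/a=$uni_d_a [0-9]=$uni_class\n";
```

If your input is known to be plain ASCII, the two spellings behave the same and the choice is purely one of style.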

Rewritten, it would read:

    /
        ^ [^.]+ \. \s       # Ignore the line numbers
        (.+?)               # Capture the file name
        (?:
            (\d) \s         # Capture the optional leading size digit
        )?
        (
            \d+ \. \d{2}    # Capture the rest of the size
        )
        \s MB .* $
    /x

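After this, your file name is $1, and your size is, as you stated, $2$3. Two caveats: $1 keeps the whitespace that separates the name from the size, because nothing else in the pattern consumes it, so trim it; and $2 is undefined when the optional leading digit is absent.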

And yes, there is an ambiguity. If the line were:

    1. go 2 123.45 MB

the regex would parse "go" as the file name and "2123.45" as the file size. Given the format of the input, there is no way to tell that apart from a file named "go 2" with a size of 123.45 MB.
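To see the ambiguity in action, here is a minimal sketch that runs the rewritten pattern over two lines. The helper name parse_line and the sample line with notes.txt are hypothetical, made up for illustration; only the "go 2" line comes from the discussion above. The helper trims the trailing whitespace that the lazy capture picks up and treats a missing leading digit as an empty string.

```perl
use strict;
use warnings;

# The rewritten pattern, compiled once.
my $line_re = qr/
    ^ [^.]+ \. \s        # Ignore the line number
    (.+?)                # Capture the file name
    (?: (\d) \s )?       # Capture the optional leading size digit
    ( \d+ \. \d{2} )     # Capture the rest of the size
    \s MB .* $
/x;

# Hypothetical helper: returns (name, size), or an empty list on no match.
sub parse_line {
    my ($line) = @_;
    my ($name, $lead, $rest) = $line =~ $line_re
        or return;
    $name =~ s/\s+\z//;                 # the lazy capture keeps the separator
    $lead = '' unless defined $lead;    # the optional group may not have matched
    return ($name, $lead . $rest);
}

my @plain     = parse_line('1. notes.txt 123.45 MB');
my @ambiguous = parse_line('1. go 2 123.45 MB');

print "name=[$plain[0]] size=[$plain[1]]\n";           # name=[notes.txt] size=[123.45]
print "name=[$ambiguous[0]] size=[$ambiguous[1]]\n";   # name=[go] size=[2123.45]
```

The second line could just as well describe a file called "go 2" of 123.45 MB. The pattern always prefers the shorter name because the file name capture is lazy.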


In reply to Re: Regex hackery by markkawika
in thread Regex hackery by PoorLuzer
