That regex isn't too bad. Your assumption that `:` is not allowed in a filename is wrong, at least on Unix: the only characters a Unix filename cannot contain are the directory separator `/` and the NUL byte.
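On a Unix-like system you can check this directly by creating such a file. This is a minimal sketch that assumes a POSIX filesystem (Windows does reserve `:`, so the `open` would fail there). The name `a:b.txt` is just an example, and the file lives in a temporary directory that is deleted at exit.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Assumes a Unix-like filesystem; on Windows ':' is reserved and open() fails.
my $dir  = tempdir(CLEANUP => 1);                  # removed when the script exits
my $path = File::Spec->catfile($dir, 'a:b.txt');   # colon in the file name

open(my $fh, '>', $path) or die "Cannot create '$path': $!";
print {$fh} "hello\n";
close($fh) or die "Cannot close '$path': $!";

my $created = -e $path ? 1 : 0;
print $created ? "created $path\n" : "could not create $path\n";
```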
But apart from that, I have a few minor suggestions on your regex.
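- Inside a character class `[]`, a period does not need to be escaped.
- You should use `/x`; it makes your regex easier to read.
- `\d` is usually preferred to `[0-9]`; it is shorter and reads better. (One caveat: on Unicode strings `\d` also matches non-ASCII digits; add the `/a` modifier, Perl 5.14 and later, if you need ASCII digits only.)
- You have an unnecessary set of parens in your regex: `(?: )`.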
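A small self-check of these points, using illustrative patterns rather than your original regex. It also checks how `\d`, `[0-9]` and `\d` under `/a` each treat a non-ASCII digit (U+0663, ARABIC-INDIC DIGIT THREE). All variable names here are made up for the example.

```perl
use strict;
use warnings;

# Inside a character class '.' is already literal, so [.] and [\.] agree.
my %class;
for my $s ('.', 'x') {
    my $plain   = $s =~ /^[.]$/  ? 1 : 0;
    my $escaped = $s =~ /^[\.]$/ ? 1 : 0;
    $class{$s} = [$plain, $escaped];
    printf "%s: [.]=%d [\\.]=%d\n", $s, $plain, $escaped;
}

# /x ignores unescaped whitespace and allows comments, so these two
# patterns are equivalent.
my $compact = qr/^(\d+)\.(\d{2})$/;
my $spaced  = qr/
    ^
    (\d+)      # whole part
    \.
    (\d{2})    # two decimals
    $
/x;
my $compact_ok = '123.45' =~ $compact ? 1 : 0;
my $spaced_ok  = '123.45' =~ $spaced  ? 1 : 0;
printf "compact=%d spaced=%d\n", $compact_ok, $spaced_ok;

# \d matches any Unicode decimal digit; /a (Perl 5.14+) restricts it to ASCII.
my $arabic_three = "\x{0663}";                       # ARABIC-INDIC DIGIT THREE
my $d_unicode = $arabic_three =~ /\d/    ? 1 : 0;
my $d_ascii   = $arabic_three =~ /\d/a   ? 1 : 0;
my $range     = $arabic_three =~ /[0-9]/ ? 1 : 0;
printf "\\d=%d \\d/a=%d [0-9]=%d\n", $d_unicode, $d_ascii, $range;
```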
Rewritten, it would read:
```perl
/
  ^
  [^.]+ \. \s          # Ignore the line numbers
  (.+?) \s+            # Capture the file name, minus trailing space
  (?:
    (\d) \s            # Capture the optional leading size digit
  )?
  (
    \d+ \. \d{2}       # Capture the rest of the size
  )
  \s MB .*
  $
/x
```
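After a successful match, your file name is in `$1`, and your size is, as you stated, `$2$3`. Because `$2` is undefined when there is no leading digit, write it as `($2 // '') . $3` to avoid an "uninitialized value" warning under `use warnings`.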
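Putting it together, here is a minimal sketch of applying the pattern to a few lines. The sample lines are made up to fit the format in your post (number, name, optional leading digit, size, `MB`); the pattern has `\s+` after the file-name capture so the name does not keep a trailing space; and `$2` is defaulted with `//` (Perl 5.10+).

```perl
use strict;
use warnings;

# The pattern from the post, compiled once; \s+ after the name keeps
# trailing whitespace out of the captured file name.
my $line_re = qr/
    ^
    [^.]+ \. \s        # Ignore the line numbers
    (.+?) \s+          # Capture the file name
    (?:
        (\d) \s        # Capture the optional leading size digit
    )?
    (
        \d+ \. \d{2}   # Capture the rest of the size
    )
    \s MB .*
    $
/x;

# Hypothetical sample input in the format described in the question.
my @lines = (
    '1. notes.txt 12.50 MB',
    '2. backup: old.tar 1 024.00 MB',
    '3. not a size line',
);

my @parsed;
for my $line (@lines) {
    if ($line =~ $line_re) {
        # $2 is undef when there is no leading digit; // '' avoids a warning.
        my ($name, $size) = ($1, ($2 // '') . $3);
        push @parsed, [$name, $size];
        printf "name=%s size=%s\n", $name, $size;
    }
    else {
        print "no match: $line\n";
    }
}
```

Compiling once with `qr//x` keeps the commented layout and avoids repeating the pattern inside the loop.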
And yes, there is an ambiguity. If the line were `1. go 2 123.45 MB`, the regex would parse "go" as the file name and "2123.45" as the file size. There's no way around this given the format of the input.
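Both readings satisfy the pattern; the lazy `(.+?)` simply prefers the shorter file name. This sketch (assuming the variant with `\s+` after the name capture) contrasts it with a greedy `(.+)`, which picks the other reading. Neither is "right"; the input alone cannot tell them apart.

```perl
use strict;
use warnings;

my $line = '1. go 2 123.45 MB';

# Lazy file name: the shortest name that still lets the rest match wins,
# so "2" is taken as the leading size digit.
my $lazy = qr/^ [^.]+ \. \s (.+?) \s+ (?: (\d) \s )? ( \d+ \. \d{2} ) \s MB .* $/x;

# Greedy file name: the longest name that still lets the rest match wins,
# so "2" stays in the name.
my $greedy = qr/^ [^.]+ \. \s (.+) \s+ (?: (\d) \s )? ( \d+ \. \d{2} ) \s MB .* $/x;

my @readings;
for my $re ($lazy, $greedy) {
    $line =~ $re or die "no match for $line";
    push @readings, [$1, ($2 // '') . $3];
}

printf "lazy:   name=%s size=%s\n", @{ $readings[0] };
printf "greedy: name=%s size=%s\n", @{ $readings[1] };
```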