Actually, there's a finer point of regular expressions working against this Perl code, which is as follows:

if (m/\/(\w+\.?\w*)$/){ $filename = $1; }

The assumption here is that the leading "\/" means a literal forward slash, and that (\w+\.?\w*)$ means "some word characters, followed by (possibly) a literal period, followed by 0 or more word characters, then the end of the string".

However, the "\w" metacharacter is intended to match only alphanumerics and the underscore character, which leaves out a whole bevy of other characters that may be present in file names, e.g. spaces, hyphens, parentheses, etc.

Granted, "sensible" UNIX filenames often don't contain those characters because many of them are also shell metacharacters, but this sample dataset looks suspiciously like DOS/Win32 filenames (X: being a giveaway), and I can't count the number of times I've had to deal with filenames like "Sales Figures - Dec 19 - Dec 26.doc" and the like!
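To make the problem concrete, here is a small sketch that runs the original pattern against two paths. The paths and the sub name word_only_name are made up for illustration; only the regex comes from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the original pattern.
# Returns the captured filename, or undef when the pattern does not match.
sub word_only_name {
    my ($path) = @_;
    return $path =~ m/\/(\w+\.?\w*)$/ ? $1 : undef;
}

# Hypothetical sample paths; the drive letter and directory are invented.
for my $path ('X:/reports/summary.txt',
              'X:/reports/Sales Figures - Dec 19 - Dec 26.doc') {
    my $name = word_only_name($path);
    print defined $name ? "matched: $name\n" : "no match: $path\n";
}
```

The first path matches and yields summary.txt. The second fails outright, because \w+ stops at the first space and the pattern then demands the end of the string.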

One possible alternative regex which still gives all characters after the last forward slash to the end of the string is:

/\/([^\/]+)$/
Meaning "a literal forward slash, followed by one or more characters that are NOT a forward slash, up to the end of the string".
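A quick sketch of that alternative in use. The sub name last_component and the sample paths are my own invention, and I've written the pattern with m{...} delimiters purely so the slashes don't need escaping; it is otherwise the same regex.

```perl
use strict;
use warnings;

# Hypothetical helper: everything after the last forward slash,
# or undef if there is no slash or nothing follows the last one.
sub last_component {
    my ($path) = @_;
    return $path =~ m{/([^/]+)$} ? $1 : undef;
}

# Hypothetical sample paths.
for my $path ('X:/reports/summary.txt',
              'X:/reports/Sales Figures - Dec 19 - Dec 26.doc',
              'X:/reports/') {
    my $name = last_component($path);
    print defined $name ? "matched: $name\n" : "no match: $path\n";
}
```

Spaces and hyphens now come through intact, while a path ending in a slash gives no match, since [^/]+ needs at least one character after the final slash.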

Or, as was pointed out in another reply to this post, File::Basename is an alternative if you would rather parse the whole path and take $filename from the result.
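For completeness, a minimal sketch of the File::Basename route, assuming the paths use forward slashes as in the regex above. The sample path is hypothetical; basename and fileparse are the module's documented functions.

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse);

# Hypothetical sample path.
my $path = 'X:/reports/Sales Figures - Dec 19 - Dec 26.doc';

# basename() returns the last path component.
my $filename = basename($path);

# fileparse() splits the path into name, directory and (optionally) suffix.
# The suffix pattern here means "a dot followed by any non-dot characters".
my ($name, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

print "filename: $filename\n";
print "name: $name, suffix: $suffix\n";
```

Note that File::Basename picks its parsing rules from the OS it runs on, so paths written with backslashes would only be split this way on a Win32 perl (or after calling fileparse_set_fstype).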

Hope that helps,

Paul

When there is no wind, row.


In reply to Re: Re: parse by shelob101
in thread simple parse question by Sara
