Re: Re: Parse out the extension of a filename - return base of filename.

by snafu (Chaplain)
on Mar 13, 2002 at 12:42 UTC [id://151358]


in reply to Re: Parse out the extension of a filename - return base of filename.
in thread Parse out the extension of a filename - return base of filename.

According to everything I have read on PM, the use of rindex() and substr() is a poor way to do this task. Therefore, I wrote a solution that didn't use them. Obviously, a pure regex solution would work, but that is kludgy and even ill-advised here on PM.
It would seem that many have felt I have done something wrong here?

It's just a code snippet... one that works, no less!

_ _ _ _ _ _ _ _ _ _
- Jim
Insert clever comment here...

Replies are listed 'Best First'.
Re: Re: Re: Parse out the extension of a filename - return base of filename.
by Juerd (Abbot) on Mar 13, 2002 at 13:25 UTC

    According to everything I have read on PM, the use of rindex() and substr() is a poor way to do this task.

    What the...? Can anyone second that (with examples and vivid explanation please)?
    I kind of refuse to believe this job should not be done with substr and rindex.

    It's just a code snippet... one that works, no less!

    One that prints data to the screen, even when not debugging. Not quite usable in most circumstances.
    Besides, it doesn't work with all valid filenames:

    parse_out_extension 'foo.b(ar';
    parse_out_extension 'foo.**';

    No, using arrays and several iterations, printing useless text and not escaping regex metacharacters is not a better solution than a pure regex one.
    Substr+rindex is the best solution for this, followed by a substitution, but "solutions" like yours are, imho, out of the question.
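
    A minimal sketch of the two approaches named above, for comparison. The helper names base_rindex and base_subst are hypothetical and do not come from any post in this thread. Neither version uses the filename as a pattern, so names such as 'foo.b(ar' need no escaping.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: base name via rindex + substr.
    sub base_rindex {
        my ($file) = @_;
        my $dot = rindex($file, '.');          # -1 when there is no dot
        return $dot < 0 ? $file : substr($file, 0, $dot);
    }

    # Hypothetical helper: base name via a single substitution.
    sub base_subst {
        my ($file) = @_;
        (my $base = $file) =~ s/\.[^.]*\z//;   # strip the last ".ext", if any
        return $base;
    }

    # Print both results side by side for a few sample names.
    print base_rindex($_), "\t", base_subst($_), "\n"
        for 'foo.b(ar', 'foo.**', 'File.pl.bak', 'noext';
    ```

    Both helpers treat only the text after the last dot as the extension and return the name unchanged when there is no dot at all.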

    No offense meant.

    U28geW91IGNhbiBhbGwgcm90MTMgY
    W5kIHBhY2soKS4gQnV0IGRvIHlvdS
    ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
    geW91IHNlZSBpdD8gIC0tIEp1ZXJk
    

      I can't speak intelligently on the index/regex issue (they are both plenty fast for me....), but the first time I saw it mentioned here on PM was in this comment by merlyn.

      -Blake

        According to everything I have read on PM, the use of rindex() and substr() is a poor way to do this task.

          What the...? Can anyone second that
      To a certain degree I can second that. Using rindex and index can cause issues. What if you have directories with dots in them? What if a file has multiple extensions, e.g. File.pl.bak? Using rindex and index for other aspects of path parsing is also a bad idea, especially when you consider the various path delimiters and the portability issues they raise.
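
      A small sketch of the directory-with-dots problem, assuming a naive rindex + substr helper applied to a full path. The names naive_base and guarded_base are hypothetical, and the guard assumes '/' is the only path separator, which is exactly the kind of portability assumption mentioned above.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: cut at the last dot anywhere in the path.
      sub naive_base {
          my ($path) = @_;
          my $dot = rindex($path, '.');
          return $dot < 0 ? $path : substr($path, 0, $dot);
      }

      # Hypothetical helper: only accept a dot that comes after the last '/'.
      # Assumes '/' is the sole separator, so it is not portable as written.
      sub guarded_base {
          my ($path) = @_;
          my $dot   = rindex($path, '.');
          my $slash = rindex($path, '/');
          return $dot > $slash ? substr($path, 0, $dot) : $path;
      }

      print naive_base('/home/jim.d/README'), "\n";    # /home/jim (cut inside the directory name)
      print guarded_base('/home/jim.d/README'), "\n";  # /home/jim.d/README
      print guarded_base('File.pl.bak'), "\n";         # File.pl (only the last extension goes)
      ```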

      Actually it was the use of rindex that caused a script here to behave very strangely just the other day.

      BTW, you and snafu might like to have a look at one of my earlier posts (and the thread it's in) for some interesting discussion about parsing filenames/URLs with a regex.

      Yves / DeMerphq
      ---
      Writing a good benchmark isn't as easy as it might look.

      I can't find the nodes that I read where it is suggested to not use those functions. I promise I read those suggestions, though, which is the only reason why I didn't use those functions.

      I'm not sure what data you are talking about that prints to the screen. I don't get any such data, just the return from the function if I print it.

      I see your point as far as my "solution" (heh, gotta quote it since you are right) is concerned. I need to go back to the drawing board.

      I asked for comments, I got comments. I feel like I got a flogging, though. :)

      _ _ _ _ _ _ _ _ _ _
      - Jim
      Insert clever comment here...

        I'm not sure what data you are talking about that prints to the screen. I don't get any such data, just the return from the function if I print it.

        Here's a part of your original post, before you updated it and removed the print statements:

        print "\n-----------\n";
        print "There are ". scalar @pieces . " elements in \@pieces from ".
              "filename: $file\n";
        my $c = 0;
        for ( @pieces ) {
            print "element ", ++$c, ": $_\n";
        }
        print "-----------\n";

        In case your memory has failed you.

        U28geW91IGNhbiBhbGwgcm90MTMgY
        W5kIHBhY2soKS4gQnV0IGRvIHlvdS
        ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
        geW91IHNlZSBpdD8gIC0tIEp1ZXJk
        

Re: Re: Re: Parse out the extension of a filename - return base of filename.
by rob_au (Abbot) on Mar 14, 2002 at 00:40 UTC
    It would seem that many have felt I have done something wrong here?

    Not necessarily wrong, just another way to do it ... :-)

    There are a couple of points that I would highlight with your code:

    • Using the map function in a void context is generally considered bad form, as the strength of this function lies in the list that it returns. Using it in a void context generally means that you are using the wrong function for the task. For example, where you have:

      map { push(@pieces,$_) } split(/\./,$file);

      It would be much better to perform this with a simple for or foreach loop, e.g.

      push( @pieces, $_ ) foreach split( /\./, $file );

      This is also referenced in the node - What's wrong with using grep or map in a void context?.

    • At the point in the function where you remove the file extension and return the remaining portion of the file name, there are a couple of ways this could be done, possibly more efficiently (I have not benchmarked them). The most immediate is the following:

      pop @pieces;
      return join '.', @pieces;

      In this instance, there is no need for the regular expression against the original file name, or for the test that the file name is defined, as join will return an empty string if @pieces is empty.
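
    Putting the two points above together, a minimal sketch might look like the following. The helper name base_by_split is hypothetical, and the "if @pieces > 1" guard is an added assumption so that a name without any dot is returned unchanged rather than as an empty string.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: split on dots, drop the last piece, rejoin.
    sub base_by_split {
        my ($file) = @_;
        my @pieces = split /\./, $file;   # assign the list directly; no map or push needed
        pop @pieces if @pieces > 1;       # assumption: only drop a piece when there is an extension
        return join '.', @pieces;
    }

    print base_by_split($_), "\n" for 'File.pl.bak', 'foo.**', 'noext';
    ```

    Assigning the result of split straight to @pieces sidesteps the map-versus-foreach question entirely, since no per-element push is needed.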

    One aspect of your code did seem redundant, however, given that the functionality you are seeking is available in File::Basename (see my node here for a code example that gives you exactly what you desire from the fileparse function).
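
    For readers without the linked node to hand, here is a minimal sketch of fileparse from File::Basename. It is not the code from that node. The sample path is made up, and the suffix pattern qr/\.[^.]*/ is one common choice that matches only the last extension; the directory value shown assumes a Unix-like system.

    ```perl
    use strict;
    use warnings;
    use File::Basename qw(fileparse);

    # fileparse returns the base name, the directory part and the matched suffix.
    my ($name, $dir, $suffix) = fileparse('/tmp/File.pl.bak', qr/\.[^.]*/);

    print "$name\n";     # File.pl
    print "$dir\n";      # /tmp/
    print "$suffix\n";   # .bak
    ```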

     

    perl -e 's&&rob@cowsnet.com.au&&&split/[@.]/&&s&.com.&_&&&print'
