mikedshelton has asked for the wisdom of the Perl Monks concerning the following question:

Good Monks...

I would like to create a one line regex (assuming this is faster than the two line solution) that ensures a directory has one and only one "/" at the beginning and at the end...

Two liner

$directory =~ s<^/*></>; #beginning

$directory =~ s</*$></>; #ending

One line not working

$directory =~ s<^/*></>.</*$></>; #beginning and ending

Those with benchmarking experience...

Is the one line solution faster than the two line solution?

Thank you in advance

janitored by ybiC: Balanced <code> tags around snippets

Re: Regex - one and only one / at beginning and end of file path
by tachyon (Chancellor) on Dec 14, 2003 at 23:31 UTC

    Those with benchmarking experience...

    If you really want to know (see 2), then why not do it yourself? See Benchmark made easy.

    Is the one line solution faster than the two line solution?

    2) Do you REALLY care? Does it REALLY matter? Do you for a second think that this is likely to make any measurable difference to the execution speed of your code? It won't.

    The most common similar usage is to simultaneously strip leading and trailing whitespace from a string. japhy benchmarked the two line and one line examples somewhere, but due to 2) you can Super Search for that yourself :-)

    To do it in one line you simply need alternation with | and /g to handle both ends

    $_ = '/////usr///bin///perl////';
    s!^/+|/+$!/!g;
    print $_, $/;

    # but as you can see this is still broken so
    # assuming it is dealing with *nix paths
    # you probably want just

    $_ = '/////usr///bin///perl////';
    s!/+!/!g;
    print $_, $/;

    # but as tr/// is faster than s/// this will probably be the winner

    $_ = '/////usr///bin///perl////';
    tr!/!/!s;
    print $_, $/;

    __DATA__
    /usr///bin///perl/
    /usr/bin/perl/
    /usr/bin/perl/

    PS The answer FWIW is that two s/// lines are marginally faster than one s/// with alternation, but the last s/// regex example is (probably) faster than either option. tr/// will be faster again, as tr/// is faster than s/// all other things being equal. Also /* is slower than /+, which is slower than /{2,}, because we only want to do a sub if we have to and we want to effectively fail fast if we have nothing that requires doing. I'll leave it to you to prove these suppositions.....
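These suppositions can be checked with the core Benchmark module. The following is a minimal sketch rather than tachyon's own benchmark: the sub names and the fixed iteration count are assumptions chosen for illustration, and timings will differ by machine and Perl version. Note that the four variants are not equivalent. The first two only touch the ends of the string, while the last two also collapse runs of slashes in the middle.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $path = '/////usr///bin///perl////';

# Two separate substitutions, one per end of the string.
sub two_subs    { my $s = shift; $s =~ s!^/+!/!; $s =~ s!/+$!/!; return $s }

# One substitution using alternation and /g.
sub alternation { my $s = shift; $s =~ s!^/+|/+$!/!g; return $s }

# Collapse every run of slashes with s///.
sub squeeze_s   { my $s = shift; $s =~ s!/+!/!g; return $s }

# Collapse every run of slashes with tr///s (the "squeeze" flag).
sub squeeze_tr  { my $s = shift; $s =~ tr!/!/!s; return $s }

my %variant = (
    two_subs    => \&two_subs,
    alternation => \&alternation,
    squeeze_s   => \&squeeze_s,
    squeeze_tr  => \&squeeze_tr,
);

# Show what each variant produces before timing it.
printf "%-12s %s\n", $_, $variant{$_}->($path) for sort keys %variant;

# Wrap each variant so that it is timed on the same input.
my %timed;
for my $name (keys %variant) {
    my $code = $variant{$name};
    $timed{$name} = sub { $code->($path) };
}

# A fixed count keeps the run short; a negative count (CPU seconds),
# as in cmpthese -10, gives steadier numbers.
cmpthese(500_000, \%timed);
```

Only the relative ordering in the resulting table is meaningful, and it should be rerun on the Perl version that will actually execute the code.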

    cheers

    tachyon

      tachyon++

      tr/// is faster than s/// all other things being equal

      Actually, s/// can be faster than tr/// for some tasks. I'll let you find the nodes that contain examples of that. (:

                      - tye
Re: Regex - one and only one / at beginning and end of file path
by cchampion (Curate) on Dec 14, 2003 at 22:46 UTC

    If you want to use one regex only, you must add an alternation and thus a /g option, because you need to match two times.

    $directory =~ s{ (?: ^ /* | (?<=[^/])/* $ ) }</>xg;

    Update added a look-behind assertion to make the expression work in all cases. All in all, it's better with two regexes!
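To see why the look-behind matters, here is a small sketch comparing the alternation with and without it. The sub names are hypothetical. Without the assertion, once the trailing slashes have been replaced, `/*$` can match once more as an empty string at the very end, so a path such as `foo/` ends up with two trailing slashes.

```perl
use strict;
use warnings;

# Alternation without the look-behind (for comparison only).
sub naive {
    my $dir = shift;
    $dir =~ s{ (?: ^ /* | /* $ ) }{/}xg;
    return $dir;
}

# cchampion's pattern: the trailing branch may only start
# directly after a non-slash character.
sub with_lookbehind {
    my $dir = shift;
    $dir =~ s{ (?: ^ /* | (?<=[^/])/* $ ) }{/}xg;
    return $dir;
}

for my $dir ('foo', 'foo/', '//foo//') {
    printf "%-8s naive=%-8s lookbehind=%s\n",
        $dir, naive($dir), with_lookbehind($dir);
}
```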

Re: Regex - one and only one / at beginning and end of file path
by Abigail-II (Bishop) on Dec 14, 2003 at 23:31 UTC
    Unlike what cchampion is saying, you can do it in one regex, and even without using alternation and /g. A short benchmark even suggests it to be slightly faster than using two regexes. The regex:
    s!^/*(.*[^/])/*$!/$1/!;
    The benchmark:
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw /timethese cmpthese/;

    chomp (our @lines = <DATA>);

    our (@r1, @r2, @r3);

    cmpthese -10 => {
        mikedshelton => '@r1 = map {local $_ = $_; s!^/*!/!; s!/*$!/!; $_} @lines',
        cchampion    => '@r2 = map {local $_ = $_; s{ (?: ^ /* | (?<=[^/])/* $ ) }</>xg; $_} @lines',
        abigail      => '@r3 = map {local $_ = $_; s!^/*(.*[^/])/*$!/$1/!; $_} @lines',
    };

    die "Unequal\n" unless "@r1" eq "@r2" && "@r2" eq "@r3";

    __END__
    /foo/bar/baz/
    foo/bar/baz/
    /foo/bar/baz
    foo/bar/baz
    /
    foo/bar
    foo
    And the results:
                    Rate    cchampion mikedshelton      abigail
    cchampion     8069/s           --         -20%         -27%
    mikedshelton 10038/s          24%           --          -9%
    abigail      11057/s          37%          10%           --
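For readers who want to see what the single substitution does to individual inputs, here is a minimal sketch. The sub name `wrap_slashes` is hypothetical. The pattern requires at least one non-slash character, so a lone `/` does not match and is left as it is, and slashes in the middle of the path are preserved.

```perl
use strict;
use warnings;

# Abigail's substitution wrapped in a sub (the name is illustrative only).
sub wrap_slashes {
    my $dir = shift;
    $dir =~ s!^/*(.*[^/])/*$!/$1/!;
    return $dir;
}

for my $dir ('/foo/bar/baz/', 'foo/bar/baz', '///foo//', 'foo', '/') {
    printf "%-16s => %s\n", "'$dir'", "'" . wrap_slashes($dir) . "'";
}
```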

    Abigail

      Misread the question

        Didn't we lose the action of adding '/' when not present at the ends somewhere?   It seems that s!/{2,}!/!g; at least leaves 'foo/bar' as itself.   Which of these RE's are for explication only?
      ^/*(.*[^/])/*$
      That will only match if the path ends in a slash. Duh! I think I checked my brain at the door. Sigh..

      Makeshifts last the longest.

        $ perl -wle '"foo/bar" =~ m!^/*(.*[^/])/*$! and print "Aristotle is wrong."'
        Aristotle is wrong.

        Abigail

Re: Regex - one and only one / at beginning and end of file path
by shenme (Priest) on Dec 14, 2003 at 23:38 UTC
    First, your single line doesn't work because you can't string two separate s/// together like that.   What you wrote was
    $directory =~ s<^/*></> . </*$></>;
    Turning on warnings would've helped here.

    cchampion fixed his RE so I can't explain why it doesn't work, cause now it does.

    What I've been able to make work is something that recognizes the _middle_ of your string, not either end.   This is a translation out of Perl Cookbook, 2ndEd.   (cargo-cult of a higher order?)

    $directory =~ s<
        ^ /?              # match leading '/' if present
        (                 # capture what is between ends
          [^/]*           # match anything not a '/'
          (?:             # might be a '/',
            (?! /$ )      # but don't allow a '/' at EOL
            .             # okay, eat the '/'
          ) *
        )
        /? $              # match trailing '/'
    > </$1/>x;            # now surround the middle with '/'
    Update:   I take too long typing!   And my solution only takes one '/' off each end.   I guess I took so long typing that "I forgot the question"   ;-)
Re: Regex - one and only one / at beginning and end of file path
by Aristotle (Chancellor) on Dec 15, 2003 at 04:18 UTC
    A solution without alternation and without assertions:
    s { \A /* ( (?: [^/]+ /+ )*? [^/]* ) /* \z } { length $1 ? "/$1/" : '/' }ex;
    I got lazy and copped out with the /e for that edge case.. :)
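A short sketch of how this substitution behaves, including the edge case that the /e handles. The sub name `normalize` is hypothetical. When the input contains nothing but slashes, or is empty, `$1` is empty and the replacement falls back to a single `/`.

```perl
use strict;
use warnings;

# Aristotle's substitution wrapped in a sub (the name is illustrative only).
sub normalize {
    my $dir = shift;
    $dir =~ s{ \A /* ( (?: [^/]+ /+ )*? [^/]* ) /* \z }
             { length $1 ? "/$1/" : '/' }ex;
    return $dir;
}

for my $dir ('', '/', '///', '//foo', 'foo//bar//') {
    printf "%-14s => %s\n", "'$dir'", "'" . normalize($dir) . "'";
}
```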

    Makeshifts last the longest.

      Just TIMTOWTDI:
      unless ('/' eq substr($directory, 0, 1) and '/' eq substr($directory, -1, 1)) { ...
      I don't know how fast it is, but I generally prefer substr over RegExes in fixed-width-fixed-content problems.
      Cheers,
      CombatSquirrel.
      Entropy is the tendency of everything going to hell.

        What concern does that snippet address? I don't see how it has anything to do with anything posted anywhere on this thread?

        As for performance, despite appearances, you're actually going to have a hard time beating the regex engine. With substr, you usually have to do a (relative) lot of explicit work in Perl code, which means many more ops to interpret; with regexen, most of the infrastructure is implicit in the guts of Perl, which execute much faster.
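Whether substr or the regex engine wins for a given check is easy to measure rather than argue. The sketch below compares CombatSquirrel's substr test against an equivalent pair of anchored matches. The sub names, the sample path, and the iteration count are assumptions for illustration, and no particular outcome is claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# CombatSquirrel's test: does the string start and end with '/'?
sub check_substr {
    my $d = shift;
    return ('/' eq substr($d, 0, 1) and '/' eq substr($d, -1, 1)) ? 1 : 0;
}

# The same test expressed as two anchored matches.
sub check_regex {
    my $d = shift;
    return ($d =~ m{\A/} && $d =~ m{/\z}) ? 1 : 0;
}

my $dir = '/usr/local/bin/';   # hypothetical sample input

cmpthese(500_000, {
    substr => sub { check_substr($dir) },
    regex  => sub { check_regex($dir) },
});
```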

        Makeshifts last the longest.

Re: Regex - one and only one / at beginning and end of file path
by delirium (Chaplain) on Dec 14, 2003 at 23:00 UTC
    There has been discussion about this before. If memory serves me correctly, the two regexes are faster than one big one. Also, since you're only killing one character off of the end, it may even be faster to do something crazy like:

    s/^\///; { local $/='/'; chomp; }

    Didn't read the question correctly the first time. Sorry.

Re: Regex - one and only one / at beginning and end of file path
by Wassercrats (Initiate) on Dec 15, 2003 at 03:52 UTC

    I added a feature to my site map utility that does that. My routine is much more complex than what others have posted so far, and handles directory detection in some cases by comparing webpages retrieved through URLs with and without a trailing slash. A slashed target would break some links that end with a file, and some URLs point to an extensionless file that looks like a directory. My code is proprietary, but the instructions in the configuration utility currently read:

    Slash Directory
    Type yes to append a slash to each link target that points to a directory, when a trailing slash isn't already present (speeds loading of target page). Extensionless files will be detected and not slashed. Otherwise, type no or leave blank.

    Slash Base
    Type yes to append a slash to each link target that points to a base directory (domain name), when a trailing slash isn't already present. Otherwise, type no or leave blank. Targets that specify a home page's file, such as domain.com/index.html, will not be slashed. This feature adds consistency to link targets, but does not speed loading of target page.