Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Re: Common Substrings

by robin (Chaplain)
on Nov 15, 2005 at 21:15 UTC ( #508794=note: print w/replies, xml ) Need Help??

in reply to Common Substrings

This code;
sub common_path { my ($p, $q) = ("$_[0]/", "$_[1]/"); (my $pos = ($p ^ $q)) =~ /[^\0]/g; my $i = 1 + rindex($p, "/", pos($pos) - 2); return ( substr($p, 0, $i), substr($p, $i, -1), substr($q, $i, -1) ); }
behaves the same as the last one you posted. It's more efficient as well as simpler (according to my benchmarks).

I don't think there's any good reason to use bytes for this. Were you doing it for efficiency reasons?

Replies are listed 'Best First'.
Re^2: Common Substrings
by Anonymous Monk on Nov 16, 2005 at 04:24 UTC

    Thank you, that is what I was looking for!

    It only needs a test for equality at the beginning to behave the same. Otherwise there is no match and pos() will return undef, which will generate the relevant warning and a zero $i. (A test on the match could also resolve this.)

    Plus, doing it this way makes a separate routine to find the common path redundant for the purposes of generating relative paths, as there are no byte/character issues.

    In fact, the use bytes was only needed to determine $len0 and $len1, and should have been enclosed: do{use bytes;length $string;} is recommended in the camel book.

    Some comments on what I meant by efficient...

    Needless to say that my program will spend most of its life busy with filesystem interaction, and there isn't much that can be done in this area. In fact, any improvement on the relative path generation may not even be noticed when doing a recursive: fixlinks /

    My first implementation used split() in a similar way to the answer given above by Moron. But I wasn't happy with the need to change the structure of the arguments data; i.e. if I receive scalars and compute another scalar, there must be a very good reason to transform the arguments into lists!

    In my opinion this is a reasonable concern, that is only about taking care in the solution to a given problem. (Maybe I should state that I have never been a C programmer.) Anyway, this got me thinking about the way coreutils/lib/canonicalize.c does things, and took me to a use bytes solution.

    If I wasn't under the wrong impression that substr() was expensive, I could have written a solution like the method by salva, and after time might have remembered bisection. (The irony being that one is generating a list of numbers that converge to the desired position.)

    Still, the xor approach by Corion is the best. As you say there is no good reason to use bytes, it is just a case (for me, very instructional) when a binary operation helps processing strings.

    Regards to All!

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://508794]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2023-03-28 20:26 GMT
Find Nodes?
    Voting Booth?
    Which type of climate do you prefer to live in?

    Results (69 votes). Check out past polls.