
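Scalars returned by substr inside a block where use bytes is in effect never have the utf8 flag set: offsets count bytes of the internal UTF-8 encoding, and each piece comes back as a plain byte string. For instance, U+1234 below occupies three bytes (225, 136, 180):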
    $ perl -de 1
    ...
      DB<43> $a="\x{1234}/foo"
      DB<44> x ord substr $a, 0, 1
    0  4660
      DB<45> sub bsubstr { use bytes; substr $_[0], $_[1], $_[2] }
      DB<46> x ord bsubstr $a, 0, 1
    0  225
      DB<47> x ord bsubstr $a, 1, 1
    0  136
      DB<48> x ord bsubstr $a, 2, 1
    0  180
      DB<49> x ord bsubstr $a, 3, 1
    0  47
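The debugger session above can be reproduced as a standalone script. This is only a sketch: bsubstr is the same helper as in the session, $s replaces $a (to avoid the special sort variable), and the extra utf8::is_utf8 checks (a core function, no module needed) show that the byte-level pieces come back without the utf8 flag.

```perl
use strict;
use warnings;

# Same helper as in the debugger session: substr under 'use bytes'
# counts offsets in bytes of the internal UTF-8 representation.
sub bsubstr { use bytes; substr $_[0], $_[1], $_[2] }

my $s = "\x{1234}/foo";    # U+1234 is stored internally as the bytes E1 88 B4

# Character semantics: the first character is U+1234 (4660).
printf "%d\n", ord(substr($s, 0, 1));

# Byte semantics: four single-byte results, none flagged as UTF-8.
for my $i (0 .. 3) {
    my $piece = bsubstr($s, $i, 1);
    printf "%d utf8=%d\n", ord($piece), (utf8::is_utf8($piece) ? 1 : 0);
}
```

Expected output is 4660, then 225, 136, 180 and 47, each with utf8=0.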

Re^3: Common Substrings
by Anonymous Monk on Nov 15, 2005 at 15:39 UTC

    Does that mean that I simply need no bytes; before returning the values?

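      No, because then the offsets would be wrong. Maybe you could use utf8::decode on the returned strings to turn their UTF-8 bytes back into characters (utf8::upgrade would not do it: it treats each byte as a separate Latin-1 character), but your code would become really ugly and unmaintainable.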
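utf8::upgrade and utf8::decode (both core, no module needed) do different things to a byte string like the ones returned under use bytes. In this sketch $bytes is a hypothetical stand-in for such a substring: the UTF-8 encoding of U+1234 followed by "/foo".

```perl
use strict;
use warnings;

# Hypothetical byte string, as returned by substr under 'use bytes'.
my $bytes = "\xE1\x88\xB4/foo";

# utf8::upgrade keeps the same characters (only the internal storage
# changes), so each byte stays a separate Latin-1 character: length 7.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# utf8::decode reads the bytes as UTF-8 and rebuilds U+1234: length 5.
# It returns false if the bytes are not well-formed UTF-8, e.g. when a
# byte offset cut a multi-byte character in half.
my $decoded = $bytes;
utf8::decode($decoded) or die "not valid UTF-8";

printf "upgraded: length %d, first char %d\n", length($upgraded), ord($upgraded);
printf "decoded:  length %d, first char %d\n", length($decoded),  ord($decoded);
```

The first line reports length 7 and first char 225; the second reports length 5 and first char 4660.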

      If you are looking for common file paths, why don't you use something simpler, like:

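      sub diffpath_ix {
          my ($x, $y) = @_;    # the two paths to compare
          my $last = 0;        # offset just past the last common '/'
          # Walk $x one "dir/" segment at a time; \G anchors each
          # match where the previous one ended.
          while ( $x =~ m{(\G[^/]*/)}g ) {
              if ( substr($y, $last, length $1) eq $1 ) {
                  $last = pos $x;
              }
              else {
                  return $last;
              }
          }
          # Every directory segment matched: if the rest is equal too,
          # the paths are identical.
          if ( substr($x, $last) eq substr($y, $last) ) {
              return length $x;
          }
          return $last;
      }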
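A different way to get the common directory, sketched here for any number of paths, is to compare whole components with split instead of scanning offsets. common_path_prefix is a hypothetical name, not code from this thread; like diffpath_ix it works on characters, so Unicode paths need no use bytes. Unlike diffpath_ix, it returns the prefix string itself and never includes the last component, even when the paths are identical.

```perl
use strict;
use warnings;

# Hypothetical helper: longest common leading directory of the given
# paths, compared component by component, returned with a trailing '/'.
sub common_path_prefix {
    my @paths = @_;
    return '' unless @paths;
    # A limit of -1 keeps empty fields, so "/a/b" gives ('', 'a', 'b').
    my @common = split m{/}, $paths[0], -1;
    pop @common;    # drop the last component: only whole directories count
    for my $path (@paths[1 .. $#paths]) {
        my @parts = split m{/}, $path, -1;
        my $i = 0;
        # Compare only the directory components of $path (not its last one).
        $i++ while $i < @common && $i < $#parts && $common[$i] eq $parts[$i];
        splice @common, $i;    # keep the matching leading components
    }
    return @common ? join('/', @common) . '/' : '';
}

my $p = common_path_prefix('/usr/lib/perl5/strict.pm',
                           '/usr/lib/perl5/warnings.pm',
                           '/usr/lib/python3/os.py');
print "$p\n";    # prints /usr/lib/
```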