Re^2: Common Substrings

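The scalars returned by `substr` inside a block where `use bytes` is in effect never have the utf8 flag set, and their offsets and lengths count bytes of the string's internal UTF-8 encoding rather than characters. For instance: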

```
$ perl -de 1
...
  DB<43>  $a="\x{1234}/foo"
  DB<44> x ord substr $a, 0, 1
0  4660
  DB<45> sub bsubstr { use bytes; substr $_[0], $_[1], $_[2] }
  DB<46> x ord bsubstr $a, 0, 1
0  225
  DB<47> x ord bsubstr $a, 1, 1
0  136
  DB<48> x ord bsubstr $a, 2, 1
0  180
  DB<49> x ord bsubstr $a, 3, 1
0  47
```
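The same behaviour can be checked directly with the core `utf8::is_utf8` function (the `utf8::` functions are built into perl and need no `use utf8`). This is a minimal standalone sketch: `bsubstr` mirrors the debugger helper above, and `blength` is a hypothetical helper added here for illustration. It also shows that a byte slice covering a complete UTF-8 sequence can be turned back into characters with `utf8::decode`; `utf8::upgrade`, by contrast, keeps the code points unchanged, so the byte 225 would become U+00E1 rather than part of U+1234.

```perl
use strict;
use warnings;

# Same test string as in the debugger session: U+1234 followed by "/foo".
my $str = "\x{1234}/foo";

# The same builtins evaluated under "use bytes", so they operate on the
# internal UTF-8 bytes. blength is a hypothetical helper for illustration.
sub bsubstr { use bytes; return substr $_[0], $_[1], $_[2] }
sub blength { use bytes; return length $_[0] }

my $char  = substr $str, 0, 1;     # one character: U+1234
my $bytes = bsubstr($str, 0, 3);   # the three bytes that encode U+1234

printf "length: %d chars, %d bytes\n", length $str, blength($str);
printf "substr:  ord %d, utf8 flag %s\n",
    ord $char, utf8::is_utf8($char) ? 'on' : 'off';
printf "bsubstr: bytes %vd, utf8 flag %s\n",
    $bytes, utf8::is_utf8($bytes) ? 'on' : 'off';

# utf8::decode reinterprets the octets as UTF-8 in place, restoring the
# original character (it returns false if the bytes are not valid UTF-8).
utf8::decode($bytes) or die "not valid UTF-8\n";
printf "decoded: ord %d, same as substr result: %s\n",
    ord $bytes, $bytes eq $char ? 'yes' : 'no';
```

Note that a slice which cuts a multi-byte sequence in half (for example `bsubstr($str, 0, 2)`) cannot be decoded back into a character, which is one reason byte offsets and character strings mix badly.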


Re^3: Common Substrings by Anonymous Monk on Nov 15, 2005 at 15:39 UTC
Does that mean that I simply need `no bytes;` before returning the values?
Re^4: Common Substrings by salva (Canon) on Nov 15, 2005 at 16:00 UTC
No, because then the offsets would be wrong. You could perhaps use the `utf8::upgrade` function to mark the returned strings as Unicode, but your code would become really ugly and unmaintainable. If you are looking for common file paths, why not use something simpler, like:

```perl
sub diffpath_ix {
    my ($x, $y) = @_;
    my $last = 0;
    # Walk $x one "component/" chunk at a time; \G anchors each match
    # where the previous one ended.
    while ($x =~ m{(\G[^/]*/)}g) {
        if (substr($y, $last, length $1) eq $1) {
            $last = pos $x;    # chunk also present in $y: extend the common prefix
        }
        else {
            return $last;      # first differing component
        }
    }
    # No more slashes in $x: the whole strings match if the tails do.
    if (substr($x, $last) eq substr($y, $last)) {
        return length $x;
    }
    return $last;
}
```
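A quick way to try this: the sketch below uses a condensed but behaviourally equivalent copy of `diffpath_ix` so it runs on its own, plus a hypothetical `common_path` wrapper (not part of the original post) that returns the shared leading path itself. Because nothing runs under `use bytes`, the offsets count characters, so the result can be passed straight to `substr` and keeps its utf8 flag on Unicode input.

```perl
use strict;
use warnings;

# Condensed copy of diffpath_ix, so this script is self-contained.
sub diffpath_ix {
    my ($x, $y) = @_;
    my $last = 0;
    while ($x =~ m{(\G[^/]*/)}g) {
        if (substr($y, $last, length $1) eq $1) { $last = pos $x }
        else                                    { return $last }
    }
    return length $x if substr($x, $last) eq substr($y, $last);
    return $last;
}

# Hypothetical wrapper: the common leading path as a string.
sub common_path {
    my ($x, $y) = @_;
    return substr $x, 0, diffpath_ix($x, $y);
}

binmode STDOUT, ':encoding(UTF-8)';

print common_path('/usr/local/bin/perl', '/usr/local/lib/perl'), "\n";

# Character semantics throughout: U+1234 counts as one position.
my $prefix = common_path("\x{1234}/foo/bar", "\x{1234}/foo/baz");
print $prefix, "\n";
print utf8::is_utf8($prefix) ? "flagged\n" : "not flagged\n";
```

For `'/usr/local/bin/perl'` and `'/usr/local/lib/perl'` the match stops at the `bin/`/`lib/` component, so the shared prefix is `/usr/local/`; identical paths return their full length.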