in reply to Re: unicode string comparison (perl 5.26)
in thread unicode string comparison (perl 5.26)

What more info do we need? The error does not occur when removing whitespace (the cleanup step), but when checking whether $last contains a digit or whether $last is Unicode, whichever comes first (I tried either version, i.e.):

I1: if($last=~/[^/x00-\x7f]){... # here the exception occurs }
I2: if($last !~/\d/){... # here the exception occurs if I1 is commented out }

Re^3: unicode string comparison (perl 5.26)
by haj (Vicar) on Nov 01, 2019 at 14:20 UTC
    What more info do we need?

    With the information you've provided so far, I just can't help. I am pretty sure that none of the lines you've shown so far can throw a "Malformed UTF-8 character". Instead, two of the lines are syntax errors. Please take care when providing code samples that they actually demonstrate your point.

    You also haven't quoted the exact error message, which might contain more information about the offending character, as in the following examples:

    Malformed UTF-8 character: \xa4 (unexpected continuation byte 0xa4, with no preceding start byte) at /tmp/a.pl line 3
    Malformed UTF-8 character: \xe4\x22\x20 (unexpected non-continuation byte 0x22, immediately after start byte 0xe4; need 3 bytes, got 1) at /tmp/a.pl line 7.

    Finally, you haven't answered my question about your decoding routine. Perl complains about malformed UTF-8 characters when you feed it a string which you declare as UTF-8 but it isn't, but I can't see any of this in your code.
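To illustrate the point about decoding, here is a minimal sketch of a strict decode step that rejects bytes which are not valid UTF-8 before they reach any regex. The helper name decode_utf8_strict and the sample byte strings are assumptions made for illustration; they are not taken from the original poster's code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes as UTF-8 and return undef
# instead of a string when the input is malformed.
sub decode_utf8_strict {
    my ($bytes) = @_;
    # decode() may modify its input when a CHECK value is given,
    # so work on a copy.
    my $copy = $bytes;
    # FB_CROAK makes decode() die on malformed input; eval traps that.
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $text;
}

# Valid UTF-8 encoding of U+00E4 (two bytes).
my $good = decode_utf8_strict("\xc3\xa4");
# A lone 0xE4 start byte followed by ASCII: not valid UTF-8.
my $bad  = decode_utf8_strict("\xe4\x22\x20");

print defined $good ? "good: decoded ok\n" : "good: malformed\n";
print defined $bad  ? "bad: decoded ok\n"  : "bad: malformed\n";
```

Checking the input at the point where it enters the program tends to be easier than hunting for the warning later, because the failure then names the input rather than an unrelated regex match.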

Re^3: unicode string comparison (perl 5.26)
by swl (Prior) on Nov 01, 2019 at 06:15 UTC

    haj was noting that the line

    $last=$s/\s+//g;  #clean it up

    should be

    $last =~ s/\s+//g;  #clean it up
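A short sketch of the substitution bound with =~, assuming the goal is to strip all whitespace from $last. The sample value is made up for illustration. It also shows the /r flag (available since Perl 5.14), which returns the modified copy and leaves the original untouched.

```perl
use strict;
use warnings;

# Sample value, assumed for illustration only.
my $original = "  4 2\t";

# In-place form: =~ binds the substitution to $last.
# In scalar context s///g returns the number of substitutions made.
my $last  = $original;
my $count = ($last =~ s/\s+//g);

# Non-destructive form: /r returns the result and leaves $original as is.
my $copy = $original =~ s/\s+//gr;

print "cleaned: '$last' ($count substitutions)\n";
print "copy:    '$copy'\n";
```

Note that a plain = in place of =~ would assign the return value of the substitution instead of modifying $last.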

    Are you able to provide some example data for others to test with?

    Also, your code does not compile. The if ($last !~/\d/) {...} or if($last=~/[^/x00-\x7f]) {...} block should be if ($last !~/\d/) {...} elsif ($last=~/[^/x00-\x7f]) {...}

      Note also that the inverted character set in  $last=~/[^/x00-\x7f] (missing closing / forward slash for regex, unescaped forward slash in pattern) should probably be  [^\x00-\x7f] (backslash vice forward slash); otherwise,  [^/x00-\x7f] is the same as  [^\/-\x7f].
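The difference between the two character classes can be checked directly. This is a small sketch with made-up sample strings; the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Class as posted, with a forward slash: negates '/', 'x', '0',
# and the range '0' through \x7f, i.e. everything from '/' to \x7f.
my $as_posted = qr{[^/x00-\x7f]};

# Presumably intended class: anything outside the ASCII range.
my $intended  = qr/[^\x00-\x7f]/;

# Labelled samples; the last one contains U+00E9.
my @samples = (
    [ 'ascii, no space'   => "abc123"    ],
    [ 'ascii, with space' => "abc 123"   ],
    [ 'non-ascii'         => "caf\x{e9}" ],
);

my %result;
for my $sample (@samples) {
    my ($label, $string) = @$sample;
    $result{$label} = {
        as_posted => ($string =~ $as_posted ? 1 : 0),
        intended  => ($string =~ $intended  ? 1 : 0),
    };
    printf "%-18s as_posted=%d intended=%d\n",
        $label, $result{$label}{as_posted}, $result{$label}{intended};
}
```

The space character (0x20) sorts below '/' (0x2f), so the class as posted also matches plain ASCII text containing spaces or most punctuation, which is unlikely to be what a "is this Unicode?" test wants.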

      Update: Oops... choroba already mentioned the point about  [^\x00-\x7f] vs  [^/x00-\x7f] here; didn't read that far before I posted.


      Give a man a fish:  <%-{-{-{-<