zentara has asked for the wisdom of the Perl Monks concerning the following question:

Hi, this question has also been posted on the perl-gtk2 mailing list, but I'm posting it here as well, as a bit of a lesson in hacking Unicode problems for the monks.

It all started when I wanted to make my vgrep utility Unicode-aware (see Gtk2 Visual Grep). As I worked through the problems of Unicode input, I was scratching my head and couldn't figure out why I needed to manually decode @ARGV, even though the utf8::all module is supposed to do this. Then, after hacking out UTF-8 and File::Find, I noticed that I SHOULD NOT have to manually decode @ARGV if I used utf8::all.

So I started experimenting, and found that if the use utf8::all line comes before use Gtk2, I need to manually decode @ARGV, but if the utf8::all line comes after Gtk2, @ARGV is decoded properly.
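For anyone following along, the manual decoding mentioned above can be sketched roughly like this. It is only an illustration: decode_args is a hypothetical helper name, the command line is assumed to arrive as UTF-8 octets, and the utf8::is_utf8 check is a heuristic to avoid decoding an argument twice, not a guarantee.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode only the arguments that are still byte strings.
# Assumes the shell passes arguments as UTF-8 octets.
sub decode_args {
    my @args = @_;
    return map { utf8::is_utf8($_) ? $_ : decode('UTF-8', $_) } @args;
}

@ARGV = decode_args(@ARGV);

# Encode characters back to UTF-8 octets on output.
binmode STDOUT, ':encoding(UTF-8)';
print "argv-> @ARGV\n";
```

Running the helper a second time on already decoded arguments leaves them alone, which is the point of the flag check.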

Here is the code if you want to test. Any insights into why this behavior occurs are welcome. I'm using Perl 5.14.1 on Linux.

See script below.
####################################
Commandline: ./vgrep 日
Good output:
argv-> 日
search-> 日

Bad output:
argv-> це
search-> це

####################################

#!/usr/bin/perl
use warnings;
use strict;

#use utf8::all;   # placed here the decoding doesn't work right

use Gtk2 -init;   # commenting out Gtk2 modules fixes problem
use Glib qw(FALSE TRUE);
use Gtk2::Pango;
use Encode qw(decode);

use utf8::all;    # placed here, the decoding works properly

$|++;

my $search_str = $ARGV[0];
$search_str ||= undef;

print "argv-> @ARGV \n";
print "search-> $search_str\n";

__END__

I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku ................... flash japh

Re: oddity when loading Gtk2 with utf8::all
by Anonymous Monk on Jul 01, 2012 at 16:04 UTC

    Whenever I encounter such things I think: undocumented assumptions are bleh

    So I grep Gtk2 for utf8, nothing relevant; then I grep for @ARGV and find that Gtk2's -init option is known to manipulate @ARGV -- mystery solved :)

    Though, looking at GPerlArgv* gperl_argv_new (), I can't see how/why the utf8 bits don't get copied properly -- could be the g_strdup stuff.
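The suspicion above can be illustrated without Gtk2 at all. The following is a pure-Perl stand-in, not the actual Gtk2 code path: it assumes that a string copied through a C char* comes back as plain octets with no character-string flag, and shows what printing such a string to a UTF-8 handle would then do. The variable names are made up for the sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes   = "\xE6\x97\xA5";            # UTF-8 octets for the test argument
my $decoded = decode('UTF-8', $bytes);   # one character, as utf8::all intends

# Assumption: a trip through a C string keeps the octets but drops the flag.
# encode() is used here only to simulate that loss.
my $round_tripped = encode('UTF-8', $decoded);

# A UTF-8 output layer encodes whatever it is given once more.
my $good = encode('UTF-8', $decoded);        # what a decoded arg prints as
my $bad  = encode('UTF-8', $round_tripped);  # what an un-flagged arg prints as

printf "decoded:       %d char(s),  flag=%d\n",
    length($decoded), utf8::is_utf8($decoded) ? 1 : 0;
printf "round-tripped: %d octet(s), flag=%d\n",
    length($round_tripped), utf8::is_utf8($round_tripped) ? 1 : 0;
printf "printed good:  %d octets\n", length($good);
printf "printed bad:   %d octets\n", length($bad);
```

If that assumption holds, it would also explain the ordering effect: with utf8::all loaded first, @ARGV is decoded and then rewritten as bytes by the -init handling, while with utf8::all loaded last the decoding happens after Gtk2 has finished with @ARGV.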