sad723 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm using a Perl script to extract news from a site (with HTML::TokeParser).

The problem is that the site is in Arabic, so I get a result like this:

found : ?????? ?????? ???????? ???? ???? ????? .. ?? ???? ???????
found : ???????? ??? .. ????? ?????? ??????
found : ???? ???? ???? ????? ????????? : ???????? ??? ????? ???? ?? ?? +??

How can I display the Arabic characters?

Replies are listed 'Best First'.
Re: character problem
by thomas895 (Deacon) on Dec 28, 2011 at 09:54 UTC

    I believe that your system does not have the correct character set installed. You will need to find out which character set the Arabic text uses; one candidate is ISO/IEC 8859-6.
    Install it on your system, and then see what happens.

    ~Thomas~

      Arabic is installed on my system; I can read and write with it.

      But why can't my script display the text correctly?

        Sorry, it is not a Perl problem.

        Write and display of Arabic in the command prompt
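
A hedged aside on where the question marks may come from: when text is encoded for an output charset that has no Arabic letters, Perl's Encode module by default substitutes `?` for each character it cannot map. The sketch below uses only the core Encode module. The sample word and the choice of ISO-8859-1 as the unsuitable target encoding are assumptions for illustration; they are not taken from the original script.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: a five-letter Arabic word written with code point escapes.
my $text = "\x{645}\x{631}\x{62D}\x{628}\x{627}";

# ISO-8859-1 has no Arabic letters, so each character becomes '?'.
my $latin1 = encode('ISO-8859-1', $text);

# UTF-8 can represent the text; decoding the bytes gives back the original.
my $utf8 = encode('UTF-8', $text);
my $back = decode('UTF-8', $utf8);

print "lossy encoding : $latin1\n";
print "UTF-8 round trip ", ($back eq $text ? "preserved" : "lost"), " the text\n";
```

If the round trip preserves the text but the console still shows question marks, that points at the console or its code page rather than at the parsing step.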

Re: character problem
by NetWallah (Canon) on Dec 28, 2011 at 20:58 UTC
    From the docs:
    Note that the parsing result will likely not be valid if raw undecoded UTF-8 is used as a source.
    When parsing UTF-8 encoded files turn on UTF-8 decoding:

    open(my $fh, "<:utf8", "index.html") || die "Can't open 'index.html': $!";
    my $p = HTML::TokeParser->new( $fh );

    Have you done this?
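
A minimal sketch of the same idea for a page held in memory rather than in a file, with the output side handled as well. The HTML snippet, the `h2` tag, and the assumption that both the page and the terminal use UTF-8 are illustrative only; a real page may declare a different charset, and a Windows console may need a different output encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::TokeParser;

# Hypothetical stand-in for the downloaded page: raw UTF-8 bytes.
my $bytes = "<html><body><h2> \xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7 </h2></body></html>";

# Decode the bytes into Perl characters before handing them to the parser.
my $html = decode('UTF-8', $bytes);

# A reference to a scalar is treated as the literal document to parse.
my $p = HTML::TokeParser->new(\$html) or die "Can't create parser";

# Encode characters on the way out (assumes a UTF-8 capable terminal).
binmode(STDOUT, ':encoding(UTF-8)');

my @titles;
while (my $tag = $p->get_tag('h2')) {
    my $title = $p->get_trimmed_text('/h2');
    push @titles, $title;
    print "found : $title\n";
}
```

Without the `binmode` line, printing decoded Arabic text usually triggers a "Wide character in print" warning, which is a hint that the output layer is missing.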

                "Battle not with trolls, lest ye become a troll; and if you gaze into the Internet, the Internet gazes also into you."
            -Friedrich Nietzsche: A Dynamic Translation