in reply to how to change unicode string to byte array ?
If the results show that your database contains only ascii characters, it probably means that the numeric character entities (like "ஸ") are being stored in the table (you should print some of the entries to confirm this, and you should probably look into fixing things so that you store actual utf8 characters instead).#!/usr/bin/perl use strict; use DBI; use Encode; # set up connection parameters $dbname, $host, $user, $password, then: my $dbh = DBI->connect( "DBI::mysql:database=$dbname;host=$host", $use +r, $password ); my $sth = $dbh->prepare( "select item_name from table1" ); $sth->execute; my ( $is_wide_utf8, $is_not_utf8, $is_ascii ); while ( my ( $item ) = $sth->fetchrow_array ) { my $nbytes = length( $item ); my $nchars = eval "length( decode( 'utf8', $item, Encode::FB_CROAK + ))"; if ( $@ ) { # decode croaked -- text string is not utf8 $is_not_utf8++; # figure out something else to try (maybe shiftjis?) } elsif ( $nchars == $nbytes ) { $is_ascii++; } else { $is_wide_utf8++; } } print "$is_ascii entries found to be ascii-only\n" if ( $is_ascii ); print "$is_wide_utf8 entries found to be utf8\n" if ( $is_wide_utf8); print "$is_not_uft8 entries found to be non-utf8\n" if ( $is_not_utf8 + );
Next, in your cgi script, you need to figure out what sort of data is coming from your clients -- numeric entities? actual utf8 characters?
As for doing table searches for unicode strings based on input from a cgi client, the best setup is:
my $srchString = $cgi-param( ... ); $srchString =~ s/\&#(\d+);/chr($1)/ge; # convert numeric entities to +characters $srchString .= '%' unless $srchString =~ /%$/; # add a wildcard for t +he search my $sth = $dbh->prepare( "select item_name,blah from table1 where item +_name like ?" $sth->execute( $srchString ); my $rows = $sth->fetchall_arrayref; for my $row ( @$rows ) { my ( $item, $blah ) = @$row; $item = decode( 'utf8', $item ); # make sure perl knows this is u +tf8 data # ... do other stuff... }
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: how to change unicode string to byte array ?
by nythu (Initiate) on Sep 29, 2008 at 19:05 UTC | |
by graff (Chancellor) on Sep 29, 2008 at 23:30 UTC |