xabbu has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I just got hired to do a classbook-ish for the university I'm attending. And the handling of all the photos and students is done with a couple of perl-scripts so far so good. My problem is that I only know enough Perl to fix small problems and bugs.

And now I have hit a larger problem. One of the scripts builds a catalouge-structure looking something like this;

"basedir/chapter/program/year"

The thing is all of this is in swedish. And for some reason the Perlscript doesn't seem to handle Å, Ä and Ö as it should always. What I think is strange is that the script handles åäö great when writing to a file, as can be seen from the output taken from the controlfile the script creates. But in the catalouge-structure åäö is all but normal, e.g Civilrätt, Data- o systemvetenskap, magister ämnesdjup

I get all the data from a MS access database and I'm running a fresh install of Windows XP and compiling the code with Active Perl 5.10.1 Build 1006.

Does anyone have a clue as to what the problem may be?

Here is a copy of the script

#!/usr/bin/perl use strict; use DBI; use Lingua::EN::NameCase qw( nc ); use File::Copy; my $targetDir = "./fotokatalogen_sektion_4_v6"; my $imageDir = "./foton"; my $enrolementperiod; my $addresser = "addresser_2009" my @pictureTableList = ("nyantagna_2009","nyantagna_2008","nyantagna_2 +007","nyantagna_2006","nyantagna_2005","nyantagna_2004","nyantagna_20 +03"); mkdir($targetDir); #the ODBC name of the database my $ODBCDataSource = "Namndatabas"; # connects to the database. if this part fails, make sure the database + is set up as a System-wide ODBC datasource. # Google knows how to do it. my $dbh = DBI->connect("dbi:ODBC:$ODBCDataSource") || die "could not c +onnect to datbabase"; my $n = 4; my %section; #Variabeldeklarationer my $p; my $sek; my @program; my $query; my $querySektion; my $query_handle; my $query_handle_sektion; while($n != 13){ #Get programnames $query = "SELECT program FROM program WHERE sektion = $n"; $query_handle = $dbh->prepare($query); $query_handle->execute(); #get sektionsnames $querySektion = "SELECT Sektion FROM sektion WHERE ID = '$n'"; $query_handle_sektion = $dbh->prepare($querySektion); $query_handle_sektion->execute(); $sek = $query_handle_sektion->fetch(); $query_handle->bind_columns(undef, \$p); while($query_handle->fetch()) { push(@program, $p); } @{$section{ $n } } = @program; print "@{$sek}: @{$section{$n}}\n"; undef(@program); $n++; } my $k; my $sth; my $counter= 0; foreach $k ( keys(%section )) { print "$k:\n"; my @programs = @{$section{$k}}; mkdir($targetDir."/".$k); foreach my $p (@programs) { my $sth2 = $dbh->prepare("select programnamn from program wher +e program = '$p'") || print "select programnamn from program where program li +ke '$p%'"; $sth2-> execute || die "could not fetch data"; my $pr; if( $pr = $sth2->fetchrow_array ) { mkdir($targetDir."/".$k."/".$pr); } else { die "$p,$pr\n"; } print "$p,$pr\n"; $sth = $dbh->prepare("select firstname,surname,username,actual +year from $addresser where program = '$p' order by surname,firstname" +); $sth-> execute || die "could not fetch data"; while(my ($firstname, $surname, $username,$year) = $sth->fetch +row_array ) { $username =~ s/\s+$//; $firstname =~ s/\s+$//; $firstname = nc($firstname); $surname =~ s/\s+$//; $year =~ s/\s+$//; while(length($year) > 4){ $enrolementperiod = chop($year); } if($enrolementperiod eq "1") { $year--; } #This need to be year + 1 $year -= 2010; print $year; $year = abs($year); if($year == 0) { next; } mkdir($targetDir."/".$k."/".$pr."/".$year); $surname = nc($surname); my $picture; foreach my $t (@pictureTableList) { my $sth2 = $dbh->prepare("select picturenumber from $t + where username = '$username'")|| print "select picturenumber from $t + where username like '$username%'"; $sth2-> execute || die "could not fetch data"; $picture = $sth2->fetchrow_array; if($picture ne undef) { last; } } $counter++; my $success = 1; # print "$pr,$firstname,$surname,$username\n"; if($picture ne undef || $picture == 0) { $picture =~ s/\s+$//; $success = copy($imageDir."/".$picture.".jpg",$targetD +ir."/".$k."/".$pr."/".$year."/".$surname." ".$firstname.".jpg") || copy($imageDir."/".$picture.".JPG",$targetDir."/".$k." +/".$pr."/".$year."/".$surname." ".$firstname.".jpg") || copy($imageDir."/".$picture.".jpeg",$targetDir."/".$k. +"/".$pr."/".$year."/".$surname." ".$firstname.".jpg") || copy($imageDir."/".$picture.".JPEG",$targetDir."/".$k. +"/".$pr."/".$year."/".$surname." ".$firstname.".jpg"); } if(($success == 0 && $! !~ /exists/) ) { open MISSING, "+>>$targetDir/$k/$pr/$year/missing.txt" + || die "could not open $targetDir/$k/$pr/$year/missing.txt"; print MISSING "$surname $firstname\n"; close MISSING; $success = 0; } } } } controlfile("1",$targetDir."/control.txt"); controlfile("2",$targetDir."/control.txt"); controlfile("3",$targetDir."/control.txt"); controlfile("4",$targetDir."/control.txt"); controlfile("5",$targetDir."/control.txt"); controlfile("6",$targetDir."/control.txt"); controlfile("7",$targetDir."/control.txt"); controlfile("8",$targetDir."/control.txt"); controlfile("9",$targetDir."/control.txt"); controlfile("10",$targetDir."/control.txt"); controlfile("11",$targetDir."/control.txt"); controlfile("12",$targetDir."/control.txt"); sub controlfile { my ($section,$outfile,@junk) = @_; my @results; foreach my $p (@{$section{$section}}) { my $sth2 = $dbh->prepare("select programnamn from program where pr +ogram like '$p%'")|| print "select programnamn from program where pro +gram like '$p%'"; $sth2-> execute || die "could not fetch data"; if( my $pr = $sth2->fetchrow_array ) { $pr =~ s/\s+$//; push(@results,$pr); } } my %seen = (); my @out = (); foreach my $item (@results) { unless ($seen{$item}) { # if we get here, we have not seen it before $seen{$item} = 1; push(@out, $item); } } #my $prev = 'nonesuch'; #my @out = grep($_ ne $prev && ($prev = $_), @results); open FD,">>$outfile"; print FD "$section\t"; foreach my $p (sort(@out)) { print FD "$p\t"; } print FD "\n"; close fd; } print "$counter\n";

Some of the controlfile output.

1 2 3 4 Beteendevetenskap Civilrätt Data- o systemvetenskap, magist +er ämnesdjup(...)

Thank you, Freddie

Replies are listed 'Best First'.
Re: Locale and character encoding-trouble
by kennethk (Abbot) on Oct 16, 2009 at 16:01 UTC
    I think what's going on here is a failure in file creation to recognize you are using wide characters. I could be wrong here, since the characters you list off should be supported by a normal open (code points 0xd6, 0xe5 and 0xc5). Unfortunately, it seems based on some research I've done (with some guidance from tye in the CB) that this is not really trivial to deal with. You need to specifically tell Windows that you are trying to create files w/ wide characters in the names. To do this, you need to use the CreateFileW method from Win32API::File, followed by OsFHandleOpen to retrieve the appropriate file handle. At that point, I believe you should be good to go. There are a good number of PerlMonks nodes on this, so check out site:www.perlmonks.org createfilew. Of the resultant links, I found Re^3: Saving file name with Chinese characters useful.

    As a side note, your open statements aren't doing what you think: when you say

    open MISSING, "+>>$targetDir/$k/$pr/$year/missing.txt" || die "could not open $targetDir/$k/$pr/$year/missing.txt";

    the || binds tighter than the list operator (comma), so you will never test if the open succeeded. It's equivalent to

    open MISSING, ("+>>$targetDir/$k/$pr/$year/missing.txt" || die "could not open $targetDir/$k/$pr/$year/missing.txt");

    which always equivalent to

    open MISSING, "+>>$targetDir/$k/$pr/$year/missing.txt";

    Since "+>>$targetDir/$k/$pr/$year/missing.txt" evaluates to true. You either need to add parens or use the low-precedence or. See Burned by precedence rules and Operator Precedence and Associativity.