Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I've been banging my head against this for two days now, so I'm seeking the wisdom of others.

I have read everything I can find and still haven't been able to solve the issue. It seems others haven't either, or at least they never posted the solution.

I've tried various code pages, Encode, ... I've lost track of everything I've tried.

FBSD7.1R
Perl 5.8
TagLib 1.5

The errors:

Malformed UTF-8 character (unexpected end of string) in subroutine entry at /scripts/audio/audio line 90.
Wide character in print at /scripts/audio/audio line 63.

They come from strings containing extended characters (e.g. é), in this statement in the sub HashMPEG (line 90):

   %{$hSong} = (artist      => $m->tag()->artist()->toCString()

The odd thing is that $filename prints correctly, and it contains the same characters.

Is this an issue with TagLib???
Is there a solution???
Ideas???
Comments???

Thanks

-Enjoy
fh : )_~

The code...
#!/usr/bin/perl
######################################################################
#
# BCE Project 20c
#
# Search for *.mp3, look up each mp3's tag info and display it.
#
# print structured info
#
use warnings;
use strict;
use Getopt::Std;
use Audio::TagLib;

my %Popts;                    # Program Options

use Data::Dumper;             #!REM!PROD! debug
$Data::Dumper::Indent = 1;    #!REM!PROD! debug

######################################################################
sub ShowUsage {
    print "\n  $Popts{progname} v$Popts{progversion}\nUSAGE:\n";
    print "  ", $1, "\n" if ($0 =~ m%^.*/(.*)$%);
    print "  OPTIONS:\n",
          "    [-f] \"Filename\"  Filename ('*.mp3', see 'pattern' in FIND(1)).\n",
          "    [-h]  Show Help\n",
          "    [-i]  Interactive\n",
          "    [-s]  Show Settings\n\n";
}

sub ShowSettings {
    print "\n$Popts{progname} v$Popts{progversion} Current Settings:\n";
    print "  -f: ", $Popts{directory}, '/', $Popts{filename}, "\n";
    print "  -i: ", $Popts{i} ? "Interactive\n" : "Commandline\n";
    print "\n";
}

sub Setup {
    getopts("f:his", \%Popts);
    $Popts{progname}    = "BCE #20c";   # Duh!  BCE = Brain Cell Exercise
    $Popts{progversion} = "0.0.1";      # Duh!
    $Popts{f} = './*' if (!defined($Popts{f}));
    if ($Popts{f} =~ m,(.*\/|^)(.*?)(?:[\.]|$)([^\.\s]*$),) {
        $Popts{directory} = $1;
        $Popts{filename}  = $2 . ($3 ? '.' . $3 : '.mp3');
    }
    $Popts{directory} =~ s/\/$// if (length($Popts{directory}) > 1);
}

######################################################################
sub PrintSong {
    my ($hSong) = @_;
    print "Artist       : ", $hSong->{artist}, "\n";
    print "Album        : ", $hSong->{album}, "\n";
    print "Title        : ", $hSong->{title}, "\n";
    print "Track        : ", $hSong->{track}, "\n";
    print "Genre        : ", $hSong->{genre}, "\n";
    print "Year         : ", $hSong->{year}, "\n";
    printf "Length       : %d:%02d\n", $hSong->{length} / 60, $hSong->{length} % 60;
    print "Bit rate     : ", $hSong->{bitrate}, "\n";
    print "Sample rate  : ", $hSong->{samplerate}, "\n";
    print "Channels     : ", $hSong->{channels}, "\n";
    print "Comment      : ", $hSong->{comment}, "\n";
    print "MPEG Version : ", $hSong->{version}, "\n";
    print "Layer Version: ", $hSong->{layer}, "\n";
#   print "Protected    : ", $hSong->{protection}, "\n";
    print "Channel mode : ", $hSong->{channelmode}, "\n";
    print "Copyrighted  : ", $hSong->{copyrighted} ? 'True' : 'False', "\n";
    print "Original     : ", $hSong->{original} ? 'True' : 'False', "\n";
    print "File name    : ", $hSong->{filename}, "\n";
}

sub HashMPEG {
    my ($filename, $hSong) = @_;
    my $m = Audio::TagLib::MPEG::File->new($filename, "Accurate");
    %{$hSong} = (
        artist      => $m->tag()->artist()->toCString(),
        album       => $m->tag()->album()->toCString(),
        title       => $m->tag()->title()->toCString(),
        filename    => $filename,
        comment     => $m->tag()->comment()->toCString(),
        genre       => $m->tag()->genre()->toCString(),
        year        => $m->tag()->year(),
        track       => $m->tag()->track(),
        length      => $m->audioProperties->length(),
        bitrate     => $m->audioProperties->bitrate(),
        samplerate  => $m->audioProperties->sampleRate(),
        channels    => $m->audioProperties->channels(),
        version     => $m->audioProperties->version(),
        layer       => $m->audioProperties->layer(),
        # Not found???
        # protection  => $m->audioProperties->protectionEnabled(),
        channelmode => $m->audioProperties->channelMode(),
        copyrighted => $m->audioProperties->isCopyrighted(),
        original    => $m->audioProperties->isOriginal(),
    );
#   print Dumper($hSong);
}

sub FindFiles {
    my ($dir, $filename) = @_;
    my @alist = `find '$dir' -type f -iname '$filename'`;
    return @alist;
}

Setup;    # Process command line, verify input, set defaults.

(ShowUsage()),    exit(0) if $Popts{h};
(ShowSettings()), exit(0) if $Popts{s};

# Only check the directory when it is not the current one ('ne', not '! ... eq').
if ($Popts{directory} ne '.') {
    (print $Popts{directory}, " does not exist!\n"), exit(127)
        if (! -d $Popts{directory});
}

#if ($Popts{i}) {    # if interactive
my $Currec   = 0;
my $Listem   = 1;
my @Flist    = FindFiles $Popts{directory}, $Popts{filename};
my $NumItems = scalar @Flist;
if ($NumItems == 0) { print "Not found\n"; exit(255); }
while (1) {
    my $Input;
    if ($Listem) {
        my %hSong;
        my $file = $Flist[$Currec];
        chomp $file;
        HashMPEG $file, \%hSong;
        PrintSong \%hSong;
        $Listem = 0;
    }
    my $curr = $Currec + 1;
    print '(' . $curr . '/' . $NumItems . ') ... Q)uit, N)ext, P)revious: ';
    $Input = <STDIN>;
    last unless defined $Input;    # EOF (e.g. Ctrl-D)
    chomp $Input;
    # q to quit
    last if $Input =~ m/^q/i;
    # p for Previous
    if ($Input =~ m/^p/i) {
        if ($Currec > 0) { $Currec--; $Listem = 1; }
        next;
    }
    # n for Next
    if ($Input =~ m/^n/i) {
        if ($Currec < $NumItems - 1) { $Currec++; $Listem = 1; }
        next;
    }
    # Enter for Next
    if ($Input =~ m/^$/) {
        if ($Currec < $NumItems - 1) { $Currec++; $Listem = 1; }
        next;
    }
}
#}

Re: Malformed UTF-8 character, TagLib
by bichonfrise74 (Vicar) on Nov 09, 2009 at 18:16 UTC
    Hmm, try to add this. See if it helps you.
    use warnings;
    use strict;
    use bytes;    # <-- this one
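For context on what `use bytes` does: the pragma only changes how string operations such as length() and substr() see a string (they look at Perl's internal byte buffer instead of at characters). It does not decode or re-encode anything. A minimal sketch of that effect, using only core Perl (my own illustration, not code from the thread):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One character, U+00D6 ("Ö"), forced into Perl's internal UTF-8 storage.
my $s = "\x{D6}";
utf8::upgrade($s);

# Normal (character) semantics: one character.
my $char_length = length($s);

# Under 'use bytes', length() looks at the internal buffer instead (0xC3 0x96).
my $byte_length = do { use bytes; length($s) };

print "characters: $char_length, internal bytes: $byte_length\n";
```

So `use bytes` can expose Perl's internal representation, but it does not tell Perl which encoding the tag text is in.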
Re: Malformed UTF-8 character, TagLib
by Anonymous Monk on Nov 09, 2009 at 19:50 UTC

    Another hint, maybe... Could it be a system problem?

    I just ran this script under KDE/Konsole for the first time; I typically run it from a tcsh shell.

    In Konsole, the artist names display correctly but the filenames do not. In the tcsh shell it's the other way around: the artist is wrong and the filename is right.

    The error is still there, though.

    Also, copying the files from the Windows XP ("XPlode") box to the FBSD boxes garbles the filenames as well.

    These files reside on the XP box and are accessed via Samba. I would love to move them to the FBSD server, but the paths/filenames get all messed up.

    bichonfrise74: The 'use bytes;' suggestion did nada. Thanks for trying!

    -Enjoy
    fh : )_~

Re: Malformed UTF-8 character, TagLib
by ikegami (Patriarch) on Nov 09, 2009 at 20:15 UTC

    Malformed UTF-8 character (unexpected end of string) in subroutine entry

    At a fundamental level, files can only contain bytes. That means the characters need to be encoded into bytes to exist in a tag.

    It looks like the module expects the text of the tag to have been encoded using UTF-8, but a different encoding was actually used.
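A small illustration of that point, sketched with the core Encode module (my own example, not code from the thread). It assumes the tag holds "Blue Öyster Cult" encoded as Latin-1, as the original poster says later, where Ö is the single byte 0xD6. In UTF-8 the same character takes two bytes, 0xC3 0x96, so reading the Latin-1 bytes as strict UTF-8 fails:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Blue Öyster Cult" as stored in a Latin-1 tag: Ö is the byte 0xD6.
my $latin1_bytes = "Blue \xD6yster Cult";

# The same text encoded as UTF-8: Ö becomes 0xC3 0x96.
my $utf8_bytes = encode('UTF-8', "Blue \x{D6}yster Cult");

# Decoding the Latin-1 bytes as Latin-1 gives the right characters back.
my $ok = decode('iso-8859-1', $latin1_bytes);

# Decoding them as strict UTF-8 fails: 0xD6 starts a two-byte sequence,
# but the next byte ('y') is not a continuation byte.
my $copy = $latin1_bytes;    # decode() may modify its input when CHECK is set
my $bad  = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };

print "latin-1 decode: ", ($ok eq "Blue \x{D6}yster Cult" ? "ok\n" : "wrong\n");
print "utf-8 decode:   ", (defined $bad ? "succeeded\n" : "failed\n");
```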

Re: Malformed UTF-8 character, TagLib
by Anonymous Monk on Nov 11, 2009 at 16:51 UTC

    Would this debugger session help resolve the issue and bring it to a working solution?

    The print on line 224 fails, but the print on line 225 does not, even though both strings contain the same characters!

    The characters display correctly in this post; they are from the Latin-1 (ISO 8859-1) code page.

    Loading DB routines from perl5db.pl version 1.28
    Editor support available.

    Enter h or `h h' for help, or `man perldebug' for more help.

    main::(audio:8):        my $MyDEBUG = 1;
      DB<1> c 222
    main::HashMPEG(audio:222):      my $t = $m->tag()->artist()->toCString();
      DB<2> n
    main::HashMPEG(audio:224):      print $t . "\n";
      DB<2> p $t
    Wide character in print at (eval 11)[/usr/local/lib/perl5/5.8.8/perl5db.pl:628] line 2.
     at (eval 11)[/usr/local/lib/perl5/5.8.8/perl5db.pl:628] line 2
            eval '($@, $!, $^E, $,, $/, $\\, $^W) = @saved;package main; $^D = $^D | $DB::db_stop; print {$DB::OUT} $t; ;' called at /usr/local/lib/perl5/5.8.8/perl5db.pl line 628
            DB::eval called at /usr/local/lib/perl5/5.8.8/perl5db.pl line 3410
            DB::DB called at audio line 224
            main::HashMPEG('/TestMusic/Blue \x{99}yster Cult/Don\'t Fear the Re...', 'HASH(0x845ea50)') called at audio line 292
    Blue Öyster Cult
      DB<3> n
    Wide character in print at audio line 224.
     at audio line 224
            main::HashMPEG('/TestMusic/Blue \x{99}yster Cult/Don\'t Fear the Re...', 'HASH(0x845ea50)') called at audio line 292
    Blue Öyster Cult
    main::HashMPEG(audio:225):      print $filename . "\n";
      DB<3> q
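One way to see what toCString() actually hands back is to inspect the string at line 224. The helper below, describe_string, is hypothetical (not part of Audio::TagLib or the original script). It uses the core utf8::is_utf8 function to report Perl's internal UTF8 flag and lists each code point, so calling describe_string($t) in the debugger would show whether the Ö arrives as U+00D6:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether a string carries Perl's internal
# UTF8 flag and list the code points it actually contains.
sub describe_string {
    my ($s) = @_;
    my $flag = utf8::is_utf8($s) ? 'on' : 'off';
    my $codepoints = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
    return "UTF8 flag $flag: $codepoints";
}

# Example: a byte string holding Latin-1 "Öy".
print describe_string("\xD6y"), "\n";
```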
Re: Malformed UTF-8 character, TagLib
by ikegami (Patriarch) on Nov 11, 2009 at 18:35 UTC

    "Wide character in print" means you printed characters to a file handle (STDOUT) without converting them to bytes (or telling Perl how) first.

    One simple way of telling Perl how to decode STDIN and encode STDOUT and STDERR is:

    use open ':std', ':locale';
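`:locale` takes the encoding from the current locale settings. If you already know the terminal expects UTF-8, an alternative is to name the encoding explicitly with binmode. A minimal sketch, assuming a UTF-8 terminal; it writes to an in-memory handle instead of STDOUT so the resulting bytes can be checked:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Text as Perl characters (U+00D6 is "Ö").
my $artist = "Blue \x{D6}yster Cult";

# In-memory handle with an explicit UTF-8 encoding layer, so Perl knows
# how to turn the characters into bytes (no "Wide character" warning).
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory handle: $!";
binmode($fh, ':encoding(UTF-8)');
print {$fh} $artist, "\n";
close($fh);

# For the terminal the equivalent is:  binmode(STDOUT, ':encoding(UTF-8)');
printf "%d characters became %d bytes\n", length($artist) + 1, length($buffer);
```

This only controls how correct characters are written out; it cannot repair text that was decoded with the wrong encoding in the first place.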

      Thanks, I tried that. Using ':locale' causes the following error:

      ascii "\x99" does not map to Unicode at /scripts/audio/audio line 254.
      TagLib: Could not open file /TestMusic/Blue \x99yster Cult/Don't Fear the Reaper- The Best of Blue \x99yster Cult/06 - (Don't Fear) The Reaper.mp3
      /TestMusic/Blue \x99yster Cult/Don't Fear the Reaper- The Best of Blue \x99yster Cult/06 - (Don't Fear) The Reaper.mp3
      Can't call method "length" on an undefined value at /scripts/audio/audio line 227.

      Using 'utf8' instead gets rid of the error, but the artist and album names still don't display properly. $filename is still correct.

      -Enjoy
      fh : )_~
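A possible reading of the failing byte (an assumption, not something the thread confirms): in ISO 8859-1, Ö is 0xD6 and 0x99 is a C1 control code, but in the old DOS code page CP437 the byte 0x99 is Ö. If the filenames were written in CP437 (or a similar OEM code page), decoding them with that code page would give the expected name. A sketch with the core Encode module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw filename bytes as shown in the error: 0x99 where "Ö" belongs.
my $raw_name = "Blue \x99yster Cult";

# ASSUMPTION: the bytes are in CP437, where 0x99 is "Ö" (U+00D6).
my $name = decode('cp437', $raw_name);

# ASSUMPTION: the display side is a UTF-8 terminal.
my $utf8_name = encode('UTF-8', $name);

print(($name eq "Blue \x{D6}yster Cult") ? "0x99 decodes to U+00D6\n" : "no match\n");
```

If that assumption holds, it would also fit the observation that the filenames and the tag text look right in different terminals, since they would be in two different encodings.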

        however it still does not properly display the Artists or Album names

        We've already covered that. The library expects them to be UTF-8 encoded and they're not.
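If the tag text can be obtained as raw bytes (how to get them out of Audio::TagLib is not covered in this thread), one common workaround is to try strict UTF-8 first and fall back to Latin-1. decode_tag_text is a hypothetical helper, sketched with the core Encode module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: treat $bytes as UTF-8 if it is valid UTF-8,
# otherwise fall back to ISO-8859-1 (Latin-1), which accepts any byte.
sub decode_tag_text {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return defined $text ? $text : decode('iso-8859-1', $bytes);
}

print length(decode_tag_text("Blue \xD6yster Cult")), "\n";        # Latin-1 input
print length(decode_tag_text("Blue \xC3\x96yster Cult")), "\n";    # UTF-8 input
```

The fallback cannot fail because every byte is a valid Latin-1 character; the trade-off is that Latin-1 text which happens to form valid UTF-8 would be misread, which is rare for ordinary words. The decoded result still needs an output encoding layer (for example via `use open` or binmode) when printed.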