in reply to MP3::Tag encoding problem

According to the ID3v2.3.0 spec (http://www.id3.org/id3v2.3.0), it seems most likely that your mp3 tag strings are iso-8859-1. How to get them to appear properly depends on the nature of the text window you are displaying them in.

Try this little experiment on the command line in the same window where you want to see the tag text displayed correctly:

perl -le 'print "\xa1"'
If you see an inverted exclamation mark, your terminal window works with iso-8859-1. If you get a question mark instead, try this next:
perl -CS -le 'print "\xa1"'
If you now see the inverted exclamation mark, your terminal wants utf8.

For an 8859-based display, perl should probably do nothing to the tag text before printing it. But I doubt this is the situation, because I don't think you would have been seeing "?" in your tag text if this were the case.

For a utf8-based display, it should suffice to do binmode STDOUT, ":utf8"; which will automatically (and quietly) "upgrade" the 8859-1 text to utf8 when printing to STDOUT.
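
To make the "upgrade" visible, here is a minimal sketch that does explicitly what the :utf8 layer does implicitly. The sample string is a made-up stand-in for real tag text, and it assumes the tag bytes really are iso-8859-1:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical tag text as raw ISO-8859-1 bytes: "Mötley Crüe".
my $latin1_bytes = "M\xf6tley Cr\xfce";

# Decode into a Perl character string (assumes the source is Latin-1).
my $chars = decode('ISO-8859-1', $latin1_bytes);

# Encode to the bytes a utf8 terminal expects to receive.
my $utf8_bytes = encode('UTF-8', $chars);

# Each accented letter grows from one byte to two.
printf "%d Latin-1 bytes, %d characters, %d UTF-8 bytes\n",
    length $latin1_bytes, length $chars, length $utf8_bytes;
```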

If you are storing the tag text to a file, and are seeing question marks when looking at the file contents, it's the same basic issue. Use binmode on that file handle instead of STDOUT.
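
A small sketch of the file-handle case follows. It writes to an in-memory handle instead of a real file so it has no side effects; the title string is a hypothetical example, not data from the original post:

```perl
use strict;
use warnings;

# Hypothetical tag text held as ISO-8859-1 bytes: "Queensrÿche".
my $title = "Queensr\xffche";

# An in-memory "file" stands in for a real path here.
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";

# Same call as for STDOUT, applied to this handle instead.
binmode $out, ':utf8';
print {$out} $title;
close $out or die "close failed: $!";

# $buffer now holds UTF-8: the single byte \xff became \xc3\xbf.
```

With a real file you would open a path instead of \$buffer; the binmode call stays the same.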

Re^2: MP3::Tag encoding problem
by mfearby (Initiate) on Sep 22, 2008 at 11:16 UTC

    Your knowledge of unicode is impressive, and I wish to subscribe to your newsletter :-)

    I saw the upside-down exclamation mark for the second command, confirming that my terminal (Konsole) wants utf8. It also seems that specifying the binmode and commenting out my misguided attempts at solving the problem means I'm now seeing umlauts above my o's.

    Thank you very much!

      I've had a similar problem and for me the solution was to identify binary/character data and handle it appropriately.
      my $mp3 = MP3::Tag->new($t);
      The module returns a character string (Dec '08); so that's OK. Make sure any applications writing the tags encode properly. For linux, Easytag seems to work very well.
      my $tag_dir = "/music/$a_artist/$a_name";
      This character string is what you want as a human, but it is no good for mkpath() et al.
      my $binary_tag_dir = encode_utf8($tag_dir);
      ah, now this is mkpath()-able

      Now, File::Find (properly) returns a binary string, so it needs decoding. Assuming your filesystem uses utf8 encoding:

      my $char_file_find_dir = decode("utf8",$File::Find::dir);
      At this point you can print and compare $char_file_find_dir and $tag_dir.

      You can also compare and do filename tests etc with $binary_tag_dir and $File::Find::dir.
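
Putting those pieces together, a rough sketch of the two kinds of comparison might look like this. The artist and album values are made up, and $found_dir is only a stand-in for what File::Find would hand back on a utf8 filesystem:

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# Hypothetical names as character strings, as they would come from the tags.
my ($a_artist, $a_name) = ("M\x{f6}tley Cr\x{fc}e", "Dr. Feelgood");
my $tag_dir = "/music/$a_artist/$a_name";

# Byte form, suitable for mkpath() and other filesystem calls.
my $binary_tag_dir = encode_utf8($tag_dir);

# Stand-in for $File::Find::dir (assumes a utf8 filesystem).
my $found_dir = $binary_tag_dir;
my $char_file_find_dir = decode("utf8", $found_dir);

# Compare like with like: characters with characters, bytes with bytes.
print "character strings match\n" if $char_file_find_dir eq $tag_dir;
print "byte strings match\n"      if $found_dir eq $binary_tag_dir;
```

Comparing $tag_dir directly against $found_dir would fail here, because one holds characters and the other holds bytes.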

      When printing (including debugging) I had:

      binmode STDOUT, ":utf8";
      This tells perl that my terminal is utf8 aware and to print accented characters appropriately.

      You should also decode() the binary strings before printing them if you want to read them (or not, if you want to 'od' them).
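
As an illustration of the difference between reading and 'od'-ing, here is a sketch with a made-up byte string. It assumes the bytes are utf8 and prints both a readable form and a hex dump:

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode STDOUT, ":utf8";

# Hypothetical binary string (utf8 bytes for "Crüe"), e.g. from a directory listing.
my $bytes = "Cr\xc3\xbce";

# For human reading: decode to characters, then let the :utf8 layer encode on output.
my $chars = decode("UTF-8", $bytes);
print "readable: $chars\n";

# For inspection: dump the raw bytes as hex, much as od would show them.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
print "raw bytes: $hex\n";
```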

      HTH

      Corrections welcome ;)

      Not sure if you have an answer to your question. I've run into the same situation with the 'black/inverse' question mark. I'm no perl guru, as you will see in this post: http://perlguru.com/gforum.cgi?post=34634. There you will see where I use MP3::Tag to write version 1 and version 2 mp3 tags. If you comment out the id3v1 calls (three lines), the tags for Artist, Album and Track appear correctly. My issue was with items like Queensrÿche, Mötley Crüe, etc.
        Disregard my earlier post. I've come to the conclusion that MP3::Tag just isn't writing UTF8 encoded data the way other programs are expecting it. I've used the following to write:
        $Artist = encode_utf8($Artist);
        $Album = encode_utf8($Album);
        and the following to read:
        print "* Song: ", decode_utf8($info[0]), "\n";
        print "* Artist: ", decode_utf8($info[2]), "\n";
        print "* Album: ", decode_utf8($info[3]), "\n";
        The data prints correctly to STDOUT, but other programs that read the tags show the extended characters incorrectly. If I open the same tags in easytag after the write process above, the data is incorrect there too. Once I fix it inside easytag, other programs like Mythmusic show the tags correctly. Also, after the easytag fix, if I remove the decode_utf8 calls from my script and just print the data to STDOUT, the tags print correctly. The problem has to be with MP3::Tag. Here is my code. You will see that I am also writing FLAC tags when the file has a flac extension. The same data that gives me grief for MP3 works just fine when I write it to a FLAC file. I've commented out my UTF8 attempts in the MP3 code, as they made no difference to other applications.
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Cwd;
        use Audio::FLAC::Header;
        use MP3::Info;
        use MP3::Tag;
        use sort '_qsort';
        use File::Glob qw(:globally :nocase);
        #use Encode qw(encode decode);
        #use utf8;
        use Encode;

        our $DEBUG = 0;

        sub Get_Artist_Album {
            return ( split m!/!, cwd() )[-2,-1];
        }

        sub Process_Files {
            my @MP3_Files = <*.mp3> ;
            my @FLAC_Files = <*.flac> ;
            my $Num_MP3_Files = @MP3_Files;
            my $Num_FLAC_Files = @FLAC_Files;
            my ($Artist, $Album) = Get_Artist_Album;

            if ($Num_MP3_Files > 0) {
                # $Artist = encode_utf8($Artist);
                # $Album = encode_utf8($Album);
                foreach my $MP3_File (@MP3_Files) {
                    chomp $MP3_File;
                    if ($MP3_File =~ /^(\d+[\.\_\-\ ]+)(.*)(\.\w+)$/) {
                        my ($Track, $Title, $Ext) = ($1, $2, lc($3));
                        $Track =~ s/^(\d+)(.*)/$1/;
                        $Track = sprintf("%02d", $Track);
                        $Title = Format_Text ($Title);
                        # $Title = encode_utf8($Title);
                        my $New_File = "$Track. $Title$Ext";
                        if ($DEBUG) { print "\t$New_File\n"; }
                        rename ($MP3_File, $New_File) unless $MP3_File eq $New_File;
                        remove_mp3tag($New_File,'ALL');
                        my $mp3 = MP3::Tag->new($New_File);
                        # my $id3v1 = $mp3->new_tag("ID3v1");
                        # $id3v1->all($Title,$Artist,$Album,"","",$Track,"Rock");
                        # $id3v1->write_tag;
                        my $id3v2 = $mp3->new_tag("ID3v2");
                        $id3v2->add_frame('TRCK',$Track);
                        $id3v2->add_frame('TIT2',$Title);
                        $id3v2->add_frame('TPE1',$Artist);
                        $id3v2->add_frame('TALB',$Album);
                        $id3v2->add_frame('TCON',"17");
                        $id3v2->write_tag;
                    }
                }
            }

            if ($Num_FLAC_Files > 0) {
                foreach my $FLAC_File (@FLAC_Files) {
                    chomp $FLAC_File;
                    if ($FLAC_File =~ /^(\d+[\.\_\-\ ]+)(.*)(\.\w+)$/) {
                        my ($Track, $Title, $Ext) = ($1, $2, lc($3));
                        $Track =~ s/^(\d+)(.*)/$1/;
                        $Track = sprintf("%02d", $Track);
                        $Title = Format_Text ($Title);
                        my $New_File = "$Track. $Title$Ext";
                        if ($DEBUG) { print "\t$New_File\n"; }
                        rename ($FLAC_File, $New_File) unless $FLAC_File eq $New_File;
                        my $flac = Audio::FLAC::Header->new($New_File);
                        my $tags = $flac->tags();
                        %$tags = ();
                        $tags->{TRACKNUMBER} = $Track;
                        $tags->{TITLE} = $Title;
                        $tags->{ARTIST} = $Artist;
                        $tags->{ALBUM} = $Album;
                        $tags->{GENRE} = "Rock";
                        $flac->write();
                    }
                }
            }
        }

        sub Format_Text {
            my $Text = $_[0] or exit 1;
            $Text = lc($Text);                  #Make everything lowercase
            $Text =~ tr/_/ /;                   #Remove underscores
            $Text =~ s/\[/\(/g;
            $Text =~ s/\]/\)/g;
            $Text =~ tr/ / /s;                  #Remove unnecessary spaces
            $Text =~ s/\.$//;                   #Some titles have an extra period - bye
            $Text =~ s/(\d)\./$1/g;             #Do not need period after numbers here
            my @Words = split(/ /,$Text);
            foreach my $Word (@Words) {
                $Word = ucfirst($Word);
            }
            $Text = "@Words";
            $Text =~ s/([(-])([a-z])/$1\u$2/g;
            $Text =~ s/(\W'\S)/uc($1)/eg;       #Some items following ' should be uc
            $Text =~ s/(\.)([a-z])/$1\u$2/g;    #Letter.Letter.Letter... is uc
            $Text =~ s/Dis[ck]\ /Cd/;
            $Text =~ s/Dis[ck](\d)/Cd$1/;
            $Text =~ s/Cd\ (\d)/Cd$1/;
            $Text =~ s/\((Cd\d+)\)/$1/;
            my $x = $Text =~ tr/(/(/;           #Count open parens
            my $y = $Text =~ tr/)/)/;           #Count closing parens
            if ($x > $y) { $Text = $Text.")"; }
            return ($Text);
        }

        Process_Files;
        my $Artist_Dir = cwd();
        opendir (Artist_DH, $Artist_Dir) || die "can't opendir $Artist_Dir: $!";
        my @Albums = grep { !/^\./ && -d "$_" } sort readdir(Artist_DH);
        foreach my $Album (@Albums) {
            my $NewAlbum = Format_Text ($Album);
            rename ($Album, $NewAlbum) unless $Album eq $NewAlbum;
            if ($DEBUG) { print "$NewAlbum \n"; }
            chdir $NewAlbum or warn "Cannot change to $NewAlbum\n";
            Process_Files;
            chdir "..";
        }
        closedir Artist_DH;
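
One possible explanation for the symptoms described above, offered as a guess rather than a diagnosis of MP3::Tag itself: the artist and album names come from cwd(), which on a utf8 filesystem returns utf8 bytes, not characters. Calling encode_utf8() on such a string encodes it a second time. The sketch below uses a made-up directory name to show the effect with Encode alone:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical directory name as cwd() would return it on a utf8
# filesystem: already utf8 bytes for "Mötley Crüe", not characters.
my $from_cwd = "M\xc3\xb6tley Cr\xc3\xbce";

# Encoding bytes that are already utf8 treats every byte as a Latin-1
# character, so each high byte turns into two bytes.
my $double_encoded = encode_utf8($from_cwd);

# Decoding once only undoes the second encoding and gives back the
# original utf8 byte values.
my $round_trip = decode_utf8($double_encoded);

printf "from cwd: %d bytes, after encode_utf8: %d bytes\n",
    length $from_cwd, length $double_encoded;
print "one decode restores the original bytes\n" if $round_trip eq $from_cwd;
```

If something like this is what happened, it would be consistent with the script's own decode_utf8() output looking right while other programs, which decode only once, show garbage.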
        I just wanted to update the last post I sent a few weeks ago. The following seems to allow MP3::Tag to update tags with UTF-8 strings. Programs like easytag, mythmusic, etc seem to be handling the data correctly or at least as expected. I have other programs that share code I've used below. This particular script is run inside the album or artist directories. My preferred structure in the end is Artist/Album/Track. Title. My platform is Linux (OpenSUSE 11.1 as of this writing).
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Cwd;
        use Audio::FLAC::Header;
        use MP3::Info;
        use MP3::Tag;
        use sort '_qsort';
        use File::Glob qw(:globally :nocase);
        use Encode qw(encode decode);

        our $DEBUG = 0;

        sub Get_Artist_Album {
            return ( split m!/!, cwd() )[-2,-1];
        }

        sub Process_Files {
            my @MP3_Files = <*.mp3> ;
            my @FLAC_Files = <*.flac> ;
            my $Num_MP3_Files = @MP3_Files;
            my $Num_FLAC_Files = @FLAC_Files;

            if ($Num_MP3_Files > 0) {
                my ($Artist, $Album) = Get_Artist_Album;
                $Artist = decode('UTF-8',$Artist);
                $Album = decode('UTF-8',$Album);
                foreach my $MP3_File (@MP3_Files) {
                    chomp $MP3_File;
                    if ($MP3_File =~ /^(\d+[\.\_\-\ ]+)(.*)(\.\w+)$/) {
                        my ($Track, $Title, $Ext) = ($1, $2, lc($3));
                        $Track =~ s/^(\d+)(.*)/$1/;
                        $Track = sprintf("%02d", $Track);
                        $Title = decode('UTF-8',$Title);
                        $Title = Format_Text ($Title);
                        my $New_File = "$Track. $Title$Ext";
                        if ($DEBUG) { print "\t$New_File\n"; }
                        rename ($MP3_File, $New_File) unless $MP3_File eq $New_File;
                        remove_mp3tag($New_File, 'ALL');
                        my $mp3 = MP3::Tag->new($New_File);
                        my $id3v1 = $mp3->new_tag("ID3v1");
                        $id3v1->all($Title,$Artist,$Album,"","",$Track,"Rock");
                        $id3v1->write_tag;
                        my $id3v2 = $mp3->new_tag("ID3v2");
                        $id3v2->add_frame('TRCK',$Track);
                        $id3v2->add_frame('TIT2',$Title);
                        $id3v2->add_frame('TPE1',$Artist);
                        $id3v2->add_frame('TALB',$Album);
                        $id3v2->add_frame('TCON',"17");
                        $id3v2->write_tag;
                    }
                }
            }

            if ($Num_FLAC_Files > 0) {
                my ($Artist, $Album) = Get_Artist_Album;
                foreach my $FLAC_File (@FLAC_Files) {
                    chomp $FLAC_File;
                    if ($FLAC_File =~ /^(\d+[\.\_\-\ ]+)(.*)(\.\w+)$/) {
                        my ($Track, $Title, $Ext) = ($1, $2, lc($3));
                        $Track =~ s/^(\d+)(.*)/$1/;
                        $Track = sprintf("%02d", $Track);
                        $Title = Format_Text ($Title);
                        my $New_File = "$Track. $Title$Ext";
                        if ($DEBUG) { print "\t$New_File\n"; }
                        rename ($FLAC_File, $New_File) unless $FLAC_File eq $New_File;
                        my $flac = Audio::FLAC::Header->new($New_File);
                        my $tags = $flac->tags();
                        %$tags = ();
                        $tags->{TRACKNUMBER} = $Track;
                        $tags->{TITLE} = $Title;
                        $tags->{ARTIST} = $Artist;
                        $tags->{ALBUM} = $Album;
                        $tags->{GENRE} = "Rock";
                        $flac->write();
                    }
                }
            }
        }

        sub Format_Text {
            my $Text = $_[0] or exit 1;
            $Text = lc($Text);                  #Make everything lowercase
            $Text =~ tr/_/ /;                   #Remove underscores
            $Text =~ s/\.\.\./\.\.\.\ /g;
            $Text =~ s/(\d),(\d)/$1$2/g;
            $Text =~ s/,/ /g;
            $Text =~ tr/\`\´\’/\'/s;
            $Text =~ s/ and / \& /g;
            $Text =~ s/\[/\(/g;
            $Text =~ s/\]/\)/g;
            $Text =~ tr/ / /s;                  #Remove unnecessary spaces
            $Text =~ s/\( /\(/g;
            $Text =~ s/ \)/\)/g;
            $Text =~ s/\·/-/g;
            $Text =~ s/\s*-\s*/-/g;
            # $Text =~ s/\.$//;                 #Some titles have an extra period - bye
            $Text =~ s/(\d)\./$1/g;             #Do not need period after numbers here
            my @Words = split(/ /,$Text);
            foreach my $Word (@Words) {
                $Word = ucfirst($Word);
            }
            $Text = "@Words";
            $Text =~ s/([(-])([a-z])/$1\u$2/g;
            $Text =~ s/(\W'\S)/uc($1)/eg;       #Some items following ' should be uc
            $Text =~ s/(\.)([a-z])/$1\u$2/g;    #Letter.Letter.Letter... is uc
            $Text =~ s/Dis[ck]\ /Cd/;
            $Text =~ s/Dis[ck](\d)/Cd$1/;
            $Text =~ s/Cd\ (\d)/Cd$1/;
            $Text =~ s/\((Cd\d+)\)/$1/;
            $Text =~ s/-Cd/ Cd/;
            my $x = $Text =~ tr/(/(/;           #Count open parens
            my $y = $Text =~ tr/)/)/;           #Count closing parens
            if ($x > $y) { $Text = $Text.")"; }
            return ($Text);
        }

        Process_Files;
        my $Artist_Dir = cwd();
        opendir (Artist_DH, $Artist_Dir) || die "can't opendir $Artist_Dir: $!";
        my @Albums = grep { !/^\./ && -d "$_" } sort readdir(Artist_DH);
        foreach my $Album (@Albums) {
            my $NewAlbum = Format_Text ($Album);
            rename ($Album, $NewAlbum) unless $Album eq $NewAlbum;
            if ($DEBUG) { print "$NewAlbum \n"; }
            chdir $NewAlbum or warn "Cannot change to $NewAlbum\n";
            Process_Files;
            chdir "..";
        }
        closedir Artist_DH;
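
The key change in the script above is the decode('UTF-8', ...) step before the strings reach MP3::Tag. Isolated from the tagging code, that step might be sketched as follows. The path is a made-up stand-in for cwd() on a utf8 filesystem, and MP3::Tag itself is deliberately left out:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical working directory as raw utf8 bytes, standing in for cwd().
my $cwd_bytes = "/music/Queensr\xc3\xbfche/Empire";

# Same split as Get_Artist_Album in the script above.
my ($Artist, $Album) = ( split m!/!, $cwd_bytes )[-2,-1];

# Turn the byte strings into character strings before tagging.
$Artist = decode('UTF-8', $Artist);
$Album  = decode('UTF-8', $Album);

# The two utf8 bytes \xc3\xbf are now the single character U+00FF.
printf "artist has %d characters, album has %d\n",
    length $Artist, length $Album;
```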