in reply to Re: MP3::Tag encoding problem
in thread MP3::Tag encoding problem

Your knowledge of unicode is impressive, and I wish to subscribe to your newsletter :-)

I saw the upside-down question mark for the second command, which confirms that my terminal (Konsole) expects UTF-8. Setting the binmode and commenting out my misguided earlier attempts at a fix means I now see umlauts above my o's.
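For anyone finding this later, here is a minimal sketch of the "decode on input, let binmode encode on output" pattern that worked here. It assumes a UTF-8 terminal, and the sample bytes are just an illustration, not data from my files.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes as they might arrive from a file name or a tag: "Mötley Crüe"
my $bytes = "M\xC3\xB6tley Cr\xC3\xBCe";

# Decode once on input so Perl works with characters rather than bytes.
my $chars = decode('UTF-8', $bytes);

# Let the output layer encode once on the way out to the terminal.
binmode STDOUT, ':encoding(UTF-8)';
print "$chars\n";
```

The byte string is two elements longer than the character string, because each umlaut takes two bytes in UTF-8.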

Thank you very much!

Re^3: MP3::Tag encoding problem
by jwhit61 (Initiate) on Dec 28, 2008 at 21:50 UTC
    Not sure if you have an answer to your question yet. I've run into the same situation with the 'black/inverse' question mark. I'm no Perl guru, as you will see in this post: http://perlguru.com/gforum.cgi?post=34634 There you will see where I use MP3::Tag to write version 1 and version 2 MP3 tags. If you comment out the id3v1 calls (three lines), the Artist, Album and Track values appear correctly in the tags. My issue was with names like Queensrÿche, Mötley Crüe, etc.
      Disregard my earlier post. I've come to the conclusion that MP3::Tag just isn't writing UTF-8 encoded data the way other programs expect it. I've used the following to write:
      $Artist = encode_utf8($Artist);
      $Album = encode_utf8($Album);
      and the following to read:
      print "* Song: ", decode_utf8($info[0]), "\n";
      print "* Artist: ", decode_utf8($info[2]), "\n";
      print "* Album: ", decode_utf8($info[3]), "\n";
      The data prints correctly to STDOUT, but other programs that read the tags show the extended characters incorrectly. If I open the same tags in easytag after the write process above, the data is wrong there too. Once I fix it inside easytag, other programs such as Mythmusic show the tags correctly. Also, after the easytag fix, if I remove the decode_utf8 calls from my script and just print the data to STDOUT, the tags print correctly. The problem has to be with MP3::Tag. Here is my code. You will see that I also write FLAC tags when the file has a .flac extension. The same data that gives me grief for MP3 works just fine when I write it to a FLAC file. I've commented out my UTF-8 attempts in the MP3 code because they made no difference to other applications.
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Cwd;
      use Audio::FLAC::Header;
      use MP3::Info;
      use MP3::Tag;
      use sort '_qsort';
      use File::Glob qw(:globally :nocase);
      #use Encode qw(encode decode);
      #use utf8;
      use Encode;

      our $DEBUG = 0;

      sub Get_Artist_Album {
          return ( split m!/!, cwd() )[-2,-1];
      }

      sub Process_Files {
          my @MP3_Files = <*.mp3>;
          my @FLAC_Files = <*.flac>;
          my $Num_MP3_Files = @MP3_Files;
          my $Num_FLAC_Files = @FLAC_Files;
          my ($Artist, $Album) = Get_Artist_Album;

          if ($Num_MP3_Files > 0) {
              # $Artist = encode_utf8($Artist);
              # $Album = encode_utf8($Album);
              foreach my $MP3_File (@MP3_Files) {
                  chomp $MP3_File;
                  if ($MP3_File =~ /^(\d+[\.\_\-\ ]+)(.*)(\.\w+)$/) {
                      my ($Track, $Title, $Ext) = ($1, $2, lc($3));
                      $Track =~ s/^(\d+)(.*)/$1/;
                      $Track = sprintf("%02d", $Track);
                      $Title = Format_Text ($Title);
                      # $Title = encode_utf8($Title);
                      my $New_File = "$Track. $Title$Ext";
                      if ($DEBUG) { print "\t$New_File\n"; }
                      rename ($MP3_File, $New_File) unless $MP3_File eq $New_File;
                      remove_mp3tag($New_File,'ALL');
                      my $mp3 = MP3::Tag->new($New_File);
                      # my $id3v1 = $mp3->new_tag("ID3v1");
                      # $id3v1->all($Title,$Artist,$Album,"","",$Track,"Rock");
                      # $id3v1->write_tag;
                      my $id3v2 = $mp3->new_tag("ID3v2");
                      $id3v2->add_frame('TRCK',$Track);
                      $id3v2->add_frame('TIT2',$Title);
                      $id3v2->add_frame('TPE1',$Artist);
                      $id3v2->add_frame('TALB',$Album);
                      $id3v2->add_frame('TCON',"17");
                      $id3v2->write_tag;
                  }
              }
          }

          if ($Num_FLAC_Files > 0) {
              foreach my $FLAC_File (@FLAC_Files) {
                  chomp $FLAC_File;
                  if ($FLAC_File =~ /^(\d+[\.\_\-\ ]+)(.*)(\.\w+)$/) {
                      my ($Track, $Title, $Ext) = ($1, $2, lc($3));
                      $Track =~ s/^(\d+)(.*)/$1/;
                      $Track = sprintf("%02d", $Track);
                      $Title = Format_Text ($Title);
                      my $New_File = "$Track. $Title$Ext";
                      if ($DEBUG) { print "\t$New_File\n"; }
                      rename ($FLAC_File, $New_File) unless $FLAC_File eq $New_File;
                      my $flac = Audio::FLAC::Header->new($New_File);
                      my $tags = $flac->tags();
                      %$tags = ();
                      $tags->{TRACKNUMBER} = $Track;
                      $tags->{TITLE} = $Title;
                      $tags->{ARTIST} = $Artist;
                      $tags->{ALBUM} = $Album;
                      $tags->{GENRE} = "Rock";
                      $flac->write();
                  }
              }
          }
      }

      sub Format_Text {
          my $Text = $_[0] or exit 1;
          $Text = lc($Text);                    #Make everything lowercase
          $Text =~ tr/_/ /;                     #Remove underscores
          $Text =~ s/\[/\(/g;
          $Text =~ s/\]/\)/g;
          $Text =~ tr/ / /s;                    #Remove unnecessary spaces
          $Text =~ s/\.$//;                     #Some titles have an extra period - bye
          $Text =~ s/(\d)\./$1/g;               #Do not need period after numbers here
          my @Words = split(/ /,$Text);
          foreach my $Word (@Words) {
              $Word = ucfirst($Word);
          }
          $Text = "@Words";
          $Text =~ s/([(-])([a-z])/$1\u$2/g;
          $Text =~ s/(\W'\S)/uc($1)/eg;         #Some items following ' should be uc
          $Text =~ s/(\.)([a-z])/$1\u$2/g;      #Letter.Letter.Letter... is uc
          $Text =~ s/Dis[ck]\ /Cd/;
          $Text =~ s/Dis[ck](\d)/Cd$1/;
          $Text =~ s/Cd\ (\d)/Cd$1/;
          $Text =~ s/\((Cd\d+)\)/$1/;
          my $x = $Text =~ tr/(/(/;             #Count open parens
          my $y = $Text =~ tr/)/)/;             #Count closing parens
          if ($x > $y) {
              $Text = $Text.")";
          }
          return ($Text);
      }

      Process_Files;

      my $Artist_Dir = cwd();
      opendir (Artist_DH, $Artist_Dir) || die "can't opendir $Artist_Dir: $!";
      my @Albums = grep { !/^\./ && -d "$_" } sort readdir(Artist_DH);
      foreach my $Album (@Albums) {
          my $NewAlbum = Format_Text ($Album);
          rename ($Album, $NewAlbum) unless $Album eq $NewAlbum;
          if ($DEBUG) { print "$NewAlbum \n"; }
          chdir $NewAlbum or warn "Cannot change to $NewAlbum\n";
          Process_Files;
          chdir "..";
      }
      closedir Artist_DH;
      I just wanted to update the last post I sent a few weeks ago. The following seems to allow MP3::Tag to update tags with UTF-8 strings. Programs like easytag, mythmusic, etc. seem to handle the data correctly, or at least as expected. I have other programs that share the code used below. This particular script is run inside the album or artist directories. My preferred structure in the end is Artist/Album/Track. Title. My platform is Linux (OpenSUSE 11.1 as of this writing).
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Cwd;
      use Audio::FLAC::Header;
      use MP3::Info;
      use MP3::Tag;
      use sort '_qsort';
      use File::Glob qw(:globally :nocase);
      use Encode qw(encode decode);

      our $DEBUG = 0;

      sub Get_Artist_Album {
          return ( split m!/!, cwd() )[-2,-1];
      }

      sub Process_Files {
          my @MP3_Files = <*.mp3>;
          my @FLAC_Files = <*.flac>;
          my $Num_MP3_Files = @MP3_Files;
          my $Num_FLAC_Files = @FLAC_Files;

          if ($Num_MP3_Files > 0) {
              my ($Artist, $Album) = Get_Artist_Album;
              $Artist = decode('UTF-8',$Artist);
              $Album = decode('UTF-8',$Album);
              foreach my $MP3_File (@MP3_Files) {
                  chomp $MP3_File;
                  if ($MP3_File =~ /^(\d+[\.\_\-\ ]+)(.*)(\.\w+)$/) {
                      my ($Track, $Title, $Ext) = ($1, $2, lc($3));
                      $Track =~ s/^(\d+)(.*)/$1/;
                      $Track = sprintf("%02d", $Track);
                      $Title = decode('UTF-8',$Title);
                      $Title = Format_Text ($Title);
                      my $New_File = "$Track. $Title$Ext";
                      if ($DEBUG) { print "\t$New_File\n"; }
                      rename ($MP3_File, $New_File) unless $MP3_File eq $New_File;
                      remove_mp3tag($New_File, 'ALL');
                      my $mp3 = MP3::Tag->new($New_File);
                      my $id3v1 = $mp3->new_tag("ID3v1");
                      $id3v1->all($Title,$Artist,$Album,"","",$Track,"Rock");
                      $id3v1->write_tag;
                      my $id3v2 = $mp3->new_tag("ID3v2");
                      $id3v2->add_frame('TRCK',$Track);
                      $id3v2->add_frame('TIT2',$Title);
                      $id3v2->add_frame('TPE1',$Artist);
                      $id3v2->add_frame('TALB',$Album);
                      $id3v2->add_frame('TCON',"17");
                      $id3v2->write_tag;
                  }
              }
          }

          if ($Num_FLAC_Files > 0) {
              my ($Artist, $Album) = Get_Artist_Album;
              foreach my $FLAC_File (@FLAC_Files) {
                  chomp $FLAC_File;
                  if ($FLAC_File =~ /^(\d+[\.\_\-\ ]+)(.*)(\.\w+)$/) {
                      my ($Track, $Title, $Ext) = ($1, $2, lc($3));
                      $Track =~ s/^(\d+)(.*)/$1/;
                      $Track = sprintf("%02d", $Track);
                      $Title = Format_Text ($Title);
                      my $New_File = "$Track. $Title$Ext";
                      if ($DEBUG) { print "\t$New_File\n"; }
                      rename ($FLAC_File, $New_File) unless $FLAC_File eq $New_File;
                      my $flac = Audio::FLAC::Header->new($New_File);
                      my $tags = $flac->tags();
                      %$tags = ();
                      $tags->{TRACKNUMBER} = $Track;
                      $tags->{TITLE} = $Title;
                      $tags->{ARTIST} = $Artist;
                      $tags->{ALBUM} = $Album;
                      $tags->{GENRE} = "Rock";
                      $flac->write();
                  }
              }
          }
      }

      sub Format_Text {
          my $Text = $_[0] or exit 1;
          $Text = lc($Text);                    #Make everything lowercase
          $Text =~ tr/_/ /;                     #Remove underscores
          $Text =~ s/\.\.\./\.\.\.\ /g;
          $Text =~ s/(\d),(\d)/$1$2/g;
          $Text =~ s/,/ /g;
          $Text =~ tr/\`\´\’/\'/s;
          $Text =~ s/ and / \& /g;
          $Text =~ s/\[/\(/g;
          $Text =~ s/\]/\)/g;
          $Text =~ tr/ / /s;                    #Remove unnecessary spaces
          $Text =~ s/\( /\(/g;
          $Text =~ s/ \)/\)/g;
          $Text =~ s/\·/-/g;
          $Text =~ s/\s*-\s*/-/g;
          # $Text =~ s/\.$//;                   #Some titles have an extra period - bye
          $Text =~ s/(\d)\./$1/g;               #Do not need period after numbers here
          my @Words = split(/ /,$Text);
          foreach my $Word (@Words) {
              $Word = ucfirst($Word);
          }
          $Text = "@Words";
          $Text =~ s/([(-])([a-z])/$1\u$2/g;
          $Text =~ s/(\W'\S)/uc($1)/eg;         #Some items following ' should be uc
          $Text =~ s/(\.)([a-z])/$1\u$2/g;      #Letter.Letter.Letter... is uc
          $Text =~ s/Dis[ck]\ /Cd/;
          $Text =~ s/Dis[ck](\d)/Cd$1/;
          $Text =~ s/Cd\ (\d)/Cd$1/;
          $Text =~ s/\((Cd\d+)\)/$1/;
          $Text =~ s/-Cd/ Cd/;
          my $x = $Text =~ tr/(/(/;             #Count open parens
          my $y = $Text =~ tr/)/)/;             #Count closing parens
          if ($x > $y) {
              $Text = $Text.")";
          }
          return ($Text);
      }

      Process_Files;

      my $Artist_Dir = cwd();
      opendir (Artist_DH, $Artist_Dir) || die "can't opendir $Artist_Dir: $!";
      my @Albums = grep { !/^\./ && -d "$_" } sort readdir(Artist_DH);
      foreach my $Album (@Albums) {
          my $NewAlbum = Format_Text ($Album);
          rename ($Album, $NewAlbum) unless $Album eq $NewAlbum;
          if ($DEBUG) { print "$NewAlbum \n"; }
          chdir $NewAlbum or warn "Cannot change to $NewAlbum\n";
          Process_Files;
          chdir "..";
      }
      closedir Artist_DH;
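The change that seems to matter in the script above is decoding the directory names before handing them to the tag calls. A stripped-down sketch of just that step follows. The helper name artist_album_from_path and the sample path are made up for illustration, and it assumes the filesystem stores names as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: split a byte path (such as cwd() returns) and
# hand back the last two components as decoded character strings.
sub artist_album_from_path {
    my ($path_bytes) = @_;
    my ($artist, $album) = (split m{/}, $path_bytes)[-2, -1];
    return (decode('UTF-8', $artist), decode('UTF-8', $album));
}

# Sample path with "ö" stored as the two UTF-8 bytes C3 B6.
my ($artist, $album) =
    artist_album_from_path("/music/Mot\xC3\xB6rhead/Ace Of Spades");
```

After this, $artist holds nine characters, with the ö as a single character, which is the form the tag-writing calls in the script receive.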
Re^3: MP3::Tag encoding problem
by lbt (Initiate) on Jan 01, 2009 at 11:41 UTC
    I've had a similar problem and for me the solution was to identify binary/character data and handle it appropriately.
    my $mp3 = MP3::Tag->new($t);
    The module returns a character string (Dec '08), so that's OK. Make sure any applications writing the tags encode properly. On Linux, easytag seems to work very well.
    my $tag_dir = "/music/$a_artist/$a_name";
    This is what you want as a human, but it is no good for mkpath() et al.
    my $binary_tag_dir = encode_utf8($tag_dir);
    Ah, now this is mkpath()-able.
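A small runnable sketch of that step, assuming a filesystem that stores names as UTF-8. The artist and album values are sample data, and a File::Temp scratch directory stands in for /music so nothing real gets touched.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use File::Path qw(mkpath);
use File::Temp qw(tempdir);

my $root = tempdir(CLEANUP => 1);                 # stand-in for /music
my ($a_artist, $a_name) = ("Queensr\x{FF}che", "Empire");   # sample values

my $tag_dir        = "$root/$a_artist/$a_name";   # character string, for humans
my $binary_tag_dir = encode_utf8($tag_dir);       # byte string, for the filesystem

mkpath($binary_tag_dir);
print "created\n" if -d $binary_tag_dir;
```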

    Now, File::Find (properly) returns a binary string so needs decoding. Assuming your filesystem uses utf8 encoding:

    my $char_file_find_dir = decode("utf8",$File::Find::dir);
    At this point you can print and compare $char_file_find_dir and $tag_dir.

    You can also compare and do filename tests etc with $binary_tag_dir and $File::Find::dir.
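To make the two kinds of comparison concrete, here is a sketch with a hard-coded byte string standing in for $File::Find::dir (so it runs without a real directory tree). The paths are sample data.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# Character string built from tag data.
my $tag_dir = "/music/M\x{F6}tley Cr\x{FC}e/Dr Feelgood";

# Stand-in for $File::Find::dir: the same path as UTF-8 bytes.
my $found_bytes = "/music/M\xC3\xB6tley Cr\xC3\xBCe/Dr Feelgood";

my $char_file_find_dir = decode('UTF-8', $found_bytes);
my $binary_tag_dir     = encode_utf8($tag_dir);

my $chars_match = ($char_file_find_dir eq $tag_dir)     ? 1 : 0;  # characters vs characters
my $bytes_match = ($binary_tag_dir     eq $found_bytes) ? 1 : 0;  # bytes vs bytes
my $mixed_match = ($tag_dir            eq $found_bytes) ? 1 : 0;  # characters vs bytes

print "chars: $chars_match, bytes: $bytes_match, mixed: $mixed_match\n";
```

Only the mixed comparison fails, which is the trap: both sides have to be in the same form before eq means anything.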

    When printing (including debugging) I had:

    binmode STDOUT, ":utf8";
    This tells Perl to encode character strings as UTF-8 on output, which is what a UTF-8-aware terminal needs in order to show accented characters correctly.

    You should also decode() the binary strings before printing them if you want to read them (or not if you want to 'od' them).
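A sketch of why that matters once the :utf8 layer is on: a byte string pushed through the layer gets encoded a second time. It uses an in-memory filehandle instead of STDOUT so the bytes can be counted, and bytes_written is a made-up helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: write a string through a UTF-8 output layer
# into memory and return the raw bytes that came out.
sub bytes_written {
    my ($string) = @_;
    my $buffer = '';
    open(my $out, '>:encoding(UTF-8)', \$buffer)
        or die "in-memory open failed: $!";
    print {$out} $string;
    close $out;
    return $buffer;
}

my $binary = "Cr\xC3\xBCe";                             # UTF-8 bytes for "Crüe"
my $twice  = bytes_written($binary);                    # bytes encoded again
my $once   = bytes_written(decode('UTF-8', $binary));   # decoded first

printf "undecoded: %d bytes, decoded first: %d bytes\n",
    length($twice), length($once);
```

The undecoded route produces seven bytes and shows up as "CrÃ¼e" on a UTF-8 terminal. The decoded route gives back the original five.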

    HTH

    Corrections welcome ;)