igoryonya has asked for the wisdom of the Perl Monks concerning the following question:

I am working as a sysadmin.

Naturally, I have to deal with a lot of files. One task I do often is finding and removing duplicate files.
One of the console programs I often use to find duplicates is fdupes. It can find duplicates and then ask which files to keep from each duplicate set, or just print the results to the screen so you can work with them your own way.
It's a great program, but it shows duplicate sets in an unordered fashion, without grouping directories that share several duplicate files with other directories. Doing that manually becomes cumbersome after a while, so I decided to write a point-and-click interface for it in Tk. On the plus side, I've never done a GUI before, so I am learning along the way :).
I've got it to the point where it's usable, but it's not finished yet.

So, it takes fdupes output, parses it, analyses it, and builds a representation of the duplicate directory trees.
Some fdupes result files grow to 100-300 MB, and parsing them takes 15-30 minutes, depending on the computer.
I've analysed my code to find bottlenecks and optimized the parsing routine to the point where it now parses such big files in 1-4 minutes. Although the parsing time was cut down significantly, it's still annoying to wait 4 minutes for a load, so I decided to cache the parsed result. Now, what took 4 minutes to parse loads from the cache in 20-30 seconds.
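The parsing step described above could look roughly like the sketch below. It is not the actual fdupes-gui code: the function name `parse_fdupes`, the sequential group ids, and the in-memory sample are assumptions for illustration. It assumes the result file lists one path per line, separates groups with blank lines, and may start a group with an "N bytes each:" header like the values stored in %info.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Parse fdupes-style text: one path per line, groups separated by blank lines.
# An optional "N bytes each:" header line starts a group.
sub parse_fdupes {
    my ($fh) = @_;
    my (%info, %groups, %files, %folders);
    my $id       = 0;   # hypothetical sequential group id
    my $in_group = 0;

    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq '') {            # a blank line ends the current group
            $in_group = 0;
            next;
        }
        $id++ unless $in_group;       # first line of a new group
        $in_group = 1;

        if ($line =~ /^\d+ bytes? each:$/) {
            $info{$id} = $line;       # remember the size header
            next;
        }

        # fileparse returns the file name and the directory with a trailing slash
        my ($name, $dir) = fileparse($line);
        push @{ $groups{$id} },   $line;
        push @{ $folders{$dir} }, $name;
        $files{$line} = $id;
    }
    return { info => \%info, groups => \%groups, files => \%files, folders => \%folders };
}

# Usage with an in-memory sample instead of a real result file.
my $sample = "12 bytes each:\n/a/x.txt\n/b/x.txt\n\n5 bytes each:\n/a/y.txt\n/a/z.txt\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
my $parsed = parse_fdupes($fh);
close $fh;

for my $dir (sort keys %{ $parsed->{folders} }) {
    printf "%s => %s\n", $dir, join(', ', @{ $parsed->{folders}{$dir} });
}
```

Reading line by line and pushing onto per-directory arrays avoids holding the whole file in memory twice, which matters for result files of several hundred megabytes.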

With smaller cache files I didn't encounter the problem, but when the fdupes result file is big, I noticed a problem with loading from the cache. The cache is just hash variables saved to a file. When the program starts, it 'requires' the cache as a library if it exists, and then skips parsing the result file. In that case, some keys appear as references to arrays. I've looked inside the generated cache (library) file, but didn't find any problem.

To troubleshoot this problem, I decided to test the cache file with a separate script. Here is the test script that opens the cached variables:
#!/usr/bin/perl
#Locale settings:
no warnings 'layer';
use utf8;
use locale;
use encoding 'utf8', STDOUT => 'utf8', STDERR => 'utf8';
use POSIX qw(locale_h);
setlocale(LC_TYPE, 'ru_RU.UTF-8');
use Encode;

#The test code
require 'fdupes-gui_chmk-dupes.txt.cache';
my $imported_vars = import_vars();
print "---test_before---\n";
for my $cvar (keys %$imported_vars){
    print "$cvar:\n";
    for my $ckey (keys %{$imported_vars->{$cvar}}){
        print "\t$cvar: $ckey\n";
    }
}
print "---after_test---\n";
Here is a cut-down version of the generated cache file, to show you an example of the structure.
The complete cache file, where I have a problem:
http://pharmacy.chukotnet.ru/files/fdupes-gui_chmk-dupes.txt.cache.7z
When you run the test program against the cache file from the above URL, some keys become array references, which is especially noticeable in the %folders variable, although in the cache all the keys are scalars. Somehow, some array references from the value side shift to the keys, I guess.
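To narrow this down, one option is to scan each loaded hash for keys that look like stringified references. The sketch below is only an illustration with made-up data and a hypothetical helper name (`suspicious_keys`); it also shows one way such keys can arise, namely a list in which a missing value shifts every later item by one position. Whether that is what happens with the real cache file is not established here.

```perl
use strict;
use warnings;

# Return every key that looks like a stringified reference,
# for example "ARRAY(0x55d0c8a1b2f0)".
sub suspicious_keys {
    my ($hash) = @_;
    return grep { /^(?:ARRAY|HASH|CODE|SCALAR)\(0x[0-9a-fA-F]+\)$/ } keys %$hash;
}

# Made-up data: 'dir_b/' has lost its value, so 'dir_c/' becomes its value
# and the following array reference is used (stringified) as a key.
my @pairs = ('dir_a/', ['1.png'], 'dir_b/', 'dir_c/', ['2.png']);

my %shifted;
{
    no warnings 'misc';   # silence "Odd number of elements in hash assignment"
    %shifted = @pairs;
}

my @bad = suspicious_keys(\%shifted);
print "suspicious key: $_\n" for @bad;
```

Running the same check over each hash returned by import_vars() would at least show which variables are affected and how many keys are involved.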
Updated:
So, I shortened the example.
no warnings 'layer';
use utf8;
use locale;
use encoding 'utf8', STDOUT => 'utf8', STDERR => 'utf8';
use POSIX qw(locale_h);
setlocale(LC_TYPE, 'ru_RU.UTF-8');
use Encode;

my %sameFilesOneDir = (
    '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для Серикова А.В/SAS_v120808/cache/map/z18/74/x76538/37/'=>[
        'y38062.png',
        'y38061.png'
    ],
    '/media/igor/chmk/home/zamutnii/Shared_Folder/Buh/Расчетчик/'=>[
        'Документы ПУ 5_2010.lnk',
        'Документы ПУ 5_2010 (2).lnk'
    ],
    '/media/igor/chmk/home/zamutnii/.repo/10.04/amd64/pool/x/xserver-xorg-video-nouveau/'=>[
        'xserver-xorg-video-nouveau_0.0.15+git20100219+9b4118d-0ubunt.deb',
        'xserver-xorg-video-nouveau_0.0.15+git20100219+9b4118d-0ubuntu5_amd64.deb'
    ],
    '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для Смирновой Н.Н/от Николаенко Т.М/Отделение_педагогики/050501_Профессиональное_обучение_(по отраслям)_ГОС/Метод._материалы/Тараненко РИСУНОК ДЛЯ 018-03+ Задания/ЗАДАНИЯ/РЕБУСЫ МЛЕКОПИТ/'=>[
        'РЕБУС 2.jpg',
        'РЕБ 2 .jpg'
    ],
    '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Distrib/Edu/Stamina/Data/'=>[
        'lessons.lt',
        'lessons.lv',
        'lessons.da'
    ],
    '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Distrib/unsorted/Временно/С диска D/Карта памяти 2 гига для солдатова/Sounds/Ранетки/ЛеРа/'=>[
        'лера_козлова_-_рядом_2c4f2ec6c8e2.mp3',
        'лера_козлова_-_рядом_1309842aff23.mp3'
    ]
);

my %info = (
    '93688'=>'26884 bytes each:',
    '58684'=>'79479 bytes each:'
);

my %folders = (
    '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для СисАдмина/recover-priyomnaya/recup_dir.2376/'=>[
        'f3484724920.doc',
        'f3484724712.doc'
    ],
    '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для Амосовой Е.Г/Док/Программы и КТП Вопросы/2012-2013/Титульники и литература/949-05/КМ/'=>[
        'Литература.doc',
        'РП КМ (Ф).doc'
    ],
    '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для СисАдмина/recover-priyomnaya/recup_dir.433/'=>[
        'f1793587968.doc',
        'f1793889136.doc',
        'f1793885184.doc'
    ],
    '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Distrib/unsorted/Временно/Мои документы/Парикммахер 2010-2012 уч.год/Съемный диск (G)/парикмахер/викторина/pic1-6/pic1/'=>[
        '2 (3).JPG',
        '2 (2).JPG'
    ]
);

my %files = (
    '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для СисАдмина/recover-priyomnaya/recup_dir.2036/f3467715168.doc'=>'71514',
    '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для СисАдмина/recover-priyomnaya/recup_dir.2356/f3483793848.doc'=>'47380'
);

my %groups = (
    '93688'=>[
        '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Docs/Галина Павловна/Документы/Кузнецова Г.П/Новая папка/standard/stddir1/xserver-xorg-input-all_7.3+19_i386.deb',
        '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Distrib/Distr_Unix/Repo/Repo_1/pool/main/x/xorg/xserver-xorg-input-all_7.3+19_i386.deb'
    ],
    '58684'=>[
        '/media/igor/chmk/home/zamutnii/.chmsee/bookshelf/99a36a6da9cc659bbe4e7122a92e66d1/8250final/images/ch06fig06_0.jpg',
        '/media/igor/chmk/m3/zamutnii/.chmsee/bookshelf/99a36a6da9cc659bbe4e7122a92e66d1/8250final/images/ch06fig06_0.jpg'
    ]
);

my %oneFileEachDir = (
);

my %foldersWithOneFile = (
    '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/deb-repo/1/pool/universe/libc/libconfig-mvp-perl/'=>[
        'libconfig-mvp-perl_0.093350-1_all.deb'
    ],
    '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/deb-repo/6/pool/universe/p/python-tgext.admin/'=>[
        'python-tgext.admin_0.2.6-1_all.deb'
    ]
);

sub import_vars{
    return({
        'sameFilesOneDir'=>\%sameFilesOneDir,
        'info'=>\%info,
        'folders'=>\%folders,
        'files'=>\%files,
        'groups'=>\%groups,
        'oneFileEachDir'=>\%oneFileEachDir,
        'foldersWithOneFile'=>\%foldersWithOneFile
    });
}

return(1);

Replies are listed 'Best First'.
Re: problem with hashes, loaded from file
by Anonymous Monk on Dec 24, 2014 at 12:09 UTC
    tldr (perlmonks is *** annoying with its character mangling) but have you considered using Storable and not using 'encoding' and 'locale'? (I don't see what they do for you.)
      What's tldr?
      I was not aware of Storable; I'll have to read up on that. Thanks.
        "too long; didn't read"

        Take a look at utf8::all, too
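
The Storable suggestion above might be sketched as follows. This is a minimal example with made-up data and a temporary file, not the original program: the hash names mirror those in the post, but the contents and the cache file location are assumptions. Storable writes the data structure in a binary format, so nothing has to be generated as Perl source and parsed again by 'require'.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Made-up stand-in for the parsed result (names mirror the post, data is invented).
my %cache = (
    folders => { '/a/' => ['x.txt', 'y.txt'] },
    info    => { 1     => '12 bytes each:' },
);

# A temporary file stands in for the real cache file; it is removed at exit.
my ($tmp_fh, $cache_file) = tempfile(SUFFIX => '.cache', UNLINK => 1);
close $tmp_fh;

# nstore writes in network byte order, so the file can be read on other machines.
nstore(\%cache, $cache_file);

# Later, or in another run: load the whole structure back in one call.
my $loaded = retrieve($cache_file);
print join(', ', sort keys %$loaded), "\n";
print join(', ', @{ $loaded->{folders}{'/a/'} }), "\n";
```

In the real program the retrieve call would replace the 'require' of the generated cache file, and nstore would replace the code that writes the hashes out as Perl source.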