igoryonya has asked for the wisdom of the Perl Monks concerning the following question:
Naturally, I have to deal with a lot of files. One of the tasks I do often is searching for and removing duplicate files.
One of the console programs I often use to find duplicates is fdupes. It can find duplicates and then ask which files to keep from each duplicate set, or just print the results to the screen so you can work with them your own way.
It's a great program, but it shows duplicate sets in an unordered fashion, without grouping directories that share several duplicate files with other directories. Doing this manually becomes cumbersome after a while, so I've decided to write a point-and-click interface for it in Tk. On the plus side, I've never done GUI work before, so I am learning along the way :).
I've got it to the point where it's usable, but it's not finished yet.
It takes the fdupes output, parses it, analyses it, and builds a representation of the duplicate directory trees.
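For readers unfamiliar with the format, here is a minimal sketch of that parsing step. It assumes the usual fdupes text layout (duplicate sets separated by blank lines, with an optional "N bytes each:" line in front of each set, as the cached `%info` values suggest). The names `parse_fdupes` and `group_by_dir` and the sample paths are made up for illustration; this is not the actual code of the GUI.

```perl
use strict;
use warnings;

# Parse fdupes-style text: sets of paths separated by blank lines.
# A leading "N bytes each:" line is kept as the set's info string.
sub parse_fdupes {
    my ($text) = @_;
    my (@sets, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*$/) {            # a blank line closes the current set
            push @sets, $current if $current;
            undef $current;
            next;
        }
        $current ||= { info => undef, files => [] };
        if (!@{ $current->{files} } && $line =~ /^\d+ bytes? each:$/) {
            $current->{info} = $line;      # size header of the set
        }
        else {
            push @{ $current->{files} }, $line;
        }
    }
    push @sets, $current if $current;      # last set may lack a trailing blank line
    return \@sets;
}

# Group the files of one set by directory: "dir/" => [ file names ].
sub group_by_dir {
    my ($files) = @_;
    my %by_dir;
    for my $path (@$files) {
        my ($dir, $name) = $path =~ m{^(.*/)([^/]*)$} or next;
        push @{ $by_dir{$dir} }, $name;
    }
    return \%by_dir;
}

# Hypothetical sample input, not taken from a real fdupes run.
my $sample = <<'END';
26884 bytes each:
/a/x/one.deb
/b/y/one.deb

79479 bytes each:
/a/img/p.jpg
/a/img/q.jpg
END

my $sets = parse_fdupes($sample);
for my $set (@$sets) {
    my $dirs = group_by_dir($set->{files});
    print "$set->{info}\n";
    print "  $_ => @{ $dirs->{$_} }\n" for sort keys %$dirs;
}
```

Grouping by directory is what lets a set with several files in the same folder be shown as one entry instead of several unrelated lines.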
Some fdupes result files grow to 100-300 MB, and parsing them takes 15-30 minutes, depending on the computer.
I've analysed my code to find bottlenecks and optimized the parsing routine to the point where it now parses such big files in 1-4 minutes. Although the parsing time was cut down significantly, it's still annoying to wait 4 minutes for a load, so I've decided to cache the parsed result. Now, what took 4 minutes to parse loads from the cache in 20-30 seconds.
With smaller cache files I didn't encounter the problem, but when the fdupes result file is big, I've noticed a problem with loading from the cache. The cache is just hash variables saved to a file. When the program starts, it 'requires' the cache as a library, if it exists, and then skips parsing the result file. In that case, some keys appear as references to arrays. I've looked inside the generated cache (library) file, but didn't find any problem.
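For comparison, here is a small sketch of the same "cache as Perl source" idea done with the core Data::Dumper module and `do FILE` instead of a hand-written file loaded with `require`. This is an alternative technique, not the way my program writes its cache; `write_cache`, `read_cache` and the sample data are hypothetical names. With `Useqq` enabled, non-ASCII characters in keys are written as escapes, so the cache file itself stays plain ASCII.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

# Write a data structure to $file as Perl source ("$VAR1 = {...};").
sub write_cache {
    my ($file, $data) = @_;
    local $Data::Dumper::Useqq    = 1;   # double-quoted strings, special chars escaped
    local $Data::Dumper::Sortkeys = 1;   # stable key order, easier to diff
    local $Data::Dumper::Indent   = 1;   # compact indentation
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} Dumper($data);
    close $fh or die "Cannot close $file: $!";
}

# Evaluate the cache file; "do" returns the value of its last statement.
sub read_cache {
    my ($file) = @_;
    my $data = do $file;
    unless (defined $data) {
        die "Cannot parse $file: $@" if $@;
        die "Cannot read $file: $!";
    }
    return $data;
}

# Hypothetical sample data; the directory name contains Cyrillic characters.
my %folders = ( "/tmp/\x{434}\x{43b}\x{44f}/" => [ 'a.doc', 'b.doc' ] );

my (undef, $cache_file) = tempfile(UNLINK => 1);   # absolute path in the temp dir
write_cache($cache_file, { folders => \%folders });

my $loaded = read_cache($cache_file);
print scalar(keys %{ $loaded->{folders} }), " folder key(s) loaded\n";
```

Because the dumper generates every key and value itself, the key/value alternation of the written list cannot get out of step, which is one thing to rule out when keys come back looking wrong.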
To troubleshoot this problem, I've decided to test that cache file in a separate script. Here is the script that opens the cached variables:

    #!/usr/bin/perl
    #Locale settings:
    no warnings 'layer';
    use utf8;
    use locale;
    use encoding 'utf8', STDOUT => 'utf8', STDERR => 'utf8';
    use POSIX qw(locale_h);
    setlocale(LC_TYPE, 'ru_RU.UTF-8');
    use Encode;

    #The test code
    require 'fdupes-gui_chmk-dupes.txt.cache';
    my $imported_vars = import_vars();
    print "---test_before---\n";
    for my $cvar (keys %$imported_vars){
        print "$cvar:\n";
        for my $ckey (keys %{$imported_vars->{$cvar}}){
            print "\t$cvar: $ckey\n";
        }
    }
    print "---after_test---\n";

Here is a cut-down version of the generated cache file, to show you an example of the structure:

    no warnings 'layer';
    use utf8;
    use locale;
    use encoding 'utf8', STDOUT => 'utf8', STDERR => 'utf8';
    use POSIX qw(locale_h);
    setlocale(LC_TYPE, 'ru_RU.UTF-8');
    use Encode;

    my %sameFilesOneDir = (
        '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для Серикова А.В/SAS_v120808/cache/map/z18/74/x76538/37/'=>[
            'y38062.png',
            'y38061.png'
        ],
        '/media/igor/chmk/home/zamutnii/Shared_Folder/Buh/Расчетчик/'=>[
            'Документы ПУ 5_2010.lnk',
            'Документы ПУ 5_2010 (2).lnk'
        ],
        '/media/igor/chmk/home/zamutnii/.repo/10.04/amd64/pool/x/xserver-xorg-video-nouveau/'=>[
            'xserver-xorg-video-nouveau_0.0.15+git20100219+9b4118d-0ubunt.deb',
            'xserver-xorg-video-nouveau_0.0.15+git20100219+9b4118d-0ubuntu5_amd64.deb'
        ],
        '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для Смирновой Н.Н/от Николаенко Т.М/Отделение_педагогики/050501_Профессиональное_обучение_(по отраслям)_ГОС/Метод._материалы/Тараненко РИСУНОК ДЛЯ 018-03+ Задания/ЗАДАНИЯ/РЕБУСЫ МЛЕКОПИТ/'=>[
            'РЕБУС 2.jpg',
            'РЕБ 2 .jpg'
        ],
        '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Distrib/Edu/Stamina/Data/'=>[
            'lessons.lt',
            'lessons.lv',
            'lessons.da'
        ],
        '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Distrib/unsorted/Временно/С диска D/Карта памяти 2 гига для солдатова/Sounds/Ранетки/ЛеРа/'=>[
            'лера_козлова_-_рядом_2c4f2ec6c8e2.mp3',
            'лера_козлова_-_рядом_1309842aff23.mp3'
        ]
    );
    my %info = (
        '93688'=>'26884 bytes each:',
        '58684'=>'79479 bytes each:'
    );
    my %folders = (
        '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для СисАдмина/recover-priyomnaya/recup_dir.2376/'=>[
            'f3484724920.doc',
            'f3484724712.doc'
        ],
        '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для Амосовой Е.Г/Док/Программы и КТП Вопросы/2012-2013/Титульники и литература/949-05/КМ/'=>[
            'Литература.doc',
            'РП КМ (Ф).doc'
        ],
        '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для СисАдмина/recover-priyomnaya/recup_dir.433/'=>[
            'f1793587968.doc',
            'f1793889136.doc',
            'f1793885184.doc'
        ],
        '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Distrib/unsorted/Временно/Мои документы/Парикммахер 2010-2012 уч.год/Съемный диск (G)/парикмахер/викторина/pic1-6/pic1/'=>[
            '2 (3).JPG',
            '2 (2).JPG'
        ]
    );
    my %files = (
        '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для СисАдмина/recover-priyomnaya/recup_dir.2036/f3467715168.doc'=>'71514',
        '/media/igor/chmk/home/zamutnii/Shared_Folder/0.3.shared/для СисАдмина/recover-priyomnaya/recup_dir.2356/f3483793848.doc'=>'47380');
    my %groups = (
        '93688'=>[
            '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Docs/Галина Павловна/Документы/Кузнецова Г.П/Новая папка/standard/stddir1/xserver-xorg-input-all_7.3+19_i386.deb',
            '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/Distrib/Distr_Unix/Repo/Repo_1/pool/main/x/xorg/xserver-xorg-input-all_7.3+19_i386.deb'
        ],
        '58684'=>[
            '/media/igor/chmk/home/zamutnii/.chmsee/bookshelf/99a36a6da9cc659bbe4e7122a92e66d1/8250final/images/ch06fig06_0.jpg',
            '/media/igor/chmk/m3/zamutnii/.chmsee/bookshelf/99a36a6da9cc659bbe4e7122a92e66d1/8250final/images/ch06fig06_0.jpg'
        ]
    );
    my %oneFileEachDir = (
    );
    my %foldersWithOneFile = (
        '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/deb-repo/1/pool/universe/libc/libconfig-mvp-perl/'=>[
            'libconfig-mvp-perl_0.093350-1_all.deb'
        ],
        '/media/igor/chmk/home/zamutnii/Shared_Folder/Administrators/deb-repo/6/pool/universe/p/python-tgext.admin/'=>[
            'python-tgext.admin_0.2.6-1_all.deb'
        ]
    );
    sub import_vars{
        return({
            'sameFilesOneDir'=>\%sameFilesOneDir,
            'info'=>\%info,
            'folders'=>\%folders,
            'files'=>\%files,
            'groups'=>\%groups,
            'oneFileEachDir'=>\%oneFileEachDir,
            'foldersWithOneFile'=>\%foldersWithOneFile
        });
    }
    return(1);
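To narrow down where the odd keys come from, a check along these lines could be run over the hash of hashes returned by `import_vars()`. It is only a diagnostic sketch: `suspicious_keys` and the sample data are hypothetical, and the `%broken` hash shows just one way such keys can arise (a reference sitting in key position gets stringified), not a confirmed cause of my problem.

```perl
use strict;
use warnings;

# Return the keys of %$hash that look like stringified references,
# e.g. "ARRAY(0x55d0c8a2f1b8)". A reference used as a hash key is
# turned into such a string and can no longer be dereferenced.
sub suspicious_keys {
    my ($hash) = @_;
    my @bad = sort grep { /^(?:ARRAY|HASH|SCALAR|CODE)\(0x[0-9a-f]+\)$/i } keys %$hash;
    return @bad;
}

# Hypothetical data: the second pair has an array ref where the key should be.
my %broken = (
    '/dir/a/'   => [ 'x.doc' ],
    [ 'y.doc' ] => '/dir/b/',
);

# Same shape as the structure returned by import_vars(): name => hash ref.
my $imported_vars = {
    folders => \%broken,
    info    => { '93688' => '26884 bytes each:' },
};

for my $name (sort keys %$imported_vars) {
    my @bad = suspicious_keys($imported_vars->{$name});
    printf "%s: %d suspicious key(s)\n", $name, scalar @bad;
    print "\t$_\n" for @bad;
}
```

If a check like this reports bad keys only for the big cache files, comparing the entry written just before the first bad key with the generated source is a reasonable next step.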
Re: problem with hashes, loaded from file
by Anonymous Monk on Dec 24, 2014 at 12:09 UTC
by igoryonya (Pilgrim) on Dec 24, 2014 at 12:46 UTC
by AnomalousMonk (Archbishop) on Dec 24, 2014 at 13:32 UTC
by igoryonya (Pilgrim) on Dec 25, 2014 at 09:27 UTC
by AnomalousMonk (Archbishop) on Dec 25, 2014 at 17:25 UTC
by Anonymous Monk on Dec 24, 2014 at 13:04 UTC
by igoryonya (Pilgrim) on Dec 25, 2014 at 09:49 UTC
by Anonymous Monk on Dec 25, 2014 at 19:35 UTC