rahu_6697 has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a database and have to process all .txt files present in different directories. Some files with the same name are present in more than one directory, but I only have to work on the one that was most recently updated/created. I am using glob, but it puts every .txt file into the array assigned to it. I just want to remove duplicate entries on the basis of the file's date: only the latest file should remain, and all other files with the same name should be eliminated, so that only files with unique names are stored in @filelist. Please help me out...

my @filelist = glob "/user/*/ws/*/BLK_*/*.txt";
for my $file (@filelist) {
    . . . . .
}

Replies are listed 'Best First'.
Re: GLOB function
by thanos1983 (Parson) on Jun 18, 2018 at 13:44 UTC

    Hello rahu_6697,

    Welcome to the Monastery. I would approach the problem a bit differently. I would use stat to get each file's last-modified time, for example:

    my $modtime = (stat($fh))[9];
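A minimal sketch of that lookup, assuming a path string rather than an open handle. The helper name mtime_of and the throwaway temp file are made up for illustration; only the built-in stat and utime are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the mtime (seconds since the epoch) of a
# path, or undef if the file cannot be stat'ed.
sub mtime_of {
    my ($path) = @_;
    my @st = stat($path) or return;
    return $st[9];    # element 9 of the stat list is the mtime
}

# Demo on a throwaway file so the snippet runs anywhere.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
utime 1_500_000_000, 1_500_000_000, $path or die "utime failed: $!";
print "$path was last modified at ", scalar localtime(mtime_of($path)), "\n";
```

Returning undef on failure lets the caller decide whether a missing file is an error or something to skip.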

    As a second step I would create a hash of hashes, using the file name as the key, with secondary keys such as stat and path or location. Sample hash of hashes (not tested):

    my %HoH = ( file_name => { stat => 'date here', location => 'path to file' } );

    While iterating through the directories you can check the files one by one; when a name is already in the hash, compare the dates and either update the hash or move on to the next file.

    Many modules can help you find the files by searching the directories recursively, for example File::Find.
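The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the sub name newest_by_name and the temp-directory demo are made up, and File::Find walks the whole tree recursively, which is broader than the fixed glob pattern in the question.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Walk @roots and keep, per basename, the newest *.txt file seen so far.
# Returns a hash of hashes: basename => { stat => mtime, location => path }.
sub newest_by_name {
    my (@roots) = @_;
    my %HoH;
    find({
        no_chdir => 1,    # with no_chdir, $_ holds the full path
        wanted   => sub {
            my $path = $_;
            return unless $path =~ /\.txt\z/ && -f $path;
            my $mtime = (stat _)[9];    # reuse the stat done by -f
            my $name  = basename($path);
            my $seen  = $HoH{$name};
            return if $seen && $seen->{stat} >= $mtime;
            $HoH{$name} = { stat => $mtime, location => $path };
        },
    }, @roots);
    return %HoH;
}

# Demo data: two files share the name report.txt, one is newer.
my $root = tempdir(CLEANUP => 1);
make_path("$root/a", "$root/b");
for my $spec (["a/report.txt", 1000], ["b/report.txt", 2000], ["a/only.txt", 1500]) {
    my ($rel, $mtime) = @$spec;
    open my $fh, '>', "$root/$rel" or die "$root/$rel: $!";
    close $fh;
    utime $mtime, $mtime, "$root/$rel" or die "utime: $!";
}

my %HoH      = newest_by_name($root);
my @filelist = sort map { $_->{location} } values %HoH;
print "$_\n" for @filelist;
```

When two files with the same name have identical mtimes, this sketch keeps whichever was found first.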

    I would approach the problem like this.

    Hope this helps. If something is not clear, let me know and I will try to elaborate on my answer.

    BR, Thanos

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: GLOB function
by haukex (Archbishop) on Jun 18, 2018 at 18:23 UTC

    The glob function has issues, see To glob or not to glob. But if your pattern is always going to be the fixed string you showed, without variables interpolated into the string, then it's probably ok in this case.
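One documented pitfall is that the built-in glob splits its pattern on whitespace, which matters once a variable is interpolated into it. The following is a minimal sketch, assuming a directory whose name contains a space (the temp directory and file names are made up); bsd_glob from the core File::Glob module treats the whole string as a single pattern.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Build a directory with a space in its name and one .txt file inside.
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/my files";
mkdir $dir or die "mkdir: $!";
open my $fh, '>', "$dir/a.txt" or die "open: $!";
close $fh;

my @core = glob("$dir/*.txt");        # pattern is split at the space
my @bsd  = bsd_glob("$dir/*.txt");    # whole string is one pattern

print "core glob: @core\n";
print "bsd_glob:  @bsd\n";
```

With a fixed pattern that contains no whitespace, as in the question, the two behave the same.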

    When you say "files with unique names", I'm going to assume you mean basenames (the filename without the path), but I'm also guessing that what you need in @filelist is still the full filename, so that you can properly open the files. Also, by "latest updated/created", I'm guessing you mean the mtime (see also stat(2)).

    You could use a hash; here I'm using the basename as the key, and each value is an anonymous array where the first element is the full filename and the second element is the mtime. To see how it works, uncomment the "# Debug" lines.

    use warnings;
    use strict;
    #use Data::Dump; # Debug
    use File::Basename qw/fileparse/;
    use File::stat;

    my %filelist;
    for my $file (glob '/user/*/ws/*/BLK_*/*.txt') {
        my $bn = fileparse($file);
        my $mtime = stat($file)->mtime;
        #dd $file, $bn, $mtime; # Debug
        $filelist{$bn} = [$file,$mtime]
            unless $filelist{$bn} && $filelist{$bn}[1]>$mtime;
    }
    #dd \%filelist; # Debug
    my @filelist = sort map {$_->[0]} values %filelist;
    #dd \@filelist; # Debug

      Thanks for the help; you have understood the problem quite well. I have to sort files with the same basename, present in different directories and subdirectories, by their mtime so that only the most recently modified file ends up in @filelist. Later I have to work on these files separately to fetch some particular data, which is working fine for me. I have made changes as per your code, but these errors appear on the terminal. Please resolve this issue as I am working on a shared workspace so I can't install any package in linux.

      Use of uninitialized value $arg in stat at /user/s5e7420/local/lib/perl5/5.18.0/File/stat.pm line 206.
      Use of uninitialized value $name in index at /user/s5e7420/local/lib/perl5/5.18.0/Symbol.pm line 118.
      Use of uninitialized value $name in pattern match (m//) at /user/s5e7420/local/lib/perl5/5.18.0/Symbol.pm line 121.
      Use of uninitialized value $name in hash element at /user/s5e7420/local/lib/perl5/5.18.0/Symbol.pm line 121.
      Use of uninitialized value $name in concatenation (.) or string at /user/s5e7420/local/lib/perl5/5.18.0/Symbol.pm line 129.
      Can't call method "mtime" on an undefined value at script.pl line 13.
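The messages above indicate that stat was handed an undefined value and that the method call then failed on the undefined result. The modified script is not shown, so the actual cause is unknown; the following is only a defensive sketch. It assumes the same basename => [path, mtime] structure, uses the built-in stat instead of File::stat, and the sub name newest_per_basename is made up.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: from a list of paths, keep the newest file per
# basename, skipping entries that are undefined or cannot be stat'ed.
sub newest_per_basename {
    my (@paths) = @_;
    my %newest;    # basename => [ full path, mtime ]
    for my $file (@paths) {
        next unless defined $file && length $file;
        my $mtime = (stat $file)[9];
        unless (defined $mtime) {
            warn "Skipping '$file': $!\n";
            next;
        }
        my $bn = basename($file);
        $newest{$bn} = [ $file, $mtime ]
            unless $newest{$bn} && $newest{$bn}[1] > $mtime;
    }
    return sort map { $_->[0] } values %newest;
}

my @filelist = newest_per_basename(glob '/user/*/ws/*/BLK_*/*.txt');
print "$_\n" for @filelist;
```

Printing each $file before the stat call is a quick way to see which entry is undefined in the real script.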

        Hello again rahu_6697,

        Fellow Monk haukex has provided a full answer to your question, but I just wanted to add something here. You wrote: Please resolve this issue as I am working on a shared workspace so I can't install any package in linux.

        Programming and troubleshooting are part of the learning curve. Even if fellow Monk haukex solves today's problem for you, what will you have learned? He also gave you a hint in his sample code: the lines marked "# Debug" are there for debugging purposes. Did you try to debug your code? Debugging and troubleshooting are part of the fun/hell.

        Regarding the part that you are not able to install modules, Data::Dumper, File::stat and File::Find are core modules.

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Module::CoreList;

        my @moduleList = ("Data::Dumper", "File::stat", "File::Find");

        foreach my $module (@moduleList) {
            if (Module::CoreList::is_core($module)) {
                print "".$module." is a core module and it was released:";
                print Module::CoreList->first_release($module) . "\n";
            }
            else {
                print "".$module." it is not core module!\n";
            }
        }

        __END__

        $ perl test.pl
        Data::Dumper is a core module and it was released:5.005
        File::stat is a core module and it was released:5.004
        File::Find is a core module and it was released:5
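Note that the commented-out debug lines in haukex's sample use dd from Data::Dump, which is a CPAN module rather than a core one. A minimal sketch of the same kind of debug output with the core Data::Dumper follows; the paths and mtimes are made-up sample data in the basename => [path, mtime] shape.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
$Data::Dumper::Indent   = 1;    # compact indentation

# Made-up sample data: basename => [ full path, mtime ].
my %filelist = (
    'a.txt' => [ '/tmp/demo/one/a.txt', 1529329380 ],
    'b.txt' => [ '/tmp/demo/two/b.txt', 1529329440 ],
);

my $dump = Dumper(\%filelist);    # core stand-in for "dd \%filelist"
print $dump;
```

This needs nothing installed beyond a standard perl, which fits a shared workspace.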

        Hope this helps, BR.

        Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: GLOB function
by taint (Chaplain) on Jun 19, 2018 at 16:28 UTC
    I see others have already addressed your question well. My first thought when I read your question was mtime:
    Mtime prints the name and modification time (in seconds since the epoch) of each of the files.
    It seemed the fastest/easiest way to "stat" the file(s) you're interested in. So I searched CPAN and found File::Find::Age and newest-mtime, either of which would probably serve you well. I think this approach may make coding your solution easier, and it should also get you results faster, as drilling down a filesystem hierarchy will be the bottleneck.

    HTH

    --Chris

    Evil is good, for without it, Good would have no value
    ¡λɐp ʇɑəɹ⅁ ɐ əʌɐɥ puɐ ʻꜱdləɥ ꜱᴉɥʇ ədoH