snra_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks,

I have a list containing file names in the following format:

/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz
/devLog/devid234/term_logs/devid234.2009-08-27-15-05-06.terminal.log.tar.gz
/devLog/devid234/term_logs/devid234.2009-08-27-01-45-03.terminal.log.tar.gz
/devLog/devid234/term_logs/devid234.2009-08-28-00-00-01.terminal.log.tar.gz
/devLog/devid234/term_logs/devid234.2009-08-28-18-25-04.terminal.log.tar.gz
/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz
/devLog/devid168/term_logs/devid168.2009-08-27-04-02-01.terminal.log.tar.gz
/devLog/devid168/term_logs/devid168.2009-08-28-20-25-01.terminal.log.tar.gz
/devLog/devid168/term_logs/devid168.2009-08-27-17-55-01.terminal.log.tar.gz
/devLog/devid918/term_logs/devid918.2009-08-27-21-15-01.terminal.log.tar.gz
/devLog/devid918/term_logs/devid918.2009-08-27-13-25-01.terminal.log.tar.gz
/devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.tar.gz
/devLog/devid918/term_logs/devid918.2009-08-28-00-00-02.terminal.log.tar.gz
/devLog/devid918/term_logs/devid918.2009-08-28-09-45-01.terminal.log.tar.gz
/devLog/devid918/term_logs/devid918.2009-08-28-19-25-01.terminal.log.tar.gz
/devLog/devid167/term_logs/devid167.2009-08-28-02-45-01.terminal.log.tar.gz
/devLog/devid167/term_logs/devid167.2009-08-27-01-45-02.terminal.log.tar.gz
/devLog/devid167/term_logs/devid167.2009-08-27-10-55-01.terminal.log.tar.gz

Each file name contains a timestamp in YYYY-MM-DD-hh-mm-ss format.

The above list is created by reading a directory recursively using File::Find:

    use File::Find;

    sub process_file {
        # Escape the dots so they match literally, not "any character".
        push( @LogAnalyser::LogFileList, $File::Find::name )
            if /\.terminal\.log\.tar\.gz$/;
    }
    find( \&process_file, $log_dir );
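For reference, a self-contained sketch of the same collection step that can be run anywhere. It builds a small hypothetical tree with File::Temp (the layout and file names are illustrative stand-ins for the real $log_dir) and uses an anonymous callback instead of the named process_file. The dots in the pattern are escaped, so the extra notes.txt file is not collected.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical scratch tree standing in for the real $log_dir.
my $log_dir = tempdir(CLEANUP => 1);
for my $name (qw(
    devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz
    devid234/term_logs/devid234.2009-08-27-15-05-06.terminal.log.tar.gz
    devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz
    devid168/term_logs/notes.txt
)) {
    my $path = "$log_dir/$name";
    (my $dir = $path) =~ s{/[^/]+\z}{};    # strip the last path component
    make_path($dir);
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my @log_files;
find(sub {
    # $_ is the bare file name here; $File::Find::name is the full path.
    push @log_files, $File::Find::name if /\.terminal\.log\.tar\.gz\z/;
}, $log_dir);

print "$_\n" for sort @log_files;
```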


Now my question is: how can I sort the file names in each devid directory of the list by the timestamp in the log file name, so that when I read the files from the list for processing, they are read in chronological order?
I tried using the stat() function, but it didn't suit my needs, as it works on the time the file was last modified.
Thanks!

Re: Sort the file names a list
by bv (Friar) on Aug 31, 2009 at 15:46 UTC

    Sounds like you need a custom sort routine. There are several ways you could do it. Most simply (and least efficiently):

    my @sorted = sort { (split /\./, $a)[1] cmp (split /\./, $b)[1] } @LogAnalyser::LogFileList;
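The naive version relies on the directory part of each path containing no dots, so splitting on `.` leaves the timestamp in field 1. A quick check on one of the posted names:

```perl
use strict;
use warnings;

my $file = '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz';

# split /\./ yields: '/devLog/devid234/term_logs/devid234',
# '2009-08-27-23-55-11', 'terminal', 'log', 'tar', 'gz'
my $stamp = (split /\./, $file)[1];
print "$stamp\n";    # 2009-08-27-23-55-11
```

Because only the timestamp is compared, files from different devid directories end up interleaved in one global time order; a dot anywhere in a directory name would shift the fields and break the key.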

    A better way (that someone named something fancy which I cannot remember):

    my @sorted = sort datestamp @LogAnalyser::LogFileList;

    {
        # anonymous block. If you're using Perl 5.10, you could
        # use a state variable instead.
        my %cache;
        sub datestamp {
            my $left  = ( $cache{$a} ||= (split /\./, $a)[1] );
            my $right = ( $cache{$b} ||= (split /\./, $b)[1] );
            return $left cmp $right;
        }
    }

    This last one trades a little memory for speed: each file name is split only once and cached, instead of being split again for every comparison in the sort.
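For the Perl 5.10 `state` variable alternative mentioned in the code comment, a minimal sketch. It also uses `//=` rather than `||=`, so a key that happened to be false would still be cached (with `||=` it would merely be recomputed). As with the block version, the cache is never cleared, so it keeps growing across sorts. The three names are taken from the OP's list.

```perl
use strict;
use warnings;
use feature 'state';

# Orcish-style cache held in a state variable (Perl 5.10+);
# %cache persists across every call to datestamp.
sub datestamp {
    state %cache;
    my $left  = $cache{$a} //= (split /\./, $a)[1];
    my $right = $cache{$b} //= (split /\./, $b)[1];
    return $left cmp $right;
}

my @files = (
    '/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz',
    '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz',
    '/devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.tar.gz',
);
my @sorted = sort datestamp @files;
print "$_\n" for @sorted;
```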

    $,=' ';$\=',';$_=[qw,Just another Perl hacker,];print@$_;

      A better way (that someone named something fancy which I cannot remember):

      Memoizing is caching the result of a function based on its inputs. You're memoizing split.
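The core Memoize module can do the same caching without a hand-written hash. A sketch, where `stamp_of` is a hypothetical helper name. The wrapper Memoize installs adds a function call to every comparison, so this is likely slower than the inline-hash version; measure before relying on it.

```perl
use strict;
use warnings;
use Memoize;

# Key extraction: the timestamp is field 1 when splitting on dots.
sub stamp_of {
    my ($file) = @_;
    return (split /\./, $file)[1];
}
memoize('stamp_of');    # later calls with the same name hit the cache

my @files = (
    '/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz',
    '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz',
    '/devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.tar.gz',
);
my @sorted = sort { stamp_of($a) cmp stamp_of($b) } @files;
print "$_\n" for @sorted;
```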


      Less of a tradeoff:

      my @sorted = map substr($_, 19), sort map substr($_, -(19+20), 19) . $_, @LogAnalyser::LogFileList;
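The offsets come from the fixed layout of the names: `.terminal.log.tar.gz` is 20 characters and a YYYY-MM-DD-hh-mm-ss stamp is 19, so the stamp starts 39 characters from the end. A small worked check on three of the posted names, assuming every name ends in exactly that suffix:

```perl
use strict;
use warnings;

my @files = (
    '/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz',
    '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz',
    '/devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.tar.gz',
);

my $suffix_len = length '.terminal.log.tar.gz';     # 20
my $key = substr($files[0], -(19 + 20), 19);          # '2009-08-28-01-35-02'

# GRT: prepend the fixed-width key, sort as plain strings, strip the key.
my @sorted = map  { substr($_, 19) }
             sort
             map  { substr($_, -(19 + 20), 19) . $_ } @files;
print "$_\n" for @sorted;
```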

      For the data posted in the OP:

                        Rate memoized       st    naive      grt
            memoized 12517/s       --     -30%     -64%     -66%
            st       17883/s      43%       --     -48%     -52%
            naive    34555/s     176%      93%       --      -6%
            grt      36872/s     195%     106%       7%       --

      Update: Added ST.
      Update: Fixed bug in substr indexes.
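The benchmark code itself isn't shown in the thread; a comparable table can be produced with the core Benchmark module's `cmpthese`. A minimal sketch covering two of the variants, where `naive_sort` and `grt_sort` are hypothetical wrappers and the sample list is a subset of the OP's data. Absolute rates will vary by machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @files = (
    '/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz',
    '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz',
    '/devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.tar.gz',
    '/devLog/devid167/term_logs/devid167.2009-08-27-10-55-01.terminal.log.tar.gz',
);

sub naive_sort {
    my @s = sort { (split /\./, $a)[1] cmp (split /\./, $b)[1] } @files;
    return @s;
}

sub grt_sort {
    my @s = map  { substr($_, 19) }
            sort
            map  { substr($_, -(19 + 20), 19) . $_ } @files;
    return @s;
}

# A negative count means "run each sub for at least that many CPU seconds";
# cmpthese then prints a Rate/percentage table.
cmpthese(-1, {
    naive => sub { my @r = naive_sort() },
    grt   => sub { my @r = grt_sort() },
});
```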

        ikegami++! I'd seen the GRT discussed, but never tried it myself. I guess I never thought enough about it, and my first-glance assessment was that memoization was faster. That'll teach me to make assumptions without testing!

        Oh, and I saw the memoization technique called the "Orcish (or-cache) maneuver" in Perl Underground 2 (credit japhy).

        ikegami, I know you already know this, but someone reading your post could think that using your GRT code is a good idea because it is the fastest solution, when actually it is not!

        It is not good because using substr() to extract the sort keys is very fragile. If, for instance, the file name extensions change, it will extract incorrect keys and, worse, without reporting any warning or error to the user!

        Even if slower, using a regular expression to extract the keys is probably the best solution.
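A sketch of that suggestion, keeping the GRT shape but extracting the key with a regular expression. The pattern (an assumption here: a YYYY-MM-DD-hh-mm-ss stamp between two dots) no longer depends on the extension, and a name without a stamp dies instead of producing a wrong key.

```perl
use strict;
use warnings;

my @files = (
    '/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz',
    '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz',
    '/devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.tar.gz',
);

my @sorted =
    map  { substr($_, 19) }    # the prepended key is always 19 characters
    sort
    map  {
        my ($stamp) = m{\.(\d{4}(?:-\d\d){5})\.}
            or die "No timestamp in file name: $_\n";
        $stamp . $_;
    } @files;
print "$_\n" for @sorted;
```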

        In any case, there is a faster method (though not recommended either)...

        ...
        use Sort::Key qw(keysort);
        ...
        cmpthese(-3, {
            ...
            sk => '
                use strict;
                use warnings;
                my @sorted = keysort { substr($_, -(19+20), 19) } @::LogFileList;
            ',
        });
        which on my computer gives:
                          Rate memoized       st      grt    naive       sk
                memoized  8784/s       --     -26%     -51%     -54%     -58%
                st       11894/s      35%       --     -33%     -38%     -43%
                grt      17794/s     103%      50%       --      -7%     -14%
                naive    19207/s     119%      61%       8%       --      -7%
                sk       20714/s     136%      74%      16%       8%       --
        Note also that on my hardware, naive is actually faster than grt.
Re: Sort the file names a list
by ssandv (Hermit) on Aug 31, 2009 at 16:38 UTC

    It strikes me as remarkable that you say, right in your post, that you need them sorted by the timestamp in the filename, and then you say you tried stat(), which did exactly what the documentation says it does (which isn't parsing the filename). When you go so far as to specify the behavior, you should strive to write code that matches the specification. If you don't, you will continue to struggle at relatively simple programming tasks.

    "I'm trying to parse this filename string. I tried stat()" is a little like saying "I just can't seem to drive this screw. I tried a hammer, but it didn't work."

      Thanks, everyone, for your input. I've got a couple of solutions based on your replies; I'll try them out and post the results here.

Re: Sort the file names a list
by dsheroh (Monsignor) on Aug 31, 2009 at 16:03 UTC
    Erm... If your filenames are all /some/prefix/dirname/xxxYYYY-MM-DD-hh-mm-ss..., as in the example, then just doing a standard (ASCII-ordered) sort will give you the filenames in each directory in timestamp order, provided that (again, as in your example data) all files within the same dirname have the same xxx.
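A quick check of that claim on four of the posted names from two directories; nothing beyond the built-in `sort` is involved:

```perl
use strict;
use warnings;

my @files = (
    '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz',
    '/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz',
    '/devLog/devid234/term_logs/devid234.2009-08-27-15-05-06.terminal.log.tar.gz',
    '/devLog/devid168/term_logs/devid168.2009-08-27-04-02-01.terminal.log.tar.gz',
);

# Plain string sort: groups by directory, and within a directory the
# identical "devidNNN." prefix leaves the timestamp to decide the order.
my @sorted = sort @files;
print "$_\n" for @sorted;
```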

      It'd be nice if it was that easy, but his example has several different values for the "devid" prefix.


        But each devid is in its own devid directory, so in the sample provided there is only one devid per directory. The statement of the objective is also ambiguous: it says both "have all the filenames in each devid directory in the list sorted" and "all the files are read sequentially with respect to time".

        Until this ambiguity is resolved a simple sort is as valid as any of the other solutions provided, satisfying the first statement of the objective. A simple sort of the file names is certainly closer to the stated objective (sorting based on the filename) than anything based on stat of the files.

        The ambiguity can be resolved by the OP posting the desired result corresponding to the sample provided. The sample is small enough that this could be produced manually. Prose descriptions of objectives are prone to error, ambiguity and misunderstanding. Sample inputs and outputs may also have errors and may suffer from an inadequate sample of cases, leaving ambiguity. A combination of both sample data and prose description often works best.
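To make the two readings concrete, a sketch that sorts the same four sample names both ways. The regex that splits a path into directory and timestamp is an assumption based on the posted layout.

```perl
use strict;
use warnings;

my @files = (
    '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz',
    '/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz',
    '/devLog/devid234/term_logs/devid234.2009-08-27-15-05-06.terminal.log.tar.gz',
    '/devLog/devid168/term_logs/devid168.2009-08-27-04-02-01.terminal.log.tar.gz',
);

# Pull out (directory, timestamp) once per file; die on unexpected names.
my %key;
for my $f (@files) {
    my ($dir, $stamp) = $f =~ m{\A(.*)/[^/]*\.(\d{4}(?:-\d\d){5})\.[^/]*\z}
        or die "Unexpected file name: $f\n";
    $key{$f} = [ $dir, $stamp ];
}

# Reading 1: grouped by directory, time order within each directory.
my @by_dir = sort {
    $key{$a}[0] cmp $key{$b}[0] || $key{$a}[1] cmp $key{$b}[1]
} @files;

# Reading 2: a single time line across all directories.
my @by_time = sort { $key{$a}[1] cmp $key{$b}[1] } @files;

print "by directory:\n";
print "  $_\n" for @by_dir;
print "by time:\n";
print "  $_\n" for @by_time;
```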

Re: Sort the file names a list
by bichonfrise74 (Vicar) on Sep 01, 2009 at 01:36 UTC
    I know you have already seen a lot of suggestions, but just for kicks, here's a sort based on the Schwartzian Transform.
    #!/usr/bin/perl
    # http://perlmonks.org/index.pl?node_id=792392
    use strict;

    my $string;
    while( my $line = <DATA>) {
        my $date = (split( /\Q.\E/, $line ))[1];
        $string = $string . $date . " " . $line;
    }

    my @test = map { $_->[0] }
               sort { $a->[1] cmp $b->[1] }
               map { [$_, (split)[0]] }
               split( /\n/, $string);

    print join "\n", map { (split)[1] } @test;

    __DATA__
    /devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz
    /devLog/devid234/term_logs/devid234.2009-08-27-15-05-06.terminal.log.tar.gz
    /devLog/devid234/term_logs/devid234.2009-08-27-01-45-03.terminal.log.tar.gz
    /devLog/devid234/term_logs/devid234.2009-08-28-00-00-01.terminal.log.tar.gz
    /devLog/devid234/term_logs/devid234.2009-08-28-18-25-04.terminal.log.tar.gz
    /devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz
    /devLog/devid168/term_logs/devid168.2009-08-27-04-02-01.terminal.log.tar.gz
    /devLog/devid168/term_logs/devid168.2009-08-28-20-25-01.terminal.log.tar.gz
    /devLog/devid168/term_logs/devid168.2009-08-27-17-55-01.terminal.log.tar.gz
    /devLog/devid918/term_logs/devid918.2009-08-27-21-15-01.terminal.log.tar.gz
    /devLog/devid918/term_logs/devid918.2009-08-27-13-25-01.terminal.log.tar.gz
    /devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.tar.gz
    /devLog/devid918/term_logs/devid918.2009-08-28-00-00-02.terminal.log.tar.gz
    /devLog/devid918/term_logs/devid918.2009-08-28-09-45-01.terminal.log.tar.gz
    /devLog/devid918/term_logs/devid918.2009-08-28-19-25-01.terminal.log.tar.gz
    /devLog/devid167/term_logs/devid167.2009-08-28-02-45-01.terminal.log.tar.gz
    /devLog/devid167/term_logs/devid167.2009-08-27-01-45-02.terminal.log.tar.gz
    /devLog/devid167/term_logs/devid167.2009-08-27-10-55-01.terminal.log.tar.gz
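For comparison, the more common shape of the Schwartzian Transform pairs each name with its key directly, without first building and re-splitting an intermediate string. A minimal sketch on three of the posted names:

```perl
use strict;
use warnings;

my @files = (
    '/devLog/devid168/term_logs/devid168.2009-08-28-01-35-02.terminal.log.tar.gz',
    '/devLog/devid234/term_logs/devid234.2009-08-27-23-55-11.terminal.log.tar.gz',
    '/devLog/devid918/term_logs/devid918.2009-08-27-00-00-01.terminal.log.tar.gz',
);

# Pair each name with its key, sort on the key, then keep only the name.
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, (split /\./)[1] ] } @files;
print "$_\n" for @sorted;
```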