in reply to Parse a list of path strings into a nested hash

Often clean and fast go hand in hand:

use strict; use warnings; use Benchmark qw(cmpthese); srand (1); my @testPaths = map { join '\\', ${['c:', 'd:', 'e:']}[rand 2], map {int rand 100} 0 .. rand 6 } 1 .. 100; cmpthese ( -3, { JDP_c => sub {paths2treeJDP_c (@testPaths)}, JDP => sub {paths2treeJDP (@testPaths)}, GF => sub {paths2TreeGF (@testPaths)}, } ); sub paths2TreeGF { my %pTree; for my $path (@_) { my @parts = split /\\/, $path; my $scan = \%pTree; $scan = $scan->{shift @parts} ||= {} while @parts; } return \%pTree; } sub paths2treeJDP { my $hr = {}; @{$hr}{@_} = map {{}} @_; my $n_repls; do { $n_repls = 0; for (sort {length ($b) <=> length ($a)} keys %$hr) { if (/(.*)\\(.*)/) { $hr->{$1}{$2} = delete $hr->{$_}; $n_repls++; } } } while ($n_repls); $hr } sub paths2treeJDP_c { my $hr = {map {$_ => {}} @_}; my $n_repls; do { $n_repls = 0; for (keys %$hr) { if (/(.*)\\(.*)/) { $hr->{$1}{$2} = delete $hr->{$_}; $n_repls++; } } } while ($n_repls); $hr }

Prints:

Rate JDP JDP_c GF JDP 625/s -- -19% -47% JDP_c 773/s 24% -- -34% GF 1175/s 88% 52% --

The _c variant omits the sort for a modest improvement but the more interesting improvement is omitting the need for the delete.

True laziness is hard work

Replies are listed 'Best First'.
Re^2: Parse a list of path strings into a nested hash
by Jim (Curate) on Sep 06, 2010 at 01:52 UTC

    Can you please briefly explain paths2TreeGF to us, especially the tricky bit? It looks slick. I just can't get my head around it.

    $scan = $scan->{shift @parts} ||= {} while @parts;

    Thanks.

      Sure. For each path split the path into parts:

      my @parts = split /\\/, $path;

      Then for each part in the path starting at the 'root' (drive specifier for example) add it to the parent hash. %pTree is the parent hash for the whole tree so we start out by setting $scan to reference that:

      my $scan = \%pTree;

      then the 'tricky' bit adds the parts. shift @parts removes the left most part of the path. $scan->{shift @parts} ||= {} will create a new hash element if there isn't one already. In any case it returns the hash ref for the current part of the path and that becomes the new parent $scan = ... while @parts;. Taken all together the 'tricky' bit walks down the tree generating entries as required:

      $scan = $scan->{shift @parts} ||= {} while @parts;

      A few important tricks:

      1. $scan->{shift @parts} ||= {} autovivifies (creates) an entry in the hash if it doesn't exist yet.
      2. ||= {} only assigns a new (empty) hash if a new entry has been created (or in the nasty special case that the directory name is 0).
      3. assignment operators (=, ||=, +=, ...) 'return' the value assigned which is how $scan gets updated.

      Note that with Perl 5.10 and later you can use //= in place of ||= and it fixes the 0 issue mentioned in item 2.

      True laziness is hard work
Re^2: Parse a list of path strings into a nested hash
by repellent (Priest) on Sep 09, 2010 at 23:37 UTC
    Here's another way that produces a tree whose leaf nodes have undef values:
    sub paths2treeREPEL { my %tree; for (@_) { my $last = \\%tree; $last = \$$last->{$_} for split /\\/; } return \%tree; }

    FWIW:
    Rate JDP JDP_c GF REPEL JDP 881/s -- -23% -53% -63% JDP_c 1140/s 29% -- -39% -52% GF 1867/s 112% 64% -- -21% REPEL 2377/s 170% 109% 27% --