bryank has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have another question that is probably super simple, but I keep hitting a wall. Let's say I have a list of paths that represent a tree, like so:

Valentine's Day Valentine's Day|Cards Valentine's Day|Flowers Valentine's|Candy Valentine's|Telegrams
It is assumed that the the order represents the tree's order. The problem is that the tree has some structure errors -- parents are missing in certain locations. I want to create something that would create the missing paths where necessary. For example:

Valentine's Day Valentine's Day|Cards Valentine's Day|Flowers Valentine's Valentine's|Candy Valentine's|Telegrams

I realize the above structure makes no sense, but I wanted to illustrate the problem with my current script. It won't notice 'Valentine's' as a path that needs to be created, because the term exists in the previous path. I think I need to tighten my regex, but I am not sure. Can someone help me out? Thanks!

use strict; #use warnings; use Data::Dumper; while(<DATA>) { chomp; my $path = $_; my @paths = split /\|/, $path; my $parent = $paths[-2]; my $child = $paths[-1]; my $prev_path; if ($prev_path !~/^$parent$/) { my $message = 'created new path!'; my @new_paths = @paths; my $discard = pop @new_paths; my $new_path = join '|', @new_paths; print $new_path . "\t" . $message . "\n"; } print $path . "\n"; $prev_path = $path; } __DATA__ Valentine's Day Valentine's Day|Cards Valentine's Day|Flowers Valentine's|Candy Valentine's|Telegrams

Replies are listed 'Best First'.
Re: Tree path analyzer regex problem (maybe other issues)?
by ikegami (Patriarch) on Jun 23, 2009 at 16:33 UTC

    It won't notice 'Valentine's' as a path that needs to be created,

    That's not what I see. I see it creating path "Valentine's Day" twice and path "Valentine's" twice.

    Valentine's Day Valentine's Day created new path! Valentine's Day|Cards Valentine's Day created new path! Valentine's Day|Flowers Valentine's created new path! Valentine's|Candy Valentine's created new path! Valentine's|Telegrams

    For some reasons, you commented out use warnings; which tells you why you're creating the same path more than once.


    By the way,

    $prev_path !~ /^$parent$/

    can fail since $parent contains plain text, not a regex pattern. You want

    $prev_path !~ /^\Q$parent\E$/
    or just
    $prev_path ne $parent
Re: Tree path analyzer regex problem (maybe other issues)?
by ikegami (Patriarch) on Jun 23, 2009 at 16:50 UTC

    Here is a more robust solution (input doesn't need to be ordered, supports multi-level paths) that produces sorted output.

    use strict; use warnings; sub set { my $p = \shift; $p = \( $$p->{$_} ) for @_; } sub flatten { my $node = shift; print(join('|', @_), "\n") if @_; return if !$node; for (sort keys %$node) { flatten($node->{$_}, @_, $_); } } { my $tree; while (<DATA>) { chomp; set($tree, split(/\|/)); } flatten($tree); } __DATA__ Valentine's Day Valentine's Day|Cards Valentine's Day|Flowers Valentine's|Candy Valentine's|Telegrams
    Valentine's Valentine's|Candy Valentine's|Telegrams Valentine's Day Valentine's Day|Cards Valentine's Day|Flowers
      That is a thing of beauty -- thanks. Question. I'd like to flag any new tree nodes that are created. I wasn't sure where I'd insert that messaging in your script...?
        It wasn't designed to do that. To keep the strengths of this method and produce a listing of changes, you could diff the sorted input with the output.
Re: Tree path analyzer regex problem (maybe other issues)?
by kennethk (Abbot) on Jun 23, 2009 at 16:36 UTC
    Given that you have fixed strings, eq and ne are likely better choices than regular expressions. Also, note that your code has assumptions about depth in it - the fact that you had to turn off warnings is usually a sign you have a design issue. Perhaps something more like:

    use strict; use warnings; use Data::Dumper; my @last = (); while(<DATA>) { chomp; my @paths = split /\|/; my $different = 0; foreach my $i (0 .. $#last) { last if $i > $#paths; if ($different or $last[$i] ne $paths[$i]) { $different = 1; print join('|',@paths[0 .. $i]),"\n"; } } foreach my $i ($#last+1 .. $#paths) { print join('|',@paths[0 .. $i]),"\n"; } @last = @paths; } __DATA__ Valentine's Day Valentine's Day|Cards Valentine's Day|Flowers Valentine's|Candy Valentine's|Telegrams
      Thanks -- that does exactly what I want. Can you explain what some of the magic does? Specifically:

      foreach my $i (0 .. $#last)

      and

      foreach my $i (0 .. $#last)

        I would not describe that as magic. Range Operators (written as ..) produce a list of integers between the two limits. As described in Variable names, $#array returns the index of the last element of @array. "Magic" usually involves (from my perspective) either action-at-a-distance or Perl typing in the background. These are just operators.