Re: Parsing newick trees

A non-eval solution. This constructs a AoA. I'd define constants: use constant { LEFT => 0, RIGHT => 1 }; rather than use a HoH for this, but converting the code to produce a HoH is trivial. As coded, it does require an absence of spaces, so strip them first if they might be present:

Updated with a slightly less verbose version

#! perl -slw
use strict;
use Data::Dumper;

sub newickFromString {
    my $in = shift;
    return $in unless $in =~ tr[(),][(),];

    my( $depth, $p ) = 0;
    for ( 0 .. length( $in ) -1 ) { ## find split point
        my $c = substr $in, $_, 1;
        ++$depth if $c eq '(';
        --$depth if $c eq ')';
        $p = $_ and last if $c eq ',' and $depth == 1;
    }
    return [
        newickFromString( substr $in, 1, $p-1 ),
        newickFromString( substr $in, $p+1, -1 )
    ];
}


my $input = "((Alpha,(Bravo,((Charlie,(Delta,Echo)),(Foxtrot,Golf)))),
+Hotel)";
my $newick = newickFromString( $input );

print Dumper $newick;

__END__
c:\test>junk8
$VAR1 = [
          [
            'Alpha',
            [
              'Bravo',
              [
                [
                  'Charlie',
                  [
                    'Delta',
                    'Echo'
                  ]
                ],
                [
                  'Foxtrot',
                  'Golf'
                ]
              ]
            ]
          ],
          'Hotel'
        ];
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Comment on Re: Parsing newick trees Select or Download Code

Replies are listed 'Best First'.
Re^2: Parsing newick trees by JadeNB (Chaplain) on Oct 17, 2008 at 21:32 UTC
Rather than doing all the `substr`'s, couldn't you just do `$in = s/^$(.*)$$/$1/; my @splat = split(/,/, $in); my ( $depth, $left ); while ( my $c = shift @splat ) { $left .= ( $left ? ',' : '' ) . $c; ++$depth if index($c, '(') >= $[; --$depth if index($c, ')') >= $[; last if $depth == 0; } my $right = join(',', @splat);` [download] and then recurse on `$left` and `$right`? (Maybe searching each time for the parentheses is as hard as `substr`'ing, though.) UPDATE: Per BrowserUK's response, the answer seems to be “Yes, but why?” I always thought that `substr`'s were very expensive, but apparently not. Hurrah for premature optimisation!	[reply] [d/l] [select]
Re^3: Parsing newick trees by BrowserUk (Patriarch) on Oct 17, 2008 at 21:52 UTC
Rather than doing all the substr's, What's wrong with substr? You can do dozens (if not hundreds) of substrs in the time it takes to start the regex engine. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]