in reply to Parsing newick trees

A non-eval solution. This constructs a AoA. I'd define constants: use constant { LEFT => 0, RIGHT => 1 }; rather than use a HoH for this, but converting the code to produce a HoH is trivial. As coded, it does require an absence of spaces, so strip them first if they might be present:

Updated with a slightly less verbose version

#! perl -slw use strict; use Data::Dumper; sub newickFromString { my $in = shift; return $in unless $in =~ tr[(),][(),]; my( $depth, $p ) = 0; for ( 0 .. length( $in ) -1 ) { ## find split point my $c = substr $in, $_, 1; ++$depth if $c eq '('; --$depth if $c eq ')'; $p = $_ and last if $c eq ',' and $depth == 1; } return [ newickFromString( substr $in, 1, $p-1 ), newickFromString( substr $in, $p+1, -1 ) ]; } my $input = "((Alpha,(Bravo,((Charlie,(Delta,Echo)),(Foxtrot,Golf)))), +Hotel)"; my $newick = newickFromString( $input ); print Dumper $newick; __END__ c:\test>junk8 $VAR1 = [ [ 'Alpha', [ 'Bravo', [ [ 'Charlie', [ 'Delta', 'Echo' ] ], [ 'Foxtrot', 'Golf' ] ] ] ], 'Hotel' ];

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.
Re^2: Parsing newick trees
by JadeNB (Chaplain) on Oct 17, 2008 at 21:32 UTC
    Rather than doing all the substr's, couldn't you just do
    $in = s/^\((.*)\)$/$1/; my @splat = split(/,/, $in); my ( $depth, $left ); while ( my $c = shift @splat ) { $left .= ( $left ? ',' : '' ) . $c; ++$depth if index($c, '(') >= $[; --$depth if index($c, ')') >= $[; last if $depth == 0; } my $right = join(',', @splat);
    and then recurse on $left and $right? (Maybe searching each time for the parentheses is as hard as substr'ing, though.)

    UPDATE: Per BrowserUK's response, the answer seems to be “Yes, but why?” I always thought that substr's were very expensive, but apparently not. Hurrah for premature optimisation!