in reply to Non deliminatd Nested text

Greetings Dr.Altaica,

You have an interesting situation. I'm sure there must be a module out there that does nearly exactly what you want, but since you specifically asked for something that will work "only in the standard distribution of Perl," here's my code:

#!/usr/bin/perl -w
use strict;

my $input = '[S [NP This NP] [VP is [NP [NP the turning point NP] '
          . '[PP to [NP the left NP] PP] NP] VP] . S]';

# Strip the outer [S ... S] wrapper.
$input =~ s/^\[S\s*(.*?)\s*S\]$/$1/;

my $bracket_count = 0;
my @output;
my $build_var;

# Walk the string one character at a time, counting brackets; whenever the
# count is back at zero, the text collected so far is one top-level piece.
foreach (split(//, $input)) {
    $build_var .= $_;
    if (/\[/) {
        $bracket_count++;
    } elsif (/\]/) {
        $bracket_count--;
    }
    if ($bracket_count == 0) {
        push @output, $build_var if ($build_var ne ' ');   # skip lone separating spaces
        $build_var = '';
    }
}

foreach (@output) {
    print '"', $_, '"', "\n";
}

It's not fancy or nice, and it will probably need some patching for input that falls outside your example: any plain words outside the brackets, for instance, come out one character at a time. For your example, though, it does return the results you want.
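For comparison, here is a different technique (not the one used above): Perl 5.10 and later support regex recursion in the core language, so a balanced [...] group can be matched with a recursive pattern and runs of plain text stay together instead of coming out one character at a time. A minimal sketch, assuming Perl 5.10+; split_top_level_re is a hypothetical name, and the wrapper-stripping pattern is generalized from S to any word label.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;   # regex recursion (?1) and possessive ++ need Perl 5.10+

# Hypothetical helper: return the top-level pieces of one bracketed phrase.
sub split_top_level_re {
    my ($input) = @_;

    # Drop the outer [LABEL ... LABEL] wrapper (any word label, not just S).
    $input =~ s/^\[\w+\s*(.*?)\s*\w+\]$/$1/;

    my @parts;
    # Alternative 1 (group 1): a balanced [...] group; (?1) re-enters group 1
    # for each nested group. Alternative 2 (group 2): a run of bracket-free text.
    while ($input =~ /(\[(?:[^\[\]]++|(?1))*\])|([^\[\]]+)/g) {
        my $chunk = defined $1 ? $1 : $2;
        $chunk =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        push @parts, $chunk if length $chunk;  # skip the spaces between groups
    }
    return @parts;
}

my $sentence = '[S [NP This NP] [VP is [NP [NP the turning point NP] '
             . '[PP to [NP the left NP] PP] NP] VP] . S]';
print qq{"$_"\n} for split_top_level_re($sentence);
```

One caveat of this design: a stray bracket simply fails to match either alternative and is skipped silently, so input still needs validating if malformed strings are possible.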

-gryphon
code('Perl') || die;

Re: Re: Non deliminatd Nested text
by tommyw (Hermit) on Oct 26, 2001 at 05:56 UTC

    Just in case, check whether $bracket_count ever drops to -1. If your string is aaa]bbb[..., you probably want to do something a little more drastic than pushing (a, a, a, ]bbb[) into the output array :)
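The check suggested above can be sketched as a separate validation pass run before splitting. This is a minimal sketch: check_brackets is a hypothetical helper, and dying with the character offset is just one possible choice of "something more drastic".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: die unless every ']' closes an earlier '[' and every
# '[' is closed by the end of the string.
sub check_brackets {
    my ($text) = @_;
    my @chars  = split //, $text;
    my $depth  = 0;
    for my $i (0 .. $#chars) {
        if    ($chars[$i] eq '[') { $depth++ }
        elsif ($chars[$i] eq ']') { $depth-- }
        # A negative count is the "aaa]bbb[" case: a ']' with no matching '['.
        die "unexpected ']' at offset $i\n" if $depth < 0;
    }
    die "$depth unclosed '[' at end of input\n" if $depth > 0;
    return 1;
}

for my $text ('[NP the left NP] [NP other thing NP]', 'aaa]bbb[') {
    if (eval { check_brackets($text) }) {
        print "ok:  $text\n";
    } else {
        chomp(my $err = $@);
        print "bad: $text ($err)\n";
    }
}
```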

Re: Re: Non deliminatd Nested text
by Dr.Altaica (Scribe) on Dec 10, 2001 at 16:07 UTC
    Thanks gryphon. It's not exactly what I wanted, but that was my fault; I wasn't clear about what I wanted. I should have used an example like splitting '[VP This stuff is [NP the left NP] [NP other thing NP] VP]' into ("This stuff is", "[NP the left NP]", "[NP other thing NP]"). Here's code that works the way I need, in case someone else needs to split anchor-delimited strings (in this case, splitting between the space and the bracket in " [" and "] ") while ignoring the nested ones.
    #!/usr/bin/perl -w
    use strict;

    #my $input = '[VP is [NP one NP] it [NP two NP] working VP]';
    my $input = '[VP This stuff is [NP the left NP] [NP other thing NP] VP]';
    #my $input = '[VP [NP This NP] [VP is [NP [NP the turning point NP] [PP to [NP the left NP] PP] NP] VP] . VP]';
    #my $input = '[S [NP This NP] [VP is [NP [NP the turning point NP] ' .
    #            '[PP to [NP the left NP] PP] NP] VP] . S]';

    # Strip the outer [LABEL ... LABEL] wrapper and mark the end with "\n".
    $input =~ s/^\[\w+\s*(.*?)\s*\w+\]$/$1\n/;

    my $bracket_count = 0;
    my @output;
    my $build_var = '';   # start empty so -w doesn't warn when the input begins with '['

    foreach (split(//, $input)) {
        if (/\[/) {
            if ($bracket_count == 0) {                             #DR
                $build_var =~ s/^\s*|\s*$//g;                      #dr
                push @output, $build_var if ($build_var ne '');    #DR
                $build_var = '';                                   #DR
            }                                                      #DR
            $bracket_count++;
            $build_var .= $_;                                      #dr
        } elsif (/\]/) {
            $bracket_count--;
            $build_var .= $_;                                      #dr
            if ($bracket_count == 0) {                             #DR
                $build_var =~ s/^\s*|\s*$//g;                      #dr
                push @output, $build_var if ($build_var ne '');    #DR
                $build_var = '';                                   #DR
            }                                                      #DR
        } elsif (/\n/) {                                           #dr Should be the end
            $build_var =~ s/^\s*|\s*$//g;                          #dr
            push @output, $build_var if ($build_var ne '');        #DR
        } else {                                                   #dr
            $build_var .= $_;                                      #dr
        }
    }

    foreach (@output) {
        print '"', $_, '"', "\n";
    }
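To run several of the sample inputs in one go, the same loop can be wrapped in a subroutine. This is a sketch, not Dr.Altaica's exact code: split_anchor_delimited is a hypothetical name, the three copies of the trim-and-push step are folded into one closure, and an explicit final flush replaces the appended "\n" end marker.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper: return the top-level pieces (plain-text runs and
# complete [...] groups) inside one [LABEL ... LABEL] phrase.
sub split_anchor_delimited {
    my ($input) = @_;
    $input =~ s/^\[\w+\s*(.*?)\s*\w+\]$/$1/;   # drop the outer wrapper

    my $depth = 0;
    my $buf   = '';
    my @out;

    # Trim the buffer, keep it if anything is left, then start a new piece.
    my $flush = sub {
        $buf =~ s/^\s+|\s+$//g;
        push @out, $buf if length $buf;
        $buf = '';
    };

    for my $ch (split //, $input) {
        if ($ch eq '[') {
            $flush->() if $depth == 0;   # plain text before a top-level group ends here
            $depth++;
            $buf .= $ch;
        } elsif ($ch eq ']') {
            $depth--;
            $buf .= $ch;
            $flush->() if $depth == 0;   # a top-level group just closed
        } else {
            $buf .= $ch;
        }
    }
    $flush->();                          # text left over at the end of the phrase
    return @out;
}

for my $input ('[VP is [NP one NP] it [NP two NP] working VP]',
               '[VP This stuff is [NP the left NP] [NP other thing NP] VP]') {
    print join(', ', map { qq{"$_"} } split_anchor_delimited($input)), "\n";
}
```

Keeping the trim-and-push rule in a single closure means a later change to how pieces are trimmed or filtered only has to be made in one place.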