newbio has asked for the wisdom of the Perl Monks concerning the following question:

Example: Using a ( ( modified yeast ( in PCNA promoter ) ) one-hybrid ( screen ) ) to identify potential ( SRF cofactors ) in cell

Hello Dear Monks,

Is there a nice way to handle a recursive split operation in Perl? For example, I want to split the sentence above recursively, using the parentheses together with the surrounding spaces as the delimiter. Each pass should produce the following:

First pass - each line below may be stored as an array element:
Using a
( modified yeast ( in PCNA promoter ) ) one-hybrid ( screen )
to identify potential
SRF cofactors
in cell

Second pass:
modified yeast ( in PCNA promoter )
one-hybrid
screen

Third pass:
modified yeast
in PCNA promoter

After dividing the sentence recursively, I want to join the split elements to get the original sentence back.

I tried looping using normal split but it does not seem to work.

Any clues? Thanks a lot.

Re: recursive split
by ikegami (Patriarch) on Sep 03, 2009 at 18:03 UTC
Re: recursive split
by SuicideJunkie (Vicar) on Sep 03, 2009 at 17:31 UTC

    Split does not modify the original scalar; it returns a list. If you want to split the elements of that list, you need to split each of the elements and then collect those mini lists into a bigger list.

    Write yourself a subroutine that takes a list, splits each of the things in that list, and pushes the results into a big list to return.

    You can then call that sub as many times as you like without having to use recursion (starting with a list of one item, your original string).
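A minimal sketch of such a subroutine follows. The name `split_each` and the choice to pass the delimiter pattern as the first argument are assumptions made for illustration, not something from the reply. Note that a plain pattern split like this is flat: it knows nothing about nesting, so it does not by itself give the passes asked for in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: split every string in the list on $re and collect
# all the resulting pieces into one flat list, dropping empty pieces.
sub split_each {
    my ($re, @list) = @_;
    my @out;
    for my $item (@list) {
        push @out, grep { length } split $re, $item;
    }
    return @out;
}

my $sentence = 'Using a ( modified yeast ) one-hybrid';

# First call: start with a list of one item, the original string.
my @chunks = split_each(qr/\s*[()]\s*/, $sentence);

# Second call: feed the result straight back in, no recursion needed.
my @words = split_each(qr/\s+/, @chunks);

print "$_\n" for @chunks;
print "$_\n" for @words;
```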

      Thanks for your reply. Actually, I need the passes to look the way they are shown above, so that I can process the content within the parentheses at each recursive step if required. I tried shellwords, but that does not work here either.
        Sounds like you want a queue of things to do next then.
        • List of things to process in the next iteration = startstring
        • While there are things to process in the next iteration
          • things to process this iteration = things to process next iteration. Things to process next iteration = ()
          • while things to process this iteration
            • shift list of things to do this iteration, and split that thing
            • ponder the split thing and print any revelations.
            • add to the list of things to process next iteration if appropriate
        You probably don't want to do recursion if you want to print everything from one level before moving on to the next.
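The queue outlined above could be sketched roughly as follows. The helper names `split_top_level` and `split_by_level` are hypothetical, and the depth-counting splitter is one possible way to "split that thing" so that the passes match the ones in the question; the reply itself does not say how to split. This sketch discards the parentheses it splits on, so rebuilding the original sentence would need extra bookkeeping that is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: split one string into its top-level pieces, i.e. runs
# of plain text and the contents of each outermost "( ... )" group with the
# outer parentheses removed. Pieces are trimmed; empty pieces are skipped.
sub split_top_level {
    my ($str) = @_;
    my @pieces;
    my $depth = 0;
    my $buf   = '';

    my $flush = sub {
        $buf =~ s/^\s+|\s+$//g;
        push @pieces, $buf if length $buf;
        $buf = '';
    };

    for my $tok (split /([()])/, $str) {
        if ($tok eq '(') {
            if ($depth == 0) { $flush->() } else { $buf .= $tok }
            $depth++;
        }
        elsif ($tok eq ')') {
            $depth--;
            die "unbalanced ')' in '$str'\n" if $depth < 0;
            if ($depth == 0) { $flush->() } else { $buf .= $tok }
        }
        else {
            $buf .= $tok;
        }
    }
    die "unbalanced '(' in '$str'\n" if $depth > 0;
    $flush->();
    return @pieces;
}

# Process level by level with two queues; returns one array ref per pass.
sub split_by_level {
    my ($start) = @_;
    my @levels;
    my @next = ($start);
    while (@next) {
        my @current = @next;          # things to process this iteration
        @next = ();
        my @found;
        while (@current) {
            my $thing = shift @current;
            my @parts = split_top_level($thing);
            push @found, @parts;
            # Only pieces that still contain parentheses need another pass.
            push @next, grep { /[()]/ } @parts;
        }
        push @levels, \@found;
    }
    return @levels;
}

my $sentence = 'Using a ( ( modified yeast ( in PCNA promoter ) ) one-hybrid '
             . '( screen ) ) to identify potential ( SRF cofactors ) in cell';

my @levels = split_by_level($sentence);
for my $i (0 .. $#levels) {
    print "Pass ", $i + 1, ":\n";
    print "  $_\n" for @{ $levels[$i] };
}
```

Because each level is finished before the next one starts, everything from one pass is available for printing or processing before any deeper pieces are touched.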
Re: recursive split
by grizzley (Chaplain) on Sep 04, 2009 at 08:15 UTC

    Do you really need to do it recursively? What about

    @parts=split/(?=\(|\))/; print for @parts;

    You get the following output:

    Using a ( ( modified yeast ( in PCNA promoter ) ) one-hybrid ( screen ) ) to identify potential ( SRF cofactors ) in cell

    Now you can process the list in a simple loop. At each step, if the first character is an opening parenthesis, increment a variable holding the nesting level; if it is a closing parenthesis, decrement it. Then do your processing.

    You don't even have to split, just use a regexp:
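The loop described above might look like the following sketch. The sub name `annotate_depth` and the return shape (a list of `[depth, text]` pairs) are assumptions made for illustration. Since the lookahead split is zero-width and consumes nothing, joining the parts with an empty string gives back the original sentence.

```perl
use strict;
use warnings;

# Hypothetical helper: split before every parenthesis and tag each part
# with the nesting level that applies to it.
sub annotate_depth {
    my ($str) = @_;
    my $depth = 0;
    my @out;
    for my $part (split /(?=[()])/, $str) {
        my $first = substr $part, 0, 1;
        $depth++ if $first eq '(';    # part opens a new group
        $depth-- if $first eq ')';    # part follows the end of a group
        push @out, [ $depth, $part ];
    }
    return @out;
}

my $sentence = 'a ( b ( c ) d ) e';
my @tagged   = annotate_depth($sentence);

# Indent each part by its nesting level.
print '  ' x $_->[0], $_->[1], "\n" for @tagged;

# Nothing was consumed by the split, so this restores the input.
my $rejoined = join '', map { $_->[1] } @tagged;
print "round trip ok\n" if $rejoined eq $sentence;
```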

    for (/(?:^|\(|\))[^()]*/g) { print }   # or some processing