Eugene has asked for the wisdom of the Perl Monks concerning the following question:

I need to split a string by some pattern, but I also need to keep the delimiter. For example, if my string is "text delimeter text delimeter text", the array would be
@a=(text, delimeter, text, delimeter, text).
How would I be able to do that?
If you split normally, you get @a=(text, text, text), and the delimiter is discarded.

Thanks, Eugene

Replies are listed 'Best First'.
Re: splitting
by perlmonkey (Hermit) on Apr 27, 2000 at 08:22 UTC
    If you put a capturing group like '(delimeter)' in the regular expression for split, it will return the text matched by that group as well as the normal split parts.

    This code assigns @a the normal split return values, and split also returns whatever the pattern matched between the '()', so @a becomes (text,delimeter,text,delimeter,text):
    $str = "text delimeter text delimeter text"; @a = split(/\s+(delimeter)\s+/, $str);
    If I tried @a = split(/(\s+)(delimeter)(\s+)/, $str); then @a would be (text,' ',delimeter,' ',text,' ',delimeter,' ',text), because every group contributes an element.
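The two variants above can be compared side by side. This is a minimal sketch using the same string and patterns as the post; the variable names @one and @three are only illustrative.

```perl
use strict;
use warnings;

my $str = "text delimeter text delimeter text";

# One capturing group: the captured delimiter is returned between the fields,
# and the surrounding whitespace is consumed but not returned.
my @one = split /\s+(delimeter)\s+/, $str;

# Three capturing groups: each group adds its own element for every match.
my @three = split /(\s+)(delimeter)(\s+)/, $str;

print join('|', @one),   "\n";   # text|delimeter|text|delimeter|text
print join('|', @three), "\n";   # text| |delimeter| |text| |delimeter| |text
```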
      $str = "text delimeter text delimeter text"; @a = split(/\s+(delimeter)\s+/, $str);
      Note that this won't match if the delimiter (called "delimeter" here :) is at the very beginning or the very end of your string, because the regex requires whitespace on both sides of it.
      Replacing the \s+ with \s* in the regexp would probably be better. But then, if "delimeter" was at the beginning of the string, you'd have an empty string as the first element of your array.
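The edge cases described here can be checked with a short script. This is a sketch under the assumption that an empty leading element is unwanted; the test strings and the grep cleanup are illustrative additions, not part of the original replies.

```perl
use strict;
use warnings;

my $leading  = "delimeter text delimeter text";   # delimiter at the start
my $trailing = "text delimeter text delimeter";   # delimiter at the end

# \s+ on both sides: the first "delimeter" has no whitespace before it,
# so it is not treated as a delimiter and stays glued to the first field.
my @strict = split /\s+(delimeter)\s+/, $leading;

# \s* on both sides: the first "delimeter" now matches at position 0,
# which makes split return an empty leading field.
my @loose = split /\s*(delimeter)\s*/, $leading;

# One way to discard empty fields (assumes empty strings are never wanted).
my @clean = grep { length } @loose;

# At the end of the string, split drops the trailing empty field on its own.
my @tail = split /\s*(delimeter)\s*/, $trailing;

print join('|', @strict), "\n";   # delimeter text|delimeter|text
print join('|', @loose),  "\n";   # |delimeter|text|delimeter|text
print join('|', @clean),  "\n";   # delimeter|text|delimeter|text
print join('|', @tail),   "\n";   # text|delimeter|text|delimeter
```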
Re: splitting
by BBQ (Curate) on Apr 28, 2000 at 09:21 UTC
    Do I get a fine or something for overusing foreach? But, seriously, what's wrong with this approach?
    $dlm = '~';
    $str = 'my~mind~has~been~broken';
    foreach (split(/\Q$dlm\E/, $str)) { push(@ary, $_, $dlm); }   # re-insert the delimiter after each field
    pop(@ary);                                                      # drop the extra trailing delimiter
    print join(',', @ary), "\n";
    Sure it's not pretty, but I suppose that's the goal...

    #!/home/bbq/bin/perl
    # Trust no1!
      That works if your delimiter is always the same, like ~ in your case. But what if you use a regex to match some pattern, so that your delimiter is, say, text in quotes?
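For a delimiter that varies from match to match, the capturing-group form of split keeps whatever text actually matched. A minimal sketch, assuming the delimiter is any double-quoted run with no embedded quotes; the sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: plain words separated by quoted phrases.
my $str = 'alpha "first quote" beta "second" gamma';

# The group captures each quoted phrase as it appeared in the input;
# the optional whitespace around it is consumed but not returned.
my @parts = split /\s*("[^"]*")\s*/, $str;

print join('|', @parts), "\n";   # alpha|"first quote"|beta|"second"|gamma
```

Re-pushing a fixed $dlm cannot recover this, since each delimiter is different text.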
Re: splitting
by rmgiroux (Initiate) on Apr 27, 2000 at 23:22 UTC
    I am not sure I understand the regex used in perlmonkey's and snowcrash's answers...

    Assuming your delimiters are ' ', ',', or ';', for example, doing

    perl -e'print join "!!",split /([ ,;])/,"This sentence contains, in here, a comma; and there's a semicolon clause\n";'
    gives us:
    This!! !!sentence!! !!contains!!,!!!! !!in!! !!here!!,!!!! !!a!! !!comma!!;!!!! !!and!! !!there's!! !!a!! !!semicolon!! !!clause
    
    This trick uses the fact that
    join "{some delimiter string}",@array
    
    returns the contents of @array, separated by the delimiter string.

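    As you can see, adjacent separators like ", " confuse it: each one is matched separately, leaving an empty string between them. So you might want to use a + after the character class, inside the parentheses, if you can afford to collect multiple separators into one array slot.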

    perl -e'print join "!!",split /([ ,;]+)/,"This sentence contains, in here, a comma; and there's a semicolon clause\n";'
    That's a bit better; now we get:
    This!! !!sentence!! !!contains!!, !!in!! !!here!!, !!a!! !!comma!!; !!and!! !!there's!! !!a!! !!semicolon!! !!clause
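The same comparison can be run as a script, which also sidesteps the quoting trouble that the apostrophe in "there's" causes inside a single-quoted one-liner in a POSIX shell. This is a sketch; the variable names are illustrative and the trailing newline is left off the sample sentence.

```perl
use strict;
use warnings;

my $text = "This sentence contains, in here, a comma; and there's a semicolon clause";

# One separator character per match: ", " is two matches with an empty field between.
my @single = split /([ ,;])/, $text;

# One or more separator characters per match: ", " lands in a single slot.
my @runs = split /([ ,;]+)/, $text;

my $empties = grep { $_ eq '' } @single;

print join('!!', @single), "\n";
print join('!!', @runs),   "\n";
print "empty fields with single-character matching: $empties\n";
```

Because every separator is captured, joining @runs with an empty string gives back the original sentence.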
    
    Hope this helps...
      The only tricky thing about the regexes supplied by the other two (besides the use of parentheses, which is new to me and very cool) is that they require at least one whitespace character immediately before and after the delimiter. Using an asterisk instead of the plus would make that whitespace optional.

      Does that explain it?

        The only reason I put the \s in the regex above was that the original posting wanted (text,delimeter,text,delimeter,text) with the white space removed, so my regex does not capture the white space. Perhaps I was not that clear: do whatever you want with the regex, but note that whatever is in the '()' will be returned into the array. So I could just as well have done: @array = split(/(delimeter)/, $str); But then @array would have the white space in some of the array elements:
        @array would be ("text ", "delimeter", " text ", "delimeter", " text");
        Hopefully this will clear some things up.
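Since anything inside '()' is returned, grouping that is needed only for alternation can use the non-capturing form (?:...) when the delimiter should not be kept. A small sketch; the alternative "separator" is a hypothetical second delimiter added just to give the group a reason to exist.

```perl
use strict;
use warnings;

my $str = "text delimeter text delimeter text";

# Capturing group: the matched alternative is returned in the list.
my @kept = split /\s+(delimeter|separator)\s+/, $str;

# Non-capturing group: same grouping for the alternation, nothing extra returned.
my @dropped = split /\s+(?:delimeter|separator)\s+/, $str;

print join('|', @kept),    "\n";   # text|delimeter|text|delimeter|text
print join('|', @dropped), "\n";   # text|text|text
```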
Re: splitting
by Eugene (Scribe) on Apr 28, 2000 at 09:24 UTC
    Thanks, all clear.