Rich36 has asked for the wisdom of the Perl Monks concerning the following question:

What's the best way to split a string and keep the delimiter as part of the string? I know that I could split the string into array elements, then use map on the array to add the delimiter back, but I wasn't sure if there's a better, simpler way.

I'm looking at data sort of like this:

:TAG:This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah.:TAG:This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah. :TAG:This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah. :TAG:This is just some more text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah.

Where :TAG: is the delimiter.

Thanks,
Rich36

Re: Splitting and maintaining the delimiter
by boo_radley (Parson) on May 23, 2002 at 16:12 UTC
    Use parentheses in the pattern, as described in perldoc -f split:
    use strict;
    use warnings;

    my $foo = <<HERE;
    :TAG:This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah.:TAG:This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah. :TAG:This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah. :TAG:This is just some more text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah.
    HERE

    # The capturing parentheses make split return each ':TAG:' as its own list element
    my @foo = split /(:TAG:)/, $foo;
    print join "\n", @foo;
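    The list returned by the split above alternates between text and tags. A minimal sketch of folding each tag back onto the text that follows it, assuming the string begins with ':TAG:' and using a made-up sample line:

```perl
use strict;
use warnings;

# Hypothetical sample input; assumes the string begins with the delimiter.
my $line = ':TAG:one:TAG:two:TAG:three';

# With capturing parentheses, split returns text and delimiters interleaved:
# ('', ':TAG:', 'one', ':TAG:', 'two', ':TAG:', 'three')
my @parts = split /(:TAG:)/, $line;
shift @parts if @parts && $parts[0] eq '';   # drop the empty field before the first delimiter

# Take the list two elements at a time and glue each tag onto its text.
my @chunks;
while (my ($tag, $text) = splice @parts, 0, 2) {
    push @chunks, $tag . (defined $text ? $text : '');
}

print "$_\n" for @chunks;
```

    The loop ends when splice returns an empty list, because a list assignment in boolean context yields the number of elements on its right-hand side.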
Re: Splitting and maintaining the delimiter
by Joost (Canon) on May 23, 2002 at 16:11 UTC
    Also note the following (from perldoc -f split):

    If the PATTERN contains parentheses, additional list elements are created from each matching substring in the delimiter.

        split(/([,-])/, "1-10,20", 3);

    produces the list value

        (1, '-', 10, ',', 20)
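    A small runnable check of the documented example, set against the same split without the capturing group:

```perl
use strict;
use warnings;

# Without parentheses the delimiters are discarded.
my @plain = split /[,-]/, "1-10,20", 3;

# With a capturing group, each matched delimiter is returned as an extra element.
# LIMIT caps the number of fields; the captured delimiters come in addition to them.
my @kept = split /([,-])/, "1-10,20", 3;

print join('|', @plain), "\n";   # prints 1|10|20
print join('|', @kept),  "\n";   # prints 1|-|10|,|20
```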

    I'm not very sure how useful this is for you, but then again, I'm not very sure how useful it is to keep the delimiters in the string :-).

    -- Joost
    downtime n. The period during which a system is error-free and immune from user input.
      And despite being a while out of date, this has just proven very useful for some of my messing around :) Thanks.
Re: Splitting and maintaining the delimiter
by Molt (Chaplain) on May 23, 2002 at 15:46 UTC

    If the delimiter is a constant string, joining the pieces with that same string reverses the split.

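    my $delim    = ':TAG:';
    my @data     = split /\Q$delim\E/, $line;   # \Q...\E makes the delimiter match literally
    my $rejoined = join $delim, @data;          # put the delimiters back between the pieces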
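    One caveat, shown as a sketch on a made-up line that ends with the delimiter: by default split drops trailing empty fields, so the split/join round trip is only exact when a negative LIMIT is passed.

```perl
use strict;
use warnings;

my $delim = ':TAG:';
# Hypothetical sample line that ends with the delimiter.
my $line  = ':TAG:This:TAG:is:TAG:a:TAG:test:TAG:';

# Default split discards the trailing empty field, so the final ':TAG:' is lost.
my $lossy = join $delim, split /\Q$delim\E/, $line;

# A negative LIMIT keeps trailing empty fields, so join restores the input exactly.
my $exact = join $delim, split /\Q$delim\E/, $line, -1;

print "$lossy\n";   # prints :TAG:This:TAG:is:TAG:a:TAG:test
print "$exact\n";   # prints :TAG:This:TAG:is:TAG:a:TAG:test:TAG:
```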

    If the delimiter varies, i.e. it is a regexp, then things get more fun, but I'll only think about that if needed.
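    For the regexp case, one approach is to capture the delimiter in split, so the exact text that matched is kept in the list and an empty-string join rebuilds the line. A sketch with a hypothetical delimiter (a comma or semicolon with optional surrounding spaces):

```perl
use strict;
use warnings;

# Hypothetical input where the delimiter text varies from place to place.
my $line = 'a, b;c ,d';

# Capturing the delimiter keeps the exact matched text: a, ', ', b, ';', c, ' ,', d
my @parts = split /(\s*[,;]\s*)/, $line, -1;

# Joining with the empty string puts every piece and delimiter back in order.
my $rejoined = join '', @parts;

print "$rejoined\n";   # prints a, b;c ,d
```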

    The other obvious solution is to keep a copy of the original line around; there's no point rejoining if it's practical to simply print the original.

    Update: I misunderstood the question somewhat. If you're happy keeping the tag at the start of each piece, then a lookahead assertion may be just what you're looking for. Have a look at this:
    my $data    = ":TAG:This:TAG:is:TAG:a:TAG:test:TAG:";
    my @results = split /(?=:TAG:)/, $data;   # (":TAG:This", ":TAG:is", ":TAG:a", ":TAG:test", ":TAG:")

    I've not played with these things properly so I have no idea how performant they may be, however.
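    Applied to the goal of measuring each tagged piece, a sketch along these lines (reusing the sample string from the update) keeps every tag with its text. Because the split is zero-width, the pieces also join back to the input unchanged:

```perl
use strict;
use warnings;

my $data = ':TAG:This:TAG:is:TAG:a:TAG:test:TAG:';

# The lookahead matches the empty position just before each ':TAG:',
# so every tag stays attached to the start of its piece.
my @results = split /(?=:TAG:)/, $data;

# A zero-width split consumes no characters, so the pieces rebuild $data.
my $rebuilt = join '', @results;

# Print each piece with its length (tag included).
printf "%-10s %d\n", $_, length($_) for @results;
```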

      It's not so much that I need to join the lines back together. What I've got is a sizable tagged data file. What I need to do is split it up by the tags, then get the character count of each piece, including the tags. I was just curious whether there was a better or more efficient way to do it. I thought about trying something with seek, but I believe that would involve taking two passes over the data. I'll probably just go with something like:

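      my $delim = ':TAG:';
      my @data  = split /\Q$delim\E/, $line;
      shift @data if @data && $data[0] eq '';   # a leading ':TAG:' yields an empty first field
      @data = map { $delim . $_ } @data;        # put the delimiter back on each piece
      print length($_), "\n" for @data;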
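      If building the array is a concern for a large file, a single-pass alternative is to record where each ':TAG:' starts and take the differences between successive offsets. This is a sketch, not something from the thread: it assumes the record begins with ':TAG:' (text before the first tag is ignored) and relies on @-, whose first element holds the start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical sample record; assumes it starts with ':TAG:'.
my $line = ':TAG:This:TAG:is:TAG:a:TAG:test';

# Collect the offset of every ':TAG:' in one left-to-right scan.
my @starts;
while ($line =~ /:TAG:/g) {
    push @starts, $-[0];          # $-[0] is where the current match began
}
push @starts, length $line;       # sentinel: the end of the string

# Each piece runs from one tag to the next, so its length is the gap between offsets.
my @lengths = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;

print "$_\n" for @lengths;        # prints 9, 7, 6 and 9, one per line
```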

      Thanks,
      Rich36

      UPDATE: See Molt's update on the previous post. Works great...

Re: Splitting and maintaining the delimiter
by rbc (Curate) on May 23, 2002 at 21:25 UTC
    Maybe this is what you want ...
    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        # Capture from each ':TAG:' up to (not including) the next ':TAG:' or the end of the line
        my @tags = /(:TAG:.*?)(?=:TAG:|$)/g;
        for my $tag (@tags) {
            print "$tag\n";
        }
    }

    __DATA__
    :TAG: This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah.:TAG: This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah. :TAG: This is just some text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah. :TAG: This is just some more text. blahblahblahblahblahblahblah blahblahblah. blahblah? blah.
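    The loop above works line by line, so a record that continues onto the next line would be cut short. A sketch for that case, assuming the whole file has already been read into one string (for example with $/ set to undef) and using hypothetical sample text:

```perl
use strict;
use warnings;

# Hypothetical file contents in which one tagged record spans two lines.
my $text = ":TAG:first record\ncontinues here:TAG:second record\n";

# /s lets . match newlines, and \z anchors at the very end of the string,
# so each record runs from its tag up to the next tag or the end of input.
my @records = $text =~ /(:TAG:.*?)(?=:TAG:|\z)/gs;

print length($_), "\n" for @records;
```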