Emanuel has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I'm running into a problem using split with a regex. Forgive me for the following bad regex; it's just something quick and dirty:

@lines = split(/(\d\d:\d\d:\d\d[a-zA-Z].*?[A-Z][A-Z][A-Z][A-Z]+)/, $data);

This works fine for $data like

00:01:00Something here bla bla blaTYPE00:02:00Something here bla bla blaANOTHERTYPE00:03:00Something here bla bla blaEVENMORETYPES

However, it (obviously) doesn't work for something like

00:01:00Something here bla bla blaType00:02:00Something here bla bla blaTypetoooo00:03:00Something here bla bla blaTYPETHREE

I'm trying to build a regex that makes split break the string at the digits that start the next 'entry'. The outcome I need is:

0 => 00:01:00Something here bla bla blaType
1 => 00:02:00Something here bla bla blaTypetoooo
2 => 00:03:00Something here bla bla blaTYPETHREE

Any help on this is highly appreciated.

regards
Emanuel

Replies are listed 'Best First'.
Re: split with regex
by rasta (Hermit) on Nov 01, 2002 at 14:44 UTC
    I believe this should be helpful:

    split /(?=\d\d:\d\d:\d\d)/, $data;
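To illustrate why the lookahead keeps the timestamps, here is a minimal sketch using the mixed-case sample from the question. The variable names are only for illustration. Because `(?=...)` matches a zero-width position, split cuts in front of each timestamp without consuming it, and a zero-width match at the very start of the string does not produce an empty leading field.

```perl
use strict;
use warnings;

# Sample input from the question (the mixed-case variant).
my $data = "00:01:00Something here bla bla blaType"
         . "00:02:00Something here bla bla blaTypetoooo"
         . "00:03:00Something here bla bla blaTYPETHREE";

# The lookahead matches the empty string just before each timestamp,
# so split cuts there and every timestamp stays attached to its entry.
my @lines = split /(?=\d\d:\d\d:\d\d)/, $data;

print "$_ => $lines[$_]\n" for 0 .. $#lines;
```

This should print the three entries in the `0 => ...`, `1 => ...`, `2 => ...` form requested in the question.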

      rasta provides an elegant solution to your problem, if you need to use split. I tend to avoid fancy things like lookahead assertions if possible, if only to simplify maintenance. Also, since I was curious, I thought I'd benchmark our two solutions.

      my $data = "00:01:00Something here bla bla blaTYPE00:02:00".
                 "Something here bla bla blaANOTHERTYPE00:03:00S".
                 "omething here bla bla blaEVENMORETYPES";

      use Benchmark;

      timethese (100000, {
          withsplit => sub {
              my @lines = split /(?=\d\d:\d\d:\d\d)/, $data;
          },
          nosplit => sub {
              my @lines = $data =~ /(\d{2}:\d{2}:\d{2}[^\d]*)/g;
          }
      } );

      ...gives me...

      Benchmark: timing 100000 iterations of nosplit, withsplit...
         nosplit:  6 wallclock secs ( 7.09 usr +  0.00 sys =  7.09 CPU) @ 14104.37/s (n=100000)
       withsplit: 13 wallclock secs (13.85 usr +  0.00 sys = 13.85 CPU) @ 7220.22/s (n=100000)

      I don't know if it's split or the lookahead that's slowing things down, but I thought you might be interested in my results anyway.

      -Bird
Re: split with regex
by Bird (Pilgrim) on Nov 01, 2002 at 15:05 UTC

    If you're trying to keep all the data you're matching, I don't know if split is the best solution. You should be fine just using a global match with capturing. Something like...

    my $data = "00:01:00Something here bla bla blaTYPE00:02:00".
               "Something here bla bla blaANOTHERTYPE00:03:00S".
               "omething here bla bla blaEVENMORETYPES";

    my $moredata = "00:01:00Something here bla bla blaType00:0".
                   "2:00Something here bla bla blaTypetoooo00:".
                   "03:00Something here bla bla blaTYPETHREE";

    my @lines     = $data     =~ /(\d{2}:\d{2}:\d{2}[^\d]*)/g;
    my @morelines = $moredata =~ /(\d{2}:\d{2}:\d{2}[^\d]*)/g;

    print "$_\n" for @lines;
    print "\n";
    print "$_\n" for @morelines;

    This assumes that the text between the digits won't contain any other digits. You may need to modify the [^\d]* portion of the regex if any digits may appear in the text section. From your examples, though, this appears to do what you need.
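One possible way to relax the no-digits assumption, sketched here with made-up input, is to replace `[^\d]*` with a non-greedy `.*?` that stops at the next timestamp or at the end of the string. The sample text ("room 12", "3 items") is hypothetical and not taken from the original data. Text that itself contains something shaped like `dd:dd:dd` would still be cut there.

```perl
use strict;
use warnings;

# Hypothetical input: the free text contains digits of its own.
my $data = "00:01:00Meeting in room 12TYPE"
         . "00:02:00Ordered 3 itemsAnotherType";

# Match a timestamp, then as little as possible up to the next
# timestamp or the end of the string. The /s flag lets . match newlines.
my @lines = $data =~ /(\d{2}:\d{2}:\d{2}.*?(?=\d{2}:\d{2}:\d{2}|\z))/gs;

print "$_\n" for @lines;
```

This brings back a lookahead, so it trades some of the simplicity of the `[^\d]*` version for tolerance of digits in the text.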

    -Bird
Re: split with regex
by Jaap (Curate) on Nov 01, 2002 at 14:06 UTC
    How about this:
    split (/(\d{2}\:\d{2}\:\d{2})/, $data);
      The problem here is that it rips off the leading digits, but I do need them. That's my dilemma: I can't seem to find a proper regex for this.
        No, it does not, because the regex is surrounded by (). The time is stored in a separate array element.
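A small sketch of what the capturing version returns, and one way to glue the pieces back together, assuming the input always starts with a timestamp. The names `@parts`, `$time` and `$text` are only for illustration. With capturing parentheses, split returns each separator as its own element, and because the string begins with a separator the first field is empty.

```perl
use strict;
use warnings;

my $data = "00:01:00Something here bla bla blaType"
         . "00:02:00Something here bla bla blaTypetoooo";

# Capturing parentheses make split return the separators too:
# ("", "00:01:00", "Something ...", "00:02:00", "Something ...")
my @parts = split /(\d{2}:\d{2}:\d{2})/, $data;

# The string starts with a separator, so the first field is empty; drop it.
shift @parts if @parts && $parts[0] eq '';

# Glue each timestamp back onto the text that follows it.
my @lines;
while (my ($time, $text) = splice(@parts, 0, 2)) {
    push @lines, $time . (defined $text ? $text : '');
}

print "$_\n" for @lines;
```

So the digits are not lost, but they need this extra joining step, which the lookahead and global-match approaches avoid.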