Re: Efficient method to replace middle lines only when no match

G'day Zu,

Welcome to the monastery.

I get the feeling that there's some aspect of this that you haven't told us about. For both strings, you have three capture groups but discard $2 in each case. Also, your replacements both have "\nfield2: valueB\n" hard-coded.

However, you've said "The regex below works"; on that basis, this solution is not "slow" for either string.

#!/usr/bin/env perl -l

use strict;
use warnings;

use Time::HiRes qw{time};

my $no_field2 = "field1: valueA\nsome\nlines\nhere\nfield3: valueC\n"
                . "........................................\n" x 1000;
my $has_field2 = "field1: valueA\nfield2: valueB\nfield3: valueC\n"
                . "........................................\n" x 1000;
my $middle = "\nfield2: valueB\n";

my $re = qr{(^field1:.*?$).*?(^field3:)}ms;

replace_middle($_, $middle, $re) for ($no_field2, $has_field2);

sub replace_middle {
    my ($string, $middle, $re) = @_;

    print '-' x 40;
    print "Start:\n", substr $string, 0, 60;

    my $t0 = time;

    $string =~ s/$re/$1$middle$2/;

    my $t1 = time;

    print "Finish:\n", substr $string, 0, 60;
    print 'Time:   ', $t1 - $t0;
}

Output:

----------------------------------------
Start:
field1: valueA
some
lines
here
field3: valueC
..............
Finish:
field1: valueA
field2: valueB
field3: valueC
...............
Time:   6.89029693603516e-05
----------------------------------------
Start:
field1: valueA
field2: valueB
field3: valueC
...............
Finish:
field1: valueA
field2: valueB
field3: valueC
...............
Time:   1.62124633789062e-05

-- Ken

Re^2: Efficient method to replace middle lines only when no match
by Zu (Initiate) on Mar 20, 2014 at 07:21 UTC

Thanks for your post, Ken. I didn't see it until later; perhaps I didn't reload this page correctly.

I did have a second capture group but it was extraneous.

I had been testing simply with my machine's "time" command, and the has_field2 version would routinely take 4+ times longer than the no_field2 version on relatively small files. On multi-megabyte files it was a disaster. Your timing is obviously much more accurate.
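A shell `time` run measures Perl startup and compilation along with the substitution, and a single pair of `Time::HiRes::time` calls measures just one run, so both can be noisy for very fast operations. One way to get steadier numbers is the core Benchmark module. The sketch below reuses the test strings and regex from Ken's script; `cmpthese(-1, ...)` runs each snippet for at least one CPU second and prints a comparison table. The snippet labels are arbitrary, and the rates will vary by machine and Perl version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw{cmpthese};

# Same test data and regex as Ken's script
my $no_field2  = "field1: valueA\nsome\nlines\nhere\nfield3: valueC\n"
               . "........................................\n" x 1000;
my $has_field2 = "field1: valueA\nfield2: valueB\nfield3: valueC\n"
               . "........................................\n" x 1000;
my $middle = "\nfield2: valueB\n";
my $re     = qr{(^field1:.*?$).*?(^field3:)}ms;

# Each snippet substitutes on a fresh copy, so every iteration does the same work.
# A negative count means "run for at least that many CPU seconds".
cmpthese(-1, {
    no_field2  => sub { (my $s = $no_field2)  =~ s/$re/$1$middle$2/ },
    has_field2 => sub { (my $s = $has_field2) =~ s/$re/$1$middle$2/ },
});
```

Running many iterations averages out timer resolution and scheduling noise, which a single measurement cannot do.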

Based on an earlier post I changed the RE (and eliminated the second capture group):

my $re = qr{field1:[^\n]*\n\K(?!^field2:).*(?=\nfield3:)}ms;

It performs as I would expect: dramatically better. Thanks for your help!
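To show how this regex handles the two cases, here is a minimal sketch using shortened versions of Ken's test strings. `\K` excludes everything matched so far (the field1 line) from the replaced text. The `(?!^field2:)` lookahead makes the whole match fail when field2 already follows field1, so that string is left untouched. The helper name `add_field2` and the replacement text `field2: valueB` are illustrative assumptions, not part of the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Zu's regex: \K keeps the field1 line out of the replaced text, and
# (?!^field2:) prevents a match when field2 already follows field1.
my $re = qr{field1:[^\n]*\n\K(?!^field2:).*(?=\nfield3:)}ms;

# Hypothetical helper: returns the (possibly) modified copy and whether s/// fired.
sub add_field2 {
    my ($string) = @_;                      # work on a copy
    my $count = $string =~ s/$re/field2: valueB/;
    return ($string, $count ? 1 : 0);
}

for my $input ("field1: valueA\nsome\nlines\nhere\nfield3: valueC\n",
               "field1: valueA\nfield2: valueB\nfield3: valueC\n") {
    my ($output, $count) = add_field2($input);
    print "Substitutions: $count\n", $output, '-' x 20, "\n";
}
```

Note that the `.*` here is greedy under `/s`, so it runs to the end of the string and backtracks to the last `\nfield3:`. That is fine for one record per string; with several records in one string, a non-greedy `.*?` would keep the match inside the first record.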
