in reply to Efficient method to replace middle lines only when no match

G'day Zu,

Welcome to the monastery.

I get the feeling that there's some aspect of this that you haven't told us about. For both strings, you have three capture groups but discard $2 in each case. Also, your replacements both have "\nfield2: valueB\n" hard-coded.

However, you've said "The regex below works"; on that basis, this solution is not "slow" for either string.

#!/usr/bin/env perl -l

use strict;
use warnings;

use Time::HiRes qw{time};

my $no_field2 = "field1: valueA\nsome\nlines\nhere\nfield3: valueC\n"
    . "........................................\n" x 1000;
my $has_field2 = "field1: valueA\nfield2: valueB\nfield3: valueC\n"
    . "........................................\n" x 1000;

my $middle = "\nfield2: valueB\n";
my $re = qr{(^field1:.*?$).*?(^field3:)}ms;

replace_middle($_, $middle, $re) for ($no_field2, $has_field2);

sub replace_middle {
    my ($string, $middle, $re) = @_;

    print '-' x 40;
    print "Start:\n", substr $string, 0, 60;

    my $t0 = time;
    $string =~ s/$re/$1$middle$2/;
    my $t1 = time;

    print "Finish:\n", substr $string, 0, 60;
    print 'Time: ', $t1 - $t0;
}

Output:

----------------------------------------
Start:
field1: valueA
some
lines
here
field3: valueC
..............
Finish:
field1: valueA
field2: valueB
field3: valueC
...............
Time: 6.89029693603516e-05
----------------------------------------
Start:
field1: valueA
field2: valueB
field3: valueC
...............
Finish:
field1: valueA
field2: valueB
field3: valueC
...............
Time: 1.62124633789062e-05

-- Ken

Re^2: Efficient method to replace middle lines only when no match
by Zu (Initiate) on Mar 20, 2014 at 07:21 UTC

    Thanks for your post, Ken. I didn't see it until later; perhaps I didn't reload this page correctly.

    I did have a second capture group but it was extraneous.

    I had been testing simply using my machine's "time" command, and the has_field2 version would routinely take 4+ times longer than the no_field2 version on relatively small files. On multi-megabyte files it was a disaster. Your timing is obviously much more accurate.

    Based on an earlier post I changed the RE (and eliminated the second capture group):

    my $re = qr{field1:[^\n]*\n\K(?!^field2:).*(?=\nfield3:)}ms;

    Which performs as I would expect - dramatically better. Thanks for your help!
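
For anyone who wants to try the revised regex, here is a minimal sketch that applies it to shortened versions of Ken's two test strings. The replacement text "field2: valueB" is an assumption: the thread does not show the substitution that went with the new regex, and because \K keeps "field1: ...\n" out of the match and the lookahead leaves "\nfield3:" in place, the replacement here carries no surrounding newlines. The three-line filler tail is also shortened for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Shortened versions of the test strings from the parent post.
my $tail       = "........................................\n" x 3;
my $no_field2  = "field1: valueA\nsome\nlines\nhere\nfield3: valueC\n" . $tail;
my $has_field2 = "field1: valueA\nfield2: valueB\nfield3: valueC\n" . $tail;

# Regex as posted in the reply; \K needs Perl 5.10 or later.
my $re = qr{field1:[^\n]*\n\K(?!^field2:).*(?=\nfield3:)}ms;

# ASSUMPTION: replacement text is not given in the thread.
my $middle = "field2: valueB";

# Return a modified copy; the input string is left untouched.
sub replace_middle {
    my ($string) = @_;
    (my $copy = $string) =~ s/$re/$middle/;
    return $copy;
}

# When field2 is missing, the lines between field1 and field3 are replaced.
print replace_middle($no_field2) eq $has_field2 ? "ok" : "not ok",
    " - middle lines replaced when field2 is missing\n";

# When field2 directly follows field1, the lookahead fails and nothing changes.
print replace_middle($has_field2) eq $has_field2 ? "ok" : "not ok",
    " - string untouched when field2 is present\n";
```

When field2 is already present, the negative lookahead fails immediately after the field1 line, so the engine never scans the rest of the string; that is the likely source of the speed-up. One thing to watch: under /s the greedy .* runs to the last "\nfield3:" in the input, so a file holding several records may want .*? instead.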