Re^3: In place editing without reading further

If the use case demands it, then so be it.

However, it sounds like you don't actually have any hard numbers. Give it a try both ways and see how long it really takes! Benchmarks are far better than random numbers pulled out of ... the air.

You can also consider a two-pass system: do the in-place edit first, and if $valueLength turns out to be too short for the new value, rewrite that file after you have finished all the easy files.
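Here is a minimal Perl sketch of the two-pass idea. It relies on assumptions the thread does not state: each file reserves a fixed-width, space-padded field of $valueLength bytes at a known byte $offset, and the new values are plain ASCII. The names try_in_place and two_pass, and the $rewrite callback that stands in for the slow full-copy path, are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical layout: a space-padded field of $valueLength bytes at byte
# $offset. Returns 1 if the field was overwritten in place, 0 if the new
# value does not fit and the file must be rewritten later.
sub try_in_place {
    my ($path, $offset, $valueLength, $newValue) = @_;
    return 0 if length($newValue) > $valueLength;    # too long: defer to pass 2

    # '+<' opens for read/write WITHOUT truncating the file
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    binmode $fh;
    seek($fh, $offset, 0) or die "seek failed on $path: $!";

    # Pad to exactly $valueLength bytes so nothing after the field moves
    print {$fh} sprintf('%-*s', $valueLength, $newValue)
        or die "write failed on $path: $!";
    close($fh) or die "close failed on $path: $!";
    return 1;
}

# Pass 1: patch every file whose new value fits.
# Pass 2: only then run the expensive full rewrite ($rewrite is a
# caller-supplied code ref, e.g. a wrapper around a copy script).
# Each job is [path, offset, valueLength, newValue].
# Returns the number of files that needed the slow path.
sub two_pass {
    my ($jobs, $rewrite) = @_;
    my @deferred;
    for my $job (@$jobs) {
        push @deferred, $job unless try_in_place(@$job);
    }
    $rewrite->(@$_) for @deferred;
    return scalar @deferred;
}
```

Opening with '+<' is what makes this an in-place edit: the write overwrites existing bytes instead of truncating the file, so only the header field is touched and the rest of a multi-gigabyte file is never read.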

Either way, benchmark it! Tell us how long it actually takes.
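One low-effort way to get those numbers from inside Perl is to time each approach with the core Time::HiRes module. In this sketch, time_runs is a hypothetical helper, and the script name in the commented usage lines is a placeholder; each code ref passed in would perform one complete version of the edit.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw(time);    # time() with sub-second resolution

# Run $code $runs times; print and return the elapsed seconds of each run.
sub time_runs {
    my ($label, $runs, $code) = @_;
    my @elapsed;
    for (1 .. $runs) {
        my $start = time();
        $code->();
        push @elapsed, time() - $start;
    }
    printf "%-10s %s\n", $label, join("\t", map { sprintf('%.3fs', $_) } @elapsed);
    return @elapsed;
}

# Hypothetical usage: one code ref per approach being compared.
# time_runs('in place', 3, sub { ... });
# time_runs('copy',     3, sub { system('perl', 'copy-script.pl') == 0 or die "copy failed" });
```

Running each approach several times matters, because later runs over the same file can be faster than the first if the operating system still has the file cached.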


Re^4: In place editing without reading further
by trippledubs (Deacon) on Jan 29, 2015 at 15:24 UTC

Home server (time per run, by file size and copy block size)

5gb - 32k	2m24.716s	2m25.723s	2m24.012s
5gb - 64k	2m23.235s	2m25.939s
5gb - 128k	2m18.724s

11gb - 32k	5m48.613s	5m50.557s	5m55.207s
11gb - 128k	5m38.264s	5m29.513s	5m38.922s

15.5gb - 128k	9m31.711s	7m45.154s	9m32.641s

Beefy server with SAN storage

14gb - 64k	2m16.941s	2m40.087s	2m30.454s
14gb - 128k	2m14.720s	2m22.201s	2m26.875s

We roughly estimate the penalty for a failure at about 40 minutes, and we discard the home server results. On the SAN server the copy script costs about two and a half minutes, and we suppose it never fails (a failure rate of 0%). Since 40 / 2.5 = 16, I loosely interpret this to mean that if the edit-in-place script fails more than once in every sixteen runs, it is not worth running; if it fails less often than that, it is worth risking damage to the file and having to redo everything.
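Written out, that break-even rule compares the expected cost of editing in place with the fixed cost of always copying. Here p is the unknown probability that a single in-place run fails, and, as a simplifying assumption, the in-place edit itself is treated as taking negligible time:

```latex
% Expected cost per run, 14gb file on the SAN server
\[
  \underbrace{p \times 40\ \text{min}}_{\text{edit in place}}
  \;<\;
  \underbrace{2.5\ \text{min}}_{\text{always copy}}
  \quad\Longleftrightarrow\quad
  p < \frac{2.5}{40} = \frac{1}{16}
\]
```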

14gb is a good estimate of how large these files will be, but as they get smaller, editing in place looks much riskier: the time saved by skipping the copy shrinks with the file, while the time penalty for a failure does not decrease proportionally.
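To see how quickly the margin shrinks, here is a rough Perl sketch that applies the same break-even rule to smaller files. It assumes, without measurements from the thread to back it, that copy time scales linearly with file size from the roughly 2.5 minutes observed for 14gb, and that the 40-minute failure penalty stays fixed regardless of size. break_even_failure_rate is a hypothetical helper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: copy time is linear in file size, anchored at ~2.5 min for 14gb
my $copyMinutesPer14gb    = 2.5;
# Assumption: the cost of a failure (redo everything) does not shrink with size
my $failurePenaltyMinutes = 40;

# Highest per-run failure probability at which editing in place still pays off
sub break_even_failure_rate {
    my ($sizeGb) = @_;
    my $copyMinutes = $copyMinutesPer14gb * $sizeGb / 14;
    return $copyMinutes / $failurePenaltyMinutes;
}

for my $sizeGb (14, 7, 3.5) {
    my $p = break_even_failure_rate($sizeGb);
    printf "%5.1fgb: in place pays off below 1 failure in %.0f runs\n", $sizeGb, 1 / $p;
}
```

Under these assumptions, each halving of the file size doubles the reliability the in-place script needs: below 1 failure in 16 runs at 14gb, 1 in 32 at 7gb, and 1 in 64 at 3.5gb.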


Re^4: In place editing without reading further
by trippledubs (Deacon) on Jan 28, 2015 at 20:21 UTC

HEADER
a1
a2
a3
a4
a5
a6
a7
a8
a9
a10
END HEADER
b1
b2
c1
c2
..
z1
z2

#!/usr/bin/env perl
use strict;
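use warnings;

open(my $fh, '<', 'message.txt') or die "Cannot open message.txt: $!";
binmode $fh; # byte offsets from tell() must line up with sysread()

# Read line by line only until the header marker; the body is never scanned.
# (Each $_ holds a single line, so the pattern must not expect text after "\n".)
my $foundHeaderEnd = 0;
LINE: while (<$fh>) {
   if (/^END HEADER$/) {
      $foundHeaderEnd = 1;
      last LINE;
   }
}
die "No END HEADER line found in message.txt\n" unless $foundHeaderEnd;

my $headerEndingPositionInBytes = tell($fh); # just past "END HEADER\n"

print "Found header ending at $headerEndingPositionInBytes\n";
sysseek($fh, 0, 0) or die "sysseek failed: $!"; # Rewind to beginning of file

my $header;
my $bytesRead = sysread $fh, $header, $headerEndingPositionInBytes;
die "sysread failed: $!" unless defined $bytesRead;

print "Read $bytesRead bytes into header variable\n";

# LIMIT -1 keeps the trailing empty field, so join() restores the final newline
my @lines = split /\n/, $header, -1;
s/^a5$/new magic/ for @lines;
$header = join "\n", @lines;

open(my $newFile, '>', 'message-fixed.txt') or die "Cannot open message-fixed.txt: $!";
binmode $newFile;
defined syswrite($newFile, $header) or die "syswrite failed: $!"; # Write the header

# Copy the rest of the file unchanged, one block at a time
my $blockSize = 32 * 1024; # 32k
my $window;
while (1) {
   my $n = sysread $fh, $window, $blockSize;
   die "sysread failed: $!" unless defined $n;
   last if $n == 0; # end of file
   defined syswrite($newFile, $window, $n) or die "syswrite failed: $!";
}
close $newFile or die "close failed: $!";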