Re: Re: Re: search/replace very large file w/o linebreaks

If you're reading in chunks you might as well use sysread:

sub BLOCKLENGTH () { 1 << 12 }; # TIMTOWTDI =)

$_ = '';
while(sysread STDIN, $_, BLOCKLENGTH, length){
     s///g; # your substitution goes here
     # the fourth argument ('') removes bytes 0 .. -BLOCKLENGTH from $_;
     # substr returns the removed part, and the tail is rescanned next round
     syswrite STDOUT, substr($_, 0, -BLOCKLENGTH, '');
};

syswrite STDOUT, $_;

It is more suited for the task, takes a bit less memory, and might even be faster if your stdio is stoned.

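Personally, I think that perl -pe 'BEGIN{ $\ = "\n"; $/ = "tag" } chomp; s/tag2/\t/g' < infile > outfile is the nicest way. (No explicit print is needed: -p already prints $_ after each record, and an extra print would emit every record twice.)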
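For anyone who prefers a script to a one-liner, here is a rough sketch of the same idea. The sub name retag and the sample tags "<br>" and "<sep>" are made up for illustration (they stand in for "tag" and "tag2"), and the in-memory filehandles are only there so the snippet runs without real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read records ending in $record_tag from $in,
# turn every $field_tag into a tab, and write one record per line to $out.
sub retag {
    my ($in, $out, $record_tag, $field_tag) = @_;
    local $/ = $record_tag;   # input record separator: a record ends at the tag
    local $\ = "\n";          # output record separator: print appends a newline
    while (my $rec = <$in>) {
        chomp $rec;                       # strips a trailing $/ if present
        $rec =~ s/\Q$field_tag\E/\t/g;    # \Q..\E: treat the tag as a literal
        print {$out} $rec;
    }
}

# Made-up sample data, read from and written to in-memory filehandles.
my $input  = "a<sep>b<br>c<sep>d<br>";
my $result = '';
open my $in,  '<', \$input  or die "open: $!";
open my $out, '>', \$result or die "open: $!";
retag($in, $out, "<br>", "<sep>");
close $out;
print $result;   # two lines: "a\tb" and "c\td"
```

Swapping the in-memory handles for \*STDIN and \*STDOUT should give the same behaviour as the one-liner, only one record is held in memory at a time.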

Update: I thought some explanation was appropriate.

The notion of what a line is is pretty flexible, and has to be (in computing in general, and in Perl specifically). A line traditionally ended in a carriage return and a line feed, in one order or another. Windoze still uses that. MacOS uses only CR, and ~UNIX uses only LF (I might be confused). The one-byte solution is somewhat simpler. But since you need to support two bytes in case they come, why not support everything?

Enter the concept of a record.

Treating a line as a record, with either a fixed length ($/ = \123) or a certain terminating string ($/ = "\n" gives a record which is also a line on your native system), adds the flexibility to do something like you wanted quite easily. You're translating a record format that ends in a certain string into one that ends with newlines. $/ is the input record separator and $\ is the output record separator, BTW.
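A quick illustration of the fixed-length case, as a sketch only: setting the input record separator $/ to a reference to an integer makes readline return records of that many bytes. The sample string and the record size of 4 are arbitrary choices for the demo.

```perl
use strict;
use warnings;

my $data = "abcdefghij";                  # made-up sample data
open my $fh, '<', \$data or die "open: $!";

my @records;
{
    local $/ = \4;                        # fixed-size records of 4 bytes
    while (my $rec = <$fh>) {
        push @records, $rec;              # the last record may be shorter
    }
}
print join("|", @records), "\n";          # prints "abcd|efgh|ij"
```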

-nuffin
zz zZ Z Z #!perl


Replies are listed 'Best First'.
Re: Re: Re: Re: search/replace very large file w/o linebreaks by borisz (Canon) on Jan 09, 2004 at 11:48 UTC
But the `sysread` example sucks whenever a tag straddles a BLOCKLENGTH boundary. Update: I was wrong and promise to read the snippets more carefully. Want some XP? Ok! -- Boris
Re: Re: Re: Re: Re: search/replace very large file w/o linebreaks by nothingmuch (Priest) on Jan 09, 2004 at 11:51 UTC
ysth's example and subsequently mine both use substr to work around it. -nuffin zz zZ Z Z #!perl
Re: Re: Re: Re: Re: search/replace very large file w/o linebreaks by ysth (Canon) on Jan 11, 2004 at 03:13 UTC
That's worked around by scanning each piece of the file twice (except for the very beginning and end), once with more data before it and once with more data after it. If this is a run-once-only problem, try using a tag as the input record separator first, and only if that fails with out-of-memory go to another approach. One other way would be to pick a common character or string from the file that won't appear in any tag (or appears only at the end of any tags it is in). For instance, ' ', '>', or 'the' might work, given appropriate data.
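A minimal sketch of that last suggestion, assuming (hypothetically) that every tag ends in '>' and that '>' appears nowhere else inside a tag. With '>' as the input record separator, each record ends at the end of a tag or at EOF, so a tag never straddles two records. The sample data and the <br> substitution are made up.

```perl
use strict;
use warnings;

my $data = "one<br>two<hr>three<br>four";   # made-up sample data
open my $in, '<', \$data or die "open: $!";

my $out = '';
{
    local $/ = ">";                  # records end where a tag ends
    while (my $rec = <$in>) {
        $rec =~ s/<br>/\n/g;         # hypothetical substitution
        $out .= $rec;                # a real script would print instead
    }
}
print $out, "\n";
```

Whether this keeps memory use down depends on how often the chosen character really occurs in the data.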
Re: Re: Re: Re: Re: Re: search/replace very large file w/o linebreaks by nothingmuch (Priest) on Jan 11, 2004 at 13:22 UTC
I just upvoted this node, and saw it reach zero. For something obscene, rude, or wrong it may be reasonable to downvote. But if a node is as coherent as the one this is a reply to, please explain your motives. It's really frustrating! -nuffin zz zZ Z Z #!perl
Re: Re: Re: Re: Re: Re: Re: search/replace very large file w/o linebreaks by ysth (Canon) on Jan 11, 2004 at 17:39 UTC