
If you're reading in chunks, you might as well use sysread:

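sub BLOCKLENGTH () { 1 << 12 } # 4096 bytes; TIMTOWTDI =)

$_ = '';
# the 4th argument to sysread (OFFSET) appends each new block to the end of $_
while (sysread STDIN, $_, BLOCKLENGTH, length) {
    s/tag2/\t/g; # your substitution goes here
    # the 4th argument to substr replaces 0 .. -BLOCKLENGTH with '', so this
    # writes out all but the last BLOCKLENGTH bytes and keeps those as a tail
    # in case a match straddles the boundary with the next block
    syswrite STDOUT, substr($_, 0, -BLOCKLENGTH, '');
}

syswrite STDOUT, $_; # flush whatever is left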

It is better suited to the task, uses a bit less memory (sysread bypasses the buffered I/O layer), and might even be faster if your stdio implementation is slow.
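To see why the loop holds back a tail instead of writing the whole buffer, here is a small sketch of the same carry-over idea. It feeds in-memory chunks instead of calling sysread (sysread needs a real file descriptor), and the helper name replace_in_chunks is hypothetical. It assumes the pattern is no longer than the held-back tail and that the replacement text can't itself match the pattern, since the tail gets substituted again on the next pass.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a substitution to a stream that arrives in
# arbitrary chunks, holding back the last $keep characters of the buffer
# so a match split across two chunks is still seen whole.
# Assumes length($from) <= $keep and that $to cannot itself match $from.
sub replace_in_chunks {
    my ($chunks, $keep, $from, $to) = @_;
    my ($buf, $out) = ('', '');
    for my $chunk (@$chunks) {
        $buf .= $chunk;                # what sysread's OFFSET argument does
        $buf =~ s/\Q$from\E/$to/g;
        # remove and emit everything except the last $keep characters
        $out .= substr($buf, 0, -$keep, '') if length($buf) > $keep;
    }
    return $out . $buf;                # flush the held-back tail
}

my $result = replace_in_chunks(['abcta', 'g2def'], 4, 'tag2', "\t");
print "$result\n";   # prints "abc", a tab, then "def"
```

Here "tag2" is split across the two chunks, but because "ta" stays in the tail after the first chunk, the match is completed once "g2def" arrives.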

Personally, I think that perl -ne 'BEGIN{ $\ = "\n"; $/ = "tag" } chomp; s/tag2/\t/g; print' < infile > outfile is the nicest way.
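Unrolled into a script, the one-liner's use of $/ and $\ looks roughly like this. It's a sketch: the helper name records_to_lines is hypothetical, and an in-memory filehandle on a sample string stands in for infile.

```perl
use strict;
use warnings;

# Hypothetical helper: split $data into records that end in $sep
# (the one-liner's $/ = "tag") and return them with the terminator removed.
sub records_to_lines {
    my ($data, $sep) = @_;
    open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
    local $/ = $sep;            # input record separator
    my @records;
    while (my $rec = <$in>) {
        chomp $rec;             # chomp strips the current $/, not "\n"
        push @records, $rec;
    }
    close $in;
    return @records;
}

{
    local $\ = "\n";            # output record separator: print appends "\n"
    print for records_to_lines("onetagtwotagthree", "tag");
}
```

This prints one, two and three on separate lines. The last record has no trailing "tag", so chomp leaves it alone.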

Update: I thought some explanation was appropriate.

The notion of what a line is is pretty flexible, and has to be (in computing in general and in Perl specifically). Traditionally, a line ended in a carriage return and a line feed, in one order or another. Windoze still uses CR LF. Classic Mac OS used only CR, and Unix uses only LF. The one-byte solution is somewhat simpler, but since you need to support two-byte terminators anyway, why not support everything?
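As a small illustration of handling all three conventions at once, here is a sketch that splits text on CR LF, lone CR, or lone LF. The helper name split_any_newline is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into lines whether it uses CR LF
# (Windows), CR (classic Mac OS) or LF (Unix) terminators. The CR LF
# alternative comes first so a "\r\n" pair counts as one terminator.
sub split_any_newline {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

my @lines = split_any_newline("one\r\ntwo\rthree\nfour");
print join('|', @lines), "\n";    # prints "one|two|three|four"
```

On Perl 5.10 or later, /\R/ matches the same terminators (plus a few Unicode line breaks).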

Enter the concept of a record.

Treating a line as a record, either with a fixed length ($/ = \123) or ending with a certain string ($/ = "\n" gives a record that is also a line on your native system), adds the flexibility to do something like what you wanted quite easily. You're translating a record format that ends in a certain string to one that ends with newlines. $/ is the input record separator and $\ is the output record separator, BTW.
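For the fixed-length case, setting $/ to a reference to an integer makes readline return records of that many characters. A minimal sketch, using a hypothetical helper fixed_records and an in-memory string in place of a real file:

```perl
use strict;
use warnings;

# Hypothetical helper: read $data back in records of $len characters
# by setting $/ to a reference to an integer.
sub fixed_records {
    my ($data, $len) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    local $/ = \$len;           # fixed-length records instead of lines
    my @recs = <$fh>;           # list context reads every record
    close $fh;
    return @recs;
}

print join('|', fixed_records('abcdefghij', 4)), "\n";   # prints "abcd|efgh|ij"
```

The last record is simply shorter when the data doesn't divide evenly.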

-nuffin
zz zZ Z Z #!perl

In reply to Re: Re: Re: search/replace very large file w/o linebreaks by nothingmuch
in thread search/replace very large file w/o linebreaks by Anonymous Monk
