in reply to In place editing of text files
You are getting the funny output because you are using buffered I/O on the file and you are neither flushing the buffers nor setting the file position when you switch from reading to writing.
When you read the first line from the file, perl actually reads a whole buffer full of data. Your file is very small, so it fits in the buffer and is read with a single read from the system. Perl then returns the first line from the buffer to your program, leaving its file position at the next character: the start of the second line. The buffer contains the entire file:
    Dit is een voorbeeldtekst
    The quick brown fox jumps over the lazy dog.
    SOURCEREPOSITORYNAME
    Roses are red, violets are blue
    And Osama Is coming To Kill you
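If you want to see the two positions for yourself, here is a small sketch. The helper name `positions_after_first_line` is made up, and so is the two-line sample file (built from your text). It reads one line, then asks for perl's position with `tell` and for the operating system's position with `sysseek`, which bypasses perl's buffer. On a typical build the first should be the start of the second line and the second should be the end of the file, but exactly how much perl reads is an implementation detail of its I/O layers.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);

# Returns perl's position and the operating system's position
# after reading just one line through perl's buffered I/O.
sub positions_after_first_line {
    my ($path) = @_;
    open my $fh, '+<', $path or die "Can't open $path: $!";
    binmode $fh;                              # byte offsets, no CRLF translation
    my $first   = <$fh>;                      # perl fills its buffer here
    my $logical = tell $fh;                   # perl: start of the second line
    my $os      = sysseek $fh, 0, SEEK_CUR;   # OS: wherever read() left the fd
    close $fh or die "Can't close $path: $!";
    return ($logical, $os + 0);               # turns "0 but true" into 0
}

my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "Dit is een voorbeeldtekst\r\n",
             "The quick brown fox jumps over the lazy dog.\r\n";
close $tmp or die "Can't close $path: $!";

my ($logical, $os) = positions_after_first_line($path);
printf "tell: %d, sysseek: %d, file size: %d\n", $logical, $os, -s $path;
```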
This line does not match your RE, so you then write this line back to the same file handle. This writes the data back to the buffer (but not yet to disk). Since the current file position is the start of the second line, the string (a copy of the first line of the file) overwrites the second line in the buffer.
The number of characters overwritten is two more than the number of characters on the line. This is because, in addition to the characters you see on the line, there is a line termination. On Windows (I deduce you are on Windows from the path of your file), line termination is two characters: carriage return and line feed.
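You can check the two-character count with the `:crlf` layer, which is perl's default on Windows and can be requested explicitly on any platform. In this sketch (`bytes_on_disk_with_crlf` is a made-up helper) the string inside perl holds a single `"\n"`, but the file it produces is one byte longer:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Writes $text through the :crlf layer (perl's default on Windows)
# and returns how many bytes ended up in the file.
sub bytes_on_disk_with_crlf {
    my ($text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp or die "Can't close $path: $!";
    open my $out, '>:crlf', $path or die "Can't write $path: $!";
    print {$out} $text;                  # each "\n" becomes "\r\n" on disk
    close $out or die "Can't close $path: $!";
    return -s $path;
}

my $line = "Dit is een voorbeeldtekst\n";
printf "in perl: %d characters, on disk: %d bytes\n",
    length $line, bytes_on_disk_with_crlf($line);
```

For your first line this should report 26 characters in perl and 27 bytes on disk: the 25 printing characters plus carriage return and line feed.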
Count the characters in the first line (25), add two, and move that many characters into the second line: you will see that you have just overwritten the second line up to and including the 'o' of "over", introducing a new line termination. This leaves the current file position at the 'v' of "over" on what was the second line. The buffer now contains:
    Dit is een voorbeeldtekst
    Dit is een voorbeeldtekst
    ver the lazy dog.
    SOURCEREPOSITORYNAME
    Roses are red, violets are blue
    And Osama Is coming To Kill you
Next you read another line from the file. This reads from the modified buffer, starting at the current file position, which is at the 'v' of "over". You get all the text up to the next end of line: thus you get only the remainder of what was the second line of the file. Now the current file position is at the beginning of what was the third line. I say "what was" because new line terminations are being introduced, changing the total number of lines. The buffer content hasn't changed as a result of this read but the current position has.
Again, this "line" of text does not match your RE, so you write it back to the same file handle. You are now overwriting the third line with the text of a portion of the second line. It happens that you overwrite everything up to and including the 'M' near the end of the third line. This leaves the current file position at the 'E' near the end of the third line (remember the line termination: the last printing character is not quite the end of the line). The buffer now contains:
    Dit is een voorbeeldtekst
    Dit is een voorbeeldtekst
    ver the lazy dog.
    ver the lazy dog.
    E
    Roses are red, violets are blue
    And Osama Is coming To Kill you
Next you read the remainder of what was the third line. Again, this doesn't match your RE, so you write it out, overwriting the beginning of what was your fourth line: in this case, the first three characters. Now your buffer contains:
    Dit is een voorbeeldtekst
    Dit is een voorbeeldtekst
    ver the lazy dog.
    ver the lazy dog.
    E
    E
    es are red, violets are blue
    And Osama Is coming To Kill you
The same happens with the remainder of the fourth line and with the fifth line, after which the buffer contains:
    Dit is een voorbeeldtekst
    Dit is een voorbeeldtekst
    ver the lazy dog.
    ver the lazy dog.
    E
    E
    es are red, violets are blue
    es are red, violets are blue
    u
    u
Since the entire file still fits within the buffer, no I/O has been done to disk. All your I/O, since the initial read of the file from disk, has been contained within perl and has been updating perl's file position without changing the system's idea of current file position at all.
When you close the file handle your buffer is flushed to disk. But where does it write it???
At the system level, remember, perl read the entire file in a single read, filling its buffer. This left the system with a file open for read/write and positioned at the end of the file. Now perl comes along and writes its buffer full of data. Since you haven't done a seek to change the current file position, the buffer is written just past the end of the initial content, effectively appending the mixed up buffer to the end of the file.
Now you print your file and see the odd content that you posted.
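The whole sequence can be replayed as a toy model. This sketch does not touch perl's I/O layers at all: it treats the buffer as a string and perl's file position as an index into it, uses only the first four lines of your file, and assumes none of them match your RE, so every line is written back. The name `replay_read_write` is made up.

```perl
use strict;
use warnings;

# Toy model of the explanation above, not of perl's real I/O code:
# the buffer is a string, perl's file position is an index into it,
# and the OS position stays at the end of the original data.
# Every line is written back, since none of them matched the RE.
sub replay_read_write {
    my ($original) = @_;
    my $buf = $original;          # the single read() pulled in the whole file
    my $pos = 0;                  # perl's idea of the current file position
    while ($pos < length $buf) {
        # readline: everything up to and including the next "\n"
        my $nl   = index $buf, "\n", $pos;
        my $end  = $nl < 0 ? length $buf : $nl + 1;
        my $line = substr $buf, $pos, $end - $pos;
        $pos = $end;
        # print: overwrite the buffer at the current position (may grow it)
        substr($buf, $pos, length $line) = $line;
        $pos += length $line;
    }
    # close: the buffer is flushed at the OS position, i.e. appended
    return ($buf, $original . $buf);
}

my $original = "Dit is een voorbeeldtekst\r\n"
             . "The quick brown fox jumps over the lazy dog.\r\n"
             . "SOURCEREPOSITORYNAME\r\n"
             . "Roses are red, violets are blue\r\n";

my ($buffer, $file) = replay_read_write($original);
print "buffer before close:\n", $buffer =~ s/\r\n/\n/gr;
```

The modelled buffer passes through the same states shown above, and the returned file is the original content with that buffer appended after it.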
There are other issues with alternating between read and write. You can read about some of them in the documentation for open and seek, and in Mixing Reads and Writes. It is sometimes, but not often, the right thing to do.
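When mixing reads and writes is appropriate, the usual rule (see the seek documentation) is to seek whenever you switch between reading and writing; `seek($fh, 0, SEEK_CUR)` does that without moving. A sketch, assuming the replacement is exactly as long as the line it replaces (here `uc` on ASCII text); `upcase_matching_lines` is a hypothetical name:

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR);

# Upper-cases every line matching $re, in place. Only safe because uc()
# keeps an ASCII line the same length; a longer or shorter replacement
# would clobber the next line or leave junk behind, as described above.
sub upcase_matching_lines {
    my ($path, $re) = @_;
    open my $fh, '+<', $path or die "Can't open $path: $!";
    binmode $fh;                              # positions are byte offsets
    while (1) {
        my $start = tell $fh;                 # where this line begins
        my $line  = <$fh>;
        last unless defined $line;
        next unless $line =~ $re;
        seek $fh, $start, SEEK_SET or die "seek: $!";   # reading -> writing
        print {$fh} uc $line       or die "print: $!";
        seek $fh, 0, SEEK_CUR      or die "seek: $!";   # writing -> reading
    }
    close $fh or die "Can't close $path: $!";
}
```

The second seek looks like a no-op, but it flushes the write and resynchronises the position before the next read.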
Most of the time it is better to write a new file and then, after closing both the original and the new file, replace the original with the new one. This is what the '-i' option does.
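A sketch of that approach; `edit_via_new_file` and its callback argument are hypothetical names. The temporary file is created in the same directory so the final rename stays on one filesystem, and the original's permission bits are copied because File::Temp creates files with restrictive permissions.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# $edit is a callback: it receives one line and returns its replacement,
# which may be any length, since nothing is overwritten in place.
sub edit_via_new_file {
    my ($path, $edit) = @_;
    open my $in, '<', $path or die "Can't read $path: $!";
    # Same directory as the original, so rename() doesn't cross filesystems
    my ($out, $tmp) = tempfile(DIR => dirname($path), UNLINK => 0);
    while (my $line = <$in>) {
        print {$out} $edit->($line) or die "Can't write $tmp: $!";
    }
    my $mode = (stat $in)[2] & 07777;         # original permission bits
    close $in  or die "Can't close $path: $!";
    close $out or die "Can't close $tmp: $!";
    chmod $mode, $tmp  or die "Can't chmod $tmp: $!";
    rename $tmp, $path or die "Can't rename $tmp to $path: $!";
}
```

From the command line, `perl -i.bak -pe 's/OLD/NEW/' yourfile` does the same job (OLD and NEW are placeholders) and keeps the original as `yourfile.bak`.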