Re: Insert at regular increments
by broquaint (Abbot) on Jul 22, 2002 at 19:45 UTC
substr($_, 321, 1) = "\n"
substr() is the closest thing to referencing strings by index in perl (esp. thanks to its lvalueability).
HTH
_________ broquaint
Thanks. I didn't know you could use substr as an lvalue; that's very handy. One comment: I think I'd need to say
substr($_, 321, 0) = "\n"
to avoid replacing some poor unsuspecting character with the newline.
And one question: how would I use this approach to add newlines after every record in the file (i.e. after every 320 characters)?
how would I use this approach to add newlines after every record in the file (i.e. after every 320 characters)?
Take advantage of the $/ variable:
{
    open(my $data, '<', "data_file") or die("ack - $!");
    open(my $out,  '>', "out_file")  or die("ack - $!");
    local $/ = \320;    # read the input in fixed 320-character records
    while (<$data>) {
        chomp;
        print $out $_ . "\n";
    }
}
HTH
_________ broquaint
Re: Insert at regular increments
by dws (Chancellor) on Jul 22, 2002 at 19:45 UTC
This seems to work, but I'm not crazy about the solution. The s/// wasn't intuitive to me, but it was the first solution I found.
It may not be intuitive to you, but it's a fine solution, and one that wouldn't offend the sensibilities of most Perl hackers.
Re: Insert at regular increments
by sauoq (Abbot) on Jul 22, 2002 at 22:14 UTC
Using s/// for this is actually pretty straightforward and readable.
I'd reconsider your seek() though. Consider that if you don't have any newlines in your file, your code reads the whole file after one iteration of your for (<in>) loop. So, just use it... there's no need to reread the file. Something like:
$_ = <IN>;
unless (tr/\n//) {
    s/(.{320})/$1\n/g;
}
print OUT;
-sauoq
"My two cents aren't worth a dime."
Re: Insert at regular increments
by fokat (Deacon) on Jul 23, 2002 at 04:09 UTC
You did not mention anything about portability, but having been involved in this recently, I want to help fellow monks avoid this pitfall.
You probably want to avoid having the local concept of a "line terminator" cause you trouble, so whenever you need to deal with files like this, use binmode. This will prevent Perl from stripping or altering the line breaks when you're on a different system from the one used to create the file in the first place.
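For example (just a sketch; the file names here are placeholders):

# Open both handles and switch them to raw mode so CR/LF bytes pass
# through untouched, whatever platform the script runs on.
open(my $in,  '<', 'data_file')       or die "can't read data_file: $!";
open(my $out, '>', 'data_file.fixed') or die "can't write data_file.fixed: $!";
binmode($in);
binmode($out);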
Regards.
Re: Insert at regular increments
by Abigail-II (Bishop) on Jul 23, 2002 at 12:49 UTC
Does that mean that "good" lines are 322 characters long? That is, 320 characters + CR LF? Because if it is, any solution that blindly splits on 320 characters, be it by using s///, substr or by setting $/ = \320, will fail on files that have good lines in them.
Abigail
Thanks, Abigail. You're correct: "good" lines should be 322 characters, after inserting the CR LF. Your warning is right on, and it's why I wanted to test whether the file already had any newlines before altering it. From what I saw in the sample data I was given, it was an all-or-nothing problem: the file was either already "good", with all the newlines in the right locations, or else it was missing newlines completely, and needed the \n inserted after every 320 chars.
Today, things have changed a bit (sigh). I'm told the files I'm working with *sometimes* have LFs but not CRs, although I don't have samples like this. Also, the record separator in the "good" files appears to be "CR CR LF" instead of "CR LF", for whatever reason. (These files are output from a buggy 3rd-party program, which we're not likely to get patched.)
I'll need to take these oddities into account, and it sounds like I can't make any assumptions about the files I've got to work with. I'm going to try to work up something very general that simply makes sure each line of each file is 320 chars with the proper record separator and warns me if it sees an oddball I didn't know about. You've all given me ideas for nice approaches to this. Much thanks to everyone!
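Roughly what I have in mind (just a sketch; the output file name, the separator I normalize to, and the warning text are all placeholders):

use strict;
use warnings;

my $file = shift or die "usage: $0 file\n";
open my $in, '<', $file or die "can't read $file: $!";
binmode $in;
my $data = do { local $/; <$in> };   # slurp the whole file
close $in;

open my $out, '>', "$file.fixed" or die "can't write $file.fixed: $!";
binmode $out;

# Walk the file record by record: 320 data characters, then whatever
# separator (if any) actually follows them.
my $rec_no = 0;
while ($data =~ /\G(.{320})(\r\r\n|\r\n|\r|\n|)/gcs) {
    $rec_no++;
    warn "$file: record $rec_no has odd separator " . unpack("H*", $2) . "\n"
        if $2 ne "\r\n" && $2 ne '';
    print $out $1, "\r\n";           # always write the proper separator
}
warn "$file: leftover bytes after record $rec_no\n"
    if (pos($data) || 0) != length $data;
close $out;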
-Peter
Re: Insert at regular increments
by mugwumpjism (Hermit) on Jul 23, 2002 at 12:12 UTC
Here's one very perlish way to do it. Remember to test for your "sanity" conditions: in your above code you are assuming that each input line is an even multiple of 320 characters long, so the very least you should do is die() if that is not the case.
open IN, $file
    or die "Cannot open $file for reading; $!";
my $c = 0;
my @lines = map {
    $c++;
    chomp;
    die "Input line $c in $file irregular:\n$_\n"
        if (length % 320);
    m/(.{320})/g    # in list context, returns every 320-char chunk
} <IN>;
close IN;
if (@lines) {
    open OUT, ">$file.new"
        or die "Can't open $file.new for output; $!";
    print OUT $_,"\n" foreach (@lines);
    close OUT;
}
Re: Insert at regular increments
by Aristotle (Chancellor) on Jul 23, 2002 at 14:25 UTC
use constant LEN   => 320;
use constant DELIM => "\r\n";

my ($len, $offs) = (LEN, 0);
while (read IN, $_, $len, $offs) {
    print OUT $_, "\n";
    # Peek at the next two bytes: if they are the delimiter, start the next
    # record from scratch; otherwise keep them and read the rest of the
    # record after them.
    read IN, $_, length DELIM;
    ($len, $offs) = $_ eq DELIM ? (LEN, 0) : (LEN - length DELIM, length DELIM);
}
____________ Makeshifts last the longest.
Re: Insert at regular increments
by zengargoyle (Deacon) on Jul 23, 2002 at 20:19 UTC
What characters are allowed in the 320? It may be possible to slurp in the whole file and remove ANY CR or LF characters (a partially correct file is then the same as the 'missing all terminators' file). Then you only have to worry about fixing up one type of file.
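A rough sketch of that idea (the file names and the 320-char record size are assumptions here):

open my $in, '<', 'data_file' or die "ack - $!";
binmode $in;
my $data = do { local $/; <$in> };    # slurp the whole file
close $in;

$data =~ tr/\r\n//d;                  # remove ANY CR or LF characters
die "length is not a multiple of 320\n" if length($data) % 320;

open my $out, '>', 'data_file.fixed' or die "ack - $!";
binmode $out;
print $out "$1\r\n" while $data =~ /(.{320})/gs;   # re-add terminators
close $out;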