gio001 has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I am encountering some issues trying to do the following in ksh88, so I want to explore a smart Perl approach, but my knowledge is limited. Can you help?
I have 2 files. The first (indexFile1) contains lines with the start offset and length for each record inside the second file, so just 2 numbers separated by a space. The second file can be very large; each record's start offset and length is defined by the corresponding entry in indexFile1. Since there are no record separators, wc -l returns 0 for the second file, no matter how large it actually is.
I want to gather the records from the large file one at a time and write them out to a new file individually. What is the best way to approach this processing?
I suspect I will have trouble in ksh88 reading the whole large file into a variable (using awk) and then using a cut command on the variable contents to collect each record, in the form:
FileContent=$(awk '{print $0}' largeFile2)   # this is where I think I have a problem

# LINE contains start and offset identifying each record in largeFile2
while read LINE; do
    pass=1
    for results in $LINE; do
        if [[ $pass -eq 1 ]]; then
            val1=$results
            pass=2
        else
            val2=$results
        fi
    done
    (( from = $val1 + 1 ))
    (( to = $val1 + $val2 ))
    newOut=$(echo $FileContent | cut -c $from-$to)
    echo $newOut >> newfile
done < indexFile1
I have it working OK for small sizes of largeFile2, but I can see a problem when the size of file2 gets large. I hope you can give me some suggestions on how to do this better. Thanks!

Replies are listed 'Best First'.
Re: Read offset into other files
by GrandFather (Saint) on Oct 18, 2008 at 05:07 UTC

    Yup, your problem starts right at the first line as you suspect. What you have is not Perl even though you state you "want to explore a smart perl approach". Here is a sample Perl program that demonstrates the techniques you need to implement your task:

    use warnings;
    use strict;
    use Data::Dump::Streamer;

    my $str = <<SNIPS;
    Hello all, I am encountering some issue in trying to do the following in ksh88,
    therefore I want to explore a smart perl approach, but my knowledge is limited, can you help?
    I have 2 files, the first (indexFile1) contains lines with start
    offset and length for each record inside the second file, so just 2 numbers
    separated by a space.
    ...
    SNIPS

    open my $recData, '<', \$str;

    while (<DATA>) {
        chomp;
        my ($start, $len) = split;
        my $segment;
        seek $recData, $start, 0;
        read $recData, $segment, $len;
        print "$segment\n";
    }

    __DATA__
    16 12
    80 9
    315 10

    Prints:

    encountering
    therefore
    separated

    Note, to avoid needing external files I've used a string to provide one input "file" and provided the other as data following the body of the script. Note too that I tested this under Windows so you may get different output if your OS uses different line end characters than Windows does.
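    The same seek/read technique can be applied to real files on disk. Below is a minimal, self-contained sketch: the demo file names and contents are made up so the script runs as-is; with real data you would drop the setup section and open indexFile1/largeFile2 directly.

```perl
use strict;
use warnings;

# Setup: create tiny demo files (hypothetical names and contents)
# so the sketch is runnable; skip this with your real files.
open my $fh, '>', 'demo_big.dat' or die "write demo_big.dat: $!";
print {$fh} 'HelloWorldPerlMonks';    # 19 bytes, no record separators
close $fh;
open $fh, '>', 'demo_index.dat' or die "write demo_index.dat: $!";
print {$fh} "0 5\n5 5\n10 9\n";       # "start length" pairs, one per record
close $fh;

open my $idx, '<', 'demo_index.dat' or die "open index: $!";
open my $big, '<', 'demo_big.dat'   or die "open big: $!";
open my $out, '>', 'demo_out.dat'   or die "open out: $!";
binmode $big;                         # offsets are bytes, not characters

while (<$idx>) {
    my ($start, $len) = split;        # parse the two numbers
    seek $big, $start, 0;             # jump to the record's start offset
    read $big, my $record, $len;      # pull exactly $len bytes
    print {$out} "$record\n";         # one record per output line
}
close $out;
```

    Since each record's offset is taken from the index file, this works even if the index entries are out of order or overlap; it never needs the whole of largeFile2 in memory.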


    Perl reduces RSI - it saves typing
Re: Read offset into other files
by BrowserUk (Patriarch) on Oct 18, 2008 at 07:38 UTC

    As a one-liner:

    perl -s -nle"BEGIN{open BIG};($undef,$l)=split;read BIG,$data,$l;print $data" -- -BIG=bigfile.dat index.dat > outfile.dat

    All on one line. Switch the "s to 's on *nix.

    Update: A slightly shorter version

    perl -sple"BEGIN{open BIG};($undef,$l)=split;read(BIG,$_,$l)" -- -BIG=bigfile.dat index.dat >outfile.dat

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      It is amazing how simple things become if you know how to use the right tools!
      Will this one liner process all the entries in the index file gathering and putting out strings from the bigfile without a need for a loop or a while?
      Please help me understand. Also, there is no use of the offset value; am I reading this right? Will the read position keep moving automatically inside the bigfile, forward from the last read?
      Thanks again.
        Will this one liner process all the entries in the index file gathering and putting out strings from the bigfile without a need for a loop or a while?

        Yes. The loop is supplied by the -p option on the command line. This tells perl to read the file given as a command line argument (index.dat above) line by line into $_ and to print $_ to stdout after each pass.

        The code in the -e takes the contents of $_, splits it to extract the length, reads that number of bytes from the filehandle BIG, overwriting $_. This is then (implicitly) printed with a newline due to the -l switch, and redirected to the output file by the command line processor.

        The -s switch tells perl to parse the command line for options in the form -XXX=yyy. This creates a variable named XXX with the value yyy.

        The BEGIN{} block uses a one-arg open to open the file for input, using the value of $BIG as the filename and storing the filehandle in the glob *BIG.

        The -- is required to allow Perl to differentiate between the options intended for use by perl itself, and those (-BIG=bigfile.dat in this case) intended for use by the "script" (-e"...").

        See perlrun for a better explanation of all these switches than I can give.

        Also there is no use of the offset value, am I reading this right, will the read head keep moving automatically inside the bigfile, forward from the last read?

        Exactly. You are essentially just reading the file sequentially. The only extra information you need, is how many bytes constitute each record.
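        To make the implicit machinery visible, here is a hypothetical longhand expansion of the one-liner, roughly what `perl -MO=Deparse` would show: the while loop comes from -p, the chomp and restored "\n" from -l, and $BIG from the -s switch. Demo files are created inline so the sketch runs as-is, a lexical filehandle replaces the bareword BIG, and output goes to a file rather than stdout so the shell-redirection step appears in the code.

```perl
use strict;
use warnings;

# Setup: hypothetical demo files standing in for bigfile.dat/index.dat.
open my $fh, '>', 'demo_big2.dat' or die $!;
print {$fh} 'firstsecondthird';       # three records: 5, 6 and 5 bytes
close $fh;
open $fh, '>', 'demo_idx2.dat' or die $!;
print {$fh} "0 5\n5 6\n11 5\n";       # offsets present but never used
close $fh;

@ARGV = ('demo_idx2.dat');            # what index.dat supplies on the CLI
my $BIG = 'demo_big2.dat';            # what -s makes of -BIG=bigfile.dat
open my $big, '<', $BIG or die $!;    # the BEGIN{open BIG} step
open my $out, '>', 'demo_out2.dat' or die $!;

while (defined($_ = <>)) {            # implicit loop supplied by -p
    chomp;                            # -l strips the input record separator
    my (undef, $len) = split;         # the offset field is simply ignored
    read $big, $_, $len;              # sequential read overwrites $_
    print {$out} "$_\n";              # -p prints $_; -l restores the "\n"
}
close $out;
```

        Because the reads never seek, each read picks up exactly where the previous one left off, which is why the offsets can be ignored as long as the records are contiguous and listed in order.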

        Perl may have some weird nooks and crannies, but they're all there for very good reasons :)


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.