OK, here goes. I am working on a project to parse a 3-7 MB text file that consists of one continuous line. On Win2k. Yargh, I know. It is evil.
I have successfully found a way to parse the file and to account for incorrect entries. No problem.
I have also worked out how to update it (the humongous nastiness gets updated constantly). However, the update process takes just as long as the initial build process.
What has worked so far is to reparse the evil file, comparing each entry to the last valid (parsed) entry in the good file. This, as I am sure you are aware, is lengthy.
What I tried to do is build in a binary split. I take half the size of the file in bytes and attempt to read() my next entry from that position. I get an Out of Memory message. I realise it is a silly situation and I am making a stupid, grievous error, but please help!
Of course my file is open, the file pointer is positioned at the beginning, and $Size is the size of the file in bytes (I am betting $Size is my problem). $MonDay and $Year also hold valid entries.
$Target = $Size / 2;
read LIST, $NewString, $EntryLength, $Target;
$NewString = substr($NewString, $Target);
&Verify;
$CmpMonDay = substr($NewString, 16, 4);
$CmpYear = substr($NewString, 20, 4);
# This *should* cut the find time in half. I hope.
# The following tree is used for the binary split.
if ($CmpYear == $Year && $MonDay > $CmpMonDay) {
    $PointerStart = $Target;
}
elsif ($Year > $CmpYear) {
    $PointerStart = $Target;
}
$Start = $PointerStart;
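
For comparison, here is a minimal sketch of how the binary split might look with seek() doing the positioning. In Perl, the optional fourth argument to read() is an offset into the destination buffer, not a position in the file, so passing $Target there pads $NewString out to that many bytes before the data is stored. The sketch assumes things the post does not confirm: that every entry is exactly $EntryLength bytes, that entries are sorted by date, and that the year and month/day fields are zero-padded digits at the same substr offsets used above. The names read_entry_at, entry_key and find_resume_offset are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read one fixed-length entry starting at byte $pos.
# seek() moves the file position; read() is called without its optional
# fourth argument, which is a buffer offset and not a file offset.
sub read_entry_at {
    my ($fh, $pos, $entry_length) = @_;
    seek($fh, $pos, 0) or die "seek failed: $!";
    my $got = read($fh, my $entry, $entry_length);
    die "read failed: $!" unless defined $got;
    return $entry;
}

# Sort key: year first, then month/day (same substr offsets as the post).
# Assumes both fields are zero-padded digits, so string order is date order.
sub entry_key {
    my ($entry) = @_;
    return substr($entry, 20, 4) . substr($entry, 16, 4);
}

# Binary search for the byte offset of the first entry whose key is
# greater than $last_key. Assumes fixed-length entries sorted by date.
sub find_resume_offset {
    my ($fh, $size, $entry_length, $last_key) = @_;
    my ($lo, $hi) = (0, int($size / $entry_length));   # entry indexes, not bytes
    while ($lo < $hi) {
        my $mid   = int(($lo + $hi) / 2);
        my $entry = read_entry_at($fh, $mid * $entry_length, $entry_length);
        if (entry_key($entry) le $last_key) {
            $lo = $mid + 1;    # probed entry is already in the good file
        }
        else {
            $hi = $mid;        # probed entry is new; look further left
        }
    }
    return $lo * $entry_length;
}
```

With the variables from the post, a call might look like `$Start = find_resume_offset(\*LIST, $Size, $EntryLength, $Year . $MonDay);`. On Windows it is probably worth calling `binmode LIST;` right after opening the file, so that byte offsets are not disturbed by text-mode translation. Working in whole entry indexes rather than raw bytes keeps every probe aligned on an entry boundary.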