Hello,
I have to parse several large text files and load the results into a database. Each text file is 65 to 70 thousand pages long. I need a jumpstart getting the text into arrays or hashes, and then I think I can take it from there. I used sed to remove garbage from the files, but I am unsure where to go next. I started to use IO::File to read in and split the file, but that didn't get me anywhere. Any help would be greatly appreciated.
Here is a sample record from the file:
VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01
VENDOR SIS UNIT BASE SHIP TOT DOL DOLLAR PERCENT
CONTRACT NUMBER PRICE PRICE QTY U/I DATE
+ PR NUMBER BIN/PART NUMBER VALUE VARIANCE VARIANCE
YT67DY7898DUFT5126 88.20000 70.00000 50 EA 0000000
+0 POI90809819856 1560007117067 4,410.00 910.00 0
AWARD HISTORY PIIN BSCM N/A U/I UNIT PRI
+CE AWD DT QTY OPT DT FOB REP TYPE
765WTY34TF56A 7J777 N EA 39.5
+5000 93012 147 00000 2 Y B
PID DATA LINE NR
+ LINE NR
01 001PART, DESCRIPTION, DATA
+ 02 002TECHNICAL DATA AVAILABILITY:
03 003
The above record format repeats until EOF. The award history section repeats a variable number of times for each main record.
In case the above record is hard to read, here is the basic format of the file.
Each record looks like:
Header (Line with the VENDOR 61125)
Main Record (a bunch of columns)
A number of sub records (more columns)
Footer (Yet more columns, everything from PID down)
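To make the layout above concrete, here is a rough sketch of the kind of reader I have in mind. It rests on two guesses taken only from the sample: that a line starting with "+" is the wrapped tail of the line before it, and that every record begins with a "VENDOR <digits> TOTAL DOLLAR VAR" header line. The name read_records and the hash keys (vendor, total_var, lines) are made up for illustration, and splitting each joined line into columns is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: reads records from a filehandle and hands each finished
# record to a callback. Assumes "+" lines are wrapped tails of the previous
# line and that a "VENDOR <digits> TOTAL DOLLAR VAR" line starts a new record.
sub read_records {
    my ($fh, $callback) = @_;
    my $record;    # record currently being built, undef before the first header

    while (my $line = <$fh>) {
        chomp $line;

        # Wrapped tail: glue it onto the last line stored for this record.
        if ($line =~ /^\+(.*)/) {
            $record->{lines}[-1] .= $1 if $record && @{ $record->{lines} };
            next;
        }

        # Header line: emit the previous record and start a new one.
        if (my ($vendor, $total) =
                $line =~ /^\s*VENDOR\s+(\d+)\s+TOTAL\s+DOLLAR\s+VAR\s+([\d,.]+)/) {
            $callback->($record) if $record;
            $total =~ tr/,//d;    # 77,097.60 becomes 77097.60
            $record = { vendor => $vendor, total_var => $total, lines => [] };
            next;
        }

        # Everything else (main record, award history, PID data) is kept raw.
        push @{ $record->{lines} }, $line if $record;
    }
    $callback->($record) if $record;    # emit the final record at EOF
}

# Demo on the sample record, read through an in-memory filehandle.
my $sample = <<'END';
VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01
VENDOR SIS UNIT BASE SHIP TOT DOL DOLLAR PERCENT
CONTRACT NUMBER PRICE PRICE QTY U/I DATE
+ PR NUMBER BIN/PART NUMBER VALUE VARIANCE VARIANCE
YT67DY7898DUFT5126 88.20000 70.00000 50 EA 0000000
+0 POI90809819856 1560007117067 4,410.00 910.00 0
AWARD HISTORY PIIN BSCM N/A U/I UNIT PRI
+CE AWD DT QTY OPT DT FOB REP TYPE
765WTY34TF56A 7J777 N EA 39.5
+5000 93012 147 00000 2 Y B
PID DATA LINE NR
+ LINE NR
01 001PART, DESCRIPTION, DATA
+ 02 002TECHNICAL DATA AVAILABILITY:
03 003
END

open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @records;
read_records($fh, sub { push @records, $_[0] });
close $fh;

printf "vendor %s: %d joined lines\n", $_->{vendor}, scalar @{ $_->{lines} }
    for @records;
```

Because each record goes to the callback as soon as the next header shows up, the real files would never have to sit in memory all at once. The callback could do the database insert instead of pushing onto an array.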
Thanks
-Shawn