YYCseismic has asked for the wisdom of the Perl Monks concerning the following question:
I'm still working on my survey loading program, but now I've moved on to parsing the survey data files. This may be trivial to some, but I've never really done file parsing before.
The SEG-P1 format specifies that every survey header line should start with 'H', i.e. match the regex /^H/. Unfortunately, not all survey companies adhere to this; some put the 'H' only at the start of the first header line. Also, some places write 20-line headers, while others write 22-line headers.
So I have two problems (inconsistent 'H' prefixes and varying header length), but one approach may be able to solve both. My question is this: how can I parse out the header block correctly each time, regardless of its length or formatting? I include one example of each type of header below (not looking at the number of lines here).
First the format-specified version:
    HLINE NUMBER : ABCDE
    HPROJECT ID :
    HGROUP :
    HAREA NAME : *********
    HOPERATOR : *********
    HCONTRACTOR : ENERTEC
    HSURVEY AUDITOR : ACCU-AUDIT
    HSURVEY DATE : *********
    HUTM ZONE : 11
    HSURVEY QUALITY : ASCM,1
    HCOMMENTS : *********
    H :
    H :
    H :
    HLINE LENGTH (Km): 2.65
    HGRID VERSION : ATS 2.6
    HDATUM : NAD 27
    HAUDIT DATE : *********
    H<....IDENTIFICATION....> <...GEOGRAPHICS...><.....UTMS.....>
    H<.....LINE.....><..SP..>I<..LAT..><..LONG..><.EAST.><.NORT.><ELV><COMMENT>
Now the variant version:
    HLINE NUMBER : ABCDE
    PROJECT ID :
    GROUP :
    AREA NAME : *********
    OPERATOR : *********
    CONTRACTOR : ENERTEC
    SURVEY AUDITOR : ACCU-AUDIT
    SURVEY DATE : *********
    UTM ZONE : 11
    SURVEY QUALITY : ASCM,1
    COMMENTS : *********
    :
    :
    :
    LINE LENGTH (Km): 2.65
    GRID VERSION : ATS 2.6
    DATUM : NAD 27
    AUDIT DATE : *********
    <....IDENTIFICATION....> <...GEOGRAPHICS...><.....UTMS.....>
    <.....LINE.....><..SP..>I<..LAT..><..LONG..><.EAST.><.NORT.><ELV><COMMENT>
The actual survey data (point coordinates) come starting on the line after the last line above.
Here's the code I have for capturing the first (I'll call it "proper") version. For some reason I can't see, chomping wouldn't work, but push works well enough for me:
    while (<IN>) {
        if (/^H/) {    ## Assumes all header lines start with 'H'
            push(@hdr, $_);
            next;      ## skip to next (possibly header) line
        }
        ##
        ## Capture each line of data in file
        ##
    }
What can I do to make this work for both kinds of headers?
Update: Here's one more sample header:
    H CLIENT : **********
    H PROSPECT : *******
    H CONTRACTOR : ***** LINE NAME : *******
    H SURVEY CO. : ************ UNIQUE ID : *******
    H SURVEY DATE : DEC 1977 ORIG.LINE NAME : *******
    H SURVEYOR : _N/A ENERGY SOURCE : DYNAMITE
    H ------------------------------------------------------------------------------
    H PRODUCED BY : DIVESTCO GEOMATICS FIRST SP : 101
    H WEBSITE : ********************** LAST SP : 222
    H EMAIL : ********************** LINE LENGTH : 8.003 KM
    H DATE : ************ PROJECT NUMBER :
    H JOB NUMBER : ************ AFE NUMBER : ************
    H FILE NAME : ******** CLIENT REFERENCE : *******
    H MAPSHEET : ************* DATUM : NAD 1983 - Canada
    H ZONE : Z11N : 117W SOURCE INT.: *** F STN INT.: *** F
    H GRID REF. : ATS 4.1 HTKO :
    H UNITS : Decimeters VTKO :
    H ELLIPSOID : GRS 1980 SURVEY QUALITY CODE : ***********
    H DATA QUALITY : Transcription 2D
    H<LINE NAME ><POINT >< LAT >< LONG >< EAST ><NORTH ><ELE>< >< ><>
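One thing all three samples seem to share is that the header ends with one or more "column ruler" lines made of <...> groups, with or without a leading 'H', and the data starts on the next line. A minimal sketch that keys on that instead of on /^H/ or a fixed line count might look like the following. The sub name split_header and the ruler regex are made up for illustration, and the whole thing assumes that every file has such a ruler line and that no data line looks like one.

```perl
use strict;
use warnings;

# Hypothetical helper: split a survey file into header and data lines.
# Assumption (taken from the three sample headers only): the header
# always ends with one or more ruler lines built from <...> groups,
# optionally prefixed with 'H'.
sub split_header {
    my ($fh) = @_;
    my (@hdr, @data);
    my $seen_ruler = 0;    # true once a ruler line has been read
    my $in_header  = 1;    # true until the first data line is reached

    while (my $line = <$fh>) {
        # Strip LF or CRLF; stray CRs are one possible (unconfirmed)
        # reason a plain chomp might appear not to work.
        $line =~ s/\r?\n\z//;

        if ($in_header) {
            my $is_ruler = $line =~ /^H?\s*<.*>\s*$/;
            if ($is_ruler) {
                $seen_ruler = 1;
            }
            elsif ($seen_ruler) {
                $in_header = 0;    # first non-ruler line after the ruler(s)
            }
        }

        if ($in_header) { push @hdr,  $line }
        else            { push @data, $line }
    }
    return (\@hdr, \@data);
}
```

Called as my ($hdr, $data) = split_header($fh); on a lexical filehandle, this returns two array references. Because it never counts lines, 20-line and 22-line headers are handled the same way. If a file has no ruler line at all, every line ends up in the header, so it would be worth checking that @$data is non-empty before going on.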
Re: Survey file parsing
by punch_card_don (Curate) on Jun 27, 2008 at 17:56 UTC
by YYCseismic (Beadle) on Jun 27, 2008 at 18:09 UTC

Re: Survey file parsing
by jds17 (Pilgrim) on Jun 27, 2008 at 17:58 UTC
by YYCseismic (Beadle) on Jun 27, 2008 at 22:02 UTC

Re: Survey file parsing
by johngg (Canon) on Jun 27, 2008 at 18:44 UTC
by YYCseismic (Beadle) on Jun 27, 2008 at 19:42 UTC
by YYCseismic (Beadle) on Jun 27, 2008 at 22:06 UTC

Re: Survey file parsing
by samtregar (Abbot) on Jun 27, 2008 at 17:57 UTC

Re: Survey file parsing
by YYCseismic (Beadle) on Jun 27, 2008 at 20:53 UTC