I'm still working on my survey loading program, but now I've moved on to parsing the survey data files. This may be trivial to some, but I've never really done file parsing before.

The SEG-P1 format specifies that survey headers should be composed of lines that would be matched by the regex /^H/. Unfortunately, not all survey companies adhere to this; some put the 'H' only at the start of the first header line. Also, it seems some companies produce 20-line headers, while others produce 22-line headers.

I have two problems, but one solution may take care of both. My question is this: how can I parse out the header block correctly each time, regardless of its length or formatting? Below is one example of each type of header (ignore the number of lines here).

First the format-specified version:

HLINE NUMBER : ABCDE
HPROJECT ID :
HGROUP :
HAREA NAME : *********
HOPERATOR : *********
HCONTRACTOR : ENERTEC
HSURVEY AUDITOR : ACCU-AUDIT
HSURVEY DATE : *********
HUTM ZONE : 11
HSURVEY QUALITY : ASCM,1
HCOMMENTS : *********
H :
H :
H :
HLINE LENGTH (Km): 2.65
HGRID VERSION : ATS 2.6
HDATUM : NAD 27
HAUDIT DATE : *********
H<....IDENTIFICATION....> <...GEOGRAPHICS...><.....UTMS.....>
H<.....LINE.....><..SP..>I<..LAT..><..LONG..><.EAST.><.NORT.><ELV><COMMENT>

Now the variant version:

HLINE NUMBER : ABCDE
PROJECT ID :
GROUP :
AREA NAME : *********
OPERATOR : *********
CONTRACTOR : ENERTEC
SURVEY AUDITOR : ACCU-AUDIT
SURVEY DATE : *********
UTM ZONE : 11
SURVEY QUALITY : ASCM,1
COMMENTS : *********
:
:
:
LINE LENGTH (Km): 2.65
GRID VERSION : ATS 2.6
DATUM : NAD 27
AUDIT DATE : *********
<....IDENTIFICATION....> <...GEOGRAPHICS...><.....UTMS.....>
<.....LINE.....><..SP..>I<..LAT..><..LONG..><.EAST.><.NORT.><ELV><COMMENT>

The actual survey data (point coordinates) start on the line after the last header line shown above.

Here's the code I have for reading the first (I'll call it "proper") version. For some reason I can't see, chomping wouldn't work, but push works well enough for me:

while (<IN>) {
    if (/^H/) {    ## Assumes all header lines start with 'H'
        push(@hdr, $_);
        next;      ## skip to next (possibly header) line
    }
    ##
    ## Capture each line of data in file
    ##
}
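On the chomp remark: the post does not say why chomping failed, so the following is only a guess. If the files use Windows-style CRLF line endings, chomp on a Unix-like system would leave a trailing carriage return behind. A minimal sketch of the same loop that strips either kind of line ending before storing the line is shown here; the sub name split_h_lines and the in-memory sample are hypothetical, and the data line is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: same logic as the loop above, but each line has its
# LF or CRLF ending removed before it is stored.
sub split_h_lines {
    my ($fh) = @_;
    my (@hdr, @data);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # like chomp, but also drops a trailing CR
        if ($line =~ /^H/) {     # still assumes every header line starts with 'H'
            push @hdr, $line;
            next;
        }
        push @data, $line;       # everything else is treated as data
    }
    return (\@hdr, \@data);
}

# In-memory file standing in for a survey file (CRLF endings are an assumption).
my $sample = "HLINE NUMBER : ABCDE\r\nHUTM ZONE : 11\r\nmade-up data line\r\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ($hdr, $data) = split_h_lines($fh);
close $fh;
printf "%d header lines, %d data lines\n", scalar @$hdr, scalar @$data;
```

This still only handles the "proper" layout, since it relies on the leading 'H'.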

What can I do to make this work for both kinds of headers?

Update: Here's one more sample header:

H CLIENT : **********
H PROSPECT : *******
H CONTRACTOR : ***** LINE NAME : *******
H SURVEY CO. : ************ UNIQUE ID : *******
H SURVEY DATE : DEC 1977 ORIG.LINE NAME : *******
H SURVEYOR : _N/A ENERGY SOURCE : DYNAMITE
H ------------------------------------------------------------------------------
H PRODUCED BY : DIVESTCO GEOMATICS FIRST SP : 101
H WEBSITE : ********************** LAST SP : 222
H EMAIL : ********************** LINE LENGTH : 8.003 KM
H DATE : ************ PROJECT NUMBER :
H JOB NUMBER : ************ AFE NUMBER : ************
H FILE NAME : ******** CLIENT REFERENCE : *******
H MAPSHEET : ************* DATUM : NAD 1983 - Canada
H ZONE : Z11N : 117W SOURCE INT.: *** F STN INT.: *** F
H GRID REF. : ATS 4.1 HTKO :
H UNITS : Decimeters VTKO :
H ELLIPSOID : GRS 1980 SURVEY QUALITY CODE : ***********
H DATA QUALITY : Transcription 2D
H<LINE NAME ><POINT >< LAT >< LONG >< EAST ><NORTH ><ELE>< >< ><>
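One thing the three samples have in common is that the last header line is a column ruler in which a LAT field in angle brackets is immediately followed by a LONG field. A minimal sketch that uses this ruler as the end-of-header marker, instead of the leading 'H' or a fixed line count, might look like the following. It assumes the pattern holds for every file, which these three samples alone cannot confirm; the sub name split_at_ruler, the shortened header, and the data lines are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: everything up to and including the column ruler is
# header; everything after it is data.
# Assumption (from the three samples only): the last header line contains
# <..LAT..> immediately followed by <..LONG..>.
sub split_at_ruler {
    my ($fh) = @_;
    my (@hdr, @data);
    my $in_header = 1;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF
        if ($in_header) {
            push @hdr, $line;
            # The ruler line closes the header block.
            $in_header = 0
                if $line =~ /<[^>]*LAT[^>]*>\s*<[^>]*LONG[^>]*>/;
        }
        else {
            push @data, $line;
        }
    }
    die "No column ruler found; header end is unknown\n" if $in_header;
    return (\@hdr, \@data);
}

# Shortened variant-style header: only the first line starts with 'H'.
my $variant = join "\n",
    'HLINE NUMBER : ABCDE',
    'PROJECT ID :',
    'UTM ZONE : 11',
    '<....IDENTIFICATION....> <...GEOGRAPHICS...><.....UTMS.....>',
    '<.....LINE.....><..SP..>I<..LAT..><..LONG..><.EAST.><.NORT.><ELV><COMMENT>',
    'made-up data line 1',
    'made-up data line 2';
$variant .= "\n";

open my $in, '<', \$variant or die "Cannot open in-memory file: $!";
my ($hdr, $data) = split_at_ruler($in);
close $in;
printf "%d header lines, %d data lines\n", scalar @$hdr, scalar @$data;
```

Because the split depends only on the ruler, a 20-line and a 22-line header would be handled the same way, with or without the 'H' prefixes. The sketch dies rather than guessing when no ruler is present, so a file that breaks the assumption is noticed instead of being silently misparsed.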


In reply to Survey file parsing by YYCseismic
