Hello. It's my first time here, and I'm new to Perl (and to programming in general). I'm trying to extract coordinates from a large number (about 15 GB in total) of small (3-5 kB) text files. Then I'll need to load them into an SQL database. So far I've got as far as the regex part, and I'm stuck.

The files I need to process are of this format:

DER.7-767/04.7 5194.5700 -6772.5200 0.0000
DER.7-767/04.8 5194.7400 -6776.3200 0.0000
DER.7-767/04.9 5192.1000 -6776.4300 0.0000
Der.7-539/99.1 5337.9000 6997.1200 0.0000
Der.7-539/99.10 5348.3300 -7020.0900 0.0000
Der.7-539/99.11 5348.4400 -7021.1100 0.0000
Kredyt3.27 5789322.3040 7500854.8800 0.0000
Kredyt3.27a -124.9646 373.4666 0.0000
Kredyt3.28 5789295.3170 7500857.7380 0.0000
Kredyt3.28a -151.9768 376.3191 0.0000
Kredyt3.29 5789298.8620 7500874.6180 0.0000
Kredyt3.29a -148.4337 393.2154 0.0000
Kredyt3.2a -63.0262 297.6930 0.0000
Kredyt3.3 5789369.8750 7500785.7170 0.0000
Kredyt3.30 5789303.2010 7500873.9300 0.0000
Kredyt3.30a -144.0905 392.5281 0.0000
Kredyt3.31 5789302.7240 7500869.9080 0.0000
Kredyt3.31a -144.5668 388.5023 0.0000
Kredyt3.32 5789307.5930 7500869.2210 0.0000
Kredyt3.32a -139.6932 387.8161 0.0000
Kredyt3.33 5789307.9110 7500871.6550 0.0000
Kredyt3.33a -139.3756 390.2524 0.0000
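To make the layout concrete: as far as I can tell, each record is a point name followed by three whitespace-separated numbers. Here is a minimal sketch of that reading, using split instead of a regex. The field names name, x, y and z are my own labels (an assumption, the files don't name the columns), and the two records are copied from the sample above.

```perl
use strict;
use warnings;

# Two records copied from the sample data above.
my @records = (
    'DER.7-767/04.7 5194.5700 -6772.5200 0.0000',
    'Kredyt3.27a -124.9646 373.4666 0.0000',
);

my @parsed;
for my $record (@records) {
    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    # The names name/x/y/z are assumed labels, not something the files define.
    my ($name, $x, $y, $z) = split ' ', $record;
    push @parsed, { name => $name, x => $x, y => $y, z => $z };
    print "$name: x=$x y=$y z=$z\n";
}
```

This only holds if the point names never contain spaces, which is true for the sample but which I haven't checked across all the files.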

And my code so far is this:

#!/usr/bin/perl
##parser3.plx
use strict;
use warnings;
use diagnostics;

#call up variables
my(@array);
my($file, $filename1, $filename2, $line, $i, $tmp);
$i = 1;
$filename2 = '';

#opens a list of the files that need processing
open LIST, 'listA' or die "(L1)We've got a problem: $!";
while (<LIST>) {
    $filename1 = <LIST>;
    fileswitch();
    $file = $_;
    #opens the actual file that will be processed
    open FILE, "$file" or die "(L2)We've got a problem: $!";
    while (<FILE>) {
        $line = $_;
        #-----------------$1----$2-----$3------$4---$5------$6
        if($line =~ /\s+([-*])(\d+)\.(\d+)\s+([-*])(\d+)\.(\d+)\s*/g) {
            #Append the coordinates (with the '-' sign where appropriate)
            push(@array, "$1$2.$3 || $4$5.$6 \n");
        }
    }
    #close file
    close(FILE);
}
#close list
close(LIST);

#print to file
p2f();

#check if path stays the same, if not then append a note to the database
sub fileswitch {
    if($filename1 ne $filename2) {
        push(@array, "Path: $filename1\n");
        #print "\n $filename1 \n" ;
        #print (".pkt file parsing completed. \n");
    }
    else {
        $filename2 = $filename1;
    }
}

#print to file
sub p2f {
    open (COORDLIST, '>>coordinates');
    print COORDLIST @array;
    close(COORDLIST);
}

And when I check the file it is supposed to print to, I find this:

Path: #sorry I won't leave those :)
-52157127.9760 || -2989955.5568
-52158244.6810 || -6741268.4549
-52157681.8715 || -1698959.4033
-50440239.8128 || -1701475.3622
-50441191.7990 || -1705583.1112
-57952305.4315 || -7490163.6682
-52157134.5720 || -27730.8039
Path: #sorry I won't leave those :)
Path: #sorry I won't leave those :)

Of course, the tests are on a smaller scale, just 40 kB of data, but it should still give me at least 20 000 lines.

Oh, and when I checked by adding 'print' statements everywhere, it actually did go into the 'if' block 7 or 8 times. So my guess is that my regular expression skills are at fault, but I just can't find what's wrong. The pattern seems to match the way the coordinates are written.
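One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the class [-*] matches exactly one character, either '-' or '*', so the pattern in my script can only match when both numbers start with a minus sign. The test below compares it with an alternative of my own, (-?\d+\.\d+), which treats the sign as optional. The variable names $strict and $loose and the three test lines (copied from the sample data) are just for illustration.

```perl
use strict;
use warnings;

# Three lines copied from the sample data: mixed signs, then both positive.
my @lines = (
    'DER.7-767/04.7 5194.5700 -6772.5200 0.0000',
    'Kredyt3.27a -124.9646 373.4666 0.0000',
    'Kredyt3.3 5789369.8750 7500785.7170 0.0000',
);

# Pattern from the script: [-*] demands a literal '-' or '*' before each number.
my $strict = qr/\s+([-*])(\d+)\.(\d+)\s+([-*])(\d+)\.(\d+)\s*/;

# Assumed alternative (not from the script): the minus sign is optional.
my $loose = qr/\s(-?\d+\.\d+)\s+(-?\d+\.\d+)\s/;

my @results;
for my $line (@lines) {
    my $strict_hit = ($line =~ $strict) ? 1 : 0;
    # In list context a successful match returns the captured groups.
    my ($x, $y) = $line =~ $loose;
    push @results, [ $strict_hit, $x, $y ];
    printf "strict=%d loose=%s || %s\n", $strict_hit, $x // 'none', $y // 'none';
}
```

If this reading is right, it would explain why only a handful of lines reach the 'if' block: only rows where both coordinates are negative get through.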

Anyway, thanks in advance.

--Ignas


In reply to Extracting coordinates by Ignas
