Ignas has asked for the wisdom of the Perl Monks concerning the following question:
Hello. It's my first time here, and I'm new to Perl (and to programming in general). I'm trying to extract coordinates from lots (about 15 GB) of small (3-5 kB) text files, and then I'll need to insert them into an SQL database. I've made it as far as the regex part, and I'm stuck there.
The files I need to process are of this format:
DER.7-767/04.7 5194.5700 -6772.5200 0.0000
DER.7-767/04.8 5194.7400 -6776.3200 0.0000
DER.7-767/04.9 5192.1000 -6776.4300 0.0000
Der.7-539/99.1 5337.9000 6997.1200 0.0000
Der.7-539/99.10 5348.3300 -7020.0900 0.0000
Der.7-539/99.11 5348.4400 -7021.1100 0.0000
Kredyt3.27 5789322.3040 7500854.8800 0.0000
Kredyt3.27a -124.9646 373.4666 0.0000
Kredyt3.28 5789295.3170 7500857.7380 0.0000
Kredyt3.28a -151.9768 376.3191 0.0000
Kredyt3.29 5789298.8620 7500874.6180 0.0000
Kredyt3.29a -148.4337 393.2154 0.0000
Kredyt3.2a -63.0262 297.6930 0.0000
Kredyt3.3 5789369.8750 7500785.7170 0.0000
Kredyt3.30 5789303.2010 7500873.9300 0.0000
Kredyt3.30a -144.0905 392.5281 0.0000
Kredyt3.31 5789302.7240 7500869.9080 0.0000
Kredyt3.31a -144.5668 388.5023 0.0000
Kredyt3.32 5789307.5930 7500869.2210 0.0000
Kredyt3.32a -139.6932 387.8161 0.0000
Kredyt3.33 5789307.9110 7500871.6550 0.0000
Kredyt3.33a -139.3756 390.2524 0.0000
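Since each record appears to be a point name followed by three whitespace-separated numbers (X, Y, and a Z that is 0.0000 throughout this sample), a regex may not be needed at all. A minimal sketch, assuming one record per line and no spaces inside point names; 'parse_record' is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one "NAME X Y Z" record on whitespace.
# Assumes one record per line and that point names contain no spaces.
sub parse_record {
    my ($line) = @_;
    # split ' ' skips leading whitespace and the trailing newline
    my ($name, $x, $y, $z) = split ' ', $line;
    return unless defined $z;    # blank or short line: return nothing
    return ($name, $x, $y, $z);
}

my @lines = (
    "DER.7-767/04.7 5194.5700 -6772.5200 0.0000\n",
    "Kredyt3.27a -124.9646 373.4666 0.0000\n",
);

for my $line (@lines) {
    # the list assignment counts returned values; 0 means "skip this line"
    my ($name, $x, $y) = parse_record($line) or next;
    print "$name: $x || $y\n";
}
```

This prints "DER.7-767/04.7: 5194.5700 || -6772.5200" and "Kredyt3.27a: -124.9646 || 373.4666". The fields stay strings, so trailing zeros are kept exactly as in the file.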
And my code so far is this:
#!/usr/bin/perl
##parser3.plx
use strict;
use warnings;
use diagnostics;

#call up variables
my(@array);
my($file, $filename1, $filename2, $line, $i, $tmp);
$i = 1;
$filename2 = '';

#opens a list of the files that need processing
open LIST, 'listA' or die "(L1)We've got a problem: $!";
while (<LIST>) {
    $filename1 = <LIST>;
    fileswitch();
    $file = $_;
    #opens the actual file that will be processed
    open FILE, "$file" or die "(L2)We've got a problem: $!";
    while (<FILE>) {
        $line = $_;
        #-----------------$1----$2-----$3------$4---$5------$6
        if($line =~ /\s+([-*])(\d+)\.(\d+)\s+([-*])(\d+)\.(\d+)\s*/g) {
            #Append the coordinates (with the '-' sign where appropriate)
            push(@array, "$1$2.$3 || $4$5.$6 \n");
        }
    }
    #close file
    close(FILE);
}
#close list
close(LIST);

#print to file
p2f();

#check if path stays the same, if not then append a note to the database
sub fileswitch {
    if($filename1 ne $filename2) {
        push(@array, "Path: $filename1\n");
        #print "\n $filename1 \n" ;
        #print (".pkt file parsing completed. \n");
    } else {
        $filename2 = $filename1;
    }
}

#print to file
sub p2f {
    open (COORDLIST, '>>coordinates');
    print COORDLIST @array;
    close(COORDLIST);
}
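Two things in the loop above are worth checking separately from the regex. 'while (<LIST>)' already reads one line into $_, and '$filename1 = <LIST>;' then reads the next one, so each pass consumes two names from 'listA': the file that gets opened and the path that gets noted are different entries, and every second file is never opened. Also, the 'else' branch of 'fileswitch' only runs when the two names are already equal, so $filename2 is never updated. The sketch below reads an in-memory stand-in for 'listA' (the file names are made up) once per line and only records a "Path:" note when the path changes; 'read_names' and 'note_path' are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the 'listA' file: three made-up file names.
my $list_text = "a.txt\nb.txt\nc.txt\n";

# Read each name exactly once per loop iteration.
sub read_names {
    my ($text) = @_;
    open my $list, '<', \$text or die "Cannot open in-memory list: $!";
    my @names;
    while (my $name = <$list>) {
        chomp $name;                 # drop the trailing newline
        next unless length $name;    # ignore blank lines
        push @names, $name;
    }
    close $list;
    return @names;
}

# Add a "Path:" line only when the path differs from the last one seen,
# and remember the new path so repeats are suppressed.
my $last_path = '';
my @out;
sub note_path {
    my ($path) = @_;
    if ($path ne $last_path) {
        push @out, "Path: $path\n";
        $last_path = $path;
    }
}

note_path($_) for read_names($list_text);
print @out;
```

Run as is, it prints "Path: a.txt", "Path: b.txt" and "Path: c.txt", one per line, so no name from the list is skipped.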
And when I check the file it is supposed to print to, I find this:
Path: #sorry I won't leave those :)
-52157127.9760 || -2989955.5568
-52158244.6810 || -6741268.4549
-52157681.8715 || -1698959.4033
-50440239.8128 || -1701475.3622
-50441191.7990 || -1705583.1112
-57952305.4315 || -7490163.6682
-52157134.5720 || -27730.8039
Path: #sorry I won't leave those :)
Path: #sorry I won't leave those :)
Of course, the tests are on a smaller scale, just 40kB of data, but still it should give me at least 20 000 lines.
Oh, and when I checked by adding 'print' statements everywhere, it turned out it only entered the 'if' block 7 or 8 times. So my guess is that my regular expression is at fault, but I just can't find what's wrong; it seems to match the pattern the coordinates follow.
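A likely culprit is '([-*])'. Square brackets make a character class that must match exactly one character, either '-' or '*'; it does not mean "an optional minus sign". So the pattern only matches where two consecutive numbers both carry a leading '-' (or '*'). An optional sign is written '-?'. A minimal sketch comparing the two, assuming one "NAME X Y Z" record per line; 'extract_xy' is a hypothetical helper, and the '/g' modifier is dropped because one match per line is enough:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the question: each ([-*]) must consume one '-' or '*'.
my $old_re = qr/\s+([-*])(\d+)\.(\d+)\s+([-*])(\d+)\.(\d+)\s*/;

# Sketch of a fix: skip the point name (^\S+), then capture X and Y
# as whole numbers with an optional leading minus sign.
my $new_re = qr/^\S+\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)/;

sub extract_xy {
    my ($line) = @_;
    return $line =~ $new_re ? ($1, $2) : ();
}

for my $line ("Kredyt3.27 5789322.3040 7500854.8800 0.0000",
              "Kredyt3.27a -124.9646 373.4666 0.0000") {
    my $old = $line =~ $old_re ? 'match' : 'no match';
    my ($x, $y) = extract_xy($line);
    print "$line\n  old pattern: $old\n  new pattern: $x || $y\n";
}
```

For both sample lines the old pattern reports "no match", while the new one returns "5789322.3040 || 7500854.8800" and "-124.9646 || 373.4666". Anchoring on '^\S+' keeps digits inside names such as "Kredyt3.27" or "DER.7-767/04.7" from being mistaken for coordinates.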
Anyway, thanks in advance.
--Ignas
Re: Extracting coordinates
by toolic (Bishop) on Mar 20, 2010 at 22:56 UTC

Re: Extracting coordinates
by moritz (Cardinal) on Mar 20, 2010 at 23:01 UTC
by Ignas (Novice) on Mar 20, 2010 at 23:51 UTC
by moritz (Cardinal) on Mar 21, 2010 at 07:59 UTC

Re: Extracting coordinates
by GrandFather (Saint) on Mar 21, 2010 at 00:46 UTC
by Ignas (Novice) on Mar 21, 2010 at 11:02 UTC
by GrandFather (Saint) on Mar 21, 2010 at 19:54 UTC
by wfsp (Abbot) on Mar 21, 2010 at 10:17 UTC
by Anonymous Monk on Mar 21, 2010 at 10:27 UTC

Re: Extracting coordinates
by se@n (Initiate) on Mar 20, 2010 at 23:17 UTC