awohld has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file that looks like:
Event,1stEcIo,RxPower,Channel,Longitude,Latitude
1sec CDMA Event DtiC(0),-5.36,,,-88.150782,41.940351
Generic Scanner PN Measurement DtiC(0),,-83.35999999999997,384,-88.150782,41.940351
1sec CDMA Event DtiC(0),-6.36,,,-88.150782,41.940351
Generic Scanner PN Measurement DtiC(0),,-83.35999999999997,384,-88.15080400000001,41.940331
Generic Scanner PN Measurement DtiC(0),,-85.35999999999996,384,-88.15082600000001,41.940311
1sec CDMA Event DtiC(0),-7.36,,,-88.150848,41.940291
Generic Scanner PN Measurement DtiC(0),,-84.35999999999996,384,-88.15086533333333,41.94028566666667
Generic Scanner PN Measurement DtiC(0),,-88.36000000000001,384,-88.15088266666666,41.940280333333334

I need to write a Perl script to change it to this log format:
null,null,Ec,channel,long,lat
0,0,-75.2,384,-87606306,41798374

I need to combine the "1sec CDMA Event DtiC(0)" and "Generic Scanner PN Measurement DtiC(0)" rows. Every time either row repeats twice in a row, I need to average them together. So I should end up with alternating "1sec CDMA Event DtiC(0)" and "Generic Scanner PN Measurement DtiC(0)" rows.

Then I need to add the RxPower from "Generic Scanner PN Measurement DtiC(0)" to the 1stEcIo from "1sec CDMA Event DtiC(0)" and then average those two coordinates.

Also, the lat/longs need the decimal point removed and the result truncated to 8 digits. I should end up with a log file like:

null,null,Ec,channel,long,lat
0,0,-70.5,384,-87777777,41111111

Where:
Ec = RxPower + 1stEcIo
Channel always = 384
Lat & Long = the average lat/long for all datapoints combined to make row.

Here's some of my code that I started:

$LOGFILE = $ARGV[0];
open(LOGFILE, '<', $LOGFILE) or die("Could not open log file: $!");
foreach $line (<LOGFILE>) {
    # This is where I'm lost, I don't know where to begin on chomping all this data.
}

How would you guys recommend chomping all this data?

Replies are listed 'Best First'.
Re: Manipulating Text File in Perl
by Joost (Canon) on Apr 21, 2005 at 23:10 UTC
Re: Manipulating Text File in Perl
by sweetblood (Prior) on Apr 22, 2005 at 00:33 UTC
    This is not a terribly difficult task; however, you're not going to get through it without doing some reading. Every Perl install comes with perldoc, the Perl documentation. I'd start with split. You should also read perlre. Then have another whack at your problem and let us know how you make out.

    Cheers

    Sweetblood

Re: Manipulating Text File in Perl
by jpeg (Chaplain) on Apr 22, 2005 at 00:22 UTC
    It sounds like you want a simple state machine. You're checking for duplicates and also checking for two parts of a tuple.
    No big deal. Here are some hints:

    First off, use split(/,/, $_) to break up the line.
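
    A minimal sketch of that split, using one row from the sample data in the question. The variable names are illustrative and not from the original post. Empty columns in the middle of a row come back as empty strings, so they can be tested with length.

```perl
use strict;
use warnings;

# One row from the sample data in the question.
my $line = "Generic Scanner PN Measurement DtiC(0),,-83.35999999999997,384,-88.150782,41.940351";

# Variable names are illustrative; the column order follows the header row.
my ($event, $ecio, $rxpower, $channel, $long, $lat) = split(/,/, $line);

# Interior empty columns are kept as empty strings.
print "event:   $event\n";
print "ecio:    ", (length $ecio ? $ecio : "(empty)"), "\n";
print "rxpower: $rxpower\n";
print "channel: $channel\n";
print "long:    $long\n";
print "lat:     $lat\n";
```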

    Then for converting lat and longitude formats:

    my @fields = split(/,/, $_);
    my @latitude = split(/\./, $fields[5]);
    # "%.8s" truncates to 8 characters ("%8s" would only pad)
    my $latstring = sprintf("%.8s", $latitude[0] . $latitude[1]);
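
    The longitudes in the sample data are negative, so a leading minus sign has to survive the conversion. A rough sketch of one way to handle that, assuming the coordinate arrives as a plain decimal string straight from the file; coord_to_digits is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the decimal point and keep the first 8 digits,
# preserving a leading minus sign. This truncates; it does not round.
sub coord_to_digits {
    my ($coord) = @_;
    my $sign = $coord =~ s/^-// ? '-' : '';   # strip and remember the sign
    $coord =~ s/\.//;                          # remove the decimal point
    return $sign . substr($coord, 0, 8);       # keep at most 8 digits
}

print coord_to_digits("-88.15080400000001"), "\n";   # -88150804
print coord_to_digits("41.940351"), "\n";            # 41940351
```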

    And finally for the state machine (totally untested):

    my $lastline = '';
    while (<LOGFILE>) {
        chomp $_;
        my @fields = split(/,/, $_);
        if ($fields[0] eq $lastline) {
            if ($fields[0] eq "1sec CDMA Event DtiC(0)") {
                # average those things or whatever you need to do
            } elsif ($fields[0] eq "Generic Scanner PN Measurement DtiC(0)") {
                # average those things or whatever you need to do
            }
        } else {
            # determine which part of the tuple this line is and
            # store Rxpower/add Rxpower and massage lat/long into strings
        }
        $lastline = $fields[0];
    }

    As I mentioned, all code is untested, but it's a starting point.
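
    One possible way to fill in that skeleton is sketched here, split into three passes instead of a single loop. It rests on several assumptions the thread does not settle: the data starts with a "1sec CDMA Event" row, each output row pairs one CDMA run with the scanner run that follows it, the coordinates are the average of the two run averages, and Ec is printed to one decimal place. The helper names mean and coord_to_digits are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers, not from the thread.
sub mean { my $sum = 0; $sum += $_ for @_; return $sum / @_; }

# Drop the decimal point and keep 8 digits (truncating), preserving the sign.
# Formatting with %.10f first absorbs floating-point noise from averaging.
sub coord_to_digits {
    my ($coord) = @_;
    my $sign   = $coord < 0 ? '-' : '';
    my $digits = sprintf("%.10f", abs($coord));
    $digits =~ s/\.//;
    return $sign . substr($digits, 0, 8);
}

# First five data rows from the question.
my @lines = (
    '1sec CDMA Event DtiC(0),-5.36,,,-88.150782,41.940351',
    'Generic Scanner PN Measurement DtiC(0),,-83.35999999999997,384,-88.150782,41.940351',
    '1sec CDMA Event DtiC(0),-6.36,,,-88.150782,41.940351',
    'Generic Scanner PN Measurement DtiC(0),,-83.35999999999997,384,-88.15080400000001,41.940331',
    'Generic Scanner PN Measurement DtiC(0),,-85.35999999999996,384,-88.15082600000001,41.940311',
);

# Pass 1: group consecutive rows of the same type into runs.
my @runs;
for my $line (@lines) {
    my ($event, $ecio, $rxpower, undef, $long, $lat) = split /,/, $line;
    my $type = $event =~ /^1sec CDMA/ ? 'cdma' : 'scan';
    push @runs, { type => $type, value => [], long => [], lat => [] }
        if !@runs || $runs[-1]{type} ne $type;
    push @{ $runs[-1]{value} }, ($type eq 'cdma' ? $ecio : $rxpower);
    push @{ $runs[-1]{long} },  $long;
    push @{ $runs[-1]{lat} },   $lat;
}

# Pass 2: average each run, so the rows strictly alternate cdma/scan.
my @avg = map { +{
    type  => $_->{type},
    value => mean(@{ $_->{value} }),
    long  => mean(@{ $_->{long} }),
    lat   => mean(@{ $_->{lat} }),
} } @runs;

# Pass 3: combine each cdma row with the scan row that follows it.
my @out = ('null,null,Ec,channel,long,lat');
for (my $i = 0; $i + 1 < @avg; $i += 2) {
    my ($cdma, $scan) = @avg[$i, $i + 1];
    push @out, sprintf('0,0,%.1f,384,%s,%s',
        $cdma->{value} + $scan->{value},
        coord_to_digits(mean($cdma->{long}, $scan->{long})),
        coord_to_digits(mean($cdma->{lat},  $scan->{lat})));
}
print "$_\n" for @out;
```

    For these five rows the sketch prints the header followed by 0,0,-88.7,384,-88150782,41940351 and 0,0,-90.7,384,-88150798,41940336.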
    --
    jpg

Re: Manipulating Text File in Perl
by eibwen (Friar) on Apr 22, 2005 at 21:51 UTC

    I started drafting code according to your specification as a programming exercise, but in doing so I discovered a few peculiarities regarding your specification:

    1. (How) does your specification accommodate varying coordinates?
    2. It is impossible to determine the magnitude of the original coordinate and artificial granularity may be introduced if only the first 8 digits are used:

      10000001 # 100.00001
      99999999 # 99.999999
      10000001 # 10.000001
      99999999 # 9.9999999
      10000001 # 1.0000001
      99999999 # 0.99999999
      10000001 # 0.10000001
      99999999 # 0.099999999
      ...

      A fixed reference is necessary to determine the respective magnitude:

      10000001 # 100.00001
      09999999 # 99.999999
      01000000 # 10.000001
      00999999 # 9.9999999
      00100000 # 1.0000001
      00099999 # 0.99999999
      00010000 # 0.10000001
      00009999 # 0.099999999
      ...

    3. Should coordinates be truncated or rounded?

      Given a coordinate like 256.123456, the difference between 256.12346 and 256.12345 could be significant, but I guess that depends on the size of the planet in question.

    4. What is the significance of the first two fields of the specification?

      Do they correspond to what I presume to be "device numbers" (or similar) in the log, eg: Generic Scanner PN Measurement DtiC(0)? If so, is there a particular order in the syntax (eg arbitrarily stipulated, alphabetical)?
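
    To make points 2 and 3 concrete, here is a small sketch that converts with a fixed scale instead of taking the first 8 digits, and compares truncation with rounding. The scale of 10**6 (microdegrees) is an assumption: it is consistent with the sample value 41798374 in the question, but the original post does not state it. The function names are hypothetical.

```perl
use strict;
use warnings;

# Assumption: the target format is degrees scaled by 10**6 (microdegrees).
my $SCALE = 1_000_000;

# int() truncates toward zero; sprintf("%.0f") rounds to the nearest integer.
sub truncated { my ($deg) = @_; return int($deg * $SCALE); }
sub rounded   { my ($deg) = @_; return sprintf("%.0f", $deg * $SCALE); }

# With a fixed scale, 9.9999999 and 99.999999 no longer collide.
for my $deg (41.94028566666667, -88.15086533333333, 9.9999999, 99.999999) {
    printf "%-20s truncated=%d rounded=%s\n", $deg, truncated($deg), rounded($deg);
}
```

    With a fixed scale the magnitude is always recoverable, at the cost of outputs that are not always 8 digits long. Values that sit exactly on a digit boundary can still be affected by floating-point representation when truncating.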