BrowserUK, get ready for the longest post ever.
First of all, the explanation has allowed me to actually understand what's going on in the code, thank you.
I have a couple last little questions for you, but first off, here is my current-ish code:
use strict;
use warnings;
#convert DDHHMM into int of m
sub d2i {
my( $d, $h, $m ) = unpack '(A2)*', $_[0];
return ( ( $d - 1 ) * 24 + $h ) * 60 + $m;
}
#convert int of m into DDHHMM
sub i2d {
sprintf "%02d%02d%02d", int($_[0]/1440)+1, int($_[0]/60)%24, $_[0]
+%60;
}
#find and open files ... only ever 1 of each file type
my ($d,$t);
foreach (<*>) {
$d = $_ if (/\.dat$/);
$t = $_ if (/\.mrg$/);
}
open(D, $d) or die "Unable to open DAT file. Exiting.\n"; my @dl = <D
+>; close(D);
open(T, $t) or die "Unable to open MRG file. Exiting.\n"; my @tl = <T
+>; close(T);
#create big array of data points with added d2i
my @big;
foreach (@dl,@tl) {
chomp($_);
if (/^\/\d\d\d\d\d\d\//) {
my @n = split(/\//);
push @big,[$n[1],$n[2],$n[5],d2i($n[1]),d2i$n[2]];
}
}
#break by day if needed
my @ex = map {
my $s = $_->[3];
my $e = $_->[4];
my @out;
while (int($s/1440) != int($e/1440)) {
my $newe = ( int($s/1440) + 1) * 1440 - 1;
push @out, [i2d($s),i2d($newe),$_->[2],$s,$newe];
$s = $newe + 1;
}
(@out, [i2d($s),$_->[1],$_->[2],$s,$e]);
} @big;
#build parallel arrays of minute and values ... total minimized for ov
+erlaps
my (@tally, @id);
for my $e (0 .. $#ex) {
my $r = $ex[$e];
for my $i ($r->[3] .. $r->[4] ) {
if( !defined($tally[$i]) or $tally[$i] > $r->[2] ) {
$tally[$i] = $r->[2];
$id[$i] = $e;
}
}
}
#recreate [DDHHMM,DDHHMM,V] with overarching tally and ids
my @res;
my $i = 0;
while ($i < $#id) {
++$i until defined $id[$i];
my $id = $id[$i];
my $start = $i;
++$i while defined ($id[$i]) and $id[$i] == $id;
my $end = $i - 1;
push @res, [i2d($start),i2d($end),$tally[$start]];
}
#output organized final to file
open (C,'>CDA.txt');
my @last = ("","","");
print C "To do later: ... this still needs specific formatting work(ea
+sy)\n";
foreach (@res) {
if (substr($last[0],0,2) ne substr(@$_[0],0,2)) { print C "\n"; }
print C "@$_[0] @$_[1] @$_[2]\n";
$last[0] = @$_[0];
}
close(C);
#[temporary] report comparison for manual check
@big = sort {$a->[0] <=> $b->[0]} (@big);
foreach (@big) {
print "@$_[0] @$_[1] @$_[2]\n";
}
print "\n";
foreach (@res) {
print "@$_[0] @$_[1] @$_[2]\n";
}
This all works EXACTLY as intended.
A couple easy questions:
1) Does the map function assume an output format of @array, [1,2,3,4,5]? Why isn't it push @array, [1,2,3,4,5] ?
2) Fundamentally what is the difference bewteen $_[0] and $_->[0]? When I start replacing $e and some $s in the map with the array counterpart (either $_[x] or $_->[x]) things break. I understand that $s may or may not be modified and needs to be variablarized (I bet that's not a word), but why does it break for just pointing to $e's array location($_->[4])?
3) The tally method was quite clever :)
So now that the easy questions are done, I have some new questions ... and a warning: you might throw your arms up in exasperation.
So I've just realized that in my effort to simplify the task enough to ask for help, I made a very unfortunate assumption that has bad consequences for the code. I think I've fixed most of it but I have a couple little problems.
Essentially, the two digit date value in DDHHMMV represents day of month and has the potential to roll over halfway through a file. Also, toward the top of each input file is a line that identifies the period of time applicable to that file, for instance '/XXX/XXX/110000ZNOV/140000ZNOV/XXX/XXX' where the value is 'DDHHHHZMON'. The input files will only ever cover a few days at a time (3-10ish), so there is never an issue of which month the data is applicable to, as the date tag in the beginning identifies that. The potential problem I see is when different months rollover (ie, /XXX/XXX/300000ZSEP/030000ZOCT/XXX/XXX'), the day breaking map effectively increments forever.
Here's an example of an input file (only a couple important lines, the rest is nonsense I filled in to test parsing):
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
f
sd
fa
sdfasdf
/XXX/XXX/300000ZSEP/020000ZOCT/XXX/XXX
a
sdfasdfasdf
asdf
df
sa
fas
dfASDFASDF
AS
DF
ASDF
SDFASDF
///
//sdfasdf/123
\\
/asdfasdf
adsf
sdf//
/555/234/
/301222/301232/234/4234/011.0/
/fasdfasd
/301240/301250/asdf/fdsa/011.3/
/302340/302355/9j9js9j9j9jf9sjfd/9j9sfj9df9323/010.0/
/302359/010002/kfjakdjfakdfasdf/salkdjfaklsdjflkasjd/008.1/
/011200/011400/kfjakdjfakdfasdf/salkdjfaklsdjflkasjd/008.1/
asd
/1/2/3/4/5
//
/
As you can also see, there is no 'year' date tag in input files, so there is still an assumption being made about DEC->JAN rollovers too.
Here's my new code that almost fixes the problem:
use strict;
use warnings;
use Time::Local;
#convert MMddhhmm into minutes since epoch (w/ leap year accounted for
+)
sub d2i {
my( $mo, $d, $h, $m ) = unpack '(A2)*', $_[0];
my @temp = localtime(time);
return int(timelocal(0,$m,$h,$d,$mo,$temp[4]) / 60);
}
#convert minutes since epoch into MMddhhmm (w/ leap year accounted for
+)
sub i2d {
my @times = localtime($_[0]*60);
sprintf "%02d%02d%02d%02d", $times[4], $times[3], $times[2], $time
+s[1];
}
#return month
sub fmon {
if ($_[0] =~ /JAN$/) {return "00"; }
if ($_[0] =~ /FEB$/) {return "01"; }
if ($_[0] =~ /MAR$/) {return "02"; }
if ($_[0] =~ /APR$/) {return "03"; }
if ($_[0] =~ /MAY$/) {return "04"; }
if ($_[0] =~ /JUN$/) {return "05"; }
if ($_[0] =~ /JUL$/) {return "06"; }
if ($_[0] =~ /AUG$/) {return "07"; }
if ($_[0] =~ /SEP$/) {return "08"; }
if ($_[0] =~ /OCT$/) {return "09"; }
if ($_[0] =~ /NOV$/) {return "10"; }
if ($_[0] =~ /DEC$/) {return "11"; }
}
#find and open files ... only ever 1 of each file type
my ($d,$t);
foreach (<*>) {
$d = $_ if (/\.dat$/);
$t = $_ if (/\.mrg$/);
}
open(D, $d) or die "Unable to open DAT file. Exiting.\n"; my @dl = <D
+>; close(D);
open(T, $t) or die "Unable to open MRG file. Exiting.\n"; my @tl = <T
+>; close(T);
#create big array of data points with added d2i
my (@big, $startmon, $stopmon);#@big format: [MMddhhmm,MMddhhmm,V,I,I]
+ where I is minutes from epoch
foreach (@dl,@tl) {
chomp($_);
if (/^\/\w{3}\/\w{3}\/\d{6}/) { #find month values
my @n = split(/\//);
$startmon = fmon($n[3]); #cheating because start/s
+top always preceeds values
$stopmon = fmon($n[4]);
}
if (/^\/\d{6}\//) { #read in actual data
my @n = split(/\//);
#if ($startmon ne $stopmon && substr($n_[0],0,2) < 15) {
# push @big,[$startmon.$n[1],$stopmon.$n[2],$n[5],d2i($star
+tmon.$n[1]),d2i($stopmon.$n[2])];
#}
push @big,[$startmon.$n[1],$stopmon.$n[2],$n[5],d2i($startmon.
+$n[1]),d2i($stopmon.$n[2])];
}
}
#break by day if needed
my @ex = map {
my $s = $_->[3];
my $e = $_->[4];
my @out;
while (int($s/1440) != int($e/1440)) {
my $newe = ( int($s/1440) + 1) * 1440 - 1;
push @out, [i2d($s),i2d($newe),$_->[2],$s,$newe];
$s = $newe + 1;
}
(@out, [i2d($s),$_->[1],$_->[2],$s,$e]);
} @big;
#build parallel arrays of minute and values ... total minimized for ov
+erlaps
my (@tally, @id);
for my $e (0 .. $#ex) {
my $r = $ex[$e];
for my $i ($r->[3] .. $r->[4] ) {
if( !defined($tally[$i]) or $tally[$i] > $r->[2] ) {
$tally[$i] = $r->[2];
$id[$i] = $e;
}
}
}
#recreate [MMddhhmm,MMddhhmm,V] with overarching tally and ids
my @res;
my $i = 0;
while ($i < $#id) {
++$i until defined $id[$i];
my $id = $id[$i];
my $start = $i;
++$i while defined ($id[$i]) and $id[$i] == $id;
my $end = $i - 1;
push @res, [i2d($start),i2d($end),$tally[$start]];
}
#output organized final to file
open (C,'>CDA.txt');
my @last = ("","","");
print C "To do later: ... a lot of specific formatting work(easy)\n";
foreach (@res) {
if (substr($last[0],0,2) ne substr(@$_[0],0,2)) { print C "\n"; }
print C substr(@$_[0],2,6), " ", substr(@$_[1],2,6), " ", @$_[2],
+ "\n";
$last[0] = @$_[0];
}
close(C);
#[temporary] report comparison for manual check
@big = sort {$a->[0] <=> $b->[0]} (@big);
foreach (@big) {
print "@$_[0] @$_[1] @$_[2]\n";
}
print "\n";
foreach (@res) {
print "@$_[0] @$_[1] @$_[2]\n";
}
The problems I am having are:
1) Making the assumption of 'current year', as when this rolls over it would mess up.
2) Breaking days apart doesn't seem to be working and I don't see why ... I'm just using minutes since epoch instead of minutes since start of month.
3) Wastes a lot of resources (maybe in #recreate loops, start counting at 100000000 instead of 0 or something?
4) The way the input files are implemented real world is typically from today-ish to 3-10 days in the future, so the assumption of current year is valid except for the last couple days of the year when the file spans after the rollover.
Sorry to be such a pain in the butt, and thank you again for your help. On the bright side, I think I figured a lot of it out!
-Alan | [reply] [d/l] [select] |