Re^2: log file sorting

Ah, thanks for pointing out the paragraph tags; I somehow missed them when reading about the markup.

print "File Location? ";
chomp(my $data_file = <>);
open(RAWDATA, $data_file) or die "Cannot open $data_file: $!";
my @list;
while (<RAWDATA>) {
  chomp;
  my (%hash, @rest);
  ($hash{first}, $hash{date}, @rest) = split(",", $_);
  for my $r (@rest) {
    my ($k, $v) = split(' ', $r, 2);
    $hash{$k} = $v;
  }
  push(@list, \%hash);
};
my %seen;
for (@list) { for (keys %$_) { $seen{$_}++ } };
delete $seen{first};
delete $seen{date};
my @allkeys = ('first', 'date', sort keys %seen);
my @keys = (sort keys %seen);
open(SEMIDATA, ">temp.slice");
for my $h (@list) {
  print(SEMIDATA join(',', $h->{first}, $h->{date}, map( $_.' '.$h->{$_}, @keys ) ), "\n") or warn "print failed: $!";
}
close(RAWDATA);
close(SEMIDATA);
open(EDITDATA, "temp.slice");
my @array_of_data = <EDITDATA>;
close(EDITDATA);
foreach my $line (@array_of_data)
{
#all replacements go here
$line =~ s!X!!g;
$line =~ s!ART ,!ART / ,!g;
$line =~ s!ECG ,!ECG /,!g;
$line =~ s!NBP ,!NBP /  ,!g;
$line =~ s!PA ,!PA / ,!g;
$line =~ s!RESP ,!RESP /,!g;
$line =~ s!SAO2 ,!SAO2 /,!g;
$line =~ s!ST ,!ST //,!g;
$line =~ s!TEMP \n!TEMP /\n!g;
}

# Open the file for writing.
open REGDATA, ">temp2.slice";
foreach my $line (@array_of_data)
   {
   # Print each line in turn to the REGDATA filehandle
   print REGDATA "$line";
   }
close REGDATA;

This is the relevant part of the program I have so far, which sorts the data labels alphabetically. As I was unsure how to count how many distinct numbers follow a label, I tried to fill in the nonexistent data with substitutions, which is merely a temporary fix. I haven't even begun trying to organize the data into neat Excel-style columns, as that would first require standardizing its appearance.
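As a rough sketch of an alternative to the substitution pass, the placeholder can be filled in at print time, when it is already known whether a label has a value. The sample records and the choice of "/" as the placeholder are assumptions made for illustration; only the names @list and @keys come from the code above.

```perl
use strict;
use warnings;

# Hypothetical records in the same shape as @list in the posted code.
my @list = (
    { first => 'r1', date => '08/04', ART => '120/80', TEMP => '37' },
    { first => 'r2', date => '08/04', ART => '118/79' },
);
my @keys = ('ART', 'TEMP');

my @out;
for my $h (@list) {
    push @out, join(',', $h->{first}, $h->{date},
        map {
            # Use the stored value if there is one, else a placeholder.
            my $v = defined $h->{$_} ? $h->{$_} : '/';
            "$_ $v";
        } @keys);
}
print "$_\n" for @out;
```

With this approach a label that is missing from a line gets its placeholder without any regex, so the temporary file and the second read are not needed for that step.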


Re^3: log file sorting by moritz (Cardinal) on Aug 04, 2008 at 16:31 UTC
"Ah, thanks for pointing out the paragraph tags; I somehow missed them when reading about the markup."

No problem. Just go to your original question and fix the markup.

As for your programming problem, I think you're making it harder than it needs to be. For example, there's no need to store your data to disk twice and read it again. Here's what I'd do, in untested Perl code, with some blanks left for you to figure out:

# store all data here:
my %data;
while (<INPUT>) {
    chomp(my @items = sort split m/,/);
    my %seen;
    # number the occurrences of data points, and put them into a hash
    for (@items) {
        my ($key, $val) = split m/ /, $_, 2;
        my $index = ++$seen{$key};
        push @{$data{"$key$index"}}, $val;
    }
}

# now all data should be in the hash %data.
use Data::Dumper;
print Dumper \%data;

# now print it:
my @keys = sort keys %data;
while (keys %data) {
    for (@keys) {
        if (exists $data{$_}) {
            # print it out here
            # then remove it
            shift @{$data{$_}};
            delete $data{$_} unless @{$data{$_}};
        } else {
            # print a placeholder here
        }
    }
}

The idea is to keep a list of all data values for each label, in your case `['#', '#']` for `A1`, ... The choice of a clever data structure (i.e. one that fits the way you want to access it in your code) makes it much easier.
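To make the data structure concrete, here is a small runnable sketch of the first half of that idea, run against two made-up input lines. The sample data and the labels A and B are assumptions, not taken from the original log file.

```perl
use strict;
use warnings;

# Hypothetical input: comma-separated "LABEL value" items, where a
# label may occur more than once per line.
my @lines = (
    "A 10,B 7,A 11",
    "A 12,B 8,A 13",
);

my %data;
for my $line (@lines) {
    my @items = sort split /,/, $line;
    my %seen;    # per-line counter, reset for every line
    for my $item (@items) {
        my ($key, $val) = split / /, $item, 2;
        my $index = ++$seen{$key};            # 1 for the first "A", 2 for the second
        push @{ $data{"$key$index"} }, $val;  # one list per numbered label
    }
}

for my $key (sort keys %data) {
    print "$key: @{ $data{$key} }\n";
}
```

For this input it prints "A1: 10 12", "A2: 11 13" and "B1: 7 8". Two caveats of the sketch: the sort is a string sort, so the numbering follows the sorted order of the items rather than their order in the line, and if a label is missing from some line the lists will get out of step unless a placeholder is pushed for it.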
Re^4: log file sorting by numberninja (Initiate) on Aug 04, 2008 at 17:30 UTC
Just to make sure I understand your programming template: the first part does something similar to my program and splits the data on commas, then has a place where I can count the occurrences of numbers following a label? I considered using the count function and looking for instances of \D\d{0,3}, but I don't see how to limit the count to only the area between the label and the comma. However, this doesn't really shed any light on how I can store the multiple instances of "A" in one line as separate values.

Edit: I've now gotten it to count the numbers by having it search for \D\d{1,3} and then storing that count in a hash with the label as the key. On every line I recheck the count, and if it's greater, I replace the old value.	[reply]
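A minimal sketch of the counting described in the edit, limited to the text between a label and the next comma by splitting on commas first. The sample lines are made up, and the "/" separator between numbers is an assumption based on the placeholder substitutions in the original post; %max_count is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sample lines in "LABEL n/n/n" form.
my @lines = (
    "ART 120/80,ST 1/2/3,TEMP 37",
    "ART 118,ST 1/2,TEMP",
);

my %max_count;    # label => largest number of values seen on any line
for my $line (@lines) {
    for my $item (split /,/, $line) {
        my ($label, $rest) = split ' ', $item, 2;
        $rest = '' unless defined $rest;    # label with no value at all
        # The match only sees this one item, so it cannot run past the comma.
        my @numbers = $rest =~ /(\d+(?:\.\d+)?)/g;
        my $count = @numbers;
        $max_count{$label} = $count
            if !exists $max_count{$label} || $count > $max_count{$label};
    }
}

print "$_: $max_count{$_}\n" for sort keys %max_count;
```

For the two sample lines this prints "ART: 2", "ST: 3" and "TEMP: 1", which gives the number of columns each label would need.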