in reply to Re^3: Best way to store/access large dataset?
in thread Best way to store/access large dataset?
I can't quite visualize it, but what you're doing is assigning each file its category name and carrying that forward, right? And then it just counts up the "hits" in each category for each attribute.
Just for general knowledge: on a 3/4-size data set, it takes approximately 16 minutes before the dumper starts printing to the screen. That's why I was wondering whether this is the type of thing that could be forked. Also, is $j an arbitrary variable, or is it special? And $i is a special variable, right? I was hoping to shoehorn the attribute ID into the data structure so I can use it in the output at the end of this.
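For that last part, this is roughly the change I have in mind against the code below: keep the first tab-separated field of each Attributes.txt row (which getattrs currently shifts off and throws away) in a parallel array, then tag each result hash with it. Untested, and attr_id is just a key name I made up:

my $attrIDs = [];    # attribute ID for row $j ends up in $attrIDs->[$j]

sub getattrs {
    my @attrs = split /\t/, $_[1];
    my $id = shift @attrs;            # first field is the attribute ID
    if ( defined $attrs[0] ) {
        push @{$attrIDs}, $id;
        push @{$attrs},   \@attrs;
    }
}

# ... and later, when building the results:
# push @result, { attr_id => $attrIDs->[$j], %subres };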
This works:
use strict;
use warnings;
use Data::Dumper;

open my $dataIn1, "<", "Attribute_ID.txt" or die "NO ID FILE: $!";
open my $dataIn2, "<", "Attributes.txt"   or die "NO ATTR FILE: $!";

my $data  = [];    # category name for each file, in column order
my $attrs = [];    # one arrayref of 0/1 flags per attribute row

sub getdata {
    my ( $fileName, $type ) = split /\t/, $_[1];
    push @{$data}, $type if defined $fileName;
}

sub getattrs {
    my @attrs = split /\t/, $_[1];
    shift @attrs;    # drop the attribute ID column
    push @{$attrs}, \@attrs if defined $attrs[0];
}

while ( <$dataIn1> ) { chomp; getdata( 0, $_ ) }
while ( <$dataIn2> ) { chomp; getattrs( 0, $_ ) }

my @result;
for ( my $j = 0; $j < @{$attrs}; ++$j ) {
    my %subres;
    # start every category at 0 for this attribute row
    @subres{ @{$data} } = ( 0 ) x @{ $attrs->[0] };
    for ( my $i = 0; $i < @{ $attrs->[$j] }; ++$i ) {
        ++$subres{ $data->[$i] } if $attrs->[$j][$i] == 1;
    }
    push @result, \%subres;
}

print Dumper( \@result );
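And this is roughly how I pictured the forking: use Parallel::ForkManager to run the outer $j loop in a few child processes and hand each child's %subres back to the parent (the my @result declaration stays as-is). Untested sketch; the process count is arbitrary, and I'm assuming the counting loop is what eats most of the 16 minutes:

use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new( 4 );    # 4 children, picked arbitrarily

# Collect each child's hash back into @result, indexed by row number
# so the output order still matches the input order.
$pm->run_on_finish( sub {
    my ( $pid, $exit, $j, $signal, $core, $subres ) = @_;
    $result[$j] = $subres if defined $subres;
} );

for my $j ( 0 .. $#{$attrs} ) {
    $pm->start( $j ) and next;    # parent moves on; child does the work

    my %subres;
    @subres{ @{$data} } = ( 0 ) x @{ $attrs->[0] };
    for my $i ( 0 .. $#{ $attrs->[$j] } ) {
        ++$subres{ $data->[$i] } if $attrs->[$j][$i] == 1;
    }

    $pm->finish( 0, \%subres );    # ship this row's counts to the parent
}
$pm->wait_all_children;

In practice I'd probably hand each child a chunk of rows rather than fork once per attribute row, since that many forks would add a lot of overhead.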