I would suggest you try a two-step approach.

1. Get your data structures as you want them

2. Process your data

For example, build something like this first to get the data. (This is a rather simple script. If the data is already coming from a database, I wonder why you don't pour it into the correct format there.)

use strict ;
use warnings ;
use Data::Dumper ;

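my $data  = [] ;   # shape label per file, in file order
my $attrs = [] ;   # one array ref of 0/1 values per attribute row

# Store the shape (second column) of one "file<TAB>shape" line.
# $_[0] is the row counter (unused here), $_[1] is the line.
sub getData {
    my ( $fileName, $type ) = split /\t/, $_[1] ;
    push @{$data}, $type if defined $fileName ;
}

# Store the values of one attribute row, dropping the leading attribute number.
sub getAttrs {
    my @attrs = split /\t/, $_[1] ;
    shift @attrs ;
    push @{$attrs}, \@attrs if defined $attrs[0] ;
}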

# Gather data
my $context = 0 ;
my $counter = -1 ;
while(<DATA>) {
    chomp ;
    if ( $_ =~ /ID\'s/ ) {
        $context = 1 ;
        $counter = -1 ;
        next ;
    }
    if ( $_ =~ /Attributes/ ) {
        $context = 2 ;
        $counter = -1 ;
        next ;
    }
    if ( $context == 1 && $counter == -1 ) {
        ++$counter ;
        next ;
    } elsif ( $context == 1 && $counter > -1 ) {
        getData($counter, $_) ;
        ++$counter ;
    }
    if ( $context == 2 && $counter == -1 ) {
        ++$counter ;
        next ;
    } elsif ( $context == 2 && $counter > -1 ) {
        getAttrs($counter, $_) ;
        ++$counter ;
    }
} ;
foreach ( @{$data } ) {
    print $_ . " " ;
}
print "\n" ;
foreach ( @{$attrs->[0] } ) {
    print $_ . " " ;
} 
print "\n" ;
__DATA__
#ID's
File    ID
1.file.ext    Square
2.file.ext    Triangle
3.file.ext    Circle
4.file.ext    Square
5.file.ext    Triangle
6.file.ext    Circle
7.file.ext    Circle
8.file.ext    Rectangle
9.file.ext    Rectangle
10.file.ext    Circle
11.file.ext    Triangle
12.file.ext    Triangle
13.file.ext    Square
14.file.ext    Rectangle
15.file.ext    Rectangle
16.file.et    Square

#Attributes
attribute    1.file.ext    2.file.ext    3.file.ext    4.file.ext    5.file.ext    6.file.ext    7.file.ext    8.file.ext    9.file.ext    10.file.ext    11.file.ext    12.file.ext    13.file.ext    14.file.ext    15.file.ext    16.file.et
1    1    0    1    1    0    1    1    1    1    1    0    0    1    1    1    1
2    1    0    1    1    0    1    1    0    0    1    0    0    1    0    0    1
3    0    1    0    0    1    0    0    1    1    0    1    1    0    1    1    0
4    0    1    1    0    1    1    1    1    1    1    1    1    0    1    1    0
5    0    1    0    0    1    0    0    0    0    0    1    1    0    0    0    0
6    0    0    0    0    0    0    0    1    1    0    0    0    0    1    1    0
7    0    0    1    0    0    1    1    1    1    1    0    0    0    1    1    0
8    1    0    1    1    0    1    1    1    1    1    0    0    1    1    1    1
9    0    0    0    0    0    0    0    1    1    0    0    0    0    1    1    0
10    0    1    0    0    1    0    0    0    0    0    1    1    0    0    0    0
11    0    1    0    0    1    0    0    1    1    0    1    1    0    1    1    0
12    1    1    1    1    1    1    1    0    0    1    1    1    1    0    0    1
13    0    0    1    0    0    1    1    0    0    1    0    0    0    0    0    0
14    0    0    1    0    0    1    1    1    1    1    0    0    0    1    1    0
15    0    0    1    0    0    1    1    0    0    1    0    0    0    0    0    0
16    1    0    0    1    0    0    0    0    0    0    0    0    1    0    0    1
17    1    0    0    1    0    0    0    0    0    0    0    0    1    0    0    1
18    0    0    1    0    0    1    1    0    0    1    0    0    0    0    0    0
19    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
20    0    1    1    0    1    1    1    1    1    1    1    1    0    1    1    0
21    0    0    0    0    0    0    0    1    1    0    0    0    0    1    1    0
22    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
23    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1
24    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
25    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
26    1    1    1    1    1    1    1    0    0    1    1    1    1    0    0    1
27    0    1    0    0    1    0    0    0    0    0    1    1    0    0    0    0
28    0    0    0    1    0    0    0    1    1    0    0    0    1    1    1    1
29    0    0    0    0    0    0    0    1    1    0    0    0    0    1    1    0
30    0    0    0    1    0    0    0    1    1    0    0    0    1    1    1    1
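If you would rather not rely on the columns of the attribute table being in the same order as the ID list, you could key everything by file name instead. This is only a minimal sketch of that idea, not a drop-in replacement for the script above: the names `%shape_of` and `%value_of` and the shortened three-file sample are my own assumptions, and it assumes tab-separated input with the same layout as your data.

```perl
use strict;
use warnings;

# Assumed sample input: tab-separated, same layout as the data above (shortened).
my @id_lines   = ("1.file.ext\tSquare", "2.file.ext\tTriangle", "3.file.ext\tCircle");
my $header     = "attribute\t1.file.ext\t2.file.ext\t3.file.ext";
my @attr_lines = ("1\t1\t0\t1", "2\t1\t0\t1", "3\t0\t1\t0");

# file name => shape
my %shape_of = map { my ($file, $shape) = split /\t/; ($file => $shape) } @id_lines;

# attribute number => { file name => 0 or 1 }
my (undef, @files) = split /\t/, $header;   # drop the leading "attribute" label
my %value_of;
for my $line (@attr_lines) {
    my ($attr, @values) = split /\t/, $line;
    @{ $value_of{$attr} }{@files} = @values;   # hash slice: one value per file
}

print "attribute 3, 2.file.ext ($shape_of{'2.file.ext'}): $value_of{3}{'2.file.ext'}\n";
```

With this layout a lookup goes through the file name on both sides, so a reordered or missing column shows up as an undefined value rather than a silently wrong count.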

Once you have collected your data, move on to your algorithm. In the following example I have cut down the input data to keep the output short, and I use hashes so that each shape name maps to its own counter. I don't know exactly what you want with the 25/75% part, but you can easily add another counter to this algorithm that counts how often a 0 is encountered. I would work from there if you want some statistical calculation.

my @data = qw(
    Square Triangle Circle Square Triangle Circle Circle Rectangle
    Rectangle Circle Triangle Triangle Square Rectangle Rectangle Square
) ;
$data = \@data ;
$attrs = [
    [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
] ;
my @result;
for( my $j = 0 ; $j < @{$attrs} ; ++$j ) {
    my %subres ;
    @subres{@{$data}} = ( 0 ) x @{$attrs->[0]} ;
    for( my $i = 0 ; $i < @{$attrs->[$j]} ; ++$i ) {
        if ( $attrs->[$j][$i] == 1 ) {
            ++$subres{ $data->[$i]}  ; 
        }
    } ;
    push @result, \%subres ;
}
print Dumper(\@result) ;

The output is:

$VAR1 = [
          {
            'Square' => 4,
            'Circle' => 4,
            'Rectangle' => 4,
            'Triangle' => 0
          },
          {
            'Rectangle' => 0,
            'Triangle' => 0,
            'Circle' => 4,
            'Square' => 4
          }
        ];
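The extra counter for zeros could look something like this. It is a sketch under my own assumptions: `tally_row` is a hypothetical helper name, it handles a single attribute row, and since I don't know what the 25/75% rule is supposed to mean, it only reports the share of ones per shape and leaves any threshold to you.

```perl
use strict;
use warnings;

# Same reduced sample as above: one shape label per file (column).
my @shapes = qw(
    Square Triangle Circle Square Triangle Circle Circle Rectangle
    Rectangle Circle Triangle Triangle Square Rectangle Rectangle Square
);
my @row = (1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1);

# Hypothetical helper: tally ones and zeros per shape for one attribute row.
sub tally_row {
    my ($shapes, $row) = @_;
    my %tally;
    for my $i (0 .. $#{$row}) {
        my $shape = $shapes->[$i];
        $tally{$shape} ||= { ones => 0, zeros => 0 };
        if ($row->[$i] == 1) { ++$tally{$shape}{ones} }
        else                 { ++$tally{$shape}{zeros} }
    }
    return \%tally;
}

my $tally = tally_row(\@shapes, \@row);
for my $shape (sort keys %{$tally}) {
    my ($ones, $zeros) = @{ $tally->{$shape} }{qw(ones zeros)};
    my $pct = 100 * $ones / ($ones + $zeros);   # share of files with the attribute set
    printf "%-10s ones=%d zeros=%d (%.0f%% set)\n", $shape, $ones, $zeros, $pct;
}
```

Calling `tally_row` once per row of `$attrs` gives you both counts for every attribute, and you can compare `$pct` against whatever cut-off you need.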

In reply to Re: Best way to store/access large dataset? by Veltro
in thread Best way to store/access large dataset? by Speed_Freak
