
i want a way of counting matches and keeping them ordered by count. in this case, the counts are per column of a csv file (but i've run into this doing other things too). obviously, this should (and will) live in a db, but for the time being i want something pure perl (or an xs approach might work).

so, is there a way to sort data as i match? the obvious answer is something like:

$data->{ criteria }->{ $record } += 1;

and then:

my @sorted = sort { $data->{ criteria }->{ $a } <=> $data->{ criteria }->{ $b } }
    keys %{ $data->{ criteria } };
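for reference, here's the count-then-sort baseline as a small self-contained script. `count_records` and `rank_by_count` are made-up names for this sketch, and the tie-break on the record name is an assumption added only so the output order is deterministic (perl's hash key order isn't).

```perl
use strict;
use warnings;

# Count occurrences of each record; returns a hashref of record => count.
sub count_records {
    my @records = @_;
    my %counts;
    $counts{ $_ }++ for @records;
    return \%counts;
}

# Order records by descending count, breaking ties alphabetically
# so the result does not depend on hash key order.
sub rank_by_count {
    my ( $counts ) = @_;
    return sort { $counts->{ $b } <=> $counts->{ $a } || $a cmp $b } keys %{ $counts };
}

my $counts = count_records( qw( apple pear apple fig pear apple ) );
print "$_ => $counts->{ $_ }\n" for rank_by_count( $counts );
```

this sorts the k distinct records once, after all the counting is done, so the cost is a single O(k log k) sort rather than extra work on every match.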

... but this seems wasteful, since it sorts every key again after all the counting. still, i'm almost out of ideas. i thought of using an array and splice to keep it sorted as i go, but of course that's slow as hell, because every splice shifts the elements after it.
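a minimal sketch of the splice idea, assuming a single list kept in descending count order (`record_match` is a hypothetical helper). each match pulls the record out of @order and splices it back in at its new position, which is why every update can shift O(n) elements.

```perl
use strict;
use warnings;

# %count holds record => count; @order holds records by descending count.
my ( %count, @order );

sub record_match {
    my ( $rec ) = @_;
    if ( $count{ $rec }++ ) {
        # already present: find and remove it from its current slot
        my ( $idx ) = grep { $order[ $_ ] eq $rec } 0 .. $#order;
        splice @order, $idx, 1;
    }
    # find the first slot whose count is lower than ours (a linear scan for
    # clarity; a binary search finds the spot faster, but splice still shifts)
    my $pos = 0;
    $pos++ while $pos < @order && $count{ $order[ $pos ] } >= $count{ $rec };
    splice @order, $pos, 0, $rec;
}

record_match( $_ ) for qw( apple pear apple fig pear apple );
print "$_ => $count{ $_ }\n" for @order;
```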

i had a glimmer of another idea, but i'm not sure how sound it is. let me try to elaborate on it (and if it isn't understandable, there's a good chance it's not very good :) ). i create the same counter hash (of course), but then i also create a separate lookup table keyed by count and record, like:

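my $counter = ++$data->{ criteria }->{ $record };
# delete (not undef) the entry for the previous count, so its key drops out of keys()
delete $lookup->{ sprintf( '%010d-%s', $counter - 1, $record ) } if $counter > 1;
# zero-pad the count so a plain string sort orders the keys numerically
$lookup->{ sprintf( '%010d-%s', $counter, $record ) } = $record;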

then, i could just do something like:

foreach my $key ( sort keys %{ $lookup } ) {
    print "$lookup->{ $key }: $data->{ criteria }->{ $lookup->{ $key } }\n";
}
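one catch with "count-record" keys: `sort` with no block compares strings, so a key like "10-..." sorts before "9-...". the snippet below (keys made up for illustration) shows that, and shows that zero-padding the count to a fixed width with `sprintf` makes string order match numeric order.

```perl
use strict;
use warnings;

my @keys = ( '9-pear', '10-apple', '2-fig' );

# default sort is a string comparison: "10-apple" lands first
my @plain = sort @keys;

# pad the numeric part to 10 digits so string order equals numeric order
my @padded = sort map { sprintf '%010d-%s', split /-/, $_, 2 } @keys;

print "plain:  @plain\n";
print "padded: @padded\n";
```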

but half of me feels that i'm chasing my tail here, since i'm still sorting (and i'm using two data structures, obfuscating things a little just to get a prettier sort). any thoughts on this?
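if the goal is to avoid sorting every record, one swapped-in technique (not what's described above) is to bucket records by count: each match moves a record from bucket n to bucket n+1, and walking the buckets from the highest count down gives count order. this is a sketch under those assumptions; `record_match` and `by_count_desc` are hypothetical names, and only ties inside a bucket get sorted, purely for deterministic output.

```perl
use strict;
use warnings;

# %count:  record => count
# %bucket: count  => { record => 1, ... }
my ( %count, %bucket );
my $max = 0;

sub record_match {
    my ( $rec ) = @_;
    my $old = $count{ $rec } || 0;
    my $new = ++$count{ $rec };
    if ( $old ) {
        # move the record out of its old bucket, dropping the bucket if empty
        delete $bucket{ $old }{ $rec };
        delete $bucket{ $old } unless %{ $bucket{ $old } };
    }
    $bucket{ $new }{ $rec } = 1;
    $max = $new if $new > $max;
}

# Walk buckets from the highest count down; returns [ record, count ] pairs.
sub by_count_desc {
    my @out;
    for my $n ( reverse 1 .. $max ) {
        next unless $bucket{ $n };
        push @out, map { [ $_, $n ] } sort keys %{ $bucket{ $n } };
    }
    return @out;
}

record_match( $_ ) for qw( apple pear apple fig pear apple );
printf "%s => %d\n", @{ $_ } for by_count_desc();
```

each match costs a few hash operations; the final walk visits every count from $max down to 1, so it stays cheap as long as the largest count isn't huge compared with the number of records.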

UPDATE: btw, i recently had an idea of how to do this sort in place and wanted to run it by y'all. it was written as a proof of concept rather than debugged code, but i think it shows decently what i'm looking to do. is this idea worth anything?


use strict;
use warnings;

my $data;      # $data->[ col ]->{ value } = [ count, reference to its slot in $sorted ]
my $sorted;    # $sorted->[ col ]->[ n ]   = [ value, count ], or undef once superseded

while ( my $line = <> ) {
    chomp $line;
    my @cols = split /,/, $line;

    for my $i ( 0 .. $#cols ) {
        my $entry = $data->[ $i ]->{ $cols[ $i ] } ||= [ 0, undef ];

        # counter for the unique element
        $entry->[ 0 ]++;

        # undefine the element in $sorted if a reference to it was previously stored
        ${ $entry->[ 1 ] } = undef if $entry->[ 1 ];

        # store the value and its new count at the top of $sorted's stack
        push @{ $sorted->[ $i ] }, [ $cols[ $i ], $entry->[ 0 ] ];

        # reference to that slot in $sorted, so it can be undefined later if necessary
        $entry->[ 1 ] = \$sorted->[ $i ]->[ -1 ];
    }
}

# then, you could just loop through $sorted, bypassing:
#   sort { $data->[ $i ]->{ $a }->[ 0 ] <=> $data->[ $i ]->{ $b }->[ 0 ] }
#       keys %{ $data->[ $i ] }
# note: surviving slots are in order of each value's most recent match,
# which is not necessarily the same as order by count
for my $i ( 0 .. $#{ $sorted || [] } ) {
    for my $slot ( @{ $sorted->[ $i ] } ) {
        next unless $slot;    # skip slots undefined by later matches
        print "column $i: $slot->[ 0 ] appeared $slot->[ 1 ] times\n";
    }
}
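for comparison, a sketch of the per-column count-then-sort that the proof of concept is trying to bypass, run on in-memory lines instead of <> so it's easy to try. it assumes plain comma-separated fields with no quoting (split /,/ doesn't handle csv quoting), and `column_counts` / `ranked_column` are hypothetical names.

```perl
use strict;
use warnings;

# Build per-column counts: $counts->[ col ]{ value } = occurrences.
sub column_counts {
    my @lines = @_;
    my @counts;
    for my $line ( @lines ) {
        chomp( my $copy = $line );
        my @cols = split /,/, $copy, -1;    # -1 keeps trailing empty fields
        $counts[ $_ ]{ $cols[ $_ ] }++ for 0 .. $#cols;
    }
    return \@counts;
}

# Values of one column, by descending count, ties broken alphabetically.
sub ranked_column {
    my ( $counts, $i ) = @_;
    my $c = $counts->[ $i ];
    return sort { $c->{ $b } <=> $c->{ $a } || $a cmp $b } keys %{ $c };
}

my $counts = column_counts( "a,x\n", "b,x\n", "a,y\n", "a,x\n" );
for my $i ( 0 .. $#{ $counts } ) {
    print "column $i: $_ appeared $counts->[ $i ]{ $_ } times\n"
        for ranked_column( $counts, $i );
}
```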

In reply to best sort by ag4ve
