i want a way of counting matches. in this case, it is per column of a csv file (but i've run across this doing different things). obviously, this should (and will) be in a db, but i want something strictly perl for the time being (or some xs theory might work).


so, is there a way to sort data as i match? the obvious answer is something like:

$data->{ criteria }->{ $record } += 1;
and then:
@sorted = sort { $data->{ criteria }->{ $a } <=> $data->{ criteria }->{ $b } } keys %{ $data->{ criteria } };
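for reference, here is a minimal runnable sketch of that count-then-sort baseline applied per column. the sub names (count_columns, sorted_values) and the sample lines are made up for illustration, and the plain split on commas assumes there are no quoted fields.

```perl
use strict;
use warnings;

# count how often each value occurs in each column
# (naive comma split, no quoted-field handling: an assumption)
sub count_columns {
    my @lines = @_;                  # copy, so chomp never touches the caller's data
    my @count;                       # $count[$col]{$value} = occurrences
    for my $line (@lines) {
        chomp $line;
        my @cols = split /,/, $line;
        $count[$_]{ $cols[$_] }++ for 0 .. $#cols;
    }
    return \@count;
}

# values of one column, most frequent first; ties broken alphabetically
sub sorted_values {
    my ($col_counts) = @_;
    return sort { $col_counts->{$b} <=> $col_counts->{$a} or $a cmp $b }
           keys %$col_counts;
}

my $count = count_columns("a,x\n", "b,x\n", "a,y\n");
for my $i (0 .. $#$count) {
    for my $value (sorted_values($count->[$i])) {
        print "column $i: $value seen $count->[$i]{$value} times\n";
    }
}
```

the sort runs once per column, after all the counting is done, so its cost depends on the number of distinct values rather than on the number of records.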

... but, this seems wasteful. though, i'm almost out of ideas. i thought of using an array and splice to sort as i go but, of course this is slow as hell as it moves elements.

i had a glimmer of another idea, but i'm not sure how sound it is. let me try to elaborate on it (and if it is not understandable, there's a good chance it's not very good :) ). i create the same counter hash (of course). but, then i create a separate lookup table like:

$pointer = $data->{ criteria }->{ $record };
#delete (rather than undef) so the stale key does not show up in the sort later
delete $lookup->{ ( $counter - 1 ) . '-' . $pointer } if( $counter > 1 );
$lookup->{ $counter . '-' . $pointer } = $pointer;

then, i could just do something like:

foreach my $key ( sort( keys( %{ $lookup } ) ) ) {
  print $data->{ $lookup->{ $key } };
}


but, half of me feels that i'm chasing my tail here as i'm still sorting (and i'm using two data structures, obfuscating a little and yielding a prettier sort). any thoughts on this?
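one way to make the lookup-table idea concrete is to bucket values by their current count, so only the distinct counts ever get sorted. this is a sketch under assumptions, not a tested solution for the csv case: %count, %bucket, record and by_count are hypothetical names, and it handles a single column.

```perl
use strict;
use warnings;

my %count;      # $count{$value} = times seen so far
my %bucket;     # $bucket{$n}{$value} exists if $value has been seen exactly $n times

# record one more occurrence and move the value into its new bucket
sub record {
    my ($value) = @_;
    my $old = $count{$value}++;      # post-increment of undef yields 0
    if ($old) {
        delete $bucket{$old}{$value};
        delete $bucket{$old} unless %{ $bucket{$old} };
    }
    $bucket{ $old + 1 }{$value} = 1;
    return;
}

# [ value, count ] pairs by descending count; ties in alphabetical order
sub by_count {
    return map { my $n = $_; map { [ $_, $n ] } sort keys %{ $bucket{$n} } }
           sort { $b <=> $a } keys %bucket;
}

record($_) for qw(x y x z x y);
printf "%s seen %d times\n", @$_ for by_count();
```

the inner sort on values is only there to give ties a stable order. whether this beats one sort over all keys depends on how many distinct counts there are compared to distinct values.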

UPDATE btw, i recently had an idea of how to do this sort in-place and wanted to run it by y'all. i haven't gone to debugging this, but it is sort of my attempt at a proof of concept (i think it shows decently what i'm looking to do) and wanted to know if this idea was worth anything?

my $data;
my $sorted;
while( <> ) {
  chomp;
  my @cols = split /,/, $_;
  for my $i ( 0 .. $#cols ) {
    #counter for the unique element
    $data->[ $i ]->{ $cols[ $i ] }->[ 0 ]++;
    #undefine the array element in sorted if a $data reference was previously defined
    undef ${ $data->[ $i ]->{ $cols[ $i ] }->[ 1 ] }
      if $data->[ $i ]->{ $cols[ $i ] }->[ 1 ];
    #elements in $sorted (next free index)
    my $stack = $#{ $sorted->[ $i ] } + 1;
    #store the new reference to the $data record at the top of $sorted's stack
    $sorted->[ $i ]->[ $stack ] = $data->[ $i ]->{ $cols[ $i ] };
    #reference to place in $sorted so that it may be undefined later if necessary
    $data->[ $i ]->{ $cols[ $i ] }->[ 1 ] = \$sorted->[ $i ]->[ $stack ];
    #keep the value itself so it can be printed later
    $data->[ $i ]->{ $cols[ $i ] }->[ 2 ] = $cols[ $i ];
  }
}
#then, you could just loop through sorted. bypassing:
# sort { $data->[ $i ]->{ $a } <=> $data->[ $i ]->{ $b }
# } keys %{ $data->[ $i ] }
# with something like
for my $i ( 0 .. $#{ $sorted } ) {
  foreach my $j ( 0 .. $#{ $sorted->[ $i ] } ) {
    print "column " . $i . ":" . $sorted->[ $i ]->[ $j ]->[ 2 ] . " had " . $sorted->[ $i ]->[ $j ]->[ 0 ] . " duplicates\n" if( $sorted->[ $i ]->[ $j ] );
  }
}

In reply to best sort by ag4ve
