I need help writing efficient Perl code to count values, rank them, and compute percentages. The following code has already been running for 6 hours and has not finished. The target file is a CSV (all text) with about 14 to 14.5 million rows and around 1,100 to 1,500 columns, roughly 62 GB in size. A run time of 4 hours would be acceptable.

What it does:
- counts the occurrences of each value per column (like COUNTIF in Excel)
- gets the percentage (based on 14 million rows)
- gets the rank based on the count

I appreciate any help.
$x   = "Room_reserve";   # base name; ".csv" is appended below
$in  = "D:\\package properties\\${x}.csv";
$out = "D:\\package properties\\output\\${x}_output.csv";

open($fh, '<', $in) or die "Could not open file '$in' $!";
@data = <$fh>;   # reads the whole file into memory
close($fh);

@columns = ();   # $columns[$i]{$value} = number of occurrences in column $i
$first = 1;

# counter
foreach $dat (@data) {
    chomp($dat);
    @rows = split(',', $dat);
    if ($first == 1) {   # skip the header row
        $first = 0;
        next;
    }
    foreach $i (0..$#rows) {
        $columns[$i]{$rows[$i]}++;
    }
}

# output
$first = 1;
open($fh, '>', $out) or die "Could not open file '$out' $!";
foreach $dat (@data) {
    chomp($dat);
    @rows = split(',', $dat);
    foreach $i (0..$#rows) {
        if ($i > 6) {
            # for modifying name
            if ($first == 1) {
                $line = join(",", "Rank_$rows[$i]", "Percent_$rows[$i]", "Count_$rows[$i]", $rows[$i]);
                print $fh "$line,";
                if ($i == $#rows) {
                    $first = 0;
                }
            }
            else {
                # rank the counts of this column (recomputed for every cell)
                @dat_val = reverse sort { $a <=> $b } values %{$columns[$i]};
                %ranks = ();
                $rank_cnt = 0;
                foreach $val (@dat_val) {
                    if (!exists($ranks{$val})) {
                        $rank_cnt++;
                    }
                    $ranks{$val} = $rank_cnt;
                }
                $rank = $ranks{$columns[$i]{$rows[$i]}};
                $cnt  = $columns[$i]{$rows[$i]};
                $ave  = ($cnt / 14000000) * 100;
                $line = join(",", $rank, $ave, $cnt, $rows[$i]);
                print $fh "$line,";
            }
        }
        else {
            print $fh "$rows[$i],";
        }
    }
    print $fh "\n";
}
close($fh);
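Two things in the script above probably dominate the run time: the whole 62 GB file is read into @data, and the rank table for a column is rebuilt (with a sort) for every single cell. A minimal sketch of one way around both is to stream the file twice and build each column's rank table only once. The sub names count_columns, rank_by_count and write_report are made up for this illustration. The sketch assumes fields never contain quoted commas (the same assumption the original split makes), divides by the actual row count instead of the hardcoded 14000000, and does not write the trailing comma the original puts on each line.

```perl
use strict;
use warnings;

# Pass 1: stream the file once and count each value per column.
# $first_col is the first column to analyse (7 in the post, because
# columns 0..6 are copied through unchanged).
sub count_columns {
    my ($fh, $first_col) = @_;
    my @counts;                       # $counts[$i]{$value} = occurrences
    my $rows   = 0;
    my $header = <$fh>;               # skip the header row
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split /,/, $line, -1; # -1 keeps trailing empty fields
        $counts[$_]{ $f[$_] }++ for $first_col .. $#f;
        $rows++;
    }
    return (\@counts, $rows);
}

# Dense rank keyed by count: the highest count gets rank 1 and equal
# counts share a rank (the same rule as the loop in the original post).
sub rank_by_count {
    my ($col_counts) = @_;
    my (%rank, $r);
    for my $c (sort { $b <=> $a } values %$col_counts) {
        $rank{$c} //= ++$r;           # only advance on a new count value
    }
    return \%rank;
}

# Pass 2: stream the file again and write rank, percent, count, value.
sub write_report {
    my ($in, $out, $counts, $rows, $first_col) = @_;

    # Build every column's rank table once, before touching any data row.
    my @ranks;
    $ranks[$_] = rank_by_count($counts->[$_]) for $first_col .. $#$counts;

    my $header = <$in>;
    chomp $header;
    my @names = split /,/, $header, -1;
    my @head;
    for my $i (0 .. $#names) {
        push @head, $i < $first_col
            ? $names[$i]
            : ("Rank_$names[$i]", "Percent_$names[$i]",
               "Count_$names[$i]", $names[$i]);
    }
    print {$out} join(',', @head), "\n";

    while (my $line = <$in>) {
        chomp $line;
        my @f = split /,/, $line, -1;
        my @o;
        for my $i (0 .. $#f) {
            if ($i < $first_col) { push @o, $f[$i]; next }
            my $n = $counts->[$i]{ $f[$i] };
            push @o, $ranks[$i]{$n}, 100 * $n / $rows, $n, $f[$i];
        }
        print {$out} join(',', @o), "\n";
    }
}

# Possible usage with real files (paths are placeholders):
#   open my $in, '<', $in_path or die "Could not open '$in_path': $!";
#   my ($counts, $rows) = count_columns($in, 7);
#   seek $in, 0, 0;                   # rewind for the second pass
#   open my $out, '>', $out_path or die "Could not open '$out_path': $!";
#   write_report($in, $out, $counts, $rows, 7);
```

Memory use then depends on the number of distinct values per column rather than on the file size. If many of the columns are close to unique across 14 million rows, the count hashes themselves could still get large, so it may be worth processing the columns in batches. If the data can contain quoted fields, a CSV parser such as Text::CSV would be safer than split, at some cost in speed.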

In reply to efficient perl code to count, rank by Perl_Noob2021
