I am trying to find the top 10 largest files on my system.

At first I thought I would use some sort of stack to keep track of the 10 largest, but the code quickly went beyond my mental reach.

Then I thought I would just store the names of all files in a hash, with the file size as the key. This works reasonably well but is a really bad idea for a filesystem with lots of files. Still, I implemented it to see if I could get the top 10 out of it. I ran into trouble with the output: it did not print the expected results.

Here is the code to output the top 10 files:

$y = 0;
for $key (sort { $hash{$b} <=> $hash{$a} || length($b) <=> length($a) } keys %sizehash) {
    if ($y < 10) {
        $res = &commas($key);
        print "$res: $sizehash{$key}\n";
        $y++;
    }
}

And here is the output I got:

Total files: 104644
Largest file: c:\/My Virtual Machines/Gentoo/Gentoo
Largest file size: 1,533,542,400
Smallest file: c:\
Smallest file size: 0
1,008,164,864: c:\/Program Files/dtSearch/UserData/personal/index_k_4.ix
1,533,542,400: c:\/My Virtual Machines/Gentoo/Gentoo
102,177,098: c:\/Data/obbd/Org Basic Building Binder/Building Binder.zip
135,019,052: c:\/My Virtual Machines/Gentoo/Gentoo.vmss
148,851,791: c:\/Data/Paraben/foch-beta.rar
569,366,528: c:\/data_transfer/Software/ISO/en_windows_server_2003_enterprise_vl.iso
144,244,736: c:\/Documents and Settings/davisone/Local Settings/Application Data/Microsoft/Outlook/archive_2003q3.pst
344,746,496: c:\/My Documents/My Virtual Machines/Windows95/Windows 98.vmdk
176,308,736: c:\/My Documents/My Virtual Machines/Windows95/Windows95.vmdk
524,288,000: c:\/Data/Personal.vol

As you can see, the output is not sorted numerically as I would expect.

What have I done wrong here?
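
One likely cause, judging from the snippet alone: the sort block looks values up in %hash, while the loop iterates over the keys of %sizehash. If %hash is empty or unrelated (an assumption, since the rest of the script is not shown), every $hash{...} lookup is undefined, the first comparison always ties, and the order falls back to length($b) <=> length($a). That would match the output above, where the two 10-digit sizes come first and the 9-digit ones follow in no particular order. A minimal sketch of sorting the size keys numerically, using made-up sample data in place of the real %sizehash:

```perl
use strict;
use warnings;

# Hypothetical sample data: size in bytes => path, mirroring %sizehash.
my %sizehash = (
    102177098  => 'c:/Data/a.zip',
    1533542400 => 'c:/vm/Gentoo',
    524288000  => 'c:/Data/Personal.vol',
    1008164864 => 'c:/index.ix',
);

# The keys are the sizes, so compare the keys themselves, numerically, descending.
my @by_size = sort { $b <=> $a } keys %sizehash;

# Keep at most the first 10 entries.
my $last = $#by_size < 9 ? $#by_size : 9;
my @top  = @by_size[ 0 .. $last ];

print "$_: $sizehash{$_}\n" for @top;
```

Taking an array slice also replaces the $y counter, so the loop does not keep iterating over the remaining keys after the tenth one.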

Also, any recommendations for a better way to implement a search like this, rather than using a hash entry for each file on the filesystem, would be great.
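
One way to avoid a hash entry per file is to keep only the current top 10 while walking the tree. A rough sketch, assuming the core File::Find module does the traversal; top_n_files and its arguments are hypothetical names, not taken from the original script. It stores [size, path] pairs instead of keying on size, so two files of the same size do not overwrite each other.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return up to $n [size, path] pairs, largest first.
sub top_n_files {
    my ($n, @dirs) = @_;
    return unless $n > 0;
    my @top;    # never holds more than $n + 1 entries

    find(
        {
            no_chdir => 1,    # $_ is then the full path of each entry
            wanted   => sub {
                return unless -f $_;           # plain files only
                my $size = ( -s _ ) || 0;      # reuse the stat done by -f
                # Skip files no larger than the smallest entry already kept.
                return if @top == $n && $size <= $top[-1][0];
                push @top, [ $size, $_ ];
                @top = sort { $b->[0] <=> $a->[0] } @top;
                pop @top if @top > $n;
            },
        },
        @dirs
    );
    return @top;
}

# Usage: directories from the command line, or the current directory.
my @dirs = @ARGV ? @ARGV : ('.');
printf "%s: %s\n", @$_ for top_n_files( 10, @dirs );
```

Re-sorting on every accepted file is cheap here because the list never grows past 11 entries, and memory use stays constant no matter how many files the filesystem holds.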

Ed


In reply to finding top 10 largest files by bfdi533
