If I calculate the average file size of the list as it is, the average may be too high or too low for the target average I'm seeking. What I am trying to do is generate a new list whose average file size matches the target (or falls within ~500 bytes of it). The list looks like this:

    size   filename
    -----  --------
    4329   file1
    12311  file2
    ...
    657    file100000
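For concreteness, here's a minimal sketch of the first step, checking where the list's average currently sits. I'm assuming the whitespace-separated two-column format above, and Perl (nothing in the post prescribes a language):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Minimal sketch, assuming the "size filename" format shown above;
    # header and ruler lines fail the digit test and are skipped.
    my ($total, $count) = (0, 0);
    while (<>) {
        my ($size) = split;          # first column is the size in bytes
        next unless defined $size && $size =~ /^\d+$/;
        $total += $size;
        $count++;
    }
    die "no sizes read\n" unless $count;
    printf "average: %.1f bytes over %d files\n", $total / $count, $count;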
One idea I had was to group the files into buckets at a resolution of 1024 bytes, then build the new list by plucking files from those buckets and chucking some from the new list to meet the efficiency goal (see below). I would also like to maintain some randomness in the file sizes; I don't want clumping near the target average or anything like that. Fast would be good, but efficient is more important, where efficient = maintaining as many of the elements of the original list as possible. The program would be invoked like this:

    ./prog 25000 file.lst > new.25000.lst
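Here's a hedged sketch of how the chucking step could work. It skips the explicit 1024-byte buckets and instead repeatedly evicts one randomly chosen file from the "wrong side" of the running average until the average lands within tolerance; the 500-byte tolerance, the eviction policy, and the output format are illustrative choices on my part, not a spec:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(sum shuffle);

    # Illustrative tolerance in bytes (the ~500 from the problem statement).
    my $TOLERANCE = 500;

    my $target = shift @ARGV
        or die "usage: $0 target-avg file.lst\n";

    # Slurp [size, name] pairs; header/ruler lines fail the digit test.
    my @files;
    while (<>) {
        my ($size, $name) = split;
        push @files, [$size, $name] if defined $name && $size =~ /^\d+$/;
    }
    die "no files read\n" unless @files;

    my $total = sum map { $_->[0] } @files;

    # Evict one randomly chosen file from the wrong side of the current
    # average until the average is within tolerance of the target. The
    # random pick among eligible files keeps the surviving sizes spread
    # out rather than clumped near the target.
    while (@files) {
        my $avg = $total / @files;
        last if abs($avg - $target) <= $TOLERANCE;
        my $want_smaller = $avg > $target;   # average too high: drop a big file
        my @idx = shuffle grep {
            $want_smaller ? $files[$_][0] > $avg : $files[$_][0] < $avg
        } 0 .. $#files;
        last unless @idx;                    # no candidates left on that side
        my ($victim) = splice @files, $idx[0], 1;
        $total -= $victim->[0];
    }

    printf STDERR "kept %d files, final average %.1f\n",
        scalar @files, $total / @files;
    print "$_->[0]\t$_->[1]\n" for @files;

Because each pass chucks exactly one file, the kept count stays as high as this greedy strategy allows, and the randomness comes for free from the shuffled candidate pick. Finding the true maximum-size sub-list is a harder combinatorial problem (it is related to subset sum), so a greedy sketch like this trades a few extra evictions for speed.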