
Hey Monks,

I need your advice on how to implement a read-ahead buffer over the network properly.
My situation is this: a local script lets local apps read a file that appears, to both the script and the app, to be local. In fact the file resides on a remote server and is potentially quite big. It's effectively a virtual file system setup.

Now reading begins, and depending on the application doing the read(), it asks for chunks of 1, 46, 1024, or however many bytes of data. That is efficient on a local disk but inefficient over the network.

In an uncached world, each of these requests would mean one hit on the remote server for a probably tiny chunk of data, resulting in a rapid series of wasteful hits. So I've put a simple caching mechanism into the loop: whenever a local read is done, it reads ahead 14 more chunks of the requested size. For example, an app that does tiny reads asks for a first chunk of 16 bytes. My cache multiplies this by 15 and sends just one request over the network for 16 bytes * 15 = 240 bytes. When the data arrives, it delivers the requested chunk and caches the remaining 14 chunks, as the app will quite likely ask for them once it has consumed the previous chunk. (Of course, the number of read-ahead slots is limited by EOF etc.)
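To make the scheme concrete, here is a minimal sketch of it in Python. It is not my actual script: `fetch_range(offset, length)` is a hypothetical stand-in for the single network request, and the class and attribute names are made up for illustration.

```python
class ReadAheadCache:
    """Serve small sequential reads from one larger network fetch."""

    def __init__(self, fetch_range, file_size, factor=15):
        self.fetch_range = fetch_range  # hypothetical: (offset, length) -> bytes
        self.file_size = file_size
        self.factor = factor            # 1 requested chunk + 14 read ahead
        self.buf_start = 0              # file offset of the first cached byte
        self.buf = b""
        self.requests = 0               # network round trips so far

    def read(self, offset, size):
        # Clamp the request at EOF.
        size = max(0, min(size, self.file_size - offset))
        if size == 0:
            return b""
        buf_end = self.buf_start + len(self.buf)
        if not (self.buf_start <= offset and offset + size <= buf_end):
            # Cache miss: fetch the requested chunk plus 14 more of that size.
            length = min(size * self.factor, self.file_size - offset)
            self.buf = self.fetch_range(offset, length)
            self.buf_start = offset
            self.requests += 1
        start = offset - self.buf_start
        return self.buf[start:start + size]
```

With this, an app reading 16 bytes at a time triggers one request per 240 bytes consumed. A read that falls outside the cached window simply throws the old window away and starts a new one.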

This is as good as it gets without additional work. The problem is that the scheme works well if an app asks for reasonably sized chunks, but doesn't help much if it asks for 1-byte chunks: 15 bytes per request is still a tiny transfer!
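A quick back-of-the-envelope calculation shows how lopsided this is. The sketch below (Python; the function name and the 1 MiB file size are just illustrative assumptions) counts the network round trips needed to read a whole file sequentially when every request fetches 15 times the chunk size the app asked for.

```python
def round_trips(file_size, chunk_size, factor=15):
    """Requests needed to read file_size bytes, fetching chunk_size * factor each time."""
    per_request = chunk_size * factor
    return -(-file_size // per_request)  # ceiling division

ONE_MIB = 1024 * 1024

for chunk in (1, 16, 1024):
    print(chunk, round_trips(ONE_MIB, chunk))
# 1 69906
# 16 4370
# 1024 69
```

So for the 1-byte reader, the 15x read-ahead still leaves almost 70,000 requests for a single megabyte.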

Imagined version 2
A next iteration would base the read-ahead buffer size on the file size and a cap, so the scheme would read chunks of up to, let's say, 64000 bytes on larger files, or read the whole file in one go if it is smaller, and then cache this locally.
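As a sketch of that sizing rule (in Python; `plan_fetch` is a hypothetical helper, and aligning requests to cap-sized blocks is my own assumption, not something I have implemented):

```python
CAP = 64000  # upper limit per request, the example figure from above

def plan_fetch(offset, file_size, cap=CAP):
    """Return (start, length) of the network request that covers `offset`."""
    if file_size <= cap:
        return 0, file_size              # small file: fetch it in one go
    start = (offset // cap) * cap        # assumption: align to cap-sized blocks
    return start, min(cap, file_size - start)

print(plan_fetch(10, 5000))         # (0, 5000)
print(plan_fetch(70000, 200000))    # (64000, 64000)
print(plan_fetch(199999, 200000))   # (192000, 8000)
```

The request size no longer depends on what the app asks for, so even a 1-byte reader costs only one round trip per 64000 bytes.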

The problem with this solution is that the local script needs to be able to read() and seek() in a data structure that is possibly still being filled (for example, if a video player skips to a later section of the video). A seek() might land in a portion of the file that is already there (probably the first few bytes), or it might move to a section that hasn't arrived yet, which should then be delivered next, and so on.
A bit like a canister being filled while someone is tapping it at the bottom.
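One way I can picture this data structure is a sparse cache keyed by block number, where a read() into a missing region fetches just the blocks it needs. A minimal single-threaded sketch in Python follows; `fetch_range(offset, length)` is again a hypothetical stand-in for the network request, and all names are invented for illustration.

```python
class BlockCache:
    """Sparse local copy of a remote file, filled block by block on demand."""

    def __init__(self, fetch_range, file_size, block_size=64000):
        self.fetch_range = fetch_range   # hypothetical: (offset, length) -> bytes
        self.file_size = file_size
        self.block_size = block_size
        self.blocks = {}                 # block index -> bytes already fetched
        self.pos = 0                     # current read position
        self.requests = 0                # network round trips so far

    def seek(self, offset):
        self.pos = max(0, min(offset, self.file_size))

    def _block(self, index):
        if index not in self.blocks:     # hole in the cache: fetch it now
            start = index * self.block_size
            length = min(self.block_size, self.file_size - start)
            self.blocks[index] = self.fetch_range(start, length)
            self.requests += 1
        return self.blocks[index]

    def read(self, size):
        end = min(self.pos + size, self.file_size)
        out = bytearray()
        while self.pos < end:
            index, within = divmod(self.pos, self.block_size)
            piece = self._block(index)[within:within + (end - self.pos)]
            if not piece:                # short fetch: stop rather than loop forever
                break
            out += piece
            self.pos += len(piece)
        return bytes(out)
```

A seek() is then free, and only the following read() decides whether the data is "already there" or has to be delivered next.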

Quiz 1:
Would it be a good idea to start the file-fill reader in a different thread, so the local cached copy gets filled asynchronously? (I get a headache...)
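To make the question concrete, this is roughly what I imagine, sketched in Python with its standard `threading` module. Everything here is an assumption for illustration: `fetch_range(offset, length)` stands in for the network request, the filler works through the file block by block, and a reader that needs a missing block marks it as wanted so the filler continues from there. Error handling is left out (if a fetch fails, a waiting reader would hang).

```python
import threading

class BackgroundFiller:
    """Fill a block cache in a worker thread while readers consume it."""

    def __init__(self, fetch_range, file_size, block_size=64000):
        self.fetch_range = fetch_range   # hypothetical: (offset, length) -> bytes
        self.file_size = file_size
        self.block_size = block_size
        self.n_blocks = -(-file_size // block_size)  # ceiling division
        self.blocks = {}                 # block index -> bytes
        self.wanted = 0                  # block the reader asked for most recently
        self.cond = threading.Condition()
        self.thread = threading.Thread(target=self._fill, daemon=True)
        self.thread.start()

    def _next_index(self):
        # Caller holds self.cond. Start at the wanted block and wrap around.
        for k in range(self.n_blocks):
            i = (self.wanted + k) % self.n_blocks
            if i not in self.blocks:
                return i
        return None                      # everything is cached

    def _fill(self):
        while True:
            with self.cond:
                index = self._next_index()
            if index is None:
                return
            start = index * self.block_size
            length = min(self.block_size, self.file_size - start)
            data = self.fetch_range(start, length)  # network call outside the lock
            with self.cond:
                self.blocks[index] = data
                self.cond.notify_all()   # wake readers waiting for this block

    def read(self, offset, size):
        end = min(offset + size, self.file_size)
        out = bytearray()
        while offset < end:
            index, within = divmod(offset, self.block_size)
            with self.cond:
                self.wanted = index      # steer the filler towards this block
                while index not in self.blocks:
                    self.cond.wait()
                block = self.blocks[index]
            piece = block[within:within + (end - offset)]
            if not piece:                # short block: stop rather than loop forever
                break
            out += piece
            offset += len(piece)
        return bytes(out)
```

The reader blocks only until the one block it needs has arrived, while the rest of the file keeps trickling in behind it.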

Quiz 2:
Would IO::Mark or IO::Stream or IO::File::Cached be of any help here? (I still can't get my head around them..)

Any help, input, advice, code bits welcome!

In reply to Advice needed on an interesting read-ahead over network file IO problem by isync
