=head1 GETTING A HANDLE ON IO

=head2 Introduction

This tutorial delves into the naughtier, uglier parts of POSIX-centric I/O. Herein are covered the nasty details of the various calls and the layers they belong to, the different modes of input and output, and the combination of these aspects into practical examples.

The tutorial will not attempt to explain the various ways to obtain the actual handles, except as necessary for specific examples. Its aim is to teach the various styles of I/O, that is, how to manipulate given handles in more specialized ways, in the hope that once you know what kind of interaction with a handle you want, finding out how to get such a handle will be easy using the reference (L, L, L, L, L, L).

The main focus of the tutorial is simplicity. After that comes robustness, and then performance. What this means is that I will not systematically append error checking to every line of example code, because I find that distracting. I will also not resort to ugly constructs to gain a little throughput. I think impure examples hinder my ability to convey my ideas clearly. Nuff said, on to the intro. We start with tiny baby steps, and then start striding forward.

=head2 What is a filehandle?

We'll start by covering the Perl-specific data type that abstracts a stream of data: the filehandle. If you already think you know what you're doing, skip onwards a bit; this is really basic stuff.

Perl's filehandles are points through which data is moved. You can refer to them by name, or by storing them in a variable. The abstraction centers around a metaphor of a sort of porthole, or pipe end, which your software can ask the OS to take data from and move elsewhere, or to put data on for your software to read. Data is moved through these orifices in chunks, coming out of or going into a normal variable, as a string.
For example, let's say we've opened a file:

    open my $fh, "<", "/some/file";

This stores a reference to a filehandle in the variable C<$fh>, which will grant you access to the data inside the file. Perl allows us to ask for data to come out of filehandles in useful ways. Let's say we wanted a single line from the file to be stored in a variable:

    my $var = <$fh>;

But wait, how do we know which line will come out of C<$fh>? Well, the answer is "the next one". Filehandles are stream oriented. Data arrives serially, and you can nibble at it, slowly progressing through the stream of data until it ends. Specifically, handles having to do with files have an implicit cursor, working behind the scenes, marking the point in the file that the handle is currently at.

=head2 Plumbing your handles

To move data in and out of filehandles you use system calls. We'll start with the two most basic calls there are, the read and write system calls, which are available in Perl as the builtin functions C<sysread> and C<syswrite>. Their interfaces are pretty straightforward. Here is a subset of their functionality:

    sysread $fh, $variable_data_will_be_read_to, $how_much_data_to_read;

C<sysread> takes a filehandle as its first argument, a variable as its second, and a number as its third, and reads at most as many bytes as the number specifies, from the handle, into the variable.

    syswrite $fh, $data_to_write;

C<syswrite> takes a filehandle as its first argument and a string as its second argument, and writes the data from the string to the filehandle.

We already know one way that data can be put on a filehandle for us, which was telling the OS what file we'd like it to come from. Writing is just as flexible. The next section discusses ways of telling the OS not only what data is moved around, but where it will go.

=head2 Directing data, a conceptual introduction

Now that we have a hopefully firm grasp of how data enters and exits your software through handles, let's discuss its movement; specifically, where it goes.
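Before doing that, the plumbing calls from the previous section can be tied together into a tiny copy program. This is a minimal sketch; the file names under C</tmp> are invented for the demo, and it creates its own input file so it can be run as-is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Set up a small input file, just so the demo is self-contained.
open my $setup, ">", "/tmp/io_demo_src" or die "open: $!";
print {$setup} "line of example data\n" x 4;
close $setup;

# Copy one handle to another using only the plumbing calls.
open my $in,  "<", "/tmp/io_demo_src"  or die "open: $!";
open my $out, ">", "/tmp/io_demo_copy" or die "open: $!";

while (1) {
    my $read = sysread $in, my $buf, 4096;   # up to 4096 bytes per call
    defined $read or die "sysread: $!";
    last if $read == 0;                      # 0 bytes means end of stream

    # syswrite may write fewer bytes than asked for, so keep calling
    # it until the whole chunk is out.
    my $written = 0;
    while ($written < $read) {
        my $w = syswrite $out, $buf, $read - $written, $written;
        defined $w or die "syswrite: $!";
        $written += $w;
    }
}

close $out;
```

Note the inner loop: both calls report how many bytes they actually moved, and nothing guarantees that is as many as you asked for.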
The most common use for filehandles is storing and retrieving data in files. We've already seen opening for reading. We can also write to a file:

    open my $fh, ">", "/some/file";

The C<< > >> argument tells C<open> that we want to write to the file (and also to erase its contents first). When C<$fh> is opened for writing, we simply write to it with the calls described above.

But handles are not limited to just files. They can also be sockets, allowing the transfer of data between two unrelated processes, possibly on two different machines. A web server, for example, reads and writes on handles, receiving data from and sending data to browsers. Handles can also serve as pipes to other processes, such as child processes, or processes in a shell pipeline. The latter case is interesting, because it is set up implicitly:

    cat file | tr a-z A-Z > file.uppercase

That command asks C<cat> to read the file C<file> and print it to its I<standard output>. The standard output is the handle that a program normally outputs data to. What "normally" means in this context will be explained soon. Then C<tr> reads data from its I<standard input>, converts the data, and writes it to its own standard output, a chunk at a time. The shell redirect is perhaps the most interesting part: instead of C<tr>'s STDOUT being connected to the terminal, where the user can read the data, the shell connected C<tr>'s STDOUT to a handle of its own, opened to C<file.uppercase>.

I hope this example fulfilled its purpose in demonstrating the flexibility of the concept of piping data around through filehandles.

=head2 The going gets tough

Now that we've covered the conceptual basics, let's look in greater detail at the simplest type of handle there is: a single-purpose, non-seekable, blocking handle.

Single-purpose means that the handle can either read or write, not both. Seekable means that you can use C<seek> to change the cursor position for the file the handle abstracts. Not all handles abstract files, and thus not all handles have cursors.
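For a handle that does abstract a file, the cursor can be moved explicitly with C<seek>. Here is a minimal self-contained sketch; the file name and its contents are invented for the demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);   # symbolic names for seek's WHENCE argument

# Write a small file to play with (setup for the demo).
open my $setup, ">", "/tmp/seek_demo" or die "open: $!";
print {$setup} "0123456789";
close $setup;

open my $fh, "<", "/tmp/seek_demo" or die "open: $!";

seek $fh, 5, SEEK_SET;    # move the cursor to byte 5
read $fh, my $tail, 5;    # reads "56789", the five bytes after the cursor

seek $fh, 0, SEEK_SET;    # rewind; the next read starts from the top
read $fh, my $head, 5;    # reads "01234"
```

On a handle with no cursor, such as a pipe, C<seek> simply fails: there is no position to move.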
The ones that don't are simpler to work with. "Blocking" refers to the semantics of the system calls made on the handle.

Non-seekable handles are implemented in terms of a buffer. The operating system associates some scratch space with the handle. As data comes into the buffer from somewhere (it could be your software writing to it, or somebody else if you're on the reading side), it accumulates in that buffer. When data is read from the handle, it is taken from the buffer.

What happens when there is not enough space in the buffer to write any more? Or not enough data in the buffer to be read? This is where the blocking semantics of this kind of handle come in. I'm oversimplifying, but basically, if the writing side wants to write a chunk of data that is too big for the space left in the buffer, the operating system simply makes the write wait until the reading side asks for some data to come out. As data exits the buffer, more space is cleared out, and the writing can continue. Eventually all the data will have been written to the buffer, and the write system call that the writing side executed will return. The same goes for reading: the read system call will simply wait until the data that was asked for has been made available.

The state in which an operating system puts a process that is waiting for an I/O call to complete is referred to as "blocked". When a process is blocked, it leaves the hardware resources free for other processes to use.

Blocking I/O has an interesting property, in that it balances resource allocation in a pipeline. Let's say, for example, that you ran this line of shell:

    cat file.gz | gzip -d | tr a-z A-Z

C<cat> is doing very little work. It's a simple loop: it reads from the file and writes to STDOUT. The data that C<gzip> is getting, on the other hand, is processed more extensively. C<gzip> performs a complex calculation on the data that enters it, and outputs derived data after this calculation.
Then, finally, C<tr> performs simple actions that, while more complex than C<cat>'s, are dwarfed by C<gzip>'s. So what happens is that C<cat> will read some data and write some data, then read some more and write some more, until the buffer is full and its write blocks. All this time, C<gzip>'s and C<tr>'s read calls were blocking. Eventually C<gzip>'s read will return, allowing it to do its job and finally emit data to C<tr>. It turns out that most of the time C<gzip> will be using up CPU time, while C<cat> will spend most of its time blocking in write calls, and C<tr> will spend most of its time blocking in read calls, though it needs some time for its own calculation too; otherwise C<gzip>'s writes would block.

Plan (not really in order):

=over 4

=item *

Blocking, non-seekable handles and their conventions: fatal errors, SIGPIPE, etc. Promote fault-tolerant behavior by default. The UNIX pipelining mantra.

=item *

Explain when blocking is not good, and continue with a single-purpose, non-seekable handle as used in a select loop to avoid it. Mention epoll/kqueue and the Perl interfaces to them. Mention Event/POE as more powerful multiplexing solutions.

=item *

Multiplexing with a threading approach, as an alternative to select; and a non-blocking approach, including SIGIO; non-blocking vs. select; reliability and latency versus blocking and selected I/O. When not to use non-blocking.

=item *

Buffering: stdio vs. syscalls, the different functions, PerlIO.

=item *

Touch on seekable handles briefly, and explain the semantics of blocking and so on as far as file I/O is concerned. Mention files, and discuss that not all things in the filesystem are files: devices (char and block), named pipes, UNIX domain sockets...

=item *

Sockets. Introduce non-stream handles, and discuss the implementations of socket I/O, its multilayered nature, and the relationship between streams and datagrams. Implications of networking environments.

=item *

Discuss I/O on shared handles. Discuss accept() on shared sockets in a preforked environment.

=item *

Appendix: faux I/O: C<open> on a reference, PerlIO layers, and ties.

=back
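To see a blocking write happen in practice, here is a self-contained sketch using a pipe between a parent and a child process. It assumes a POSIX-like system; the chunk count and the child's sleep time are arbitrary choices for the demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The parent writes into a pipe; the child sleeps before draining it.
# Once the kernel's pipe buffer fills, the parent's syswrite blocks
# until the child starts reading.
pipe my $reader, my $writer or die "pipe: $!";

my $pid = fork;
defined $pid or die "fork: $!";

if ($pid == 0) {                   # child: the reading side
    close $writer;
    sleep 2;                       # let the parent fill the buffer
    my $buf;
    1 while sysread $reader, $buf, 65536;
    exit 0;
}

close $reader;                     # parent: the writing side
my $start = time;
my $chunk = "x" x 65536;
syswrite $writer, $chunk for 1 .. 32;   # 2MB, far more than a pipe buffer holds
my $elapsed = time - $start;
close $writer;
waitpid $pid, 0;

print "writing blocked for about ${elapsed}s\n";
```

While the child sleeps, the parent's first few writes fill the kernel's pipe buffer and the next one blocks; only when the child wakes and starts reading does the rest of the data flow, so the parent's write loop takes roughly as long as the child's sleep.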