Lucas' Blog

cat /dev/thoughts

Linux Asynchronous I/O

Disclaimer: This post is a compilation of my study notes, and I am not an expert on this subject (yet). I have linked the sources for a deeper understanding.

When we talk about asynchronous tasks, what comes to mind is almost always a task running in a separate thread. But, as I will make clear below, such a task is usually blocking, and not async at all.

Another interesting misunderstanding is the one between nonblocking and asynchronous (or, equally, between blocking and synchronous): people tend to think the words in each pair are always interchangeable. We will discuss below why this is wrong.

The models

Here we will discuss the four I/O models available under Linux:

  • blocking I/O
  • nonblocking I/O
  • I/O multiplexing
  • asynchronous I/O

Blocking I/O

The most common way to get information from a file descriptor is synchronous, blocking I/O. The application issues a request and waits until the data has been copied from the kernel to user space.
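To make the waiting visible, here is a minimal Python sketch that uses a pipe as a stand-in for a slow device. The function name blocking_read_demo and the delay parameter are made up for this illustration; os.read() is a thin wrapper over the read() system call.

```python
import os
import threading
import time


def blocking_read_demo(delay=0.2):
    """Return (data, seconds the caller spent blocked in os.read)."""
    read_fd, write_fd = os.pipe()

    def writer():
        time.sleep(delay)              # pretend the device is slow
        os.write(write_fd, b"hello")
        os.close(write_fd)

    producer = threading.Thread(target=writer)
    producer.start()

    start = time.monotonic()
    data = os.read(read_fd, 1024)      # the calling thread sleeps here until data arrives
    elapsed = time.monotonic() - start

    producer.join()
    os.close(read_fd)
    return data, elapsed
```

The elapsed time returned is roughly the delay of the writer: the reader did nothing useful in the meantime.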

An easy way to implement multitasking is to create several threads, each one issuing its own blocking requests.
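A rough sketch of that thread-per-descriptor idea in Python follows. The helper name read_all_with_threads is hypothetical, and the sketch assumes every descriptor eventually reaches end-of-file.

```python
import os
import threading


def read_all_with_threads(fds):
    """Read every descriptor to EOF, one blocking thread per descriptor."""
    results = {}

    def worker(fd):
        chunks = []
        while True:
            chunk = os.read(fd, 4096)  # blocks this thread only
            if not chunk:              # empty bytes means end-of-file
                break
            chunks.append(chunk)
        results[fd] = b"".join(chunks)

    threads = [threading.Thread(target=worker, args=(fd,)) for fd in fds]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each read is still blocking; the program only appears concurrent because the blocking happens in different threads.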

Non-blocking I/O

This is a synchronous and non-blocking way to get data.

When a device is opened with the O_NONBLOCK flag, any read that cannot be satisfied immediately fails with the error EWOULDBLOCK or EAGAIN instead of waiting. The application then retries the read until the file descriptor is ready for reading.

This method is very wasteful, since the application burns CPU time on reads that fail, and maybe that is the reason it is rarely used.
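The retry loop can be sketched in Python as below. The function name read_with_retries and its parameters are invented for the example. Python reports EAGAIN/EWOULDBLOCK by raising BlockingIOError, and os.set_blocking(fd, False) sets O_NONBLOCK on the descriptor.

```python
import os
import time


def read_with_retries(fd, interval=0.01, max_tries=1000):
    """Poll a non-blocking descriptor; return (data, number of failed reads)."""
    os.set_blocking(fd, False)         # sets O_NONBLOCK on the descriptor
    failed = 0
    for _ in range(max_tries):
        try:
            return os.read(fd, 1024), failed
        except BlockingIOError:        # raised for EAGAIN / EWOULDBLOCK
            failed += 1
            time.sleep(interval)       # without this pause the loop would spin at 100% CPU
    raise TimeoutError("descriptor never became readable")
```

The count of failed reads is returned only to show how much work was thrown away while waiting.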

I/O multiplexing or select/poll

This method is an asynchronous and blocking way to get data.

In POSIX, multiplexing is done with the functions select() and poll(), which register one or more file descriptors to be watched and then block the thread. As soon as one of the descriptors becomes ready, the call returns and it is possible to copy the data from the kernel to user space.

I/O multiplexing is almost the same as blocking I/O, with the difference that it is possible to wait for multiple file descriptors at the same time.
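As an illustration, Python's select module exposes the same select() call. In this small sketch the helper name wait_for_any is an assumption of the example, not a standard function.

```python
import os
import select


def wait_for_any(fds, timeout=None):
    """Block until at least one descriptor is readable, then read the ready ones."""
    # select() blocks here, but on all descriptors at once
    readable, _, _ = select.select(fds, [], [], timeout)
    # These reads will not block: select() reported the descriptors as ready
    return {fd: os.read(fd, 4096) for fd in readable}
```

With two pipes where only the second has data, the call returns a dictionary holding just the second descriptor; with a timeout and no data at all, it returns an empty dictionary.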

Asynchronous I/O

This is the one method that is both asynchronous and non-blocking.

The async functions tell the kernel to do the whole job and to report back only when the entire message is ready in the kernel for the user (or is already in user space).

There are two asynchronous models:

  • The fully asynchronous model, which uses the POSIX functions that begin with aio_ or lio_;
  • The signal-driven model, which uses SIGIO to signal when the file descriptor is ready.

One of the main differences between the two is that the first copies the data from the kernel to user space, while the second leaves that copy to the application.

The confusion

POSIX states that:

Asynchronous I/O Operation is an I/O operation that does not of itself cause the thread requesting the I/O to be blocked from further use of the processor. This implies that the process and the I/O operation may be running concurrently.

So the third and fourth models really are asynchronous. The third is the blocking one, since after registering the file descriptors the thread waits for them to become ready.

I believe there is almost always room for discussion when it comes to the use of any term. But the mere fact that there is disagreement over whether blocking I/O and synchronous I/O are the same thing shows that we have to be cautious when we use these terms.

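To finish: a picture is worth a thousand words, and a table even more, so here is the classification used in this post:

                 Blocking            Non-blocking
  Synchronous    blocking I/O        non-blocking I/O
  Asynchronous   I/O multiplexing    asynchronous I/O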

Further words

When dealing with asynchronous file descriptors, it is important to take into account how many of them can be open at a time. The system-wide limit is easily checked with:

$ cat /proc/sys/fs/file-max

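Be sure to check it for the right user. Note that file-max is the system-wide limit; the per-process limit, which can differ between users, is shown by ulimit -n.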
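The same numbers can also be read from inside a program. The following is a small sketch under a few assumptions: the function name fd_limits is made up, the /proc path is Linux-specific, and the system-wide value is reported as None if that file is not available.

```python
import resource


def fd_limits():
    """Return the system-wide and per-process limits on open file descriptors."""
    try:
        with open("/proc/sys/fs/file-max") as handle:
            system_max = int(handle.read().strip())
    except OSError:
        system_max = None              # procfs entry not available on this system
    # Limits of the current process, the same values `ulimit -n` reports
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {"system_max": system_max, "soft": soft, "hard": hard}
```

The soft limit is the one an application hits first, so it is usually the number to compare against the count of descriptors the program expects to keep open.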

Other References

I/O Multiplexing, by shichao
Boost application performance using asynchronous I/O, by M. Jones
Asynchronous I/O and event notification on linux
The Open Group Base Specifications Issue 7, by The IEEE and The Open Group
