Command Prompt for Windows

Windows and Capturing Console I/O

This interesting article by Jonathan de Boyne Pollard highlights the problems with capturing the input and output of console programs. Basically, Windows offers two methods:

  1. anonymous pipes, and
  2. console screen buffer polling

Both have their own limitations, as Jonathan points out:

Because of a programming tradition on Microsoft operating systems that goes all of the way back to the earliest versions of MS-DOS, there is no one single way to capture the output or control the input of a textual application.

On Windows NT, the case of DOS applications is actually a subset of the more general case of "console mode" applications, because DOS applications run as coroutines within a Win32 process (NTVDM) that translates their I/O to Win32 equivalents.

There are two classes of console mode applications. The important difference between the two is whether they read from and write to their standard input and standard output in "glass TTY" fashion using ReadFile() and WriteFile() (what Win32 terms "high-level console I/O"), or whether they use "random access" APIs such as WriteConsoleOutput() (what Win32 terms "low-level console I/O"). Translating this to DOS terms: DOS programs that use INT 21h to read from and write to their standard input and standard output are in the former class; and DOS programs that use INT 10h or that write directly to video memory are in the latter class.

To capture the output and control the input of programs that use "low-level console I/O", one sits in a loop whilst the child process is executing, continuously monitoring the contents of the console screen buffer using ReadConsoleOutput() and sending any keystrokes using WriteConsoleInput().

There are several problems with this design. One minor problem is that it doesn't cope at all well with Win32 programs that take full advantage of the features of the Win32 low-level console paradigm and use alternate console screen buffers for their output. A more major problem is that because it uses polling (Win32 not providing any handy event mechanism to latch on to so that the monitor could know when the buffer has been modified) it is always going to be both unreliable and expensive. It is unreliable in that depending from the relative scheduling priorities of the various processes, something which is going to vary dynamically with system load, it may well be the case that the child program may be able to generate reams of output that the monitoring process will simply miss because its console monitoring thread won't be scheduled to run often enough. It is expensive in that if the child process happens not to generate any output for a while, the monitoring process is going to consume CPU time needlessly.
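The two failure modes of polling can be made concrete with a toy model. The sketch below is not Win32 code: the function simulate_polling, its parameters, and the tick-based timeline are all hypothetical, and the "screen" is just a string that the child overwrites. It only illustrates why a monitor that is scheduled too rarely misses output, and why it burns polls when nothing has changed.

```python
def simulate_polling(writes, poll_ticks, total_ticks):
    """Toy model of screen-buffer polling (hypothetical, not a Win32 API).

    writes      maps a tick number to the new screen contents the child
                puts on the screen at that tick.
    poll_ticks  is the set of ticks at which the monitor happens to be
                scheduled and takes a snapshot of the screen.
    Returns (captured, missed, wasted_polls).
    """
    screen = ""       # what the console currently shows
    last_seen = ""    # what the monitor saw on its previous poll
    captured = []
    wasted_polls = 0

    for tick in range(total_ticks):
        if tick in writes:
            # The child overwrites the screen; the old contents are gone.
            screen = writes[tick]
        if tick in poll_ticks:
            if screen != last_seen:
                captured.append(screen)
                last_seen = screen
            else:
                # Nothing changed: the poll cost CPU time for no result.
                wasted_polls += 1

    missed = [text for text in writes.values() if text not in captured]
    return captured, missed, wasted_polls


if __name__ == "__main__":
    # The child writes three screens in quick succession, then goes quiet.
    writes = {0: "A", 1: "B", 2: "C"}
    # The monitor only gets scheduled at ticks 2, 5 and 8.
    print(simulate_polling(writes, {2, 5, 8}, 10))
```

In this run the monitor captures only the last screen, never sees the first two, and then polls twice more while the child is idle.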

To capture the output and control the input of programs that use "high-level console I/O", one redirects their standard input and standard output through pipes, and reads from and writes to the other end of the pipes in the monitoring process.

The advantage of this method is that one doesn't need to worry about missing any output, since this approach doesn't use polling. But, conversely, this method has a problem of its own, in that it won't capture any output generated by "low-level console I/O", and programs that use "low-level console I/O" for input will bypass the redirection entirely. Alas, all too many DOS programs fall into this category.
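The pipe approach described above can be sketched with Python's standard subprocess module, which creates anonymous pipes for the child's standard input and standard output. This is a minimal illustration rather than the article's own code: the helper name run_with_pipes is made up here, and the child is a tiny Python one-liner standing in for any program that uses "high-level console I/O".

```python
import subprocess
import sys


def run_with_pipes(argv, input_bytes):
    """Run a child with stdin and stdout redirected through pipes.

    Returns (exit_code, captured_stdout). Only output written to
    standard output is captured; anything a program draws with
    low-level console I/O would not pass through these pipes.
    """
    child = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,   # monitor writes, child reads
        stdout=subprocess.PIPE,  # child writes, monitor reads
    )
    # communicate() feeds the input, closes the child's stdin, and
    # blocks until the child has exited, so no polling is involved.
    output, _ = child.communicate(input_bytes)
    return child.returncode, output


if __name__ == "__main__":
    # Stand-in child: reads all of stdin and echoes it in upper case.
    child_source = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    code, captured = run_with_pipes([sys.executable, "-c", child_source], b"hello\n")
    print(code, captured)
```

Because the monitor blocks on the pipe instead of sampling a screen, every byte the child writes to standard output arrives in order.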

It is difficult to combine the two mechanisms into one for capturing output, and practically impossible to combine them for controlling input. So really one must know ahead of time the type of textual application that is going to be run, i.e. whether it is going to be using "high-level" or "low-level" console I/O, and select the appropriate mechanism accordingly.

The only perfect solution that would cope with both sorts of programs simultaneously would be for Win32 to provide functionality akin to what in the UNIX world is known as a "pseudo-TTY". Win32 would need to provide some means for a monitoring process to hook into the "back" of a console instead of the "front" as seen in normal use. The monitor process would write data to the back of the console and the console would turn those data into keystrokes that applications reading from the front of the console, by either means, would see as input. All output written to the console, by either means, would be translated into a single encoded bytestream that the monitor process could then read from the back of the console, in sequence, without needing to poll, and without missing any data.
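For comparison, this is roughly what the "back of the console" looks like on a UNIX system. The sketch uses Python's standard pty module and therefore runs only on UNIX-like systems, not on Windows. The helper name run_on_pty and the stand-in child program are assumptions made for illustration. The monitor holds the master side, and the child sees the slave side as an ordinary terminal.

```python
import os
import pty
import subprocess
import sys


def run_on_pty(argv, keystrokes):
    """Run a child on a pseudo-TTY (UNIX only) and return all its output."""
    master_fd, slave_fd = pty.openpty()
    try:
        try:
            # The child gets the slave end as its terminal ("front").
            child = subprocess.Popen(
                argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd
            )
        finally:
            # The monitor keeps only the master end ("back").
            os.close(slave_fd)

        # Bytes written to the master arrive at the child as typed input.
        os.write(master_fd, keystrokes)

        chunks = []
        while True:
            try:
                data = os.read(master_fd, 4096)  # blocks; no polling
            except OSError:
                break  # Linux reports an error once the child has gone
            if not data:
                break  # other systems report a plain end of file
            chunks.append(data)
    finally:
        os.close(master_fd)
    child.wait()
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in child: checks that stdin is a terminal, reads one line.
    child_source = "import sys; s = input(); print(sys.stdin.isatty(), s.upper())"
    print(run_on_pty([sys.executable, "-c", child_source], b"hello\n"))
```

The captured bytes include the terminal's echo of the typed line as well as the child's reply, all in one ordered stream.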

Alas, Windows has no such mechanism.