When I redirect a command's output to a file (e.g., echo Hello > file) will that file be guaranteed to have such data right after the command exits? Or is there still a very small window between the command exits and data written to the file? I'd like to read the file right after the command exits, but I do not want to read an empty file.
9 Answers
If the application doesn't have any internal caches, then the changes will be immediately written to the file. The same holds for your example. From the point of view of other programs, the file is a logical entity that the kernel maintains in memory, and it is updated immediately. Any subsequent operations on the file will see the changes made by the program.
However, this does not mean the change was written to the physical disk. The changes might linger in the OS filesystem caches or hardware caches. To flush the filesystem buffers, use the sync command.
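To make the difference between "visible" and "on disk" concrete, here is a minimal C sketch, assuming a POSIX system. The helper name write_durably and any path passed to it are hypothetical, chosen only for illustration. It uses fsync(2), the per-file counterpart of the sync command, to ask the kernel to push one file's data to the device.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `data` to `path` and ask the kernel to push it
 * to the storage device. Returns 0 on success, -1 on any failure. */
int write_durably(const char *path, const char *data) {
    assert(path != NULL && data != NULL);
    size_t len = strlen(data);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Once write() returns, the data is in the kernel's cache and is
     * already visible to any process that reads the file. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* fsync() blocks until the kernel reports the data as written to the
     * device. This matters for crash safety, not for visibility. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A reader that only needs to see the data right after the writer exits does not need the fsync() step; it is relevant only if the data must survive a power loss or crash.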
I'd like to read the file right after the command exits, but I do not want to read an empty file.
You shouldn't run into any practical problems here.
Will buffer be automatically flushed to disk when a process exits?
In general the answer is no.
It depends on the command. As the other answers mention, if the command does not internally buffer the data, all data will be available when the command terminates.
But most, if not all, standard I/O libraries do buffer stdout by default (to some extent), and give different guarantees about the automatic flushing of buffers when the application closes.
C guarantees that a normal exit will flush the buffers. “Normal exit” means that exit is called — either explicitly, or by returning from main. However, abnormal exit can circumvent this call (and therefore leave unflushed buffers behind).
Here’s a simple example:
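#include <signal.h>
#include <stdio.h>

int main(void) {
    /* "test" sits in the stdio buffer; nothing has been written yet. */
    printf("test");
    /* Abnormal termination: exit() is never called, so nothing flushes the buffer. */
    raise(SIGABRT);
}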
If you compile this and execute it, test will not necessarily be written to stdout.
Other programming languages give even fewer guarantees: Java, for instance, does not auto-flush upon program termination. If the output buffer contains an unterminated line, it may therefore be lost, unless System.out.flush() was called explicitly.
That said, your question body asks something slightly different: if the data arrives in the file at all, it should do so immediately after the command terminates (subject to the caveats described in the other answers).
There are multiple layers of buffers/caches involved.
The CPU cache.
The data is put together byte by byte, and stored in the CPU cache. If the CPU cache is full and the data has not been accessed for a while, the block containing our data may get written to main memory. These are, for the most part, hidden from the application programmers.
The in-process buffers.
There is some memory set aside in the process where the data is collected, so that we need to make as few requests to the OS as possible, because each request is comparatively expensive. The process copies the data to these buffers, which again may be backed by CPU caches, so there is no guarantee that the data is copied to main memory. The application needs to explicitly flush these buffers, for example using fclose(3) or fflush(3). The exit(3) function also does this before the process is terminated, while the _exit(2) function does not, which is why there is a big warning in the manual page for that function to call it only if you know what you are doing.
The kernel buffers.
The OS then keeps its own cache, to minimize the number of requests it needs to send to the disks. This cache belongs to no process in particular, so data in there may belong to processes that have finished already, and since all accesses go through here, the next program will see the data if it has reached here. The kernel will write this data to the disks when it has time to do so or when explicitly asked.
The drive cache.
The disk drives themselves also keep a cache to speed up accesses. These are written fairly quickly, and there is a command to write the remaining data in the caches and report when that is complete, which the OS uses on shutdown to make sure no data is left unwritten before powering down.
For your application, it is sufficient for the data to be registered in the kernel buffers (the actual data may still live in CPU caches at this point, and might not have been written to main memory). When the "echo" process terminates, any in-process buffers must have been flushed and the data handed over to the OS; when you then start a new process, the OS is guaranteed to give the same data back when asked.
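The difference between exit(3) and _exit(2) at the in-process layer can be observed with a small experiment. This is a sketch, assuming a POSIX system and a C library that fully buffers a stdout redirected to a regular file; the helper name child_output_size is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child process redirects its stdout to `path`,
 * prints "test", and terminates with exit() or _exit(). The parent waits
 * for it and returns the resulting file size in bytes (-1 on error). */
long child_output_size(const char *path, int use_plain_exit) {
    assert(path != NULL);
    fflush(NULL); /* keep the parent's pending output out of the child */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: stdout now refers to a regular file, so it is fully buffered. */
        if (freopen(path, "w", stdout) == NULL)
            _exit(1);
        printf("test");      /* stays in the in-process buffer */
        if (use_plain_exit)
            exit(0);         /* flushes stdio buffers before terminating */
        _exit(0);            /* terminates at once; the buffer is discarded */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

Under those assumptions, the call with use_plain_exit set to 1 should report 4 bytes and the call with 0 should report an empty file. No amount of waiting in the parent changes the second result, because the data never left the child's address space.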
I think that no answer addresses this issue sufficiently yet:
I'd like to read the file right after the command exits, but I do not want to read an empty file.
As the other answers explain, a well-behaved program flushes its internal file buffers before the process terminates normally. Afterwards the data may still linger in kernel or hardware buffers before it is written to persistent storage. However, the file system semantics of Linux guarantee that all processes see the content of files the same way the kernel does, including its internal buffers [1].
This is typically implemented by having at most one in-kernel buffer per file object and requiring all file access to go through this buffer.
If a process reads a file, the kernel will present the buffer content to the process if the requested part of the file is currently in the buffer; if it is not, the kernel will fetch the data from the underlying storage medium, place it in the buffer, and then go back to the previous step.
If a process writes to a file, the data is first placed in the in-kernel buffer for that file. Eventually the buffer content will be flushed to storage. In the meantime, read access is satisfied from the same buffer (see above).
[1] At least for regular files, directories and symbolic links. FIFOs and sockets are a different matter since their content is never stored persistently anyway. There are some special cases of regular files whose contents depend on who's asking; examples are files in procfs and sysfs (think /proc/self, which is a symbolic link to the process ID of the process reading the symbolic link).
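The single in-kernel view can be checked with a short C sketch, assuming a POSIX system and a regular file. The helper name read_back_unsynced is hypothetical, and for brevity both descriptors live in one process; the same reasoning applies to a reader in a different process.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `msg` through one descriptor and read it back
 * through a second, independently opened descriptor, without fsync() and
 * without closing the writer first. Returns the number of bytes read into
 * `out` (NUL-terminated), or -1 on error. */
long read_back_unsynced(const char *path, const char *msg, char *out, size_t cap) {
    assert(path != NULL && msg != NULL && out != NULL && cap > 0);

    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (wfd < 0)
        return -1;

    /* The data only needs to reach the kernel's buffer for the file. */
    size_t len = strlen(msg);
    if (write(wfd, msg, len) != (ssize_t)len) {
        close(wfd);
        return -1;
    }

    /* A separate open() still sees it: both descriptors go through the
     * same in-kernel view of the file. */
    int rfd = open(path, O_RDONLY);
    if (rfd < 0) {
        close(wfd);
        return -1;
    }
    ssize_t n = read(rfd, out, cap - 1);
    if (n >= 0)
        out[n] = '\0';

    close(rfd);
    close(wfd);
    return (long)n;
}
```

Nothing here forces the data to storage, yet the reader gets the bytes back, which is the behaviour the question depends on.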
Assuming your command is executed by some program using the C runtime library, at some point it should invoke fclose to close the open file.
The man page for the C function fclose says:
NOTES Note that fclose() only flushes the user space buffers provided by the C library. To ensure that the data is physically stored on disk the kernel buffers must be flushed too, for example, with sync(2) or fsync(2).
and the man page for fflush has the same note. The man page for close says:
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)
Note that the data is available to other processes even if it is not synced to the drive. Maybe that is already good enough for you.
If you are in doubt, write a test.
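One possible shape for such a test, as a sketch only: it assumes a POSIX system, and the helper names size_seen_by_kernel and probe_stdio_buffering are hypothetical. It records the file size reported by stat(2) before fflush(), after fflush(), and after fclose().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Size of `path` as the kernel currently sees it, or -1 on error. */
static long size_seen_by_kernel(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical test: record the file size before fflush(), after fflush(),
 * and after fclose(). Returns 0 on success, -1 on error. */
int probe_stdio_buffering(const char *path, long sizes[3]) {
    assert(path != NULL && sizes != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    /* Request full buffering explicitly so the result does not depend on defaults. */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }

    fputs("Hello\n", fp);
    sizes[0] = size_seen_by_kernel(path); /* data still in the C library buffer */

    fflush(fp);
    sizes[1] = size_seen_by_kernel(path); /* data handed over to the kernel */

    if (fclose(fp) != 0)
        return -1;
    sizes[2] = size_seen_by_kernel(path); /* fclose() had nothing left to flush */
    return 0;
}
```

With a typical C library I would expect the three sizes to be 0, 6 and 6: the data becomes visible as soon as the user-space buffer is flushed, whether or not it has reached the drive.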
When I redirect a command's output to a file (e.g., echo Hello > file) will that file be guaranteed to have such data right after the command exits?
Yes. The shell opens the output file, and echo writes directly to it. After the command exits, it's done.
Or is there still a very small window between the command exits and data written to the file?
Whether the data is already on the media is another matter, which only matters if there is thereafter a hardware failure, or you inspect the live partition with some forensic software, bypassing the mounted filesystem.
I'd like to read the file right after the command exits, but I do not want to read an empty file.
Don't worry, the kernel only keeps one view of the file, independent of how often it is opened.
As a general rule, any data owned by the kernel is maintained & cleaned up by the kernel, period. Such data includes data transferred to kernel memory by a system call such as write(2).
However, if your application (e.g. C library) performs buffering on top of this, then the kernel obviously has no idea and hence does not guarantee its clean-up.
Moreover, I don't believe there is any timing guarantee for the clean-up—it is, in general, performed on a "best-effort" (read: "when I have a sec") basis.
Or is there still a very small window between the command exits and data written to the file?
No, there isn't.
I'd like to read the file right after the command exits, but I do not want to read an empty file.
You can read the final contents of the file right after the command exits; you will never read an empty file instead. (In C and C++, use the wait, waitpid, wait3 or wait4 system calls to wait for the program to exit, and only then read the file. If you are using a shell, another programming language or a library (e.g. the C library call system or the Java Process class), it probably uses one of these system calls already.)
As other answers and comments have pointed out, you may end up reading an empty file after the exit of the program if the program has exited without flushing its internal output buffers (e.g. because of _exit, abort or receiving a fatal signal, or because it's a Java program exiting normally). However, there is nothing you can do about this at this point: the unflushed data is lost forever, and additional waiting won't recover it.
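A minimal C sketch of the wait-then-read pattern, assuming a POSIX system with a shell at /bin/sh. The helper name run_then_read is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` through /bin/sh, wait for it to exit, then
 * read up to cap-1 bytes of `path` into `out` (NUL-terminated). Returns the
 * number of bytes read, or -1 on error or unsuccessful termination. */
long run_then_read(const char *cmd, const char *path, char *out, size_t cap) {
    assert(cmd != NULL && path != NULL && out != NULL && cap > 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* waitpid() returns only after the child has terminated, so everything
     * it handed to the kernel is already part of the file. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, out, cap - 1);
    close(fd);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Called with a command such as echo Hello > somefile and the same file name as path, the read should return the complete output with no sleep or retry loop between the wait and the read.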
Yes
Sorry for maybe adding another superfluous answer, but most seem to focus on the red herring of the title of the question. But as far as I can tell, the question is not about buffering at all, but this:
When I redirect a command's output to a file (e.g., echo Hello > file) will that file be guaranteed to have such data right after the command exits?
Yes, unconditionally. The usage of ">" that you are describing, along with "|" and "<", is the pipe-based processing model that the Unix and Linux world is heavily based on. You will find hundreds, if not thousands of scripts totally depending on this behaviour in every Linux installation.
It works as you want by design, and if there were even the slightest chance of a race condition, it would probably have been fixed decades ago.