If you call ReadFile once with something like 32 MB as the size, it takes noticeably longer than if you read the equivalent number of bytes with a smaller chunk size, like 32 KB.
Why?
(No, my disk is not busy.)
Edit 1:
Forgot to mention -- I'm doing this with FILE_FLAG_NO_BUFFERING!
Edit 2:
Weird...
I don't have access to my old machine anymore (PATA), but when I tested it there, the large read took about twice as long as the chunked reads, sometimes more. On my new machine (SATA), I'm only seeing a ~25% difference.
Here's a piece of code to test:
#include <memory.h>
#include <stdlib.h>
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
int main()
{
    HANDLE hFile = CreateFile(_T("\\\\.\\C:"), GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
        OPEN_EXISTING, FILE_FLAG_NO_BUFFERING /*(redundant)*/, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        _tprintf(_T("CreateFile failed: %u\n"), GetLastError());
        return 1;
    }
    __try
    {
        const size_t chunkSize = 64 * 1024;
        const size_t bufferSize = 32 * 1024 * 1024;
        // Note: FILE_FLAG_NO_BUFFERING requires a sector-aligned buffer;
        // malloc doesn't guarantee that (VirtualAlloc would).
        void *pBuffer = malloc(bufferSize);
        DWORD start = GetTickCount();
        ULONGLONG totalRead = 0;
        OVERLAPPED overlapped = { 0 };
        DWORD nr = 0;
        ReadFile(hFile, pBuffer, (DWORD)bufferSize, &nr, &overlapped);
        totalRead += nr;
        _tprintf(_T("Large read: %u for %I64u bytes\n"),
            GetTickCount() - start, totalRead);
        totalRead = 0;
        start = GetTickCount();
        overlapped.Offset = 0;  // re-read the same region from the start
        for (size_t j = 0; j < bufferSize / chunkSize; j++)
        {
            DWORD nr = 0;
            ReadFile(hFile, pBuffer, (DWORD)chunkSize, &nr, &overlapped);
            totalRead += nr;
            overlapped.Offset += (DWORD)chunkSize;
        }
        _tprintf(_T("Small reads: %u for %I64u bytes\n"),
            GetTickCount() - start, totalRead);
        fflush(stdout);
        free(pBuffer);
    }
    __finally { CloseHandle(hFile); }
    return 0;
}
Result:
Large read: 1076 for 67108864 bytes
Small reads: 842 for 67108864 bytes
Any ideas?