Any drawback in using an allocation unit size of 2048 kilobytes?
Since I wrote an answer to Downsides of a small allocation unit size, it would seem appropriate to consider the alternative.
The downsides of a large filesystem allocation unit (such as the proposed 2 megabytes) include:
Large I/O operations will consume long intervals of time.
Assuming that the disk I/O for a cluster will be optimized by using a single (multi-sector) read or write ATA command, such an I/O operation would tie up the I/O channel (i.e. SATA or USB for an external drive) and delay any other operations.
Other active processes could suffer delays in satisfying their I/O requests.
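Note that older versions of ATA (e.g. version 3) limited the sector count in a multi-sector operation to 256 (an 8-bit count field, where a value of 0 means 256). Your proposed 2 MB cluster size is equivalent to 4096 (512-byte) sectors, so one cluster could not be transferred by a single such command.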
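To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and the per-command limit is left as a parameter because it depends on the ATA version and command set; the limits tried below (255, 256, and 65536, the last being the ceiling for 48-bit LBA commands) are only examples.

```python
import math

SECTOR_SIZE = 512  # bytes per sector, as assumed in the text


def sectors_per_cluster(cluster_bytes, sector_bytes=SECTOR_SIZE):
    """Number of sectors that make up one allocation unit."""
    return cluster_bytes // sector_bytes


def commands_per_cluster(cluster_bytes, max_sectors_per_command):
    """Minimum number of multi-sector commands needed to move one cluster."""
    return math.ceil(sectors_per_cluster(cluster_bytes) / max_sectors_per_command)


cluster = 2048 * 1024  # the proposed 2048-kilobyte allocation unit
print("sectors per cluster:", sectors_per_cluster(cluster))

# Illustrative per-command limits; the real limit depends on the drive/command set.
for limit in (255, 256, 65536):
    print(f"limit {limit:>5} sectors -> {commands_per_cluster(cluster, limit)} command(s)")
```

With a small per-command limit the cluster has to be split across well over a dozen commands, while a large limit lets it go out as one long transfer, which is the case where the channel is tied up the longest.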
Large memory buffers.
Every open or active file will typically require system buffers matching the allocation size (unless direct I/O is performed).
Modern DMA controllers have scatter/gather capability, so these buffers do not require physically contiguous RAM. x86 systems (typically) do not have IOMMUs (yet), so there is some added overhead when dealing with such large DMA buffers.
If your system has a lot of RAM (as some 64-bit systems do), then this may not be a significant concern at all.
Slack space is increased.
This would probably be the most obvious downside of a very large allocation size. Every file is likely to have a remnant of unused space in its last cluster.
As the allocation size is increased, the potential amount of unused (and wasted) slack space per file increases with it: up to one cluster minus one byte for each file.
But increased slack space is typically accepted as a tradeoff for increased storage capacity while maintaining a reasonable quantity of allocation units (i.e. avoiding an overly large allocation table).
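The tradeoff can be sketched with a little Python. This is only an illustration under stated assumptions: the file sizes and the 500 GiB volume are hypothetical, an empty file is assumed to occupy no clusters, and filesystem-specific details (metadata, small files stored inside directory or MFT records, compression) are ignored.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def clusters_needed(file_bytes, cluster_bytes):
    """Clusters allocated to a file (ceiling division); empty files use none."""
    return -(-file_bytes // cluster_bytes)


def slack_bytes(file_bytes, cluster_bytes):
    """Unused bytes left over in the file's last cluster."""
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes


# Hypothetical sample files and volume size, chosen only for illustration.
file_sizes = [10 * KIB, 3 * MIB + 1, 700 * MIB]
volume_bytes = 500 * GIB

for cluster in (4 * KIB, 2 * MIB):
    total_slack = sum(slack_bytes(size, cluster) for size in file_sizes)
    allocation_units = volume_bytes // cluster
    print(f"cluster {cluster // KIB:>4} KiB: "
          f"slack {total_slack:>9} bytes, "
          f"{allocation_units:>11} allocation units on the volume")
```

For these sample files the 2 MiB cluster wastes roughly 3 MiB against about 6 KiB for the 4 KiB cluster, but the volume needs 512 times fewer allocation units to be tracked.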
The last link has solved my questions
Beware that the metrics were primarily the time to perform the I/O operations.
A large cluster size should reduce some CPU processing (e.g. fewer clusters to allocate, fewer I/Os to perform), which can result in improved multiprocessing performance, and may not be reflected in the time to perform the I/O operations.