I'm working on a reader/writer for DNG/TIFF files. Since Java offers several options for working with files (FileInputStream, FileChannel, RandomAccessFile), I'm wondering which strategy best fits my needs.
A DNG/TIFF file is a composition of:
- some (5-20) small blocks (several tens to a few hundred bytes)
- very few (1-3) big continuous blocks of image data (up to 100 MiB)
- several (maybe 20-50) very small blocks (4-16 bytes)
The overall file size ranges from 15 MiB (compressed 14-bit raw data) up to about 100 MiB (uncompressed float data). The number of files to process is 50-400.
There are two usage patterns:
- Read all meta-data from all files (everything except the image data)
- Read all image data from all files
I'm currently using a FileChannel and calling map() to obtain a MappedByteBuffer covering the whole file. That seems quite wasteful if I'm only interested in reading the meta-data. Another problem is freeing the mapped memory: when I pass slices of the mapped buffer around for parsing etc., each slice keeps a reference to the underlying MappedByteBuffer, so the whole mapping won't be collected.
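For reference, the whole-file mapping boils down to roughly this (the file name is just a placeholder):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

void mapWholeFile() throws IOException {
    try (FileChannel channel = FileChannel.open(Paths.get("image.dng"), StandardOpenOption.READ)) {
        // One mapping covering the entire file; closing the channel does not unmap it.
        MappedByteBuffer whole = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        whole.order(ByteOrder.LITTLE_ENDIAN); // TIFF declares its byte order in the header
        ByteBuffer slice = whole.slice();     // slices like this keep the whole mapping reachable
        // ... hand slices to the parsers ...
    }
}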
I have now decided to read smaller chunks through the FileChannel's read() methods and to map only the big raw-data regions. The downside is that reading a single value becomes surprisingly verbose, because FileChannel has no readShort() and the like:
short readShort(long offset) throws IOException, InterruptedException {
    return read(offset, Short.BYTES).getShort();
}

ByteBuffer read(long offset, long byteCount) throws IOException, InterruptedException {
    // Allocate a heap buffer of the requested size, apply the file's byte
    // order, and fill it completely before handing it to the caller.
    ByteBuffer buffer = ByteBuffer.allocate(Math.toIntExact(byteCount));
    buffer.order(GenericTiffFileReader.this.byteOrder);
    GenericTiffFileReader.this.readInto(buffer, offset);
    return buffer;
}

private void readInto(ByteBuffer buffer, long startOffset)
        throws IOException, InterruptedException {
    long offset = startOffset;
    while (buffer.hasRemaining()) {
        // Positional read: does not change the channel's own position.
        int bytesRead = this.channel.read(buffer, offset);
        switch (bytesRead) {
        case 0:
            // Nothing was read; back off briefly and retry.
            Thread.sleep(10);
            break;
        case -1:
            throw new EOFException("unexpected end of file");
        default:
            offset += bytesRead;
        }
    }
    buffer.flip(); // make the buffer readable from position 0
}
RandomAccessFile provides convenient methods like readShort() and readFully(), but its DataInput methods are fixed to big-endian byte order, so it cannot read little-endian files directly.
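The only workaround I know of there is swapping the bytes after each read, e.g. with Short.reverseBytes(), which feels error-prone compared to setting the order once on a ByteBuffer:

import java.io.IOException;
import java.io.RandomAccessFile;

void readWithRaf() throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile("image.dng", "r")) { // placeholder file name
        raf.seek(2);                                       // e.g. the TIFF magic number field
        short magic = Short.reverseBytes(raf.readShort()); // DataInput always reads big-endian
    }
}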
So, is there an idiomatic way to handle scattered reads of both single values and huge blocks? And is memory-mapping an entire 100 MiB file just to read a few hundred bytes wasteful or slow?