I am running into an OOM when reading a large number of objects from an ObjectInputStream with readUnshared. MAT points at its internal handle table as the culprit, as does the OOM stack trace (at end of this post). By all accounts, this shouldn't be happening. Furthermore, whether or not the OOM occurs appears to depend on how the objects were written previously.
According to this write-up on the topic, readUnshared should solve the issue (as opposed to readObject) by not creating handle table entries during read (that write-up is how I discovered writeUnshared and readUnshared, which I previously had not noticed).
However, my own observations show readObject and readUnshared behaving identically: whether the OOM happens depends only on whether the objects were written with a reset() after each write. (It does not matter whether writeObject or writeUnshared was used, as I previously thought; I was just tired when I first ran the tests.) That is:
                   writeObject   writeObject+reset   writeUnshared   writeUnshared+reset
    readObject     OOM           OK                  OOM             OK
    readUnshared   OOM           OK                  OOM             OK
So whether readUnshared has any effect seems to depend entirely on how the objects were written, which I did not expect. I spent some time tracing through the readUnshared code path, but (granted, it was late and I was tired) it wasn't apparent to me why it would still use handle space, or why that would depend on how the objects were written. I now have an initial suspect, described below, although I have yet to confirm it.
From all of my research on the topic so far, it appears writeObject with readUnshared should work.
Here is the program I've been testing with:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
public class OOMTest {

    // This is the object we'll be reading and writing.
    static class TestObject implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    static enum WriteMode {
        NORMAL,         // writeObject
        RESET,          // writeObject + reset each time
        UNSHARED,       // writeUnshared
        UNSHARED_RESET  // writeUnshared + reset each time
    }

    // Write a bunch of objects.
    static void testWrite (WriteMode mode, String filename, int count) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(filename)));
        out.reset();
        for (int n = 0; n < count; ++ n) {
            if (mode == WriteMode.NORMAL || mode == WriteMode.RESET)
                out.writeObject(new TestObject());
            if (mode == WriteMode.UNSHARED || mode == WriteMode.UNSHARED_RESET)
                out.writeUnshared(new TestObject());
            if (mode == WriteMode.RESET || mode == WriteMode.UNSHARED_RESET)
                out.reset();
            if (n % 1000 == 0)
                System.out.println(mode.toString() + ": " + n + " of " + count);
        }
        out.close();
    }

    static enum ReadMode {
        NORMAL,   // readObject
        UNSHARED  // readUnshared
    }

    // Read all the objects.
    @SuppressWarnings("unused")
    static void testRead (ReadMode mode, String filename) throws Exception {
        ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(filename)));
        int count = 0;
        while (true) {
            try {
                TestObject o;
                if (mode == ReadMode.NORMAL)
                    o = (TestObject)in.readObject();
                if (mode == ReadMode.UNSHARED)
                    o = (TestObject)in.readUnshared();
                //
                if ((++ count) % 1000 == 0)
                    System.out.println(mode + " (read): " + count);
            } catch (EOFException eof) {
                break;
            }
        }
        in.close();
    }

    // Do the test. Comment/uncomment as appropriate.
    public static void main (String[] args) throws Exception {
        /* Note: For writes to succeed, VM heap size must be increased.
        testWrite(WriteMode.NORMAL, "test-writeObject.dat", 30_000_000);
        testWrite(WriteMode.RESET, "test-writeObject-with-reset.dat", 30_000_000);
        testWrite(WriteMode.UNSHARED, "test-writeUnshared.dat", 30_000_000);
        testWrite(WriteMode.UNSHARED_RESET, "test-writeUnshared-with-reset.dat", 30_000_000);
        */
        /* Note: For read demonstration of OOM, use default heap size. */
        testRead(ReadMode.UNSHARED, "test-writeObject.dat"); // Edit this line for different tests.
    }
}
Steps to recreate issue with that program:
- Run the test program with the testWrite calls uncommented (and testRead not called) with the heap size set high, so writeObject does not lead to OOM.
- Run the test program a second time with testRead uncommented (and testWrite not called) with the default heap size.
To be clear: I'm not doing the writing and reading in the same JVM instance. My writes happen in a separate program from my reads. The test program above may be slightly misleading at first glance due to the fact that I crammed both the write and read tests into the same source.
Unfortunately, the real situation I'm in is I have a file containing a lot of objects written with writeObject (without reset), which will take quite some time to regenerate (on the order of days) (and also the reset makes the output files massive), so I'd like to avoid that if possible. On the other hand, I cannot currently read the file with readObject, even with the heap space cranked up to the maximum available on my system.
It's worth noting that in my real situation, I do not need the caching provided by the object stream handle tables.
So my questions are:
- All my research so far suggests no connection between readUnshared's behavior and how the objects were written. What is going on here?
- Is there some way I can avoid the OOM on read, given that the data was written with writeObject and no reset?
I'm not entirely sure why readUnshared is failing to resolve the issue here.
I hope this is clear. I am running on empty here so may have typed strange words.
From comments on an answer below:
"If you're not calling writeObject() in the current instance of the JVM you should not be consuming memory by calling readUnshared()."
All my research shows the same, and yet, confusingly:
- Here is the OOM stack trace, pointing at readUnshared:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3464)
        at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readUnshared(ObjectInputStream.java:460)
        at OOMTest.testRead(OOMTest.java:40)
        at OOMTest.main(OOMTest.java:54)

- Here is a video of it happening (video recorded before recent test program edit, video is equivalent of ReadMode.UNSHARED and WriteMode.NORMAL in new test program).
- Here are some test data files, which contain 30,000,000 objects (compressed size is a tiny 360 KB but be warned it expands to a whopping 2.34 GB). There are four test files here, each generated with various combinations of writeObject/writeUnshared and reset. The read behavior is dependent only on how it was written and independent of readObject vs. readUnshared. Note that the writeObject vs writeUnshared data files are byte-for-byte identical, I can't decide if this is surprising or not.
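The byte-for-byte observation, and the size penalty of reset, can be checked on a small scale without touching the file system. The following is a minimal sketch, not the original test program: the class name StreamSizeDemo, the Payload class (a stand-in for TestObject), and the count of 1000 are all hypothetical choices made for illustration. It serializes fresh objects into memory in three of the four write modes and compares the resulting bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

final class StreamSizeDemo {

    // Hypothetical stand-in for TestObject in the test program.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Serializes `count` fresh objects into memory and returns the raw stream bytes.
    static byte[] write(boolean unshared, boolean resetEachTime, int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int n = 0; n < count; ++n) {
                if (unshared) {
                    out.writeUnshared(new Payload());
                } else {
                    out.writeObject(new Payload());
                }
                if (resetEachTime) {
                    out.reset();
                }
            }
        }
        // The stream was closed (and therefore flushed) by try-with-resources.
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int count = 1000; // arbitrary small count, chosen only for the demo
        byte[] normal = write(false, false, count);
        byte[] unshared = write(true, false, count);
        byte[] normalWithReset = write(false, true, count);

        System.out.println("writeObject == writeUnshared bytes: " + Arrays.equals(normal, unshared));
        System.out.println("writeObject size:       " + normal.length);
        System.out.println("writeObject+reset size: " + normalWithReset.length);
    }
}
```

If the two no-reset streams compare equal here as well, the reader has no way to tell from the data which write method was used, which would be consistent with the read behavior depending only on reset.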
I've been staring at the ObjectInputStream code from here. My current suspect is this line, present in 1.7 and 1.8:
ObjectStreamClass desc = readClassDesc(false);
Where that boolean parameter is true for unshared and false for normal. In all other cases the "unshared" flag is propagated through to other calls, but in that case it's hard-coded to false, thus causing handles to be added to the handle table when reading class descriptions for serialized objects even when readUnshared is used. AFAICT, this is the only occurrence of the unshared flag not being passed through to other methods, hence why I am focused on it.
This is in contrast to e.g. this line where the unshared flag is passed through to readClassDesc. (You can trace the call path from readUnshared to both of those lines if anybody wishes to dig in.)
However, I have not yet confirmed that any of this is significant, nor worked out why false is hard-coded there. This is just the track I'm currently following; it may prove meaningless.
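One related behavior that is easy to probe in isolation: the readUnshared Javadoc says that a later back-reference to an object read as unshared is rejected, which suggests the reader still records something per stream handle even for unshared reads. The sketch below is only an illustration of that documented behavior; the class name BackReferenceDemo and the Payload class are hypothetical, and whether this bookkeeping is what fills the handle table in the test above is an assumption that this sketch does not confirm.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class BackReferenceDemo {

    // Hypothetical stand-in for TestObject in the test program.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Writes the same instance twice with writeObject and no reset,
    // so the second write is emitted as a back-reference to the first.
    static byte[] writeSameObjectTwice() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            Payload p = new Payload();
            out.writeObject(p);
            out.writeObject(p);
        }
        return bytes.toByteArray();
    }

    // Reads the first object as unshared, then tries to read the back-reference.
    // Returns true if the stream rejects the back-reference.
    static boolean backReferenceRejected(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readUnshared(); // first copy, read as unshared
            try {
                in.readObject(); // back-reference to the handle read above
                return false;
            } catch (InvalidObjectException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("back-reference rejected: " + backReferenceRejected(writeSameObjectTwice()));
    }
}
```

For the stream to reject the second read, it has to remember that the earlier handle was consumed by an unshared read, so "unshared" on the read side does not obviously mean "no handle bookkeeping at all".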
Also, fwiw, ObjectInputStream does have a private method, clear, that clears the handle table. I did an experiment where I called that (via reflection) after every read, but it just broke everything, so that's a no-go.