Problem
Following up on this question, it seems that a file- or disk-based Map implementation may be the right solution to the problems I mentioned there. Short version:
- Right now, I have a Map implemented as a ConcurrentHashMap.
- Entries are added to it continually, at a fairly fixed rate. Details on this later.
- Eventually, no matter what, this means the JVM runs out of heap space.
At work, it was (strongly) suggested that I solve this problem using SQLite, but after asking that previous question, I don't think that a database is the right tool for this job. So - let me know if this sounds crazy - I think a better solution would be a Map stored on disk.
Bad idea: implement this myself. Better idea: use someone else's library! Which one?
Requirements
Must-haves:
- Free.
- Persistent. The data needs to stick around between JVM restarts.
- Some sort of searchability. Yes, I need the ability to retrieve this darn data as well as put it away. Basic result set filtering is a plus.
- Platform-independent. Needs to be production-deployable on Windows or Linux machines.
- Purgeable. Disk space is finite, just like heap space. I need to get rid of entries that are n days old. It's not a big deal if I have to do this manually.
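To make the purge requirement concrete, here is a minimal sketch of a manual purge, written against the in-memory map I have today. It assumes each value is wrapped together with its insertion time; the Timestamped wrapper and the purgeOlderThan helper are hypothetical names for illustration, not part of any library listed below. A disk-backed Map would need an equivalent operation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PurgeSketch {

    /** Hypothetical wrapper: the stored value plus the time it was inserted. */
    static final class Timestamped<V> {
        final V value;
        final Instant insertedAt;

        Timestamped(V value, Instant insertedAt) {
            this.value = value;
            this.insertedAt = insertedAt;
        }
    }

    /** Removes entries inserted before (now - maxAge) and returns how many were removed. */
    static <K, V> int purgeOlderThan(ConcurrentMap<K, Timestamped<V>> map,
                                     Duration maxAge, Instant now) {
        Instant cutoff = now.minus(maxAge);
        int removed = 0;
        // ConcurrentHashMap iterators are weakly consistent, so removing while iterating is safe.
        for (Map.Entry<K, Timestamped<V>> e : map.entrySet()) {
            // remove(key, value) only succeeds if the entry was not replaced in the meantime.
            if (e.getValue().insertedAt.isBefore(cutoff) && map.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Timestamped<String>> map = new ConcurrentHashMap<>();
        Instant now = Instant.now();
        map.put("stale", new Timestamped<>("a", now.minus(Duration.ofDays(8))));
        map.put("fresh", new Timestamped<>("b", now.minus(Duration.ofDays(1))));

        // n = 7 days here is an arbitrary choice for the demo.
        int removed = purgeOlderThan(map, Duration.ofDays(7), now);
        System.out.println("removed=" + removed + ", remaining=" + map.keySet());
    }
}
```

Running this by hand, or from a scheduled job, would be enough for my purposes.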
Nice-to-haves:
- Easy to use. It would be great if I could get this working by the end of the week. Better still: the end of the day. It would be really, really great if I could add one JAR to my classpath, change new ConcurrentHashMap<Foo, Bar>(); to new SomeDiskStoredMap<Foo, Bar>(); and be done.
- Decent scalability and performance. Worst case: new entries are added (on average) 3 times per second, every second, all day long, every day. However, inserts won't always happen that smoothly. It might be (no inserts for an hour) then (insert 10,000 objects at once).
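For a sense of scale, the worst-case rate above can be turned into a rough entry count. The snippet below is just back-of-the-envelope arithmetic; the 30-day retention window is an assumed value standing in for n, which I haven't fixed yet.

```java
final class InsertRateSketch {

    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Entries accumulated over the retention window at a steady average insert rate. */
    static long retainedEntries(long insertsPerSecond, int retentionDays) {
        return insertsPerSecond * SECONDS_PER_DAY * retentionDays;
    }

    public static void main(String[] args) {
        // 3 inserts per second, sustained for one day.
        System.out.println(retainedEntries(3, 1));  // prints 259200
        // Same rate with a hypothetical 30-day retention window.
        System.out.println(retainedEntries(3, 30)); // prints 7776000
    }
}
```

So whatever I pick has to cope with roughly a quarter of a million new entries per day in the worst case, and with a few million entries on disk if the retention window is on the order of weeks.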
Possible Solutions
- Ehcache? I've never used it before. It was a suggested solution to my previous question.
- Berkeley DB? Again, I've never used it, and I really don't know anything about it.
- Hadoop (and which subproject)? Haven't used it. Based on these docs, its cross-platform-readiness is ambiguous to me. I don't need distributed operation in the foreseeable future.
- A SQLite JDBC driver after all?
- ???
Ehcache and Berkeley DB both look reasonable right now. Any particular recommendations in either direction?