I'm working on an application that has the following use case:
- Users upload CSV files, which need to be persisted across application restarts
- The data in the CSV files needs to be queried, sorted, etc.
- Users specify the query-able columns of a CSV file at the time of uploading the file
The currently proposed solution is:
- For small files (much more common), transform the data into XML and store it either as a LOB or on the file system. For querying, slurp the whole document into memory and run something like XQuery over it (see the sketch after this list)
- For larger files, create dynamic tables in the database (MySQL), with indexes on the query-able columns
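To make the small-file path concrete, here's roughly what it does (a minimal sketch, not our actual code; Python for brevity, with ElementTree's XPath subset standing in for XQuery, and the element names are just illustrative):

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text):
    # One <row> element per CSV record; column headers become tag names.
    root = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for header, value in record.items():
            ET.SubElement(row, header).text = value
    return root

root = csv_to_xml("employee_id,name,address\n1234,XXXX,abcabc\n")
blob = ET.tostring(root)  # this serialized form is what gets persisted

# Query time: slurp the whole tree back into memory and filter it.
matches = root.findall(".//row[employee_id='1234']")
```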
Although we have prototyped this solution and it works reasonably well, it's keeping us from supporting more complex file formats such as XML and JSON. There are also a few more niggling issues with the solution that I won't go into.
Considering the schemaless nature of NoSQL databases, I thought they might be used to solve this problem. I have no practical experience with NoSQL, though. My questions are:
- Is NoSQL well suited for this use case?
- If so, which NoSQL database?
- How would we store CSV files in the DB? As a collection of key-value pairs, where the column headers make up the keys and the data fields from each row make up the values? (See the example mapping after the sample CSV below.)
- How would we store XML/JSON files with possibly deeply hierarchical structures? (I've sketched a guess right after this list.)
- How about querying/indexing and other performance considerations? How does that compare to something like MySQL? (See the load-step sketch at the end of this post.)
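On the XML/JSON question above, my (possibly naive) assumption is that a hierarchical file would map onto a nested document more or less directly; something like the following, where the structure and field names are entirely made up:

```python
# Invented example of what a hierarchical upload might become as a
# single nested document (all field names here are hypothetical).
nested_doc = {
    "employee_id": "001001",
    "name": "YYY",
    "address": {
        "street": "xyzxyz",
        "city": "Anytown",
        "phones": ["555-0100", "555-0101"],  # arrays for repeated elements
    },
}
```

I assume some dot-notation style path (e.g. `address.city`) would then be the handle for querying/indexing, but that's part of what I'm asking.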
Appreciate the responses and thanks in advance!
Example CSV file:

```
employee_id,name,address
1234,XXXX,abcabc
001001,YYY,xyzxyz
...
```
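And here is the key-value mapping from my third question applied to the rows above: one document per row, headers as keys, everything stored as strings since the CSV carries no type information:

```python
docs = [
    {"employee_id": "1234",   "name": "XXXX", "address": "abcabc"},
    {"employee_id": "001001", "name": "YYY",  "address": "xyzxyz"},
    # keeping employee_id a string preserves its leading zeros
]
```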
DDL statement:
```sql
CREATE TABLE `employees` (
  `id` INT(6) NOT NULL AUTO_INCREMENT,
  `employee_id` VARCHAR(12) NOT NULL,
  `name` VARCHAR(255),
  `address` TEXT,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `EMPLOYEE_ID` (`employee_id`)
);
```
For each row in the CSV file:

```sql
INSERT INTO `employees` (`employee_id`, `name`, `address`)
VALUES (...);
```
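If the answer turns out to be a document store like MongoDB, I picture the equivalent load step looking roughly like this (only a sketch; pymongo, with made-up database/collection names):

```python
import csv
import io
from pymongo import MongoClient, ASCENDING

client = MongoClient()                 # assumes a local mongod
coll = client["uploads"]["employees"]  # hypothetical names

# Index the column the user marked as query-able, mirroring the
# UNIQUE INDEX in the MySQL DDL above.
coll.create_index([("employee_id", ASCENDING)], unique=True)

csv_text = "employee_id,name,address\n1234,XXXX,abcabc\n001001,YYY,xyzxyz\n"
coll.insert_many(list(csv.DictReader(io.StringIO(csv_text))))

# Query/sort on the indexed column, like SELECT ... ORDER BY:
for doc in coll.find({"name": "YYY"}).sort("employee_id", ASCENDING):
    print(doc)
```

Whether that holds up for the larger files, and how its indexing compares to MySQL's, is exactly the part I'm unsure about.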