Questions tagged [large-data-volumes]
302 questions
                    
75 votes, 8 answers
Designing a web crawler
I came across an interview question "If you were designing a web crawler, how would you avoid getting into infinite loops?" and I am trying to answer it. How does it all begin? Say Google started with some hub pages, say…
asked by xyz (8,607 rep)
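The standard answer to the loop question is to normalize each URL and keep a set of pages already visited; a minimal breadth-first sketch in Python (the `get_links` callback, standing in for fetch-and-parse, is hypothetical):

```python
from collections import deque
from urllib.parse import urldefrag

def crawl(seeds, get_links, max_pages=1000):
    """Breadth-first crawl that avoids infinite loops by normalizing
    URLs and tracking a visited set. get_links(url) is a stand-in
    for fetching a page and extracting its outgoing links."""
    visited = set()
    queue = deque(seeds)
    order = []
    while queue and len(order) < max_pages:
        url, _ = urldefrag(queue.popleft())  # drop #fragment so a.html#x == a.html
        if url in visited:
            continue          # already crawled: this check is what breaks cycles
        visited.add(url)
        order.append(url)
        queue.extend(get_links(url))
    return order
```

Dropping the fragment with `urldefrag` is only the simplest normalization; real crawlers also lower-case the host, resolve relative links, and cap per-domain depth.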
 
59 votes, 12 answers
Using Hibernate's ScrollableResults to slowly read 90 million records
I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate:
ScrollableResults results =…
asked by at. (50,922 rep)
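The question is Hibernate-specific, but the memory-safe pattern underneath is keyset (seek) pagination: remember the last primary key you saw and ask only for rows beyond it. A sketch using Python's stdlib sqlite3 (the table `big` and its columns are made up):

```python
import sqlite3

def iter_rows(conn, batch=2):
    """Stream a big table in fixed-size batches using keyset pagination:
    remember the last primary key seen and ask for ids greater than it.
    Memory stays bounded regardless of table size."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM big WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]  # resume after the last id we handed out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO big VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 6)])
ids = [r[0] for r in iter_rows(conn)]
```

Unlike OFFSET-based paging, each query here starts at an index seek, so the cost per batch does not grow as you move through the 90 million rows.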
 
36 votes, 8 answers
Is it possible to change argv or do I need to create an adjusted copy of it?
My application potentially has a huge number of arguments passed in, and I want to avoid the memory hit of duplicating the arguments into a filtered list. I would like to filter them in place, but I am pretty sure that messing with the argv array itself,…
asked by ojblass (21,146 rep)
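The question concerns C's argv, but the in-place compaction it asks about is the classic two-pointer trick; a Python sketch of the same idea (the function name is mine):

```python
def filter_in_place(args, keep):
    """Compact args in place, keeping only items for which keep(x)
    is true. A separate write index means no second list is
    allocated; the same two-pointer trick works on C's argv array."""
    w = 0
    for a in args:          # read index runs ahead of write index
        if keep(a):
            args[w] = a
            w += 1
    del args[w:]            # in C you would set argv[w] = NULL instead
    return args
```

The write index never passes the read index, so kept items are copied over slots that have already been read and nothing is lost.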
 
33 votes, 8 answers
large amount of data in many text files - how to process?
I have large amounts of data (a few terabytes) and accumulating... They are contained in many tab-delimited flat text files (each about 30MB). Most of the task involves reading the data and aggregating (summing/averaging + additional…
asked by hatmatrix (42,883 rep)
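For sum/average aggregation the files never need to fit in memory: stream line by line and keep only the running totals per key. A minimal Python sketch (the column positions are assumptions):

```python
import csv, io
from collections import defaultdict

def aggregate(files, key_col=0, val_col=1):
    """Stream tab-delimited files one line at a time and keep only a
    running (sum, count) per key, so memory scales with the number of
    distinct keys, not with terabytes of input."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for f in files:
        for row in csv.reader(f, delimiter="\t"):
            sums[row[key_col]] += float(row[val_col])
            counts[row[key_col]] += 1
    return {k: sums[k] / counts[k] for k in sums}   # per-key mean

# Two in-memory "files" stand in for the 30MB flat files.
chunks = [io.StringIO("a\t1\nb\t2\n"), io.StringIO("a\t3\n")]
means = aggregate(chunks)
```

Because the accumulators are commutative, the same function can be run per file in parallel and the partial (sum, count) pairs merged afterwards.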
 
29 votes, 9 answers
Plotting of very large data sets in R
How can I plot a very large data set in R? I'd like to use a boxplot, violin plot, or similar. The data cannot all fit in memory. Can I incrementally read it in and compute the summaries needed to make these plots? If so, how?
asked by Daniel Arndt (2,268 rep)
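The question targets R, but one language-neutral answer is to stream the data once and keep a fixed-size uniform sample (reservoir sampling, Algorithm R), then draw the boxplot from the sample. A Python sketch:

```python
import random

def reservoir(stream, k, rng=random.Random(0)):
    """Single-pass uniform sample of k items from a stream of unknown
    length (Algorithm R). Quartiles computed on the sample approximate
    those of the full data set, and the sample always fits in memory."""
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)           # fill the reservoir first
        else:
            j = rng.randrange(i + 1)   # item i kept with probability k/(i+1)
            if j < k:
                sample[j] = x
    return sample
```

In R the same idea applies: sample while reading chunks (e.g. with readLines in blocks), then call boxplot on the reservoir.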
 
24 votes, 7 answers
Efficiently storing 7.300.000.000 rows
How would you tackle the following storage and retrieval problem? Roughly 2.000.000 rows will be added each day (365 days/year) with the following information per row:
- id (unique row identifier)
- entity_id (takes on values between 1 and 2.000.000…
asked by knorv (49,059 rep)
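A recurring answer pattern for append-heavy tables like this is time-based partitioning, so each day's load appends to one small partition and date-range queries prune the rest. A trivial Python sketch of the routing idea (the naming scheme is invented):

```python
import datetime

def partition_for(day):
    """Route each day's ~2.000.000 rows to a monthly partition so bulk
    loads append to one small table and date-range queries can skip
    all other partitions (partition pruning). Naming is hypothetical."""
    return f"rows_y{day.year}m{day.month:02d}"

p = partition_for(datetime.date(2009, 3, 14))
```

At roughly 60 million rows per monthly partition, each partition's indexes stay small enough to maintain during the daily load.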
 
24 votes, 2 answers
JDBC Batch Insert OutOfMemoryError
I have written a method insert() in which I am trying to use JDBC batching to insert half a million records into a MySQL database:
public void insert(int nameListId, String[] names) {
    String sql = "INSERT INTO name_list_subscribers…
asked by craftsman (15,133 rep)
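The usual fix is to flush the batch every N rows instead of accumulating half a million buffered statements; a Python/sqlite3 sketch of the same chunking idea (the table name is assumed; in JDBC you would call `executeBatch()` every N rows):

```python
import sqlite3

def insert_chunked(conn, names, chunk=1000):
    """Insert a large list in fixed-size batches instead of one giant
    batch: each executemany call sends `chunk` rows, so driver-side
    buffers stay small no matter how long `names` is."""
    cur = conn.cursor()
    for start in range(0, len(names), chunk):
        cur.executemany("INSERT INTO subscribers (name) VALUES (?)",
                        ((n,) for n in names[start:start + chunk]))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (id INTEGER PRIMARY KEY, name TEXT)")
insert_chunked(conn, [f"name{i}" for i in range(2500)], chunk=1000)
n = conn.execute("SELECT COUNT(*) FROM subscribers").fetchone()[0]
```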
 
21 votes, 2 answers
Docker Data Volume Container - Can I share across swarm
I know how to create and mount a data volume container to multiple other containers using --volumes-from, but I do have a few questions regarding its usage and limitations: Situation: I am looking to use a data volume container to store user…
asked by deankarn (462 rep)
 
21 votes, 4 answers
what changes when your input is giga/terabyte sized?
I took my first baby step into real scientific computing today when I was shown a data set where the smallest file is 48000 fields by 1600 rows (haplotypes for several people, for chromosome 22). And this is considered tiny. I write…
asked by Wang (3,247 rep)
 
20 votes, 4 answers
How to do page navigation for many, many pages? Logarithmic page navigation
What's the best way of displaying page navigation for many, many pages? (Initially this was posted as a how-to tip with my answer included in the question. I've now split my answer off into the "answers" section below.) To be more…
asked by Doin (7,545 rep)
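One way to realize logarithmic navigation is to emit links whose distance from the current page doubles at each step (±1, ±2, ±4, …), plus the first and last page; a Python sketch:

```python
def log_pages(current, last):
    """Pick page links whose distance from the current page grows
    geometrically (1, 2, 4, 8, ...), so even thousands of pages need
    only about 2*log2(n) links, always including page 1 and the last."""
    pages = {1, current, last}
    step = 1
    while current - step >= 1 or current + step <= last:
        if current - step >= 1:
            pages.add(current - step)
        if current + step <= last:
            pages.add(current + step)
        step *= 2
    return sorted(pages)
```

For page 50 of 1000 this yields links like 1, 18, 34, 42, 46, 48, 49, 50, 51, 52, 54, 58, 66, 82, 114, …, 1000: dense near the reader, sparse far away.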
 
20 votes, 2 answers
Bad idea to transfer large payload using web services?
I gather that there basically isn't a limit to the amount of data that can be sent when using REST via a POST or GET. While I haven't used REST or web services, it seems that most services involve transferring limited amounts of data. If you want…
asked by Marcus Leon (55,199 rep)
 
18 votes, 6 answers
How to avoid OOM (Out of memory) error when retrieving all records from huge table?
I have been given a task to convert a huge table to a custom XML file, and I will be using Java for this job. If I simply issue "SELECT * FROM customer", it may return a huge amount of data that eventually causes an OOM error. I wonder, is there a way I can process the…
asked by janetsmith (8,562 rep)
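The general cure is to stream the result set and write the XML incrementally instead of materializing everything; a Python/sqlite3 sketch using `fetchmany` (the table is invented; in JDBC the analogue is a streaming ResultSet with a fetch size):

```python
import sqlite3, io

def dump_xml(conn, out, batch=500):
    """Stream rows with fetchmany() and write XML as you go, so only
    `batch` rows are ever in memory at once, no matter how big the
    customer table is."""
    out.write("<customers>\n")
    cur = conn.execute("SELECT id, name FROM customer ORDER BY id")
    while True:
        rows = cur.fetchmany(batch)
        if not rows:
            break
        for cid, name in rows:
            out.write(f'  <customer id="{cid}">{name}</customer>\n')
    out.write("</customers>\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(1, 4)])
buf = io.StringIO()
dump_xml(conn, buf, batch=2)
```

A real implementation would also escape the text content (e.g. with xml.sax.saxutils.escape); it is omitted here to keep the sketch short.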
 
16 votes, 5 answers
Transferring large payloads of data (Serialized Objects) using wsHttp in WCF with message security
I have a case where I need to transfer large amounts of serialized object graphs (via NetDataContractSerializer) over WCF using wsHttp. I'm using message security and would like to continue to do so. With this setup I would like to transfer…
asked by jpierson (16,435 rep)
 
12 votes, 11 answers
Fastest way to search a 1 GB+ string of data for the first occurrence of a pattern
There's a 1 gigabyte string of arbitrary data which you can assume to be equivalent to something like:
1_gb_string=os.urandom(1*gigabyte)
We will be searching this string, 1_gb_string, for an infinite number of fixed width, 1 kilobyte patterns,…
asked by user213060 (1,249 rep)
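With one fixed haystack and many fixed-width patterns, it pays to index the haystack once: hash every window of the pattern width and map each hash to its positions, so each search becomes a dictionary lookup plus verification. A Python sketch (a production version would compute the window hashes with a rolling hash such as Rabin-Karp instead of re-slicing):

```python
from collections import defaultdict

def build_index(data, width):
    """One-time pass over the big string: hash every width-byte window
    and record hash -> positions. Subsequent pattern searches are then
    expected O(1) lookups instead of fresh O(n) scans."""
    index = defaultdict(list)
    for i in range(len(data) - width + 1):
        index[hash(data[i:i + width])].append(i)
    return index

def find_first(data, index, pattern):
    """Return the first occurrence of `pattern`, or -1."""
    for i in index.get(hash(pattern), []):
        if data[i:i + len(pattern)] == pattern:   # guard against hash collisions
            return i
    return -1
```

The positions in each bucket are appended in order, so the first verified match is also the earliest occurrence.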
 
11 votes, 7 answers
Fastest way for inserting very large number of records into a Table in SQL
The problem is, we have a huge number of records (more than a million) to be inserted into a single table from a Java application. The records are created by the Java code; it's not a move from another table, so INSERT/SELECT won't help. Currently,…
asked by Iravanchi (5,139 rep)
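Whatever the driver, the biggest single win for million-row inserts is usually batching inside one transaction rather than autocommitting per row; a Python/sqlite3 sketch (the JDBC equivalent is `setAutoCommit(false)` plus `addBatch`/`executeBatch`, table name is invented):

```python
import sqlite3

def bulk_insert(conn, rows):
    """Wrap the whole load in one transaction and send rows with
    executemany: committing once instead of once per row avoids a
    durable-write round trip for every single record."""
    with conn:   # one transaction for the entire batch
        conn.executemany("INSERT INTO records (val) VALUES (?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, val TEXT)")
bulk_insert(conn, [(f"v{i}",) for i in range(10000)])
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```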