I have a CSV file of nearly 2 million rows with 3 columns (item, rating, user). I am able to transfer the data into a 2D String array or list. However, my issue arises when I am trying to parse through the arrays to create CSV files from because the application stops and I do not know how long I am expected to wait for the program to finish running.
Basically, my end goal is to be able to parse through large CSV file, create a matrix in which each distinct item represents a row and each distinct user represents a column with the rating being at the intersection of the user and item. With this matrix, I then create a cosine similarity matrix with the rows and columns represented by items with their cosine similarity being at the intersection of the two distinct items.
I already know how to create CSV files, but my issue falls within the large loop structures when creating other arrays for the purposes of comparison.
Is there a better way to be able to process and calculate large amounts of data so that my application doesn't freeze?
My current program does the following:
- Take large CSV file
- Parse through large CSV file
- Create 2D array resembling original CSV file
- Create list of distinct items (each distinct item being represented by an index number)
- Create list of distinct users (each distinct user being represented by an index number)
- Create 2D array of with row indexes representing items, column indexes representing users resulting in array[row][column] = rating
- Calculate cosine similarity of two matrices
- Create 2D array with both row and column indexes representing items resulting in array[row] [column] = cosine similarity
I noticed that my program freezes when it reaches steps 4 and 5 If I remove steps 4 and 5, it will still freeze at step 6
I have attached that portion of my code
      FileInputStream stream = null;
      Scanner scanner = null;
      try{
         stream = new FileInputStream(fileName);
         scanner = new Scanner(stream, "UTF-8");
         while (scanner.hasNextLine()){
             String line = scanner.nextLine();
             if (!line.equals("")){
                String[] elems = line.split(",");
                if (itemList.isEmpty()){
                  itemList.add(elems[0]);
                }
                else{
                  if (!itemList.contains(elems[0]))
                     itemList.add(elems[0]);
                }
                if (nameList.isEmpty()){
                  nameList.add(elems[2]);
                }
                else{
                  if (!nameList.contains(elems[2]))
                     nameList.add(elems[2]);
                }
                for (int i = 0; i < elems.length; i++){
                   if (i == 1){
                     if (elems[1].equals("")){
                        list.add("0");
                      }
                      else{
                        list.add(elems[1]);
                      }
                   }
                   else{
                     list.add(elems[i]);
                   }
                }
             }
         } 
         if (scanner.ioException() != null){
            throw scanner.ioException();
         }
      }
      catch (IOException e){
         System.out.println(e);
      }
      finally{
         try{
            if (stream != null){
               stream.close();
            }
         }
         catch (IOException e){
            System.out.println(e);
         }
         if (scanner != null){
            scanner.close();
         }
      }
 
     
    