Rails 4, Mongoid instead of ActiveRecord (but this shouldn't change anything for the sake of the question).
Let's say I have a MyModel domain class with some validation rules:
class MyModel
  include Mongoid::Document
  field :text, type: String
  field :type, type: String
  belongs_to :parent
  validates :text, presence: true
  validates :type, inclusion: %w(A B C)
  validates_uniqueness_of :text, scope: :parent # important validation rule for the purpose of the question
end
where Parent is another domain class:
class Parent
  include Mongoid::Document
  field :name, type: String
  has_many :my_models
end
The related collections in the database are already populated with some valid data.
Now I want to import data from a CSV file, and that data can conflict with what is already in the database. The easy approach is to create an instance of MyModel for every row in the CSV, check whether it's valid, and then either save it to the database or discard it.
Something like this:
csv_rows.each do |data| # simplified
  my_model = MyModel.new(data) # data is the hash with the values taken from the CSV row
  if my_model.valid?
    my_model.save validate: false
  else
    # do something useful, but not interesting for the question's purpose
    # just know that I need to separate validation from saving
  end
end
Now, this works pretty smoothly for a limited amount of data. But when the CSV contains hundreds of thousands of rows, it gets quite slow, because (worst case) there's a write operation for every row.
What I'd like to do is store the list of valid items and save them all at the end of the file-parsing process. So, nothing complicated:
valids = []
csv_rows.each do |data|
  my_model = MyModel.new(data)
  # THE INTERESTING LINE: this "if" checks only against the database.
  # What happens if the row conflicts with other my_models that are not saved yet?
  if my_model.valid?
    valids << my_model
  else
    # ...
  end
end
if valids.size > 0
  # bulk insert of all data
end
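For the bulk insert step I have something like the following in mind. It's only a sketch: `bulk_save` and `insert_batch` are names I made up, and `insert_batch` is a placeholder for whatever single bulk write the driver offers (I assume it would receive the documents' hashes, but I haven't settled on the exact call). Writing in slices keeps each write bounded instead of sending hundreds of thousands of documents at once.

```ruby
# Hands documents to the inserter in fixed-size slices and returns
# how many documents were passed on in total.
# `insert_batch` is a hypothetical callable that performs ONE bulk write.
def bulk_save(docs, insert_batch:, batch_size: 1000)
  written = 0
  docs.each_slice(batch_size) do |slice|
    insert_batch.call(slice)
    written += slice.size
  end
  written
end
```

With an empty `valids` array the block never runs, so the `if valids.size > 0` guard becomes unnecessary.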
That would be perfect if I could be sure that the CSV contains no duplicate rows and no data that violates the validation rules of MyModel.
My question is: how can I check each row against the database AND the valids array, without repeating the validation rules already defined in MyModel (I want to avoid duplicating them)?
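To make the problem concrete, here is the naive workaround I can think of, as a rough sketch. It is plain Ruby with a hypothetical `Row` struct standing in for MyModel (so it runs without Mongoid) and a made-up `partition_batch` helper. It tracks the `[parent_id, text]` pairs already seen in the batch, which means the uniqueness rule is written a second time outside the model. That is exactly the duplication I'd like to avoid.

```ruby
require 'set'

# Hypothetical stand-in for MyModel so the sketch runs without Mongoid.
Row = Struct.new(:text, :type, :parent_id) do
  # Mirrors only the presence and inclusion rules; the real model's
  # valid? would also run the uniqueness check against the database.
  def valid?
    !text.to_s.strip.empty? && %w(A B C).include?(type)
  end
end

# Splits rows into [valids, rejects]. A [parent_id, text] pair that
# already appeared earlier in the same batch counts as a conflict.
def partition_batch(rows)
  seen = Set.new
  valids = []
  rejects = []
  rows.each do |row|
    key = [row.parent_id, row.text]
    # Set#add? returns nil when the key is already in the set.
    if row.valid? && seen.add?(key)
      valids << row
    else
      rejects << row
    end
  end
  [valids, rejects]
end
```

The first row with a given key wins and later duplicates are rejected; whether that is the right policy for the import is a separate decision.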
Is there a different (more efficient) approach I'm not considering?
 
    