I have a product Catalog object with up to 1 million products in it. The following code shows the Catalog class, along with some test code that populates it with 1 million dummy products:
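```csharp
using System;
using System.Collections.Generic;

public class Catalog
{
    // One shared Random: separate Random instances created in quick succession can get
    // the same time-based seed on .NET Framework and would produce identical catalogs.
    private static readonly Random random = new Random();

    public long Id { get; set; }
    public string Name { get; set; }
    public List<string> Products { get; set; }

    public Catalog()
    {
        Products = new List<string>();
        AddProducts();
    }

    // Test-only: fills the catalog with 1,000,000 random numeric product codes.
    private void AddProducts()
    {
        for (int i = 0; i < 1000000; i++)
        {
            Products.Add(random.Next(0, 100000000).ToString());
        }
    }
}
```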
I have about 300-600 Catalog objects (each with about 1 million products) and need to check whether any two Catalogs have products in common. I only need a yes/no answer; I don't need to know which products are shared. The logic I am using is something like this:
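```csharp
// Returns true as soon as one product of catalogA is also found in catalogB.
static bool SearchDuplicateProducts(Catalog catalogA, Catalog catalogB)
{
    foreach (string product in catalogA.Products)
    {
        // List<string>.Contains is a linear scan, so this loop is O(n * m) in the worst case.
        if (catalogB.Products.Contains(product))
        {
            return true;
        }
    }
    return false;
}
```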
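If "between any 2 Catalogs" means "does any pair among all the catalogs share a product" (an assumption about the requirement), calling SearchDuplicateProducts() on every pair takes k(k-1)/2 calls, which is 179,700 calls for 600 catalogs. A minimal sketch of a single-pass alternative follows; the CatalogChecks class and AnyTwoCatalogsShareProduct method are hypothetical names. It keeps one HashSet<string> of every product seen in earlier catalogs, so the work is proportional to the total number of products, at the cost of memory that grows with the number of distinct products across all catalogs.

```csharp
using System;
using System.Collections.Generic;

public static class CatalogChecks
{
    // Hypothetical helper: true if at least one product appears in two or more different catalogs.
    // Each catalog is checked against products from *earlier* catalogs before its own products
    // are added, so a product repeated inside a single catalog is not reported as shared.
    public static bool AnyTwoCatalogsShareProduct(IEnumerable<IEnumerable<string>> catalogs)
    {
        var seen = new HashSet<string>();
        foreach (var products in catalogs)
        {
            // Check phase: does this catalog contain anything seen in an earlier catalog?
            foreach (var product in products)
            {
                if (seen.Contains(product))
                {
                    return true;
                }
            }

            // Add phase: remember this catalog's products for the catalogs that follow.
            seen.UnionWith(products);
        }
        return false;
    }
}
```

With the original Catalog class this could be called as CatalogChecks.AnyTwoCatalogsShareProduct(catalogs.Select(c => c.Products)) (which needs using System.Linq).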
Of course, List<string> is not the fastest structure for lookups (each Contains call scans the whole list), so I tried HashSet<string>. My tests showed roughly a 200% increase in the speed of SearchDuplicateProducts() when Products was a HashSet<> instead of a List<>.
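A minimal sketch of the HashSet<string> version, assuming Products is changed from List<string> to HashSet<string> (the HashSetCatalog and HashSetCatalogChecks names are hypothetical). HashSet<T>.Overlaps enumerates its argument and returns true at the first element the set contains, so each pairwise check does at most n lookups of O(1) average cost instead of the O(n * m) worst case of repeated List<string>.Contains scans. Note that a HashSet also silently drops duplicate product codes within one catalog.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of Catalog that stores its products in a HashSet<string>.
public class HashSetCatalog
{
    public string Name { get; set; }
    public HashSet<string> Products { get; } = new HashSet<string>();
}

public static class HashSetCatalogChecks
{
    // True if the two catalogs share at least one product.
    // Overlaps walks catalogA.Products and probes catalogB.Products, stopping at the first hit.
    public static bool SearchDuplicateProducts(HashSetCatalog catalogA, HashSetCatalog catalogB)
    {
        return catalogB.Products.Overlaps(catalogA.Products);
    }
}
```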
I am not sure, though, whether a HashSet<string> for the product list is the best or most efficient way to do what SearchDuplicateProducts() does. Is there any approach (a third-party library, a database, a trie, or a different algorithm) that would give better results in terms of time and space complexity? If I have to choose between the two, I prefer better time complexity.
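One space-oriented option is a sorted int[] per catalog plus a two-pointer merge check. This sketch assumes every product code parses as an int, which holds for the test data above (numbers below 100,000,000) but may not hold for real product codes; the SortedCatalogChecks class and its methods are hypothetical. An int[] of 1,000,000 elements takes about 4 MB, sorting costs O(n log n) once per catalog, and each pairwise check is then O(n + m) with no hashing. Whether this actually beats HashSet<string> in practice would need to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SortedCatalogChecks
{
    // Hypothetical helper: converts numeric product strings into a sorted int[].
    // Assumes every product parses as an int (true for the dummy test data only).
    public static int[] ToSortedIds(List<string> products)
    {
        int[] ids = products.ConvertAll(p => int.Parse(p, CultureInfo.InvariantCulture)).ToArray();
        Array.Sort(ids);
        return ids;
    }

    // Two-pointer walk over two sorted arrays: O(n + m) time, no extra allocation.
    public static bool HaveCommonId(int[] a, int[] b)
    {
        int i = 0, j = 0;
        while (i < a.Length && j < b.Length)
        {
            if (a[i] == b[j]) return true;  // first common product found
            if (a[i] < b[j]) i++;           // advance whichever side holds the smaller value
            else j++;
        }
        return false;
    }
}
```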
I have checked similar questions:
- Best Way to compare 1 million List of object with another 1 million List of object in c#
- How to quickly search through a very large list of strings / records on a database
- C#: Memory-efficient search through 2 million objects without external dependencies
Thanks for your help.