I have been reading quite a lot about the Parallel class in .NET 4, and I have to say that I am a bit confused about when to use it.
This is my common scenario: I have been given the task of migrating lots of XML files to a database.
Typically I have to:
- Read XML files (100,000 and more) and order them numerically (each file is named 1.xml, 2.xml, etc.; see the ordering sketch after this list).
- Save to a database.
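For the numeric ordering step, this is roughly what I do (a sketch; myDirectory is the same DirectoryInfo field used in the code below, and the point is that a plain string sort would put 10.xml before 2.xml):

// Order 1.xml, 2.xml, ... by their numeric file name rather than
// lexically, so that 10.xml does not sort before 2.xml.
FileInfo[] files = myDirectory.EnumerateFiles("*.xml")
    .OrderBy(f => int.Parse(Path.GetFileNameWithoutExtension(f.Name)))
    .ToArray();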
I thought the above was a perfect candidate for parallel programming.
Conceptually I would like to process many files at a time.
I am currently doing this:
// Namespaces needed: System, System.Diagnostics, System.IO, System.Linq,
// System.Threading, System.Threading.Tasks
private ResultEventArgs progressResults = new ResultEventArgs();

public void ExecuteInParallelTest()
{
    var sw = new Stopwatch();
    sw.Start();
    int index = 0;
    cancelToken = new CancellationTokenSource();
    var parOpts = new ParallelOptions();
    parOpts.CancellationToken = cancelToken.Token;
    parOpts.MaxDegreeOfParallelism = Environment.ProcessorCount; // Is this correct?
    FileInfo[] files = myDirectory.EnumerateFiles("*.xml").ToArray(); // Is this faster?
    TotalFiles = files.Length;
    try
    {
        Task t1 = Task.Factory.StartNew(() =>
        {
            try
            {
                Parallel.ForEach(files, parOpts, (file, loopState) =>
                {
                    // ThrowIfCancellationRequested already tests the flag,
                    // so a separate IsCancellationRequested check is redundant.
                    cancelToken.Token.ThrowIfCancellationRequested();

                    // Interlocked.Increment returns the incremented value;
                    // assigning it back to the shared variable would reintroduce a race.
                    int current = Interlocked.Increment(ref index);
                    ProcessFile(file, current);

                    // Note: progressResults is shared across threads here.
                    // ProcessStatus is assumed to be the enum behind Status.
                    progressResults.Status = ProcessStatus.InProgress;
                    OnItemProcessed(TotalFiles, current /*, etc. */);
                });
            }
            catch (OperationCanceledException)
            {
                OnOperationCancelled(new ResultEventArgs
                {
                    Status = ProcessStatus.Cancelled,
                    TotalCount = TotalFiles,
                    FileProcessed = index
                    // etc.
                });
            }
            // ContinueWith is used to sync with the UI when the task has completed.
        }, cancelToken.Token).ContinueWith(result => OnOperationCompleted(new ProcessResultEventArgs
        {
            Status = ProcessStatus.Completed,
            TotalCount = TotalFiles,
            FileProcessed = index
            // etc.
        }), CancellationToken.None, TaskContinuationOptions.None, TaskScheduler.FromCurrentSynchronizationContext());
    }
    catch (AggregateException)
    {
        // TODO: note this will not fire as written - exceptions from the task
        // surface when it is waited on, not here at StartNew.
    }
}
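For context, this is roughly how I trigger the cancellation from the UI (the button handler name is just illustrative; cancelToken is the same field set in ExecuteInParallelTest):

// Wired to a Cancel button on the form.
private void btnCancel_Click(object sender, EventArgs e)
{
    if (cancelToken != null)
        cancelToken.Cancel(); // Parallel.ForEach then throws OperationCanceledException
}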
My questions: I am using .NET 4.0. Is using Parallel the best/simplest way to speed up the processing of these files? Is the above pseudocode good enough, or am I missing vital stuff, locking, etc.?
The most important question is: setting aside ProcessFile (which I cannot optimise, as I have no control over it), is there room for optimisation?
Should I partition the files into chunks, e.g. 1-1000, 1001-2000, 2001-3000? Would that improve performance, and how do you do that? (See the sketch below for what I mean.)
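If chunking is the way to go, something like this is what I had in mind (an untested sketch using Partitioner.Create from System.Collections.Concurrent; the range size of 1000 is arbitrary, and files/parOpts/cancelToken are the same variables as in the code above):

// Hand Parallel.ForEach explicit index ranges (0-999, 1000-1999, ...)
// so each worker processes a whole chunk instead of one file at a time.
var ranges = Partitioner.Create(0, files.Length, 1000);
Parallel.ForEach(ranges, parOpts, range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        cancelToken.Token.ThrowIfCancellationRequested();
        ProcessFile(files[i], i + 1);
    }
});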
Many thanks for any replies, links, or code snippets that can help me better understand how I can improve the above code.