You need to make sure your parallel processes do not share any state.
For example, in the case of the factorial, I would do the following:
- set a degree of parallelism (DOP) - usually the number of processors on your machine for compute-bound operations
 
- divide the problem into independent subsets 
 
- process each subset in parallel
 
- aggregate the results obtained from the parallel processes
 
This is somehow a simplified Map-Reduce.
The problem consists in multiplying together a set numbers. One way to divide this set into subsets is to use N parallel for loops where each start at a value i (where 0 < i <= N) with a step of N (and N = DOP).
Here's the code that does that:
/// <summary>
/// The max number of parallel tasks
/// </summary>
static readonly int DegreeOfParallelism = Environment.ProcessorCount;
public BigInteger Factorial(long x)
{
    // Make as many parallel tasks as our DOP
    // And make them operate on separate subsets of data
    var parallelTasks =
        Enumerable.Range(1, DegreeOfParallelism)
                    .Select(i => Task.Factory.StartNew(() => Multiply(x, i),
                                 TaskCreationOptions.LongRunning))
                    .ToArray();
    // after all tasks are done...
    Task.WaitAll(parallelTasks);
    // ... take the partial results and multiply them together
    BigInteger finalResult = 1;
    foreach (var partialResult in parallelTasks.Select(t => t.Result))
    {
        finalResult *= partialResult;
    }
    return finalResult;
}
/// <summary>
/// Multiplies all the integers up to upperBound, with a step equal to DOP
/// starting from a different int
/// </summary>
/// <param name="upperBoud"></param>
/// <param name="startFrom"></param>
/// <returns></returns>
public BigInteger Multiply(long upperBound, int startFrom)
{
    BigInteger result = 1;
    for (var i = startFrom; i <= upperBound; i += DegreeOfParallelism)
        result *= i;
    return result;
}
On my machine this computes 100000! in about 30 seconds and the result is what Wolfram Alpha says it should be.
Update
After running a few more tests, I discovered something which I didn't expect: printing out the 100000! result to the console takes ~18 seconds (the result has 456574 digits).
The results of the 100000! computation alone (without printing the number) the are:
- ~10 seconds for parallel execution
 
- ~16 seconds for sequential execution