My advice is to follow the second approach: parallelize only the outer loop, and keep the inner loops sequential (for/foreach). Don't nest Parallel.ForEach loops one inside the other. The reasons are:
The parallelization adds overhead. Each Parallel loop has to synchronize the enumeration of the source, start Tasks, watch cancellation/termination flags etc. By nesting Parallel loops you are paying this cost multiple times.
Limiting the degree of parallelism becomes harder. The MaxDegreeOfParallelism option is not an ambient property that affects child loops. It limits only a single loop. So if you have an outer Parallel loop with MaxDegreeOfParallelism = 4 and an inner Parallel loop also with MaxDegreeOfParallelism = 4, the inner body might be invoked concurrently 16 times (4 * 4). It is still possible to enforce a sensible upper limit by configuring all loops with the same TaskScheduler, and specifically with the ConcurrentScheduler property of a shared ConcurrentExclusiveSchedulerPair instance.
In case of an exception, each Parallel loop wraps it in its own AggregateException, so you'll get a deeply nested AggregateException that you'll have to Flatten.
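To illustrate the shared-scheduler technique from the second reason, here is a minimal sketch. The class name SharedSchedulerDemo, the method MeasurePeakConcurrency, the 8 x 8 loop sizes, and the 50 ms sleep are all made up for the demonstration. Only ConcurrentExclusiveSchedulerPair, its ConcurrentScheduler property, and ParallelOptions.TaskScheduler come from the library. Both loops receive the same options, so the combined concurrency of the inner body should stay within the limit of the pair (4 here) instead of multiplying.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SharedSchedulerDemo
{
    // Hypothetical helper: runs an 8 x 8 nested Parallel.ForEach with the given
    // options and returns the highest number of inner bodies seen running at once.
    public static int MeasurePeakConcurrency(ParallelOptions options)
    {
        object gate = new object();
        int current = 0, peak = 0;

        Parallel.ForEach(Enumerable.Range(0, 8), options, outer =>
        {
            Parallel.ForEach(Enumerable.Range(0, 8), options, inner =>
            {
                lock (gate)
                {
                    current++;
                    peak = Math.Max(peak, current);
                }
                Thread.Sleep(50); // Simulated work (arbitrary duration)
                lock (gate) { current--; }
            });
        });
        return peak;
    }

    public static void Main()
    {
        // One scheduler pair shared by every loop, capped at 4 concurrent tasks.
        var pair = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, maxConcurrencyLevel: 4);
        var options = new ParallelOptions
        {
            TaskScheduler = pair.ConcurrentScheduler
        };

        int peak = MeasurePeakConcurrency(options);
        Console.WriteLine($"Peak concurrency with shared scheduler: {peak}");
    }
}
```

If you replace the options with a plain new ParallelOptions { MaxDegreeOfParallelism = 4 }, the reported peak can exceed 4, depending on how many ThreadPool threads are available at the moment.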
I would also suggest considering a third approach: do a single Parallel loop on a flattened source sequence. For example instead of:
ParallelOptions options = new() { MaxDegreeOfParallelism = X };
Parallel.ForEach(NetworkInterface.GetAllNetworkInterfaces(), options, ni =>
{
    foreach (UnicastIPAddressInformation ip in ni.GetIPProperties().UnicastAddresses)
    {
        // Do stuff with ni and ip
    }
});
...you could do this:
var query = NetworkInterface.GetAllNetworkInterfaces()
    .SelectMany(ni => ni.GetIPProperties().UnicastAddresses, (ni, ip) => (ni, ip));
Parallel.ForEach(query, options, pair =>
{
    var (ni, ip) = pair; // Deconstruct the tuple into two new locals
    // Do stuff with ni and ip
});
This approach parallelizes only the "Do stuff" part. The calls to ni.GetIPProperties() are not parallelized: the source sequence is enumerated sequentially, so the IP addresses are fetched for one NetworkInterface at a time. It also concentrates the parallelization on one NetworkInterface at a time, which might not be what you want (you might prefer to spread the work among many NetworkInterfaces). So this approach has characteristics that make it compelling for some scenarios, and unsuitable for others.
One other case worth mentioning is when the objects in the outer and inner sequences are of the same type, and have a parent-child relationship. In that case check out this question: Parallel tree traversal in C#.