I wanted to optimize recursion with openMP. So, I started from this question: best-way-to-parallelize-this-recursion-using-openmp
While looking to optimize recursive functions I was first interested in openMP. Starting from the code below that I found there (https://en.cppreference.com/w/cpp/chrono) :
#include <iostream>
#include <chrono>
 
long fibonacci(unsigned n)
{
    if (n < 2) return n;
    return fibonacci(n-1) + fibonacci(n-2);
}
 
int main()
{
    auto start = std::chrono::steady_clock::now();
    std::cout << "f(42) = " << fibonacci(42) << '\n';
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
Compiled with g++ without any optimization it gives the result:
f(42) = 267914296
elapsed time: 1.88232s
Then I used openMP similarly to the answer upside... but I had to stop the process because it took a very very long time. I do not know a lot about openMP recursion parallelism since I just use it to optimize for loops in my code... Also, I found something in gcc documentation :
__attributes__((const)) 
Then I added it to my openMP "optimized" version but all I obtained is a Core Dump !!!
So I removed my omp pragmas to check that the core dump is due to my code or something else...
Then the code became:
#include <iostream>
#include <chrono>
__attribute__ ((const))
long fibonacci(unsigned n)
{
   
   int long r[2];
   if (n < 2) return n;
   r[0]=fibonacci(n-1);
   r[1]=fibonacci(n-2);
   return (r[0]+r[1]);
}
 
int main()
{
    auto start = std::chrono::steady_clock::now();
    std::cout << "f(42) = " << fibonacci(42) << '\n';
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
I compiled it with the following command:
g++ fibonacci.cpp -o fibonacci -Wall -O2 -march=native
Now it take a lot less time to perform the fibonacci computation of 42:
f(42) = 267914296
elapsed time: 0.00106504s
in power save mode and in performance mode it gives
f(42) = 267914296
elapsed time: 0.000187806s
Here a lscpu of my computer (if somebody want to compare results):
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            9
    CPU(s) scaling MHz:  95%
    CPU max MHz:         4500,0000
    CPU min MHz:         800,0000
    BogoMIPS:            8403,00
Now, I do not know how much time it could take with openMP because I was not able to obtain a satisfactory result... I hope somebody will find a way to add omp pragmas to this kind of recursive cases.
As required, here is the omp version 1 of this case:
#include <iostream>
#include <chrono>
#include <omp.h>
__attribute__ ((const))
long fibonacci(unsigned n)
{
   
    int long a, b;
    if (n < 2) return n;
    #pragma omp parallel
    #pragma omp single nowait
    {         
       #pragma omp task shared(a)
       a=fibonacci(n-1);
       #pragma omp task shared(b)
       b=fibonacci(n-2);
       #pragma omp taskwait
    }
    return (a+b);
}
 
int main()
{
    auto start = std::chrono::steady_clock::now();
    std::cout << "f(42) = " << fibonacci(42) << '\n';
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
And the version that drive me to a "silly code":
#include <iostream>
#include <chrono>
#include <omp.h>
__attribute__ ((const))
long fibonacci(unsigned n)
{
   
   int long r[2];
   int i;
   if (n < 2) return n;
   #pragma omp parallel
   #pragma omp single
   for(i=0;i<2;++i) 
   {
       #pragma omp task shared(r)   
       r[i]=fibonacci(n-i-1);
   }      
    
    return (r[0]+r[1]);
}
 
int main()
{
    auto start = std::chrono::steady_clock::now();
    std::cout << "f(42) = " << fibonacci(42) << '\n';
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
The build command for these omp versions is :
g++ omp_fibonacci_for.cpp -o omp_fibonacci_for -Wall -fopenmp
Both of them seems to loop "for ever" (fortunately Ctrl-C worked fine). But, while writing the end of this question, the second version with "for", process successfully:
f(42) = 267914296
elapsed time: 480.953s
To be clear I'm not looking for "best perf", my objective is to try to obtain a omp version that process in a reasonable time with the goal of finding the way to use openMP on recursive code. Honestly I was expecting a better execution time than 480s ! I was also surprised by the time it took in the "poor debug" version that took less 0.001s to process...
The thing is now I'd like to know if the way I use openMP is acceptable if I apply this model with "havier" tasks or if I do something wrong.