In C++17 the standard algorithms are available in parallel versions. You specify an execution policy: sequenced (std::execution::seq), parallel (std::execution::par), or parallel and vectorised (std::execution::par_unseq), and the library does the multithreading for you in the background.
For what you want to do, you can use std::transform with a lambda that wraps the operation you want to perform on every element of the input vector; the results are written into the results vector (which must be at least as large as the input):
#include <execution>
#include <algorithm>
#include <vector>
int compute_something(int i, int j) {
    return i * j;
}
int main()
{
    auto params = std::vector<int>(1000, 5);
    std::vector<int> results(1000, 0);
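    //Run compute_something on every element of params in parallel;
    //the i-th result is written to results[i]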
    std::transform(std::execution::par_unseq, params.begin(), params.end(),
        results.begin(), [](int i) { return compute_something(i, 4); }
    );
}
Of course, for a computation as simple as the one in compute_something, you could embed it directly in the lambda. The call then becomes:
std::transform(std::execution::par_unseq, params.begin(), params.end(),
        results.begin(), [](int i) { return i * 4; }
);
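Note that even where the parallel algorithms are available, your toolchain may need extra setup; GCC's implementation, for example, is backed by Intel TBB, so you typically have to link with -ltbb.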
Not all compilers implement the execution policies yet, though. If yours doesn't, you can achieve the same thing another way: use std::async and process the input vector in chunks. For that, define a function that takes a pair of iterators and returns a result vector, then combine the partial results at the end.
Example:
#include <future>
#include <vector>
using Iter = std::vector<int>::iterator;
std::vector<int> parallel_compute(Iter beg, Iter end)
{
    std::vector<int> results;
    //Reserve memory to avoid reallocations
    auto size = std::distance(beg, end);
    results.reserve(size);
    for (Iter it = beg; it != end; ++it)
    {
        results.push_back(*it * 4); //Add result to vector
    }
    return results;
}
int main()
{
    const int Size = 1000;
    //Chunk size
    const int Half = Size / 2;
    //Input vector
    auto params = std::vector<int>(Size, 5);
    //Create futures
    auto fut1 = std::async(std::launch::async, parallel_compute, params.begin(), params.begin() + Half);
    auto fut2 = std::async(std::launch::async, parallel_compute, params.begin() + Half, params.end());
    //Get results
    auto res1 = fut1.get();
    auto res2 = fut2.get();
    //Combine results into one vector
    std::vector<int> results;
    results.insert(results.end(), res1.begin(), res1.end());
    results.insert(results.end(), res2.begin(), res2.end());
}
The std::launch::async policy ensures that the two tasks run on their own threads. However, I wouldn't create too many threads; one per core is a reasonable strategy. You can use std::thread::hardware_concurrency() to get the number of concurrent threads supported by the system. Creating and managing threads introduces some overhead and becomes counterproductive if you create too many.
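As a rough sketch of that strategy, one chunk per hardware thread, reusing Iter and parallel_compute from the example above (the splitting logic here is just illustrative):
#include <future>
#include <thread>
#include <vector>
//Iter and parallel_compute as defined above
int main()
{
    const int Size = 1000;
    auto params = std::vector<int>(Size, 5);
    //One chunk per hardware thread; hardware_concurrency() may return 0 if unknown
    unsigned numChunks = std::thread::hardware_concurrency();
    if (numChunks == 0) numChunks = 2;
    const int chunkSize = Size / static_cast<int>(numChunks);
    //Launch one task per chunk; the last chunk also takes any remainder
    std::vector<std::future<std::vector<int>>> futures;
    for (unsigned i = 0; i < numChunks; ++i)
    {
        Iter beg = params.begin() + static_cast<int>(i) * chunkSize;
        Iter end = (i + 1 == numChunks) ? params.end() : beg + chunkSize;
        futures.push_back(std::async(std::launch::async, parallel_compute, beg, end));
    }
    //Concatenate the partial results in order
    std::vector<int> results;
    results.reserve(Size);
    for (auto& fut : futures)
    {
        auto part = fut.get();
        results.insert(results.end(), part.begin(), part.end());
    }
}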
Edit:
To avoid the expensive allocations for the individual small vectors, we can create one result vector up front and pass each parallel invocation of parallel_compute an iterator into its part of the output range. Since each thread writes to a different part of the result vector, we don't need any synchronisation:
#include <future>
#include <vector>
using Iter = std::vector<int>::iterator;
void parallel_compute(Iter beg, Iter end, Iter outBeg)
{
    for (Iter it = beg; it != end; ++it)
    {
        *outBeg++ = (*it * 4); //Write result to the output range
    }
}
int main()
{
    const int Size = 1000;
    //Chunk size
    const int Half = Size / 2;
    //Input vector
    auto params = std::vector<int>(Size, 5);
    //Output vector
    std::vector<int> results(Size, 0);
    //Create futures
    auto fut1 = std::async(std::launch::async, parallel_compute, params.begin(), params.begin() + Half, results.begin());
    auto fut2 = std::async(std::launch::async, parallel_compute, params.begin() + Half, params.end(), results.begin() + Half);
    //Get results
    fut1.wait();
    fut2.wait();
}
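One caveat with this version: wait() only blocks until the task has finished. If parallel_compute could throw, prefer get(), which also rethrows any exception stored in the future.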