I think _mm256_cmp_ps is the better fit for your question. The program below does a bit more than you asked for: the same mask vector is both the threshold the inputs are compared against and the value stored where the comparison is true. To store ones, set all mask elements to 1; setting them to another number changes both the threshold and the stored value.
//gcc 6.2, Linux-mint, Skylake
#include <stdio.h>
#include <x86intrin.h>
float __attribute__(( aligned(32))) f[8] = {1.2, 0.5, 1.7, 1.9, 0.34, 22.9, 18.6, 1.0};
// float __attribute__(( aligned(32))) r[8]; // Must be {1, 0, 1, 1, 0, 1, 1, 0}
// in C++11, use alignas(32). Or C11 _Alignas(32), instead of GNU C __attribute__.
void printVecps(__m256 vec)
{
    float __attribute__(( aligned(32))) tempps[8]; // _mm256_store_ps requires a 32-byte-aligned destination
    _mm256_store_ps(&tempps[0], vec);
    printf(" [0]=%3.2f, [1]=%3.2f, [2]=%3.2f, [3]=%3.2f, [4]=%3.2f, [5]=%3.2f, [6]=%3.2f, [7]=%3.2f \n",
           tempps[0],tempps[1],tempps[2],tempps[3],tempps[4],tempps[5],tempps[6],tempps[7]) ;
}
int main()
{
    __m256 mask = _mm256_set1_ps(1.0), vec1, vec2, vec3;

    vec1 = _mm256_load_ps(&f[0]); // load vector values from f[0]-f[7]
    printf("vec1 : "); printVecps(vec1);
    vec2 = _mm256_cmp_ps ( mask, vec1, _CMP_LT_OS /*0x1*/); // mask < vec1: all-ones bits (printed as -nan) where true, 0.0 where false
    printf("vec2 : "); printVecps(vec2);
    vec3 = _mm256_min_ps (vec2 , mask); // select minimum of compared results and mask: NaN elements become the mask value
    printf("vec3 : "); printVecps(vec3);
    return 0;
}
The output for mask = {1,1,1,1,1,1,1,1} is:
vec1 : [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00
vec2 : [0]=-nan, [1]=0.00, [2]=-nan, [3]=-nan, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00
vec3 : [0]=1.00, [1]=0.00, [2]=1.00, [3]=1.00, [4]=0.00, [5]=1.00, [6]=1.00, [7]=0.00
And the output for mask = {2,2,2,2,2,2,2,2} is:
vec1 : [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00
vec2 : [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00
vec3 : [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=2.00, [6]=2.00, [7]=0.00
This depends on the non-commutative behaviour of _mm256_min_ps with NaNs to replace the NaN elements (the all-ones compare results) with 1.0. Each element of min(a, b) is computed as a < b ? a : b, so NaN < 1.0 ? NaN : 1.0 = 1.0, because NaN < anything is always false.
Beware that gcc before 7.0 treats the 128-bit _mm_min_ps intrinsic as commutative even without -ffast-math (even though it knows the minps instruction isn't). Use an up-to-date gcc (or clang), or make sure that gcc compiles your code with the operands in the order this algorithm needs. It's possible that gcc never swaps the operands with AVX, only with SSE (to avoid extra movapd instructions), but the safest thing is to use gcc7 or later.