I am experimenting with SIMD calculations in C#. I have gotten quite far with my problem, but I am stuck on the last step and wonder how it can be done.
I use `Vector128<byte>`, which handles 16 bytes at a time. I think the easiest way to explain the problem is to describe, step by step, what I have done:
- I have created a two-dimensional array (`array2D`) with 9 columns and 16 rows per column, filled with the repeating sequence 0, 2, 0, 2, ... This means that, for example, row 0 contains only 0s, row 1 contains only 2s, and so on.
- I then call `Avx.LoadVector128` for each column (dimension), which gives 9 `Vector128<byte>` values that I store in `dimensionLIST`.
- Next, I count how many times the value 2 (the array only contains 0s and 2s) appears on EACH ROW (there are 16 rows). The per-row counts end up in `counts[0]`.
- Showing `counts[0]` with `MessageBox.Show(counts[0].ToString());` gives one element per row (16 rows):

    `[0,9,0,9,0,9,0,9,0,9,0,9,0,9,0,9]`

    That is, the value 2 was found 9 times (once in each of the 9 columns) on every other row.
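For reference, the expected per-row counts can be checked with a plain scalar loop. This is a minimal sketch that rebuilds the same 9-by-16 dummy array and counts the 2s per row, which is what each byte lane of `counts[0]` is meant to hold. The class and method names (`ScalarReference`, `BuildDummy`, `CountPerRow`) are hypothetical.

```csharp
using System;

public static class ScalarReference
{
    // Hypothetical helper: builds the same dummy data as the question,
    // 9 columns x 16 rows filled with 0, 2, 0, 2, ...
    public static byte[,] BuildDummy()
    {
        var array2D = new byte[9, 16];
        byte num = 0;
        for (int c = 0; c < 9; c++)
            for (int r = 0; r < 16; r++)
            {
                array2D[c, r] = num;
                num = num == 0 ? (byte)2 : (byte)0;
            }
        return array2D;
    }

    // Hypothetical helper: for every row, counts how many columns hold `value`.
    // This is the scalar equivalent of the per-lane counts in counts[0].
    public static byte[] CountPerRow(byte[,] array2D, byte value)
    {
        int columns = array2D.GetLength(0); // 9
        int rows = array2D.GetLength(1);    // 16
        var counts = new byte[rows];
        for (int c = 0; c < columns; c++)
            for (int r = 0; r < rows; r++)
                if (array2D[c, r] == value)
                    counts[r]++;
        return counts;
    }

    public static void Main()
    {
        byte[] counts = CountPerRow(BuildDummy(), 2);
        Console.WriteLine("[" + string.Join(",", counts) + "]");
        // prints [0,9,0,9,0,9,0,9,0,9,0,9,0,9,0,9]
    }
}
```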
Now the goal is to count how many lanes of `counts[0]` contain the value 9. For `[0,9,0,9,0,9,0,9,0,9,0,9,0,9,0,9]` that is 8.

How can I get that integer 8 as a scalar?
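A common way to turn such a comparison result into a scalar count is to collapse the 0xFF/0x00 lanes into a bit mask with `Sse2.MoveMask` (the `pmovmskb` instruction) and then count the set bits. This is a sketch, assuming .NET Core 3.0 or later and an SSE2-capable x86/x64 CPU; `LaneCount` and `CountLanesEqual` are hypothetical names.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class LaneCount
{
    // Hypothetical helper: returns how many of the 16 byte lanes of `counts` equal `target`.
    public static int CountLanesEqual(Vector128<byte> counts, byte target)
    {
        // 0xFF in every lane where counts == target, 0x00 elsewhere.
        Vector128<byte> eq = Sse2.CompareEqual(counts, Vector128.Create(target));
        // pmovmskb: copies the top bit of each byte into bits 0..15 of an int.
        int mask = Sse2.MoveMask(eq);
        // One set bit per matching lane.
        return BitOperations.PopCount((uint)mask);
    }

    public static void Main()
    {
        if (!Sse2.IsSupported)
        {
            Console.WriteLine("SSE2 is not supported on this machine.");
            return;
        }
        // The per-row counts from the question.
        Vector128<byte> counts = Vector128.Create(
            (byte)0, 9, 0, 9, 0, 9, 0, 9, 0, 9, 0, 9, 0, 9, 0, 9);
        Console.WriteLine(CountLanesEqual(counts, 9)); // prints 8
    }
}
```

Applied to the question's code, `result` already holds the comparison mask, so `BitOperations.PopCount((uint)Sse2.MoveMask(result))` should give 8 directly. On .NET 7 or later, `Vector128.ExtractMostSignificantBits(result)` should produce the same mask without naming `Sse2`.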

    using System;
    using System.Collections.Generic;
    using System.Runtime.Intrinsics;
    using System.Runtime.Intrinsics.X86;
    using System.Windows.Forms;

    public unsafe static void SIMDfunction()
    {
        // Create dummy values: 9 columns x 16 rows filled with the sequence 0, 2, 0, 2, ...
        // Every column has an even length (16), so every column is [0,2,0,2,...].
        byte[,] array2D = new byte[9, 16]; byte num = 0;
        for (int i = 0; i < 9; i++)
        {
            for (int i2 = 0; i2 < 16; i2++)
            {
                array2D[i, i2] = num;
                if (num == 0) { num = 2; } else { num = 0; }
            }
        }
        /*----------------------------------------------------------------------------------------*/
        // Below starts the SIMD calculations.
        // LoadVector128, CompareEqual and Subtract are SSE2 methods that Avx inherits.
        fixed (byte* ptr = array2D)
        {
            // Load all 9 columns ("dimensions") as Vector128<byte>
            List<Vector128<byte>> dimensionLIST = new List<Vector128<byte>>();
            for (int i = 0; i < 9; i++)
            {
                byte* featuredimension = ptr + i * 16; // start of column i (16 contiguous bytes, row-major layout)
                dimensionLIST.Add(Avx.LoadVector128(featuredimension)); // the 16 bytes: [0,2,0,2,0,2,0,2,0,2,0,2,0,2,0,2]
            }
            // Count, per row (byte lane), how many of the 9 columns contain the value 2.
            // stackalloc memory is zero-initialized by default, so counts[0] starts as all zeros.
            Span<Vector128<byte>> counts = stackalloc Vector128<byte>[1];
            byte nr2 = 2; byte nr3 = 9;
            for (int i = 0; i < dimensionLIST.Count; i++) // each column
            {
                // Compare dimensionLIST[i] with a vector of 2s:
                // [0,2,0,2,...] == [2,2,2,2,...]  ->  [0,255,0,255,...]
                var match = Avx.CompareEqual(dimensionLIST[i], Vector128.Create(nr2));
                // A matching lane is 0xFF (-1 as a signed byte), so subtracting it adds 1 to that lane.
                counts[0] = Avx.Subtract(counts[0], match);
            }
            // STEP 1: counts[0] is [0,9,0,9,0,9,0,9,0,9,0,9,0,9,0,9]: the value 2 occurs 9 times on every other row.
            var result = Avx.CompareEqual(Vector128.Create(nr3), counts[0]);
            // result: [0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255] (255 = all bits set where the count == 9)
            MessageBox.Show(result.ToString());
            // Now the goal is to count how many lanes of counts[0] equal 9, which is 8.
            // How can I get that integer 8 as a scalar here?
        }
    }
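If the goal is instead a horizontal sum of the byte lanes (for example, the total number of 2s across all rows), `Sse2.SumAbsoluteDifferences` (`psadbw`) against a zero vector adds up each group of 8 bytes into the corresponding 64-bit half. Masking the comparison result down to 1 per matching lane turns the same helper into a lane counter. This is a sketch, assuming .NET Core 3.0 or later and SSE2 support; `HorizontalSum` and `SumBytes` are hypothetical names.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class HorizontalSum
{
    // Hypothetical helper: adds up all 16 bytes of `v`.
    // psadbw against zero puts the sum of bytes 0..7 in the low 64-bit half
    // and the sum of bytes 8..15 in the high 64-bit half.
    public static ulong SumBytes(Vector128<byte> v)
    {
        Vector128<ulong> halves = Sse2.SumAbsoluteDifferences(v, Vector128<byte>.Zero).AsUInt64();
        return halves.GetElement(0) + halves.GetElement(1);
    }

    public static void Main()
    {
        if (!Sse2.IsSupported)
        {
            Console.WriteLine("SSE2 is not supported on this machine.");
            return;
        }
        Vector128<byte> counts = Vector128.Create(
            (byte)0, 9, 0, 9, 0, 9, 0, 9, 0, 9, 0, 9, 0, 9, 0, 9);

        // 0xFF where counts == 9, then AND with 1 so each matching lane contributes 1.
        Vector128<byte> eq = Sse2.CompareEqual(counts, Vector128.Create((byte)9));
        Vector128<byte> ones = Sse2.And(eq, Vector128.Create((byte)1));

        Console.WriteLine(SumBytes(ones));   // prints 8  (lanes equal to 9)
        Console.WriteLine(SumBytes(counts)); // prints 72 (total number of 2s in all rows)
    }
}
```

For a single vector, `MoveMask` plus a popcount is usually the cheaper route; the `psadbw` route is more useful when the lanes hold counts larger than 1 that need to be added together.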
 
    