Context:
I wanted to test the speedup that can be achieved by using the MacBook Pro 2019 GPU for TensorFlow operations.
As advised, in the following snippet I use TensorFlow's built-in function tf.multiply() to multiply a tensor by a constant, once on the CPU and once on the GPU:
import time
import tensorflow as tf
tensor = tf.constant([[1, 2],
                      [3, 4]])
def cpu():
    with tf.device('/CPU:0'):
        tensor_1 = tf.multiply(tensor, 2)
        return tensor_1
def gpu():
    with tf.device('/device:GPU:0'):
        tensor_1 = tf.multiply(tensor, 2)
        return tensor_1
# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()
n = 10000
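# Time n repeated multiplications on the CPU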
start = time.time()
for i in range(n):
    cpu()
end = time.time()
cpu_time = end - start
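# Time n repeated multiplications on the GPU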
start = time.time()
for i in range(n):
    gpu()
end = time.time()
gpu_time = end - start
print('GPU speedup over CPU: {}x'.format((cpu_time / gpu_time)))
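For completeness, one way to confirm that TensorFlow actually sees the Metal GPU and places the op on it is a quick check along the following lines (a sketch using the standard tf.config / tf.debugging APIs, not part of the timed script above):
import tensorflow as tf
# List the devices TensorFlow can see.
print(tf.config.list_physical_devices('CPU'))
print(tf.config.list_physical_devices('GPU'))
# Log where each subsequently executed op is placed.
tf.debugging.set_log_device_placement(True)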
Results:
The maximum speedup I could achieve was about 1x, i.e. the GPU run was no faster than the CPU run.
My Question:
If tf.multiply() is optimised to run on the GPU, why am I not getting a better speedup, let's say 2x to 10x?
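One possibility I want to rule out is that a 2x2 tf.multiply() is simply too small for the GPU to matter, so the timings are dominated by per-call overhead. Below is an illustrative sketch of what a heavier benchmark would look like (using tf.random.normal and tf.linalg.matmul; the 4000x4000 size and the helper name matmul_on are arbitrary choices of mine):
import time
import tensorflow as tf

# Illustrative: a large matrix multiplication, so the actual compute
# dominates the per-op dispatch overhead.
a = tf.random.normal((4000, 4000))
b = tf.random.normal((4000, 4000))

def matmul_on(device):
    with tf.device(device):
        return tf.linalg.matmul(a, b)

# Warm up both devices once before timing.
matmul_on('/CPU:0')
matmul_on('/device:GPU:0')

for device in ['/CPU:0', '/device:GPU:0']:
    start = time.time()
    for _ in range(10):
        # .numpy() pulls the result back to the host, ensuring the op has finished.
        matmul_on(device).numpy()
    print(device, 'took', time.time() - start, 'seconds')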
System & environment:
- macOS Ventura 13.3.1
- Processor - 2.6 GHz 6-Core Intel Core i7
- Graphics - AMD Radeon Pro 5300M 4 GB, Intel UHD Graphics 630 1536 MB
- Memory - 16 GB 2667 MHz DDR4
- tensorflow-macos Python package version - 2.9.0
- tensorflow-metal Python package version - 0.6.0
- Python version - 3.8.16