I have deployed some simple services as a proof of concept: an nginx web server tuned for high performance following https://stackoverflow.com/a/8217856/735231.
I also edited /etc/nginx/conf.d/default.conf so that the line listen 80; becomes listen 80 http2;.
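For reference, the relevant part of the server block after that edit looks like this (trimmed; only the listen line differs from the stock default.conf):

server {
    listen       80 http2;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html  index.htm;
    }
}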
I am using the Locust distributed load-testing tool, with a class that swaps the requests module for hyper in order to test HTTP/2 workloads. This may not be optimal in terms of performance, but I can spawn many Locust workers, so it's not a huge concern.
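The class is roughly along these lines (a simplified sketch rather than my exact locustfile; it assumes the pre-1.0 Locust API and hyper's HTTP20Connection, and names like HyperClient and the h2test-nginx host are just for illustration):

# Simplified sketch of a Locust class that uses hyper instead of requests.
# Assumes the pre-1.0 Locust API (Locust, TaskSet, events) and hyper's
# HTTP20Connection; class names and the target host are illustrative.
import time

from hyper import HTTP20Connection
from locust import Locust, TaskSet, task, events


class HyperClient(object):
    """Issues HTTP/2 requests and reports timings back to Locust."""

    def __init__(self, host, port=80):
        self.conn = HTTP20Connection(host, port)

    def get(self, path):
        start = time.time()
        try:
            stream_id = self.conn.request('GET', path)
            body = self.conn.get_response(stream_id).read()
        except Exception as e:
            events.request_failure.fire(
                request_type='GET', name=path,
                response_time=int((time.time() - start) * 1000),
                exception=e)
        else:
            events.request_success.fire(
                request_type='GET', name=path,
                response_time=int((time.time() - start) * 1000),
                response_length=len(body))


class HyperTaskSet(TaskSet):
    @task
    def index(self):
        self.client.get('/')


class HyperLocust(Locust):
    task_set = HyperTaskSet
    min_wait = 0   # no wait time between requests
    max_wait = 0
    host = 'h2test-nginx'  # Kubernetes service name, illustrative

    def __init__(self, *args, **kwargs):
        super(HyperLocust, self).__init__(*args, **kwargs)
        self.client = HyperClient(self.host)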
For testing, I spawned a GKE cluster of 5 machines (2 vCPU, 4 GB RAM each), installed Helm, and deployed the charts for these services (I can post them in a gist later if useful).
I ran Locust with min_wait=0 and max_wait=0 so that it spawned as many requests as possible, with 10 workers against a single nginx instance.
With 10 workers, 140 "clients" total, I get ~2.1k requests per second (RPS).
10 workers, 260 clients: ~2.0k RPS
10 workers, 400 clients: ~2.0k RPS
Now, I try to scale horizontally: I spawn 5 nginx instances and get:
10 workers, 140 clients: ~2.1k RPS
10 workers, 280 clients: ~2.1k RPS
20 workers, 140 clients: ~1.7k RPS
20 workers, 280 clients: ~1.9k RPS
20 workers, 400 clients: ~1.9k RPS
Resource usage is quite low, as reported by kubectl top pod (this is for 10 workers, 280 clients; nginx is not resource-limited, and the Locust workers are limited to 1 CPU per pod):
user@cloudshell:~ (project)$ kubectl top pod
NAME                           CPU(cores)   MEMORY(bytes)
h2test-nginx-cc4d4c69f-4j267   34m          68Mi
h2test-nginx-cc4d4c69f-4t6k7   27m          68Mi
h2test-nginx-cc4d4c69f-l942r   30m          69Mi
h2test-nginx-cc4d4c69f-mfxf8   32m          68Mi
h2test-nginx-cc4d4c69f-p2jgs   45m          68Mi
lt-master-5f495d866c-k9tw2     3m           26Mi
lt-worker-6d8d87d6f6-cjldn     524m         32Mi
lt-worker-6d8d87d6f6-hcchj     518m         33Mi
lt-worker-6d8d87d6f6-hnq7l     500m         33Mi
lt-worker-6d8d87d6f6-kf9lj     403m         33Mi
lt-worker-6d8d87d6f6-kh7wt     438m         33Mi
lt-worker-6d8d87d6f6-lvt6j     559m         33Mi
lt-worker-6d8d87d6f6-sxxxm     503m         34Mi
lt-worker-6d8d87d6f6-xhmbj     500m         33Mi
lt-worker-6d8d87d6f6-zbq9v     431m         32Mi
lt-worker-6d8d87d6f6-zr85c     480m         33Mi
I ran this test on GKE for easier replication, but I have gotten the same results in a private-cloud cluster.
Why does it seem not to matter how many instances of a service I spawn?
UPDATE: As suggested by the first answer, I'm adding information on the nodes and on what happens with a single Locust worker.
1 worker, 1 client: 22 RPS
1 worker, 2 clients: 45 RPS
1 worker, 4 clients: 90 RPS
1 worker, 8 clients: 174 RPS
1 worker, 16 clients: 360 RPS
1 worker, 32 clients: 490 RPS
1 worker, 40 clients: 480 RPS (this seems to be above the maximum sustainable number of clients per worker)
But above all, it seems that the root problem is that I'm at the limit of node CPU capacity:
user@cloudshell:~ (project)$ kubectl top node
NAME                                 CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
gke-sc1-default-pool-cbbb35bb-0mk4   1903m        98%       695Mi           24%
gke-sc1-default-pool-cbbb35bb-9zgl   2017m        104%      727Mi           25%
gke-sc1-default-pool-cbbb35bb-b02k   1991m        103%      854Mi           30%
gke-sc1-default-pool-cbbb35bb-mmcs   2014m        104%      776Mi           27%
gke-sc1-default-pool-cbbb35bb-t6ch   1109m        57%       743Mi           26%
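(For reference: the pool is 5 nodes × 2 vCPU = 10 vCPUs, i.e. 10000m in total, and the readings above add up to roughly 9000m, with four of the five nodes at or above 100% of their allocatable CPU, so the cluster is essentially CPU-saturated.)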