How to implement global rate limiting with Kubernetes NGINX ingress controller

Question

I am looking to implement global rate limiting for a production deployment on Azure, to ensure that my application does not become unstable under an uncontrollable volume of traffic (I am not talking about DDoS, but a large volume of legitimate traffic). Azure Web Application Firewall supports only IP-based rate limiting.

I've looked for alternatives to do this without increasing the hop count in the system. The only solution I've found is the limit_req_zone directive in NGINX. This does not give actual global rate limits, but it can be used to impose a global rate limit per pod. The following ConfigMap is mounted to the Kubernetes NGINX ingress controller to achieve this.

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-ingress-ingress-nginx-controller
  namespace: ingress-basic
data:
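  # The key "test" is a constant string, so every request shares one counter
  # in the static_string_rps zone (one counter per controller pod).
  http-snippet: |
    limit_req_zone test zone=static_string_rps:5m rate=10r/m;
  # Applied to every location; requests over the limit are rejected with 429.
  location-snippet: |
    limit_req zone=static_string_rps burst=20 nodelay;
    limit_req_status 429;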

The key test is a constant string, so all requests are counted under a single key in the static_string_rps zone, which provides a global rate limit per pod.
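To illustrate what rate=10r/m with burst=20 nodelay means for a single constant key, here is a minimal Python sketch. It is a simplified model of leaky-bucket accounting for one zone on one pod, assuming bookkeeping in thousandths of a request with integer math; it is not the NGINX implementation, and the class name LimitReqZoneModel is hypothetical.

```python
class LimitReqZoneModel:
    """Simplified, single-key model of limit_req accounting (hypothetical helper)."""

    def __init__(self, rate_per_minute: int, burst: int) -> None:
        # Everything is tracked in thousandths of a request (assumption of this model).
        self.rate_milli_per_sec = rate_per_minute * 1000 // 60
        self.burst_milli = burst * 1000
        self.excess_milli = 0
        self.last_ms = None  # no state exists until the first request is seen

    def allow(self, now_ms: int) -> bool:
        """Return True if a request arriving at now_ms is accepted (nodelay behaviour)."""
        if self.last_ms is None:
            self.last_ms = now_ms
            return True
        elapsed_ms = max(0, now_ms - self.last_ms)
        # The bucket drains at the configured rate while no requests arrive.
        drained = self.rate_milli_per_sec * elapsed_ms // 1000
        excess = max(0, self.excess_milli - drained + 1000)
        if excess > self.burst_milli:
            return False  # rejected; with limit_req_status 429 the client sees a 429
        self.excess_milli = excess
        self.last_ms = now_ms
        return True


zone = LimitReqZoneModel(rate_per_minute=10, burst=20)

# 30 requests arrive at the same instant: the first one plus the burst of 20 pass.
accepted_in_spike = sum(zone.allow(now_ms=0) for _ in range(30))
print(accepted_in_spike)  # 21

# One minute later the bucket has drained by roughly 10 requests.
accepted_after_minute = sum(zone.allow(now_ms=60_000) for _ in range(30))
print(accepted_after_minute)  # 9 in this model, because the rate is truncated to an integer
```

Because the key is constant, every client shares this one bucket, so a spike from any source consumes the same burst allowance.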

This seems like a hacky way to achieve global rate limiting. Is there a better alternative, and does the Kubernetes NGINX ingress controller officially support this approach? (Their documentation says they support mounting ConfigMaps for advanced configurations, but there is no mention of using this approach without an additional memcached pod for syncing counters between pods.)

https://www.nginx.com/blog/rate-limiting-nginx/#:~:text=One%20of%20the%20most%20useful,on%20a%20log%E2%80%91in%20form.
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#global-rate-limiting

I doubt there is any other approach apart from what is provided in the NGINX ingress docs. If you want to implement a rate limit for legitimate traffic, you need to store some information in order to reject traffic whenever it exceeds the given threshold. It can be IP, user information, or something else. So you would have to use a storage solution like Memcached. Other ingress solutions use similar approaches. — Nitishkumar Singh, Aug 11 '21 at 07:33
Yes, I agree with you. Do you think the above approach is recommended for production? It can create per-pod global rate limiting. — Kalana Dananjaya, Aug 12 '21 at 06:17
I don't have a real example of this being used, but I assume it should be production-ready. You can ask in the Kubernetes Slack channel about any production implementations in use. — Nitishkumar Singh, Aug 12 '21 at 11:45
@JakubSiemaszko according to the answers I got from the Kubernetes Slack community, anything that requires global coordination for rate limiting is likely to become a severe performance bottleneck and a single point of failure. Therefore using a solution like `memcached` for global rate limiting does not seem to be recommended (it's not mentioned in any doc, though). `limit_req_zone` is production ready, and the above approach seems to be the recommended way to achieve some sort of global rate limiting (although it's not exactly global rate limiting). — Kalana Dananjaya, Aug 18 '21 at 07:31
yes. certainly. I do have a blog post. Give me sometime to summarize it to an answer — Kalana Dananjaya, Aug 18 '21 at 10:54
@JakubSiemaszko can you accept the answer if it covers all the details? — Kalana Dananjaya, Aug 25 '21 at 07:52

score 4 · Answer 1 · answered Aug 22 '21 at 05:15

According to the Kubernetes Slack community, anything that requires global coordination for rate limiting is likely to become a severe performance bottleneck and a single point of failure. Therefore, even an external solution for syncing counters would introduce such a bottleneck, and hence it is not recommended. (However, this is not mentioned in the docs.)

According to them, using limit_req_zone is a valid approach and it is officially supported by the Kubernetes NGINX Ingress controller community, which means that it is production ready.

I suggest you use this module if you want to apply global rate limiting (although it is not exact global rate limiting). If you have multiple ingresses in your cluster, you can use the following approach to apply global rate limits per ingress.

Deploy the following ConfigMap in the namespace where your Kubernetes NGINX Ingress controller runs. This creates two zones, static_string_ingress1 and static_string_ingress2, each of which counts every request under the constant key test.

NGINX Config Map

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-ingress-ingress-nginx-controller
  namespace: ingress-basic
data:
  # One zone per ingress; the constant key "test" gives each zone a single shared counter.
  http-snippet: |
    limit_req_zone test zone=static_string_ingress1:5m rate=10r/m;
    limit_req_zone test zone=static_string_ingress2:5m rate=30r/m;

Ingress Resource 1

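# networking.k8s.io/v1 replaces extensions/v1beta1, which was removed in Kubernetes 1.22.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress-1
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/affinity: cookie
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    # Apply the zone defined in the ConfigMap to this ingress's locations.
    nginx.ingress.kubernetes.io/configuration-snippet: |
      limit_req zone=static_string_ingress1 burst=5 nodelay;
      limit_req_status 429;
spec:
  tls:
  - hosts:
    - test.com
  rules:
  - host: test.com
    http:
      paths:
      - path: /
        # pathType is required in v1; Prefix is an assumption here.
        pathType: Prefix
        backend:
          service:
            name: test-service
            port:
              number: 9443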

Similarly, you can add a separate limit to ingress resource 2 by adding the following configuration snippet to its annotations.

Ingress resource 2

annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      limit_req zone=static_string_ingress2 burst=20 nodelay;
      limit_req_status 429;

Note that the key test is a static string, so all requests passing through the relevant ingress are counted under a single key in that ingress's zone (static_string_ingress1 or static_string_ingress2), which creates the global rate limiting effect.

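However, these counts are maintained separately by each NGINX Ingress controller pod. Therefore the actual rate limit across the cluster is the defined limit multiplied by the number of NGINX Ingress controller pods (for example, 10r/m with 3 pods allows up to 30r/m in total).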
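The per-pod arithmetic can be sketched in a few lines of Python. This assumes traffic is spread roughly evenly across the controller pods; the function names are hypothetical and only illustrate the calculation.

```python
def effective_cluster_rate(per_pod_rate: int, controller_pods: int) -> int:
    """Upper bound on the aggregate rate when each pod enforces per_pod_rate on its own."""
    return per_pod_rate * controller_pods


def per_pod_rate_for_target(target_cluster_rate: int, controller_pods: int) -> int:
    """Per-pod rate to configure so the aggregate stays at or below the target.

    Rounds down, but never below 1, since a rate of 0 would not be a usable limit.
    """
    return max(1, target_cluster_rate // controller_pods)


# rate=10r/m on 3 controller pods allows up to 30 requests per minute overall.
print(effective_cluster_rate(per_pod_rate=10, controller_pods=3))  # 30

# To stay near 100r/m overall with 3 pods, configure about 33r/m per pod.
print(per_pod_rate_for_target(target_cluster_rate=100, controller_pods=3))  # 33
```

If the number of controller pods changes, for example through autoscaling, the aggregate limit changes with it, so the per-pod rate would need to be recalculated.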

I have also monitored pod memory and CPU usage while using limit_req_zone counters, and they do not cause a considerable increase in resource usage.

More information on this topic is available in this blog post I wrote: https://faun.pub/global-rate-limiting-with-kubernetes-nginx-ingress-controller-fb0453447d65

Please note that this explanation applies to the Kubernetes NGINX Ingress Controller (https://github.com/kubernetes/ingress-nginx), not to be confused with the NGINX controller for Kubernetes (https://github.com/nginxinc/kubernetes-ingress).
