I have 3 Node.js apps running on a GCP Compute Engine instance (2 vCPU, 2 GB RAM, Ubuntu 20.04) behind an Nginx reverse proxy. One of them is a Socket.IO chat server. The Socket.IO app uses @socket.io/cluster-adapter to utilize all available CPU cores.
I followed this tutorial to update the Linux settings for the maximum number of connections. Here is the output of the ulimit command:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 7856
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 500000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 7856
virtual memory          (kbytes, -v) unlimited
cat /proc/sys/fs/file-max
2097152
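In case it matters, here is a quick throwaway script (not part of the app, just a sanity check, and Linux-specific) to confirm which limits the Node processes actually inherit:
// check-limits.js: throwaway sanity check, reads the limits this Node process inherits
const fs = require("fs");

// system-wide file descriptor cap
const fileMax = fs.readFileSync("/proc/sys/fs/file-max", "utf8").trim();
console.log("fs.file-max:", fileMax);

// per-process limits as seen by this process
const limits = fs.readFileSync("/proc/self/limits", "utf8");
console.log(
  limits
    .split("\n")
    .filter((l) => l.startsWith("Max open files") || l.startsWith("Max processes"))
    .join("\n")
);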
/etc/nginx/nginx.conf
user www-data;
worker_processes auto;
worker_rlimit_nofile 65535;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
events {
        worker_connections 30000;
        # multi_accept on;
}
...
/etc/nginx/sites-available/default
...
# socket.io part
location /socket.io/ {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy false;
        proxy_pass http://localhost:3001/socket.io/;
        proxy_redirect off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
}
...
My chat server code:
const os = require("os");
const cluster = require("cluster");
const http = require("http");
const { Server } = require("socket.io");
const { setupMaster, setupWorker } = require("@socket.io/sticky");
const { createAdapter, setupPrimary } = require("@socket.io/cluster-adapter");
const { response } = require("express");
const PORT = process.env.PORT || 3001;
const numberOfCPUs = os.cpus().length || 2;
if (cluster.isPrimary) {
  const httpServer = http.createServer();
  // setup sticky sessions
  setupMaster(httpServer, {
    loadBalancingMethod: "least-connection", // either "random", "round-robin" or "least-connection"
  });
  // setup connections between the workers
  setupPrimary();
  cluster.setupPrimary({
    serialization: "advanced",
  });
  httpServer.listen(PORT);
  for (let i = 0; i < numberOfCPUs; i++) {
    cluster.fork();
  }
  cluster.on("exit", (worker) => {
    console.log(`Worker ${worker.process.pid} died`);
    cluster.fork();
  });
} 
//worker process
else {
  const express = require("express");
  const app = express();
  const Chat = require("./models/chat");
  const mongoose = require("mongoose");
  const request = require("request"); //todo remove
  var admin = require("firebase-admin");
  var serviceAccount = require("./serviceAccountKey.json");
  const httpServer = http.createServer(app);
  const io = require("socket.io")(httpServer, {
    cors: {
      origin: "*",
      methods: ["GET", "POST"],
    },
    transports: "websocket",
  });
  mongoose.connect(process.env.DB_URL, {
    authSource: "admin",
    user: process.env.DB_USERNAME,
    pass: process.env.DB_PASSWORD,
  });
  app.use(express.json());
  app.get("/", (req, res) => {
    res
      .status(200)
      .json({ status: "success", message: "Hello, I'm your chat server.." });
  });
  // use the cluster adapter
  io.adapter(createAdapter());
  // setup connection with the primary process
  setupWorker(io);
  io.on("connection", (socket) => {
    activityLog(
      "Num of connected users: " + io.engine.clientsCount + " (per CPU)"
    );
    ...
    //chat implementations
  });
}
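Not in the code above, but as a rough sketch, this is the kind of per-worker stats logging I plan to drop into the worker branch (after io is created) to see how the connections are spread across workers; the 10-second interval is arbitrary:
// sketch only: per-worker stats logging, not in my current code
setInterval(() => {
  const mem = process.memoryUsage();
  console.log(
    `[worker ${process.pid}] sockets: ${io.engine.clientsCount}, ` +
      `rss: ${(mem.rss / 1048576).toFixed(1)} MB, ` +
      `heapUsed: ${(mem.heapUsed / 1048576).toFixed(1)} MB`
  );
}, 10000);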
Load test client code:
const { io } = require("socket.io-client");
const URL = "https://myserver.com/";
const MAX_CLIENTS = 6000;
const CLIENT_CREATION_INTERVAL_IN_MS = 100;
const EMIT_INTERVAL_IN_MS = 300; //1000;
let clientCount = 0;
let lastReport = new Date().getTime();
let packetsSinceLastReport = 0;
const createClient = () => {
  const transports = ["websocket"];
  const socket = io(URL, {
    transports,
  });
  setInterval(() => {
    socket.emit("chat_event", {});
  }, EMIT_INTERVAL_IN_MS);
  socket.on("chat_event", (e) => {
    packetsSinceLastReport++;
  });
  socket.on("disconnect", (reason) => {
    console.log(`disconnect due to ${reason}`);
  });
  if (++clientCount < MAX_CLIENTS) {
    setTimeout(createClient, CLIENT_CREATION_INTERVAL_IN_MS);
  }
};
createClient();
const printReport = () => {
  const now = new Date().getTime();
  const durationSinceLastReport = (now - lastReport) / 1000;
  const packetsPerSeconds = (
    packetsSinceLastReport / durationSinceLastReport
  ).toFixed(2);
  console.log(
    `client count: ${clientCount} ; average packets received per second: ${packetsPerSeconds}`
  );
  packetsSinceLastReport = 0;
  lastReport = now;
};
setInterval(printReport, 5000);
As you can see from the code, I'm only using the websocket transport. So the server should be able to handle up to 8000 connections, as per this StackOverflow answer. But when I run the load test, the server becomes unstable after 1600 connections, CPU usage climbs to 90%, and memory usage to 70%. I couldn't find anything in the Nginx error log. How can I increase the number of connections to at least 8000? Should I upgrade the instance or change any Linux settings? Any help would be appreciated.
UPDATE: I removed everything related to clustering and ran it again as a regular single-threaded Node.js app. This time the result was a little better: 2800 stable connections (CPU usage 40%, memory usage 50%). Please note that I'm not performing any disk I/O during the test.
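For reference, the single-threaded version is just the worker code without the cluster/sticky/adapter pieces, roughly:
// single-process variant used for the second test (same chat logic, no cluster/sticky/adapter)
const http = require("http");
const express = require("express");

const app = express();
const httpServer = http.createServer(app);
const io = require("socket.io")(httpServer, {
  cors: { origin: "*", methods: ["GET", "POST"] },
  transports: ["websocket"],
});

io.on("connection", (socket) => {
  // ...same chat implementation as above
});

httpServer.listen(process.env.PORT || 3001);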