I have built a WebSockets server that acts as a chat message router (i.e. receiving messages from clients and pushing them to other clients according to a client ID).
It is a requirement that the service be able to scale to handle many millions of concurrent open socket connections, and I wish to be able to horizontally scale the server.
The architecture I have had in mind is to put the websocket server nodes behind a load balancer, which will create a problem because clients connected to different nodes won't know about each other. While both clients A and B enter via the LoadBalancer, client A might have an open connection with node 1 while client B is connected to node 2 - each node holds it's own dictionary of open socket connections.
To solve this problem, I was thinking of using some MQ system like ZeroMQ or RabbitMQ. All of the websocket server nodes will be subscribers of the MQ server, and when a node gets a request to route a message to a client which is not in the local connections dictionary, it will pub-lish a message to the MQ server, which will tell all the sub-scriber nodes to look for this client and issue the message if it's connected to that node.
Q1: Does this architecture make sense?
Q2: Is the pub-sub pattern described here really what I am looking for?