I have an AWS API Gateway set up as a public endpoint with no auth. It exposes a WebSocket that triggers a Lambda.
I was creating connections with Python's websocket-client library (https://pypi.org/project/websocket_client/).
I noticed that connections would fail ~10% of the time, and the failure rate got worse as I increased load. I can't find anything that would be throttling me, since my general API Gateway settings say "Your current account level throttling rate is 10000 requests per second with a burst of 5000 requests." Besides, just 2-3 requests per second would trigger the issue fairly often.
Meanwhile, the failure response looks like this: {u'message': u'Forbidden', u'connectionId': u'Z2Jp-dR5vHcCJkg=', u'requestId': u'Z2JqAEJRvHcFzvg='}
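To count these failures separately from timeouts or other errors, I could flag the frame on the client side. This is only a minimal sketch: the helper name `is_forbidden_frame` is hypothetical, and it assumes the error frame always carries the same three keys as the sample above.

```python
import json


def is_forbidden_frame(raw):
    """Return True if `raw` looks like the Forbidden frame shown above.

    Assumption: the error frame is a JSON object with a "message" of
    "Forbidden" plus "connectionId" and "requestId" keys.
    """
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        # Not JSON at all, so it cannot be the error frame.
        return False
    return (
        isinstance(data, dict)
        and data.get("message") == "Forbidden"
        and "connectionId" in data
        and "requestId" in data
    )
```

Calling it on the result of `ws.recv()` before parsing the payload would let the load test report Forbidden frames under their own name.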
I went into CloudWatch Logs Insights and searched for the connection ID and the request ID. The log group for the API Gateway returned no results for either ID. Yet a search on the Lambda that fires on WebSocket connect did find a log entry with that connection ID. The log showed everything running as expected on our side. The Lambda simply runs a MySQL query, and that query does run.
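For reference, the kind of search I ran can be written as a Logs Insights query. The sketch below only builds the query string; the function name `build_insights_query` and the default limit are my own choices, not anything AWS provides.

```python
def build_insights_query(identifier, limit=20):
    """Build a CloudWatch Logs Insights query that searches @message
    for a connection ID or request ID as a plain substring."""
    if '"' in identifier:
        # Assumption: IDs never contain double quotes, so no escaping is attempted.
        raise ValueError("identifier must not contain double quotes")
    return (
        "fields @timestamp, @logStream, @message\n"
        '| filter @message like "{}"\n'
        "| sort @timestamp desc\n"
        "| limit {}"
    ).format(identifier, limit)


print(build_insights_query("Z2Jp-dR5vHcCJkg="))
```

The resulting text can be pasted into the Logs Insights console with the API Gateway or Lambda log group selected.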
Why would I get a Forbidden response even though the Lambda is working as expected?
The existing question, "getting message: forbidden reply from AWS API gateway", seems to address cases where the gateway ALWAYS returns Forbidden for some private endpoints. Nothing there lined up with my use case.
UPDATE
I think this may be related to locust.io, or Python, which I'm using to connect every second. I installed wscat (https://www.npmjs.com/package/wscat) on my machine and am connecting and closing as fast as possible, repeatedly. With wscat I am not getting a Forbidden message. It's extra confusing, since I don't see how the way I connect would randomly return a Forbidden message only some of the time.
This is the Locust client code:

    import json
    import logging
    import time
    from uuid import uuid4

    import gevent
    import websocket
    from locust import Locust, TaskSet, events, task

    logger = logging.getLogger(__name__)


    class SocketClient(object):
        def __init__(self, host):
            self.host = host
            self.session_id = uuid4().hex

        def connect(self):
            self.ws = websocket.WebSocket()
            self.ws.settimeout(10)
            self.ws.connect(self.host)
            events.quitting += self.on_close
            # Send an empty payload right after connecting and wait for the reply.
            data = self.attach_session({})
            return data

        def attach_session(self, payload):
            message_id = uuid4().hex
            start_time = time.time()
            e = None
            data = None  # stays None if the send/receive below fails
            try:
                print("Sending payload {}".format(payload))
                data = self.send_with_response(payload)
                assert data['mykey']
            except AssertionError as exp:
                e = exp
            except Exception as exp:
                e = exp
                # Any other error: drop the socket and reconnect.
                self.ws.close()
                self.connect()
            elapsed = int((time.time() - start_time) * 1000)
            if e:
                events.request_failure.fire(request_type='sockjs', name='send',
                                            response_time=elapsed, exception=e)
            else:
                events.request_success.fire(request_type='sockjs', name='send',
                                            response_time=elapsed,
                                            response_length=0)
            return data

        def send_with_response(self, payload):
            json_data = json.dumps(payload)
            # Send in a greenlet, waiting up to 2 seconds for it to finish.
            g = gevent.spawn(self.ws.send, json_data)
            g.get(block=True, timeout=2)
            # Receive in a greenlet, waiting up to 10 seconds for a reply.
            g = gevent.spawn(self.ws.recv)
            result = g.get(block=True, timeout=10)
            json_data = json.loads(result)
            return json_data

        def on_close(self):
            self.ws.close()


    class ActionsTaskSet(TaskSet):
        @task
        def streams(self):
            response = self.client.connect()
            logger.info("Connect Response: {}".format(response))


    class WSUser(Locust):
        task_set = ActionsTaskSet
        min_wait = 1000
        max_wait = 3000

        def __init__(self, *args, **kwargs):
            super(WSUser, self).__init__(*args, **kwargs)
            self.client = SocketClient('wss://mydomain.amazonaws.com/endpoint')

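To compare like with like, the wscat test (connect, then close, with no message sent) could be reproduced from Python without Locust. This is a rough sketch, not what I actually ran: `connect_and_close` and its `open_connection` parameter are hypothetical names, and the default opener assumes websocket-client's `create_connection`.

```python
import time


def connect_and_close(url, attempts, open_connection=None, timeout=10):
    """Open and immediately close `attempts` connections, one after another.

    Returns one dict per attempt with the elapsed time in milliseconds and
    the exception raised, if any. No message is sent on the socket.
    """
    if open_connection is None:
        # Imported lazily so the helper can also be run with a stub opener.
        from websocket import create_connection

        def open_connection(target):
            return create_connection(target, timeout=timeout)

    results = []
    for _ in range(attempts):
        start = time.time()
        error = None
        try:
            ws = open_connection(url)
            ws.close()
        except Exception as exc:
            error = exc
        elapsed_ms = int((time.time() - start) * 1000)
        results.append({"elapsed_ms": elapsed_ms, "error": error})
    return results
```

If this loop never fails while the Locust client does, the difference would more likely lie in what the client sends after connecting than in the connection itself.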
UPDATE 2
I have enabled access logs, the one type of log that wasn't enabled before. I can now see that the requests to my Lambdas always get a 200 with no issue. The 403 comes from a MESSAGE eventType that doesn't hit an actual routeKey. I'm not sure where that message comes from, but I'm fairly sure that finding the answer will solve this.
I was also able to confirm there are no ENI issues.
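To pull the offending entries out of the access logs, something like the following could work. It is a sketch under assumptions: it expects the access log format to be JSON, and the key names `eventType`, `status`, `routeKey`, `connectionId` and `requestId` are whatever I chose when mapping `$context.eventType`, `$context.status` and so on, so they may differ in another setup. The function name is hypothetical.

```python
import json


def forbidden_message_entries(log_lines):
    """Return parsed access-log entries that are MESSAGE events answered with 403.

    Assumption: each line is a JSON object with "eventType" and "status" keys.
    Lines that are not JSON objects are skipped.
    """
    matches = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except (TypeError, ValueError):
            continue
        if not isinstance(entry, dict):
            continue
        # Status may be logged as a number or a string depending on the format.
        if entry.get("eventType") == "MESSAGE" and str(entry.get("status")) == "403":
            matches.append(entry)
    return matches
```

Grouping the returned entries by `connectionId` should show whether each 403 lines up with a connection that the Locust client had just opened.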

