
SSL protocol is shutdown #325

Open
yuvadm opened this issue Jun 25, 2020 · 9 comments

Comments

@yuvadm

yuvadm commented Jun 25, 2020

I'm running a Celery worker that uses a RabbitMQ broker over an SSL connection. From time to time we see the following error in our logs:

SSLError(1, '[SSL: PROTOCOL_IS_SHUTDOWN] protocol is shutdown (_ssl.c:2162)')

It is raised from this line:

https://github.com/celery/py-amqp/blob/master/amqp/transport.py#L430

My best understanding is that amqp is attempting to write to a socket that has previously been closed. Is it possible to know which side terminated the socket? Can the error be handled more gracefully in this case, or is there any way to recover from it?
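For application code that just needs the operation to survive a dropped connection, one possible mitigation (a sketch, not the author's setup; the broker URL and queue name are hypothetical) is to let kombu's Connection.ensure() reconnect and retry the call when a recoverable connection error interrupts it:

```python
from kombu import Connection, Queue

# Hypothetical broker URL and queue for illustration only.
BROKER_URL = 'amqp://user:password@rabbitmq.example.com:5671//'
queue = Queue('tasks')

with Connection(BROKER_URL, ssl=True) as conn:
    producer = conn.Producer()

    # ensure() returns a wrapper around publish() that revives the producer,
    # reconnects, and retries when the transport raises one of its recoverable
    # connection errors, instead of letting the error bubble up to the caller.
    publish = conn.ensure(producer, producer.publish, max_retries=3)
    publish({'hello': 'world'}, routing_key=queue.name, declare=[queue])
```

Whether this particular SSLError is treated as recoverable depends on the transport's error lists, so this addresses the symptom rather than answering which side closed the socket.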

@thedrow
Member

thedrow commented Jun 28, 2020

Can you please attach the RabbitMQ logs from the time when the exception happened?

@yuvadm
Author

yuvadm commented Jun 28, 2020

Here are some RabbitMQ logs surrounding the exact time we got an SSLError:

2020-06-28 13:04:29.152 [warning] <0.37.186> closing AMQP connection <0.37.186> (10.154.0.40:44566 -> 10.54.3.7:5671, vhost: 'our-pod-name', user: 'our-pod-name'):
client unexpectedly closed TCP connection
2020-06-28 13:04:29.188 [warning] <0.742.186> closing AMQP connection <0.742.186> (10.154.0.42:44640 -> 10.54.3.7:5671, vhost: 'our-pod-name', user: 'our-pod-name'):
client unexpectedly closed TCP connection

####### SSLError appeared in the Celery logs at 13:04:45 #########

2020-06-28 13:05:32.387 [info] <0.10410.186> accepting AMQP connection <0.10410.186> (10.154.0.87:43848 -> 10.54.3.7:5671)
2020-06-28 13:05:32.407 [info] <0.10410.186> connection <0.10410.186> (10.154.0.87:43848 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.533 [info] <0.10423.186> accepting AMQP connection <0.10423.186> (10.154.0.87:43850 -> 10.54.3.7:5671)
2020-06-28 13:05:35.552 [info] <0.10423.186> connection <0.10423.186> (10.154.0.87:43850 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.660 [info] <0.10456.186> accepting AMQP connection <0.10456.186> (10.154.0.40:43852 -> 10.54.3.7:5671)
2020-06-28 13:05:35.680 [info] <0.10456.186> connection <0.10456.186> (10.154.0.40:43852 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.771 [info] <0.10465.186> accepting AMQP connection <0.10465.186> (10.154.0.56:58578 -> 10.54.3.7:5671)

@yuvadm
Author

yuvadm commented Jun 28, 2020

Some more details on our setup: the Celery workers connect to RabbitMQ on a separate k8s cluster, with a NAT instance sitting in the middle.

Our current working theory is that the NAT instance is doing something to some of the connections, and Celery/Kombu/pyamqp seems to be angry about it.

We'll continue to debug the exact root cause, but we're curious whether the Celery stack can be made more graceful about these dropped connections.
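If the NAT instance is timing out idle flows, one mitigation (a sketch under that assumption, using standard Celery settings; the broker URL and values are illustrative, not recommendations) is to keep the AMQP connection active with heartbeats and let the worker keep retrying broker connections:

```python
from celery import Celery

# Hypothetical broker URL for illustration only.
app = Celery('tasks', broker='amqp://user:password@rabbitmq.example.com:5671//')

app.conf.update(
    broker_use_ssl=True,                 # TLS to the broker, as in the setup described above
    broker_heartbeat=30,                 # AMQP heartbeats keep idle NAT mappings from expiring
    broker_connection_retry=True,        # re-establish the broker connection when it drops
    broker_connection_max_retries=None,  # keep retrying instead of giving up after the default 100 attempts
)
```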

@thedrow
Member

thedrow commented Jun 29, 2020

Since I don't really know what this error is and I can't find a description of it anywhere, I'm not sure how to help.

@thedrow
Member

thedrow commented Jul 1, 2020

This kind of error is raised when the SSL wrapper tries to write to a closed TCP connection. It is usually related to connection problems, firewalls or SSL interception.

I asked on StackOverflow. That's the answer I got.
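A small illustration of that point (not py-amqp code): the failure arrives as ssl.SSLError, which in Python 3 is a subclass of OSError, so connection-level handlers that already catch OSError for plain TCP drops can be extended to cover it. There is no dedicated exception subclass, so matching on the reason string is an assumption used here for illustration.

```python
import ssl

# ssl.SSLError is an OSError subclass in Python 3.
assert issubclass(ssl.SSLError, OSError)

def is_tls_shutdown(exc: BaseException) -> bool:
    """Heuristic check for the error reported in this issue (matches on the
    reason string, since no dedicated exception subclass exists)."""
    return isinstance(exc, ssl.SSLError) and 'PROTOCOL_IS_SHUTDOWN' in str(exc)

print(is_tls_shutdown(ssl.SSLError(1, '[SSL: PROTOCOL_IS_SHUTDOWN] protocol is shutdown')))
```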

@yuvadm
Author

yuvadm commented Jul 1, 2020

@thedrow Thanks for helping us debug this error!

We're still trying to debug where our connections are being dropped and fix the root cause.

Eventually it would be nice to fix this behavior to handle drops more gracefully, and I'd be happy to help propose a patch for that.
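For reference, a minimal sketch of the kind of patch being discussed (simplified, not the actual amqp/transport.py code): catch the SSL-layer error at the write site and re-raise it as an ordinary connection error, so the existing reconnect and retry machinery can treat it like any other dropped connection.

```python
import ssl


class SSLTransport:
    """Simplified stand-in for py-amqp's SSL transport, for illustration only."""

    def __init__(self, sock: ssl.SSLSocket):
        self.sock = sock

    def _write(self, s: bytes) -> None:
        """Write the buffer out to the SSL socket completely."""
        write = self.sock.write
        while s:
            try:
                n = write(s)
            except ssl.SSLError as exc:
                # The TLS session has already been shut down (for example the
                # peer, or a NAT box in the middle, dropped the TCP connection).
                # Surface it as a plain connection error so callers that already
                # reconnect on OSError/ConnectionError handle it uniformly.
                raise ConnectionError(f'SSL connection closed: {exc!r}') from exc
            if not n:
                raise IOError('Socket closed')
            s = s[n:]
```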

@Kristin0

@yuvadm did you find something?

@yuvadm
Author

yuvadm commented Feb 13, 2022

@Kristin0 Didn't really continue working on this issue, sorry

@thedrow
Member

thedrow commented Feb 17, 2022

@Kristin0 If you have this problem, please share all the details as well.
