openstack: rabbitmq timeouts causing services to be slow/unresponsive/buggy
RabbitMQ was found to be the cause of the original problem, thus the updated title/description.
Timeline of events (all times CEST):
- 2021-09-19 17:38: #20 (closed) happened
- 2021-09-19 ~18:30-20:00: OpenStack is upgraded as part of #13 (closed) and !19 (merged) as it was an opportunity to do it as services were down anyway and there was no pressure to bring them back up
- 2021-09-20 - 2021-09-21: not much work is done on OpenStack and thus we didn't see anything wrong
- 2021-09-22: we start seeing errors on VM creating and created this issue
- 2021-09-23 01:34: volumes are not being properly detached/re-attached to instances when managed by Cinder CSI. Upgrading Cinder CSI doesn't help. Some logs report timeouts with RabbitMQ. Deleting and re-creating the RabbitMQ cluster doesn't help. Re-creating volumes fixes the issue and we go to sleep.
- 2021-09-23 ~18:00:
terraform apply
inovercloud/terraform/
results in Designate erroring with504 Gateway Time-out
. Again, logs show timeouts with RabbitMQ. - 2021-09-24 ~22:00 - ~01:40: Ceph is upgraded to 40Gbit/s (4x10Gbit/s), which shouldn't help as monitoring wasn't showing network pressure, and as expected, it didn't. (This maintenance was scheduled to be done before next week anyway)
- 2021-09-24 03:05: after some testing, Designate still errors with
Timed out waiting for a reply to message ID [...]
(full logs below) - 2021-09-24 ~03:14: RabbitMQ is deployed again with
rabbitmq_enable_healthchecks: "no"
and Designate is restarted. No more errors. - 2021-09-24 ~03:15: all services are being restarted
- 2021-09:24 ~04:00: the all clear is given.
Logs from services (in this case designate, but we were seing errors coming from a bunch of places):
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware Traceback (most recent call last):
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self._queues[msg_id].get(block=True, timeout=timeout)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line 322, in get
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return waiter.wait()
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/queue.py", line 141, in wait
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return get_hub().switch()
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.greenlet.switch()
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware _queue.Empty
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware During handling of the above exception, another exception occurred:
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware Traceback (most recent call last):
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/designate/api/middleware.py", line 238, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return request.get_response(self.application)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1313, in send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware status, headers, app_iter = self.call_application(
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1278, in call_application
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware app_iter = application(self.environ, start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 129, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware resp = self.call_func(req, *args, **kw)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 193, in call_func
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.func(req, *args, **kwargs)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/designate/api/middleware.py", line 320, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return request.get_response(self.application)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1313, in send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware status, headers, app_iter = self.call_application(
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1278, in call_application
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware app_iter = application(self.environ, start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 129, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware resp = self.call_func(req, *args, **kw)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 193, in call_func
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.func(req, *args, **kwargs)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/keystonemiddleware/auth_token/__init__.py", line 341, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware response = req.get_response(self._app)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1313, in send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware status, headers, app_iter = self.call_application(
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1278, in call_application
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware app_iter = application(self.environ, start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 129, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware resp = self.call_func(req, *args, **kw)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 193, in call_func
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.func(req, *args, **kwargs)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_middleware/base.py", line 124, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware response = req.get_response(self.application)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1313, in send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware status, headers, app_iter = self.call_application(
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1278, in call_application
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware app_iter = application(self.environ, start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 129, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware resp = self.call_func(req, *args, **kw)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 193, in call_func
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.func(req, *args, **kwargs)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_middleware/base.py", line 124, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware response = req.get_response(self.application)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1313, in send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware status, headers, app_iter = self.call_application(
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1278, in call_application
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware app_iter = application(self.environ, start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 129, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware resp = self.call_func(req, *args, **kw)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/dec.py", line 193, in call_func
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.func(req, *args, **kwargs)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/designate/api/middleware.py", line 226, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return request.get_response(self.application)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1313, in send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware status, headers, app_iter = self.call_application(
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/webob/request.py", line 1278, in call_application
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware app_iter = application(self.environ, start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/pecan/middleware/recursive.py", line 56, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.application(environ, start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/pecan/middleware/errordocument.py", line 75, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware app_iter = self.app(environ, replacement_start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/pecan/core.py", line 840, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return super(Pecan, self).__call__(environ, start_response)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/pecan/core.py", line 683, in __call__
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware self.invoke_controller(controller, args, kwargs, state)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/pecan/core.py", line 574, in invoke_controller
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware result = controller(*args, **kwargs)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/designate/utils.py", line 216, in wrapper
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return f(*args, **kwargs)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/designate/api/v2/controllers/zones/recordsets.py", line 52, in get_all
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware recordsets = common.retrieve_matched_rrsets(context, self, zone_id,
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/designate/api/v2/controllers/common.py", line 23, in retrieve_matched_rrsets
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware controller_obj.central_api.get_zone(context, zone_id)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/designate/central/rpcapi.py", line 149, in get_zone
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.client.call(context, 'get_zone', zone_id=zone_id)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", line 509, in call
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self.prepare().call(ctxt, method, **kwargs)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", line 175, in call
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware self.transport._send(self.target, msg_ctxt, msg,
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/transport.py", line 123, in _send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self._driver.send(target, ctxt, message,
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware return self._send(target, ctxt, message, wait_for_reply, timeout,
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware result = self._waiter.wait(msg_id, timeout,
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware message = self.waiters.get(msg_id, timeout=timeout)
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware raise oslo_messaging.MessagingTimeout(
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID de8afcd630ea4c17a4061b183d797d83
2021-09-24 01:05:18.016 20 ERROR designate.api.middleware
2021-09-24 01:05:18.019 20 INFO eventlet.wsgi.server [req-8845c9f9-c988-4bbd-b071-166b5b6f07b0 b867d59eb3f34e2eab684b74e2ba7e87 28d6433dd9db4e7298c84b1058ea825a - - -] 192.168.210.230,192.168.210.105 "GET /v2/zones/d2c5ce5e-d24c-49fe-8665-0b70f24b26ed/recordsets HTTP/1.1" status: 504 len: 277 time: 60.0090218
2021-09-24 01:05:18.927 20 ERROR designate.api.middleware [req-8845c9f9-c988-4bbd-b071-166b5b6f07b0 b867d59eb3f34e2eab684b74e2ba7e87 28d6433dd9db4e7298c84b1058ea825a - - -] Timed out waiting for a reply to message ID a6b6e0a05c0243e0aa434d792b370240: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a6b6e0a05c0243e0aa434d792b370240
Root cause
!19 (merged) didn't introduce much changes, so we thought we would get away with it. Plus, there were some nice improvements, such as additional docker healthchecks for a bunch of services. In particular, there was this commit, which introduced the bug we were hitting. Disabling docker healthchecks for RabbitMQ resolved the issue. No further investigation has been made and an issue will be opened upstream.
Original description
Creating a VM ends up in the VM stuck in Scheduling
state. Deleting it when in that state works, but re-creating it gives the same result.
Deleting a VM sometimes works, sometimes it stays stuck as Deleting
.
There is nothing obvious in Nova logs.