Ticket #691 (defect)
Opened 1 year ago
Last modified 1 year ago
Repeated Ctrl-C hangs wsgiserver
Status: closed (fixed)
| Reported by: | jlowery | Assigned to: | fumanchu |
|---|---|---|---|
| Priority: | normal | Milestone: | 3.0 |
| Component: | CherryPy code | Keywords: | |
| Cc: |
Ever since I have gotten cherrypy3, I've had problems with the cherrypy process hanging in the shell after a CTRL-C is sent to process. My only resort would be to send a SIGTERM or SIGKILL.
I was looking around in the code and came up with this:
When CTRL-C is sent to to the cherrypy process, each connection thread with an active connection waits until until a request does not return a body or close_connection on the request is set to True (wsgiserver/__init__.py:611). The forever loop here implements HTTP/1.1 and Keep-Alive. with the close_connection attribute on the HTTPRequest.
Therefore, there is a noticable time lag between when a KeyboardInterrupt? is sent and all of the worker threads are able to process their SHUTDOWN request (because conn.communicate is blocking) wsgiserver/__init__.py:693.
While the loop is spinning on the Connection's communicate method, sending another CTRL-C to the process causes the thread to abort and the callstack bubble up out of the thread join. At this point, the only recourse is to manually SIGTERM.
It would appear that since the shutdown requests are put in a worker queue, there currently isn't any way to give the HTTPConnection a flag to say stop processing requests.
I suppose that the best thing to do would allow the KeyBoardInterrupts? to not lock the process during the shutdown process.
A fix I came up with is adding KeyboardInterrupt? to the thread join loop in wsgiserver/__init__.py near line 986
Go from
while self._workerThreads: worker = self._workerThreads.pop() if worker is not current and worker.isAlive: try: worker.join() except AssertionError: pass
to
while self._workerThreads: worker = self._workerThreads.pop() if worker is not current and worker.isAlive: try: worker.join() except (AssertionError, KeyboardInterrupt): pass
This at least allows the process to shutdown gracefully after multiple CTRL-C's.
To reproduce this situation (test system Ubuntu Feisty):
test file:
#broke.py import cherrypy class Root(object): @cherrypy.expose def index(self): return 'hi' cherrypy.quickstart(Root()) $python broke.py
<open localhost:8080 in web browser to activate a connection> tap CTRL-C twice.
However, I wish that I could figure out how to interrupt the connection loop when a shutdown is occuring, since Keep-Alive could keep the server in the shutdown state for an arbitrarily long time while multiple requests from the same client.
Attachments
Change History
06/06/07 13:47:49: Modified by fumanchu
- status changed from new to assigned.
06/06/07 13:52:17: Modified by jlowery
Well, I tested this out. .close() doesn't work, but shutdown() does. I'm still learning the threading internals of cherrypy, so my solution here may have thread-related side-effects. I also think this fix is kind of kludgy. Maybe you know a better way to fit it into the design.
All of these changes are in wsgiserver/__init__.py
Modify the WorkerThread? class thusly (minus my debug print lines of course):
class WorkerThread(threading.Thread): """Thread which continuously polls a Queue for Connection objects. server: the HTTP Server which spawned this thread, and which owns the Queue and is placing active connections into it. ready: a simple flag for the calling server to know when this thread has begun polling the Queue. Due to the timing issues of polling a Queue, a WorkerThread does not check its own 'ready' flag after it has started. To stop the thread, it is necessary to stick a _SHUTDOWNREQUEST object onto the Queue (one for each running WorkerThread). """ conn = False def __init__(self, server): self.ready = False self.server = server threading.Thread.__init__(self) def run(self): try: self.ready = True while True: conn = self.conn = self.server.requests.get() #print 'handling request on thread %s with conn %s' % (self, conn) if conn is _SHUTDOWNREQUEST: return try: #print 'thread %s communicating' % self conn.communicate() #print 'thread %s end communicate' % self finally: conn.close() except (KeyboardInterrupt, SystemExit), exc: self.server.interrupt = exc
In the CherrypyWSGIServer:stop method, change:
# Must shut down threads here so the code that calls # this method can know when all threads are stopped. for worker in self._workerThreads: self.requests.put(_SHUTDOWNREQUEST)
to
# Must shut down threads here so the code that calls # this method can know when all threads are stopped. for worker in self._workerThreads: self.requests.put(_SHUTDOWNREQUEST) if worker.conn and not worker.conn.rfile.closed: worker.conn.rfile._sock.shutdown(_socket.SHUT_RDWR)
also need to import _socket in the top imports.
One side effect of this is that it is possible for the socket calls under the communicate() method to throw a bad file descriptor exception. I was able to generate one, but it did not hang the process and everything shut down properly. (Just got an exception occured in WorkerThread? message).
In effect, this kills the worker thread dead in its tracks.
Hitting CTRL-C stops the process instantly, although this should probably be tested on win32 as winsock can be a PITA sometimes.
06/06/07 13:52:57: Modified by fumanchu
I can test on win32...I'll try to fold this into test_states so we can test on multiple platforms more easily, and get the regression benefits.
We should give the deployer a "server.shutdown_timeout" config entry--setting it to zero makes SIGTERM or Ctrl-C stop the process instantly, setting it to a positive float gives existing connections N seconds to complete before being forcibly shut down. That should allow a reasonable way to shutdown gracefully--if you need immediate forced shutdown, there's always SIGKILL.
06/06/07 13:53:31: Modified by jlowery
Makes sense. For the timeout mechanism. Do you think the best way would be to write a Shutdown thread that is launched in stop() that time.sleep() before killing all of the sockets?
06/06/07 13:56:09: Modified by fumanchu
It looks like we can just call worker.join(timeout=self.shutdown_timeout):
"When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call isAlive() to decide whether a timeout happened."
current = threading.currentThread() while self._workerThreads: worker = self._workerThreads.pop() if worker is not current and worker.isAlive: try: worker.join(self.shutdown_timeout) if worker.isAlive: # We exhausted the timeout. Forcibly shut down the socket. c = worker.conn if c and not c.rfile.closed: c.rfile._sock.shutdown(_socket.SHUT_RDWR) # except (AssertionError, KeyboardInterrupt): except AssertionError: pass
06/06/07 14:27:38: Modified by Jeremy Lowery
Ok. I gave your timeout to join solution a try and it works well. Suppose can just disregard the shutdown thread solution.
06/06/07 14:33:36: Modified by Jeremy Lowery
Just tested and on Ubuntu if you don't have the KeyboardInterrupt? except, the process still hangs on a non-zero shutdown_time with multiple CTRL-C's.
06/06/07 14:58:59: Modified by lawouach
Would that work?
current = threading.currentThread() while self._workerThreads: self.requests.put(_SHUTDOWNREQUEST) worker = self._workerThreads.pop() if worker is not current and worker.isAlive: try: worker.join() except AssertionError: pass
Basically the idea would be to force the normal processing of shutting down of each worker thread.
As in: http://www.cherrypy.org/browser/trunk/cherrypy/wsgiserver/__init__.py#L687
06/06/07 15:11:56: Modified by fumanchu
The _SHUTDOWNREQUEST's are done first so that idle threads don't pick up new requests while other threads are being shut down. Moving those queue.put calls would make shutdown take a lot longer under heavy load.
06/06/07 15:16:57: Modified by fumanchu
> if you don't have the !KeyboardInterrupt except, > the process still hangs on a non-zero shutdown_time > with multiple CTRL-C's.
Right; we need both. Can you produce a unified diff of all the changes so far, and attach it to this ticket? I'm still working on the test case.
06/06/07 15:20:59: Modified by Jeremy Lowery
I believe that the _sock.shutdown call would be needed to force an abort on a workerThread that is currently handling a client.
A softer technique that could be used would be to modify the HTTPRequest and change close_connection to a property that always returns true when the server is in shutdown state (probably by setting a global flag or threading the state through the worker).
This way, the communicate loop will finish up the last HTTP request that it was served before it closes the connection.
The only exception here would be for long lived HTTP requests. I believe streaming is an example of where this wouldn't would.
06/06/07 15:33:08: Modified by Jeremy Lowery
Here is a svn diff from the head. I don't see anywhere in the interface for me to attach a file:
Index: cherrypy/wsgiserver/init.py =================================================================== --- cherrypy/wsgiserver/init.py (revision 1658) +++ cherrypy/wsgiserver/init.py (working copy) @@ -47,6 +47,7 @@
quoted_slash = re.compile("(?i)%2F") import rfc822 import socket +import _socket try:
import cStringIO as StringIO
except ImportError?: @@ -673,7 +674,8 @@
it is necessary to stick a _SHUTDOWNREQUEST object onto the Queue (one for each running WorkerThread?). """ - + conn = None + def init(self, server):
self.ready = False self.server = server @@ -683,7 +685,7 @@ try:
self.ready = True while True: - conn = self.server.requests.get() + conn = self.conn = self.server.requests.get()
if conn is _SHUTDOWNREQUEST:
return @@ -766,7 +768,7 @@
ssl_private_key = None def init(self, bind_addr, wsgi_app, numthreads=10, server_name=None, - max=-1, request_queue_size=5, timeout=10): + max=-1, request_queue_size=5, timeout=10, shutdown_timeout=3):
self.requests = Queue.Queue(max) if callable(wsgi_app): @@ -790,6 +792,7 @@ self._workerThreads = [] self.timeout = timeout + self.shutdown_timeout = shutdown_timeout
def start(self):
"""Run the server forever.""" @@ -968,12 +971,20 @@ # Don't join currentThread (when stop is called inside a request). current = threading.currentThread() + + while self._workerThreads:
worker = self._workerThreads.pop() if worker is not current and worker.isAlive:
try: - worker.join() - except AssertionError?: + worker.join(self.shutdown_timeout) + if worker.isAlive: + # We exhausted the timeout. + # Forcibly shut down the socket. + c = worker.conn + if c and not c.rfile.closed: + c.rfile._sock.shutdown(_socket.SHUT_RDWR) + except (AssertionError?, KeyboardInterrupt?):
pass
def populate_ssl_environ(self):
06/06/07 15:34:41: Modified by Jeremy Lowery
Ok, that didn't work well. get it here.
06/06/07 15:46:21: Modified by Jeremy Lowery
Here is a diff to allow configuration of the parameter:
06/06/07 15:47:16: Modified by fumanchu
- attachment multictrlc.patch added.
06/06/07 17:33:16: Modified by fumanchu
- status changed from assigned to closed.
- resolution set to fixed.
Fixed in [1660]. I had to add some special logic for SSL. I also added a join() even after the join(timeout), because the spec says stop MUST block until all threads are shut down. I also allowed a shutdown_timeout of None to mean "wait forever; don't shutdown sockets".


Excellent work! I've been aware of this for some time and just haven't had the time to really nail it down. Thanks for the patch!
Re: "interrupt the connection loop"...I think you're talking about the "while" loop inside HTTPConnection.communicate? That usually waits on rfile.readline() inside parse_request, and *that* is waiting on socket._fileobject._sock.recv(). I wonder what happens if we call rfile._sock.close() in the main thread while another thread is waiting on recv()?