Ticket #853 (defect)
Opened 10 months ago
Last modified 1 month ago
consistent time out of the WSGI server
Status: assigned
| Reported by: | lawouach | Assigned to: | fumanchu (accepted) |
|---|---|---|---|
| Priority: | high | Milestone: | 3.2 |
| Component: | wsgiserver | Keywords: | |
| Cc: | davidf@sjsoft.com, ged@openhex.org | | |
It appears that the fix for #810 has the side effect of producing consistent and rather unexpected 408 responses caused by timeouts. I wonder whether the fix is the wrong one.
Change History
09/21/08 16:33:38: Modified by stefan_betz@gmx.net
09/27/08 22:56:31: Modified by fumanchu
I can't see how the code before the fix would have done anything in the same situation except drop the conn, which hardly seems better. IMO the new code simply exposes flaws a little more. Perhaps we should increase the default socket_timeout? Apache waits 300 seconds by default (although that's during a request; its KeepAliveTimeout, the time allowed between requests, is only 5 seconds).
10/03/08 07:24:09: Modified by lawouach
I can't see why either, to be honest. That being said, I do not use CP behind any webserver and I had constant timeout errors with httplib2. Commenting out the lines in wsgiserver fixed the behavior.
11/24/08 12:17:50: Modified by davidf@sjsoft.com
- cc set to davidf@sjsoft.com.
We've had exactly this problem on multiple sites running Windows and CherryPy's wsgi server. Reverting r1971 effectively fixed this (it was consistently reproducible before). At the moment we're having to deploy our own patched build of CherryPy to avoid it...
11/26/08 07:51:03: Modified by lawouach
Robert, I've been through the code of the socket module [1] and it seems the timeout state is reported by the polling sub-system (poll or select). However, it isn't clear what can trigger such an error in the underlying stack.
A few notes however:
- I don't recall any declared or suggested bugs of a time out nature prior to rev 1971.
- Line 135 of the wsgiserver indicates we wanted to silence the "timed out" error (that line has been there for a long time).
Now to be fair I can understand the point of #810 but it appears the fix might not be the proper one and we need to revert it.
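To illustrate the note above about the timeout being reported by the polling layer: a Python socket with a timeout waits for readiness before the read and raises socket.timeout if nothing arrives in time. The following is only a minimal sketch, assuming Python 3 and a local socketpair standing in for a client connection; recv_or_timeout is a hypothetical helper and not wsgiserver code.

```python
import socket


def recv_or_timeout(sock, timeout):
    """Read up to 1024 bytes; return None if nothing arrives within `timeout` seconds.

    Hypothetical helper for illustration only (not part of wsgiserver).
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(1024)
    except socket.timeout:
        # This is the "timed out" condition discussed in this ticket.
        return None


# A connected pair of sockets stands in for server and client ends.
server_side, client_side = socket.socketpair()
try:
    # The client sends nothing: the read times out.
    idle_result = recv_or_timeout(server_side, 0.05)

    # The client starts a request: the read returns the data.
    client_side.sendall(b"GET / HTTP/1.1\r\n")
    busy_result = recv_or_timeout(server_side, 0.05)
finally:
    server_side.close()
    client_side.close()
```

The exception alone does not say whether the peer was idle between requests or stalled mid-request; the caller has to track that itself.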
11/27/08 23:34:03: Modified by fumanchu
Agreed. AFAICT Apache doesn't ever emit 408 either; it appears to just drop the conn. Feel free to revert [1971] (but try to replace the test with the new behavior).
02/03/09 06:43:14: Modified by guest
I'm not sure whether it is the same problem, but on CP3.1 and IE6 on the client side, I *systematically* get a 408/409 error code (while there is absolutely no load on the server) *if* I try to submit a form without a value for all the input fields. There is no log on the server about the error. No problem for non "form" pages, no problem on firefox, nor with CP3.0.x. FWIW, here is the exact message I get:
Due to current high demand, the page you are looking for cannot be delivered right now.
HTTP Error 408 / 409 - Not acceptable / Resource conflict
02/03/09 06:44:06: Modified by guest
- cc changed from davidf@sjsoft.com to davidf@sjsoft.com, ged@openhex.org.
05/12/09 15:52:15: Modified by fumanchu
- status changed from new to assigned.
Fixed in the python3 branch in [2272]. Needs backport to trunk.
A "normal" persistent connection should be something like this:
Connect -> [request, response] -> [request, response] -> [request, response] -> close
The source of this ticket is that the wsgiserver will potentially raise a 408 at any one of those "->" points (actually any point until the request headers are read).
But imagine the server receives a request, sends a huge response, and calls readline() to wait for the next request. It's possible that the client might wait to close the conn until after it reads and parses that huge response. It's even possible that the server will write out the 408 while the client is reading the previous response, in which case the 408 output might be appended to the client's read. Bad news.
The server should therefore not raise 408 between requests. It should only raise 408 if either 1) no request has been started at all, or 2) we're in the middle of a request; that is, some data has been received. And of course, even in (2) it should not respond with 408 if the current response has already been started.
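The rule above can be sketched as a small decision function. This is only an illustration of the logic described in this comment, not the code committed in [2272]; the function name, its three parameters, and the two return constants are all hypothetical.

```python
SEND_408 = "send 408"
CLOSE_QUIETLY = "close quietly"


def on_read_timeout(requests_completed, request_bytes_received, response_started):
    """Decide what to do when a read on a connection times out.

    All parameters are hypothetical per-connection bookkeeping values:
      requests_completed      -- requests fully served on this connection so far
      request_bytes_received  -- bytes read of the request currently in progress
      response_started        -- True if the current response is already being sent
    """
    if response_started:
        # Output is already on the wire; appending a 408 would corrupt it.
        return CLOSE_QUIETLY
    if request_bytes_received > 0:
        # Case 2: the client stalled in the middle of a request.
        return SEND_408
    if requests_completed == 0:
        # Case 1: the client connected but never started a request.
        return SEND_408
    # Idle persistent connection between requests: drop it without a 408.
    return CLOSE_QUIETLY
```

For example, a keep-alive connection that has served two requests and then goes quiet gets `on_read_timeout(2, 0, False)`, which returns `CLOSE_QUIETLY`, so nothing can be appended to a response the client is still reading.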
05/27/09 16:02:56: Modified by fumanchu
Some questions asked of the http wg at http://lists.w3.org/Archives/Public/ietf-http-wg/2009AprJun/0409.html


Works for me with Apache 2.2 on a Debian Etch machine! Is this a problem in the keep-alive handling of the Python socket module?
Regards, Betz Stefan