Ticket #511 (enhancement)
Opened 2 years ago
Last modified 2 years ago
CherryPy has problems handling 8-bit ascii
Status: closed (wontfix)
| Reported by: | jan@bitmine.se | Assigned to: | fumanchu |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | CherryPy code | Keywords: | latin-1 8-bit ascii |
| Cc: | | | |
It might just be my unfamiliarity with CherryPy, but couldn't the line
    chunk = str(chunk)
in _cpwsgi.py:96 instead be written as
    chunk = chunk.encode("latin-1")
This would let people use 8-bit "extended ASCII" such as Latin-1 without resorting to Unicode in languages where 8 bits are normally enough to represent every character.
As it stands, the following exception is raised whenever I try to return characters from our native Swedish character set, and I don't see the point of switching to Unicode just to cover this case.
Traceback (most recent call last):
  File "c:\Python24\lib\site-packages\cherrypy\_cpwsgi.py", line 96, in wsgiApp
    chunk = str(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 997: ordinal not in range(128)
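A minimal reproduction of why the traceback happens, written for Python 3 (an assumption; the ticket is about Python 2.4). In Python 2, str() on a unicode object implicitly used the default 'ascii' codec, which is what the explicit .encode("ascii") call does here. The sample word is just an illustrative Swedish string.

```python
# Illustrative Swedish text containing a-umlaut, o-umlaut and a-ring.
text = "r\u00e4ksm\u00f6rg\u00e5s"

# Python 2's str(u"...") behaved like an explicit ASCII encode.
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc.reason  # the "ordinal not in range(128)" part of the message

# Latin-1 maps each of these characters to exactly one byte.
latin1_bytes = text.encode("latin-1")

print("ascii:", ascii_error)
print("latin-1:", latin1_bytes)
```

Every code point below 256 has a one-byte Latin-1 form, which is why the reporter's proposed encode("latin-1") succeeds where the ASCII default fails.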
Change History
04/14/06 20:55:21: Modified by fumanchu
- owner changed from rdelon to fumanchu.
- status changed from new to assigned.
04/16/06 15:24:18: Modified by jan@bitmine.se
Very good. I'm glad you recognize this as a problem.
05/02/06 18:34:27: Modified by fumanchu
- status changed from assigned to closed.
- resolution set to wontfix.
Well, I've had another look and I can't find any way to do this if the chunks are already of type 'str'. CP needs to be able to emit strings with arbitrary encodings; it's up to each developer to choose the correct encoding and set the matching "charset" value in their "Content-Type" header. If the chunks are unicode, then we can default to "ISO-8859-1" as the HTTP spec requires. That fix is in [1090].
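A small sketch, assuming Python 3 (where Python 2's 'str' corresponds to bytes), of why the server cannot re-encode chunks that are already 'str': the same text produces different bytes under different charsets, and only the declared charset tells the client how to decode them. encode_body is a hypothetical helper for illustration, not part of CherryPy.

```python
def encode_body(text, charset):
    """Hypothetical helper (not a CherryPy API): encode a response body and
    return a Content-Type header whose charset matches the bytes produced."""
    body = text.encode(charset)
    header = ("Content-Type", "text/html; charset=%s" % charset)
    return body, header

page = "<p>r\u00e4ksm\u00f6rg\u00e5s</p>"
latin1_body, latin1_header = encode_body(page, "ISO-8859-1")
utf8_body, utf8_header = encode_body(page, "utf-8")

# Same text, different bytes: once encoded, the bytes alone do not say
# which charset was used, so the header must carry that information.
print(latin1_header, len(latin1_body))
print(utf8_header, len(utf8_body))
```

Decoding the UTF-8 bytes as Latin-1 would silently produce mojibake rather than an error, which is why the choice of encoding and the "charset" parameter have to come from the same place: the application.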
12/09/06 12:58:20: Modified by fumanchu
2.x Fix in [1483].


We should probably say "ISO-8859-1" to avoid confusing Windows users ;) I agree that that is a saner default than ASCII (and is required by the HTTP spec). It can't quite be as simple as "chunk.encode()", since "chunk" may already be a str object, encoded in some other charset. So it would need a little isinstance() logic.
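The isinstance() logic described here might look roughly like the following sketch. It assumes Python 3 naming (Python 2's str is bytes, unicode is str); to_bytes and iter_body are hypothetical names, and raising TypeError for other types is a design choice of this sketch, not CherryPy's documented behavior.

```python
def to_bytes(chunk, default_charset="ISO-8859-1"):
    """Hypothetical sketch: normalize one response chunk to bytes.

    bytes chunks are assumed to be encoded already by the application (in
    whatever charset its Content-Type declares) and pass through untouched;
    text chunks are encoded with the default charset."""
    if isinstance(chunk, bytes):
        return chunk
    if isinstance(chunk, str):
        return chunk.encode(default_charset)
    raise TypeError("response chunks must be bytes or str, not %s"
                    % type(chunk).__name__)

def iter_body(chunks, default_charset="ISO-8859-1"):
    """Yield every chunk of a response body as bytes."""
    for chunk in chunks:
        yield to_bytes(chunk, default_charset)

print(list(iter_body([b"\xc3\xa5", "\u00e5"])))
```

Note that a text chunk containing a character outside ISO-8859-1 (for example the euro sign) would still raise UnicodeEncodeError under this default; such an application would need to pick an encoding like UTF-8 and declare it in its Content-Type header.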