
Ticket #511 (enhancement)

Opened 2 years ago

Last modified 2 years ago

CherryPy has problems handling 8-bit ascii

Status: closed (wontfix)

Reported by: jan@bitmine.se
Assigned to: fumanchu
Priority: normal
Milestone:
Component: CherryPy code
Keywords: latin-1 8-bit ascii
Cc:

It might be just me being unfamiliar with CherryPy, but couldn't the line

chunk = str(chunk)

in _cpwsgi.py:96 instead be written like

chunk = chunk.encode("latin-1") ?

This would let people return 8-bit (Latin-1) characters without resorting to Unicode, for languages where 8 bits are normally sufficient to represent every character.

As it stands today, it raises the following exception when I try to return characters from our native Swedish character set, and I really don't see the point of switching to Unicode just to cover this case.

Traceback (most recent call last):
  File "c:\Python24\lib\site-packages\cherrypy\_cpwsgi.py", line 96, in wsgiApp
    chunk = str(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 997: ordinal not in range(128)
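
For reference, the failure can be reproduced outside CherryPy. The snippet below is only a sketch in Python 3, where the implicit ASCII encoding that Python 2's str() performed on a unicode object has to be spelled out as an explicit encode("ascii") call. The variable names are made up for illustration.

```python
# Python 3 sketch. In the Python 2 code from the traceback, str(chunk) on a
# unicode object implicitly encoded it with the 'ascii' codec.
text = "\xe5"  # U+00E5, the character named in the traceback

try:
    text.encode("ascii")   # what str(chunk) effectively attempted
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False       # U+00E5 is outside range(128)

# Latin-1 (ISO-8859-1) maps U+00E5 to the single byte 0xE5.
latin1_bytes = text.encode("latin-1")
```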

Change History

04/14/06 20:55:21: Modified by fumanchu

  • owner changed from rdelon to fumanchu.
  • status changed from new to assigned.

We should probably say "ISO-8859-1" to avoid confusing Windows users ;) I agree that that is a saner default than ASCII (and is required by the HTTP spec). It can't quite be as simple as "chunk.encode()", since "chunk" may already be a str object, encoded in some other charset. So it would need a little isinstance() logic.
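
The isinstance() logic mentioned here might look roughly like the following. This is a minimal sketch, not the code from CherryPy: the helper name encode_chunk and its default_charset parameter are hypothetical, and it is written for Python 3, where the Python 2 unicode type corresponds to str and the Python 2 str type corresponds to bytes.

```python
def encode_chunk(chunk, default_charset="ISO-8859-1"):
    """Hypothetical helper: return a response chunk as bytes.

    Already-encoded chunks are passed through untouched, because their
    charset is unknown; only text chunks get the default encoding.
    """
    if isinstance(chunk, bytes):
        # May be UTF-8, Latin-1, or anything else; do not re-encode.
        return chunk
    if isinstance(chunk, str):
        return chunk.encode(default_charset)
    raise TypeError("chunk must be str or bytes, got %r" % type(chunk))
```

With this sketch, encode_chunk("\xe5") yields the single Latin-1 byte 0xE5, while a chunk that was already encoded as UTF-8 comes back unchanged.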

04/16/06 15:24:18: Modified by jan@bitmine.se

Very good. I'm glad you recognize this as a problem.

05/02/06 18:34:27: Modified by fumanchu

  • status changed from assigned to closed.
  • resolution set to wontfix.

Well, I've had another look and I can't figure out any way to do this if the chunks are already of type 'str'. CP needs to be able to emit strings with arbitrary encodings; it's up to each developer to choose the correct encoding and set the correct "charset" value in their "Content-Type" header. If the chunks are unicode, then we can default to "ISO-8859-1" as the spec requires. That fix is in [1090].
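
On the application side, "choose the correct encoding and set the correct charset" could be sketched as below. The function make_response and the text/plain media type are assumptions made for illustration; this is plain Python 3, not CherryPy's API.

```python
def make_response(text, charset):
    """Hypothetical helper: encode the body explicitly and declare the
    same charset in the Content-Type header."""
    body = text.encode(charset)
    headers = [
        ("Content-Type", "text/plain; charset=%s" % charset),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


# The same character costs one byte in ISO-8859-1 and two in UTF-8, so the
# declared charset has to match the bytes that are actually sent.
latin1_headers, latin1_body = make_response("\xe5", "ISO-8859-1")
utf8_headers, utf8_body = make_response("\xe5", "utf-8")
```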

12/09/06 12:58:20: Modified by fumanchu

The fix for 2.x is in [1483].
