#8903 defect closed duplicate (duplicate)

IRCClient fails to parse non-UTF8 messages under Python 3

Reported by: mark williams Owned by:
Priority: normal Milestone: Twisted 16.6
Component: core Keywords:
Cc: Branch:
Author:

Description (last modified by mark williams)

Words' IRC client fails internally under Python 3 (#6320) because it attempts to decode all incoming data as UTF-8. While the protocol itself appears to be ASCII, RFC 1459 does not specify an encoding for the contents of PRIVMSGs or channel topics. I ran a naive survey of the 10 most popular networks, as found here:

http://irc.netsplit.de/networks/top100.php

I found that ~10% of PRIVMSGs and channel topics were in Latin-1 (according to chardet, at least). The code for my survey is here:

https://github.com/markrwilliams/encodingcollector/

(Note that running it is incredibly impolite and will get you banned from most networks.)

Here's what the problem looks like in practice. The following script prints all the channels on a popular Italian IRC server:

from twisted.internet import defer, endpoints, task
from twisted.words.protocols import irc


class ListBot(irc.IRCClient):
    nickname = "nothing"

    def __init__(self, done):
        self._done = done

    def signedOn(self):
        self.sendLine("LIST")

    def irc_RPL_LIST(self, prefix, params):
        print(prefix, params)

    def irc_RPL_LISTEND(self, prefix, params):
        self.transport.loseConnection()
        self._done.callback(None)


@task.react
def main(reactor):
    transport = endpoints.clientFromString(reactor, 'tcp:irc.chlame.net:6667')
    end = defer.Deferred()
    start = endpoints.connectProtocol(transport, ListBot(end))
    start.addCallback(lambda _: end)
    return start

Under Python 2, it will print successfully and then terminate, while under Python 3, it will encounter the following exception and hang:

Unhandled Error
Traceback (most recent call last):
  File "/path/to/twisted/src/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/path/to/twisted/src/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/path/to/twisted/src/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/path/to/twisted/src/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/path/to/twisted/src/twisted/internet/posixbase.py", line 597, in _doReadOrWrite
    why = selectable.doRead()
  File "/path/to/twisted/src/twisted/internet/tcp.py", line 208, in doRead
    return self._dataReceived(data)
  File "/path/to/twisted/src/twisted/internet/tcp.py", line 214, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/path/to/twisted/src/twisted/internet/endpoints.py", line 116, in dataReceived
    return self._wrappedProtocol.dataReceived(data)
  File "/path/to/twisted/src/twisted/words/protocols/irc.py", line 2631, in dataReceived
    basic.LineReceiver.dataReceived(self, data)
  File "/path/to/twisted/src/twisted/protocols/basic.py", line 571, in dataReceived
    why = self.lineReceived(line)
  File "/path/to/twisted/src/twisted/words/protocols/irc.py", line 2637, in lineReceived
    line = line.decode("utf-8")
builtins.UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 51: invalid continuation byte
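
The decoding failure at the bottom of the traceback can be reproduced without a network connection. The snippet below is a minimal sketch: the sample line is made up for illustration (it is not the data the server actually sent) and simply contains the Latin-1 byte 0xe8 followed by a space, which is not a valid UTF-8 sequence.

```python
# Hypothetical sample line: "perchè no" encoded as Latin-1, so "è" is the
# single byte 0xe8. This is NOT the real payload from the server above.
line = b"#canale 5 :perch\xe8 no"

error = None
try:
    # Same call that irc.py makes in lineReceived, per the traceback.
    line.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

# The offending byte and the reason reported by the codec.
print(hex(line[error.start]), error.reason)
```

In UTF-8, 0xe8 announces a three-byte sequence, so the decoder expects a continuation byte next and rejects the space that follows.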

A user cannot register an errback to handle this because the exception occurs as a result of a read event. They also can't write code that doesn't encounter this exception without monkeypatching Twisted.
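
For illustration only, here is one way a line could be decoded without ever raising: try UTF-8 first and fall back to Latin-1. The helper name `decode_irc_line` and its parameters are hypothetical. This is not Twisted's API and not necessarily how the issue should be fixed, just a sketch of the kind of tolerance the client currently lacks.

```python
def decode_irc_line(line, encodings=("utf-8",), fallback="latin-1"):
    """Decode a raw IRC line, never raising UnicodeDecodeError.

    Hypothetical helper, not part of Twisted. Each encoding in
    ``encodings`` is tried in order; if all of them fail, the line is
    decoded with ``fallback``.
    """
    for encoding in encodings:
        try:
            return line.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte value to a code point, so this cannot fail,
    # although the result may be wrong for other legacy encodings.
    return line.decode(fallback)


# Made-up sample inputs: the same text in Latin-1 and in UTF-8.
print(decode_irc_line(b"perch\xe8 no"))
print(decode_irc_line("perch\u00e8 no".encode("utf-8")))
```

Both calls yield the same text. The trade-off is that a line in some other legacy encoding would be silently mis-decoded rather than rejected.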

I believe this is a serious regression and the IRC port deserves more attention.

Change History (2)

comment:1 Changed 13 months ago by mark williams

Description: modified (diff)

comment:2 Changed 13 months ago by mark williams

Resolution: duplicate
Status: new → closed

#6320 is reopened. Closing this.
