Opened 22 months ago

Last modified 22 months ago

#6083 defect new

TLS writeSequence() behaves differently from the vanilla TCP one

Reported by: hynek Owned by:
Priority: lowest Milestone:
Component: core Keywords:
Cc: Branch:
Author: Launchpad Bug:

Description (last modified by exarkun)

I'm not 100% sure which part here you consider a bug, but it definitely is rather unfortunate how it's right now:

If transport.writeSequence() is called with a generator on a vanilla TCP transport, it silently does nothing. The reason is that the iterable is checked for unicode strings first in a for-loop. When it tries to reuse the iterable for actually sending data, it’s exhausted so nothing is written: http://twistedmatrix.com/trac/browser/tags/releases/twisted-12.2.0/twisted/internet/abstract.py#L340

OTOH the TLS version doesn't check for unicode but does a simple ''.join(iovec) instead, therefor it works perfectly: http://twistedmatrix.com/trac/browser/tags/releases/twisted-12.2.0/twisted/protocols/tls.py#L532

The situation means that if you write your code for TLS and start testing for vanilla, your protocols freeze and you have no idea why.

My preferred solution would be to teach writeSequence() generators (if generator: iovec = list(iovec)) but I suppose that would be too invasive. So a check for iterators together with an exception would be fine.

Also please note, that TLS doesn't check for unicode, so that's a bug on its self.

Opinions?

Change History (17)

comment:1 Changed 22 months ago by exarkun

  • Milestone Twisted-12.3 deleted

Release milestones only for regressions.

comment:2 Changed 22 months ago by exarkun

Note the documentation for writeSequence says that it accepts lists, not iterators.

comment:3 Changed 22 months ago by hynek

Note that the summary of the bug is “TLS writeSequence() behaves differently from the vanilla TCP one” and not “writeSequence() doesn’t support generators”.

While I would it prefer to do so, I really care just about them behaving the same and not one silently failing while the other works fine.

comment:4 Changed 22 months ago by exarkun

Note that the summary of the bug is “TLS writeSequence() behaves differently from the vanilla TCP one” and not “writeSequence() doesn’t support generators”.

Yes, I did read the ticket. Then I added some relevant information to it.

While I would it prefer to do so, I really care just about them behaving the same and not one silently failing while the other works fine.

You'll probably find lots of places where different implementations of a particular interface have different behavior particularly when considering mis-uses of those interfaces.

I did not and am not saying there is nothing that should change here, but if you expect consistency in general for incorrect uses of interfaces, you may be in for disappointment.

comment:5 Changed 22 months ago by hynek

Dear JP,

I appreciate your concern for the adequacy of my world view. Rest assured, that I’m long enough in software to know that it usually sucks. But not long enough to become cynical and not wanting to do something about it. :)

As you might have guessed by now, I’m happy to help fixing it (for as many writeSequences as possible) given you provide enlightened guidance which approach you favor.

Cheerio,
Hynek

comment:6 Changed 22 months ago by exarkun

  • Description modified (diff)

comment:7 Changed 22 months ago by exarkun

Please leave your attitude at home.

My preferred resolutions to this ticket, in order of decreasing preference:

  1. Don't worry about it. Use interfaces in the documented way.
  2. Provide a helper for defining writeSequence implementations inefficiently (in other words, like the current implementations are defined) in terms of write. Use it in the various places Twisted currently has redundant writeSequence implementations.
  3. Add type checks requiring list arguments to the existing writeSequence implementations. Perhaps with some code to convert anything else iterable to a list and emit a deprecation warning.

Thanks.

comment:8 Changed 22 months ago by glyph

Hynek,

It's worth noting that the whole point of writeSequence is to have a fast path to call writev or sendmsg. Therefore supporting arbitrary iterables isn't as much of an optimization: if we're going to dynamically construct an arbitrary-length buffer in Python by calling next an arbitrary number of times, then there's little appreciable advantage over the caller doing for y in z: x.write(y). (The "".join implementation is just a stopgap to make the interface work, not a good implementation.)

That said, comprehensible error messages are always nice, so feel free to submit a patch.

comment:9 follow-up: Changed 22 months ago by hynek

JP,

what you mistook for attitude was my try to use humor to cope with your unprovoked unfriendliness & belittling. Don’t worry, won’t happen again, I’ll choose my battles more wisely.


Glyph,

given JP’s order of preference, I’d rather not too.

What has got a bit lost in this discussion though is the fact, that TLS’s writeSequence() doesn’t check for unicode strings like the vanilla one does. This could be seen a bug in itself and fixing it would be the simplest viable solution to the whole dilemma of two common protocols behaving differently in subtle ways.

Also, the docs for writeSequence say:


Currently, this is a convenience method roughly equivalent to:

for chunk in iovec:
       fd.write(chunk)

Which is rather misleading given the real behavior. Yes I know it says “roughly”, but we’re not in a court room, I see no harm in being nice to users. (it doesn’t mention lists at all in the case of t.i.a.FileDescriptor btw)

comment:10 in reply to: ↑ 9 ; follow-up: Changed 22 months ago by glyph

Replying to hynek:

JP,

what you mistook for attitude was my try to use humor to cope with your unprovoked unfriendliness & belittling. Don’t worry, won’t happen again, I’ll choose my battles more wisely.

I don't see JP being unfriendly or belittling; I see him pointing out a specific technical point (the documentation and the declared interface indicate a specific type, you're filing a bug on what happens with a different type), and you responding with bizarre commentary about him denigrating the "adequacy of your worldview", which did not help the discussion. The comment about your attitude was perhaps also unhelpful but I think it's fairly clear that you got the personal dimension of this discussion started - exarkun's contribution was just a direct factual comment which you appear to have read something more into.

So, please cool it with the personal comments, Hynek, we're all just trying to help.

Glyph,

given JP’s order of preference, I’d rather not too.

Rather not what?

What has got a bit lost in this discussion though is the fact, that TLS’s writeSequence() doesn’t check for unicode strings like the vanilla one does. This could be seen a bug in itself and fixing it would be the simplest viable solution to the whole dilemma of two common protocols behaving differently in subtle ways.

I don't think that this has been "lost" at all - that appears to be the topic of the bug, and much of the discussion has centered around it.

Also, the docs for writeSequence say: 'Currently, this is a convenience method roughly equivalent to: "for chunk in iovec: fd.write(chunk)"'

Which is rather misleading given the real behavior. Yes I know it says “roughly”, but we’re not in a court room, I see no harm in being nice to users.

I agree that it is a little misleading, and perhaps should be more heavily qualified (passing the right types, etc).

(it doesn’t mention lists at all in the case of t.i.a.FileDescriptor btw)

By "in the case of" I assume you mean in the docstring for twisted.internet.abstract.FileDescriptor as opposed to the docstring for twisted.internet.interfaces.ITransport.writeSequence - in this case I think there's a distressing amount of overlap, since abstract.FileDescriptor.writeSequence really ought to be little more than a reference to the canonical source of truth; a Implementation of L{ITransport.writeSequence} or somesuch.

comment:11 Changed 22 months ago by glyph

There are a number of problems which have been pointed out so far:

  1. The behavior of writeSequence varies by implementation when various implementations are presented with invalid input; invalid input in this case includes both non-list types and lists which include unicode.
  2. Some of the variants of behavior do not present useful debugging information to determine why they are behaving strangely; they appear to silently drop data.
  3. Some of the documentation for this method - specifically, the docstring for FileDescriptor.writeSequence, implies that you can pass an iterable. The language describing the interface is insufficiently formal to counteract these informal implications. For example, ITransport.writeSequence lacks a @param declaration describing its parameter, and it doesn't say L{list}, it just says "list", which could be taken to mean the english word and not necessarily the python data type.

As to the first issue, which is the closest to the current summary of the ticket, I don't think we should necessarily do anything. Different behavior is fine for different implementations.

The second and third issues may be worth fixing though, and as part of them, unifying the error-checking behavior across the 30(!) implementations of writeSequence within Twisted may be a worthwhile task, to ensure that they all provide consistent and useful error messages. Undefined behavior is crummy, especially when a user might have certain expectations (it's certainly conventional for many APIs which take lists to also take generators and work fine with both).

However, there are also reasons not to do anything, or at least, not to do anything right now.

  1. writeSequence was designed as an API to allow for a particular type of optimization. We don't have any benchmarks to indicate that optimization is worthwhile, so the API is languishing somewhat. Perhaps we should focus on actually making it do something useful before doing a bunch of maintenance work.
  2. Since the only use of writeSequence is for performance (otherwise you'd just call write yourself), formally requiring that implementations must do a specific kind of type-checking on their arguments seems counterproductive. Now, perhaps if some ctypes or cffi or C code needs to copy the list into a struct msghdr, it can do type checking along with that, but as much as possible it should not be hamstrung in the fast path by needing to a bunch of checks for invalid input.

So for the time being it seems to me like the most pressing and practical issue would be to improve the documentation rather than to start changing the implementation(s) around. If it is super, crystal clear that you absolutely must always pass a L{list} of L{bytes}es, then it will be a bit less likely that people will bump into that problem, and the documentation should be fixed regardless of whatever other error-behavior improvements are done.

comment:12 in reply to: ↑ 10 Changed 22 months ago by hynek

Replying to glyph:

what you mistook for attitude was my try to use humor to cope with your unprovoked unfriendliness & belittling. Don’t worry, won’t happen again, I’ll choose my battles more wisely.

I don't see JP being unfriendly or belittling; I see him pointing out a specific technical point (the documentation and the declared interface indicate a specific type, you're filing a bug on what happens with a different type), and you responding with bizarre commentary about him denigrating the "adequacy of your worldview", which did not help the discussion. The comment about your attitude was perhaps also unhelpful but I think it's fairly clear that you got the personal dimension of this discussion started - exarkun's contribution was just a direct factual comment which you appear to have read something more into.

So, please cool it with the personal comments, Hynek, we're all just trying to help.

My commentary wasn’t anymore bizarre than telling me “may be in for disappointment […that software is sometimes inconsistent].” and I don’t see how that helped either. I honestly tried to keep it light with humor but got a lesson on my attitude. Yeah, I arrived at “but he started it”-whining and I’m sorry; I would have probably ignored it if it were for the first time.

Switching gears.

given JP’s order of preference, I’d rather not too.

Rather not what?

Write a patch that does type checking + explicit error messages. After all it was his least favorite solution.

What has got a bit lost in this discussion though is the fact, that TLS’s writeSequence() doesn’t check for unicode strings like the vanilla one does. This could be seen a bug in itself and fixing it would be the simplest viable solution to the whole dilemma of two common protocols behaving differently in subtle ways.

I don't think that this has been "lost" at all - that appears to be the topic of the bug, and much of the discussion has centered around it.

ISTM that the discussion is around the effect of the fact. Not the problem itself, that you can feed unicode to a method you’re not supposed to. That’s what I wanted to point out.

(it doesn’t mention lists at all in the case of t.i.a.FileDescriptor btw)

By "in the case of" I assume you mean in the docstring for twisted.internet.abstract.FileDescriptor as opposed to the docstring for twisted.internet.interfaces.ITransport.writeSequence - in this case I think there's a distressing amount of overlap, since abstract.FileDescriptor.writeSequence really ought to be little more than a reference to the canonical source of truth; a Implementation of L{ITransport.writeSequence} or some such.

Indeed.

comment:13 follow-up: Changed 22 months ago by hynek

(Preamble: the drama that based on a confluence of misunderstandings over the years has been resolved in a personal talk initiated by JP and is henceforth thankfully resolved.)

I’m sorry I keep bringing this up, but I’d still like to point again at the unicode check:

I went through all writeSequence() implementations and only four contained a check for unicode. The rest either delegated or just ''.join(iovec)’ed:

internet._pollingfile._PollableWritePipe
internet.abstract.FileHandle
internet.abstract.FileDescriptor
internet.iocpreactor.abstract.FileHandle

In the case of _PollableWritePipe, the check is repeated in write(), thus it’s superfluous anyway.

My proposal would be to rip the check from these three implementation which would both kind of solve the problem and would be consistent with the claim that writeSequence() has its roots in performance optimization and be in harmony with your second point. Or would you dislike the fact that something that isn’t really supported would work?

NB I don’t mean to extend writeSequence() to support generators per se, we can perfectly add the doc tweaks Glyph mentioned.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 22 months ago by glyph

  • Priority changed from normal to lowest

Replying to hynek:

(Preamble: the drama that based on a confluence of misunderstandings over the years has been resolved in a personal talk initiated by JP and is henceforth thankfully resolved.)

Glad to hear it.

I’m sorry I keep bringing this up, but I’d still like to point again at the unicode check:

I went through all writeSequence() implementations and only four contained a check for unicode. The rest either delegated or just ''.join(iovec)’ed:

internet._pollingfile._PollableWritePipe
internet.abstract.FileHandle
internet.abstract.FileDescriptor
internet.iocpreactor.abstract.FileHandle

In the case of _PollableWritePipe, the check is repeated in write(), thus it’s superfluous anyway.

My proposal would be to rip the check from these three implementation which would both kind of solve the problem and would be consistent with the claim that writeSequence() has its roots in performance optimization and be in harmony with your second point. Or would you dislike the fact that something that isn’t really supported would work?

The optimization that it is designed to support is to call something that supports writing from multiple memory segments without copying them, so this is somewhat moot with respect to its true purpose. I just don't want to commit to supporting this type of behavior; the problem is, if we go to the trouble to establish it in every implementation, it makes it hard to get away from.

On the other hand, unicode input would still cause an exception, just with slightly different timing, so, maybe that's fine.

NB I don’t mean to extend writeSequence() to support generators per se, we can perfectly add the doc tweaks Glyph mentioned.

Perhaps we can file a separate ticket for improving the documentation and get that addressed since it would be good to explain the intended behavior before trying to polish all possible unintended behaviors.

comment:15 in reply to: ↑ 14 Changed 22 months ago by hynek

Replying to glyph:

I went through all writeSequence() implementations and only four contained a check for unicode. The rest either delegated or just ''.join(iovec)’ed:

internet._pollingfile._PollableWritePipe
internet.abstract.FileHandle
internet.abstract.FileDescriptor
internet.iocpreactor.abstract.FileHandle

In the case of _PollableWritePipe, the check is repeated in write(), thus it’s superfluous anyway.

My proposal would be to rip the check from these three implementation which would both kind of solve the problem and would be consistent with the claim that writeSequence() has its roots in performance optimization and be in harmony with your second point. Or would you dislike the fact that something that isn’t really supported would work?

The optimization that it is designed to support is to call something that supports writing from multiple memory segments without copying them, so this is somewhat moot with respect to its true purpose.

Yeah I get that. I just find it funny to optimize for writev and add an O(N) check. :)

I just don't want to commit to supporting this type of behavior; the problem is, if we go to the trouble to establish it in every implementation, it makes it hard to get away from.

Agreed. So last attempt, then I leave you alone: unfortunately a generic writeSequence() mixin isn’t feasible as while they all do just a self.X.write(''.join(iovec)), the X is very volatile. OTOH the test wouldn’t be O(N) at all, as we could check the result of the join for being unicode since concatenating str with unicode always yields unicode. It could a be simple helper function that either raises TypeError or returns the string back. The resulting code would be something along of self.X.write(_assertString(''.join(iovec))) which is also pretty idiomatic.

On the other hand, unicode input would still cause an exception, just with slightly different timing, so, maybe that's fine.

Yes, that’s the circle my mind is running. ;)

NB I don’t mean to extend writeSequence() to support generators per se, we can perfectly add the doc tweaks Glyph mentioned.

Perhaps we can file a separate ticket for improving the documentation and get that addressed since it would be good to explain the intended behavior before trying to polish all possible unintended behaviors.

Done.

comment:16 follow-up: Changed 22 months ago by exarkun

Agreed. So last attempt, then I leave you alone: unfortunately a generic writeSequence() mixin isn’t feasible as while they all do just a self.X.write(.join(iovec)), the X is very volatile.

This is mistaken. While they do not all do the same thing, some of them do the same thing. Some of them can be refactored to avoid duplicating code.

OTOH the test wouldn’t be O(N) at all, as we could check the result of the join for being unicode since concatenating str with unicode always yields unicode. It could a be simple helper function that either raises TypeError or returns the string back. The resulting code would be something along of self.X.write(_assertString(.join(iovec))) which is also pretty idiomatic.

What would be the benefit of this? Are there write implementations that don't check for unicode already?

comment:17 in reply to: ↑ 16 Changed 22 months ago by hynek

Replying to exarkun:

Agreed. So last attempt, then I leave you alone: unfortunately a generic writeSequence() mixin isn’t feasible as while they all do just a self.X.write(.join(iovec)), the X is very volatile.

This is mistaken. While they do not all do the same thing, some of them do the same thing. Some of them can be refactored to avoid duplicating code.

I had a look, some numbers:

out of 34 implementations:

self.write(''.join(iovec)) = 10
self.X.write(''.join(iovec)) = 4 + 1 (self.q.put)
delegation = 5
delegation w/ check = 1

the rest is more complicated.

Thus, a simple mixin along of:

class DumbWriteSequenceMixin:
    def writeSequence(self, iovec):
        self.write(''.join(iovec))

would save us 10 implementations. I suspect most of those self.X.write()’s could be be solve by refactoring them to delegation.

What do you think?

OTOH the test wouldn’t be O(N) at all, as we could check the result of the join for being unicode since concatenating str with unicode always yields unicode. It could a be simple helper function that either raises TypeError or returns the string back. The resulting code would be something along of self.X.write(_assertString(.join(iovec))) which is also pretty idiomatic.

What would be the benefit of this? Are there write implementations that don't check for unicode already?

Consistency. I’d prefer to remove the unicode checks from writeSequence()altogether as said before; but I see Glyph’s point that he doesn’t want people to use generators which would implicitly work. While I think that it wouldn’t really hurt. :)

Note: See TracTickets for help on using tickets.