Opened 13 years ago

Closed 11 years ago

#1768 enhancement closed invalid (invalid)

Synchronous Task Queue

Reported by: edsuom Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: spiv, matt@…, moof, Wilfredo Sánchez Vega, scmikes Branch:
Author:

Description

Attached is a module that implements a synchronous task queue with priority scheduling, graceful shutdown, and single-thread operation. I found it necessary for my work with SQLAlchemy via SQLite. The main interface for using this from the Twisted event loop is the deferToQueue method.

Usage example:

class Cruncher():
    def __init__(self):
        self.s = SynchronousQueue(thisHandyAttr=thisValue,
                                  thatHandyAttr=thatValue, ...)

    def uglyBlockingMethod(self, x):
        return self.thisHandyAttr + verySlowResult(x)

    def crunch(self, number, numberToCrunchEventually, numberToCrunchASAP):
        """Deferred firing order due to priority queueing: d1, d4, d3, d2"""
        d1 = self.s.deferToQueue(uglyBlockingFunction, 2*number)
        d2 = self.s.deferToQueue(uglyBlockingFunction,
                            numberToCrunchEventually, niceness=5)
        d3 = self.s.deferToQueue(uglyBlockingFunction, number)
        d4 = self.s.deferToQueue(uglyBlockingFunction,
                                 numberToCrunchASAP, doNext=True)
        return DeferredList([d1,d2,d3,d4])

I'd like to see this included in Twisted, and of course would be willing to license it accordingly. The code is well documented and tested; see the accompanying unit test module.

Attachments (5)

syncbridge.py (14.1 KB) - added by edsuom 13 years ago.
Syncbridge Module
syncbridge.2.py (9.6 KB) - added by edsuom 13 years ago.
Syncbridge Unit Tests
taskqueue.py (29.5 KB) - added by edsuom 13 years ago.
The taskqueue module proposed for incorporation into twisted.internet.task
taskqueue.2.py (29.5 KB) - added by edsuom 13 years ago.
The taskqueue module proposed for incorporation into twisted.internet.task
test_taskqueue.py (19.7 KB) - added by edsuom 13 years ago.
Unit tests for the taskqueue module.

Download all attachments as: .zip

Change History (29)

comment:1 Changed 13 years ago by spiv

Cc: spiv added

I don't see an attachment, or else I'd give this the "review" keyword :)

comment:2 Changed 13 years ago by spiv

Keywords: review added

Oops, I spoke too soon!

comment:3 Changed 13 years ago by edsuom

Cc: matt@… added

I have done a lot of work on this over the past couple of days, as many of you know from #twisted. I think I've finally got all the bugs worked out (a dangerous claim to make, I know) and a pretty complete test suite is passing.

I'm attaching updated module and test code, but be sure to diff it against the SVN trunk versions of syncbridge.py and its unit test before committing it.

I'm pretty proud of this and hope it gets incorporated into the Twisted code base. Bridging the chasm from Twisted to synchronous code is, as I've recently learned, a non-trivial undertaking.

Thanks to Matt Goodall for some valuable suggestions.

comment:4 Changed 13 years ago by Glyph

Resolution: invalid
Status: newclosed

Twisted is MIT licensed, sorry. Both those files come with GPL license annotations, which are unacceptable.

comment:5 Changed 13 years ago by Glyph

Keywords: review removed

comment:6 Changed 13 years ago by Glyph

I've received a few queries saying that closing this ticket was "too harsh" - I didn't mean it to be harsh, just a statement of fact. This code states it's GPL and that contravenes Twisted policy; we can't release under the BSD license if code is included.

If you want to reopen it if you would like to relicense; I'll probably have more comments in that review.

comment:7 Changed 13 years ago by edsuom

Keywords: review added
Resolution: invalid
Status: closedreopened

OK, I'd really like to see this included in Twisted, so I went ahead and changed the license for the code and its unit test module.

The newly posted versions are from SVN trunk, rev 72, which have some stability improvements over what was originally made part of the ticket. It's been thoroughly tested at this point and I'm pretty happy with it. (But of course, see the nice disclaimer in the MIT license...)

Changed 13 years ago by edsuom

Attachment: syncbridge.py added

Syncbridge Module

Changed 13 years ago by edsuom

Attachment: syncbridge.2.py added

Syncbridge Unit Tests

comment:8 Changed 13 years ago by Stephen Thorne

I'm willing to review this in a positive form, but I don't know exactly where these files are supposed to go. Exactly where in the tree should they be put?

comment:9 Changed 13 years ago by Stephen Thorne

Status: reopenednew

Glyph: can you review this?

comment:10 Changed 13 years ago by Glyph

Keywords: review removed
Owner: changed from Glyph to edsuom

I did promise a more extensive review. Here are some issues:

  • Documentation:
    • No docstring: InvalidMethodError
    • No docstring: ShutdownError
    • shutdown's terminatorFunction argument is undocumented.
    • serial classmethod.
    • JOIN_WAIT is unused and undocumented
    • DEBUG is undocumented - although perhaps it should be removed?
    • No example of use in the module docstring. Something like what's in the ticket description might be good.
    • EigenThread doesn't override anything, so it's not clear to me why it's a new type. Does it need to be a new type?
    • the get/del/setattr methods are all undocumented and it's also not clear to me why you need to do that.
    • None of the tests explain what they're testing.
  • Errors and Exceptions
    • I don't see any negative-path tests; i.e. none of the tests verify that illegal calls are properly rejected. This is especially important in error-sensitive code like threading code.
    • The exceptions don't provide any additional information; i.e. the errors in deferToQueue don't tell you about func or args. If you commit an invalid call it might be hard to tell what was being called.
    • What happens when you forget to call shutdown? Getting the timing right between calls running in threads and reactor shutdown is a major problem with the existing threadpool (which is why the reactor only exposes one), causing lockups at shutdown (amont other things).
  • Twisted conventions
    • Coding standard
      • decorator syntax is not allowed. We currently support 2.3, and I haven't run any other 2.3 compat tests on this code; might want to check that. (Even when we stop supporting 2.3, the new decorator syntax is mostly unnecessary; you can use decorators just fine without confusing new syntax.)
      • __slots__ creates some interoperability problems and is discouraged. It is used only four times throughout the codebase, only twice outside of tests; and probably some of even those uses should be removed or changed.
    • Existing Code
      • I don't understand why TLS is being used here; isn't just a dictionary on an object associated with a thread enough?
      • twisted.python.context already implements thread-local storage for Twisted. Should it be using that?
      • It seems like the functionality here is almost identical to creating a twisted.python.threadpool with both a minthreads and a maxthreads of 1. Could this be implemented as a smaller patch against threadpool instead of a stand-alone class?

Overall impressions

While at first blush this looks like a relatively robust piece of code, it doesn't seem to fit right within Twisted. I don't see the advantage: Twisted already has some thread queuing stuff, but it doesn't work like you want; but then, you already wrote syncbridge and it seems to work well in the case where it was developed, e.g. for sAsync. I don't see any new Twisted code using this any time soon, so it's an isolated API. Isolated APIs are bad because they don't see real-world usage; almost everything in twisted.python or twisted.internet was developed in support of something else, and has been used extensively within an application inside Twisted itself. Thread interaction code has a higher maintenance overhead than anything else in Twisted so I am hesitant to add more. I know sAsync already uses this in some real-world situations, but unless you're committed to putting all maintenance into Twisted and not slowly diverging as you maintain it, I would be concerned about its future.

Most importantly I don't see where this module fits into the narrative of Twisted's documentation. Most things that one would want to use syncbridge for are the sorts of things that Twisted implicitly discourages you from doing in the first place. It would be good to have a One Right Way to do sync/async handoffs like this, but then, I also suspect that new users of Twisted will experience the anguish of wishing that there were an easier way to make their code "magically non-blocking", then scurry off to a module that purports to do just that: only to discover that actually it doesn't, and blame their program's buggy and non-deterministic behavior on syncbridge (or, more likely, Twisted as a whole) when the real problem is that they didn't make it all the way through the transition to event-driven programming.

Although I'm concerned, I'm also not entirely opposed. For example, if you wanted to take over maintenance of the ailing twisted.enterprise, and syncbridge were your preferred way of doing thread queueing (modulo the other issues I've mentioned on this ticket) I don't think it would be a bad idea to merge; t.e's thread interaction isn't the greatest as it is, and certain databases (Oracle, for example) would like muuuch tighter control than twisted.enterprise currently gives over who starts threads, when, and why: for example, it costs money to start a connection on a goodly number of Oracle installations.

Do you have another motivation for wanting to have this in Twisted proper? If so, Jerub's question is appropriate: where would it go?

comment:11 Changed 13 years ago by (none)

Since the initial reaction to this review on IRC was pretty negative:

  • (12:51:43) edsuom: With all due respect, the comments seem a bit close-minded and NIH to me. I'm taking off for the day today and will respond more later. But I wonder if the response will really be heard, unfortunately. I've done all I can to make a quality contribution to Twisted, jumping through the unit-testing, submit-a-patch-via-tracker, and documentation hoops, and yet it is rejected.

I'd like to clarify a few things to focus your response, since I was concerned that the review might come over as overly harsh and apparently it has.

  • I am not rejecting this. I am highly conservative about adding new thread-based functionality to Twisted, but I can tell a lot of work went into this, that it is definitely useful to a certain class of user, and that you want to get it into Twisted.
  • You don't have to address every comment in this review, just a reasonable sample. I promised a thorough review and this was quite late, so I looked at everything I could. The most important one is negative path testing. It should only take a moment to write some brief docstrings for the exceptions and other things, so I don't think that should be a big deal.
  • The "where does it go" question needs to be addressed just from a practical perspective. The most logical place I can think of is to add it to the existing twisted.internet.threads module, since it depends on t.internet and therefore can't live in t.python with the rest of the threading stuff.
  • Even if you're not going to commit to long-term maintenance of something like twisted.enterprise, it's already managing its own thread pool, so perhaps a good argument for putting this into Twisted is that we should be moving towards having t.e or other thread-driven APIs use syncbridge, as it is better tested, better documented, more deterministic or some other feature I haven't noticed.

I look forward to your response.

comment:12 Changed 13 years ago by jknight

Would this be useful to replace the thready stuff in t.web2.wsgi? If so, perhaps a patch simplifying the wsgi wrapper by depending on this could be a good motivating use case for putting it in twisted.

(Note: I have _not_ examined this patch closely)

comment:13 Changed 13 years ago by Jean-Paul Calderone

I notice the code uses chainDeferred when it gets a terminator task. This won't properly pass the Deferred's result through to the other Deferreds, since chainDeferred clobbers the result and replaces it with None. I don't know if this is intentional or not but it seems like it is surprising.

Also calling chainDeferred in a non-reactor thread is not allowed. It most likely works right now, but this is only coincidental.

I have only given this a cursory glance thus far, but it seems like James might be right about using this for wsgi.

comment:14 Changed 13 years ago by edsuom

I am nearing completion of a major rewrite of syncbridge to a new module, taskqueue. I've tried to take all of the comments into account, and also gave the module a much cleaner, more modular design. A significant benefit of the refactoring is that the module is equally applicable to event-driven priority queuing of asynchronous tasks as it is to completion-driven queuing of synchronous tasks in one or more threads.

I'll post for reconsideration when it's fully unit tested. In the meantime, if you're curious, see the module and its unit test from SVN.

comment:15 Changed 13 years ago by edsuom

Keywords: review added
Owner: changed from edsuom to Glyph

After a lot of refactoring, coding, and testing work responsive to the valuable comments above, and successful usage for a while now in sAsync and another project of mine, I am resubmitting this for review.

I think the logical place for this would be twisted.internet.task.

The new name is taskqueue to better reflect just what this code is doing. It's actually not limited to allowing the use of synchronous code within a thread, although that was its genesis. I'm also using it in a non-released client-server application to queue up perspective broker calls at the client to limit the number of deferred sitting out on the wire for any given client. That is helping the management of disconnections and balances the load between clients.

The module file, from sAsync SVN r135, and its unit test module, are attached. Also, see the API docs. If the presence of the as-yet-untested LoadAverage class gives anyone pause when considering inclusion of this module in the twisted code base, please just take it out of the twisted version.

Changed 13 years ago by edsuom

Attachment: taskqueue.py added

The taskqueue module proposed for incorporation into twisted.internet.task

Changed 13 years ago by edsuom

Attachment: taskqueue.2.py added

The taskqueue module proposed for incorporation into twisted.internet.task

Changed 13 years ago by edsuom

Attachment: test_taskqueue.py added

Unit tests for the taskqueue module.

comment:16 Changed 13 years ago by edsuom

Please ignore the second copy of the taskqueue module that I accidentally did a repeat upload of. (It sure would be nice if Trac let you delete your own file attachments.)

comment:17 Changed 13 years ago by moof

Cc: moof added

FWIW, I do actually have a real world, though unfortunately not open source, use for this. And making it generic to cope with async tasks will definitely help me when porting the legacy code I'm using to an async version.

+1 from a user in terms of it actually being needed. twisted.internet.defer.DeferredSemaphore is just not enough. As to whether this is twisted-like enough, I'll leave to better judges than me.

comment:18 Changed 13 years ago by Jonathan Lange

Priority: normalhighest

comment:19 Changed 13 years ago by Wilfredo Sánchez Vega

Cc: Wilfredo Sánchez Vega added

comment:20 Changed 13 years ago by Glyph

Priority: highestnormal
Status: newassigned

It's good to have some use-cases for this. It is still a pretty involved branch to review. A first glance at this seems to indicate it's OK. Still, I want to make sure that it is the final word in task queueing in Twisted before we merge it, so that we don't end up with too many different ways to do this, and I'm too far underwater for that at the moment.

I don't anticipate this state is going to persist for more than a few weeks though. If anyone else would like to comment on this to direct my attention in the review process I'd appreciate it.

comment:21 Changed 13 years ago by scmikes

Cc: scmikes added

Edsom,

Good work on this.

I would make one suggestion to increase the flexibility of the TaskQueue.

Please add keyword args for mgr, heap, and loadAverageProducer that default to the implements in taskqueue.2.py.

This will provide a simple default behaviour for users. In addition the base functionality, the user can change any of the individual pieces to use an alternate scheduling/priority algorithms if the customer use case requires the flexibility.

Thanks for the great work, Mike Schneider

542 self._taskFactory = _TaskFactory() 543 self._mgr = WorkerManager() 544 self._heap = Priority() 545 self._loadAverageProducer = LoadAverageProducer(self._heap.heap)

comment:22 Changed 12 years ago by Glyph

Keywords: review removed
Owner: changed from Glyph to edsuom
Status: assignednew

At some point we came to an implicit decision that this would be deferred past the 2.5 release, since it is a hard branch to review with implications for other APIs that we didn't want to mess with until we'd finished that. I thought that had been documented here, but I see it hasn't, so I'm updating it for the record. Hopefully the release will be RSN, and we can start looking at 2.6.

Ed, for the time being, I'm going to kick it back to you so you can respond to scmikes' comments.

comment:23 Changed 11 years ago by Glyph

Resolution: invalid
Status: newclosed

I believe the resolution of this ticket was that the functionality is now included in the Twisted Goodies package and distributed separately.

Ed, feel free to re-open this ticket if I have misinterpreted the sequence of events.

comment:24 Changed 8 years ago by <automation>

Owner: edsuom deleted
Note: See TracTickets for help on using tickets.