Opened 7 years ago

Closed 5 years ago

#3245 enhancement closed worksforme (worksforme)

__slots__ in Deferred class?

Reported by: Mekk Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: glyph, radix, mgeisler Branch: branches/deferred-slots-3245
(diff, github, buildbot, log)
Author: mgeisler, Mekk, exarkun Launchpad Bug:

Description (last modified by exarkun)

Programs which use a lot of Deferreds could benefit from __slots__.

Simple test I did proved, that the program which created huge list of empty Deferreds had its memory usage (both RSS anv VSS) halved after such a change (longer post on the mailing list)

Important impact: d.some_unknown_attribute=something will fail. What can be ... a good thing, as it would catch some cases of deferred being mistakenly used as if it was real object.

The change is fairly small (defer.py):

Instead of:

   class Deferred:

write:

   class Deferred(object):
      __slots__ = ['debug', 'callbacks', 'called', 'paused', 'result',
                   'timeoutCall', '_runningCallbacks', '_debugInfo']

Attachments (2)

testdef.py (9.1 KB) - added by Mekk 7 years ago.
Primitive test (comment/uncomment Deferred/SlotsDeferred in main to switch between original and slots version)
deferred_slots.patch (2.8 KB) - added by mgeisler 7 years ago.
Patch against Twisted SVN r23670

Download all attachments as: .zip

Change History (24)

Changed 7 years ago by Mekk

Primitive test (comment/uncomment Deferred/SlotsDeferred in main to switch between original and slots version)

comment:1 Changed 7 years ago by exarkun

  • Description modified (diff)

Fixing description markup.

comment:2 Changed 7 years ago by Mekk

Here are the results of attached test on my machine (i686 Linux, twisted 2.5.0 as distributed with ubuntu):

$ python testdef.py
Testing Deferred, creating 250000 instances on each step
UID PID PPID C SZ RSS PSR STIME TTY STAT TIME CMD
marcink 23273 20536 92 15159 56980 1 15:55 pts/18 S+ 0:00 python testdef.py
UID PID PPID C SZ RSS PSR STIME TTY STAT TIME CMD
marcink 23273 20536 86 28353 108984 1 15:55 pts/18 S+ 0:02 python testdef.py
UID PID PPID C SZ RSS PSR STIME TTY STAT TIME CMD
marcink 23273 20536 84 41550 161004 1 15:55 pts/18 S+ 0:05 python testdef.py

$ python testdef.py
Testing SlotsDeferred, creating 250000 instances on each step
UID PID PPID C SZ RSS PSR STIME TTY STAT TIME CMD
marcink 23280 20536 54 7835 28072 0 15:55 pts/18 S+ 0:00 python testdef.py
UID PID PPID C SZ RSS PSR STIME TTY STAT TIME CMD
marcink 23280 20536 99 13684 51164 0 15:55 pts/18 S+ 0:01 python testdef.py
UID PID PPID C SZ RSS PSR STIME TTY STAT TIME CMD
marcink 23280 20536 80 19533 74260 1 15:55 pts/18 S+ 0:02 python testdef.py

comment:3 Changed 7 years ago by Mekk

Arrgh, again:

$ python testdef.py
Testing Deferred, creating 250000 instances on each step
UID        PID  PPID  C    SZ   RSS PSR STIME TTY      STAT   TIME CMD
marcink  23273 20536 92 15159 56980   1 15:55 pts/18   S+     0:00 python testdef.py
UID        PID  PPID  C    SZ   RSS PSR STIME TTY      STAT   TIME CMD
marcink  23273 20536 86 28353 108984  1 15:55 pts/18   S+     0:02 python testdef.py
UID        PID  PPID  C    SZ   RSS PSR STIME TTY      STAT   TIME CMD
marcink  23273 20536 84 41550 161004  1 15:55 pts/18   S+     0:05 python testdef.py

$ python testdef.py
Testing SlotsDeferred, creating 250000 instances on each step
UID        PID  PPID  C    SZ   RSS PSR STIME TTY      STAT   TIME CMD
marcink  23280 20536 54  7835 28072   0 15:55 pts/18   S+     0:00 python testdef.py
UID        PID  PPID  C    SZ   RSS PSR STIME TTY      STAT   TIME CMD
marcink  23280 20536 99 13684 51164   0 15:55 pts/18   S+     0:01 python testdef.py
UID        PID  PPID  C    SZ   RSS PSR STIME TTY      STAT   TIME CMD
marcink  23280 20536 80 19533 74260   1 15:55 pts/18   S+     0:02 python testdef.py

comment:4 follow-up: Changed 7 years ago by exarkun

  • Owner changed from glyph to Mekk

Thanks for your investigation into this area (and particularly for supplying a repeatable benchmark :). Can you attach a unified diff of the Deferred change you described in the ticket description? That makes it easier for someone to look at or try out the change you're suggesting. Also, could you try the Twisted test suite with this change applied and report on the results? The test suite is run with the trial command, eg trial twisted. Thanks.

comment:5 Changed 7 years ago by Mekk

I do not have any experiences with installing separate copy of twisted and patching it, so I may take a look at it but surely not very soon, would need to learn a few things before.

Changed 7 years ago by mgeisler

Patch against Twisted SVN r23670

comment:6 Changed 7 years ago by mgeisler

I have just tried changing Deferred to use __slots__ -- please see the patch I just attached.

The test suite works, except for one test. I get:

[ERROR]: twisted.test.test_defer.LogTestCase.test_errorLogWithInnerFrameCycle

Traceback (most recent call last):
  File "/home/mg/tmp/Twisted/twisted/test/test_defer.py", line 653, in test_errorLogWithInnerFrameCycle
    _subErrorLogWithInnerFrameCycle()
  File "/home/mg/tmp/Twisted/twisted/test/test_defer.py", line 650, in _subErrorLogWithInnerFrameCycle
    d._d = d
exceptions.AttributeError: 'Deferred' object has no attribute '_d'
-------------------------------------------------------------------------------
Ran 4351 tests in 183.203s

FAILED (skips=115, expectedFailures=24, errors=1, successes=4211)

This error is to be expected since __slots__ forbids assignments to attributes not mentioned.

In the patch I changed the debug class attribute to a top-level variable in the module. That was necessary since __slots__ makes class attributes read-only. This is described here where it says:

  • __slots__ are implemented at the class level by creating descriptors (3.4.2) for each variable name. As a result, class attributes cannot be used to set default values for instance variables defined by __slots__; otherwise, the class attribute would overwrite the descriptor assignment.

comment:7 Changed 7 years ago by Mekk

  • Owner changed from Mekk to glyph

mgeisler did the work you asked for ;-)

comment:8 in reply to: ↑ 4 Changed 6 years ago by mgeisler

Replying to exarkun:

Can you attach a unified diff of the Deferred change you described in the ticket description? Also, could you try the Twisted test suite with this change applied and report on the results?

I attached a patch in unified diff format and ran the test suite. Is there anything else I can do to push this forward?

comment:9 Changed 6 years ago by exarkun

  • Keywords review added
  • Owner glyph deleted

If there's code on a ticket that a Twisted committer needs to look at, it's really important to add the "review" keyword to the ticket. Otherwise, it's very likely the ticket will get lost amongst the 1000+ other open tickets. :)

comment:10 Changed 6 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/deferred-slots-3245

(In [25963]) Branching to 'deferred-slots-3245'

comment:11 Changed 6 years ago by exarkun

(In [25964]) Apply deferred_slots.patch

refs #3245

comment:12 Changed 6 years ago by exarkun

  • Keywords review removed
  • Owner set to exarkun
  • Status changed from new to assigned

There need to be some benchmarks for at least a few uses of Deferred. The original attachment from Mekk is kind of the right idea, but it needs to be refined.

comment:13 Changed 6 years ago by exarkun

  • Author changed from exarkun to mgeisler, Mekk, exarkun
  • Cc glyph radix added

So I measured two consequences of the changes in this branch:

  • memory usage for empty Deferreds is reduced to about 45% of what it is in trunk
  • The CPU cost of instantiating a Deferred is increased by about 75%

This seems like a worthwhile tradeoff to me. Some other people have suggested making the tradeoff configurable. I'm not really interested in working on the mechanism for that, though (rather, what I mean is that I think that taking the time to implement such a mechanism is a waste of time). If that's going to be a requirement, then I'm probably done working on this ticket for a while, otherwise I can probably do the remaining work to get this ready for another review.

comment:14 Changed 6 years ago by mgeisler

  • Cc mgeisler added

comment:15 Changed 6 years ago by mgeisler

Thanks for looking at it! And sorry for not putting the review keyword on the ticket.

I'm surprised to see the CPU cost go up. From an article in the November edition of Python Magazine I had the impression that using __slots__ would be marginally faster than the old way.

Maybe the difference is caused by Deferred being an old-style class. In the article they compared two new-style classes where one used __slots__ and the other didn't. If that is the case, then we will suffer the performance hit anyway when old-style classes go away.

Could you try measuring the impact of just changing class Deferred to class Deferred(object)?

In any case, I vote for slimming the Deferreds as much as possible :-)

comment:16 follow-up: Changed 6 years ago by exarkun

I measured the cost of just switching to new-style and there is a bit of a slowdown, but it's only a few percent (I didn't save those results, so I can't say exactly - also, there was a lot of noise in the benchmark results; the precision of my measurements may have been greater than the cost of switching to new-style). The cost seems to largely be in initialization of the Deferred. Without slots, only one attribute needs to be set in __init__. With slots, a lot more need to be. So this is basically interpretation overhead, and __init__ takes a lot more interpretation with slots.

comment:17 Changed 6 years ago by exarkun

  • Status changed from assigned to new

I meant may have been less than the cost, of course.

comment:18 in reply to: ↑ 16 Changed 6 years ago by mgeisler

I think that makes good sense. The benchmark I saw used no class-level fields -- the constructor did the same amount of work in both cases. But here the __slots__ approach is up against some clever copy-on-write done by Python when you assign to self.foo.

Thanks for testing, I hope you guys will end up deciding to put this in despite the CPU hit -- I'm doing cryptographic protocols and there the theory is that all (polynomial time) local computations are for free... If that is true, then burning a bit more CPU should be fine :-)

comment:19 Changed 6 years ago by Mekk

What about reducing the list of attributes a bit:

  • .debug is kept as class attribute for now (and as far as I understand recommended to be used by standalone functions), so maybe it could land outside the class
  • ._debugInfo is used only in debug mode, so maybe could be kept separately (say in some external dictionary keyed by deferred)

This would make us not paying for debugging when it is not active

Of course this only makes sense if removing those two variables gives any gain (I wrote the above after reading comment 16, if 7 extra attributes init costs 75%, then maybe 5 would reduce it to ~50%).
Commenting them out and rerunning the test which showed this 75% should tell

comment:20 Changed 6 years ago by glyph

Thanks very much to everyone who has been working on this. We need more performance investigation of Twisted :).

It's too bad that it seems to increase the cost of instantiating a Deferred. Deferreds should be the cheapest things in the world, so that nobody ever optimizes anything by building an ad-hoc crappy callback management mechanism to get away from the "slow" Deferred.

Of course, I wouldn't want anyone trying to get away from the "bloated" Deferred, either, but in my experience, CPU is more important than RAM. Even on the embedded device where I was doing Twisted development, the big deal was startup time and responsiveness, we still had a reasonable amount of RAM left. Specifically in the case of Deferred, the whole point is that these objects are supposed to be relatively short-lived, going away when whatever operation they represent has completed.

Looking at the history of this ticket, I don't see a compelling argument to favor memory conservation over CPU optimization. However, the existence of this ticket suggests that some people like to have zillions of Deferreds and that memory does matter to them more than speed. I'd appreciate it if someone could explain that rationale a bit more thoroughly. However, the apparent divergence of these perspectives is why I am sympathetic to some form of configuration (leaving the current behavior as the default).

Nobody really wants to configure Twisted to be "fast and bloated" or "lean and slow", though. Perhaps we should pursue other Deferred optimization strategies, like #2245?

comment:21 Changed 5 years ago by glyph

  • Resolution set to worksforme
  • Status changed from new to closed

Maybe we'll investigate this again later, but as things stand today I don't think we're going to pursue this.

comment:22 Changed 4 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.