[Twisted-Python] sharing a dict between child processes

Scott, Barry barry.scott at forcepoint.com
Thu Nov 7 03:23:40 MST 2019


On Wednesday, 6 November 2019 18:23:41 GMT Waqar Khan wrote:
> Thanks for the info.
> Yeah, it seems that UDS is the way to go. (I need to read more about them).
> 
> Actually, is there a simple example you can give that can help me
> understand this a bit better?
> Thanks

Sorry, I cannot share the code I have.
The Twisted docs should get you going.

But, as I said, without that patch you will lose messages under load.
You have to handle EAGAIN yourself and retry the send.

We have a send queue that we drain on a timer when a send fails.
The failure is caused by the receiving end not processing messages fast enough.
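A send queue of the kind described here might look something like the sketch below. It is not Barry's code: the class and method names are hypothetical, and it uses plain non-blocking sockets so that it runs without Twisted. When a send fails with EAGAIN, the datagram is queued instead of being dropped. Later writes queue behind it so that ordering is preserved. `drain()` is meant to be called periodically, for example with `twisted.internet.task.LoopingCall(sender.drain).start(0.05)` (the interval is an arbitrary assumption). With the patch quoted further down, `send` could presumably wrap the patched port's `write`, but that wiring is also an assumption.

```python
import errno
import socket
from collections import deque

# errno values meaning "the receiver is not keeping up, try again later".
_RETRYABLE = (errno.EAGAIN, errno.EWOULDBLOCK)


class QueuedSender:
    """Wrap a datagram send function so that EAGAIN queues the message instead of losing it.

    `send` is any callable that sends one datagram and raises OSError with
    errno EAGAIN when the receiving end's buffer is full (hypothetical wiring).
    """

    def __init__(self, send):
        self._send = send
        self._queue = deque()

    def pending(self):
        return len(self._queue)

    def write(self, datagram):
        if self._queue:
            # Older messages are still waiting; queue behind them to keep order.
            self._queue.append(datagram)
            return
        try:
            self._send(datagram)
        except OSError as e:
            if e.errno not in _RETRYABLE:
                raise
            self._queue.append(datagram)

    def drain(self):
        """Resend queued datagrams in order; call this from a periodic timer."""
        while self._queue:
            try:
                self._send(self._queue[0])
            except OSError as e:
                if e.errno not in _RETRYABLE:
                    raise
                return  # still full; the next timer tick will try again
            self._queue.popleft()


if __name__ == "__main__":
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    a.setblocking(False)
    b.setblocking(False)
    sender = QueuedSender(a.send)
    for i in range(5000):
        sender.write(b"msg-%d" % i)
    print("queued after burst:", sender.pending())
```

Only errors that mean "try again" are queued. Anything else, such as a broken pipe, is still raised to the caller.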

Barry



> 
> On Wed, Nov 6, 2019 at 9:07 AM Scott, Barry <barry.scott at forcepoint.com> wrote:
> > On Wednesday, 6 November 2019 16:43:52 GMT Waqar Khan wrote:
> > > Hi Barry,
> > > 
> > > Thanks for the response. Where can I read more about (1)? It seems
> > > like that is something I need to explore,
> > > as we already have (2) (a cache for each process).
> > > Thanks again for your help.
> > 
> > We use UDS (Unix domain sockets) to talk to a master process.
> > Twisted has support for this, but you need a small patch to avoid data
> > loss.
> > 
> > UDS datagram sockets do not lose data and are message based, not byte
> > based. We use pickle to encode requests and responses.
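The sketch below shows the request/response shape described here. It uses only the standard library rather than Twisted: a master process owns the dict and answers pickled requests over a Unix datagram socket. The request tuples, function names and socket paths are hypothetical.

Two details matter:

- Each child must bind its own socket path, so that the master has an address to reply to.
- pickle should only be used between processes that trust each other.

A Twisted child would presumably not block in `recv`. It would receive replies through a `DatagramProtocol` set up with `reactor.connectUNIXDatagram(..., bindAddress=...)` instead.

```python
import os
import pickle
import socket
import tempfile
import threading


def encode(obj):
    # pickle.loads can run arbitrary code: only use this between trusted processes.
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def decode(data):
    return pickle.loads(data)


def handle_request(cache, datagram):
    """Master side: apply one request to the cache dict and return the pickled reply.

    Hypothetical request format: ("get", key) or ("set", key, value).
    """
    request = decode(datagram)
    op, key = request[0], request[1]
    if op == "set":
        cache[key] = request[2]
        return encode(("ok", None))
    if op == "get":
        if key in cache:
            return encode(("hit", cache[key]))
        return encode(("miss", None))
    return encode(("error", "unknown op %r" % (op,)))


def serve_once(master_sock, cache):
    """Receive one datagram and send the reply back to the child's bound path."""
    datagram, child_addr = master_sock.recvfrom(65536)
    master_sock.sendto(handle_request(cache, datagram), child_addr)


def serve_forever(master_sock, cache):
    while True:
        serve_once(master_sock, cache)


def ask_master(child_sock, master_path, *request):
    """Child side, blocking: send one request and wait for its reply."""
    child_sock.sendto(encode(request), master_path)
    return decode(child_sock.recv(65536))


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    master_path = os.path.join(tmp, "master.sock")
    master = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    master.bind(master_path)
    # A child must bind its own path, otherwise the master has nowhere to reply.
    child = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    child.bind(os.path.join(tmp, "child-1.sock"))

    cache = {}
    threading.Thread(target=serve_forever, args=(master, cache), daemon=True).start()
    print(ask_master(child, master_path, "get", "user:42"))  # ('miss', None)
    print(ask_master(child, master_path, "set", "user:42", {"name": "x"}))  # ('ok', None)
    print(ask_master(child, master_path, "get", "user:42"))  # ('hit', {'name': 'x'})
```

Because each request and reply is a single datagram, no framing is needed. The trade-off is that every request and reply must fit in one datagram.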
> > 
> > Barry
> > 
> > The patch is:
> > 
> > --- Twisted-18.4.0.orig/src/twisted/internet/unix.py.orig       2018-08-01 12:45:38.711115425 +0100
> > +++ Twisted-18.4.0/src/twisted/internet/unix.py 2018-08-01 12:45:47.946115123 +0100
> > @@ -509,11 +509,6 @@
> >                  return self.write(datagram, address)
> >              elif no == EMSGSIZE:
> >                  raise error.MessageLengthError("message too long")
> > -            elif no == EAGAIN:
> > -                # oh, well, drop the data. The only difference from UDP
> > -                # is that UDP won't ever notice.
> > -                # TODO: add TCP-like buffering
> > -                pass
> >              else:
> >                  raise
> > 
> > You then have to handle the EAGAIN error and do the retries yourself.
> > As it stands, the patch is not good enough to go into Twisted; a full
> > fix would need to move the retry handling into Twisted itself.
> > 
> > I guess (2) does not work for you because the cache hit rate is low
> > and you need to share the cache to get a benefit. Do cache entries
> > only get used a few times?
> > 
> > In our case the hit rate is high (99%+), and we just pay the cost of
> > populating the caches at process start-up, which ends up lost in the
> > noise.
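Whether per-process caches (option 2) are good enough comes down to the hit rate, so it is worth measuring. Below is a minimal sketch of a per-process dict cache that counts hits and misses. It also has a `warm()` step for populating the cache at start-up, which does not count towards the hit rate. `CountingCache`, `compute` and `warm` are hypothetical names, not part of any library.

```python
class CountingCache:
    """Per-process dict cache that tracks its own hit rate (hypothetical helper)."""

    def __init__(self, compute):
        self._compute = compute  # called on a miss to produce the value
        self._data = {}
        self.hits = 0
        self.misses = 0

    def warm(self, keys):
        """Populate at process start-up; not counted towards the hit rate."""
        for key in keys:
            if key not in self._data:
                self._data[key] = self._compute(key)

    def get(self, key):
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = self._data[key] = self._compute(key)
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Logging `hit_rate()` from each child for a while should show whether sharing the cache would actually save much work.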
> > 
> > Barry
> > 
> > > On Wed, Nov 6, 2019 at 8:39 AM Scott, Barry <barry.scott at forcepoint.com> wrote:
> > > > On Wednesday, 6 November 2019 14:21:22 GMT Maarten ter Huurne wrote:
> > > > > On Wednesday, 6 November 2019 07:19:56 CET Waqar Khan wrote:
> > > > > > Hi,
> > > > > > So, I am writing a Twisted server. This server spawns multiple
> > > > > > child processes using reactor.spawnProcess, passing each one a
> > > > > > process protocol.
> > > > > > 
> > > > > > Now, each of the child processes receives some REST requests. Each
> > > > > > process has a dict that acts as a cache.
> > > > > > Now, I want to share the dict across processes.
> > > > > > In general, Python has SharedMemoryManager in the multiprocessing
> > > > > > module, which would have helped:
> > > > > > https://docs.python.org/3/library/multiprocessing.shared_memory.html#multiprocessing.managers.SharedMemoryManager.SharedMemory
> > > > > > But since I am using Twisted's own process implementation, how do I
> > > > > > share this dict across the processes so that all of them use this
> > > > > > common cache?
> > > > > 
> > > > > Keeping a dictionary in SharedMemoryManager seems far from trivial. I
> > > > > don't think you can allocate arbitrary Python objects in the shared
> > > > > memory and even if you could, you would run into problems when one
> > > > > process mutates the dictionary while another is looking up something or
> > > > > also mutating it.
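The point that you cannot simply put a Python dict into shared memory can be seen with `multiprocessing.shared_memory.SharedMemory` (Python 3.8+), whose `.buf` is raw bytes. The sketch below shares only a pickled snapshot of the dict. The helper names and the 4-byte length header are made up for illustration. Each reader unpickles its own private copy, and any update means republishing the whole dict with no locking. That is why this is a read-only snapshot rather than a shared cache.

```python
import pickle
from multiprocessing import shared_memory

HEADER = 4  # bytes reserved for the length of the pickled payload (made-up layout)


def publish_snapshot(d, size=1 << 16):
    """Create a shared block holding a pickled snapshot of dict `d`."""
    payload = pickle.dumps(d)
    if HEADER + len(payload) > size:
        raise ValueError("snapshot does not fit in the shared block")
    shm = shared_memory.SharedMemory(create=True, size=size)
    shm.buf[:HEADER] = len(payload).to_bytes(HEADER, "little")
    shm.buf[HEADER:HEADER + len(payload)] = payload
    return shm  # the creator must close() and unlink() it when finished


def read_snapshot(name):
    """Attach by name, as another process would, and unpickle a private copy."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        n = int.from_bytes(bytes(shm.buf[:HEADER]), "little")
        return pickle.loads(bytes(shm.buf[HEADER:HEADER + n]))
    finally:
        shm.close()


if __name__ == "__main__":
    block = publish_snapshot({"user:42": {"name": "x"}})
    try:
        copy = read_snapshot(block.name)
        copy["user:42"] = None  # changes only this process's copy
        print(read_snapshot(block.name))  # the shared snapshot is unchanged
    finally:
        block.close()
        block.unlink()
```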
> > > > > 
> > > > > It could in theory work if you implement a custom lock-less dictionary,
> > > > > but that would be a lot of work and hard to get right. Also having
> > > > > shared memory mutations be synced between multiple CPU cores could
> > > > > degrade performance, since keeping core-local CPU caches in sync is
> > > > > expensive.
> > > > > 
> > > > > Would it be an option to have only one process accept the REST requests,
> > > > > check whether the result is in the cache and only distribute work to the
> > > > > other processes if you get a cache miss? Typically the case where an
> > > > > answer is cached is pretty fast, so perhaps you don't need multiple
> > > > > processes to handle incoming requests.
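The suggestion of a single process that owns the cache can be reduced to a synchronous sketch. One front-end process checks its cache and hands only misses to workers, in round-robin order. `FrontEnd` and the worker callables are hypothetical stand-ins. In a real Twisted server, the worker call would go over IPC to a child process and return a Deferred. Concurrent misses for the same key would probably also need de-duplicating, which this sketch does not do.

```python
import itertools


class FrontEnd:
    """Single process that owns the cache and only farms out misses (hypothetical).

    `workers` are plain callables standing in for the child processes.
    """

    def __init__(self, workers):
        self._cache = {}
        self._next_worker = itertools.cycle(workers)  # simple round-robin

    def handle(self, key):
        if key in self._cache:
            return self._cache[key]  # fast path: no IPC at all
        result = next(self._next_worker)(key)  # cache miss: delegate the work
        self._cache[key] = result
        return result
```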
> > > > 
> > > > We have used a couple of ways to cache:
> > > > 1. Use a singleton process to hold the cache and ask it, via IPC, for
> > > > answers from the other processes.
> > > > 2. Have a cache in each process.
> > > > 
> > > > Barry
> > > > 
> > > > > Bye,
> > > > > 
> > > > >               Maarten
> > > > > 
> > > > > _______________________________________________
> > > > > Twisted-Python mailing list
> > > > > Twisted-Python at twistedmatrix.com
> > > > > https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
> > > > 