[Twisted-Python] sharing a dict between child processes

Sean DiZazzo sean.dizazzo at gmail.com
Wed Nov 6 22:29:34 MST 2019


If you need guaranteed delivery of the data, why not use a stream
(TCP-style) connection over the Unix socket, instead of a datagram
(UDP-style) connection, which can inherently lose data?  In that case I
don't think your patch would be needed.

I didn't look at the source, so perhaps I missed something.
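[Editor's note: a stream-mode Unix socket gives reliable delivery but, unlike datagram mode, has no message boundaries, so some framing is needed on top. A minimal stdlib sketch; the length-prefix protocol here is purely illustrative, not from the thread:]

```python
import pickle
import socket
import struct

def send_msg(sock, obj):
    # Stream sockets carry a byte stream with no message boundaries,
    # so length-prefix each pickled payload with a 4-byte header.
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_msg(sock):
    # MSG_WAITALL blocks until the full requested length has arrived.
    header = sock.recv(4, socket.MSG_WAITALL)
    (length,) = struct.unpack("!I", header)
    return pickle.loads(sock.recv(length, socket.MSG_WAITALL))

# A connected stream-mode Unix socket pair: reliable, ordered bytes.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
send_msg(a, {"key": "user:42", "value": [1, 2, 3]})
print(recv_msg(b))
```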

On Wed, Nov 6, 2019 at 9:10 AM Scott, Barry <barry.scott at forcepoint.com>
wrote:

> On Wednesday, 6 November 2019 16:43:52 GMT Waqar Khan wrote:
> > Hi Barry,
> >         Thanks for the response. Where can I read more about (1)? It
> > seems like that is something I need to explore.
> > As we already have (2) (a cache in each process).
> > Thanks again for your help.
>
> We use the UDS (Unix domain sockets) to talk to a master process.
> Twisted has support for this, but you need a small patch to avoid data
> loss.
>
> In datagram mode UDS does not silently drop data the way UDP can, and it
> is message based, not byte based. We use pickle to encode requests and
> responses.
>
> Barry
>
> The patch is:
>
> --- Twisted-18.4.0.orig/src/twisted/internet/unix.py.orig  2018-08-01 12:45:38.711115425 +0100
> +++ Twisted-18.4.0/src/twisted/internet/unix.py  2018-08-01 12:45:47.946115123 +0100
> @@ -509,11 +509,6 @@
>                  return self.write(datagram, address)
>              elif no == EMSGSIZE:
>                  raise error.MessageLengthError("message too long")
> -            elif no == EAGAIN:
> -                # oh, well, drop the data. The only difference from UDP
> -                # is that UDP won't ever notice.
> -                # TODO: add TCP-like buffering
> -                pass
>              else:
>                  raise
>
> You then have to handle the EAGAIN error and do the retries yourself.
> As it stands the patch is not good enough to go into Twisted; a full
> fix would need to move the retry handling into Twisted itself.
>
> I guess (2) does not work for you as the cache hit rate is low
> and you need to share the cache to get a benefit. Cache entries
> only get used a few times?
>
> In our case the hit rate is high (99%+) and we just pay the cost of
> populating the caches on process start up, which ends up being
> noise.
>
> Barry
>
> >
> > On Wed, Nov 6, 2019 at 8:39 AM Scott, Barry <barry.scott at forcepoint.com>
> >
> > wrote:
> > > On Wednesday, 6 November 2019 14:21:22 GMT Maarten ter Huurne wrote:
> > > > On Wednesday, 6 November 2019 07:19:56 CET Waqar Khan wrote:
> > > > > Hi,
> > > > > So, I am writing a Twisted server. This server spawns multiple
> > > > > child processes using reactor.spawnProcess, each of which
> > > > > initializes a process protocol.
> > > > >
> > > > > Now, each of the child processes receives some REST requests, and
> > > > > each process has a dict that acts as a cache.
> > > > > Now, I want to share that dict across processes.
> > > > > In general, Python has SharedMemoryManager in the multiprocessing
> > > > > module, which would have helped:
> > > > > https://docs.python.org/3/library/multiprocessing.shared_memory.html#multiprocessing.managers.SharedMemoryManager.SharedMemory
> > > > > But since I am using Twisted's internal process implementation, how
> > > > > do I share this dict across the processes so that all the processes
> > > > > use this common cache?
> > > >
> > > > Keeping a dictionary in SharedMemoryManager seems far from trivial.
> > > > I don't think you can allocate arbitrary Python objects in the
> > > > shared memory, and even if you could, you would run into problems
> > > > when one process mutates the dictionary while another is looking up
> > > > something or also mutating it.
> > > >
> > > > It could in theory work if you implement a custom lock-less
> > > > dictionary, but that would be a lot of work and hard to get right.
> > > > Also, having shared memory mutations be synced between multiple CPU
> > > > cores could degrade performance, since keeping core-local CPU caches
> > > > in sync is expensive.
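[Editor's note: as an aside, the stdlib's plain Manager (distinct from SharedMemoryManager) does proxy a dict over IPC with locking handled by the manager's server process. The sketch below is illustrative only, since it relies on multiprocessing-spawned children rather than reactor.spawnProcess:]

```python
import multiprocessing as mp

def worker(cache):
    # Mutations go through the manager's server process, which
    # serialises concurrent access for you.
    cache["answer"] = 42

if __name__ == "__main__":
    with mp.Manager() as manager:
        cache = manager.dict()  # a proxied dict, not true shared memory
        p = mp.Process(target=worker, args=(cache,))
        p.start()
        p.join()
        assert cache["answer"] == 42
```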
> > > >
> > > > Would it be an option to have only one process accept the REST
> > > > requests, check whether the result is in the cache, and only
> > > > distribute work to the other processes if you get a cache miss?
> > > > Typically the case where an answer is cached is pretty fast, so
> > > > perhaps you don't need multiple processes to handle incoming
> > > > requests.
> > >
> > > We have used a couple of ways to cache:
> > > 1. Use a singleton process to hold the cache and ask it, via IPC, for
> > >    answers from the other processes.
> > > 2. Have a cache in each process.
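[Editor's note: approach (1) can be sketched with a forked cache-holder process and pickled request tuples over a Unix datagram pair. The `("op", key, value)` wire protocol and all names here are made up for illustration:]

```python
import os
import pickle
import socket

def cache_holder(sock):
    # Singleton cache-holder loop: the only process that owns the dict,
    # so no cross-process locking is needed.
    cache = {}
    while True:
        op, key, value = pickle.loads(sock.recv(65536))
        if op == "set":
            cache[key] = value
            sock.send(pickle.dumps(True))
        elif op == "get":
            sock.send(pickle.dumps(cache.get(key)))
        else:  # "quit"
            os._exit(0)

def demo():
    parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    pid = os.fork()
    if pid == 0:
        parent.close()
        cache_holder(child)  # never returns
    child.close()

    def call(op, key, value=None):
        # Every cache operation is one IPC round trip to the holder.
        parent.send(pickle.dumps((op, key, value)))
        return pickle.loads(parent.recv(65536))

    call("set", "user:42", {"hits": 1})
    result = call("get", "user:42")
    parent.send(pickle.dumps(("quit", None, None)))
    os.waitpid(pid, 0)
    return result

print(demo())
```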
> > >
> > > Barry
> > >
> > > > Bye,
> > > >
> > > >               Maarten
> > > >
> > > > _______________________________________________
> > > > Twisted-Python mailing list
> > > > Twisted-Python at twistedmatrix.com
> > > > https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
> > >
>

