[Twisted-Python] pushing out same message on 100k TCPs

Phil Mayers p.mayers at imperial.ac.uk
Sat Feb 11 04:07:11 MST 2012


On 02/10/2012 08:20 PM, Tobias Oberstein wrote:
>
>> store the socket buffer as a (fairly complex) linked list of reference-counted
>> blocks, and use scatter-gather IO to the network card.
>
> Doesn't a (modern) kernel do something like that for virtual memory pages ie.?

Possibly. My knowledge of kernel memory management is a lot more patchy 
than network stacks.

One option you could investigate, which I was going to suggest in my
original reply but didn't have time to write up, is the sendfile()
API. If you write your message to a temporary file, you could call
sendfile() on all 100k connections using the same file descriptor. So,
something like:

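  import os
  import tempfile

  fd, path = tempfile.mkstemp()    # create the temporary file
  os.write(fd, message)
  os.unlink(path)                  # the open fd keeps the data alive
  for connection in biglist:       # assumes blocking sockets
    os.sendfile(connection.fileno(), fd, 0, len(message))
  os.close(fd)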

Now, as I understand it, sendfile() performs zero-copy IO: since the
contents of the file will almost certainly be in the page cache, the
kernel should in theory be able to DMA the data straight from the single
copy in RAM to the NIC buffers.

It should also handle refcounting for you: you unlink the filename
after obtaining a descriptor and close() the FD once you've called
sendfile(), and the kernel *should* in theory free the inode and the
pages containing the file data once all the TCP ACKs have been received.
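The lifetime rule behind this is easy to check in isolation. A small sketch, assuming a Unix system; `make_anonymous_file` is a hypothetical helper name, not part of any library:

```python
import os
import tempfile


def make_anonymous_file(message):
    """Create a temp file holding `message`, unlink it, and return the open fd."""
    fd, path = tempfile.mkstemp()
    # Drop the only name right away; the inode survives while `fd` is open.
    os.unlink(path)
    os.write(fd, message)
    return fd
```

After the call, os.fstat(fd).st_nlink should be 0 while os.pread(fd, len(message), 0) still returns the message. The data only goes away once the last reference is dropped, which is the property the sendfile() trick relies on.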

You'll still have to make 100k syscalls, and you may find the kernel 
chooses to copy the data anyway.

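However, AFAIK Twisted does not support sendfile(), and it can be
tricky to make it work with non-blocking IO: a non-blocking sendfile()
may send only part of the data, so you have to track an offset per
connection and retry when the socket becomes writable again.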

:o(
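To illustrate why non-blocking IO makes this awkward, here is a rough sketch outside Twisted, using Python 3's os.sendfile() and the selectors module instead of a reactor. `broadcast_file` is a hypothetical helper, and it assumes Linux, where sendfile() with an explicit offset leaves the shared fd's file position alone, so every connection can read from the same descriptor:

```python
import os
import selectors


def broadcast_file(fd, size, socks):
    """Send bytes [0, size) of the open file `fd` to every socket in `socks`.

    Sketch only: each socket is switched to non-blocking mode and keeps its
    own offset, because one sendfile() call may send just part of the data.
    """
    sel = selectors.DefaultSelector()
    offsets = {}
    for sock in socks:
        sock.setblocking(False)
        offsets[sock] = 0
        sel.register(sock, selectors.EVENT_WRITE)
    try:
        while offsets:
            for key, _ in sel.select():
                sock = key.fileobj
                done = offsets[sock]
                try:
                    # An explicit offset leaves the shared fd's position untouched.
                    sent = os.sendfile(sock.fileno(), fd, done, size - done)
                except BlockingIOError:
                    continue  # socket buffer full again; wait for the next event
                except OSError:
                    sent = 0  # e.g. EPIPE: the client went away, so drop it
                offsets[sock] = done + sent
                if sent == 0 or offsets[sock] >= size:
                    # Finished, hit end of file, or failed: stop watching it.
                    sel.unregister(sock)
                    del offsets[sock]
    finally:
        sel.close()
```

A real Twisted version would presumably have to hook the same per-connection state into the reactor's writability callbacks instead of running its own select loop.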

You may also want to look at the splice(), vmsplice() and tee() syscalls
added to recent Linux kernels. tee() in particular can copy data from one
pipe to another without consuming it, so it can be repeated as many times
as you need. It may be possible to assemble something that does this task
efficiently from those building blocks, but the APIs aren't available in
Twisted.
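As a rough illustration of how those building blocks might fit together, here is a sketch in plain Python (not Twisted). It assumes Linux and a C library exporting tee(2) and splice(2); as far as I know the os module has no tee() wrapper, so both calls are bound through ctypes. `fan_out` is a hypothetical helper: the message is written into one source pipe once, and for each connection tee() duplicates it into a per-connection pipe, from which splice() moves it into the socket. Whether the kernel really avoids copying here depends on the kernel version and socket type.

```python
import ctypes
import os

# Assumption: Linux libc exports tee(2) and splice(2); bind them via ctypes.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.tee.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_size_t, ctypes.c_uint]
_libc.tee.restype = ctypes.c_ssize_t
_libc.splice.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_int,
                         ctypes.c_void_p, ctypes.c_size_t, ctypes.c_uint]
_libc.splice.restype = ctypes.c_ssize_t


def _check(ret):
    """Turn a -1 return from libc into an OSError carrying errno."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def fan_out(message, socks):
    """Send `message` to every blocking socket in `socks` via tee() + splice()."""
    src_r, src_w = os.pipe()
    try:
        # One copy from user space into the kernel; the message must fit in
        # the pipe buffer (64 KiB by default on Linux) or this write blocks.
        os.write(src_w, message)
        for sock in socks:
            r, w = os.pipe()
            try:
                # tee() duplicates the buffered data into this connection's
                # pipe without consuming it from the source pipe.
                n = _check(_libc.tee(src_r, w, len(message), 0))
                if n != len(message):
                    raise OSError("tee() duplicated only %d bytes" % n)
                # splice() moves the data from the per-connection pipe to the socket.
                while n:
                    n -= _check(_libc.splice(r, None, sock.fileno(), None, n, 0))
            finally:
                os.close(r)
                os.close(w)
    finally:
        os.close(src_r)
        os.close(src_w)
```

This still costs several syscalls per connection (pipe(), tee(), splice(), close()), and a real version would keep one long-lived pipe per connection rather than creating a fresh one for every message.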

>> and not useful.
>
> When using VM pages (_if_ that would be possible) and thus no data duplication,
> then why not useful?

Sorry, I should have been more precise: it's probably not *often*
useful. There are not many applications where sending the same TCP
stream to that many clients at the same time is helpful. Realtime
video/audio over TCP springs to mind, and those typically need to adapt
to slow clients by dropping them to a lower rate, i.e. not the same
stream any more.

As Glyph has mentioned, encryption is also a factor in today's Internet.

I'm kind of curious about what your application is!
