[Twisted-Python] Re: Send many large files with PB

Brian Warner warner at lothar.com
Mon May 22 00:30:32 MDT 2006


> Justin Mazzola Paluska:
>> How bad is the slow down?  Or, to ask the question another way, how
>> much CPU will the process actually take?  
>
> 100% CPU for all the time it takes. Serialization is CPU- and
> memory-intensive.

Well, the memory footprint is equal to the size of the chunk you're paging.
That's the whole point of using FilePager: it keeps the chunk size small. (I
think the instantaneous footprint is 2x the chunk size while a chunk is being
serialized, but it drops back to close to 1x once the chunk has been written
out.)

Serializing strings is equivalent to copying them. The banana format for
strings is just a couple of length bytes followed by the string contents, so
there's not a whole lot of complex CPU stuff going on, just strcpy.
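
For illustration, here's roughly what that framing looks like. This is a
minimal sketch, not the real encoder; the 0x82 type byte and the base-128
length prefix are from twisted.spread.banana:

    STRING = 0x82  # banana's type byte for a string element

    def encode_string(data):
        # Length first, base 128, low digits first (banana's int2b128),
        # then the type byte, then the raw bytes.
        header = bytearray()
        n = len(data)
        while n:
            header.append(n & 0x7f)
            n >>= 7
        return bytes(header or b'\x00') + bytes([STRING]) + data

    # encode_string(b'hello') == b'\x05\x82hello' -- the only per-byte
    # work is the final copy.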

The other advantage of FilePager is that the serialization CPU time is spread
out according to how fast the network is. I suspect that in most environments
the transfer will be I/O-limited, so CPU usage would stay far below 100%. That
said, it *is* less efficient than an HTTP server that can just dump the file
straight to the network (or, better yet, use something like sendfile() to
avoid copying the data through userspace altogether).
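
For reference, a minimal sketch of the FilePager approach. The method and
file names are made up, and the FilePager/getAllPages signatures are from
twisted.spread.util as I remember them, so check them against your Twisted
version:

    from twisted.protocols.basic import FileSender
    from twisted.spread import pb
    from twisted.spread.util import FilePager, getAllPages

    class FileServer(pb.Root):
        def remote_fetch(self, collector, name):
            # FilePager feeds the remote collector one chunk at a time,
            # so only one chunk is serialized (and in memory) at once.
            # Real code would validate 'name' before opening it.
            fd = open(name, 'rb')
            FilePager(collector, fd, FileSender(), callback=fd.close)

    # Client side: getAllPages passes a page collector as the first
    # remote argument and fires with the list of chunks once the last
    # page arrives.  (This buffers the whole file in memory; a real
    # consumer would write each page to disk as it arrives.)
    def fetch(root, name):
        d = getAllPages(root, 'fetch', name)
        d.addCallback(lambda pages: b''.join(pages))
        return d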

>> Or is the best thing to do just use the PB to send a URL to the DEST
>> server?
>
> That's what I was hinting at, yes. Of course you should separately take
> care of any required authentication, authorization and encryption on the
> HTTP connection.

Yup, and if you add too much of that back, you're heading toward PB-like
performance again. Creating a random, unguessable URL that allows only a
single download of the target file basically gives you the
authentication/authorization features (modulo a man-in-the-middle attack),
but no confidentiality. For some applications that might be enough, though.
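
In case it helps, here's a sketch of the one-shot-URL idea with a reasonably
modern twisted.web (OneShotFile and publishOnce are my names, not Twisted's):

    import os
    from binascii import hexlify

    from twisted.web import static

    class OneShotFile(static.File):
        def __init__(self, path, parent, token):
            static.File.__init__(self, path)
            self._parent, self._token = parent, token

        def render_GET(self, request):
            # Forget the token before streaming, so a second fetch
            # gets a 404.
            self._parent.delEntity(self._token)
            return static.File.render_GET(self, request)

    def publishOnce(root, path):
        token = hexlify(os.urandom(16))  # 128 unguessable bits
        root.putChild(token, OneShotFile(path, root, token))
        return token  # the client fetches http://host/<token>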

 -Brian



