[Twisted-Python] Persistence in browsers?

Tibi Dondera incoming at pronet-romania.com
Fri Feb 4 19:02:58 MST 2005


Hello,
 
I'm not sure if this is the right list, or maybe twisted-web is more
appropriate.
 
Also, I might ask a question which has already been answered many times,
but I was unable to find references.
 
Short version:
I have 10,000 - 100,000 web browsers connected to my site, and I need
to inform them __real-time__ (a maximum delay of 3-5 seconds) of an
event that happened on the server. Is Twisted the right way to go, given
that it promises asynchronous event handling?
 
Long version:
I have an information feed on a web page that must change, as stated
before, when a specific event happens on the server.
I have thought of two ways of doing this:
1. The "ask every 5 seconds approach"
Pretty obvious: the browser connects every 5 seconds and requests the
page again. However, with 10,000 clients the server soon dies, and the
5-second limit is still not respected (response times get incredibly
long when Apache is swamped with requests).
 
2. The "ask and wait for answer approach"
The basic idea is the following:
- the browser connects to the web page
- there is a JavaScript snippet in the page that reconnects in the
background (using the XMLHttpRequest object) to a special script on
the server.
- the server keeps the connection open (by literally sending spaces
once every 10-15 seconds, and sleeping in between, so as not to put too
much stress on the server either). When the event happens, the server
sends all the needed data to the client, which redisplays it (through
JavaScript).
Of course, there is the problem of Apache and its 5-minute script
execution limit (I have implemented this in PHP), but the JavaScript
code is smart enough to handle this: when a connection fails, it
reconnects and all goes well.
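Independent of the web framework, the server-side bookkeeping this approach needs boils down to a registry of the open connections plus a broadcast when the event fires. A minimal sketch of that logic (all names here are illustrative, not Twisted API):

```python
# Sketch of approach 2's server-side state: each waiting browser is
# represented by a callback that writes to its still-open connection;
# an event simply flushes data to every callback.

class EventChannel:
    """Holds one open connection per waiting client until an event fires."""

    def __init__(self):
        self.waiters = []  # one send-callback per open browser connection

    def wait(self, send):
        # Called when a browser's background XMLHttpRequest arrives;
        # 'send' writes data down that connection without closing it.
        self.waiters.append(send)

    def heartbeat(self):
        # Every 10-15 seconds, push a space so idle-connection
        # timeouts do not kill the waiting requests.
        for send in self.waiters:
            send(" ")

    def fire(self, data):
        # The event happened: answer every waiting client at once,
        # then drop them (each browser reconnects afterwards).
        waiting, self.waiters = self.waiters, []
        for send in waiting:
            send(data)


# Usage: two fake clients simply record whatever the server sends them.
received = {"a": [], "b": []}
channel = EventChannel()
channel.wait(received["a"].append)
channel.wait(received["b"].append)
channel.heartbeat()
channel.fire("price=42")
```

In Twisted this maps naturally onto the event loop: the keepalive becomes a timed call instead of a sleeping process, which is why one process can hold many more idle connections than one-process-per-request Apache/PHP can.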
 
This was a little better than the first approach, at least in the
response times, which are now consistent with the requirements. However,
a new problem arises: Apache cannot handle a very large number of open
connections at the same time (every web browser holds at least one open
connection in this scheme). By my calculations (it's pretty hard to
measure exactly, as I know of no JavaScript-enabled crawler I can drive
programmatically), the server will be completely trashed at around
300 connections.
 
The problem gets even more complicated with today's browsers: they
limit themselves to 2 concurrent connections to the same site (you may
not have noticed, but you cannot download 3 files concurrently from the
same site), and the XMLHttpRequest connections count toward this limit.
So if a client uses two of my information feeds, he will be unable to
visit the site at the same time.
I don't know if Twisted solves this last problem. If not, I'll try to
find a workaround (messing around with DNS hostnames seems like a good
idea at this point in time).
 
The question is whether Twisted can solve my problem of informing all my
clients of the event (the event will not happen concurrently for all the
clients, so server load is not an issue there; however, all the clients
will be listening concurrently for their specific event).
 
Thanks for any answer, or for any direction/pointers you can give me. I
might be totally wrong in my approach, so I'm really open to all
suggestions (except buying lots of servers to make this work, of
course).
 
Tiberiu DONDERA
 