Dec 01 2009
 

It seems that by default mod_wsgi uses 15 threads per daemon process. That sounded like a cool feature at the very beginning, but later it turned into a nightmare for me.

I’m working on testing a user authentication system with Cassandra as the backend. Obviously, establishing a connection to Cassandra for every request doesn’t make sense at all, so I’m trying to do some persistent connection work.

The quick and easy solution for me was to establish the connection at the class level, so at least requests handled by the same process can share the connection. So I created a class member to hold the connection, and in __init__ of every class instance I “ping” the connection with describe_keyspace (assuming it accesses metadata only, which should be fast). If anything goes wrong during the “ping”, I shut down the current connection and establish a new one.
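A minimal sketch of what that looks like, assuming a Thrift-style Cassandra client; make_client(), close_client() and the keyspace name are placeholders for whatever your setup actually uses:

    class AuthBackend(object):
        _client = None  # class-level member, shared by every request this process handles

        def __init__(self):
            cls = type(self)
            try:
                # the "ping": a cheap metadata-only call to check the connection is alive
                cls._client.describe_keyspace('UserAuth')
            except Exception:
                # missing or broken connection: drop it and open a fresh one
                if cls._client is not None:
                    close_client(cls._client)  # hypothetical helper that shuts down the transport
                cls._client = make_client()    # hypothetical helper that opens a new Thrift connection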

Too bad this doesn’t work with the default wsgi configuration: it runs in a threaded model, so by the time I close/re-open the connection for the class (process), other threads could still be using it. It seems I could only avoid this with some kind of locking mechanism, which is tough and, frankly speaking, I don’t know how to make efficient.
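For the record, the locking version would look roughly like the sketch below (same placeholder helpers as above). Every request has to take a class-level lock around the “ping”/reconnect, and strictly speaking every other use of the shared connection would need the same lock too, which is exactly the kind of overhead and complexity I’d rather avoid:

    import threading

    class AuthBackend(object):
        _client = None
        _lock = threading.Lock()  # class-level lock guarding the shared connection

        def __init__(self):
            cls = type(self)
            with cls._lock:  # serialize the "ping"/reconnect across threads
                try:
                    cls._client.describe_keyspace('UserAuth')
                except Exception:
                    if cls._client is not None:
                        close_client(cls._client)  # hypothetical helper
                    cls._client = make_client()    # hypothetical helper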

So I changed the wsgi configuration to one thread per process. By doing so I can be sure that while I’m doing the “ping” there is only one request in this class (process), so I can safely do whatever I want, since within the class (process) there is no parallel execution.
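Roughly the daemon setup I ended up with (the group name and paths are just illustrative): several processes, one thread each.

    WSGIDaemonProcess auth processes=5 threads=1
    WSGIProcessGroup auth
    WSGIScriptAlias / /srv/auth/app.wsgi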

The performance is pretty good. I will keep tuning to see how many processes I should run for the wsgi daemon, but with 5 processes I can reach 190 ms average / 250 ms maximum for 10 concurrent clients, 380/450 for 20 concurrent clients, and 550/650 for 30, which is pretty linear in terms of average response time.

It seems, to me at least, that threading is not always a great idea. I can’t tell how much performance gain it gives me, but it certainly makes things more complicated.