Ticket #2506 (closed enhancement: fixed )

Opened 2 years ago

Last modified 2 years ago

Memcached client

Reported by: therve Assigned to: therve
Type: enhancement Priority: highest
Milestone: Component: core
Keywords: Cc: manlio.perillo@gmail.com, itamarst, ralphm
Branch: branches/memcached-2506 Author: therve
Launchpad Bug:

Description

Memcached (http://www.danga.com/memcached/) is a cache server to keep objects in memory. I attach a memcache protocol implementation, with tests.

Attachments

memcache.py (8.6 kB) - added by therve 2 years ago.
memcache protocol
test_memcache.py (6.2 kB) - added by therve 2 years ago.
memcache tests
memcache_1.py (0.7 kB) - added by synapsis 2 years ago.
flow control
memcache_3.py (469 bytes) - added by synapsis 2 years ago.
error handling
client.py (2.6 kB) - added by synapsis 2 years ago.
High level support for memcache
client.2.py (2.8 kB) - added by synapsis 2 years ago.
High level support for memcache
pool.py (4.1 kB) - added by synapsis 2 years ago.
High level support for memcache with a connection pool and a deterministic backend resolver

Change History

  2007-03-06 13:20:46+00:00 changed by therve

  • attachment memcache.py added

memcache protocol

  2007-03-06 13:21:07+00:00 changed by therve

  • attachment test_memcache.py added

memcache tests

  2007-03-06 13:45:42+00:00 changed by synapsis

  • cc set to manlio.perillo@gmail.com

follow-up: ↓ 3   2007-03-06 13:47:02+00:00 changed by exarkun

Some miscellaneous thoughts on the patch:

  • Woo tests
  • No tests for what the protocol sends though
  • Some docstrings missing
  • rawDataReceived buffers input with quadratic complexity - should accumulate into a list or something
  • pickle/marshal stuff seems out of place - memcache just deals with strings, right? so the low-level protocol implementation should just deal with strings. pickle/marshal could be a level on top of the base protocol for conveniently storing python objects

in reply to: ↑ 2   2007-03-06 14:27:41+00:00 changed by therve

Replying to exarkun:

Woo tests

:D

No tests for what the protocol sends though

Allright, I'll add this.

Some docstrings missing

Mainly in the tests, AFAIK.

rawDataReceived buffers input with quadratic complexity - should accumulate into a list or something

True. Although I don't know the real impact between the two. I guess that it depends of the size of the data.

pickle/marshal stuff seems out of place - memcache just deals with strings, right? so the low-level protocol implementation should just deal with strings. pickle/marshal could be a level on top of the base protocol for conveniently storing python objects

Well the flag system is here to deal with that, so it's a bit in a protocol. I first did that because python-memcached client used that, so it was compatible with it. I'll think more about that.

follow-up: ↓ 5   2007-03-08 09:42:47+00:00 changed by synapsis

I do not know the internals of memcached, but are you sure that you can issue new requests without waiting for previous requests to be completed?

As an example see http://twistedmatrix.com/trac/browser/trunk/twisted/mail/pop3client.py

in reply to: ↑ 4 ; follow-up: ↓ 6   2007-03-08 13:26:30+00:00 changed by therve

Replying to synapsis:

I do not know the internals of memcached, but are you sure that you can issue new requests without waiting for previous requests to be completed?

No, I'm not sure about that, but I see no reason why it should happen. Until proved I guess I'll consider it works.

The code is now in branch memcached-2506.

in reply to: ↑ 5   2007-03-10 18:25:25+00:00 changed by synapsis

Replying to therve:

Replying to synapsis:

I do not know the internals of memcached, but are you sure that you can issue new requests without waiting for previous requests to be completed?

No, I'm not sure about that, but I see no reason why it should happen. Until proved I guess I'll consider it works. The code is now in branch memcached-2506.

I have played a bit with the code and it seems that there are some problems. I have attached two test script.

  2007-03-10 18:28:03+00:00 changed by synapsis

  • attachment memcache_1.py added

flow control

  2007-03-10 18:28:34+00:00 changed by synapsis

  • attachment memcache_3.py added

error handling

follow-up: ↓ 8   2007-03-10 19:09:57+00:00 changed by therve

Please use the code from the branch.

in reply to: ↑ 7   2007-03-10 19:16:47+00:00 changed by synapsis

Replying to therve:

Please use the code from the branch.

I have used the code from the branch, simply copying the module to a local directory.

  2007-03-10 20:49:43+00:00 changed by therve

Your first example is wrong (see the second append).

I've made a workaround for the second example.

follow-ups: ↓ 11 ↓ 12   2007-03-16 17:18:01+00:00 changed by therve

synapsis ? Any feedback on the last modifications ?

Thanks.

in reply to: ↑ 10 ; follow-up: ↓ 13   2007-03-18 15:51:48+00:00 changed by synapsis

Replying to therve:

synapsis ? Any feedback on the last modifications ?

Here is my suggestions:

1) Write a test memcached server, to be used in the test case.

2) Remove the serialization layer from the base class. This layer should be added in a derived class. I do not understand why there is the de-serialization code in rawDataReceived, instead of being in the get method

3) The serialization layer has no support for Unicode strings

4) The interface to memcached is too high level. The set commands does not allow the user to specify the flag (because it is used by the serialization layer).

5) The get command, in the memcached protocol, can accept more than one keys (I do not see how this can be useful, however).

6) If this is possible, I would try to add an higher level sendCmd method. with a Cmd class where to store what you need (now you use a simple list).

7) When the memcached server returns an error, you should call errback on the deferred

8) In the get method you should store the key in the deque, so that you can do an assert when the data is received to check that you got what you have asked

8) Once you fix 7), there is no more need for self._testKey(key). _set should not raise an exception

Thanks.

Thanks to you.

in reply to: ↑ 10   2007-03-18 16:12:58+00:00 changed by synapsis

Replying to therve:

synapsis ? Any feedback on the last modifications ?

Some other notes.

1) In the docstring for incr (and decr), you wrote:

Warning: if it's not an int, it will coerce the value in cache to an

int but will not fail.

However, later you coerce the value to an int, so if it not an int it *will* fail

2) I do not understand why in _incrdecr you store the key and the val in the deque, since these are never used. The same in _set.

3) In lineReceived you call int(lenght) two times when handling the VALUE response (ok, not a real problem).

4) In lineReceived you do a lot of strings comparisons. Maybe a better implementation is to do:

resp = line.split() cmd = resp[0] args = resp[1:]

fn = getattr(self, cmd) fn(*args)

Unfortunately the incr/decr response line is "not standard" (there is no response name, but only the returned value).

Regards

in reply to: ↑ 11 ; follow-up: ↓ 14   2007-03-19 10:37:38+00:00 changed by therve

Replying to synapsis:

Here is my suggestions: 1) Write a test memcached server, to be used in the test case.

I don't think that's so useful. Do you have a problem that make you think it'll help?

2) Remove the serialization layer from the base class. This layer should be added in a derived class.

OK I removed it. I'll see later where add it.

I do not understand why there is the de-serialization code in rawDataReceived, instead of being in the get method

I don't remember :). It was probably simpler this way when I wrote it. rawDataReceived is only called in the get case, so it should be the same.

3) The serialization layer has no support for Unicode strings

Well, it had, with marshal.

4) The interface to memcached is too high level. The set commands does not allow the user to specify the flag (because it is used by the serialization layer).

OK added.

5) The get command, in the memcached protocol, can accept more than one keys (I do not see how this can be useful, however).

Yes, I intentionnaly left that over. I added a comment for that.

6) If this is possible, I would try to add an higher level sendCmd method. with a Cmd class where to store what you need (now you use a simple list).

For now I leave it this way. I understand your concerns, it's just that I don't know how to do that. And the protocol is relatively simple...

7) When the memcached server returns an error, you should call errback on the deferred

Done.

8) In the get method you should store the key in the deque, so that you can do an assert when the data is received to check that you got what you have asked

There's another problem in the current logic if I do that. I'll look at this.

8) Once you fix 7), there is no more need for self._testKey(key). _set should not raise an exception

Well, it's better to check problems early. If you send a 10Mb key to be said there's an error, you want to know it early :)

1) In the docstring for incr (and decr), you wrote:

Corrected.

2) I do not understand why in _incrdecr you store the key and the val in the deque, since these are never used. The same in _set.

I thought that could be useful. It probably can be removed.

3) In lineReceived you call int(lenght) two times when handling the VALUE response (ok, not a real problem).

Corrected.

4) In lineReceived you do a lot of strings comparisons.

Yes, it's a problem, but the protocol is bad for that. There are incr/decr, but also 'NOT STORED' and 'NOT FOUND' responses. There is probably something better to do, I just don't have the good solutation right now.

Thanks a lot for the extensive review!

in reply to: ↑ 13 ; follow-up: ↓ 15   2007-03-19 12:29:48+00:00 changed by synapsis

Replying to therve:

Replying to synapsis:

Here is my suggestions: 1) Write a test memcached server, to be used in the test case.

I don't think that's so useful. Do you have a problem that make you think it'll help?

No, but it can be useful because you can exchange a fake server with a real memcached server.

2) Remove the serialization layer from the base class. This layer should be added in a derived class.

OK I removed it. I'll see later where add it.

I do not understand why there is the de-serialization code in rawDataReceived, instead of being in the get method

I don't remember :). It was probably simpler this way when I wrote it. rawDataReceived is only called in the get case, so it should be the same.

You are right. rawDataReceived seems to be the best place. However since get only supports one key, the de-serialization can be done in get.

3) The serialization layer has no support for Unicode strings

Well, it had, with marshal.

Ok, but it is not a direct support and marshal adds some extra bytes.

4) The interface to memcached is too high level. The set commands does not allow the user to specify the flag (because it is used by the serialization layer).

OK added.

5) The get command, in the memcached protocol, can accept more than one keys (I do not see how this can be useful, however).

Yes, I intentionnaly left that over. I added a comment for that.

6) If this is possible, I would try to add an higher level sendCmd method. with a Cmd class where to store what you need (now you use a simple list).

For now I leave it this way. I understand your concerns, it's just that I don't know how to do that. And the protocol is relatively simple...

I will try to write something.

7) When the memcached server returns an error, you should call errback on the deferred

Done.

8) In the get method you should store the key in the deque, so that you can do an assert when the data is received to check that you got what you have asked

There's another problem in the current logic if I do that. I'll look at this.

Where is the problem?

8) Once you fix 7), there is no more need for self._testKey(key). _set should not raise an exception

Well, it's better to check problems early. If you send a 10Mb key to be said there's an error, you want to know it early :)

Ok, but I'm not sure if raise is the best solution. Maybe an errback?

1) In the docstring for incr (and decr), you wrote:

Corrected.

Only in incr.

2) I do not understand why in _incrdecr you store the key and the val in the deque, since these are never used. The same in _set.

I thought that could be useful. It probably can be removed.

3) In lineReceived you call int(lenght) two times when handling the VALUE response (ok, not a real problem).

Corrected.

4) In lineReceived you do a lot of strings comparisons.

Yes, it's a problem, but the protocol is bad for that. There are incr/decr, but also 'NOT STORED' and 'NOT FOUND' responses. There is probably something better to do, I just don't have the good solutation right now.

Again, you are right. The memcached protocol is really bad. Why not NOT_FOUND instead of NOT FOUND (as with CLIENT_ERROR)?

Thanks a lot for the extensive review!

in reply to: ↑ 14 ; follow-up: ↓ 16   2007-03-19 14:01:31+00:00 changed by therve

Replying to synapsis:

There's another problem in the current logic if I do that. I'll look at this.

Where is the problem?

I use the length of the list to differentiate get and stat, so I have to find a new solution.

Ok, but I'm not sure if raise is the best solution. Maybe an errback?

OK, I used a defer.fail.

1) In the docstring for incr (and decr), you wrote:

Corrected.

Only in incr.

Damn. Done in decr.

in reply to: ↑ 15   2007-03-19 15:10:57+00:00 changed by synapsis

Replying to therve:

Replying to synapsis:

There's another problem in the current logic if I do that. I'll look at this.

Where is the problem?

I use the length of the list to differentiate get and stat, so I have to find a new solution.

You can put the command name in the deque. Using a class:

class Cmd(object):

def init(cmd, value, key=None):

self.cmd = cmd # The name of the command self.value = value # The full command string self.key = key # The key, if any

self.deferred = defer.Deferred()

Each method like get or set just have to build the Cmd object. Then a sendCmd method can setup the deque and the timeout.

...

  2007-04-02 12:13:28+00:00 changed by itamarst

  • cc changed from manlio.perillo@gmail.com to manlio.perillo@gmail.com, itamarst

Using marshal for serialization is a bad idea, IMO, since it's not guaranteed to work across Python versions. If someone really really needs something faster than cPickle they can use some 3rd party serialization library.

  2007-04-02 12:20:48+00:00 changed by therve

But marshal is so fast and secure against cpickle... I can default to version 0, it will be compatible with 2.3/2.4/2.5 ?

  2007-04-02 12:46:50+00:00 changed by itamarst

Why do you think marshal is any more secure? E.g. this bug is apparently still not fixed: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1122301&group_id=5470

Also, I've seen issues where new python couldn't load old .pyc, so compatibility is limited.

  2007-04-02 12:56:11+00:00 changed by therve

I never get hit by this bug. But I had a lot of cPickle segfaults :).

I guess I could remove marshal serialization, and instead add a way to plug another serializer. I'll try that.

follow-up: ↓ 22   2007-04-03 08:34:42+00:00 changed by antoine

Taking a quick look at changeset [19939], I don't understand why there's such an asymmetry between cPickle and marshal there.

  • First, if marshal is useful, why not include the marshal-based memcache protocol in Twisted, rather than in the test cases?
  • Second, the way to use a custom serializer seems a bit strange, because you both have to override customLoads and customDumps and change a flag on the class/instance.
  • Is there any reason why cPickle is hardwired and uses its own flag instead of using the same customLoads/customDumps idiom?

That said, I don't know anything about the memcache protocol, but these design points looked strange to me.

in reply to: ↑ 21   2007-04-03 09:00:05+00:00 changed by therve

Replying to antoine:

Taking a quick look at changeset [19939], I don't understand why there's such an asymmetry between cPickle and marshal there. * First, if marshal is useful, why not include the marshal-based memcache protocol in Twisted, rather than in the test cases?

Marshal is useful, but it's not compatible between different versions of Python. Thus it's risky to encourage developers to use it.

* Second, the way to use a custom serializer seems a bit strange, because you both have to override customLoads and customDumps and change a flag on the class/instance.

It's for performance reason. If I don't do that I have to call customDumps for each object to check if it's overridden or not.

* Is there any reason why cPickle is hardwired and uses its own flag instead of using the same customLoads/customDumps idiom?

Yes, it's for compatibility with python-memcached library.

follow-up: ↓ 24   2007-04-13 16:20:22+00:00 changed by synapsis

Thanks for you works!

Here are the last suggestions:

1) Instead of using a method for testing key length, you can insert the check inline:

MAX_KEY_LENGTH = 255 ... if len(key) > MAX_KEY_LENGTH:

...

2) I'm not sure about the withFlags parameter in the get method

I have attached a module with an high level support for memcache (I'm going to use it in a web application).

The code is still a draft (and without unit tests), but maybe it can be added in the memcache module.

  2007-04-13 16:21:41+00:00 changed by synapsis

  • attachment client.py added

High level support for memcache

  2007-04-13 16:24:30+00:00 changed by synapsis

  • attachment client.2.py added

High level support for memcache

in reply to: ↑ 23 ; follow-up: ↓ 25   2007-04-13 17:58:37+00:00 changed by therve

Replying to synapsis:

Thanks for you works! Here are the last suggestions: 1) Instead of using a method for testing key length, you can insert the check inline: MAX_KEY_LENGTH = 255 ... if len(key) > MAX_KEY_LENGTH: ...

Mokay. I think I used a method because I needed to do it at 2 places.

2) I'm not sure about the withFlags parameter in the get method

I need it to build the serializer.

I have attached a module with an high level support for memcache (I'm going to use it in a web application). The code is still a draft (and without unit tests), but maybe it can be added in the memcache module.

That could be interesting. Of course it must have tests to be accepted :). Support for multiple cache backends would be great too.

in reply to: ↑ 24 ; follow-up: ↓ 26   2007-04-13 20:25:51+00:00 changed by synapsis

Replying to therve:

Replying to synapsis:

Thanks for you works!

...

2) I'm not sure about the withFlags parameter in the get method

I need it to build the serializer.

Yes, but I think that get should always return the tuple (value, flags).

I have attached a module with an high level support for memcache (I'm going to use it in a web application). The code is still a draft (and without unit tests), but maybe it can be added in the memcache module.

That could be interesting. Of course it must have tests to be accepted :).

Of course.

I'm hoping for a review; I'll add a proper patch later.

Support for multiple cache backends would be great too.

It should not be so hard to add a connection pool (N connection to the same backend). However I do not think this is really needed.

I'm not sure that having N connections to N different backends is a great idea.

in reply to: ↑ 25 ; follow-up: ↓ 27   2007-04-14 07:01:17+00:00 changed by therve

Replying to synapsis:

Yes, but I think that get should always return the tuple (value, flags).

That seems good, done.

It should not be so hard to add a connection pool (N connection to the same backend). However I do not think this is really needed.

No that's not what I thought.

I'm not sure that having N connections to N different backends is a great idea.

Well, with this you can do replications, you're not tied up to 4GB (if you use 32 bits), etc...

in reply to: ↑ 26 ; follow-up: ↓ 28   2007-04-14 09:09:51+00:00 changed by synapsis

Replying to therve:

I'm not sure that having N connections to N different backends is a great idea.

Well, with this you can do replications, you're not tied up to 4GB (if you use 32 bits), etc...

I'm not an expert in caching problems, however I think that memcached should be used only as a cache and not as a database.

I you want to use more backend, you should use different memcached clients.

If we add support for a connection pool to N different backend, how should we choose a backend for a request?

The simple answer is "at random", and this means that the N backends will end up containing the same data.

in reply to: ↑ 27 ; follow-up: ↓ 29   2007-04-14 19:15:04+00:00 changed by therve

Replying to synapsis:

I'm not an expert in caching problems, however I think that memcached should be used only as a cache and not as a database.

True. That's not what I meant though :).

I you want to use more backend, you should use different memcached clients. If we add support for a connection pool to N different backend, how should we choose a backend for a request? The simple answer is "at random", and this means that the N backends will end up containing the same data.

There are more deterministic way to do that. For example search for libketama. Also, look here http://www.danga.com/memcached/apis.bml :

Be aware that the most important part of the client is the hashing across
multiple servers, based on the key, or an optional caller-provided
hashing value

  2007-04-20 19:20:29+00:00 changed by synapsis

  • attachment pool.py added

High level support for memcache with a connection pool and a deterministic backend resolver

in reply to: ↑ 28   2007-04-20 19:25:07+00:00 changed by synapsis

Replying to therve:

I you want to use more backend, you should use different memcached clients. If we add support for a connection pool to N different backend, how should we choose a backend for a request? The simple answer is "at random", and this means that the N backends will end up containing the same data.

There are more deterministic way to do that. For example search for libketama. Also, look here http://www.danga.com/memcached/apis.bml :

I have added a draft implementation for a connection pool.

The connection pool uses a pluggable resolver. A resolver associate a key with a memcache backend address.

The default resolver uses an implementation like the one used by Django for URL resolver.

  2007-10-19 09:20:10+00:00 changed by ralphm

  • cc changed from manlio.perillo@gmail.com, itamarst to manlio.perillo@gmail.com, itamarst, ralphm

Reading the above it seems to me that the review of the original work on the memcached client implementation was as good as done and then the discussion went off on a tangent about connection pools. Could we defer that to another ticket and have the basics wrapped up and merged ?

  2007-10-19 10:03:34+00:00 changed by therve

  • keywords set to review
  • owner changed from therve to ralphm
  • priority changed from low to highest

The code hasn't really passed the review process. Are you willing to review it? The pool functionality is committed, so it could be good to review it too. If it's too soon, we can merge the first part and open a ticket for it.

  2007-11-06 22:32:19+00:00 changed by exarkun

  • keywords deleted
  • owner changed from ralphm to therve
  • MemCacheProtocol.connectionMade has an untested line. The overall utility of this method seems questionable to me. I'd just drop it.
  • lineReceived could definitely be improved. There are two cases to consider, I think. A line begins with a command which is a sequence of non-space bytes followed by a space. If you split on space and the first token is in the set VALUE, STAT, VERSION, CLIENT_ERROR, or SERVER_ERROR, then you can dispatch to a method (for example, try getattr(self, 'cmd_' + token) and call it if it works). If that fails, then there's the second case: the line is entirely a command. For this, you can replace space with _ and then try a similar getattr call. Then all the cases can be implemented as separate methods, which I suppose is a good thing (for clarity if nothing else).
  • MemCacheClient doesn't seem to provide much utility. I'd drop it and let people make their own client factories.
  • From reading the discussion, I think MemCacheSerializeProtocol intends to be compatible with another Python memcache client library. This should be reflected in the code somewhere. Unless it's not true. :) Is it?
  • MemCacheSerializeClient seems misplaced. It's implementing a particular policy, but it's not clear why I would want that policy, nor why it is particularly appropriate for memcache rather than for client protocols in general. Also it's untested.
  • Similar comment for MemCacheAdvancedClient, although I can sort of see how I might use this class. It seems like a bad API though - I wouldn't want arbitrary individual calls to be able to fail like this, I'll have to litter my code with reconnection logic.
  • Similar comment for CacheWrapper.
  • The interactive test code at the end of the module should go.

It does seem like splitting off everything that isn't core memcache protocol functionality should be separated out somehow. I'm not sure it makes sense to have it in twisted.protocols.memcache (although there isn't a convenient twisted.someplaceelse for it either), but considering that separately from the protocol code does make sense.

  2007-11-07 09:37:51+00:00 changed by therve

(In [21619]) Process (some) review comments.

Refs #2506

  2007-11-07 09:46:46+00:00 changed by therve

(In [21620]) Remove dubious stuff, some cleanups.

Refs #2506

  2007-11-07 09:54:46+00:00 changed by therve

  • keywords set to review
  • owner deleted

OK, I dropped useless things, and just keep the protocol. We'll see later what we can add. Back to review.

  2007-11-07 17:14:56+00:00 changed by exarkun

  • keywords deleted
  • owner set to therve

The test docstrings could use some work. For example, instead of Test a standard get call. I would say L{memcache.MemCacheProtocol.get} should return a L{Deferred} which is called back with the value associated with the given key if the server returns a successful result.

Is test_bigSet testing a particular code path or potential bug? If not, is it really a different test from test_strSet?

test_timeOut and test_tooLongKey don't return the Deferreds they need to (maybe you fixed this already :).

test_unicodeKey seems weird. I don't think that behavior is worth asserting about, particularly since if you pass u"foo" you'll get a TypeError? instead (I think).

MemCacheProtocol.rawDataReceived disables the timeout even if the entire result hasn't been received yet.

cmd_VALUE has an assert in it, it shouldn't. If this is an unrecoverable error, there should be an exception for that, and it should be tested.

There's no test coverage for the success case for pipelining (although test_timeOut provides a little coverage of the timeout case for pipelining).

  2007-11-07 21:24:55+00:00 changed by therve

(In [21632]) Fix timeout management, docs.

Refs #2506

  2007-11-07 22:10:25+00:00 changed by therve

(In [21634]) Doc for tests.

Refs #2506

  2007-11-08 10:48:17+00:00 changed by therve

(In [21646]) Documentation.

Refs #2506

  2007-11-08 10:55:03+00:00 changed by therve

(In [21647]) Add tests for pipeling support.

Refs #2506

  2007-11-08 10:55:47+00:00 changed by therve

  • keywords set to review
  • owner deleted

OK that should be better, back to review.

  2007-11-08 22:59:38+00:00 changed by exarkun

  • keywords deleted
  • owner set to therve
  • Some docstring issues:
    • test_errorDelete has a thinko in its docstring - ... a B{NOT FOUND} answer with should callback ....
    • spelling error in test_timeOutRaw docstring - resetted, should just be reset.
  • Some timeout issues:
    • a more idiomatic usage of TimeoutMixin? would probably be to set the timeoutPeriod attribute in __init__, then call resetTimeout with no arguments elsewhere. It may not really make any difference, so if you don't want to change it, that's fine.
    • in lineReceived, the code calls setTimeout(None), which cancels the current timeout, but will schedule a new timeout. This doesn't seem like desirable behavior, since it will leave a timeout counting down even if there are no requests pending. However...
    • when you fix this, be careful to address the pipelining case as well; a response to a first request shouldn't prevent a second request from timing out.
    • speaking of which, there's no test coverage for timeouts in the pipelined case.
    • also the behavior of the pipelined timeout case should have some documentation, I think. if I:
      • set the timeout to 2*N seconds
      • send a request
      • wait N seconds
      • send a request
      • wait N seconds

will the first request have timed out if it hasn't been responded to yet?

  • it might be simplest to put the setTimeout/resetTimeout call into dataReceived instead of having a call in each of rawDataReceived and lineReceived.

Some protocol methods are named with complete words - eg, flushAll or version; others are abbreviated, eg incr and decr. This seems to follow the wire format of the protocol. It would probably be friendly to programmers, particularly those who are not familiar with the wire protocol, if all the methods were full words, regardless of the underlying format.

  2007-11-09 10:25:18+00:00 changed by therve

(In [21672]) Process review comments.

Refs #2506

  2007-11-09 10:26:51+00:00 changed by therve

  • keywords set to review
  • owner deleted

Thanks a lot for your explanations on timeouts, I was a bit careless when I did this, it should be better now.

Back to review.

  2007-11-12 21:17:32+00:00 changed by exarkun

  • keywords deleted
  • owner set to therve
  • branch deleted
  • The unicode handling still seems a bit unusual to me (sorry I didn't mention this again in my previous review). It doesn't seem to me as though proto.get(u"foo") should behave differently from proto.get(u"fóó"). This might be called a "data driven" failure, where different values, even of the same type, cause different behaviors (in one case, success, in another, an error). Consider an American developer who never encounters non-ASCII during development, but writes code which uses unicode strings (because that's how they come out of his database API, perhaps). When he sends his code to someone who actually has some non-ASCII data, it will suddenly fail. Not sure if this is a scenario you're already aware of and have already considered or not, I wanted to point it out just in case it isn't and you also think this would be a bad behavior to offer. Another point related to this part of the code, but not necessarily unicode, is that key = str(key) means get and set will accept, for example, tuples of integers as keys. Not sure if this is desirable. :)
  • DEFAULT_PORT isn't in __all__ but the module docstring shows it being imported, and nothing else in the module refers to it.
  • The exception classes also aren't in __all__.
  • I suspect MemCacheProtocol? should be classic instead of new-style
  • MemCacheProtocol? should document its private instance attributes too
  • the TimeoutError? timeoutConnection creates should be created with some argument so that a developer could have some hope of knowing where it came from.
  • I deleted the + 2 from the if statement on line 161 and none of the tests failed.
  • memcached seems to misimplement increment and decrement, but I guess that is not MemCacheProtocol?'s fault :)

  2007-11-13 11:00:02+00:00 changed by therve

(In [21726]) Process review comments.

Refs #2506

  2007-11-13 11:06:04+00:00 changed by therve

  • keywords set to review
  • owner deleted
  • branch set to branches/memcached-2506

So I enforced the type of key to be str, it totally makes sense. I've also done the other points, back to review.

  2007-11-14 22:47:38+00:00 changed by jml

  • owner set to exarkun

Makes sense for exarkun to review this, as he has the most context.

follow-up: ↓ 51   2007-11-19 15:35:09+00:00 changed by exarkun

  • keywords deleted
  • owner changed from exarkun to therve
  • author deleted

The documentation for persistentTimeOut in MemCacheProtocol's docstring is missing a word: the timeout used to wait a response

timeoutConnection's TimeoutError still doesn't have a message.

        self._lenExpected = 0
        self._getBuffer = []
        self._bufferLength = 0

is duplicated between MemCacheProtocol.__init__ and MemCacheProtocol.rawDataReceived, but as far as I can tell neither is necessary - cmd_VALUE will set up the state whenever it is necessary. It's not particularly bad to do this extra stuff, but maybe it'd be better just to set them to None and let cmd_VALUE set up the real state.

Three cmd_ methods do this:

self._current.popleft().success(True)

and two do this:

self._current.popleft().success(False)

I'm tempted to suggest that some refactoring might be nice (eg, cmd_DELETED = cmd_STORED or cmd_DELETED = makeSimpleSuccess("cmd_DELETED docstring")) but maybe there's not really any point.

The various methods inconsistently enforce the str requirement:

  • set requires the key be a str, but for the value takes any object for which str() will return a string.
  • delete will take anything for which "%s".__mod__ returns a str and will fail with whatever the underlying transport.write fails with otherwise

It might be worth being consistent with ClientError for these cases as well. It's almost entirely consistency that I'm thinking about here, I don't particularly think all these checks need to happen, but if set fails with ClientError in a case, delete probably should too.

It might be worth thinking about whether these type-check failures need to be Deferreds, too. They should always be immediately determinable, and generally will indicate a bug in the calling code. On the other hand, even if it's a bug, there's some value in only presenting one error path.

This review looks kinda long but I don't think it actually is, this code looks pretty close to ready to merge.

  2007-11-19 16:11:24+00:00 changed by therve

(In [21878]) Process (most) review comments.

Refs #2506

in reply to: ↑ 49   2007-11-19 16:20:09+00:00 changed by therve

  • keywords set to review
  • owner deleted
  • author set to therve

Replying to exarkun:

timeoutConnection's TimeoutError still doesn't have a message.

Sorry about that, done.

I'm tempted to suggest that some refactoring might be nice (eg, cmd_DELETED = cmd_STORED or cmd_DELETED = makeSimpleSuccess("cmd_DELETED docstring")) but maybe there's not really any point.

I don't think it's justified here. Maybe I could find a way to make it without docstring, but after writing them it's a pity.

The various methods inconsistently enforce the str requirement:

Yes, I'm very tempted to remove any restriction, consenting adult etc...

It might be worth thinking about whether these type-check failures need to be Deferreds, too. They should always be immediately determinable, and generally will indicate a bug in the calling code. On the other hand, even if it's a bug, there's some value in only presenting one error path.

I prefer the failure path, because it's easier to manage, especially if there are many levels of indirections.

  2007-11-20 16:17:57+00:00 changed by exarkun

  • keywords deleted
  • owner set to therve

Looks good to me.

  2007-11-20 16:25:17+00:00 changed by therve

  • status changed from new to closed
  • resolution set to fixed

(In [21889]) Merge memcached-2506

Author: therve Reviewers: exarkun, synapsis, itamarst Fixes #2506

Add a memcache client protocol implementation.

Note: See TracTickets for help on using tickets.