more quick thoughts on encoding and decoding...

looking a bit at a protocol buffer actionscript implementation and google's python implementation -- it's interesting to note that they both first generate native descriptions of the classes; then they provide generic code that walks those descriptions to de/serialize buffers.

probably no one gave it much thought ( likely the c implementation is the same ) but i wonder why they went that route.

alternatively: they could have just generated the de/serialization code directly. for python and actionscript that could be especially interesting, because you wouldn't even need the class header. the serialize could just build the class by setting attributes dynamically. i'm not convinced that a standalone description would be smaller than the generated code, and i'd bet that generated code would be much faster.

interesting side note: looks like there are two projects for actionscript on google code: the one started by an adobe evangelist has no code; the other seems to work and was the inspiration for this post; is that an indication that adobe needs be more aware of what else is going on out in the open source world?
See full post...

python find item in list

Python reminds me frequently of zork: "You are in a maze of twisty passages; You are in a twisty maze of passages."

For instance, you'd think that python would have a built-in find first item in a list function, but it doesn't. Simple searches, python tutorials, and python's own docs don't help point towards a good solution to this basic problem.

It turns out, however, that you can use a generator expression to build a succinct, speedy search. Generator expressions are one of the wonderfully useful, but hidden avenues in python -- and the technique could use a bit more advertisement.

First, take a quick look at why some of the more common search techniques fall short.

list.index(), for instance, only works for items that have a pre-defined equality function ( primitive types or custom classes )

list comprehensions ( ex. [ i for in list if ... ] ) are great but comprehensions touch every member of a list; so, not so speedy. filter() likewise has the same issue.

A google search turned up an old post: that people seem to reference frequently, but for me, having to define or import a function defeats the desire for simplicity.

The generator expression, however, is short and to the point:
(i for i in list if ... ).next()
Essentially, the bits in parens creates an iterator that yields control every time the if-statement succeeds; next() asks that iterator for a single value.

I wish I could claim credit for the solution, but a link from a link from a link turned up a tip from a commenter on another blog. Which proves once again, that without the web I couldn't write good software for the web. (hrmm.... )

See full post...

Memcached Lockless Queue

I'm working on a project that needs a simple server side lightweight, scalable message queue. I'm interested in Erlang and Yaws, but the learning curve seems to me incredibly high -- not only on the code implementation front, but also the tool and management front. Moreover I have clients -- in the people sense of the word -- who may not be particularly interested in learning a new language, let alone a functional language to boot.

I know there are a huge number of Java based queuing technologies out there but currently Java isn't running on the box, so there's some "management" overhead there. Enter memcached. I'm familar with it from some work I've done with google app engine, and it's already running on the server in question.

Memcached is basically just a big dictionary that can, if you want it to, span multiple machines. New entries in the dictionary can be marked to expire after a given amount of time.

There are some aspects to memcached that I find nice. First, it's dead simple to setup, configure, and manage. Second, it has clients in a large number of languages. Third, if i ever do want to play with erlang -- there's a *server* implementation called cacherl that -- in theory -- works with all existing clients. Finally, on memcached's offical FAQ there's a link to an article about how to turn that dictionary into a queue. :)

It took me a while to understand exactly how what they describe works. It's important to understand that while memcached has atomic inc/dec/set it has no atomic exchange primitive -- that makes it difficult to implement all sorts of data structures, but at any rate here's my pseudo code based on the author's description.

# incr returns the new value of the counter;
# therefore the first thread that gets here sets "writerlock" to 1
while ( incr( "writerlock" ) != 1):
# if not one, then we weren't the first thread
# keep looping till the first thread finishes (below)

msgtowrite= incr( "queuehead" )

# "add" stores the value "newmsg" at the key "msgtowrite"
add( msgtowrite, newmsg )

# let some other writer thread go
set( "writerlock", 0 )

# same lock concept as the writer side
while ( incr( "readerlock" ) != 1):

msgtoread= incr("queuetail")

# optional: use "gets" to get more than one value.
# get will fail if it didnt exist
newmsg= get( msgtoread )

# optional: delete, or just let the msg expire someday.
# let some other reader thread go
set( "readerlock", 0 )

This makes total sense, and yet -- especially after having worked on two multithreaded games this year -- i feel like there must be a lockless way. In fact, I spent a large part of yesterday trying to tell myself -- it just doesn't matter -- use the locking queue, and another part telling myself -- but if lightweight is important than locking is bad. I did *try* to do other things, but really I just kept coming back to this issue again and again.

Finally, this morning the lessons of what I read elsewhere really finally sank in, particularly: Nginx and Memcached, a 400% boost! and Using Nginx, SSI and Memcache ( both highly recommended reads ).

It basically boils down to, let memcache do what memcache does best -- read and write values.

Using the same "message key equals message number" idea from the queue above, what if, on the writer side, we just "incr" the message number and wrote to that number slot. "incr" is atomic so no two messages can conflict.

On the reader side, what if we got the client application invested in the process. Make the reader start requesting at message 0, and have the reader -- not the server -- increment the "to-read" message number. If the message hasn't been written you wont find it -- but, as long as the writer doesn't fail -- you will get the message someday.

addmsg( newmsg ):
msgtowrite= incr("queuehead");
add( msgtowrite, newmsg );

readmsg( nextmsgid ):
return get( nextmsgid );

This nice and simple and has all sorts of cool properties that I was hoping to find:

  • First: it's non blocking; non-locking.
  • Second: you get a clear intention of the desire to write a message before you actually have to generate and write the message.
  • Third: a writer can reserve a sequence of messages by incrementing by the number of msgs it needs.
  • Lastly, although i didn't mention it above, the locking queue needs overflow detection for the lock counters -- since there are no locks, there's no need for overflow detection.

You can even, at the expense of increasing the client's complexity -- have the reader use 'gets' to read several messages at once and the client can track which succeeded and ask only for those messages that it missed. ( Feels like a re-invention of TCP SYNs and ACKs all over again I know. )

See full post...


i'm tired of wordpress spam, and i'm tired of wordpress re-formatting my posts.
some of that might be mitigated by moving to the latest version of wordpress,
but i've decided to give blogger a try for a while instead.

from this point on, regardless of the underlying bits, i will point to the blog. the main site will soon redirect, but the feed may be disrupted temporarily during the shift.

sorry to everyone for the trouble(!)

See full post...