Computers are strange places

Even ignoring FireFox's insanely slow startup speed, FireFox exhibits a behavior that on good days is amusing; on bad days annoying.

The address bar autocomplete with all of my bizilion bookmarks in its sqlite database is slower than its web based search bar autocomplete. Comparing FireFox to Chrome: the perceived difference in speed between their address bars ( Chrome's address bar uses Googgle search ) is even more stunning. Chrome autocomplete feels near instant, while FireFox sometimes goes so slowly i sometimes literally give up and do something else while waiting for it to finish thinking. ( go boot Chrome, for instance. )

Some of this is basic software design...

See full post...

Web Appliances

Just a passing thought but since so many web sites have http based apis for manipulating user data, and since the http and tcp is so ubiquitous: where are my cheap one-off hardware devices that can interface with them?

I'm thinking of standalone devices that are Tiger Electronics style cheap to do simple, relatively input free tasks, like for instance: tell me today's weather.

Sure: it makes more sense if you have a capable semi-mobile processor to let it do multiple configurable tasks ( ie. an IPhone ) -- but still -- just as there are now those little lcd picture frames, i would love a little lcd weather display, something that i can keep plugged into my stereo to stream KEXP, maybe a simple headline or RSS reader, maybe an astronomy picture of the day based picture frame.....

Am I crazy, or are there other simple web devices that would be nice to have around the home?
See full post...

What I've been working on as of late

It's probably about time that I link over to the project that I've been involved with as of late.

everMany is a small Flash gaming company, and I've been working on something called the EMP -- the everMany multi-player server.

It doesn't have much of a public face yet, but there is a news blog -- and there should be more information coming over the next several weeks as the server, the website, and the docs become ready for others to use.

Currently, the EMP is powering a new Diffusion Games Flash game called: Armor Wars -- it's a turn based card game, and the EMP client provides a lobbying framework and all the connection / communication with the EMP server.

At any rate -- that's what I've been up to lately. If you are into Flash games, especially if you like card or strategy games, be sure to check the game out ( it's free to play ).
See full post...

HMAC vs. raw SHA-1

okay- maybe this shouldn't have taken me quite so long to understand, but I've been a little bit confused about the differences between SHA-1 and HMAC.

HMAC employs a cryptographic hashing function (ex. SHA-1) but it wasn't clear to me why the cryptographic hashing function itself wasn't "good enough" -- why couldn't HMAC just be SHA-1.

SHA-1 generates a fixed size output of 20-bytes for an arbitrarily long message; but so does an HMAC when it uses SHA-1. So what's the difference?

Turns out the answer is actually relatively straightforward.
See full post...

perforce add with wildcards

i use perforce for my own source control'd backups. it supports two users for free; perfect for personal use.

it does have two drawbacks -- its directory diff tool is *horrible* ( i won't bore you with the gory details but mercurial piped through beyond compare is beautiful and light years ahead. if i didn't already have so much in perforce i *might* consider moving over to that instead. )

the point of this post though is that with perforce, in order to add a directory tree's worth of files, i keep wanting to type:
p4 add ...
but if you do you get the (unhelpful) message:
Can't add filenames with wildcards [@#%*] in them.
perforce rather than adding files recursively thinks you are trying to add a file named "...".

i don't know why this particular use case isn't handled in perforce ( though i can imagine most commands probably operate in the server's namespace so it would take some tiny bit of extra work. )

i'm used to a microsoft system called source depot in which a similar syntax does work. the command is therefore hard to unlearn. i find myself re-inventing the right command whenever i start a new sub/project.

at any rate: as a public service ;) and for future reference: on windows the equivalent is:
for /R [dir] %i in (*.*) do p4 add %i
where, optionally, "dir" is the base of the files to add.
See full post...

more quick thoughts on encoding and decoding...

looking a bit at a protocol buffer actionscript implementation and google's python implementation -- it's interesting to note that they both first generate native descriptions of the classes; then they provide generic code that walks those descriptions to de/serialize buffers.

probably no one gave it much thought ( likely the c implementation is the same ) but i wonder why they went that route.

alternatively: they could have just generated the de/serialization code directly. for python and actionscript that could be especially interesting, because you wouldn't even need the class header. the serialize could just build the class by setting attributes dynamically. i'm not convinced that a standalone description would be smaller than the generated code, and i'd bet that generated code would be much faster.

interesting side note: looks like there are two projects for actionscript on google code: the one started by an adobe evangelist has no code; the other seems to work and was the inspiration for this post; is that an indication that adobe needs be more aware of what else is going on out in the open source world?
See full post...

python find item in list

Python reminds me frequently of zork: "You are in a maze of twisty passages; You are in a twisty maze of passages."

For instance, you'd think that python would have a built-in find first item in a list function, but it doesn't. Simple searches, python tutorials, and python's own docs don't help point towards a good solution to this basic problem.

It turns out, however, that you can use a generator expression to build a succinct, speedy search. Generator expressions are one of the wonderfully useful, but hidden avenues in python -- and the technique could use a bit more advertisement.

First, take a quick look at why some of the more common search techniques fall short.

list.index(), for instance, only works for items that have a pre-defined equality function ( primitive types or custom classes )

list comprehensions ( ex. [ i for in list if ... ] ) are great but comprehensions touch every member of a list; so, not so speedy. filter() likewise has the same issue.

A google search turned up an old post: http://tomayko.com/writings/cleanest-python-find-in-list-function that people seem to reference frequently, but for me, having to define or import a function defeats the desire for simplicity.

The generator expression, however, is short and to the point:
(i for i in list if ... ).next()
Essentially, the bits in parens creates an iterator that yields control every time the if-statement succeeds; next() asks that iterator for a single value.

I wish I could claim credit for the solution, but a link from a link from a link turned up a tip from a commenter on another blog. Which proves once again, that without the web I couldn't write good software for the web. (hrmm.... )

See full post...

Memcached Lockless Queue

I'm working on a project that needs a simple server side lightweight, scalable message queue. I'm interested in Erlang and Yaws, but the learning curve seems to me incredibly high -- not only on the code implementation front, but also the tool and management front. Moreover I have clients -- in the people sense of the word -- who may not be particularly interested in learning a new language, let alone a functional language to boot.

I know there are a huge number of Java based queuing technologies out there but currently Java isn't running on the box, so there's some "management" overhead there. Enter memcached. I'm familar with it from some work I've done with google app engine, and it's already running on the server in question.

Memcached is basically just a big dictionary that can, if you want it to, span multiple machines. New entries in the dictionary can be marked to expire after a given amount of time.

There are some aspects to memcached that I find nice. First, it's dead simple to setup, configure, and manage. Second, it has clients in a large number of languages. Third, if i ever do want to play with erlang -- there's a *server* implementation called cacherl that -- in theory -- works with all existing clients. Finally, on memcached's offical FAQ there's a link to an article about how to turn that dictionary into a queue. :)

It took me a while to understand exactly how what they describe works. It's important to understand that while memcached has atomic inc/dec/set it has no atomic exchange primitive -- that makes it difficult to implement all sorts of data structures, but at any rate here's my pseudo code based on the author's description.

addmsg(newmsg):
# incr returns the new value of the counter;
# therefore the first thread that gets here sets "writerlock" to 1
while ( incr( "writerlock" ) != 1):
# if not one, then we weren't the first thread
# keep looping till the first thread finishes (below)
pass

msgtowrite= incr( "queuehead" )

# "add" stores the value "newmsg" at the key "msgtowrite"
add( msgtowrite, newmsg )

# let some other writer thread go
set( "writerlock", 0 )

getmsg():
# same lock concept as the writer side
while ( incr( "readerlock" ) != 1):
pass

msgtoread= incr("queuetail")

# optional: use "gets" to get more than one value.
# get will fail if it didnt exist
newmsg= get( msgtoread )

# optional: delete, or just let the msg expire someday.
# let some other reader thread go
set( "readerlock", 0 )

This makes total sense, and yet -- especially after having worked on two multithreaded games this year -- i feel like there must be a lockless way. In fact, I spent a large part of yesterday trying to tell myself -- it just doesn't matter -- use the locking queue, and another part telling myself -- but if lightweight is important than locking is bad. I did *try* to do other things, but really I just kept coming back to this issue again and again.

Finally, this morning the lessons of what I read elsewhere really finally sank in, particularly: Nginx and Memcached, a 400% boost! and Using Nginx, SSI and Memcache ( both highly recommended reads ).

It basically boils down to, let memcache do what memcache does best -- read and write values.

Using the same "message key equals message number" idea from the queue above, what if, on the writer side, we just "incr" the message number and wrote to that number slot. "incr" is atomic so no two messages can conflict.

On the reader side, what if we got the client application invested in the process. Make the reader start requesting at message 0, and have the reader -- not the server -- increment the "to-read" message number. If the message hasn't been written you wont find it -- but, as long as the writer doesn't fail -- you will get the message someday.

addmsg( newmsg ):
msgtowrite= incr("queuehead");
add( msgtowrite, newmsg );

readmsg( nextmsgid ):
return get( nextmsgid );

This nice and simple and has all sorts of cool properties that I was hoping to find:

  • First: it's non blocking; non-locking.
  • Second: you get a clear intention of the desire to write a message before you actually have to generate and write the message.
  • Third: a writer can reserve a sequence of messages by incrementing by the number of msgs it needs.
  • Lastly, although i didn't mention it above, the locking queue needs overflow detection for the lock counters -- since there are no locks, there's no need for overflow detection.

You can even, at the expense of increasing the client's complexity -- have the reader use 'gets' to read several messages at once and the client can track which succeeded and ask only for those messages that it missed. ( Feels like a re-invention of TCP SYNs and ACKs all over again I know. )


See full post...

blogger

i'm tired of wordpress spam, and i'm tired of wordpress re-formatting my posts.
some of that might be mitigated by moving to the latest version of wordpress,
but i've decided to give blogger a try for a while instead.

from this point on, regardless of the underlying bits, i will point dev.ionous.net to the blog. the main site will soon redirect, but the feed may be disrupted temporarily during the shift.

sorry to everyone for the trouble(!)

See full post...