Computers are strange places

Even ignoring Firefox's insanely slow startup speed, Firefox exhibits a behavior that on good days is amusing and on bad days annoying.

The address bar autocomplete, with all of my bazillion bookmarks in its sqlite database, is slower than its web-based search bar autocomplete. Comparing Firefox to Chrome: the perceived difference in speed between their address bars ( Chrome's address bar uses Google search ) is even more stunning. Chrome's autocomplete feels near instant, while Firefox sometimes goes so slowly that I literally give up and do something else while waiting for it to finish thinking. ( go boot Chrome, for instance. )

Some of this is basic software design...

It's always a bad idea to do synchronous tasks while a user provides input -- nothing should interrupt my typing to wait on the autocomplete. Most likely, a separate thread or process should be off running the local db search, and either the input thread or a display thread should be polling for results, handling them as they come in ( perhaps with a quick tail filter to throw out results that no longer match my current input ).
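A minimal Python sketch of that shape, not a description of how Firefox actually works: a worker thread runs the lookups, and the input side polls without blocking and applies the tail filter. The BOOKMARKS list and both function names are made up for illustration.

```python
import queue
import threading

# Hypothetical stand-in for the local bookmark database.
BOOKMARKS = ["kexp.org", "kernel.org", "python.org", "perforce.com"]

def search_worker(requests, results):
    """Run lookups off the input thread; None is the shutdown signal."""
    while True:
        prefix = requests.get()
        if prefix is None:
            break
        matches = [b for b in BOOKMARKS if b.startswith(prefix)]
        results.put((prefix, matches))

def fresh_matches(results, current_input):
    """Poll without blocking; keep only matches that still fit the current input."""
    fresh = []
    while True:
        try:
            _prefix, matches = results.get_nowait()
        except queue.Empty:
            return fresh
        # Tail filter: the user may have typed more since this lookup started.
        fresh = [m for m in matches if m.startswith(current_input)]

requests, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=search_worker, args=(requests, results), daemon=True)
worker.start()

requests.put("k")    # each keypress queues a lookup and returns immediately
requests.put("ke")
requests.put(None)
worker.join()        # only so this demo is deterministic

shown = fresh_matches(results, "ker")   # by now the user has typed "ker"
print(shown)
```

In a real UI the join() would go away and fresh_matches() would run on every repaint or timer tick.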

That said: while of course search providers have spent a lot more time on autocomplete than the Mozilla people have -- no doubt doing typing prediction behind the scenes as well as just simple search -- there's something else going on beyond the algorithmic code design.

It appears faster to send packets out to the internet keypress by keypress -- faster to use a remote machine to look up what I'm typing -- than it is to spin up my own machine's hard drive, seek out, swap in, and search an sqlite database.

To say that one more time: accessing a remote machine is faster than accessing my local machine.

Maybe someone out there has some numbers about what kind of concrete difference there is -- not just raw speed, but also latency. So much, no doubt, depends on the size of the data set coming back, but it does really illustrate -- to me at least -- why cloud computing is so powerful and why remote applications will continue to grow in scope.

Wherefore art thou VT100?


Web Appliances

Just a passing thought, but since so many web sites have HTTP-based APIs for manipulating user data, and since HTTP and TCP are so ubiquitous: where are my cheap one-off hardware devices that can interface with them?

I'm thinking of standalone devices that are Tiger Electronics style cheap and do simple, relatively input-free tasks, like for instance: tell me today's weather.

Sure: it makes more sense if you have a capable semi-mobile processor to let it do multiple configurable tasks ( e.g. an iPhone ) -- but still -- just as there are now those little lcd picture frames, I would love a little lcd weather display, something that I can keep plugged into my stereo to stream KEXP, maybe a simple headline or RSS reader, maybe an astronomy picture of the day based picture frame...

Am I crazy, or are there other simple web devices that would be nice to have around the home?

What I've been working on as of late

It's probably about time that I link over to the project that I've been involved with as of late.

everMany is a small Flash gaming company, and I've been working on something called the EMP -- the everMany multi-player server.

It doesn't have much of a public face yet, but there is a news blog -- and there should be more information coming over the next several weeks as the server, the website, and the docs become ready for others to use.

Currently, the EMP is powering a new Diffusion Games Flash game called Armor Wars -- it's a turn-based card game, and the EMP client provides a lobby framework and all the connection / communication with the EMP server.

At any rate -- that's what I've been up to lately. If you are into Flash games, especially if you like card or strategy games, be sure to check the game out ( it's free to play ).

HMAC vs. raw SHA-1

Okay -- maybe this shouldn't have taken me quite so long to understand, but I've been a little bit confused about the differences between SHA-1 and HMAC.

HMAC employs a cryptographic hashing function (e.g. SHA-1), but it wasn't clear to me why the cryptographic hashing function itself wasn't "good enough" -- why couldn't HMAC just be SHA-1?

SHA-1 generates a fixed size output of 20-bytes for an arbitrarily long message; but so does an HMAC when it uses SHA-1. So what's the difference?

Turns out the answer is actually relatively straightforward.

For the sake of explanation, assume that you want to declare your undying love to someone you've been dating. You'd love to come up with a beautiful sonnet, but in the end you decide that simply saying "i love you" is enough.

You want the message to arrive intact and unaltered, but you don't care if the contents of the message itself are known to the world. Knowing a little about cryptographic hashes, you generate a digest from your message using SHA-1.

That message results in: 'bb7b1901d99e8b26bb91d2debdb7d7f24b3158cf'.

On receipt of the message, your would-be-love recomputes the SHA-1 from the message and compares the computed digest to the sent digest. They match and all seems well.

A sinister rival, however, has other plans. They intercept your message and replace it with another: "don't call me anymore". They then generate a brand new digest, 'e267e18f05cb6ea3b10b761bbac21a0f92bb8d0d', and replace your original digest with it. On receipt your love reads the message in disbelief, quickly calculating the hash to make sure the message hasn't been altered. But the hash itself has been changed, so the altered hash matches the altered message and chaos ensues.
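A small Python sketch of why the bare digest protects nothing, using the standard hashlib module. The helper names are illustrative, and since the exact bytes hashed in this post aren't recorded, the digests this computes may not match the ones quoted above.

```python
import hashlib

def sha1_hex(message: bytes) -> str:
    """Plain, unkeyed SHA-1 digest as a hex string."""
    return hashlib.sha1(message).hexdigest()

def verify(message: bytes, digest: str) -> bool:
    """What the recipient does: recompute and compare."""
    return sha1_hex(message) == digest

original = b"i love you"
sent_digest = sha1_hex(original)

# The rival needs no secret at all to produce a matching pair.
forged = b"don't call me anymore"
forged_digest = sha1_hex(forged)

honest_ok = verify(original, sent_digest)
swap_ok = verify(forged, forged_digest)      # passes: message and digest were both replaced
mismatch_ok = verify(forged, sent_digest)    # fails only if the rival forgets the digest
print(honest_ok, swap_ok, mismatch_ok)
```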

Things look grim, but you explain to your would-be-love what's happened, and they decide to give you another chance. So that this doesn't happen again, you tell your lover that from now on, whenever they get a message from you, they should prepend the text "our secret key." before computing the hash, and you will do the same.

This time that same message generates the digest '8a2c1bfa977478f73dbfab8508bc09360b20b569'.

Simply replacing the digest doesn't work anymore. If a naive attacker still attempts to use the 'e267e18f...' digest, your lover would see that the key + the message doesn't compute. You don't send the key in the message itself, and no one else knows your secret key, so no one can generate a digest for a fake message.
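The secret-prefix scheme can be sketched in Python like this. It assumes the key is simply concatenated in front of the message bytes; keyed_digest and verify are illustrative names, not a library API.

```python
import hashlib

SECRET = b"our secret key."   # shared out of band, never sent with the message

def keyed_digest(message: bytes, key: bytes = SECRET) -> str:
    """SHA-1 over key + message (the naive scheme, not HMAC)."""
    return hashlib.sha1(key + message).hexdigest()

def verify(message: bytes, digest: str, key: bytes = SECRET) -> bool:
    return keyed_digest(message, key) == digest

sent = b"i love you"
sent_digest = keyed_digest(sent)

# The rival can still hash the replacement text, but not with the key.
forged = b"don't call me anymore"
rival_digest = hashlib.sha1(forged).hexdigest()

genuine_ok = verify(sent, sent_digest)
forged_ok = verify(forged, rival_digest)
print(genuine_ok, forged_ok)
```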

There is however a problem still, and the problem is the reason for the difference between SHA-1 and HMAC.

SHA-1 uses an iterative algorithm. It generates digests by first splitting a message into blocks of 64 bytes and then, one after the other, folding those blocks into a running 20-byte state. The digest is simply that state after the last block. Since your message can be of any length, anyone holding the digest can pick up where you left off and keep hashing more blocks.

Your rival, trying once again to subvert your message, could just tack additional data onto your message, and this time use the digest in your message as the seed to generate their own new digest of the longer message. They don't need your secret key because the key was already embedded in the blocks that you built. They can't alter what you've written, but they can add more. Your lack of punctuation has in fact made this even easier.

By simply adding "but please don't call me anymore" and updating the digest to '725fbcbd1e94d03c2e54b01da3944c6385d17e4d', your love will think the entire message is from you even though only the first part was -- and doubly so because of the secret key.

Good bye romance.
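Here is a sketch of the extension trick in Python. hashlib doesn't expose SHA-1's internal state, so this carries its own compression function; the function names are mine. Two caveats: the rival has to guess the key's length (trying a range of lengths works), and the forged message also contains SHA-1's padding bytes between the original text and the addition, so it isn't quite as clean as the story makes it sound.

```python
import hashlib
import struct

def _rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_padding(length):
    """Padding SHA-1 appends to a message of `length` bytes."""
    zeros = b"\x00" * ((55 - length) % 64)
    return b"\x80" + zeros + struct.pack(">Q", length * 8)

def sha1_compress(state, block):
    """Fold one 64-byte block into the five-word running state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (_rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, _rotl(b, 30), c, d
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

def extend(digest_hex, original_length, suffix):
    """Resume hashing from a known digest; the secret key is never needed."""
    state = struct.unpack(">5I", bytes.fromhex(digest_hex))
    glue = sha1_padding(original_length)   # padding the victim's hash already absorbed
    total = original_length + len(glue) + len(suffix)
    data = suffix + sha1_padding(total)
    for i in range(0, len(data), 64):
        state = sha1_compress(state, data[i:i + 64])
    return glue, struct.pack(">5I", *state).hex()

key = b"our secret key."    # known only to the couple
message = b"i love you"
sent_digest = hashlib.sha1(key + message).hexdigest()

# The rival sees only `message` and `sent_digest`, and guesses the key length.
suffix = b" but please don't call me anymore"
glue, forged_digest = extend(sent_digest, len(key) + len(message), suffix)
forged_message = message + glue + suffix

# The recipient's check, done with the real key, accepts the forgery.
accepted = forged_digest == hashlib.sha1(key + forged_message).hexdigest()
print(accepted)
```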

An HMAC fixes this.

The algorithm adds one more layer: essentially it takes the hash of your key + message, prepends the key to that hash, and then re-hashes the result. I say essentially because it actually does one other thing to make things more cryptographically sound. HMAC masks your key during the first -- inner -- hash by XORing it with a fixed constant. Then on the second -- outer -- hash it masks your key again with a different fixed constant. The masking operations result in different inner and outer key values, and the entire process effectively seals your message, hides your key, and keeps anyone from tacking new data onto the end.

According to Wikipedia, no known length extension attacks against HMAC have been found.
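The two layers can be written out by hand in Python and checked against the standard library's hmac module. This follows the construction as specified in RFC 2104, where the two fixed constants are the bytes 0x36 and 0x5C; hmac_sha1 is just an illustrative name, and real code should call the hmac module directly.

```python
import hashlib
import hmac

BLOCK_SIZE = 64   # SHA-1 works on 64-byte blocks

def hmac_sha1(key: bytes, message: bytes) -> str:
    """HMAC-SHA1 spelled out: an inner and an outer hash with masked keys."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()     # over-long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then padded out to one full block
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha1(inner_key + message).digest()
    return hashlib.sha1(outer_key + inner).hexdigest()

key, message = b"our secret key.", b"i love you"
by_hand = hmac_sha1(key, message)
from_library = hmac.new(key, message, hashlib.sha1).hexdigest()
print(by_hand == from_library)
```

The outer hash is what defeats the extension trick: the digest on the wire is a hash of the inner digest, not a resumable state of the message itself.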

Good luck romance.



perforce add with wildcards

i use perforce for my own source control'd backups. it supports two users for free; perfect for personal use.

it does have two drawbacks -- its directory diff tool is *horrible* ( i won't bore you with the gory details but mercurial piped through beyond compare is beautiful and light years ahead. if i didn't already have so much in perforce i *might* consider moving over to that instead. )

the point of this post though is that with perforce, in order to add a directory tree's worth of files, i keep wanting to type:
p4 add ...
but if you do you get the (unhelpful) message:
Can't add filenames with wildcards [@#%*] in them.
perforce, rather than adding files recursively, thinks you are trying to add a file named "...".

i don't know why this particular use case isn't handled in perforce ( though i can imagine most commands probably operate in the server's namespace so it would take some tiny bit of extra work. )

i'm used to a microsoft system called source depot in which a similar syntax does work. the command is therefore hard to unlearn. i find myself re-inventing the right command whenever i start a new sub/project.

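at any rate: as a public service ;) and for future reference: on windows the equivalent is:
for /R [dir] %i in (*.*) do p4 add "%i"
where, optionally, "dir" is the base of the files to add. the quotes keep paths with spaces intact, and inside a batch file the variable is written %%i instead of %i.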
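for anyone not on windows, the same loop can be sketched in python. this is just one way to do it, assuming p4 is on the path; files_under and p4_add_tree are made-up helper names, and dry_run defaults to True so nothing is added until you ask.

```python
import os
import subprocess

def files_under(base="."):
    """Yield every file path below base, recursively."""
    for root, _dirs, names in os.walk(base):
        for name in names:
            yield os.path.join(root, name)

def p4_add_tree(base=".", dry_run=True):
    """Build one 'p4 add <file>' command per file; run them unless dry_run."""
    commands = [["p4", "add", path] for path in files_under(base)]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)   # assumes p4 is installed
    return commands

if __name__ == "__main__":
    for command in p4_add_tree("."):
        print(" ".join(command))
```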

more quick thoughts on encoding and decoding...

looking a bit at a protocol buffer actionscript implementation and google's python implementation -- it's interesting to note that they both first generate native descriptions of the classes; then they provide generic code that walks those descriptions to de/serialize buffers.

probably no one gave it much thought ( likely the c implementation is the same ) but i wonder why they went that route.

alternatively: they could have just generated the de/serialization code directly. for python and actionscript that could be especially interesting, because you wouldn't even need the class header. the deserializer could just build the class by setting attributes dynamically. i'm not convinced that a standalone description would be smaller than the generated code, and i'd bet that generated code would be much faster.

interesting side note: looks like there are two projects for actionscript on google code: the one started by an adobe evangelist has no code; the other seems to work and was the inspiration for this post; is that an indication that adobe needs to be more aware of what else is going on out in the open source world?
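a toy python sketch of the two routes. this is not the protocol buffer wire format and not how either implementation is written: the field table, the fixed-width struct encoding, and every name here are invented purely to show the difference in shape.

```python
import struct
from types import SimpleNamespace

# Hypothetical message description: (field name, struct format code).
POINT_FIELDS = [("x", "i"), ("y", "i"), ("weight", "d")]

def serialize_generic(obj, fields):
    """Descriptor-walking route: loop over the field table at run time."""
    return b"".join(struct.pack("<" + fmt, getattr(obj, name)) for name, fmt in fields)

def generate_serializer(fields):
    """Code-generation route: emit straight-line source for one message type."""
    fmt = "<" + "".join(code for _name, code in fields)
    args = ", ".join("obj." + name for name, _code in fields)
    source = "def serialize(obj):\n    return _pack(%r, %s)\n" % (fmt, args)
    namespace = {"_pack": struct.pack}
    exec(source, namespace)
    return namespace["serialize"]

def deserialize_dynamic(data, fields):
    """No class header needed: attributes are set dynamically on a bare object."""
    fmt = "<" + "".join(code for _name, code in fields)
    values = struct.unpack(fmt, data)
    return SimpleNamespace(**{name: v for (name, _code), v in zip(fields, values)})

point = SimpleNamespace(x=3, y=-4, weight=0.5)
serialize_point = generate_serializer(POINT_FIELDS)

walked = serialize_generic(point, POINT_FIELDS)
generated = serialize_point(point)
round_trip = deserialize_dynamic(generated, POINT_FIELDS)
print(walked == generated, round_trip)
```

whether the generated version is really faster or smaller would need measuring; the sketch only shows that both routes can produce the same bytes.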

python find item in list

Python reminds me frequently of zork: "You are in a maze of twisty passages; You are in a twisty maze of passages."

For instance, you'd think that python would have a built-in find first item in a list function, but it doesn't. Simple searches, python tutorials, and python's own docs don't help point towards a good solution to this basic problem.

It turns out, however, that you can use a generator expression to build a succinct, speedy search. Generator expressions are one of the wonderfully useful, but hidden avenues in python -- and the technique could use a bit more advertisement.

First, take a quick look at why some of the more common search techniques fall short.

list.index(), for instance, only matches by equality ( built-in comparisons for primitive types, or an equality function you define on custom classes ), so it can't search on an arbitrary condition.

list comprehensions ( ex. [ i for i in list if ... ] ) are great, but comprehensions touch every member of a list; so, not so speedy. filter() has the same issue in Python 2, where it builds the whole result list up front.

A google search turned up an old post: http://tomayko.com/writings/cleanest-python-find-in-list-function that people seem to reference frequently, but for me, having to define or import a function defeats the desire for simplicity.

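The generator expression, however, is short and to the point:
next(i for i in list if ... )
Essentially, the generator expression creates an iterator that yields control every time the if-statement succeeds; next() asks that iterator for a single value, so the search stops at the first match. ( Older Python 2 code spells this (i for i in list if ... ).next(); the built-in next() exists from Python 2.6 on and is the only spelling that works in Python 3. )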
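A short worked example, with a made-up User record and list standing in for real data. One thing worth knowing: with no match, next() raises StopIteration unless you pass a default as its second argument.

```python
from collections import namedtuple

# Hypothetical data, just for illustration.
User = namedtuple("User", "name age")
users = [User("ann", 31), User("bob", 17), User("cy", 45)]

checked = []

def is_adult(user):
    checked.append(user.name)    # record which items were actually examined
    return user.age >= 18

# Stops at the first hit; bob and cy are never looked at.
first_adult = next(u for u in users if is_adult(u))

# The second argument is returned instead of raising StopIteration.
first_senior = next((u for u in users if u.age >= 65), None)

print(first_adult, checked, first_senior)
```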

I wish I could claim credit for the solution, but a link from a link from a link turned up a tip from a commenter on another blog. Which proves once again, that without the web I couldn't write good software for the web. (hrmm.... )
