why, mozilla, why?

this is just great:

ctrl-shift-x in firefox3 right-justifies edit-box text ( and text in your address bar! )

apparently some people do *like* this feature ( makes sending emails in hebrew easier ) but it's horribly close to "cut" ( ctrl-x ) especially if you are selecting text at the time ( via "shift" ).

even better: this isn't a documented shortcut and there isn't ( apparently ) any way to disable it.

Backwards Encoding

( ported and cleaned from www.ionous.net )

I have a Flash app that periodically posts data to a server, and the server occasionally responds to the app in kind. Rather than send the data as raw text, I'm using AMF to encode the messages the app and the server send each other. For various reasons I've been considering porting over to Google's protocol buffers -- but both solutions share one notable issue: they implement encoding and decoding backwards.

Some History

Back in the day, one of the keys to fast 3D rendering was reducing the number of data copies that your game had to perform. If, for instance, you had an interface that let you compose polygons on the fly ( as OpenGL does ), you were sunk if you first built up a buffer and then copied that buffer over to a real, final buffer in order to render. The same goes for any sort of software manipulation of things like character bones: if you can do your operations "in place", that will generally be faster than doing them on a temp buffer and copying the results over to a final buffer to be rendered.

That seems pretty straightforward, but designing and implementing an interface that streams well can be difficult in practice.

Protocol Buffers

Back to the present; take a brief look at Google's protocol buffer interface and how it's used.

You define a message structure and use the protocol buffer compiler to auto-generate a native class ( in the examples below: Python classes ). You can then write an instance of that class to a string of bytes and later "reconstitute" the instance from that string of bytes.

The main methods on the auto-generated protocol buffer class that you use are:
SerializeToString(): serializes the message and returns it as a string.
ParseFromString(data): parses a message from the given string.

And here, for example, is a snippet from their Python "Address Book" tutorial that loads an address book from disk:
address_book= AddressBook()
with open(sys.argv[1], "rb") as f:
    address_book.ParseFromString(f.read())

Since f.read() pulls the entire file into memory, you are first copying your address book data from disk into memory, and then the auto-generated AddressBook class parses ( and copies) that memory into a new address_book instance.

Saving the address book to disk works in a similar way:
with open(sys.argv[1], "wb") as f:
    f.write(address_book.SerializeToString())

The protocol buffer instance "address_book" first generates a string from your data, and then you have to write that string out to disk.

The Hidden Copy

For both load and save there's an additional hidden copy that you'd likely hit.

If you were really writing an address book application, you would likely already have structures in your code to store addresses. It's unlikely that you'd want to replace all of your data structures with the auto-generated protocol buffer classes. Not only would that probably require a lot of work on your part to convert, test, and debug your code, it would also couple your code directly to the Google protocol buffer classes -- bad form for many reasons.

What that means, then, is that to save address book data you will need to write a shim that creates AddressBook instances, copies your in-memory data into those instances, writes those instances to a string, and finally writes that string to disk. Loading address book data similarly has to go from disk into memory, from memory into a protocol buffer instance, and from that instance into your own custom data structures.

The protocol buffer API is nice and simple, but that simplicity means that you have incurred multiple copies of your data as it moves from place to place.
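To make those copies concrete, here is a minimal Python sketch of the save/load shim just described. `Contact` stands in for your application's own structures, and `AddressBookMessage` is a hypothetical stand-in for a generated protocol buffer class: it only mimics the `SerializeToString()` / `ParseFromString()` method names and uses JSON as a placeholder encoding, so it is not the real protobuf library. The comments mark each copy the data makes along the way.

```python
import io
import json
from dataclasses import dataclass, field


@dataclass
class Contact:
    """The application's own data structure (hypothetical)."""
    name: str
    email: str


@dataclass
class AddressBookMessage:
    """Stand-in for a generated message class. NOT the protobuf API:
    JSON is only a placeholder for the real wire encoding."""
    people: list = field(default_factory=list)

    def SerializeToString(self) -> bytes:
        return json.dumps(self.people).encode("utf-8")

    def ParseFromString(self, data: bytes) -> None:
        self.people = json.loads(data.decode("utf-8"))


def save(contacts, stream):
    message = AddressBookMessage()
    # Copy 1: application objects -> message instance.
    message.people = [{"name": c.name, "email": c.email} for c in contacts]
    # Copy 2: message instance -> one byte string holding everything.
    data = message.SerializeToString()
    # Copy 3: byte string -> output stream (file, socket, ...).
    stream.write(data)


def load(stream):
    # Copy 1: stream -> one byte string (the f.read() step).
    data = stream.read()
    message = AddressBookMessage()
    # Copy 2: byte string -> message instance.
    message.ParseFromString(data)
    # Copy 3: message instance -> application objects.
    return [Contact(p["name"], p["email"]) for p in message.people]


buf = io.BytesIO()
save([Contact("bob", "bob@example.com")], buf)
buf.seek(0)
print(load(buf))  # [Contact(name='bob', email='bob@example.com')]
```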


Take a look at another great piece of software: memcached. The Python API for reading and writing data to the cache works exactly the same way that the protocol buffer API works.

First, you compose your data and pass it to the memcache API; memcache then encodes a command packet and copies your data into that packet.

If you were writing multiple key/value pairs, the pseudo-code might look something like:
# compose data
out= { 'key1': value1, 'key2': value2 }

# pass to api
cache.set_multi( out )

# encode ( inside the api )
for k, v in out.items():
    write_key( k )
    write_value( v )

These interfaces make sense from a procedural point of view: take this, turn it into that, pass it to the API, let the API do what it needs to do, done. From a practical standpoint, however, they're backwards. Decoding and encoding are processes, not physical objects.


Here's one interface that does it right. Let's pickle an object to a file:

import sys
from pickle import Pickler

val= "some python data"
with open(sys.argv[1], "wb") as f:
    Pickler( f ).dump(val)

Granted, the pickler cheats a bit -- you can often store your original data structures directly, without funneling your data into a separate structure just to save it -- but the point stands: your data passes straight through.

The pickler is just a process for getting val from memory to the file, not storage in and of itself.

Memcached Streaming

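Returning to the memcache example, a better interface might look like:
out= { 'key1': value1, 'key2': value2 }
cache.set_multi( out.items() )

# inside the api: encode pairs as they arrive, with no intermediate packet
def set_multi( pairs ):
    for k, v in pairs:
        write_key( k )
        write_value( v )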

While that looks about the same in this particular case, it avoids a copy, and it would even allow you to write code like this:
import uuid

new_employees= [ "bob", "mary", "sarge" ]

def generate_new_employee_data( employees ):
    for e in employees:
        employee_id= uuid.uuid4()
        yield e, employee_id

cache.set_multi( generate_new_employee_data( new_employees ) )
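Here is a hedged sketch of what the receiving side of such a streaming `set_multi` could look like. It is not an existing memcache client API: it is a hypothetical function that accepts any iterable of `(key, value)` pairs and writes each one as a memcached text-protocol `set` command (`set <key> <flags> <exptime> <bytes>`, then the data block) the moment the iterable produces it. A `BytesIO` stands in for the server socket.

```python
import io
import uuid


def set_multi(pairs, stream, flags=0, exptime=0):
    """Hypothetical streaming set_multi: encode each (key, value) pair as a
    memcached text-protocol "set" command as soon as the iterable yields it.
    Nothing is gathered into an intermediate dict or packet first."""
    count = 0
    for key, value in pairs:
        data = value if isinstance(value, bytes) else str(value).encode("utf-8")
        stream.write(f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii"))
        stream.write(data)
        stream.write(b"\r\n")
        count += 1
    return count


# A generator feeds the encoder directly: each employee's id is created,
# encoded, and written before the next one is generated.
sock = io.BytesIO()  # stand-in for a connection to the memcached server
written = set_multi(((name, uuid.uuid4().hex) for name in ["bob", "mary", "sarge"]), sock)
print(written)  # 3
```

A real client would also read back the server's replies; this sketch only covers the encoding side.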

Protocol Buffer Streaming

For something like protocol buffers, it might be possible to have both a simple API that looks and feels just like the current one and a more complex, lower-level API that can stream data without copying.

The streaming interface would allow a developer to assign "actions" to members of the protocol buffer structures. Those actions would tell the protocol buffer internals how, for encoding, to write user data structures directly to an output stream; for decoding, to write input data streams into user data structures.
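One rough way to picture those "actions" (a sketch under stated assumptions, not a worked-out design and not part of the protocol buffer library): each field gets a hypothetical getter that reads the value straight off the application's own object during encoding, and a setter that writes it straight back into an application object during decoding. The bytes follow protocol buffers' documented wire format for length-delimited fields (a varint key of `field_number << 3 | 2`, a varint length, then the data), but `encode`, `decode`, `PERSON_ACTIONS` and the rest are invented for this example. There is no intermediate message instance and no whole-message string.

```python
import io


def write_varint(stream, value):
    """Protocol buffer base-128 varint: 7 bits per byte, low bits first."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            stream.write(bytes([byte]))
            return


def read_varint(stream):
    result, shift = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside or before a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def encode(obj, actions, stream):
    """Pull each field straight off the user's object and write it out
    (wire type 2, length-delimited)."""
    for number, get, _ in actions:
        data = get(obj).encode("utf-8")  # still a small per-field copy in Python
        write_varint(stream, (number << 3) | 2)
        write_varint(stream, len(data))
        stream.write(data)


def decode(stream, actions, obj):
    """Read fields off the stream and hand each to the matching setter."""
    setters = {number: put for number, _, put in actions}
    while True:
        try:
            key = read_varint(stream)
        except EOFError:
            return obj
        number, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        data = stream.read(read_varint(stream))
        put = setters.get(number)
        if put is not None:  # unknown fields are skipped
            put(obj, data.decode("utf-8"))


class Person:
    """The application's own class; no generated code involved."""
    def __init__(self, name="", email=""):
        self.name, self.email = name, email


# Hypothetical "actions": (field number, getter, setter) per member.
PERSON_ACTIONS = [
    (1, lambda p: p.name, lambda p, v: setattr(p, "name", v)),
    (2, lambda p: p.email, lambda p, v: setattr(p, "email", v)),
]

buf = io.BytesIO()
encode(Person("bob", "bob@example.com"), PERSON_ACTIONS, buf)
buf.seek(0)
copy = decode(buf, PERSON_ACTIONS, Person())
print(copy.name, copy.email)  # bob bob@example.com
```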

If I have some more time, I'll try to mock up what such an interface might look like.


Junk DNA, and the Mersenne Twist

I woke up this morning with what seems to me like an insight, one that ties two completely unrelated topics together in a way that is (at least to me) meaningful.

The two topics: a news article about junk DNA that I read earlier this week, and a programming problem that a friend of mine has been working on.

Since this is a programming blog (yikes -- it's true) I'll start with the programming problem.



One of the things on my mind this last week or so has been: what kinds of constructs in Python would allow for easy-to-use type safety / type checking?

Mix a little keyword parameter usage with a little bit of django-y model definitions and the net result is Bartleby.
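Bartleby's own API isn't shown here, so the following is only a guess at the general shape of the idea, not Bartleby itself: Django-style class attributes declare each field's type, and keyword arguments to the constructor are checked against those declarations. `Field`, `Model` and `Employee` are hypothetical names; the checking is done with a plain Python descriptor.

```python
class Field:
    """Declares a typed attribute; assigning a value of the wrong type raises."""

    def __init__(self, kind, default=None):
        self.kind = kind
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name  # attribute name, filled in when the class is created

    def __get__(self, obj, owner=None):
        if obj is None:
            return self  # accessed on the class: return the Field itself
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        if not isinstance(value, self.kind):
            raise TypeError(
                f"{self.name} must be {self.kind.__name__}, "
                f"got {type(value).__name__}"
            )
        obj.__dict__[self.name] = value


class Model:
    """Accepts only keyword arguments that match declared Fields."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            if not isinstance(getattr(type(self), name, None), Field):
                raise TypeError(f"unknown field: {name}")
            setattr(self, name, value)


class Employee(Model):
    name = Field(str, default="")
    age = Field(int, default=0)


bob = Employee(name="bob", age=42)
print(bob.name, bob.age)  # bob 42
```

With this shape, `Employee(age="forty-two")` and `Employee(salary=10)` both raise `TypeError` at construction time rather than failing somewhere later.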


Source Insight

Search Wikipedia for code editors and you won't find Source Insight. Most people out there have heard of SlickEdit, CodeWarrior, etc., but Source Insight? It's rare.

It deserves much more recognition than it gets, however, because it does several key things better than anything else. Unfortunately (for me) it hasn't kept up with language changes over the years, and even small things like Java annotations confuse its parser so completely as to make the program unusable.

After somewhere near 12 years of use I've become physically grafted to it. This is great when the editor works but horrible when it doesn't. Because of some of its problems, I've been looking for a new editor these last couple of weeks -- but I still can't find one that works the way I want.

In order to prove to myself that it's not simply out of habit that I use Source Insight but that it still actually rocks after all these years... here's my review.

Simple, Beautiful Features

Search results are a normal text window.

Whether you do a "find in this file" or a "find in all files", your results go into a newly created text file. Yup. Not a specialized dockable, floatable window with resizable columns or buttons to click on. Just a text window. How emacs, I know.

A few reasons why this works so well:
  • You can have limitless search windows, controllable in a very standard way: just like every other file buffer.
  • You can manually winnow your searches. I can't emphasize enough just how useful it is to search for a call to a function, toggle back and forth between the search window and the actual code, delete matches that aren't what you are looking for, append new searches, rinse repeat. Learning new code? Looking for a bug? Needing to refactor? This is a must have feature.
  • You can save your searches ( they are just text files after all ) and call them up later ( though see "annoyances" below )
The closest to this is SlickEdit, which does allow you to manually winnow your search but still insists on having a custom window.

The furthest away is Eclipse. Can I ask: who thought putting search results into a tree view that you have to click through with your mouse was a good idea? It's essentially unusable.

Why doesn't any other professional editor copy this simple feature?

My opinion is that the developers of most editors focus on individual feature bells and whistles; they don't see the benefit to the end user when different features share the same code and act in the same way. Source Insight seems to have limited itself to four, possibly five, control / window types, so everything looks the same and operates the same even though you do different tasks with them. The next feature is in this category as well.

Case-intelligent, syllable-based winnowing.

In every dialog where there's a list of items (files, symbols, etc.) you can winnow the list in real time by typing what you are looking for. For instance, in the file dialog you initially see the list of all files in the project along with an edit box into which you can type. As you type, the list of files is reduced to those that match the text you have typed. If you have two files, "Foo.txt" and "Fee.txt", and you first type "F", you will still see both files; if you then type "Fo", you will only see "Foo.txt"; if you instead typed "Fx", nothing would show up.

Lots of programs do this these days but Source Insight was doing it ten years ago and it still manages to do it better because it didn't stop with simple winnowing. It also uses "syllable matching".

Syllable matching is a slight misnomer. Basically, SI starts by separating out the words in the run-together phrases that programmers use to name variables and types. MY_GROUP, MyGroup, myGroup, and my_group all share the same two words. If I were to type either "my" or "group" into the symbol lookup window ( and this works for files and every other list as well ), Source Insight would consider that a match and ensure that, for instance, "MyGroup" would show up in the list of possible symbols.

It goes a tiny bit further than raw matching, though. If you specify a case for the partial words you type by mixing cases, for instance "camelCase" or "PascalCase", it assumes that you are trying to give it more specific information and limits its search to your case pattern. If you are unsure of the case but want to enter multiple syllables, you can separate them with a space. For example, typing "render object" in the symbol lookup might yield "renderMyObject", "MSG_RENDER_OBJECT", and "ObjectRenderFunction".
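A rough Python sketch of these matching rules as described; it is a guess at the behavior, not Source Insight's implementation. Identifiers are split into words at underscores, dots and case changes; every typed word must be the start of some word in the identifier, in any order; and a query typed in mixed case with no spaces is treated as an explicit case pattern, otherwise matching ignores case.

```python
import re

# One word = an all-caps run ("MSG", "HTTP"), a capitalized or lowercase
# run ("Render", "object"), or a run of digits. Underscores, dots and
# spaces never match, so findall() simply skips them.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_words(text):
    """Split an identifier into words, e.g. 'MY_GROUP' -> ['MY', 'GROUP']."""
    return _WORD.findall(text)


def is_case_pattern(query):
    """Mixed upper/lower case with no spaces counts as an explicit case pattern."""
    return " " not in query and query != query.lower() and query != query.upper()


def matches(query, identifier):
    """Every typed word must start some word of the identifier, in any order."""
    fold = (lambda s: s) if is_case_pattern(query) else str.lower
    words = [fold(w) for w in split_words(identifier)]
    return all(any(w.startswith(fold(q)) for w in words) for q in split_words(query))


def winnow(query, names):
    """Keep only the names that match; an empty query keeps everything."""
    return [name for name in names if matches(query, name)]


symbols = ["renderMyObject", "MSG_RENDER_OBJECT", "ObjectRenderFunction", "renderScene"]
print(winnow("render object", symbols))  # every symbol except renderScene
print(winnow("renderObj", symbols))      # ['renderMyObject']
```

Because file names go through the same splitter, the same function reproduces the "Foo.txt" / "Fee.txt" behavior from the previous section.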

Just to explain why this particular feature is one I love: I can't tell you how many times I can sort of remember the name of a class, but not exactly. Or I need to open a file but can't remember exactly where it is in the file path. If I can remember some piece of the item in question, I can find it almost instantly.

Could it go further? It'd be great if it ranked symbols by most likely match, and if case-based searches only prioritized matching cases rather than winnowing out differently cased matches. And it'd be awesome if it remembered my recently searched items and ranked those high.


Visual Studio: the command window is a beautiful hidden feature. If you can find the menu option that lets you access it and type the command "of" at the prompt, it will let you open a file using basic winnowing. Matching is case-insensitive and there's no syllable matching, but it's still much better than clicking endlessly through project file hierarchies. ( I know msdev can be extended with plugins, and that there are a couple of really good ones out there, some of which give you a special window for opening files. )

Eclipse: only begins to winnow after you type something, so you can't do searches by scrolling through lists.

SlickEdit: it appears to do basic syllable matching, but, truly horrible and essentially a show stopper for me, winnowing is done as a synchronous operation. If SlickEdit needs to search, it stops you from typing. If you have a project with 100,000 symbols, it is so amazingly slow and interruptive to use that you might as well not.

The Context Window.

This is a dockable window that shows you the definition of whatever symbol your cursor is on, plus a few lines ( the number depends on how large you make your context window ) on either side of the definition so you can see comments and other related code.

This is very similar to SlickEdit's preview window. Strangely, neither Source Insight nor SlickEdit allows you to copy text from this window. More strangely, SlickEdit lets you try and makes it seem like you succeeded, but it doesn't actually copy out the text you want.

SI's implementation has two nice features:

First, you can "lock" the window so no matter what other file you open or symbol you look at the window stays at what you were previewing.

Second, for variables, the window's focus changes over time. If you are looking at a variable, it will first show you the variable's declaration, and then, a second or so later, the declaration of its type. I kind of wish it would keep going up through class hierarchies, etc. but still it's very useful -- you can see lots of information about any given variable without having to leave the location you are editing. By using "lock", if you don't want it to go up to the type, you can stop the window after it shows the variable's declaration.

The context window is incredibly helpful when navigating code, and it also provides some of what people look for in auto-completion: what parameters can this function I've just typed take?

SlickEdit has a similar feature, but strangely, you can only look up symbols in text you have highlighted. Want to look something up off the cuff? You are out of luck. ( I'm sure you could write a macro in SlickC to help with this. )

Smart Reference Matching.

"Symbol search" allows you to search for all uses of a symbol in a file or project -- many editors have this now, but again, Source Insight has had it for quite a long time, and for several years now has had an excellent extension: Smart Reference Matching.

This extension of symbol search attempts to limit the search to the specific instance of a symbol name that you have highlighted. If you have two classes that both have a method ShouldaNamedItUniquely(), it will attempt to limit its search to uses of that method on the class you are interested in. This is invaluable for projects that like to use short method names, where you can wind up with dozens of classes that have a method called "getName()" or the like. That said, it doesn't always work perfectly, so you have to learn, somewhat intuitively, when you can trust it and when you can't.

SlickEdit has search by matched color. I find it hard to activate, and I feel it has limited use because it just says: search for all methods called "getName", as opposed to limiting by symbol context. I think it's generally pretty rare for different kinds of symbols in a project to use exactly the same case and exactly the same word tense, and I find name duplication across kinds of symbols to be much rarer than within the same kind.

Matched Symbol Color Coding.

Source Insight colors text based not just on what you mean the text to be, but on whether the name you've typed is valid in the context you've used it. This means that without compiling your code you can see at a glance whether a symbol you've used is valid in the current context. If, for instance, you have an instance 'x' of class 'X' with a method 'doStuff()', x.doMyStuff() won't color code, but x.doStuff() will.

This doesn't eliminate all of your syntax errors -- but it gets damn close.

That said, this feature does have its ups and downs. It works correctly about 80% of the time, but if you've got files it can't parse ( see "The Deal Breakers" ), it will fail.

Many editors have something similar -- but generally you either only get basic detection of intent or you have to compile the project to get the validation.

With only basic detection, anything that looks like a method to the editor gets colored as a method, whether or not it is a valid member of the class in question. ( SlickEdit is in this category. )

Regarding compilation, it seems like, because Java has reflection, most Java editors choose this route. They compile behind the scenes as you type. It's better than SlickEdit -- but, if you've got incomplete classes, or other syntax errors -- a common occurrence when you're initially authoring new code -- you don't get proper color coding. Also it feels, to me, a little slow -- but maybe that's just Java GUI on Windows. Visual Studio also seems to do something like this compilation under the hood for C/++.

Change This Color: As an aside, SI allows you to select a colored piece of text in the editor and, from the context menu, pop up the color dialog for that type of text colorization. It's a super nice way to tweak the editor.

Color Inactive Code: Also on the matching / color front for C/++: SI allows you to define project ( and global ) conditional compilation settings, which let SI know which code is inactive. Although it doesn't have code collapsing, it will grey out those blocks and will ( by default ) ignore them when you search.

File History and Auto Recovery

Every time you save a file Source Insight generates a completely new backup of your file. This has saved me from my own stupidity so many times I can't even say. Even better, it appears to quietly auto-save to the new backup file up to the point when you manually save to your actual file. ( ie. manual save is probably: force auto-save, save-file, generate new-autosave file. ) This has saved me from the rare source insight crash, os crash, and power failure more than a few times.

Intelligent Cursor History

Just as a back button on a browser lets you move to earlier links, Source Insight tracks the history of your cursor and lets you jump back to earlier locations. Many programs provide some basic implementation of this, but the way SI's seems to work is that it only creates an internal bookmark if you jumped to a location via a search, edited some code there, or hung out there for a while. This means minor movements of the cursor are collapsed in the history. Visual Studio, for instance, has a cursor history, but I find it unusable because it remembers literally every movement.
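Here's a small Python sketch of that collapsing rule, based only on the behavior described above (Source Insight's actual logic is unknown): a location becomes a history entry if the cursor jumped there, an edit happened there, or the cursor stayed there for at least `dwell_seconds`, an assumed threshold. Ordinary cursor movement leaves no trace.

```python
class CursorHistory:
    """Records only 'significant' cursor stops, so minor moves collapse."""

    def __init__(self, dwell_seconds=2.0):
        self.dwell_seconds = dwell_seconds  # assumed threshold
        self.marks = []           # bookmarked locations, oldest first
        self.current = None       # where the cursor is now
        self.arrived = 0.0        # when it arrived there
        self.significant = False  # jumped to or edited at the current spot?

    def _commit(self, now):
        """Bookmark the spot we're leaving if the visit was significant."""
        if self.current is None:
            return
        lingered = now - self.arrived >= self.dwell_seconds
        if (self.significant or lingered) and self.marks[-1:] != [self.current]:
            self.marks.append(self.current)

    def move(self, location, now, jumped=False):
        """Cursor moved; jumped=True for search results, go-to-definition, etc."""
        self._commit(now)
        self.current, self.arrived, self.significant = location, now, jumped

    def edit(self):
        """Typing at the current spot makes it worth remembering."""
        self.significant = True


h = CursorHistory(dwell_seconds=2.0)
h.move(("a.cpp", 10), now=0.0, jumped=True)  # arrived via a search
h.move(("a.cpp", 11), now=0.3)               # arrow key
h.move(("a.cpp", 12), now=0.6)               # arrow key
h.move(("a.cpp", 40), now=0.9)               # page down
h.move(("b.cpp", 5), now=5.0)                # sat on a.cpp:40 for 4.1s
h.edit()                                     # typed at b.cpp:5
h.move(("b.cpp", 6), now=5.5)
print(h.marks)  # [('a.cpp', 10), ('a.cpp', 40), ('b.cpp', 5)]
```

A back button would then walk `marks` from the end; the two arrow-key stops never appear in it.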

Other nice stuff

Global Symbol Lookup. This is similar to Symbol Search, mentioned with Smart Reference Matching above, but instead of finding uses of a symbol, it allows you to jump to the definition of any symbol. This used to be a feature unique to SI but many editors offer this now. Source Insight still bests them due to its syllable matching.

Relation Lookup. Lets you see the hierarchy of classes and choose to look up or down the hierarchy. It can display results in tree-view form or in a graphical format. It probably has more power than I give it credit for, but I generally find it hard to change the relation I'm looking at, so I usually don't bother with the window at all.

Parse Source Links. Using regular expressions you can turn any text file into a bunch of navigable links.
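To illustrate the idea (in Python rather than SI's own regular-expression flavor, and with a pattern that is only an assumption about typical compiler output), here is a sketch that turns `path(line)` and `path:line:` references into (file, line) pairs an editor could jump to.

```python
import re

# Assumed formats: MSVC-style "file(12): ..." and GCC-style "file:12: ...".
# The lazy path match stops at the first "(digits)" or ":digits:" it finds,
# which also lets Windows drive letters like "C:\..." through.
LINK = re.compile(r"^\s*(?P<path>.+?)(?:\((?P<paren>\d+)\)|:(?P<colon>\d+):)")


def parse_links(text):
    """Return (path, line) for every line of text that looks like a location."""
    links = []
    for line in text.splitlines():
        m = LINK.match(line)
        if m:
            number = m.group("paren") or m.group("colon")
            links.append((m.group("path"), int(number)))
    return links


output = """src/foo.c:12: warning: unused variable 'x'
C:\\src\\bar.cpp(7): error C2065: 'y': undeclared identifier
Build finished."""
print(parse_links(output))
```

Lines that don't look like locations, such as "Build finished.", are simply skipped.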

Project Add/Remove Files. Source Insight does this pretty badly. What's amazing to me is that everyone else does this even worse. In SlickEdit, for instance, there is *NO WAY* to add a file that you have open to your project. ( I'm sure you could write this in SlickC, but still )

Add/remove files recursively by wildcard. Nice. Simple. Easy. No one else does this. You either get recursive or you get wildcard, but you don't get both simultaneously.

Add any known type. Most programs have sets of wildcards for each type -- Source Insight has that, but it also allows you to automatically add any file type it knows about. That's really nice if you have a combo project made of C, xml, txt, vcprojs, html, etc. Most big projects are of this sort.

Toolbar / Menu / Key customization. All menu and keyboard commands are referred to by a two-part hierarchy of names corresponding exactly to the default menu layout, and the names are in plain English. For instance, the command to open a file is "File: Open" because it appears on the "File" menu with the name "Open".

Straightforward enough? Until I used SlickEdit I didn't realize how important this is. In SlickEdit, want to bind something you see on a menu to a key shortcut? Good luck -- you'll have to guess the command name. Same issue for popping up dialog boxes. For instance, to get to the "open file in project" list you have to hunt down the command "active-files", and that's one of the less obscure commands. As an example of simple code reuse in SI: you can dump all of the key bindings, menu bindings, etc. to a text file so you can view / sort the commands. Other editors invent custom navigation and search dialogs -- which is nice, I suppose -- but please spend time on actual functionality, not complexity.

Keyboard oriented but mouse friendly. You can do every operation quickly and easily with the keyboard but you can also use your mouse if you want. Most programs that are heavily keyboard accelerated generally need at least some mouse based interaction to get stuff done. Those that don't ( ex. emacs ) aren't friendly at all if you want to use the mouse. Usually keyboard is the way to go, but sometimes it is just faster to click on what you can see.

Instant on. Ever opened netbeans or eclipse? It takes on the order of a minute. Good luck getting a new idea down quickly. Source Insight opens faster than you can blink even on my highly underpowered laptop ( okay ~10 seconds ). The editor doesn't feel big and bloated. It feels light and transparent. Unfortunately, most of the Java editors I've tried move like thick soup.

The Deal Breakers

User created languages are second class citizens so if Source Insight doesn't fully support a given language: good luck. Two of the worst for me are Java and ActionScript.

Java parsing can't handle @tags.

If you have an @tag, rather than skip the line it doesn't understand, Source Insight's parser eats the whole file: nothing in the file registers as a symbol after the tag. Yikes. Even better, because Java is a predefined language, you can't create your own regex interpretation of the tag to make the parser ignore it.

No ActionScript support.

You can get something that sort of looks right, but you can't make it recognize ActionScript's weird JavaScript-based variable syntax well enough to allow good symbol lookup.

Auto-Completion is terrible

Every time you type the popup symbol / auto completion box disappears and restarts parsing. It's also a little slow ( which is strange given how fast everything else is ). This isn't a deal breaker for me because I don't use auto-completion but I list it in this category because I know several people who can't live without it.

Miscellaneous Annoying Stuff

Some searches can't be saved. If both file and line info are displayed in the search results, you can save the search and reload it later; if not, you can't.

Symbol searches don't show line numbers. I think this is a recently introduced bug -- and it's annoying. It means you can't save symbol searches. Of course since you can't do this in most programs anyway I guess I shouldn't complain.

Bad colors / text sizes out of the box. No predefined palettes to swap between and the default is *crazy*. I've had several friends try Source Insight only to get turned off by how cartoony it initially looks.

"Goto implementation" for constructors never works. It works for destructors and all other methods of a class -- go figure.

Custom regular expressions. SI's syntax is pretty good but it's its own custom flavor. For instance: it doesn't let you do multiline regex -- this can occasionally be a pain when trying to do multi-line search and replace operations.

Cutting and copying from SI .txt files into Microsoft products strips line feeds. I've got no idea why, but most MS programs think blocks of text pasted from Source Insight .txt files don't have proper line feeds. If I resave the .txt file as a .cpp it always works.

No extended features. Source Insight is just an editor and nothing else. Sure there's a very small macro language, and sure you can launch command line programs and parse their results. But there are no source control plugins, no package managers, no debuggers, no compilers, no maven like task managers, etc. This is all good, and all bad. It really would be nice if it supported plugins ( and if some existed :). If it did, it'd be nice -- unlike Eclipse and NetBeans -- to initialize them after load, so that the startup / access sequence stays fast.

No built in refactoring. Eclipse and Netbeans offer some pretty impressive auto-refactoring abilities that would be great to see in more products including Source Insight.

Windows only. Sigh. I've already been doing some development on Ubuntu, I'd like to work on a Mac. What will I do?


Can't live without it, and, unfortunately, sometimes can't live with it. I think for C and C++ I'm stuck. It's a great product with features I don't want to give up and that no one else has. But, unless a new round of fixes and language expansions rolls in, for everything else: I've got to move on.

I wish so much more for this great editor, and I wish others would see it and adopt its features. However, I'm pretty sure there's just one person who maintains Source Insight these days, and the company doesn't charge much: I paid maybe a hundred or so US dollars a few years ago for version 3 and get free updates once a year or so. It's still the same version today, but the price is now a little over 200 dollars. At any rate, I suspect it's just someone's side job or hobby these days, and it could be facing death in obscurity.

Maybe someday, if it's indeed on that path, the author(s) could open source the code and it could live a new fresh life again. Hopefully inspiring others with its goodness along the way.

Making Lemonade

Let me posit a new law of the land: actor classes should never contain movement logic. Over the years I've followed this as a rule of thumb, but much to my regret, in my python dungeon crawler -- since I didn't quite know where the code was heading nor exactly the implications of using python -- I quietly bowed to the tyranny of the actor that does everything.

Don't go there. It's the dark side....


Mu - the path of the unstuck

A couple of truisms: simple choices can have wide, cascading effects. The design of your game's root classes can affect every other piece of code in the game.

I'm porting my python dungeon crawler over to java, and am trying to get it up and running one small piece at a time. For the first time I'm trying out true unit tests; trying to get each piece debugged before setting the whole thing going as one.

I'm currently faced with a deceptively simple choice: should a "Room" class contain a list of pointers to "Actor" objects?

If you feel the answer is: yes, of course Rooms should contain Actors, you probably consider yourself a McCoy programmer: a programmer who favors simple, easy to understand code, that executes quickly.

If you say: good god no, you probably consider yourself a good engineer, a person who creates robust code for the long haul, a purely Hatfield programmer.
