YAML

I love YAML.

It's an easy-to-understand structural text format with a lot of promise for games. In my book I'm using YAML for all my examples because, even without in-depth explanations of the syntax, people can intuitively grasp the meaning of simple YAML documents.

It's eminently grokable.

Yet YAML has a problem, primarily one of adoption. Work on YAML began in 2001, and the first article that I've found went online in 2002. It has been around for six years and still gets talked about, but after all this time few people seem to be using it.

My measure of usage comes from a search of the net for articles, books, documentation, mailing list mentions, etc. It's possible that after I write this someone will come to me and show me a secret YAML treasure trove, but it does appear I'm not the only one who feels this way. Why isn't YAML used more if it really looks so good?

Trouble in the magic kingdom

While there is no smoking gun, I would guess there are two main reasons. The first has to do with a little something called XML; the other has to do with YAML itself.

XML is verbose, finding unmatched tags is obnoxious, and the actual document files may be hard to read without extra help. Even with extra help, I, for one, often find myself scratching my head over what any particular file actually means. Clearly I'm already voicing some of the reasons behind a desire for an XML alternative, but here's the rub: the mechanics of XML are dead simple. Start-tag, random text, matching end-tag, and, oh yeah, attributes. Your mom might not understand why the files look so complicated, but I bet she could understand how it works.

YAML, on the other hand, is just the opposite. Your mom would probably rate the clean, clear, easy-to-read YAML file over the XML one. She could probably even add YAML data to a YAML file faster, and with less instruction, than she could add XML data to an XML file. But I dare your mom to write a YAML parser.

Here's where you tell me your mom is a computer science professor, or possibly a programmer for IBM. Or here's where you tell me you are a mom, and where I backpedal by explaining I am talking about a generational divide, not a gender divide. Then you explain you've actually been working in computers since there were punch cards. And after we've gone back and forth like that a few times, that's when, had I cold hard cash to spare, I would bet her, you, or him (if perchance you want to offer up your dad) real money.

Failing the mom test

Given real motivation I'd bet you could write a YAML parser from scratch, but, and here's where YAML currently pales in comparison to XML, it won't be easy.

YAML's instructional documentation is too thin; YAML's technical documentation is too thick. Neither will help you learn the complete YAML language in any decent amount of time. Further, once you have a basic grasp of YAML in all its various forms, you would still need to learn all of the tiny syntax and whitespace details necessary to create a conforming parser.

Is all this really necessary? Do you really have to be able to write your own YAML parser in order to use YAML in your projects? Isn't it already supported in Java, .NET, Ruby, Python, C, and many others?

My answer: you shouldn't have to, but you should be able to.

After spending a few hours with YAML's documentation you should be able to answer questions core to a parser's implementation.

Questions such as:
  • How exactly are line breaks managed?
  • How can I change the handling of line breaks?
  • What are tags and how can I use them?
  • What's the difference between explicit folded blocks and explicit literal blocks? Heck, what's the difference between folded blocks and folded newlines?
  • Is it okay to leave out that space after a map key's colon: [ foo:bar ]? If not, why not?
  • How about before a comment: [ foo:bar ]#comment?
In truth, I think you will spend your first few hours learning YAML just trying to find the relevant documentation. And, in fact, I suspect most people, after not finding what they need, will give up and go back to the safety of XML.
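
Two of the questions above, the ones about the colon and the comment, can be made concrete with a few lines of Python. This is a minimal sketch, not a conforming parser: the function name split_entry is made up for illustration, and it assumes plain, unquoted scalars that fit on one line. It encodes two rules: a colon only separates a key from its value when a space (or the end of the line) follows it, and a # only starts a comment at the start of a line or after whitespace.

```python
def split_entry(line):
    """Split one line of a simple block mapping into (key, value).

    Illustrative only (hypothetical helper): handles plain, unquoted,
    single-line scalars. Returns (None, text) when the line holds no
    key/value pair, i.e. when the whole line is just one scalar.
    """
    # A '#' starts a comment only at the start of the line or after whitespace.
    for i, ch in enumerate(line):
        if ch == "#" and (i == 0 or line[i - 1] in " \t"):
            line = line[:i]
            break
    line = line.strip()

    # A ':' marks a key only when followed by a space or the end of the line.
    for i, ch in enumerate(line):
        if ch == ":" and (i + 1 == len(line) or line[i + 1] == " "):
            return line[:i].strip(), line[i + 1:].strip()
    return None, line


print(split_entry("foo: bar"))           # ('foo', 'bar')
print(split_entry("foo:bar"))            # (None, 'foo:bar')
print(split_entry("foo: bar#comment"))   # ('foo', 'bar#comment')
print(split_entry("foo: bar #comment"))  # ('foo', 'bar')
```

So [ foo:bar ] is not an error; it is simply a single string rather than a key and a value. One practical effect of the rule is that values such as http://example.com or 12:30 need no quoting.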

The Documentation

The YAML spec, at 85 pages and 222 BNF productions, is thorough, but incredibly opaque.

From the YAML wiki:

"The cornerstone of the YAML project is the YAML Specification. The specification was created with the intent of giving implementors a guide to writing YAML compliant processors. Unfortunately the spec has grown in size, such that it is no longer readily comprehensible to mere mortals. It is even a challenge to decipher by the very folk who created it."

YAML's FAQ and reference card also seem somewhat underwhelming. The FAQ is just two questions long, and the reference card gives no explanation of what its various terms mean.

YAML's de facto reference implementation -- Syck -- is also not much help on the documentation front. While you can download the Syck source code and hunt through the various readmes (count 'em: four readme files) and source files for information, ultimately you won't gain much insight into the language itself, since Syck's lowest layer is generated from Yacc/Lex rules.

The two best resources out there -- the YAML Ruby cookbook and the Ruby YAML docs -- take a while to stumble upon. Even though YAML serialization is part of the standard Ruby distribution, I don't believe you can find these resources from the Ruby site. Instead, you need to jump over to SourceForge for them. The first can give you a fairly good insight into the range of YAML's syntax, and it can also serve as a good reference when writing YAML docs. The second can help you understand how bits of a parser might work. Neither resource, however, is perfect, because both require learning a little bit of Ruby and neither covers the complete YAML spec.

YAML

On several fronts YAML needs help.
  • YAML needs clear, concise documentation for all of its features that someone new to the language can quickly understand.
  • YAML needs a hyper-linked reference providing quick lookups of both indicators and concepts, allowing both newbies and longtime users to drill down into areas of ambiguity.
  • YAML needs a substantial FAQ that guides end users through common usage issues.
  • YAML should bless Syck as the implementation gold standard. Barring that blessing, it needs some other reference implementation.
  • Syck needs thorough documentation on its site on how it works and how you can use it in your apps. Additionally, Syck should provide information on how closely it sticks to the YAML specification. Where it deviates, if at all, should be clearly spelled out with reasons why.
  • Ideally, YAML would also provide a pseudo code parser, to help interested parties learn how the language (should) work.
  • Nifty keen would be a play space on the YAML site where users can interact with, or input their own, YAML examples. If it doesn't work there, then it's not pure YAML.
Given these things, I think new programmers wouldn't be as "scared" of YAML as they might be now. I think tool vendors could create YAML importers and converters. I think game programmers could begin to integrate YAML -- which maps oh so easily to our data structures (and so much more so than XML, in my opinion) -- into our games.
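
The pseudo code parser on that wish list need not be large to be useful. As a taste, here is a hedged Python sketch of the difference between literal ( | ) and folded ( > ) blocks, one of the questions raised earlier. The function names literal_block and folded_block are hypothetical. The sketch assumes the block's lines have already been de-indented, that the block starts with a line of text, and that default chomping applies (exactly one trailing line break). It also ignores the special treatment of more-indented lines inside folded blocks.

```python
def literal_block(lines):
    """'|' style: every line break is kept exactly as written."""
    return "\n".join(lines) + "\n"


def folded_block(lines):
    """'>' style: a single line break between two lines of text becomes
    a space; each blank line becomes one newline.

    Simplified on purpose: assumes de-indented input that starts with
    text, and does not handle more-indented lines.
    """
    out = []
    blank_lines = 0  # blank lines seen since the last line of text
    for line in lines:
        if line == "":
            blank_lines += 1
            continue
        if out:
            # Join to the previous text: newlines if blank lines came
            # between, otherwise fold the single break into a space.
            out.append("\n" * blank_lines if blank_lines else " ")
        blank_lines = 0
        out.append(line)
    return "".join(out) + "\n"


# Hypothetical game data: a room description.
description = ["The old mill", "creaks in the wind.", "", "A door stands open."]

print(repr(literal_block(description)))
# 'The old mill\ncreaks in the wind.\n\nA door stands open.\n'
print(repr(folded_block(description)))
# 'The old mill creaks in the wind.\nA door stands open.\n'
```

Literal blocks suit text where the line layout matters; folded blocks suit long paragraphs that were wrapped only to keep the file readable.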

Just to say it one more time: I love YAML. Given some time, I may just put my money where my mouth is, and pitch in to make the world of YAML a better place. You should too. It'll be well worth the effort.

Head over to my del.icio.us page for some useful links.
