Fluent SQL

Just a short post this week. 

I've been playing with SQL in a side project of mine and have hit upon a nice interface for building queries based upon fluent expressions. I've seen only just a few fluent expressions ever implemented in C++, and those mostly in research papers, so thought I thought this might be interesting to pass along to others.

Domain Specific Languages

To paraphrase Martin Fowler: domain specific languages (DSLs) define customized programming languages tailored to solving specific problems. A fluent interface expresses a given DSL in an existing object oriented language, leveraging the features of the existing language to enforce the rules of the DSL.

SQL can be considered a domain specific language. It really is only good for one thing: manipulating data stored in relational databases. A fluent interface for SQL implemented in C++ would provides a SQL like syntax using existing features of the C++ language.

What's the motivation for this work? Like any language SQL has a very specific syntax. The following is a SQL valid statement:

   select * from mytable;

It returns all of the data in the table "mytable". On the other hand:

   mytable from * select;

has all the same keywords, but won't do a thing, because the keywords don't appear in any meaningful order.

The Math DSL

While this next part is doubtless obvious, bear with me for a second. One way to think of a programming language is like a very strict state machine. Each new key word puts you into a new state, and from that state you can only move to certain other predefined states. ( Yacc / Bison, in fact, are predicated on this principle: they generate state machines that have the ability to parse languages based on the language's grammar rules. )
Take, for instance, the C++ statements:

   int first;
   int second= 5;
   int third=  5 + 11;   // this works
   int fourth= 5 +  ;    // umm... no.
   int fifth = 5 + "11"; // not this either.

First, you declare the int. From there you can end the line, or you can specify an assignment statement who's right side evaluates to an int. The second statement is the simplest example of that. The third line, however, shows you don't have to immediately end the line, you can continue the expression with an operator followed by another integer specification.

The fourth and fifth lines, of course, won't compile. The fourth has terminated the line from a state where termination isn't allowed. The fifth, on the other hand, has tried to specify a string constant (char[3]) where something that evaluates to an int was expected.

This is, in one sense, a math DSL embedded straight into C++. Once you start a math statement, you're stuck in math land until the line has terminated. No string operations for you.

The SQL DSL

Using SQLite you can query a locally stored database with the simple 'exec' command.

   sqlite3_exec(databaseHandle, sqlStatement, callback, context, &errorString);

The SQL statement passed into the exec function is just a raw character string. Because of that, the following two C++ statements compile just fine.

   sqlite3_exec(myDb, "select * from mytable", myCallback, 0,0);
   sqlite3_exec(myDb, "mytable from * select", myCallback, 0,0);

The errors with the invalid SQL statement in the second line only show up at run time. If we had to deal with that whenever we did math in C++, I think we'd all go insane.

   evaluate( "add int 5, 1", &third);
   evaluate( "add int 5, to", &fifth);  // oops.

To use SQL well it'd be worthwhile, then, to try leverage the same sorts of rules as the math DSL: let each new partial C++ statement box you into a place from where you can only move to a new valid statement.

In this manner, rather than writing the select command as a string, ideally you'd be able to use a select function call. Once that function has been called, you would want to return from it in a state that only allows you to specify things that the SQL grammar allows at that point. In my implementation, I wanted to make sure my SELECT calls were followed only by FROM calls and nothing else, my first attempt looked something like this:

class SQL {
public:
    // the select method
    static TheSelectState SELECT(const std::string & whichColumns)
    {
        return TheSelectState(whichColumns);
    }
    
    // 'SELECT' returns this
    struct TheSelectState
    {
        TheSelectState(const std::string & whichColumns)
        {
            QueryString+= "SELECT " + whichColumns + " ";
        }
        // the from method. callable after 'SELECT'
        TheFromState FROM(const std::string & whichTables)
        {
            return TheFromState( whichTables );
        }
    };
    
    // 'FROM' returns this
    struct TheFromState
    {
        TheFromState (const std::string & whichTables)
        {
            QueryString+= "FROM " + whichTables + " ";
        }
        // query string access. callable after 'FROM'
        operator const char *()
        {
            return QueryString.c_str();
        }
    };
    
private:
    static std::string QueryString;
};

And the code using it looked something like:

   std::string result= SQL::SELECT( "*" ).FROM( "myTable" );

This syntax takes a little while to sink in so let me walk you quickly through it. First, let me show you a roughly equivalent piece of code.

   std::string resultString;
   TheSelectState selectResult= SQL::SELECT( "*" );
   TheFromState fromResult    = selectResult.FROM( "myTable" );
   resultString               = (const char*) fromResult;

The C++ compiler evaluates simple statements left to right, nothing has changed that here. In the first example, the compiler creates a local variable string called "query". It sees the equals operator and expects you to provide something that evaluates to a string.

I started off a sequence to provide that string by calling the static method in my custom SQL class: "SELECT". SELECT expects a string specifying which columns you want to hear back about. In this case, *, all of them.

SELECT, however, does not itself return a string, instead it returns "TheSelectState". And it's just sitting there on the stack waiting for your next command.

The only method this state object exposes is the method "FROM". FROM itself expects a string denoting which tables to select from, and puts the compiler into a new state by returning "TheFromState" object. In this example, TheFromState has no explicit methods you can call, but it does provide access to the internal query string via an implicit cast. On this cast, the line fulfils the necessary string evaluation; the local result variable receives a copy of the internal query string.
Pretty neat if you ask me.

At the very end of this post you will find a better ( but yet still incomplete ) version of the above code. The main difference between the code below and the snippet I gave above, is that it eliminates the static string. Instead, each new state object carries the former string forward into the new state. This eliminates the need for any sort of weird "ClearString" function. The code below also provides a string format helper designed to eliminate the varargs that would otherwise be needed everywhere. ( See "Analysis", below, for an example of what I mean. )

Analysis

Doesn't the performance of this thing suck? Well... yes. I think that in debug mode it's probably anywhere from terrible to awful, especially if, under Visual Studio, you've left the default stl limits checking on. In an optimized build, the compiler probably does a pretty good job of eliminating the temporaries, although for certain there's still a fair bit of memory allocation that's probably going to happen. For the record, I definitely wouldn't recommend using this for tight inner loops in your renderer ( we *are* after all talking sql queries here :)

On the plus side there are two kinds of bugs that it helps with.

First, there's the bug of malformed SQL. Yes, for a while, as you come up with more and more complex SQL statements you want to write, you are going to have to implement some new classes, but after not too long you will have covered all the common basics. For the most part using by using these classes you have made it more difficult to make syntax and conceptual errors in the SQL statements themselves.

Second, there's the bugs of malformed string manipulation. By hiding all the string += or strcat madness needed when composing strings in C++, you've protected yourself against casual memory errors. Further, when you do need string formatting, you've moved the variables you need extremely close to the point at which they are used. Consider the following example, and think of what this would have looked like without the fluent interface.


   db->RunQuery(
   SQL::SELECT("type, name").
   FROM(SQL::Format("%s left join attrs", m_table.c_str()).
   WHERE(SQL::Format("tagref='%s'", m_element.c_str())).
   AND(SQL::Format("idref=%d", m_id).
   AND("implement!='FALSE'") );

One last thing of mention. With the fluent interface, Visual Studio, can almost make your life programming SQL queries in C++ complete. Why in the world it shows the private members I don't know.
fluent completion
A word of warning, though, don't try this in the current version of NetBeans, it's syntax completion doesn't appear to like the concept of inheriting public non-virtual members. Don't ask me why.

Links

You can learn more about Domain Specific Languages and Fluent Interfaces here:

The Code

I consider this public domain so, if you find it useful, have at.


class SQL {
    /**
     All queries are built of commands
     */
    struct Cmd {
        Cmd( Cmd *prev, const char *cmd, const char *param ) :
        m_query(prev ? prev->m_query: "") {
            m_query+= cmd;
            m_query+= " ";
            m_query+= param;
            m_query+= " ";
        }
        std::string m_query;
    };
public:
    struct Format {
        Format(const char *fmt, ...) {
            va_list args;
            va_start (args, fmt);
            _vsnprintf (buffer, FormatSize, fmt, args);
            va_end (args);
        }
        operator const char *() {
            return buffer;
        }
    protected:
        static const int FormatSize= 256;
        char buffer[FormatSize];
    };
    /**
     access to completed commands only valid from a terminal cmd
     */
    struct TerminalCmd: Cmd {
        operator const char *() {
            return m_query.c_str();
        }
    protected:
        TerminalCmd(Cmd *prev, const char *cmd, const char *param ) :
        Cmd(prev,cmd,param) {
        }
    };
    /**
     expressions can appear in several places
     */
    struct Expression: TerminalCmd {
        Expression AND(const char *expr) {
            return Expression(this, "and", expr);
        }
        Expression OR(const char *expr) {
            return Expression(this, "or", expr);
        }
    protected:
        Expression(Cmd *prev, const char *op, const char *expr) :
        TerminalCmd(prev, op, expr ) {
        }
    };
    /**
     limit the number of rows over which the query operates.
     */
    struct Where:Expression {
        Where(Cmd *prev, const char *whichRows) :
        Expression(prev, "where", whichRows) {
        }
    };
    /**
     specify a list of tables
     */
    struct From: TerminalCmd {
        Where WHERE(const char *whichRows) {
            return Where(this,whichRows);
        }
        From(Cmd *prev, const char *whichTables) :
        TerminalCmd(prev, "from", whichTables) {
        }
    };
    /**
     query the datatbase
     */
    struct Select: Cmd {
        From FROM(const char *whichTables) {
            return From(this,whichTables);
        }
        Select(Cmd*prev,const char *whichColumns) :
     Cmd(prev, "select", whichColumns) {
     }
    };
    /**==============================================
     Commands Start Here
     ==============================================*/
    static Select SELECT(const char *whichColumns) {
        return Select(NULL,whichColumns);
    }
};

0 comments: