Wednesday, 17 June 2015

Fast and Slow Sticky Notes in Haskell and Qt

Last year I gave a talk for the London Haskell User Group on building a Sticky Notes application using Haskell, Qt, and the HsQML binding between them. The video has been out for a while now and made the rounds elsewhere, but I've thus far neglected to post it on my own blog! Slides are here. See below for an update on the example program and an answer to the most commonly asked question after the fact.


Fast and Slow? An update!

For the sakes of simplicity, the original version of the sticky notes program shown in the talk uses an SQLite database to serve the view directly with no in-memory model or caching. This is at least the cause of some sluggishness and at worst pathological performance depending on your machine.

As alluded to in the talk, there is a better albeit more complicated way. This involves moving the slow database accesses onto a separate thread and keeping a fast in-memory model to support the user interface. Changes made by user are now queued up and then flushed to disk asynchronously.

I've now released an updated version (0.3.3.0) of the hsqml-demo-notes package which includes a new faster version of the program illustrating exactly this technique. This hsqml-notes executable is now built using the new code and, for reference, the original code is built into the hsqml-notes-slow executable.

The fast variant uses three MVars to communicate between the UI and database threads. An MVar is kind of synchronisation primitive offered by Haskell akin to a box which may either contain a data value or be empty. Operations on MVars take out or put in values to the box and will block if the MVar is not in the appropriate state to accept or produce a value. Hence, a variety of different constructs such as locks and semaphores can built out of MVars.

The first MVar in the new notes code, modelVar, contains a Map from note IDs to the data associated with each note. This is the in-memory model. It includes all the fields held in the database table plus an additional field which indicates whether there are any pending changes which need flushing to the database (whether the record is "dirty"). The MVar semantics here act as a lock to prevent more than one thread trying to manipulate the model at the same time.

A second MVar, cmdVar, is used as a shallow channel for the UI thread to signal the database thread when there is work to do. The database thread normally waits blocked on this MVar until a new command value is placed in it, at which point it takes out and acts upon it. The first command given to the database thread when the program starts is to populate the model with the data stored on disk. Thereafter, whenever a user makes a change to the model, the dirty bit is set on the altered record and a command issued to the database thread to write those dirty records to the database.

Finally, the third possible type of command causes the database thread to close the SQLite file and cleanly exit. In that case, the third MVar, finVar, is used as a semaphore to signal back to the UI thread once it has shut down cleanly. This is necessary because the Haskell runtime will normally exit once the main thread has finished, and the MVar provides something for it block on so that the database thread has time to finish cleaning up first.


What is the FactoryPool actually for?

QML objects require a relatively explicit degree of handling by Haskell standards because the idea that data values can have distinct identities to one another even if they are otherwise equal is somewhat at odds with Haskell's embrace of referential transparency. This sense of identity is important to the semantics of QML and can't be swept under the rug too easily. Crucially, using signals to pass events from Haskell to QML requires that both the Haskell and QML code are holding on to exactly the same object.

One way to accomplish this is to carefully keep track of the QML objects you create in your own code. The factory-pool is an attempt to provide a convenience layer atop object creation which saves the programmer from having to do this. It is essentially an interning table which enforces the invariant that there is no more than one QML object for each distinct value (according to the Ord type-class) of the Haskell type used to back it. If you query the pool twice with two equal values then it will give you the same object back both times. Importantly, it uses weak references so that objects which are no longer in use are cleared from the intern table and it doesn't grow in size indefinitely.

One of the difficulties people have had with understanding the factory-pool from the notes example is that it's not generally necessary to make it work. Aside from the initial loading of the database, all the activity is driven from the front-end and so displaying a single view isn't strictly reliant on the signals firing correctly. If you replace the code to retrieve an object from the pool with a plain object instantiation, the default QML front-end for the demo would still work the same, albeit more wastefully.

To see the pool doing something useful, try the "notes-dual" front-end (by specifying it on the command line), which I've come to think of as the most interesting of the demos. It displays two front-ends simultaneously backed by the same data model. Changes in one are immediately reflected in the other. This works because when each front-end retrieves the list of notes they both obtains the same ObjRef from the pool for each Note as each other. Hence, when a change is made in one front-end, and a change signal is fired, both the front-ends receive it and remain in sync.

Without the pool, the onus would be on the application code to keep track of the objects in some other fashion. For example, if each read of the notes property created a new set of objects every time it was accessed then many more objects would need to have signals fired on them in order to ensure that every piece of active QML had received notice of the event. Each front-end would still be backed by the same data, methods and properties would still have access to the same set of Note values, but their objects would be distinct from the perspective of signalling and, unless special care was taken, the front-ends wouldn't update correctly.