Getting PostgreSQL transactions under control with SQLAlchemy
At Authentise we use databases quite a bit. That's not all that odd; most web technologies need some way of persisting state long term, and a database does just that. We like PostgreSQL. There are lots of reasons for that, but mostly we prefer PostgreSQL because:

- We like explicit schema at the storage layer rather than implicit schema at the application layer
- PostgreSQL trends toward completeness and correctness rather than helpfulness and inference (I'm looking at you, MySQL)
- PostgreSQL has been around a good long time and has plenty of mature libraries and tools
To talk to our database we use the excellent sqlalchemy. We don't use ORMs. We don't like ORMs. We love sqlalchemy because at the very bottom, underneath the ORM everyone else seems to like, we find an excellent way to incrementally build up queries. That's something we do quite a bit when we expose the ability for users to filter results from the API. I'll go into that in another post.
In order to talk to our PostgreSQL database from Python through sqlalchemy we use psycopg2. psycopg2 has been around for a while and is an excellent piece of software. We have just one problem with it, and that's its adherence to PEP-249.
Wait, that's a PEP you haven't heard of? It's a good one. Go read it, it's not terribly long.
I'm lying, it's like 20 pages, don't go read it.
Ultimately PEP-249 is a really good thing. It makes different libraries dealing with different databases behave in a consistent way with a consistent API. We don't really care about consistency - we use one database and one way of talking to it in one programming language across many different services.
Maybe that'll change some day and I'll regret this, but it's important to remember that when it comes to databases we are as monocultural as we can be.
Carl Meyer has an excellent writeup at OddBird on PostgreSQL transactions and SQLAlchemy. We agree with Carl that:
Postgres has it right. I prefer the ‘autocommit’ model. It’s simple, explicit, and unsurprising. Queries are never grouped together into a transaction unless you ask for one. If you issue a COMMIT or ROLLBACK, there is never any doubt about which queries you are committing or rolling back, because you explicitly issued the BEGIN to start the transaction.
We like the ergonomics of having to explicitly begin a transaction. Everyone who deals with PostgreSQL deals with "current transaction is aborted" errors. We don't fault PostgreSQL for this - we agree that the only solution to an exception inside a transaction is to roll back the transaction. The problem is the environment that has been created in which everyone has to deal with this.
In essence everyone who uses SQLAlchemy with PostgreSQL faces the fact that an error in their SQL will require them to roll back a transaction. This is extremely confusing if they didn't actually open a transaction to begin with but are dealing with the default transaction created by psycopg2. They then have to look into information about how to roll back a transaction they didn't even start. This will leave them thinking the ergonomics of PostgreSQL are terrible when it is in fact a bad interaction between different parts of their stack. They can enable the default behavior of Postgres - autocommit mode - because that's supported by SQLAlchemy in newer versions, but they have to know that's what they're looking for.
You can read Carl Meyer's post for more information on how SQLAlchemy deals with autocommit mode and some of the underlying logic psycopg2 uses to comply with PEP-249. Let's just sum it up, though:
- PostgreSQL starts connections in autocommit mode. You have to explicitly manage transactions. That's good.
- PEP-249 implies you are always using transactions. That's bad.
- psycopg2 follows PEP-249. That's good.
- psycopg2 therefore implicitly creates transactions on connect that you can't easily explicitly manage. That's bad.
- SQLAlchemy exposes some of this in a database-independent way. That's good.
- You can tell SQLAlchemy to use autocommit mode, PostgreSQL's default. That's good.
- You can't actually use transactions inside SQLAlchemy when you enable its 'AUTOCOMMIT' isolation level. That's bad.
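A toy model can make that interaction concrete. The class below is an illustrative sketch, not psycopg2's actual code: every execute() implicitly opens a transaction, and once one statement fails, every later statement fails until you roll back a transaction you never explicitly began.

```python
class Pep249Connection:
    """Toy model of a PEP-249-style connection. Not real psycopg2 code --
    just an illustration of implicit transactions."""

    def __init__(self):
        self.in_transaction = False
        self.failed = False
        self.statements = []

    def execute(self, sql, fail=False):
        if not self.in_transaction:
            # Implicit BEGIN -- the caller never asked for this
            self.statements.append('BEGIN')
            self.in_transaction = True
        if self.failed:
            raise RuntimeError('current transaction is aborted, '
                               'commands ignored until end of transaction block')
        if fail:
            self.failed = True
            raise RuntimeError('syntax error')
        self.statements.append(sql)

    def rollback(self):
        self.statements.append('ROLLBACK')
        self.in_transaction = False
        self.failed = False


conn = Pep249Connection()
try:
    conn.execute('SELCT 1', fail=True)  # a typo'd query poisons the connection
except RuntimeError:
    pass
try:
    conn.execute('SELECT 1')            # even a good query now fails
except RuntimeError as exc:
    print(exc)
conn.rollback()                         # only a rollback clears the state
conn.execute('SELECT 1')                # now it works again
```

This is exactly the experience described above: the user's first query failed, so now they must go learn about rolling back a transaction they never started.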
Carl Meyer then goes on to code up a basic system for massaging SQLAlchemy into a shape where you can do transactions via a context manager. That is awesome, and we totally used it for a little while. Eventually we outgrew it as our needs increased and ended up making a project of our own, chryso. We've open sourced it, and this post is our first public documentation of it.
Chryso first started as a way to get 'normal' PostgreSQL connections. From the perspective of user code the connection defaults to auto-commit and you can manage transactions very naturally using with connection.atomic(). If the code raises an exception we roll back the transaction, otherwise we commit when we leave the block. It's pythonic and it's awesome and we love it.
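To show the shape of that pattern, here's a minimal sketch of an atomic() context manager over an autocommit-by-default connection. This is a toy illustration of the behavior just described, not chryso's actual implementation; the Connection class and its log attribute are invented for the example.

```python
import contextlib


class Connection:
    """Toy autocommit-by-default connection with an atomic() context
    manager. Hypothetical, for illustration only."""

    def __init__(self):
        self.log = []  # record of SQL we "sent"

    def execute(self, sql):
        self.log.append(sql)

    @contextlib.contextmanager
    def atomic(self):
        self.execute('BEGIN')       # transaction starts only when asked for
        try:
            yield self
        except Exception:
            self.execute('ROLLBACK')  # any exception undoes the block
            raise
        self.execute('COMMIT')        # clean exit commits


conn = Connection()
conn.execute('INSERT ...')   # autocommit: no transaction involved
with conn.atomic():          # explicit transaction
    conn.execute('UPDATE ...')
# conn.log is now ['INSERT ...', 'BEGIN', 'UPDATE ...', 'COMMIT']
```

The key ergonomic point is that BEGIN only ever appears because user code asked for it, and an exception anywhere in the with block rolls everything back.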
Chryso has since grown in a couple of key ways which I'll cover in future blog posts.
You can use chryso in a very straightforward way.
```python
import chryso.connection

connection = chryso.connection.Engine(
    'postgres://test:password@localhost/my-db',
    tables,
)

query = sqlalchemy.insert(...)
connection.execute(query)
```
Instead of creating a connection via sqlalchemy.create_engine() you create a chryso Engine instance. You pass in the tables because we have a couple of convenience wrappers with logging around create_all(). It's not super pretty; these are just things we find useful within the context of the stuff we build.
If your query is bad for some reason it doesn't matter - you're in autocommit mode by default, so you don't have to worry about transaction rollback or anything. Just try another query.
If you'd like to have transactions you can get them very easily:
```python
import chryso.connection

connection = chryso.connection.Engine(
    'postgres://test:password@localhost/my-db',
    tables,
)

query1 = sqlalchemy.insert(...)
query2 = sqlalchemy.insert(...)

with connection.atomic():
    connection.execute(query1).fetchall()
    connection.execute(query2).fetchall()
```
In this case if either of the queries fails the transaction will be rolled back and neither query will have any effect. You can even forcibly abort the transaction by raising an exception. You can also nest the atomic() calls and chryso will use savepoints and rollbacks/commits will work the way you'd expect if you're familiar with SAVEPOINT.
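The savepoint-nesting behavior can be sketched in the same toy style as before. This is an illustration of the SAVEPOINT pattern, not chryso's actual code: the outermost atomic() issues BEGIN/COMMIT/ROLLBACK, while nested ones use SAVEPOINT, RELEASE SAVEPOINT, and ROLLBACK TO SAVEPOINT.

```python
import contextlib


class NestedConnection:
    """Toy connection whose atomic() nests via savepoints.
    Hypothetical, for illustration only."""

    def __init__(self):
        self.log = []
        self._depth = 0

    def execute(self, sql):
        self.log.append(sql)

    @contextlib.contextmanager
    def atomic(self):
        if self._depth == 0:
            begin, commit, rollback = 'BEGIN', 'COMMIT', 'ROLLBACK'
        else:
            name = 'sp_%d' % self._depth
            begin = 'SAVEPOINT ' + name
            commit = 'RELEASE SAVEPOINT ' + name
            rollback = 'ROLLBACK TO SAVEPOINT ' + name
        self.execute(begin)
        self._depth += 1
        try:
            yield self
        except Exception:
            self.execute(rollback)  # undo only this block's work
            raise
        finally:
            self._depth -= 1
        self.execute(commit)


conn = NestedConnection()
with conn.atomic():
    conn.execute('INSERT a')
    try:
        with conn.atomic():
            conn.execute('INSERT b')
            raise ValueError('inner failure')
    except ValueError:
        pass  # inner work is undone, outer transaction continues
# conn.log: ['BEGIN', 'INSERT a', 'SAVEPOINT sp_1', 'INSERT b',
#            'ROLLBACK TO SAVEPOINT sp_1', 'COMMIT']
```

If the inner exception is caught, only the inner block's work is rolled back and the outer transaction still commits; if it propagates, the outer atomic() rolls back everything.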
Chryso is a tiny project, and it scratches an itch for us. Next week I'll cover another feature we've started building into it and are organically growing as we need it - error parsing. Aside from that it does some nice things around unit tests with py.test. If you're interested in using chryso or extending it, please reach out to us. We're not even totally sure if anyone else in the community is as obsessed as we are with proper DB transaction management, so drop me a line at firstname.lastname@example.org