JVx and PostgreSQL, supporting Savepoints
Transactions are an integral mechanism of databases, without them we could hardly ensure data consistency. Well, we could of course, but it would be a lot more work. JDBC has of course support for transactions, but it also supports so called "Savepoints", which we will have a look at today.
What are transactions?
Normally, I'd assume that everyone of you knows what transactions are and what they are used for, but today I will quickly rehash it to make sure. Transactions enable us to define groups of statements to be executed on a database with all either succeeding or all failing. Imagine the following statements which might be executed on a database as one action:
1: Insert record into A 2: Update record of B 3: Update another record of B 4: Insert record into C
This might happen in the background if you click a button or do something similar. This example is straightforward but if we start thinking about possible problems we will soon realize that there is a lot of potential for things going wrong. For example if statement #3 would fail we would miss records in the tables B and C. Overall, our data would be in an inconsistent state after that. To make sure that it does not happen, we can define these statements as a single transaction.
Transaction |-Insert record into A |-Update record of B |-Update another record of B |-Insert record into C \-Commit
If one of these statements fail, all changes done by previous statements can be undone and the database will return to the state before we started manipulating it. That is great, because we can now guarantee that even though an error has happened, the data will remain in a consistent state.
We could even extend that with additional application logic, for example we insert three records and then we check if the values of all records add up to a certain threshold, if yes, we simply undo the changes. Or if we notice that a constraint has been violated (though, pretty much all databases support constraint definitions in one way or the other). Additionally, transactions are completely isolated from all other connected clients. That means that if we start a transaction and insert a record into table A, all other clients will not see this record until we commit our changes. So the data is never in an inconsistent state, not even temporarily or transiently.
What are Savepoints?
Savepoints are sub-transactions within another transaction. It allows to undo parts of an ongoing transaction.
Another example:
Transaction |-Insert record into A |-Insert child record into B |-Insert child record into B |-Insert another record into A |-Insert child record into B |-Insert child record into B |-Insert another record into A |-Insert child record into B |-Insert child record into B \-Commit
This is a little bit more fabricated and complicated, assume we want to insert three master records with two detail records each. The following conditions apply:
- If a master record fails to insert, nothing should be inserted.
- If a detail record fails to insert, the corresponding master record should also not be inserted.
This is hard to do with a simple transaction, but is quite easy when it comes to using savepoints:
Transaction |-Savepoint 1 | |-Insert record into A | |-Insert child record into B | \-Insert child record into B |-Savepoint 2 | |-Insert another record into A | |-Insert child record into B | \-Insert child record into B |-Savepoint 3 | |-Insert another record into A | |-Insert child record into B | \-Insert child record into B \-Commit
We create a savepoint before the insert of the master record, if the insert of the master record fails we rollback the transaction, if the insert of a detail record fails we rollback to the savepoint we created earlier. This allows quite complicated and nested transactions, especially because there is no defined limit to how deep savepoints can be nested.
Error behavior
Of course it can always happen that a statement fails for one reason or another, so it is important to know how the database behaves once an error occurred. We built ourselves another simple example:
Transaction |-Insert record into A |-Insert record into B (this fails) |-Insert record into C \-Commit
We insert three records, and the second one fails to insert. How should the database behave in such a situation? Turns out that this differs between different database systems.
Silent/Automatic restore to a previous state
Many databases perform a simple "silent/automatic restore to a previous state", all changes done (if any) of the current statement are undone and the transaction can be treated like the failing statement never happened. With the example above, and assuming that we do not cancel the transaction on an error, the records would be inserted into A and C.
In our tests Oracle, MySQL/MariaDB, H2 and SQLite were all behaving this way.
Requiring manual recovery
PostgresSQL requires to perform a "manual recovery" from a failed statement. So that once an error occurred during a transaction, the user has to revert to a (manually) set savepoint or rollback the complete transaction. We will go into details on that later.
Reverting everything and happily continuing
MS SQL on the other hand has a quite different approach. When an error occurs during a transaction, all changes are (automatically) rolled back but the transaction can still be used. So in our example, only the record in C would be inserted.
PostgreSQL JDBC and Savepoints
Back to PostgresSQL and how it requires manual recovery. When a statement fails within a transaction in PostgreSQL, the transaction enters the aborted state and one can then see an error like the one below if further statements are issued:
Current transaction is aborted, commands ignored until end of transaction block.
What that means is simple that the connection/server is still waiting on input on what to do with the transaction. There are three possible ways to recover from there:
- Rollback the complete transaction
- Commit, which will be transformed into a rollback at this point
- Rollback to a savepoint
But these actions must be initiated by the user before the transaction can be further used (or not, if it is being rolled back completely). If one wants to emulate the behavior of other databases in PostgreSQL, every statement that is executed within a transaction has to be "wrapped" with a savepoint, in pseudo-code:
savepoint
try
execute statement
release savepoint
catch
rollback to savepoint
...
commit
Even though that seems tedious, that is not the case. If you have such a requirement you already have central point through which all statements pass before being executed, so this can be implemented easily.
JVx and Savepoints
In our case it is DBAccess, our main datasource. Because every database interaction has to pass through DBAccess (in one way or the other), we could easily implement such emulation at a low-level and it is automatically available to all users. To be exact, DBAccess has received internal support to wrap all statements in savepoints when configured to do so. This configuration possibility is protected and is currently only used by the PostgresSQL DBAccess extension. It does exactly what it says on the tin and is only active when enabled and automatic commits have been turned off. So this change does have no effect on any other database but PostgreSQL.
We already have plans to extend this basic savepoint support with a public API which allows users of JVx to utilize this new functionality. One of the ideas that we are currently discussing is to provide the ability to create named savepoints. A simple mockup of that idea:
-
// Switch off auto commit to use transactions.
-
dbAccess.setAutoCommit(false);
-
-
// Insert some data.
-
dbStorage.insert(aRecord);
-
dbStorage.insert(anotherRecord);
-
-
// This part is optional.
-
try
-
{
-
dbAccess.setSavepoint("NAME");
-
-
dbStorage.insert(yetAnotherRecord);
-
dbStorage.update(someOtherRecord);
-
}
-
catch (DataSourceException dse)
-
{
-
log.error(dse);
-
dbAccess.rollbackTo("NAME");
-
}
-
-
dbAccess.commit();
As said, we are currently in the process of discussing such possibilities but definitely want to provide such an API at one point.
Conclusion
As it turns out, the "special" behavior of PostgreSQL isn't as special as it seems to be. It is a design decision that was taken and that is understandable. Changing this behavior now, 20+ years in, is out of the question as it would require a substantial effort to make sure that this behavior is backwards compatible. The gains from such a change on the other hand would be very little, as it is a quite specialized case in which this behavior matters and the "fix" is rather easy.