
Rewriting applications to use pax-jdbc-config and liquibase

After creating the post Pluggable databases for apache karaf applications I posted a link to the blog post in the karaf user mailing list, and the immediate response was, “why didn’t I just use pax-jdbc-config instead?”

The answer to that is that I didn’t know about pax-jdbc-config. I started using pax-jdbc in the summer of 2016 and apache karaf in the autumn of 2016, and pax-jdbc-config didn’t exist then (or at least not in the complete and usable form it reached in 2017), and any announcements since then haven’t registered with me.

But I completely agree with the sentiment expressed in the pax-jdbc-config wiki page:

For some cases you want the actual DataSource available as a service in OSGi. Most tutorials show how to do this using blueprint to create a DataSource and publish it as a service. The problem with this approach is that you have to create and deploy a blueprint file for each DataSource and the blueprint is also Database specific.

So I decided to try pax-jdbc-config out.

First out was the sonar-collector. This is the smallest web application I have that uses JDBC and liquibase. Properly speaking, it isn’t a web application, but it uses web application technology. It is a single servlet that serves as a webhook that sonarqube can call after completing analysis of a software project, and records some key sonarqube metrics in a single database table. The purpose of sonar-collector was to aggregate statistics over time, to be able to show a gradual improvement of a set of applications, i.e. to show that each release was a little bit better instead of a little bit worse, according to the metrics. And since sonarqube by itself didn’t save those metrics in an easily accessible form, we started out by manually collecting metrics, applications, versions and dates in a spreadsheet.

Since collecting numbers manually and punching them back into a different program is boring, I looked into what possibilities there were to extract the numbers from sonarqube. And I found that the simplest way to collect them was to collect them each time sonarqube completed an analysis. So sonar-collector is a single servlet implemented as an OSGi declarative services component that registers as a servlet OSGi service with the OSGi web whiteboard, and on each invocation collects the metrics of the sonar analysis invoking the servlet, and writes the metrics as a new row in a database table.

The first version of sonar-collector used PostgreSQL and was pretty much tied to PostgreSQL, even though it didn’t do anything PostgreSQL specific: the liquibase schema setup and the single insert statement should work fine on any DBMS supported by liquibase and JDBC (basically all of the DBMSes that have a JDBC driver).

The reason the first version of sonar-collector was tied to PostgreSQL was that it had no way to qualify which DataSourceFactory the component should receive. And since sonar-collector’s karaf feature loaded and started the PostgreSQL DataSourceFactory (by having the postgresql JDBC driver as a compile scope maven dependency), PostgreSQL was pretty much all it got.

The changes to adapt sonar-collector to pax-jdbc-config were:

  1. Removal of all of the Java code and tests related to connecting to a JDBC source
  2. Replacing the DS injection of an unqualified DataSourceFactory OSGi service with an application specific DataSource OSGi service (sketched below the list)
  3. Adding the default pax-jdbc-config configuration to the template feature.xml file
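
A minimal sketch of what the changed injection can look like, assuming the data source has been given the name "jdbc/sonar-collector" through the osgi.jndi.service.name property in the pax-jdbc-config configuration (the class name and service name are illustrative, not the actual sonar-collector source):

  import javax.servlet.Servlet;
  import javax.servlet.http.HttpServlet;
  import javax.sql.DataSource;
  import org.osgi.service.component.annotations.Component;
  import org.osgi.service.component.annotations.Reference;

  // Web whiteboard service properties omitted from the sketch
  @Component(service = Servlet.class)
  public class SonarCollectorServlet extends HttpServlet {

      private DataSource dataSource;

      // Before the change an unqualified DataSourceFactory was injected and the
      // servlet created the DataSource itself. Now pax-jdbc-config creates the
      // DataSource and the servlet just asks for it by name.
      @Reference(target = "(osgi.jndi.service.name=jdbc/sonar-collector)")
      public void setDataSource(DataSource dataSource) {
          this.dataSource = dataSource;
      }
  }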

What didn’t need to be changed was the way sonar-collector runs liquibase. The servlet runs the liquibase scripts from the servlet DS component’s activate method, since this method will only be called when a valid, working, connected DataSource is available.
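
Continuing the sketch above, the activate method running the liquibase scripts can look roughly like this with the liquibase 3.x API (the changelog path is illustrative):

  import java.sql.Connection;
  import liquibase.Liquibase;
  import liquibase.database.DatabaseFactory;
  import liquibase.database.jvm.JdbcConnection;
  import liquibase.resource.ClassLoaderResourceAccessor;
  import org.osgi.service.component.annotations.Activate;

  @Activate
  public void activate() {
      // Only called after the DataSource reference has been satisfied, i.e. when
      // a valid, connected DataSource exists
      try (Connection connection = dataSource.getConnection()) {
          Liquibase liquibase = new Liquibase(
              "sonar-collector-db-changelog/db-changelog-1.0.0.xml", // illustrative changelog path
              new ClassLoaderResourceAccessor(getClass().getClassLoader()),
              DatabaseFactory.getInstance().findCorrectDatabaseImplementation(new JdbcConnection(connection)));
          liquibase.update("");
      } catch (Exception e) {
          throw new RuntimeException("Failed to create the sonar-collector database schema", e);
      }
  }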

How much code actually was removed was a bit of an eye opener, because I had thought of the JDBC connection code as “a little bit of boilerplate, but not all that much”. But if you try to make the configuration handling robust, and then try to get good test coverage of the code, handling all sorts of exceptions, then you end up with quite a few lines of Java code.

In any case, I was satisfied with the end result: I removed a lot of Java code, and ended up with a sonar-collector that can use any RDBMS that is supported by JDBC to store the results.

The next candidate for switching to pax-jdbc-config was authservice.

Authservice is a webapp that:

  1. Can provide nginx with forms-based authentication
  2. Can provide other webapps with apache shiro-based authentication and role-based authorization
  3. Has a JDBC user/role/permission database with:
    1. a react-redux based administration user interface
    2. “self-service” web pages (static HTML styled with bootstrap) for letting users change their passwords, name and email
    3. an OSGi service other webapps can use to get information from the user database and modify the user database

In theory the JDBC based user database could be swapped with an LDAP service, using an LDAP realm and wrapping the UserManagementService over admin operations, and be used with both the admin UIs of authservice and other webapps using authservice, without any of them feeling any difference. We’ll see if we ever get there.

The authservice webapp has two databases:

  1. a derby in-memory database initialized with dummy data, and used for unit tests, integration tests, and for demonstrating the application
  2. a PostgreSQL database for use by the application in production

Three bundles of authservice were related to database connection and setup:

  1. an OSGi library bundle containing the liquibase scripts for the database schema as classpath resources and some code to load and run the scripts (in OSGi, code that wishes to load resources from the classpath, needs to reside in the same bundle as the resources)
  2. a bundle with a DS component expecting a DataSourceFactory OSGi service injection, creating an in-memory derby database, setting up the schema and adding dummy data, and finally exposing an AuthserviceDatabaseService OSGi service
  3. a bundle with a DS component expecting a PostgreSQL specific DataSourceFactory OSGi service injection, connecting to a PostgreSQL server, setting up the schema and adding some initial data (currently none here)

The OSGi library bundle could be left as it was.

The changes to adapt authservice to pax-jdbc-config were:

  1. Rename the bundles to something more descriptive
    1. Rename the test database setup bundle from authservice.db.derby.test to authservice.db.liquibase.test
    2. Rename the production database setup bundle from authservice.db.postgresql to authservice.db.liquibase.production
  2. Rename the DS components to something more descriptive, remove all of the JDBC connection setup code, remove the schema setup code from the activate method, and expose the PreHook OSGi service (see the sketch after this list):
    1. Rename the test database DS component from DerbyTestDatabase to TestLiquibaseRunner
    2. Rename the production DS component from PostgresqlDatabase to ProductionLiquibaseRunner
  3. Add pax-jdbc-config configuration to the template feature.xml files:
    1. Add pax-jdbc-config configuration to the template feature of the test database
    2. Add pax-jdbc-config configuration to the template feature of the production database
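
A rough sketch of what a liquibase runner exposing the PreHook OSGi service can look like, assuming pax-jdbc-config’s org.ops4j.pax.jdbc.hook.PreHook interface; the name property and the helper method are illustrative, and the pax-jdbc-config datasource configuration is assumed to point at the hook through its ops4j.preHook property:

  import java.sql.Connection;
  import java.sql.SQLException;
  import javax.sql.DataSource;
  import org.ops4j.pax.jdbc.hook.PreHook;
  import org.osgi.service.component.annotations.Component;

  // pax-jdbc-config calls prepare() on the newly created DataSource before the
  // DataSource is published as an OSGi service, so the schema is in place before
  // any application component sees the database
  @Component(immediate = true, property = "name=authservicedb")
  public class ProductionLiquibaseRunner implements PreHook {

      @Override
      public void prepare(DataSource datasource) throws SQLException {
          try (Connection connection = datasource.getConnection()) {
              // Run the schema changelogs and insert initial data, using the
              // class from the liquibase OSGi library bundle
              applySchemaAndInitialData(connection); // illustrative helper
          }
      }

      private void applySchemaAndInitialData(Connection connection) throws SQLException {
          // details omitted from the sketch
      }
  }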

After the changes the application consists of the same number of OSGi bundles, but it is no longer tied to derby and PostgreSQL, and can be configured at runtime to use different DBMSes.

The final application to modify was an application called “ukelonn”, which is used to register performed household chores and payment of the resulting allowance.

The application was similar in structure to the authservice application. Like authservice, the allowance application has both a derby test database and a PostgreSQL production database. And also like authservice, the allowance application uses liquibase to create the schemas and populate the database with initial data.

However, there was a complicating factor: ukelonn’s application specific DatabaseService subtype was also used to return DBMS specific SQL for PostgreSQL and derby, for aggregating over years and months.

Using aggregation over years as an example (i.e. the simplest of the queries), in derby it looked like this:

Listing 1.

select sum(t.transaction_amount), YEAR(t.transaction_time)
  from transactions t
  join transaction_types tt on tt.transaction_type_id=t.transaction_type_id
  join accounts a on a.account_id=t.account_id
  where tt.transaction_is_work and a.username=?
  group by YEAR(t.transaction_time)
  order by YEAR(t.transaction_time)

and in PostgreSQL it looked like this:

Listing 2.

select sum(t.transaction_amount), extract(year from t.transaction_time) as year
  from transactions t
  join transaction_types tt on tt.transaction_type_id=t.transaction_type_id
  join accounts a on a.account_id=t.account_id
  where tt.transaction_is_work and a.username=?
  group by extract(year from t.transaction_time)
  order by extract(year from t.transaction_time)

The difference in SQL syntax was handled by letting the derby DatabaseService and the PostgreSQL DatabaseService return different strings from the sumOverYearQuery() method, and grouping over year and month was handled in the same way by the sumOverMonthQuery() method.

I pondered various ways of handling this, including creating two extra bundles to provide the DBMS specific queries for derby and PostgreSQL, but that would have meant that any new DBMS introduced would need its own bundle, and that would raise the threshold for actually using a different DBMS considerably.

The solution I landed on was to let liquibase create different views depending on the DBMS used, and then do a select over the view in the business logic class.
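
A rough sketch of what the business logic then does, i.e. a plain select over a view; the DBMS specific syntax now lives in the liquibase changelog that creates the view, and the view name, column names and the SumYear bean are illustrative:

  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;
  import java.util.ArrayList;
  import java.util.List;

  List<SumYear> earningsSumOverYear(String username) throws SQLException {
      List<SumYear> sums = new ArrayList<>();
      // datasource is the DataSource injected into the business logic component
      try (Connection connection = datasource.getConnection()) {
          try (PreparedStatement statement = connection.prepareStatement(
                   "select aggregate_amount, aggregate_year from sum_over_year where username=?")) {
              statement.setString(1, username);
              try (ResultSet results = statement.executeQuery()) {
                  while (results.next()) {
                      sums.add(new SumYear(results.getDouble(1), results.getInt(2)));
                  }
              }
          }
      }
      return sums;
  }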

The changes to adapt ukelonn to pax-jdbc-config then were:

  1. Add the new DBMS specific views
  2. Change the business logic to use the new views instead of asking the database service for the queries to use
  3. Rename the database service bundles to something more descriptive of their new role as liquibase hooks
    1. Rename ukelonn.db.derbytest to ukelonn.db.liquibase.test
    2. Rename ukelonn.db.postgresql to ukelonn.db.liquibase.production
  4. Rename the DS components for the database services to something more descriptive of their new role as liquibase hooks, expose the PreHook OSGi service, and remove the JDBC connection code:
    1. Change UkelonnDatabaseProvider to TestLiquibaseRunner
    2. Change PGUkelonnDatabaseProvider to ProductionLiquibaseRunner
  5. Add pax-jdbc-config based features to the feature repositories of the test and production liquibase hooks
    1. Configuration feature ukelonn-db-test
    2. Configuration feature ukelonn-db-production
  6. Use the pax-jdbc-config based features when composing the application

In conclusion

  1. Using pax-jdbc-config allows me to use different DBMSes than the default derby and PostgreSQL, without changing any Java code: I only need to add the JDBC driver for the RDBMS and change the pax-jdbc-config configuration to use the correct DataSourceFactory and set the JDBC connection properties (url, username and password)
  2. Replacing the old way of setting up databases with pax-jdbc-config let me remove a lot of Java code, both actual code for doing the JDBC connections, and probably three times as much code for the unit tests of the JDBC connection code
  3. Using liquibase to set up different views for different DBMSes where the syntax varies was a lot cleaner and easier to test than the previous approach (i.e. having the DatabaseService implementations return different SQL strings for different databases)

How I learned about linux’ “OOM Killer”

I have a low-end VPS (Virtual Private Server), or at least: it used to be low-end, now it’s at least one step above lowest and cheapest.

On this server I’m running various personal stuff: email (with spamassassin), some public web pages, and some private web pages with various small webapps for the household (e.g. the “weekly allowance app” registering work done and payments made).

I had noticed occasional slow startups of the webapps, and in particular in September this year, when I was demonstrating/showing off the webapps at this year’s JavaZone, the demos were less than impressive since the webapps took ages to load.

I was quick to blame the slowness on the wi-fi, but as it turns out, that may have been completely unfair.

The webapps had no performance issues on my development machine even when running with a full desktop, IDE and other stuff.  The webapps on the VPS also seemed to have no performance issues once they loaded.

I thought “this is something I definitely will have to look into at some later time…” and then moved on with doing more interesting stuff, i.e. basically anything other than figuring out why the webapps on a VPS were slow to start.

But then the webapps started failing nightly and I had to look into the problem.

What I saw in the logs was that the reason the webapps were broken in the morning was that they were stuck waiting for a liquibase database changelog lock that never was released.

Liquibase is how my webapps set up and update database schemas. Every time a webapp starts it connects to the database, checks which liquibase scripts have been run against that database, and applies the ones that have not already been run. The list of scripts that have been run is a table called databasechangelog. And to avoid having more than one liquibase client attempting to modify the database schema, liquibase uses a different table called databasechangeloglock to moderate write access to the database.

I.e. the databasechangeloglock is just a database table that has one or zero rows. A liquibase client tries to insert a lock row into the table at startup, and waits and retries if this fails (and eventually gives up completely).

In my case the webapps were failing because they were hanging at startup: they were trying to get a liquibase lock, failing to get one, and never completing their startup process. Manually clearing the lock from the table and restarting the webapps made the webapps start up normally. However, the next day the webapps were failing again for the same reason: the webapps were stuck waiting for a liquibase lock.

I initially suspected errors in my code, specifically in the liquibase setup. But googling for similar problems, code examination and debugging revealed nothing. I found nothing because there was nothing to be found.  The actual cause of the problem had nothing to do with the code or with liquibase.

I run my webapps in an instance of apache karaf that is started and controlled by systemd. And I saw that karaf was restarted at 06:30 (or close to 06:30) every morning. So my next theory was that systemd for some reason decided to restart karaf at 06:30 every morning.

No google searches for similar symptoms found anything interesting.

So I presented my problem to a mailing list with network and linux/unix experts and got back two questions:

  1. Was something else started at the same time?
  2. Did that something else use a lot of memory and trigger the OOM killer?

And that turned out to be the case.

I have been using and maintaining UNIX systems since the mid-to-late 80s, and setting up, using and maintaining linux systems since the mid-to-late 90s, but this was the first time I’d heard of the OOM killer.

The OOM killer has been around for a while (the oldest mention I’ve found is from 2009), but I’ve never encountered it before.

The reason I’ve never encountered it before is that I’ve mostly dealt with physical machines. Back in the 80s I was told that having approximately two and a half times physical memory was a good rule of thumb for scaling swap space, so that’s a rule I’ve followed ever since (keeping the ratio as the number of megabytes increased, eventually turning into gigabytes).

And when you have two and a half times the physical memory as a fallback, you never encounter the conditions that make the OOM killer come alive. Everything slows down and the computer starts thrashing before the conditions that trigger the OOM killer come into play.

The VPS on the other hand, has no swap space. And with the original minimum configuration (1 CPU core, 1GB of memory), if it had been a boat it would have been said to be riding low in the water. It was constantly running at a little less than the available 1GB. And if nothing special happened, everything ran just fine.

But when something extraordinary happened, such as e.g. spamassassin’s spamd starting at 06:30 and requiring more memory than was available, then the OOM killer started looking for a juicy fat process to kill, and the apache karaf process was a prime candidate (perhaps because of “apache” in its name combined with the OOM killer’s notorious hatred of feathers?).

And then systemd discovered that one of its services had died and immediately tried to restart it, only to have the OOM killer shoot it down, and this continued for quite a while.

And in one of the attempted restarts, the webapp got far enough to set the databasechangeloglock before it was rudely shot down, and the next time(s) it was started it got stuck waiting for a lock that would never be released.

The solution was to bump the memory to the next step, i.e. from 1GB to 2GB. Most of the time the VPS is running at the same load as before (i.e. slightly below 1GB) but now a process that suddenly requires a lot of memory no longer triggers the OOM killer and everything’s fine.  Also the available memory is used for buff/cache and everything becomes much faster.

I bumped the memory 8 weeks ago and the problem hasn’t occurred again, so it looks like (so far) the problem has been solved.

Pluggable databases for apache karaf applications

Edit: I no longer use this approach. I use pax-jdbc-config instead.  See the article Rewriting applications to use pax-jdbc-config and liquibase for details

When creating databases for my apache karaf based web applications, I want the following things:

  1. A database-independent schema creation using liquibase (i.e. supporting multiple relational database systems)
  2. Pluggable databases (each web application typically has an in-memory derby database for testing and evaluation and a PostgreSQL database for production use)
  3. The ability to run multiple applications in the same apache karaf instance without the application databases colliding
  4. The possibility to configure the JDBC connection information using the karaf configuration

The way I accomplish this is:

  1. Create an application specific OSGi service for connecting to the database
  2. Create declarative services (DS) components implementing the service for each database system I wish to support
  3. Create a bundle implementing the calls to liquibase (and containing the liquibase scripts as resources) and use this bundle from all DS components implementing the database service
  4. Use high level karaf features to compose my webapps, with variants for each supported database system

OSGi 101: when I talk about “OSGi service” I mean “Java interface”. Java interfaces form the basis of the OSGi plug-in model. When an OSGi bundle (a “bundle” is a JAR file with some OSGi specific headers added to its MANIFEST.MF) is loaded it can have dependencies on services provided by other bundles.  When those dependencies are satisfied the bundle can go active and provide its own services to other bundles that may be listening.

I have created a Java interface called DatabaseService that defines the necessary operations I do against a JDBC database, and they aren’t many:

  • getDatasource() that will return a DataSource  connected to a relational database with the expected schema (Note: if connecting to the database fails, the DS component providing the service never goes active, so the only way code ever gets to call this method is if the connection was successful. This simplifies the code using the method)
  • getConnection() that returns a Connection to the database. This is a convenience method, because what the implementations actually do is call DataSource.getConnection().  It makes sense to have this method in the interface, because:
    1. Fetching the connection is the only thing done with DataSource, so that it makes all usages a little bit simpler and shorter
    2. The DS components providing the service all have ended up implementing this method anyway, in the calls to liquibase
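
A sketch of what the interface can look like:

  import java.sql.Connection;
  import java.sql.SQLException;
  import javax.sql.DataSource;

  public interface DatabaseService {

      // Returns a DataSource already connected to a database with the expected schema
      DataSource getDatasource();

      // Convenience method, implementations just call getDatasource().getConnection()
      Connection getConnection() throws SQLException;
  }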

The application specific DatabaseService definitions are mostly without methods of their own.
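
Roughly like this, using ukelonn’s database service as an illustration (the interface name is an assumption, not the actual source):

  // A marker interface: it adds no methods of its own, it just gives DS
  // components something application specific to depend on and filter by
  public interface UkelonnDatabase extends DatabaseService {
  }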

However, I’ve had one case where I needed to have very different syntax for the different database systems supported, and then I added methods returning database specific versions of the problematic SQL queries to the application specific database service definition:
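
Something like this (the interface name is illustrative, the method names are the ones mentioned earlier on this page):

  public interface UkelonnDatabase extends DatabaseService {

      // Returns the DBMS specific SQL summing work transactions grouped by year
      String sumOverYearQuery();

      // Returns the DBMS specific SQL summing work transactions grouped by year and month
      String sumOverMonthQuery();
  }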

For the curious: the SQL with different syntax had to do with aggregation over months and years, where derby uses functions like YEAR() and PostgreSQL uses extract() (see the listings earlier on this page).

The next step is to create a bundle holding the liquibase schema definition of the database.  This bundle neither expects OSGi service injection nor exposes any services.  It’s an OSGi library bundle, which is like using a regular jar, except that the exports and imports of packages are controlled by OSGi and the OSGi headers in the bundle’s MANIFEST.MF.

More OSGi terminology: “bundle” means a jar-file with an OSGi-compliant MANIFEST.MF.

The liquibase bundles contain a single class and liquibase scripts as resources. In OSGi the resources are bundle local, so other bundles can’t load them directly. But they can instantiate a class from the liquibase bundle and that class can in turn load the resources that reside in the same bundle as the class.

The classes in the liquibase bundles typically contain a method for initial schema setup, and a method for schema modifications.  In addition to the scripts in the liquibase bundle, the database plugins contain their own liquibase scripts (the liquibase scripts of the in-memory derby test database plugins contain sets of hopefully realistic test data, and the liquibase scripts of the PostgreSQL plugins contain initial data such as e.g. a single initial user).

The declarative services (DS) components of the database plugins expect an injection of the DataSourceFactory OSGi service. The DataSourceFactory interface isn’t part of JDBC (like DataSource itself is), but is part of the OSGi service platform.

To ensure that the correct DataSourceFactory is injected, it is possible to filter on the osgi.jdbc.driver.name service property (service properties are a map of key/value pairs associated with an OSGi service), with one filter for the derby plugin and a different filter for the PostgreSQL plugin.
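
Rough sketches of the two injections (the exact osgi.jdbc.driver.name values depend on which JDBC driver bundles are loaded, so the values here are examples, not authoritative):

  // In the derby test database plugin
  @Reference(target = "(osgi.jdbc.driver.name=derby)")
  public void setDataSourceFactory(DataSourceFactory dataSourceFactory) {
      this.dataSourceFactory = dataSourceFactory;
  }

  // In the PostgreSQL production database plugin
  @Reference(target = "(osgi.jdbc.driver.name=PostgreSQL JDBC Driver)")
  public void setDataSourceFactory(DataSourceFactory dataSourceFactory) {
      this.dataSourceFactory = dataSourceFactory;
  }

  // DataSourceFactory is org.osgi.service.jdbc.DataSourceFactory, and @Reference
  // is org.osgi.service.component.annotations.Reference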

Once all @Reference service injections are satisfied, the method annotated with @Activate is called. This method needs to create a successful connection to the database and run liquibase scripts. A method for a derby memory database can look like this (the full source of the UkelonnDatabaseProvider class is here):
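
A rough sketch under the same assumptions as above (the derby memory URL is the kind of URL used for an in-memory test database, and UkelonnLiquibase with its methods is an illustrative stand-in for the class in the liquibase library bundle):

  import java.sql.Connection;
  import java.util.Properties;
  import org.osgi.service.component.annotations.Activate;
  import org.osgi.service.jdbc.DataSourceFactory;

  @Activate
  public void activate() {
      try {
          Properties properties = new Properties();
          properties.setProperty(DataSourceFactory.JDBC_URL, "jdbc:derby:memory:ukelonn;create=true");
          // The DataSource is stored in a field and returned from getDatasource()
          datasource = dataSourceFactory.createDataSource(properties);
          try (Connection connection = datasource.getConnection()) {
              UkelonnLiquibase liquibase = new UkelonnLiquibase();
              liquibase.createInitialSchema(connection); // initial schema
              liquibase.insertInitialData(connection);   // dummy data for tests and demos
              liquibase.updateSchema(connection);        // later schema modifications
          }
      } catch (Exception e) {
          // If this fails the component never goes active and no service is exposed
          throw new RuntimeException("Failed to create the derby test database", e);
      }
  }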


What the method does is first create a DataSource object (stored in a field and accessible through the DatabaseService interface), and then create a connection in a try-with-resources and run the liquibase scripts using that connection, before the try-with-resources releases the connection. The scripts are typically an initial schema, followed by initial data for the database, followed by modifications to the schema.

It is possible to create configuration for a DS component from the karaf console command line, and JDBC connection info for a PostgreSQL database can be created using the config:edit, config:property-set and config:update console commands.

The resulting configuration will be injected into the @Activate method like e.g. so (the full source of the PGUkelonnDatabaseProvider class is here):
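
Roughly like this; the config keys jdbcUrl, jdbcUser and jdbcPassword and the runLiquibaseScripts helper are illustrative, they just need to match whatever was set from the karaf console:

  import java.sql.Connection;
  import java.util.Map;
  import java.util.Properties;
  import org.osgi.service.component.annotations.Activate;
  import org.osgi.service.jdbc.DataSourceFactory;

  @Activate
  public void activate(Map<String, Object> config) {
      try {
          datasource = dataSourceFactory.createDataSource(createDatabaseConnectionProperties(config));
          try (Connection connection = datasource.getConnection()) {
              // Set up the schema and add initial data using liquibase
              runLiquibaseScripts(connection);
          }
      } catch (Exception e) {
          throw new RuntimeException("Failed to connect to the PostgreSQL database", e);
      }
  }

  // Picks the JDBC connection info out of the injected config and puts it into
  // a Properties instance that the DataSourceFactory can use
  private Properties createDatabaseConnectionProperties(Map<String, Object> config) {
      Properties properties = new Properties();
      properties.setProperty(DataSourceFactory.JDBC_URL, (String) config.getOrDefault("jdbcUrl", "jdbc:postgresql:///ukelonn"));
      properties.setProperty(DataSourceFactory.JDBC_USER, (String) config.get("jdbcUser"));
      properties.setProperty(DataSourceFactory.JDBC_PASSWORD, (String) config.get("jdbcPassword"));
      return properties;
  }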

The injected config is sent into the method creating the DataSource instance.

Important: the DataSource instance is cached and is used to represent the connected database.  The Connection instances are not cached. They are created on demand in try-with-resources, to ensure that they are closed and all transactions are completed.

The method creating the DataSource instance passes the config on to a method that picks the JDBC connection info out of the injected config and puts it into a Properties instance that can be used to connect to a JDBC database.

Using the pluggable database is done by having the DS component implementing the application’s business logic have the application specific database service injected, e.g. like so (full UkelonnServiceProvider source here):
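
Roughly like this (the UkelonnService interface name is an assumption):

  import org.osgi.service.component.annotations.Component;
  import org.osgi.service.component.annotations.Reference;

  @Component(service = UkelonnService.class)
  public class UkelonnServiceProvider implements UkelonnService {

      private UkelonnDatabase database;

      // Whichever database plugin is loaded (derby test or PostgreSQL) provides this service
      @Reference
      public void setDatabase(UkelonnDatabase database) {
          this.database = database;
      }

      // business logic methods using the database go here
  }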

Using the injected service means calling the getConnection() method in a try-with-resources, doing the SQL operations, and then letting the end of the try-with-resources release the connection, like e.g. so:
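
Roughly like this (the table and column names are taken from the aggregation example earlier on this page, while the Transaction bean and the row mapping helper are illustrative):

  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.util.ArrayList;
  import java.util.List;

  public List<Transaction> transactionsForUser(String username) {
      List<Transaction> transactions = new ArrayList<>();
      try (Connection connection = database.getConnection()) {
          try (PreparedStatement statement = connection.prepareStatement(
                   "select * from transactions t join accounts a on a.account_id=t.account_id where a.username=?")) {
              statement.setString(1, username);
              try (ResultSet results = statement.executeQuery()) {
                  while (results.next()) {
                      transactions.add(mapTransaction(results)); // illustrative row-to-bean mapping
                  }
              }
          }
      } catch (Exception e) {
          throw new RuntimeException("Failed to get transactions for " + username, e);
      }
      return transactions;
  }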

It probably helps this approach to pluggable databases that I’m using JDBC prepared statements directly, instead of using an ORM like e.g. hibernate (but that’s a story/rant for another day…).

Picking what database to use is done using karaf features.

The karaf-maven-plugin is used on all maven modules that create an OSGi bundle, to create a karaf feature repository file and attach it to the OSGi bundle artifact.

In addition I create a handwritten feature repository file where I include all of the feature files of the OSGi bundle artifacts using their maven coordinates. And then, in the same file, I create high level features that start the application with either database.

Examples of hand-written feature repositories:

  1. aggregate feature repository for ukelonn (maven URL mvn:no.priv.bang.ukelonn/karaf/LATEST/xml/features )
  2. aggregate feature repository for authservice (maven URL mvn:no.priv.bang.authservice/authservice/1.6.0/xml/features )

Using the authservice as an example:

  1. First define a feature that loads the features required for the application, except for the database
  2. Then define a feature that combines that feature with the feature created when building the OSGi bundle for the derby test database
  3. Finally define a feature that combines the first feature with the feature created when building the OSGi bundle for the PostgreSQL database

This means that trying out the authservice application can be done by:

  1. Download and start apache karaf using the quick start guide
  2. Install authservice with derby by loading the feature repository using the maven coordinates, and then loading the feature that pulls in the application and the derby database
  3. All bundles required by the features are pulled in from maven central and the application is started and becomes available on http://localhost:8181/authservice

Installing and starting the PostgreSQL variant takes a little more work (but not much):

  1. Install and start PostgreSQL (left as an exercise for the reader)
  2. Create a PostgreSQL user named karaf with a password (in the examples “karaf”)
  3. Create a blank database named “authservice” owned by user karaf
  4. Download and start apache karaf using the quick start guide
  5. Create database connection configuration from the karaf console
  6. Install authservice with PostgreSQL by loading the feature repository using the maven coordinates, and then loading the feature that pulls in the application and the PostgreSQL database
  7. All bundles required by the features are pulled in from maven central and the application is started and becomes available on http://localhost:8181/authservice