Joining mastodon and slowly posting a wordpress RSS/atom feed

Mastodon is a social network that appears similar to twitter: you get a feed where you can see “toots” instead of “tweets” and you can send your own toots that will be seen by others listening to you or looking for a hashtag found in your toot.

Mastodon differs from twitter in that, like diaspora, it is not under corporate control. Also like diaspora, Mastodon is a non-profit, user-owned, distributed social network. The mastodon software is free software (GNU AGPL-3.0 or later). The mastodon server side is written in ruby and the mastodon frontend is written in react/redux. And also, like diaspora, it is part of the fediverse.

The first thing to do is to find a server to join. Again like diaspora, mastodon consists of interconnected servers. The servers are owned by different people, with different rules for joining, different rules for registration, and different rules for what can be posted.

Note: you can share posts and listen to postings across servers. You’re not limited to the server you’re joining.

I looked at a couple of servers and found them too restrictive, and ended up joining mastodon.social.

I can be reached on mastodon with @steinarb@mastodon.social.

The next thing to do was to find a way to slowly post my wordpress feed in chronological order.

I quickly found a utility called feediverse. It's written in python and, according to its author, was written over the course of a weekend. According to the commits on feediverse's github page it was written 2 years ago and has received no commits since then.

But it still does the job.

The first thing to do was to install it on my debian server. As root, first install pip3, then install feediverse:

apt install python3-pip
pip3 install feediverse

But that took me only part of the way, because pointing feediverse at my wordpress feed would have fed the entire feed to mastodon in reverse chronological order. And like for diaspora, I wanted to post the feed one post a day, oldest first.

Note: The diaspora post contains changes to the wordpress feed (make the feed contain only the summaries of all posts on the blog) that should be applied to get the expected results.

I made the github issue Make it possible to slowly toot a complete feed in chronological order for feediverse, with the intent of providing a PR for it.

However, I decided on a simpler solution: create a separate script that reads the feed and posts the entries, one post at a time, to a local file.

To use this script on a debian system:

  1. As root, install the dependencies of the script, using pip3:

    pip3 install pyyaml
    pip3 install feedparser
    pip3 install python-dateutil
    pip3 install feedgen
    pip3 install beautifulsoup4
    
  2. As your own user:
    1. Clone the github repo of the script:

      mkdir -p $HOME/git
      cd $HOME/git
      git clone https://github.com/steinarb/feedreverser.git
      
    2. Run feedreverser once manually with the following command:

      /usr/bin/python3 $HOME/git/feedreverser/feedreverser.py
      
    3. When prompted, give the feed URL (you will find this by right-clicking on an RSS symbol in your blog and copying the URL) and give /tmp/reversed.rss as the file to store the reversed feed in
    4. Add a crontab entry that runs the script once every 24h:

      10 6 * * * /usr/bin/python3 $HOME/git/feedreverser/feedreverser.py
      

At this point, /tmp/reversed.rss contains the oldest post in the feed, and within 24h it will be replaced by the second oldest post.

So the next thing to do is to set up feediverse to post the contents of the feed in this local file to mastodon.

First create an app in mastodon:

  1. Click on “Preferences”
  2. In the page that opens, click on “Development”
  3. In “Your applications”, click on “New application”
  4. In “New application”:
    1. In application name, give:

      feediverse
      
    2. Make sure read and write are checked (they are, by default)
    3. Click on “Submit”
  5. In “Your applications”, click on “feediverse”
  6. Make a note of the values for access token and client secret. You will need them when doing the initial setup run of feediverse

Then, as your own user, run feediverse from the command line:

feediverse

When prompted for

  1. “What is your Mastodon Instance URL”, give the top URL for your mastodon instance (for me, it is https://mastodon.social )
  2. “Do you have your app credentials already? [y/n]”, answer “y”
  3. “What is your app’s client id”, answer:

    feediverse
    
  4. “What is your client secret”, give the client secret you made a note of, earlier
  5. “access_token”, give the access token you made a note of, earlier
  6. “RSS/Atom feed URL to watch”, give a file URL for the file you told feedreverser to write to, i.e.

    file:///tmp/reversed.rss
    
  7. Open the $HOME/.feediverse file (a YAML file) in a text editor, and change the line

    - template: '{title} {url}'
    

    to

    - template: '{title} {url} {hashtags} {summary}'
    
  8. If your first post contains hashtags and a summary you would like to see as much as possible of in the toot, then delete the “updated:” line in the .feediverse file and re-run the feediverse command (mine didn't)

Then add a crontab entry that runs feediverse once every 15 minutes:

*/15 * * * * /usr/local/bin/feediverse 2>/dev/null

The error redirect is because of the feediverse issue yaml warning from feediverse (I have provided a PR for this warning, referenced from the issue).

At this point feediverse is set up to slowly post your wordpress feed one feed entry a day.

If you let the setup continue past the emptying of the feed backlog, it will still work, but it will only post a single feed entry per day.

If you want feediverse to post new entries from your feed as they appear, just:

  1. In crontab, remove the entry for feedreverser
  2. In the $HOME/.feediverse file, replace:

    url: file:///tmp/reversed.rss
    

    with the feed URL you gave to feedreverser, e.g.

    url: https://steinar.bang.priv.no/feed/atom/
    

But note that there are some feediverse issues that might bite you, which feedreverser protects you from by fixing them:

  1. Correctly handle hashtags with spaces: wordpress categories and tags with spaces in them are currently posted as separate hashtags (e.g. “debian 8” becomes “#debian #8”). I have provided a PR for this, that replaces the spaces in categories and tags with underscores
  2. toots with more than 500 characters fails: there is a typo in the commit that attempted to correct this; I have provided a PR that fixes the typo
  3. Would it be able to change html to readable plain text, and post images?: This means that titles with character entities in them show up sort of unreadable, e.g. “Installing apache karaf on debian stretch”, and descriptions with both HTML tags and character entities can get very unreadable. The feedreverser script uses BeautifulSoup to fix this. There is at the time of writing no PR to feediverse to fix this

If you want a version of feediverse that fixes the first two, you can get my fork of feediverse and use that instead:

  1. Clone my fork of feediverse and use the branch where I’ve combined my PRs:

    mkdir -p $HOME/git
    cd $HOME/git
    git clone https://github.com/steinarb/feediverse.git
    cd feediverse
    git checkout steinarsbranch
    
  2. Change the crontab line to run the cloned script instead of the script installed with pip3 (note that the need to redirect the error output is gone because the yaml issue is fixed):

    */15 * * * * /usr/bin/python3 $HOME/git/feediverse/feediverse.py
    

Joining diaspora and slowly posting a wordpress RSS/atom feed

Diaspora is a social network that appears similar to Facebook in its behaviour: you get a web UI with a feed, and what ends up in that feed comes from your friends, your groups and the hashtags you filter for.

Diaspora differs from Facebook in that it is not under corporate control. Diaspora, according to its wikipedia entry, is a non-profit, user-owned, distributed social network. Diaspora’s software is free software (GNU AGPL-3.0) and written in ruby on rails.

To start using Diaspora, one first has to decide which “pod” (i.e. Diaspora instance) to join. In principle you can see everything posted by any user in the Diaspora fediverse, but there may be limitations in what the pod owner allows. I didn’t do much searching; I found a pod with a .no address and registered there. The first thing I was asked to do was to create a “Here I am” posting with a lot of hashtags referring to my interests.

I did that, and received several welcome replies. So that was kind of friendly.

If you want to see the postings described in this article, add steinarb@diasporapod.no to your contacts.

I then started the process of posting the RSS feed of this blog to diaspora.

Ideally I would have liked to post the feed entries in chronological order, with the dates they were originally posted.

However, the REST API of diaspora doesn’t allow setting the date of a post. The date you post is the date you get. So posting with the original date was out.

Next best would be to post the wordpress feed entries in blog post chronological order at a slow rate, perhaps 1 post a day, so I wouldn’t be seen to flood the diaspora feeds. So this is what I set out to do.

The first thing I had to do was to adjust the RSS feeds on wordpress, to get all of the posts into the feed. WordPress was set up to only show the 10 most recent blog posts in the feed. The feed was also set up to contain the entire bodies of the blog posts, and that was a little too much for this use case. So I adjusted the feeds to list only the blog post summaries.

I googled for a tool to do the posting and found this list on the diaspora wiki. The python script pod_feeder_v2 was at the start of the list, so that’s the one I tried first and that’s the one I stayed with.

Installing pod_feeder_v2 was done with the following commands, logged in as root, on a debian system:

  1. First installed python3 pip

    Listing 1.

    apt install python3-pip
    
  2. Then installed pod_feeder_v2

    Listing 2.

    pip3 install pod-feeder-v2
    

The way pod_feeder_v2 works is that it first reads all posts from the RSS feed it listens to, and then stores all entries it hasn’t seen before into a sqlite table (identified by the feed entry GUID). It then traverses the sqlite table in chronological order and posts the unposted entries, marking entries in the table as they are posted, so they won’t be posted more than once.

The time stamp in the sqlite table entries, is the time the entries are written in the table. Since the wordpress feeds post the newest entries first, pod_feeder_v2 on initial read puts the entries in the wrong order, with (more or less) the same time stamp.

So what I tried to do, was to first fill up the database with the existing posts, and then semi-manually correct the time stamp of each post in the database (34 posts, stretching back to 2012) and then start posting from the table 1 post per day.

Luckily pod_feeder_v2 has a command line option --fetch-only, that allows reading the feed and updating the sqlite table without posting. So I ran this command to populate the table in feed.db (I ran this command logged in with my regular user account, not as root):

Listing 3.

pod-feeder --feed-id steinarblog --category-tags --ignore-tag uncategorized --feed-url https://steinar.bang.priv.no/feed/atom/ --pod-url https://diasporapod.no --fetch-only

This adds both the wordpress categories and the wordcloud tags as hashtags in the post. The --ignore-tag uncategorized argument removes the tag uncategorized from the hashtags.

To manipulate the database table I installed the sqlite command line tool, with the following command, logged in as root:

Listing 4.

apt install sqlite3

I started the sqlite3 command line tool logged in as my own user account (“feed.db” is the file holding the sqlite database; it is created in the home directory of the user running pod-feeder):

Listing 5.

sqlite3 feed.db

I listed the available GUIDs:

Listing 6.

SELECT guid FROM feeds WHERE feed_id == 'steinarblog';

I took the GUIDs into an emacs SQL buffer, and manipulated them to become update lines, setting the time stamp to be at the time of the update. Then I reversed the lines in the buffer, so that the oldest was first:

Listing 7.

update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=1';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=7';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=14';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=23';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=26';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=33';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=44';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=46';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=53';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=63';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=60';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=66';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=83';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=165';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=171';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=191';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=196';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=209';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=214';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=223';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=224';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=238';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=250';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=255';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=261';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=269';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=265';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=286';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=292';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=306';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=332';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=342';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=353';
update feeds set timestamp = CURRENT_TIMESTAMP where guid = 'https://steinar.bang.priv.no/?p=310';

I manually pasted each line into sqlite3, one line at a time, waiting a couple of seconds between each paste. There were only 34 lines to paste, so this seemed the simplest way. I watched television while doing this.

The next bit to do was to post the articles in the database, one post per day.

I set up a cronjob to run once a day, posting from the database, and using “--limit 1” to ensure only one post was done each day:

Listing 8.

0 6 * * * /usr/local/bin/pod-feeder --feed-id steinarblog --summary --category-tags --ignore-tag uncategorized --feed-url https://steinar.bang.priv.no/feed/atom/ --limit 1 --pod-url https://diasporapod.no --username steinarb --password xxxxx --quiet

pod-feeder will only attempt to post entries for 72 hours, so to avoid the unposted entries being forgotten, I added another cronjob bumping the timestamp of unposted articles by 24 hours:

Listing 9.

5 6 * * * /usr/bin/sqlite3 feed.db 'update feeds set timestamp = DATETIME(timestamp, "+1 day") where posted = 0;'

With this setup one article per day will be posted until the feed is empty. At that point I will change the pod-feeder cronjob to run more often than once a day and remove the job bumping the timestamps of unposted entries.

Composing applications with karaf features

I create web applications by first creating a set of OSGi bundles that form the building blocks of the application, and then use karaf features to pull the building blocks together to create complete applications that run inside apache karaf.

The bundles are (in order of initial creation, and (more or less) order of maven reactor build):

  1. A bundle defining the liquibase schema for the application’s database
  2. A services bundle defining the OSGi service for the business logic layer
  3. A bundle defining the in-memory test database, with dummy data, used for unit tests and demo. I use apache derby for the in-memory test database
  4. A bundle defining the business logic and exposing it as an OSGi service
  5. A bundle defining a webcontext in the OSGi web whiteboard and an Apache Shiro Filter connecting to the webcontext and getting authentication and authorization info from authservice
  6. A bundle implementing the application’s web REST API, using the webcontext of the above bundle and connecting to the OSGi web whiteboard, with operations provided by an OSGi service provided by the backend bundle
  7. A bundle implementing the application’s web frontend, connecting to the above webcontext, and communicating with the application’s web REST API
  8. A bundle defining the production database. I use PostgreSQL for the production databases

Creating karaf features using maven

OSGi bundles are jar files with some extra fields added to the MANIFEST.MF, as outlined by the OSGi spec. The maven build of my projects uses the maven-bundle-plugin to create jar files that are also OSGi bundles.

“Feature” is, strictly speaking, not an OSGi concept. It’s a mechanism used by apache karaf to robustly load OSGi runtime dependencies in a version and release independent manner.

Apache karaf has many features built-in. Basically everything from apache servicemix and everything from OPS4J (aka “the pax stuff”) can be loaded from built-in features.

Karaf “feature repositories” are XML files that contain feature definitions. A feature definition has a name and can start OSGi bundles, e.g.:

Listing 1.

<features xmlns="http://karaf.apache.org/xmlns/features/v1.5.0" name="handlereg.services">
 <feature name="handlereg-services" version="1.0.0.SNAPSHOT">
  <bundle start-level="80">mvn:no.priv.bang.handlereg/handlereg.services/1.0.0-SNAPSHOT</bundle>
 </feature>
</features>

The above example is a feature repository, containing a feature named “handlereg-services”.

When the feature handlereg-services is installed, it will start the OSGi bundle in the <bundle> element, referenced with maven coordinates consisting of groupId, artifactId and version.

The karaf-maven-plugin can be used in a bundle maven module to create a feature repository containing a feature matching the bundle built by the maven module, and attach the feature repository to the resulting maven artifact.

In addition to starting bundles, features can depend on other features, which will cause those features to be loaded.

The bundle feature repositories can be included into a master feature repository and used to compose features that make up complete applications, which is what this article is about. See the section Composing features to create an application at the end of this blog post.

Defining the database schema

I use liquibase to create the schemas, and treat schema creation as code.

Liquibase has multiple syntaxes: XML, JSON, YAML and SQL. Using the SQL syntax is similar to e.g. using Flyway. Using the non-SQL syntaxes gives you a benefit that Flyway doesn’t have: cross-DBMS support.

I mainly use the XML syntax, because using the Liquibase schema in my XML editor gives me good editor support for editing changelists.

I also use the SQL syntax, but only for data, either initial data for the production database or dummy data for the test database. I don’t use the SQL syntax for actual database schema changes, because that would quickly end up not being cross-DBMS compatible.

The ER models of my applications are normalized and contain the entities the application is about. At the ER modeling stage, I don’t think about Java objects, I just try to make the ER model fit my mental picture of the problem space.

I start by listing the entities, e.g. for the weekly allowance app:

  1. accounts
  2. transactions (i.e. jobs or payments)
  3. transaction types (i.e. something describing the job or payment)

Then I list the connections, e.g. like so

  1. One account may have many transactions, while each transaction belongs to only one account (1-n)
  2. Each transaction must have a type, while each transaction type can belong to multiple transactions (1-n)

Then I start coding:

  1. Create a standard OSGi bundle maven project
  2. Import the bundle into the IDE
  3. Create a JUnit test, where I fire up a derby in-memory database
  4. Let the IDE create a class for applying liquibase scripts to a JDBC DataSource
  5. Create a maven jar resource containing the liquibase XML changelog (I create an application specific directory inside src/main/resources/, not because it’s needed at runtime (resources are bundle local), but because I’ve found the need to use liquibase schemas from different applications in JUnit tests, and then it makes things simpler if the liquibase script directories don’t overlap)
  6. Create a method in the JUnit test to insert data in the first table the way the schema is supposed to look; the insert is expected to fail, since there is no table yet (a sketch of such a test follows this list)
  7. Create a changeset for the first table, e.g. like so

    Listing 2.

    <changeSet author="sb" id="ukelonn-1.0.0-accounts">
     <preConditions onFail="CONTINUE" >
      <not>
       <tableExists tableName="accounts" />
      </not>
     </preConditions>
    
     <createTable tableName="accounts">
      <column autoIncrement="true" name="account_id" type="INTEGER">
       <constraints primaryKey="true" primaryKeyName="account_primary_key"/>
      </column>
      <column name="username" type="VARCHAR(64)">
       <constraints nullable="false" unique="true"/>
      </column>
     </createTable>
    </changeSet>
    

    Some points to note (both are “lessons learned”):

    1. The <preConditions> element will skip the changeSet without failing if the table already exists
    2. The <changeSet> is just for a single table
  8. After the test runs green, add a select to fetch back the inserted data and assert on the results
  9. Loop from 6 until all tables and indexes and constraints are in place and tested
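
To make steps 3 to 9 a bit more concrete, here is a minimal sketch of such a JUnit test, with an in-memory derby database and a hypothetical ExampleLiquibase class following the same pattern as the schema classes shown later in this article (the class, table and column names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import javax.sql.DataSource;

import org.apache.derby.jdbc.EmbeddedDataSource;
import org.junit.Test;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class ExampleLiquibaseTest {

    @Test
    public void testCreateSchemaAndInsertAccount() throws Exception {
        DataSource datasource = createDerbyInMemoryDataSource("example");

        // Apply the liquibase changelog to the empty in-memory database
        // (ExampleLiquibase is a stand-in for the application's schema class)
        ExampleLiquibase liquibase = new ExampleLiquibase();
        try (Connection connection = datasource.getConnection()) {
            liquibase.createInitialSchema(connection);
        }

        // Insert a row the way the schema is supposed to look...
        try (Connection connection = datasource.getConnection()) {
            try (PreparedStatement statement = connection.prepareStatement("insert into accounts (username) values (?)")) {
                statement.setString(1, "jad");
                statement.executeUpdate();
            }
        }

        // ...then fetch it back and assert on the result
        try (Connection connection = datasource.getConnection()) {
            try (PreparedStatement statement = connection.prepareStatement("select * from accounts where username=?")) {
                statement.setString(1, "jad");
                try (ResultSet results = statement.executeQuery()) {
                    assertTrue(results.next());
                    assertEquals("jad", results.getString("username"));
                }
            }
        }
    }

    private DataSource createDerbyInMemoryDataSource(String dbname) {
        // A derby in-memory database is created on the first connection
        EmbeddedDataSource datasource = new EmbeddedDataSource();
        datasource.setDatabaseName("memory:" + dbname);
        datasource.setCreateDatabase("create");
        return datasource;
    }
}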

Note: All of my webapps so far have the logged-in user as a participant in the database. I don’t put most of the user information into the database. I use a webapp called authservice to handle authentication and authorization and also to provide user information (e.g. full name and email address). What I need to put into the database is some kind of link to authservice.

The username column is used to look up the account_id, which is what is used in the ER model; e.g. a transactions table could have an account_id column that is indexed and can be joined with the accounts table in a select.

Some examples of liquibase schema definitions

  1. The sonar-collector database schema, a very simple schema for storing sonarqube key metrics
  2. The authservice database schema
  3. The ukelonn database schema, a database schema for a weekly allowance app. This is the first one I created, and it has several mistakes:
    1. The entire schema is in a single changeset, rather than having a changeSet for each table and/or view (the reason for this is that this liquibase file was initially created by dumping an existing database schema and the result was a big single changeset)
    2. No preConditions guard around the creation of each table meant that moving the users table out of the original schema and into the authservice schema became a real tricky operation
  4. The handlereg database schema (a database schema for a groceries registration app)

Some examples of unit tests for testing database schemas:

  1. AuthserviceLiquibaseTest
  2. UkelonnLiquibaseTest
  3. HandleregLiquibaseTest

Defining the business logic OSGi service

Once a datamodel is in place I start on the business logic service interface.

This is the service that will be exposed by the business logic bundle and that the web API will listen for.

Creating the interface, I have the following rough plan:

  1. Arguments to the methods will be either beans or lists of beans (this maps to JSON objects and arrays of JSON objects transferred in the REST API)
  2. Beans used by the business logic service interface are defined in the same bundle as the service interface, with the following rules (a bean sketch follows this list):
    1. All data members are private
    2. All data members have a public getter but no setter (i.e. the beans are immutable)
    3. There is a no-args constructor for use by jackson (jackson creates beans and sets the values using reflection)
    4. There is a constructor initializing all data members, for use in unit tests and when returning bean values
  3. Matching the beans with the ER datamodel isn’t a consideration:
    1. Beans may be used by a single method in the service interface
    2. Beans may be denormalized in structure compared to the entities in the ER model (beans typically contain rows from the result of a join in the datamodel, rather than individual entities)
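
As an illustration of these bean rules, a bean might look like this minimal sketch (the class and field names are made up for the example, they are not taken from any of the service interfaces listed below):

import java.util.Date;

public class ExampleTransaction {

    // All data members are private
    private int transactionId;
    private Date transactionTime;
    private double amount;

    // No-args constructor for jackson, which fills in the fields using reflection
    public ExampleTransaction() {}

    // Constructor initializing all data members, used in unit tests and when returning bean values
    public ExampleTransaction(int transactionId, Date transactionTime, double amount) {
        this.transactionId = transactionId;
        this.transactionTime = transactionTime;
        this.amount = amount;
    }

    // Public getters but no setters, i.e. the bean is effectively immutable
    public int getTransactionId() { return transactionId; }
    public Date getTransactionTime() { return transactionTime; }
    public double getAmount() { return amount; }
}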

Some examples of business logic service interfaces:

  1. UserManagementService (user administration operations used by the web API of the authservice authentication and authorization (and user management) app)
  2. UkelonnService (the web API operations of a weekly allowance app)
  3. HandleregService (the web API operations of groceries registrations and statistics app)

Note: Creating the business logic service interface is an iterative process. I add methods while working on the implementation of the business logic and move them up to the service interface when I’m satisfied with them.

Creating a test database

The test database bundle has a DS component that exposes the PreHook OSGi service. PreHook has a single method “prepare” that takes a DataSource parameter. An example is the HandleregTestDbLiquibaseRunner DS component from the handlereg.db.liquibase.test bundle in the handlereg groceries shopping registration application:

Listing 3.

@Component(immediate=true, property = "name=handleregdb")
public class HandleregTestDbLiquibaseRunner implements PreHook {
    ...
    @Override
    public void prepare(DataSource datasource) throws SQLException {
        try (Connection connect = datasource.getConnection()) {
            HandleregLiquibase handleregLiquibase = new HandleregLiquibase();
            handleregLiquibase.createInitialSchema(connect);
            insertMockData(connect);
            handleregLiquibase.updateSchema(connect);
        } catch (LiquibaseException e) {
            logservice.log(LogService.LOG_ERROR, "Error creating handlreg test database schema", e);
        }
    }
}

In the implementation of the “prepare” method, the class containing the schema is instantiated, and run to create the schema. Then Liquibase is used directly on files residing in the test database bundle, to fill the database with test data.

To ensure that the correct PreHook will be called for a given datasource, the DS component is given a name, “name=handleregdb” in the above example.

The same name is used in the pax-jdbc-config configuration that performs the magic of creating a DataSource from a DataSourceFactory. The pax-jdbc-config configuration resides in the template feature.xml file of the bundle project, i.e. in the handlereg.db.liquibase.test/src/main/feature/feature.xml file. The pax-jdbc-config configuration in that template feature.xml, looks like this:

Listing 4.

<feature name="handlereg-db-test" description="handlereg test DataSource" version="${project.version}">
 <config name="org.ops4j.datasource-handlereg-test">
  osgi.jdbc.driver.name=derby
  dataSourceName=jdbc/handlereg
  url=jdbc:derby:memory:handlereg;create=true
  ops4j.preHook=handleregdb
 </config>
 <capability>
  osgi.service;objectClass=javax.sql.DataSource;effective:=active;osgi.jndi.service.name=jdbc/handlereg
 </capability>
 <feature>${karaf-feature-name}</feature>
 <feature>pax-jdbc-config</feature>
</feature>

The xml example above defines a feature that:

  1. Depends on the feature created by the bundle project
  2. Depends on the pax-jdbc-config feature (built-in in karaf)
  3. Creates the following configuration (will end up in the file etc/org.ops4j.datasource-handlereg-test.cfg in the karaf installation):

    Listing 5.

    osgi.jdbc.driver.name=derby
    dataSourceName=jdbc/handlereg
    url=jdbc:derby:memory:handlereg;create=true
    ops4j.preHook=handleregdb
    

    Explanation of the configuration:

    1. osgi.jdbc.driver.name=derby will make pax-jdbc-config use the DataSourceFactory that has the name “derby”, if there are multiple DataSourceFactory services in the OSGi service registry
    2. ops4j.preHook=handleregdb makes pax-jdbc-config look for a PreHook service named “handleregdb” and call its “prepare” method (i.e. the liquibase script runner defined at the start of this section)
    3. url=jdbc:derby:memory:handlereg;create=true is the JDBC URL, which is one third of the connection properties needed to create a DataSource from a DataSourceFactory (the other two parts are username and password, but they aren’t needed for an in-memory test database)
    4. dataSourceName=jdbc/handlereg gives the name “jdbc/handlereg” to the DataSource OSGi service, so that components that wait for a DataSource OSGi service can qualify which service they are listening for

Implementing the business logic

The business logic OSGi bundle defines a DS component accepting a DataSource with a particular name and exposing the business logic service interface:

Listing 6.

@Component(service=HandleregService.class, immediate=true)
public class HandleregServiceProvider implements HandleregService {

    private DataSource datasource;

    @Reference(target = "(osgi.jndi.service.name=jdbc/handlereg)")
    public void setDatasource(DataSource datasource) {
        this.datasource = datasource;
    }

    ... // Implementing the methods of the HandleregService interface
}

The target argument with the value “jdbc/handlereg”, matching the dataSourceName config value, ensures that only the correct DataSource service will be injected.

The implementations of the methods in the business logic service interface all follow the same pattern:

  1. The first thing that happens is that a connection is created in a try-with-resource. This ensures that the database server doesn’t suffer resource exhaustion
  2. The outermost try-with-resource is followed by a catch clause that will catch anything, log the catch and re-throw inside an application specific runtime exception (I really don’t like checked exceptions)
  3. A new try-with-resource is used to create a PreparedStatement.
  4. Inside the try, parameters are added to the PreparedStatement. Note: Parameter replacements in PreparedStatements are safe with respect to SQL injection (parameters are added after the SQL has been parsed)
  5. Then, if it’s a query, the returned ResultSet is handled in another try-with-resource and then the result set is looped over to create a java bean or a collection of beans to be returned

I.e. a typical business logic service method looks like this:

Listing 7.

public List<Transaction> findLastTransactions(int userId) {
    List<Transaction> handlinger = new ArrayList<>();
    String sql = "select t.transaction_id, t.transaction_time, s.store_name, s.store_id, t.transaction_amount from transactions t join stores s on s.store_id=t.store_id where t.transaction_id in (select transaction_id from transactions where account_id=? order by transaction_time desc fetch next 5 rows only) order by t.transaction_time asc";
    try(Connection connection = datasource.getConnection()) {
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setInt(1, userId);
            try (ResultSet results = statement.executeQuery()) {
                while(results.next()) {
                    int transactionId = results.getInt(1);
                    Date transactionTime = new Date(results.getTimestamp(2).getTime());
                    String butikk = results.getString(3);
                    int storeId = results.getInt(4);
                    double belop = results.getDouble(5);
                    Transaction transaction = new Transaction(transactionId, transactionTime, butikk, storeId, belop);
                    handlinger.add(transaction);
                }
            }
        }
    } catch (SQLException e) {
        String message = String.format("Failed to retrieve a list of transactions for user %d", userId);
        logError(message, e);
        throw new HandleregException(message, e);
    }
    return handlinger;
}

To someone familiar with spring and spring boot this may seem like a lot of boilerplate, but I rather like it. I’ve had the misfortune to have to debug into spring applications created by others, and to make reports from relational databases with schemas created by spring repositories.

Compared to my bad spring experience:

  1. This is very easy to debug: you can step and/or breakpoint straight into the code handling the JDBC query and the unpacking of the results
  2. If the returned ResultSet is empty, it’s easy to just paste the SQL query from a string in the Java code into an SQL tool (e.g. Oracle SQL Developer, MS SQL Server Management Studio, or PostgreSQL pgadmin) and figure out why the returned result set is empty
  3. Going the other way, it’s very simple to use the database’s SQL tool to figure out a query that becomes the heart of a method
  4. Since the ER diagram is manually created for ease of query, rather than autogenerated by spring, it’s easy to make reports and aggregations in the database

Defining a webcontext and hooking into Apache Shiro

This bundle contains a lot of boilerplate that will be basically the same from webapp to webapp, except for the actual path of the webcontext. I have created an authservice sample application that is as simple as I could make it, to copy-paste into a bundle like this.

As mentioned in the sample application, I use a webapp called “authservice” to provide both apache shiro based authentication and authorization, and a simple user management GUI.

Authservice has been released to maven central and can be used in any apache karaf application by loading authservice’s feature repository from maven central and then installing the appropriate feature.

All of my web applications have an OSGi web whiteboard webcontext that provides the application with a local path, and is hooked into Apache Shiro for authorization and authentication.

The bundle contains one DS component exposing the WebContextHelper OSGi service that is used to create the webcontext, e.g. like so:

Listing 8.

@Component(
    property= {
        HttpWhiteboardConstants.HTTP_WHITEBOARD_CONTEXT_NAME+"=sampleauthserviceclient",
        HttpWhiteboardConstants.HTTP_WHITEBOARD_CONTEXT_PATH+"=/sampleauthserviceclient"},
    service=ServletContextHelper.class,
    immediate=true
)
public class AuthserviceSampleClientServletContextHelper extends ServletContextHelper { }

The bundle will also contain a DS component exposing a servlet Filter as an OSGi service and hooking into the OSGi web whiteboard and into the webcontext, e.g. like so:

Listing 9.

@Component(
    property= {
        HttpWhiteboardConstants.HTTP_WHITEBOARD_FILTER_PATTERN+"=/*",
        HttpWhiteboardConstants.HTTP_WHITEBOARD_CONTEXT_SELECT + "=(" + HttpWhiteboardConstants.HTTP_WHITEBOARD_CONTEXT_NAME +"=sampleauthserviceclient)",
        "servletNames=sampleauthserviceclient"},
    service=Filter.class,
    immediate=true
)
public class AuthserviceSampleClientShiroFilter extends AbstractShiroFilter { // NOSONAR

    private Realm realm;
    private SessionDAO session;
    private static final Ini INI_FILE = new Ini();
    static {
        // Can't use the Ini.fromResourcePath(String) method because it can't find "shiro.ini" on the classpath in an OSGi context
        INI_FILE.load(AuthserviceSampleClientShiroFilter.class.getClassLoader().getResourceAsStream("shiro.ini"));
    }

    @Reference
    public void setRealm(Realm realm) {
        this.realm = realm;
    }

    @Reference
    public void setSession(SessionDAO session) {
        this.session = session;
    }

    @Activate
    public void activate() {
        IniWebEnvironment environment = new IniWebEnvironment();
        environment.setIni(INI_FILE);
        environment.setServletContext(getServletContext());
        environment.init();

        DefaultWebSessionManager sessionmanager = new DefaultWebSessionManager();
        sessionmanager.setSessionDAO(session);
        sessionmanager.setSessionIdUrlRewritingEnabled(false);

        DefaultWebSecurityManager securityManager = DefaultWebSecurityManager.class.cast(environment.getWebSecurityManager());
        securityManager.setSessionManager(sessionmanager);
        securityManager.setRealm(realm);

        setSecurityManager(securityManager);
        setFilterChainResolver(environment.getFilterChainResolver());
    }

}

I hope to make the definition and use of the webcontext simpler when moving to OSGi 7, because the web whiteboard of OSGi 7 will be able to use Servlet 3.0 annotations to specify the webcontexts, servlets and filters.

I also hope to be able to remove a lot of boilerplate from the shiro filter when moving to the more OSGi friendly Shiro 1.5.

Implementing a REST API

The REST API for one of my webapps is a thin shim over the application’s business logic service interface:

  1. I create a DS component that subclasses the Jersey ServletContainer and exposes Servlet as an OSGi service, hooking into the OSGi web whiteboard and the webcontext created by the web security bundle (I have created a ServletContainer subclass that simplifies this process)
  2. The component gets an injection of the application’s business logic OSGi service
  3. The DS component adds the injected OSGi service as a service to be injected into Jersey resources implementing REST endpoints
  4. I create a set of stateless Jersey resources implementing the REST endpoints, which get injected with the application’s business logic OSGi service (a sketch of such a resource follows this list)
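
As a sketch of what such a stateless Jersey resource might look like (the path is illustrative, the import of the service interface and bean is an assumption about the package name, and the findLastTransactions() operation mirrors the business logic example earlier in this article):

import java.util.List;

import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Assumed package for the handlereg service interface and beans
import no.priv.bang.handlereg.services.HandleregService;
import no.priv.bang.handlereg.services.Transaction;

@Path("/transactions")
public class TransactionsResource {

    // Injected by HK2, because the DS servlet component registered the OSGi service with HK2
    @Inject
    HandleregService handlereg;

    @GET
    @Path("/{userId}")
    @Produces(MediaType.APPLICATION_JSON)
    public List<Transaction> findLastTransactions(@PathParam("userId") int userId) {
        // Thin shim: delegate straight to the business logic OSGi service
        return handlereg.findLastTransactions(userId);
    }
}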

Some examples of web APIs:

  1. A user management REST API wrapping the UserManagement OSGi service
  2. The REST API of the weekly allowance app, wrapping the UkelonnService OSGi service
  3. The REST API of the groceries registration app, wrapping the HandleregService OSGi service

I have also created a sample application demonstrating how to add OSGi services to services injected into stateless Jersey resources implementing REST endpoints.

Implementing a web frontend

Composing features to create an application

At this point there are a lot of building blocks, but no application.

Each of the building blocks has its own feature repository file attached to the maven artifact.

What I do is to manually create a feature repository that imports all of the generated feature repositories and then hand-write application features that depends on a set of the building block features. I don’t involve the karaf-maven-plugin in this because I only want to load the other feature repositories. I don’t want to inline the contents. I use the maven-resources-plugin resource filtering to expand all of the maven properties, and then use the build-helper-maven-plugin to attach the filtered feature repository to a pom maven artifact.

Some examples of manually created feature repositories:

  1. The authservice authentication and authorization and user management application master feature repository, where the handwritten features are:
    1. authservice-with-dbrealm-and-session which pulls in everything needed for karaf authentication and authorization against a JDBC realm, except for the actual database connection. This feature pulls in none of the user administration support of authservice
    2. authservice-with-testdb-dbrealm-and-session which builds on authservice-with-dbrealm-and-session and adds a derby test database with mock data
    3. authservice-with-productiondb-dbrealm-and-session which builds on authservice-with-dbrealm-and-session and adds a PostgreSQL database connection
    4. authservice-user-admin which builds on authservice-with-dbrealm-and-session and adds user administration, but pulls in no actual JDBC database
    5. user-admin-with-testdb which builds on authservice-user-admin and adds a derby test database with mock data
    6. user-admin-with-productiondb which builds on authservice-user-admin and adds a PostgreSQL database connection
  2. The ukelonn weekly allowance application master feature repository, where the handwritten features are:
    1. ukelonn-with-derby which pulls in all bundles needed to start the weekly allowance app with a database with mock data, and also pulls in the authentication and authorization app, also with an in-memory database with mock data (no user administration UI pulled in, since the weekly allowance app has its own user administration)
    2. ukelonn-with-postgresql which pulls in all bundles needed to start the weekly allowance app with a JDBC connection to a PostgreSQL database, and also pulls in the authentication and authorization app connected to a PostgreSQL database
    3. ukelonn-with-postgresql-and-provided-authservice which pulls in the weekly allowance app with a PostgreSQL JDBC connection and no authorization and authentication stuff. This feature won’t load if the authservice application hasn’t already been loaded
  3. The handlereg groceries registration application master feature, where the handwritten features are:
    1. handlereg-with-derby starts the application with a test database and also pulls in authservice (that’s the <feature>user-admin-with-testdb</feature>, which actually pulls in the full user administration application (with a derby test database))
    2. handlereg-with-derby-and-provided-authservice is the same as handlereg-with-derby, except that it doesn’t pull in authservice. This requires authservice to already be installed before this feature is installed, but has the advantage of not uninstalling authservice when this feature is uninstalled
    3. handlereg-with-postgresql starts the application with a PostgreSQL database connection and authservice
    4. handlereg-with-postgresql-and-provided-authservice starts the application with a PostgreSQL database and no authservice. This is actually the feature used to load handlereg in the production system (since it means the feature can be uninstalled and reinstalled without affecting other applications)

As an example, the handlereg-with-derby feature mentioned above looks like this:

Listing 10.

<feature name="handlereg-with-derby" description="handlereg webapp with derby database" version="${project.version}">
 <feature>handlereg-db-test</feature>
 <feature>handlereg-web-frontend</feature>
 <feature>user-admin-with-testdb</feature>
 <feature>handlereg-backend-testdata</feature>
</feature>

To start the composed application, install and start apache karaf, and from the karaf console, first load the master feature repository and then install the manually composed feature:

Listing 11.

feature:repo-add mvn:no.priv.bang.handlereg/handlereg/LATEST/xml/features
feature:install handlereg-with-derby

How to get test coverage back in sonarcloud maven builds

I use travis-ci to build my github projects and use sonarcloud to do analysis of the builds.

At the start of January 2020, the test coverage percentage on all sonarcloud projects suddenly dropped to 0%.

This blog post explains why coverage percentage dropped to 0% and how to get the test coverage back in the sonarcloud reports.

The explanation assumes familiarity with apache maven.

The reason coverage suddenly dropped to 0%, was that sonarcloud stopped retrieving coverage information from the target/jacoco.exec files.

The way to get test coverage back in sonarcloud is to first let jacoco-maven-plugin generate an XML format test coverage file from the jacoco.exec file(s), and then point sonar-scanner-maven to the XML file instead of the default (which is still the jacoco.exec files).

Before explaining how to fix the build, I’ll recap how things used to work.

The .travis.yml build command for sonar analysis at the time the test coverage started dropping to 0%, was:

Listing 1.

mvn -B clean org.jacoco:jacoco-maven-plugin:prepare-agent package sonar:sonar

which means:

  1. clean means delete all target directories in the current project and modules
  2. org.jacoco:jacoco-maven-plugin:prepare-agent means “set up jacoco to be the maven test runner container”
  3. package means that maven runs all of the java build steps up to and including test, and since jacoco is the test runner, this leaves a jacoco.exec file in the target directory
  4. sonar:sonar invokes the sonar-scanner-maven plugin to extract information from the reactor build and send it to the sonar instance its config points it to.

By default sonar-scanner-maven will look for jacoco.exec files, and send them to the sonar instance, whether the sonar instance understands them or not.

Note: Somewhat confusingly for local debugging, the downloadable community edition sonarqube distributions did not drop support for the jacoco.exec files at the same time, in fact a release I downloaded yesterday still supported jacoco.exec.

Getting sonarcloud test coverage back in a single-module maven project

In a single module maven project, you will need to do the following things:

  1. To generate a coverage XML file, add a maven configuration for the jacoco-maven-plugin binding the plugin’s report goal to the maven verify phase (the phase is picked because it is after the tests have run and the target/jacoco.exec file is created):

    Listing 2.

    <project>
     <build>
      <plugins>
       <plugin>
        <groupId>org.jacoco</groupId>
        <artifactId>jacoco-maven-plugin</artifactId>
        <version>0.8.5</version>
        <executions>
         <execution>
          <id>report</id>
          <goals>
           <goal>report</goal>
          </goals>
          <phase>verify</phase>
         </execution>
        </executions>
       </plugin>
      </plugins>
     </build>
    </project>
    
  2. In the .travis.yml file (or your equivalent), in the sonar generating command line, switch from “package” to “verify” to ensure that the coverage XML file is generated with content, by running it after the target/jacoco.exec file has been created:

    Listing 3.

    script:
        - mvn clean org.jacoco:jacoco-maven-plugin:prepare-agent verify sonar:sonar
    

Getting sonarcloud test coverage back in a multi-module maven project

A multi-module maven project uses a slightly different approach: an aggregate coverage file is created from the maven modules’ jacoco.exec files, and sonar-scanner-maven is pointed to this file.

The changes to get coverage back in a multi-module project, are:

  1. Add a new maven module just for aggregating the coverage results, with a pom binding the jacoco-maven-plugin report-aggregate goal to the verify phase, and a dependency list containing the modules that report-aggregate should scan for target/jacoco.exec files

    Listing 4.

    <project>
     <parent>
      <groupId>no.priv.bang.demos.jerseyinkaraf</groupId>
      <artifactId>jerseyinkaraf</artifactId>
      <version>1.0.0-SNAPSHOT</version>
     </parent>
     <artifactId>jacoco-coverage-report</artifactId>
     <dependencies>
      <dependency>
       <groupId>no.priv.bang.demos.jerseyinkaraf</groupId>
       <artifactId>jerseyinkaraf.servicedef</artifactId>
       <version>${project.version}</version>
      </dependency>
      <dependency>
       <artifactId>jerseyinkaraf.services</artifactId>
       <groupId>no.priv.bang.demos.jerseyinkaraf</groupId>
       <version>${project.version}</version>
      </dependency>
      <dependency>
       <groupId>no.priv.bang.demos.jerseyinkaraf</groupId>
       <artifactId>jerseyinkaraf.webapi</artifactId>
       <version>${project.version}</version>
      </dependency>
      <dependency>
       <groupId>no.priv.bang.demos.jerseyinkaraf</groupId>
       <artifactId>jerseyinkaraf.webgui</artifactId>
       <version>${project.version}</version>
      </dependency>
     </dependencies>
     <build>
      <plugins>
       <plugin>
        <groupId>org.jacoco</groupId>
        <artifactId>jacoco-maven-plugin</artifactId>
        <executions>
         <execution>
          <id>report</id>
          <goals>
           <goal>report-aggregate</goal>
          </goals>
          <phase>verify</phase>
         </execution>
        </executions>
       </plugin>
      </plugins>
     </build>
    </project>
    
  2. Add a maven property to the top POM telling the sonar-scanner-maven plugin where to look for the aggregate coverage XML file (deep in the structure of the new module creating the aggregate coverage file):

    Listing 5.

    <project>
     <properties>
      <sonar.coverage.jacoco.xmlReportPaths>${project.basedir}/../jacoco-coverage-report/target/site/jacoco-aggregate/jacoco.xml</sonar.coverage.jacoco.xmlReportPaths>
     </properties>
    </project>
    
  3. Switch “package” in the sonar command line in .travis.yml to at least the phase running the sonar-scanner-maven plugin (i.e. “verify”); later phases, such as “install”, work as well:

    Listing 6.

    script:
        - mvn clean org.jacoco:jacoco-maven-plugin:prepare-agent verify sonar:sonar
    

Simplified REST APIs from karaf using Jersey

I have written the Java class JerseyServlet which is intended as a base class for DS (Declarative Services) components providing Servlet services to the OSGi web whiteboard.

The JerseyServlet simplifies the approach outlined in Use Jersey to provide REST APIs from karaf applications.

The JerseyServlet extends the Jersey ServletContainer to add two things:

  1. A default value for the list of Java packages that are scanned for Jersey resources (classes implementing REST API endpoints)
  2. A map of OSGi services to service implementations that will be used to register DS service injections with the HK2 dependency injection container, so they can be injected into the jersey resources

The default value for resources is the subpackage “.resources” of the package the DS component resides in. I.e. if the DS component is defined like this:
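
(a hedged sketch; the servlet pattern, the component name and the JerseyServlet import path are assumptions)

package no.priv.bang.demos.jerseyinkaraf.webapi;

import javax.servlet.Servlet;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.http.whiteboard.HttpWhiteboardConstants;

import no.priv.bang.servlet.jersey.JerseyServlet; // assumed import path for JerseyServlet

@Component(
    property = {HttpWhiteboardConstants.HTTP_WHITEBOARD_SERVLET_PATTERN + "=/api/*"},
    service = Servlet.class,
    immediate = true)
public class DemoWebApiServlet extends JerseyServlet {
    // No explicit resource configuration: the default is to scan the ".resources"
    // subpackage of the package this DS component resides in
}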


then no.priv.bang.demos.jerseyinkaraf.webapi.resources will be scanned for resources implementing REST API endpoints.

HK2 dependency injection of OSGi services is done in the following way:

  1. The DS component gets OSGi service injections for the services it needs to pass to the resources

    Note: The LogService gets special treatment, since it is used by the servlet itself to log problems adding services to HK2, but it is also added to HK2 itself
  2. The resources use @Inject annotations to get the services injected when jersey creates the resources in response to REST API invocations

To use JerseyServlet in a web whiteboard DS component residing in an OSGi bundle created by maven, add the following dependencies:


The feature xml dependency is used by the karaf-maven-plugin to create a karaf feature file that can be used to pull in an OSGi bundle and all of its runtime dependencies into a karaf instance.

The <provided> dependencies are needed for compilation. Since provided dependencies aren’t transitive, all of the dependencies needed to resolve the classes are needed. E.g. the maven dependency containing the JerseyServlet’s parent class must be listed explicitly, even though that class isn’t used in the java source code. But without this dependency the code won’t compile.

The provided scope is used to make maven-bundle-plugin and karaf-maven-plugin do the right thing:

  1. The maven-bundle-plugin will create imports for the packages of the classes used from provided dependencies in the bundle’s manifest.mf file
  2. The karaf-maven-plugin will not add provided bundles as bundles to be loaded in the features it generates and attaches to maven bundle projects, and will instead add feature dependencies to features that load the bundles

To use maven-bundle-plugin and karaf-maven-plugin, first add the following config to the <dependencyManagement> section of your top pom:

Then reference maven-bundle-plugin and karaf-maven-plugin in all module pom files with packaging jar, to get an OSGi compliant manifest.mf and an attached karaf feature repository:

Simplified delivery of react.js from apache karaf

This article is about a new servlet base class I have created to simplify serving up javascript frontends from the OSGi web whiteboard.

This article won’t go into the structure of the files that must be served. See Deliver react.js from apache karaf and A Java programmers guide to delivering webapp frontends to get an overview of files.

The short story is that the servlet needs to deliver two files:

  1. An index.html containing the initial DOM tree of the web application (if loaded in a web browser that will be all that’s shown), e.g. like this
  2. A bundle.js file containing the javascript of the application as well as all dependencies, all packed together to be as compact as possible

The index.html file and the bundle.js files are added as classpath resources in the OSGi bundle containing the servlet.

The index.html file is added by putting it into src/main/resources of the maven project for the OSGi bundle.

The bundle.js file is added to the classpath by letting the frontend-maven-plugin drop the created bundle.js into target/classes/

Everything covered so far is identical to what’s described in the first blog post.

The difference is how much simpler creating the servlet component is.

A web whiteboard DS component for serving a frontend without any routing could look like this:
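
(a hedged sketch: the component name, the whiteboard properties, and the exact way the LogService is passed to the base class are assumptions, not copied from the demo project)

import javax.servlet.Servlet;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.osgi.service.http.whiteboard.HttpWhiteboardConstants;
import org.osgi.service.log.LogService;

import no.priv.bang.servlet.frontend.FrontendServlet; // assumed import path for FrontendServlet

@Component(
    property = {HttpWhiteboardConstants.HTTP_WHITEBOARD_SERVLET_PATTERN + "=/*"},
    service = Servlet.class,
    immediate = true)
public class FrontendKarafDemoServlet extends FrontendServlet {

    @Reference
    public void setLogService(LogService logservice) {
        // Assumption: the base class has a LogService setter it uses to report file serving problems
        super.setLogService(logservice);
    }
}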

The LogService service injection is passed to the FrontendServlet base class, where it is used to report problems when serving files.

If the web application has a router that can be used to edit the local part, like e.g. so


then the servlet needs to have matching routes (that’s the setRoutes() call in the constructor):
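
(again a hedged sketch: only the setRoutes() call is taken from the text, its exact signature is assumed, and the paths match the table below)

public class FrontendKarafDemoServlet extends FrontendServlet {

    // Same DS component annotations and LogService wiring as in the sketch above
    public FrontendKarafDemoServlet() {
        super();
        // Register the paths handled by the react router, so that reloading
        // a sub-page in the browser also returns index.html
        setRoutes("/", "/counter", "/about");
    }
}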

The paths are used when reloading a subpage, e.g. by pressing F5 in the web browser. All paths will return the index.html file which in turn will load the bundle.js which will navigate to the correct sub-page based on the path.

Note that the react paths have the application web context, while the paths on the java side are without the application web context part:

React routes                  Java routes
/frontend-karaf-demo/         /
/frontend-karaf-demo/counter  /counter
/frontend-karaf-demo/about    /about

My OSGi story

OSGi is a Java plugin framework and module system that was initially created for supporting embedded applications in the automotive industry. OSGi also forms the foundation for the Eclipse IDE plugin model. The plugin model of OSGi consists of components waiting for services, and starting and exposing services when all of their dependencies are satisfied. A “service” in OSGi terminology is a Java interface.

I first encountered OSGi in 2006. The company I worked for used OSGi as the basis for an ETL processing system, implementing various processing elements as OSGi plugins plugging into the processing framework. Since writing OSGi activators (which was the state of the art for OSGi plugins back in the day) is kind of boring, we created a Dependency Injection system on top of OSGi.

We also used an Eclipse based graphical editor to specify the ETL flows. Since both the processing engine and eclipse used OSGi, we made a custom eclipse distro containing the processing engine, hooked up eclipse debugging support, and were able to execute and debug flows in the graphical editor.

I was pleasantly surprised by how well the two applications worked together, and how little work it took to put them together, even though they used different ways of implementing plugins on top of OSGi.

For various reasons the OSGi usage was scaled down from 2008, abandoned in 2010, and I left the company in 2013.

But I remembered the nice “lego-feeling” of the OSGi plugin model, and in 2015 when I started tinkering with some hobby projects I started using OSGi again.

I had a bit of re-introduction to OSGi when writing plugins for Jira and Confluence in 2014. Atlassian’s software stack is OSGi based (or at least: it was in 2014. I haven’t checked since then). And the plugins were OSGi bundles using Spring (or very Spring-like) annotations for dependency injection.

In 2015, I discovered a free ebook from Sonatype called The Maven Cookbook (the download page with various formats is gone, but the online HTML version is available and the PDF version of the book can be downloaded. I downloaded, and still have, the epub version).

The Maven Cookbook chapter Cooking with Maven and OSGi gave a recipe for creating OSGi bundles and using the gogo shell to verify that the bundle(s) of the project were able to load.

Writing OSGi bundle activators is boring and involves writing a lot of boilerplate, so I investigated the current state of OSGi dependency injection. I found Spring dynamic modules, iPOJO and apache aries blueprint. I wasn’t quite sure where I was going with this, but I knew I wanted something that could run both inside eclipse and in a web server setting, and that ruled out the dependency injection implementations I had found and basically left just writing bundle activators.

But as mentioned before, writing bundle activators is boring.

So the first thing I did was to write an embeddable bundle activator that could be slurped into bundles, scan the bundles for classes containing javax.inject annotations, register listeners for injections with the OSGi framework, and instantiate and offer up services.

Much to my surprise the bundle activator actually worked, and I used it as the basis for my web applications until the summer of 2017.

The first thing I tried writing this way was a datamodel framework intended as the storage for a graphical editor. It got part of the way but didn’t (or hasn’t yet) turn into something actually useful. But perhaps one day…? I rewrote it from Jsr330Activator to DS and now have a much better idea than in 2015 of how to do JDBC connections, database schema setup and web server components.

In the summer of 2016 I started on an application for handling my kids’ weekly allowance, and decided that this would be my project for testing out various OSGi technologies. First out were pax-jdbc and pax-web.

In the autumn of 2016 I was convinced to try apache karaf, and once I had tried it I never looked back. The stuff that (frankly speaking) had been hard in OSGi, i.e. resolving issues with missing packages and/or missing package versions when loading OSGi bundles, became a non-issue.

It was easy to remote debug applications running inside karaf, and the ability of karaf to listen for and automatically reload snapshot versions from the local maven repository made for a quick modify and test loop: modify, build with maven and have karaf automatically pick up the rebuilt jar files.

During the summer of 2017 I moved to OSGi 6 and decided to try out OSGi Declarative Services (DS).

Since both of my OSGi target platforms (eclipse and karaf) were by then running OSGi 6, I finally had a component model/dependency injection system that would work everywhere. So I quickly replaced Jsr330Activator in all of my applications and retired Jsr330Activator (it’s still available on github and maven central, but I don’t use it myself anymore and won’t do any updates).

In conclusion: Apache karaf and “the pax stuff” deliver on all the promises OSGi made back in 2006, and remove the traditional hassle with mysteriously failing imports. Things basically just work with karaf. Karaf is currently my preferred platform.

Rewriting applications to use pax-jdbc-config and liquibase

After creating the post Pluggable databases for apache karaf applications I posted a link to the blog post on the karaf user mailing list, and the immediate response was, “why didn’t I just use pax-jdbc-config instead?”.

The answer to that is that I didn’t know about pax-jdbc-config. I started using pax-jdbc in the summer of 2016 and apache karaf in the autumn of 2016, and pax-jdbc-config didn’t exist back then (or at least: not as complete and usable as it became in 2017), and any announcement that has gone past since then has not registered with me.

But I completely agree with the sentiment expressed in the pax-jdbc-config wiki page:

For some cases you want the actual DataSource available as a service in OSGi. Most tutorials show how to do this using blueprint to create a DataSource and publish it as a service. The problem with this approach is that you have to create and deploy a blueprint file for each DataSource and the blueprint is also Database specific.

So I decided to try pax-jdbc-config out.

First out was the sonar-collector. This is the smallest web application I have that uses JDBC and liquibase. Properly speaking, it isn’t a web application, but it uses web application technology. It is a single servlet that serves as a webhook sonarqube can call after completing analysis of a software project, and records some key sonarqube metrics in a single database table. The purpose of sonar-collector was to be able to aggregate statistics over time, to show a gradual improvement of a set of applications, i.e. to show that each release was a little bit better instead of a little bit worse, according to the metrics. And since sonarqube by itself didn’t save those metrics in an easily accessible form, we started out by manually collecting metrics, applications, versions and dates in a spreadsheet.

Since collecting numbers manually and punching them back into a different program is boring, I looked into what possibilities there were for extracting the numbers from sonarqube. And I found that the simplest way to collect them was to collect them each time sonarqube completed an analysis. So sonar-collector is a single servlet, implemented as an OSGi declarative services component, that registers as a servlet OSGi service with the OSGi web whiteboard, and on each invocation collects the metrics of the sonar analysis invoking the servlet and writes the metrics as a new row in a database table.

The first version of sonar-collector used PostgreSQL and was pretty much tied to PostgreSQL, even though it didn’t do anything PostgreSQL specific: the liquibase schema setup and the single insert statement should work fine on any DBMS supported by liquibase and JDBC (basically all DBMSes that have a JDBC driver).

The reason the first version of sonar-collector was tied to PostgreSQL, was that it had no way to qualify what DataSourceFactory the component should receive. And since sonar-collector’s karaf feature loaded and started the PostgreSQL DataSourceFactory (by having the postgresql JDBC driver as a compile scope maven dependency), PostgreSQL was pretty much all it got.

The changes to adapt sonar-collector to using pax-jdbc-config were:

  1. Removal of all of the Java code and tests related to connecting to a JDBC source
  2. Replacing the DS injection of an unqualified DataSourceFactory OSGi service with an application specific DataSource OSGi service
  3. Adding the default pax-jdbc-config configuration to the template feature.xml file (a sketch of such a configuration follows after this list)
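
A rough sketch of what such a pax-jdbc-config configuration could look like in a feature.xml (pax-jdbc-config listens for configurations with the org.ops4j.datasource factory PID; the PID suffix, dataSourceName, driver name and url values here are illustrations, not the actual sonar-collector defaults):

    <feature name="sonar-collector-datasource" description="DataSource for sonar-collector">
        <config name="org.ops4j.datasource-sonarcollector">
            osgi.jdbc.driver.name = derby
            dataSourceName        = sonarcollector
            url                   = jdbc:derby:memory:sonarcollector;create=true
        </config>
    </feature>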

What didn’t need to be changed was the way sonar-collector runs liquibase. The servlet runs the liquibase scripts from the servlet DS component’s activate method, since this method will only be called when a valid, working, connected DataSource is available.

How much code was actually removed was a bit of an eye opener, because I had thought of the JDBC connection code as “a little bit of boilerplate, but not all that much”. But if you try to make the configuration handling robust, and then try to get good test coverage of the code, handling all sorts of exceptions, you end up with quite a few lines of Java code.

In any case, I was satisfied with the end result: I removed a lot of Java code, and ended up with a sonar-collector that can use any RDBMS that is supported by JDBC to store the results.

The next candidate for switching to pax-jdbc-config was authservice.

Authservice is a webapp that:

  1. Can provide nginx with forms-based authentication
  2. Can provide other webapps with apache shiro-based authentication and role-based authorization
  3. Has a JDBC user/role/permission database with:
    1. a react-redux based administration user interface
    2. “self-service” web pages (static HTML styled with bootstrap) for letting users change their passwords, name and email
    3. an OSGi service other webapps can use to get information from the user database and modify the user database

In theory the JDBC based user database could be swapped with an LDAP service, using an LDAP realm and wrapping the UserManagementService over the admin operations, and be used with both the admin UIs of authservice and other webapps using authservice, without any of them noticing the difference. We’ll see if we ever get there.

The authservice webapp has two databases:

  1. a derby in-memory database initialized with dummy data, and used for unit tests, integration tests, and for demonstrating the application
  2. a PostgreSQL database for use by the application in production

Three bundles of authservice were related to database connection and setup:

  1. an OSGi library bundle containing the liquibase scripts for the database schema as classpath resources, and some code to load and run the scripts (in OSGi, code that wishes to load resources from the classpath needs to reside in the same bundle as the resources)
  2. a bundle with a DS component expecting a DataSourceFactory OSGi service injection, creating an in-memory derby database, setting up the schema and adding dummy data, and finally exposing an AuthserviceDatabaseService OSGi service
  3. a bundle with a DS component expecting a PostgreSQL specific DataSourceFactory OSGi service injection, connecting to a PostgreSQL server, setting up the schema and adding some initial data (currently none here)

The OSGi library bundle could be left as it was.

The changes to adapt authservice to pax-jdbc-config were:

  1. Rename the bundles to something more descriptive
    1. Rename the test database setup bundle from authservice.db.derby.test to authservice.db.liquibase.test
    2. Rename the production database setup bundle from authservice.db.postgresql to authservice.db.liquibase.production
  2. Rename the DS components to something more descriptive, remove all of the JDBC connection setup code, remove the schema setup code from the activate method, and expose the PreHook OSGi service (a sketch of such a liquibase runner follows after this list):
    1. Rename the test database DS component from DerbyTestDatabase to TestLiquibaseRunner
    2. Rename the production DS component from PostgresqlDatabase to ProductionLiquibaseRunner
  3. Add pax-jdbc-config configuration to the template feature.xml files:
    1. Add pax-jdbc-config configuration to the template feature of the test database
    2. Add pax-jdbc-config configuration to the template feature of the production database
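
A rough sketch of what such a liquibase runner DS component could look like, using the PreHook interface from pax-jdbc-config (the value of the name property, and the schema setup class and its method names, are my assumptions; the name property is what the pax-jdbc-config datasource configuration references through its ops4j.preHook property):

    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.sql.DataSource;
    import org.ops4j.pax.jdbc.hook.PreHook;
    import org.osgi.service.component.annotations.Component;

    @Component(immediate = true, property = "name=authservicedb")
    public class TestLiquibaseRunner implements PreHook {

        @Override
        public void prepare(DataSource datasource) throws SQLException {
            // the schema setup class lives in the liquibase library bundle,
            // together with the changelog files it loads as classpath resources
            AuthserviceLiquibase liquibase = new AuthserviceLiquibase();
            try (Connection connection = datasource.getConnection()) {
                liquibase.createInitialSchema(connection);
                liquibase.addDummyData(connection); // test data, only in the test variant
                liquibase.updateSchema(connection);
            } catch (Exception e) {
                throw new SQLException("Failed to run liquibase scripts", e);
            }
        }
    }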

After the changes the application consisted of the same number of OSGi bundles, but now the application is no longer tied to derby and PostgreSQL, but can be configured at runtime to use different DBMSes.

The final application to modify was an application called “ukelonn”, which is used to register performed household chores and payment of the resulting allowance.

The application was similar in structure to the authservice application. Like authservice, the allowance application has both a derby test database and a PostgreSQL production database. And also like authservice, the allowance application uses liquibase to create the schemas and populate the databases with initial data.

However, there was a complicating factor: ukelonn’s application specific DatabaseService subtype was also used to return DBMS specific SQL for PostgreSQL and derby, for aggregating over years and months.

Using aggregation over years as an example (i.e. the simplest of the queries), in derby it looked like this:

select sum(t.transaction_amount), YEAR(t.transaction_time)
  from transactions t
  join transaction_types tt on tt.transaction_type_id=t.transaction_type_id
  join accounts a on a.account_id=t.account_id
  where tt.transaction_is_work and a.username=?
  group by YEAR(t.transaction_time)
  order by YEAR(t.transaction_time)

and in PostgreSQL it looked like this:

select sum(t.transaction_amount), extract(year from t.transaction_time) as year
  from transactions t
  join transaction_types tt on tt.transaction_type_id=t.transaction_type_id
  join accounts a on a.account_id=t.account_id
  where tt.transaction_is_work and a.username=?
  group by extract(year from t.transaction_time)
  order by extract(year from t.transaction_time)

The difference in SQL syntax was handled by letting the derby DatabaseService and the PostgreSQL DatabaseService return different strings from the sumOverYearQuery() method, and grouping over year and month was handled in the same way in the sumOverMonthQuery() method.

I pondered various ways of handling this, including creating two extra bundles to provide this for derby and PostgreSQL, but that would have meant that any new DBMS introduced would need its own bundle, and that would raise the threshold for actually using a different DBMS considerably.

The solution I landed on was to let liquibase create different views depending on the DBMS used, and then do a select over the view in the business logic class.
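
A rough sketch of how liquibase can create DBMS specific views, using the dbms attribute of the changeSet element (the changeSet ids, author and view/column details are illustrations, not the actual ukelonn changelog):

    <changeSet id="sum-over-year-view-derby" author="sb" dbms="derby">
        <createView viewName="sum_over_year_view">
            select sum(t.transaction_amount), YEAR(t.transaction_time) ...
        </createView>
    </changeSet>

    <changeSet id="sum-over-year-view-postgresql" author="sb" dbms="postgresql">
        <createView viewName="sum_over_year_view">
            select sum(t.transaction_amount), extract(year from t.transaction_time) as year ...
        </createView>
    </changeSet>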

The changes to adapt ukelonn to pax-jdbc-config were then:

  1. Add the new DBMS specific views
  2. Change the business logic to use the new views instead of asking the database service for the queries to use
  3. Rename the database service bundles to something that better describes liquibase hooks
    1. Rename ukelonn.db.derbytest to ukelonn.db.liquibase.test
    2. Rename ukelonn.db.postgresql to ukelonn.db.liquibase.production
  4. Rename the DS components for the database services to something that better describes liquibase hooks, expose the PreHook OSGi service and remove the JDBC connection code:
    1. Change UkelonnDatabaseProvider to TestLiquibaseRunner
    2. Change PGUkelonnDatabaseProvider to ProductionLiquibaseRunner
  5. Add pax-jdbc-config based features to the feature repositories of the test and production liquibase hooks
    1. Configuration feature ukelonn-db-test
    2. Configuration feature ukelonn-db-production
  6. Use the pax-jdbc-config based features when composing the application

In conclusion

  1. Using pax-jdbc-config allows me to use different DBMSes than the default derby and PostgreSQL, without changing any Java code: I only need to add the JDBC driver for the RDBMS and change the pax-jdbc-config configuration to use the correct DataSourceFactory and set the JDBC connection properties (url, username and password)
  2. Replacing the old way of setting up databases with pax-jdbc-config let me remove a lot of Java code, both actual code for doing the JDBC connections, and probably three times as much code for the unit tests of the JDBC connection code
  3. Using liquibase to set up different views for different DBMSes where the syntax varies was a lot cleaner and easier to test than the previous approach (i.e. having the DatabaseService implementations return different SQL strings for different databases)

How I learned about linux’ “OOM Killer”

This blog post describes how I discovered a linux feature called “OOM Killer” that can have strange effects if it interrupts a program at a place where it really shouldn’t be interrupted.

I have a low-end VPS (Virtual Private Server), or at least: it used to be low-end, now it’s at least one step above lowest and cheapest.

On this server I’m running various personal stuff: email (with spamassassin), some public web pages, and some private web pages with various small webapps for the household (e.g. the “weekly allowance app” registering work done and payments made).

I had noticed occasional slow startups of the webapps, and in particular in September this year, when I was demonstrating/showing off the webapps at this year’s JavaZone, the demos were less than impressive since the webapps took ages to load.

I was quick to blame the slowness on the wi-fi, but as it turns out, that may have been completely unfair.

The webapps had no performance issues on my development machine even when running with a full desktop, IDE and other stuff.  The webapps on the VPS also seemed to have no performance issues once they loaded.

I thought “this is something I definitely will have to look into at some later time…” and then moved on with doing more interesting stuff, i.e. basically anything other than figuring out why the webapps on a VPS were slow to start.

But then the webapps started failing nightly and I had to look into the problem.

What I saw in the logs was that the reason the webapps were broken in the morning was that they were stuck waiting for a liquibase database changelog lock that was never released.

Liquibase is how my webapps set up and update database schemas. Every time a webapp starts, it connects to the database, checks which liquibase scripts have been run against that database, and applies the ones that have not already been run. The list of scripts that have been run is kept in a table called databasechangelog. And to avoid having more than one liquibase client attempting to modify the database schema at the same time, liquibase uses a different table called databasechangeloglock to moderate write access to the database.

I.e. the databasechangeloglock is just a database table that has one or zero rows. A liquibase client tries to insert a lock into the table at startup, and waits and retries if this fails (and eventually gives up completely).

In my case the webapps were failing because they were hanging at startup, trying and failing to get a liquibase lock, stuck in limbo and never completing their startup process. Manually clearing the lock from the table and restarting the webapps made the webapps start up normally. However, the next day the webapps were failing again for the same reason: the webapps were stuck waiting for a liquibase lock.
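
Clearing the lock manually amounts to a single SQL statement against the webapp’s database, roughly like this (depending on the liquibase version, the lock is a single row with id 1 that can either be updated or deleted):

    update databasechangeloglock set locked = false, lockgranted = null, lockedby = null where id = 1;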

I initially suspected errors in my code, specifically in the liquibase setup. But googling for similar problems, code examination and debugging revealed nothing. I found nothing because there was nothing to be found.  The actual cause of the problem had nothing to do with the code or with liquibase.

I run my webapps in an instance of apache karaf that is started and controlled by systemd. And I saw that karaf was restarted at 06:30 (or close to 06:30) every morning. So my next theory was that systemd for some reason decided to restart karaf at 06:30 every morning.

No google searches for similar symptoms found anything interesting.

So I presented my problem to a mailing list with network and linux/unix experts and got back two questions:

  1. Was something else started at the same time?
  2. Did that something else use a lot of memory and trigger the OOM killer?

And that turned out to be the case.

I have been using and maintaining UNIX systems since the mid to late 80s, and setting up, using and maintaining linux systems since the mid to late 90s, but this was the first time I’d heard of the OOM killer.

The OOM killer has been around for a while (the oldest mention I’ve found is from 2009), but I’ve never encountered it before.

The reason I’ve never encountered it before is that I’ve mostly dealt with physical machines. Back in the 80s I was told that swap space of approximately two and a half times physical memory was a good rule of thumb, so that’s a rule I’ve followed ever since (keeping the ratio as the number of megabytes increased, eventually turning into gigabytes).

And when you have two and a half times the physical memory as a fallback, you never encounter the conditions that make the OOM killer come alive. Everything slows down and the computer starts thrashing before the conditions that trigger the OOM killer come into play.

The VPS on the other hand, has no swap space. And with the original minimum configuration (1 CPU core, 1GB of memory), if it had been a boat it would have been said to be riding low in the water. It was constantly running at a little less than the available 1GB. And if nothing special happened, everything ran just fine.

But when something extraordinary happened, such as spamassassin’s spamd starting at 06:30 and requiring more memory than was available, the OOM killer started looking for a juicy fat process to kill, and the apache karaf process was a prime candidate (perhaps because of “apache” in its name, combined with the OOM killer’s notorious hatred of feathers?).

And then systemd discovered that one of its services had died and immediately tried to restart it, only to have the OOM killer shoot it down, and this continued for quite a while.

And in one of the attempted restarts, the webapp got far enough to set the databasechangeloglock before it was rudely shot down, and the next time(s) a start was attempted it got stuck waiting for a lock that would never be released.

The solution was to bump the memory to the next step, i.e. from 1GB to 2GB. Most of the time the VPS is running at the same load as before (i.e. slightly below 1GB), but now a process that suddenly requires a lot of memory no longer triggers the OOM killer, and everything’s fine. Also, the extra memory is used for buff/cache, so everything becomes much faster.

I bumped the memory 8 weeks ago and the problem hasn’t occurred again, so it looks like (so far) the problem has been solved.

Pluggable databases for apache karaf applications

Edit: I no longer use this approach. I use pax-jdbc-config instead.  See the article Rewriting applications to use pax-jdbc-config and liquibase for details

When creating databases for my apache karaf based web applications, I want the following things:

  1. A database-independent schema creation using liquibase (i.e. supporting multiple relational database systems)
  2. Pluggable databases (each web application typically has an in-memory derby database for testing and evaluation and a PostgreSQL database for production use)
  3. The ability to run multiple applications in the same apache karaf instance without the application databases colliding
  4. The possibility to configure the JDBC connection information using the karaf configuration

The way I accomplish this, is:

  1. Create an application specific OSGi service for connecting to the database
  2. Create declarative services (DS) components implementing the service for each database system I wish to support
  3. Create a bundle implementing the calls to liquibase (and containing the liquibase scripts as resources) and use this bundle from all DS components implementing the database service
  4. Use high level karaf features to compose my webapps, with variants for each supported database system

OSGi 101: when I talk about an “OSGi service” I mean a Java interface. Java interfaces form the basis of the OSGi plug-in model. When an OSGi bundle (a “bundle” is a JAR file with some OSGi specific headers added to its MANIFEST.MF) is loaded, it can have dependencies on services provided by other bundles. When those dependencies are satisfied the bundle can go active and provide its own services to other bundles that may be listening.

I have created a Java interface called DatabaseService that defines the necessary operations I do against a JDBC database, and they aren’t many:

  • getDatasource() that will return a DataSource  connected to a relational database with the expected schema (Note: if connecting to the database fails, the DS component providing the service never goes active, so the only way code ever gets to call this method is if the connection was successful. This simplifies the code using the method)
  • getConnection() that returns a Connection to the database. This is a convenience method, because what the implementations actually do, is call Datasource.getConnection().  It makes sense to have this method in the interface, because:
    1. Fetching the connection is the only thing done with DataSource, so that it makes all usages a little bit simpler and shorter
    2. The DS components providing the service all have ended up implementing this method anyway, in the calls to liquibase

The application specific DatabaseService definitions are mostly without methods of their own, like e.g.
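
E.g. roughly like this (the authservice variant; the package and import details are omitted since I don’t know them for certain):

    public interface AuthserviceDatabaseService extends DatabaseService {
        // no methods of its own: the subtype exists so that DS components
        // can ask specifically for the authservice database
    }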

However, I’ve had one case where I needed very different syntax for the different database systems supported, and then I added methods returning database specific versions of the problematic SQL queries to the application specific database service definition:
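
For ukelonn that meant something roughly like this (the method names are the ones mentioned elsewhere in this text; the interface name is an assumption):

    public interface UkelonnDatabase extends DatabaseService {
        // each implementation (derby, PostgreSQL) returns SQL in its own dialect
        String sumOverYearQuery();
        String sumOverMonthQuery();
    }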

For the curious: the SQL with different syntax had to do with aggregation over months and years; the derby and PostgreSQL variants of the SQL are quoted in the article Rewriting applications to use pax-jdbc-config and liquibase.

The next step is to create a bundle holding the liquibase schema definition of the database. This bundle neither expects OSGi service injections, nor exposes any services. It’s an OSGi library bundle that is like using a regular jar, except that the exports and imports of packages are controlled by OSGi and the OSGi headers in the bundle’s MANIFEST.MF.

More OSGi terminology: “bundle” means a jar-file with an OSGi-compliant MANIFEST.MF.

The liquibase bundles contain a single class and the liquibase scripts as resources. In OSGi the resources are bundle local, so other bundles can’t load them directly. But they can instantiate a class from the liquibase bundle, and that class can in turn load the resources that reside in the same bundle as the class.

The classes in the liquibase bundles typically contain a method for initial schema setup and a method for schema modifications. In addition to the scripts in the liquibase bundle, the database plugins contain their own liquibase scripts (the liquibase scripts of the in-memory derby test database plugins contain sets of hopefully realistic test data, and the liquibase scripts of the PostgreSQL plugins contain initial data, such as e.g. a single initial user).

The declarative services (DS) components of the database plugins expect an injection of the DataSourceFactory OSGi service. The DataSourceFactory interface isn’t part of JDBC (like DataSource itself is), but is part of the OSGi service platform.

To ensure that the correct DataSourceFactory is injected, it is possible to filter on the osgi.jdbc.driver.name service property (service properties are a Map<String, String> associated with an OSGi service), like e.g.
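
A sketch of what the filter could look like for the derby test database (the exact osgi.jdbc.driver.name value depends on the DataSourceFactory provider; “derby” is what I would expect from pax-jdbc-derby, but treat it as an assumption):

    @Reference(target = "(osgi.jdbc.driver.name=derby)")
    public void setDataSourceFactory(DataSourceFactory dataSourceFactory) {
        this.dataSourceFactory = dataSourceFactory;
    }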


and
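
a corresponding sketch for the PostgreSQL DataSourceFactory (again, the property value is an assumption and depends on the driver bundle):

    @Reference(target = "(osgi.jdbc.driver.name=PostgreSQL JDBC Driver)")
    public void setDataSourceFactory(DataSourceFactory dataSourceFactory) {
        this.dataSourceFactory = dataSourceFactory;
    }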

Once all @Reference service injections are satisfied, the method annotated with @Activate is called. This method needs to create a successful connection to the database and run liquibase scripts. A method for a derby memory database can look like this (the full source of the UkelonnDatabaseProvider class is here):
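
A rough sketch of such an activate method (the UkelonnLiquibase class and its method names are stand-ins for the actual class in the liquibase bundle; the datasource and dataSourceFactory fields, and the imports, are omitted):

    @Activate
    public void activate() {
        createDatasource(); // creates the DataSource and stores it in a field
        try (Connection connection = datasource.getConnection()) {
            UkelonnLiquibase liquibase = new UkelonnLiquibase(); // class from the liquibase bundle
            liquibase.createInitialSchema(connection);
            liquibase.insertInitialData(connection);
            liquibase.updateSchema(connection);
        } catch (Exception e) {
            // if this throws, the component never goes active, and no one gets
            // a DatabaseService without a working schema
            throw new RuntimeException("Failed to create derby test database", e);
        }
    }

    private void createDatasource() {
        Properties properties = new Properties();
        properties.setProperty(DataSourceFactory.JDBC_URL, "jdbc:derby:memory:ukelonn;create=true");
        try {
            datasource = dataSourceFactory.createDataSource(properties);
        } catch (SQLException e) {
            throw new RuntimeException("Failed to create derby test datasource", e);
        }
    }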


What the method does is to first create a DataSource object (stored in a field and accessible through the DatabaseService interface); it then creates a connection in a try-with-resources and runs the liquibase scripts using that connection, before the try-with-resources releases the connection. The scripts are typically an initial schema, followed by initial data for the database, followed by modifications to the schema.

It is possible to create configuration for a DS component using the karaf console command line, and creating JDBC connection info for a PostgreSQL database can be done with the following commands:
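
Roughly like this (config:edit, config:property-set and config:update are standard karaf console commands; the PID and the property names are assumptions matching my DS component, not something dictated by karaf):

    config:edit no.priv.bang.ukelonn.db.postgresql.PGUkelonnDatabaseProvider
    config:property-set jdbc.url "jdbc:postgresql:///ukelonn"
    config:property-set jdbc.user karaf
    config:property-set jdbc.password karaf
    config:update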

The resulting configuration will be injected into the @Activate method like e.g. so (the full source of the PGUkelonnDatabaseProvider class is here):
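
Roughly like this (the map-valued parameter to the activate method is standard DS; the helper method name is an assumption):

    @Activate
    public void activate(Map<String, Object> config) {
        createDatasource(config); // the config is passed on to the method creating the DataSource
        // ...then connect and run the liquibase scripts, as in the derby variant above
    }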

The injected config is sent into the method creating the DataSource instance.

Important: the DataSource instance is cached and is used to represent the connected database. The Connection instances are not cached. The Connection instances are created on demand in try-with-resources blocks, to ensure that they are closed and all transactions are completed.

The method creating the DataSource instance passes the config on to a method that picks the JDBC connection info out of the injected config and puts it into a Properties instance that can be used to connect to a JDBC database.

Using the pluggable database is done by having the DS component implementing the application’s business logic have the application specific database service injected, e.g. like so (full UkelonnServiceProvider source here):
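
Roughly like this (UkelonnDatabase is the application specific database service sketched earlier; the UkelonnService interface name is a guess based on the provider class name):

    @Component(service = UkelonnService.class, immediate = true)
    public class UkelonnServiceProvider implements UkelonnService {

        private UkelonnDatabase database;

        @Reference
        public void setUkelonnDatabase(UkelonnDatabase database) {
            // whichever database plugin is loaded (derby test or PostgreSQL production)
            // provides this service
            this.database = database;
        }
    }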

Using the injected service means calling the getConnection() method in a try-with-resources, doing the SQL operations, and then letting the end of the try-with-resources release the connection, like e.g. so
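
Roughly like this (the table and column names are illustrations):

    public double getBalance(String username) {
        try (Connection connection = database.getConnection()) {
            try (PreparedStatement statement = connection.prepareStatement("select balance from accounts where username=?")) {
                statement.setString(1, username);
                try (ResultSet results = statement.executeQuery()) {
                    if (results.next()) {
                        return results.getDouble("balance");
                    }
                }
            }
        } catch (SQLException e) {
            // log the error and fall through to a default value
        }

        return 0.0;
    }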

It probably helps this approach to pluggable databases that I’m using JDBC prepared statements directly instead of an ORM like e.g. hibernate (but that’s a story/rant for another day…).

Picking what database to use is done using karaf features.

The karaf-maven-plugin is used on all maven modules that create an OSGi bundle, to create a karaf feature repository file and attach it to the OSGi bundle artifact.

In addition I create a handwritten feature repository file where I include all of the feature files of the OSGi bundle artifacts using their maven coordinates. And then, in the same file, I create high level features that start the application with either database.

Examples of hand-written feature repositories:

  1. aggregate feature repository for ukelonn (maven URL mvn:no.priv.bang.ukelonn/karaf/LATEST/xml/features )
  2. aggregate feature repository for authservice (maven URL mvn:no.priv.bang.authservice/authservice/1.6.0/xml/features )

Using the authservice as an example (a sketch of such a hand-written feature repository follows after this list):

  1. First define a feature that loads the features required for the application, except for the database
  2. Then define a feature that uses that feature in addition to the feature created when building the OSGi bundle for the derby test database
  3. Finally define a feature that uses the first feature in addition to the feature created when building the OSGi bundle for the PostgreSQL database
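
A rough sketch of what that hand-written feature repository could look like (the repository list is abbreviated, and the module and feature names are assumptions, not copied from the actual authservice file behind the maven URL above):

    <features name="authservice" xmlns="http://karaf.apache.org/xmlns/features/v1.5.0">
        <repository>mvn:no.priv.bang.authservice/authservice.web.security/1.6.0/xml/features</repository>
        <repository>mvn:no.priv.bang.authservice/authservice.db.liquibase.test/1.6.0/xml/features</repository>
        <repository>mvn:no.priv.bang.authservice/authservice.db.liquibase.production/1.6.0/xml/features</repository>

        <!-- 1. everything except the database -->
        <feature name="authservice-without-db">
            <feature>authservice-web-security</feature>
        </feature>

        <!-- 2. the application with the derby test database -->
        <feature name="authservice-with-derby">
            <feature>authservice-without-db</feature>
            <feature>authservice-db-liquibase-test</feature>
        </feature>

        <!-- 3. the application with the PostgreSQL database -->
        <feature name="authservice-with-postgresql">
            <feature>authservice-without-db</feature>
            <feature>authservice-db-liquibase-production</feature>
        </feature>
    </features>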

This means that trying out the authservice application can be done by:

  1. Download and start apache karaf using the quick start guide
  2. Install authservice with derby by loading the feature repository using the maven coordinates, and then loading the feature that pulls in the application and the derby database (see the karaf console sketch after this list)
  3. All bundles required by the features are pulled in from maven central and the application is started and becomes available on http://localhost:8181/authservice
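
In the karaf console that could look roughly like this (feature:repo-add and feature:install are standard karaf commands; the feature name is the assumption from the sketch above):

    feature:repo-add mvn:no.priv.bang.authservice/authservice/1.6.0/xml/features
    feature:install authservice-with-derby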

Installing and starting with PostgreSQL takes a little more work (but not much):

  1. Install and start PostgreSQL (left as an exercise for the reader)
  2. Create a PostgreSQL user named karaf with a password (in the examples “karaf”)
  3. Create a blank database named “authservice” owned by user karaf
  4. Download and start apache karaf using the quick start guide
  5. Create database connection configuration from the karaf console
  6. Install authservice with PostgreSQL by loading the feature repository using the maven coordinates, and then loading the feature that pulls in the application and the PostgreSQL database
  7. All bundles required by the features are pulled in from maven central and the application is started and becomes available on http://localhost:8181/authservice