
How I learnt about linux’ “OOM Killer”

I have a low-end VPS (Virtual Private Server). Or rather, it used to be low-end; now it’s one step above the lowest and cheapest tier.

On this server I’m running various personal stuff: email (with spamassassin), some public web pages, and some private web pages with various small webapps for the household (e.g. the “weekly allowance app” registering work done and payments made).

I had noticed occasional slow startups of the webapps, and in particular in September this year, when I was demonstrating/showing off the webapps at this year’s JavaZone, the demos were less than impressive since the webapps took ages to load.

I was quick to blame the slowness on the wi-fi, but as it turns out, that may have been completely unfair.

The webapps had no performance issues on my development machine even when running with a full desktop, IDE and other stuff.  The webapps on the VPS also seemed to have no performance issues once they loaded.

I thought “this is something I definitely will have to look into at some later time…” and then moved on with doing more interesting stuff, i.e. basically anything other than figuring out why the webapps on a VPS were slow to start.

But then the webapps started failing nightly and I had to look into the problem.

What I saw in the logs was that the reason the webapps were broken in the morning was that they were stuck waiting for a liquibase database changelog lock that never was released.

Liquibase is how my webapps set up and update database schemas. Every time a webapp starts it connects to the database, checks which liquibase scripts have been run against that database, and applies the ones that have not already been run. The list of scripts that have been run is kept in a table called databasechangelog. And to avoid having more than one liquibase client attempting to modify the database schema, liquibase uses a different table called databasechangeloglock to moderate write access to the database.

I.e. the databasechangeloglock is just a database table with a single row holding a lock flag. A liquibase client tries to set the lock flag at startup, and waits and retries if the lock is already taken (and eventually gives up).
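The lock-table idea can be sketched in a few lines of Python against an in-memory SQLite database. This is only an illustration of the general technique: the table name `changeloglock`, its columns and the helper functions are made up for the example and are not liquibase's actual schema or code. It shows how a process that dies while holding the lock leaves every later startup waiting until someone clears the lock by hand.

```python
import sqlite3


def create_lock_table(conn):
    # Hypothetical, simplified stand-in for a changelog lock table:
    # a single row whose "locked" flag says whether somebody holds the lock.
    conn.execute(
        "CREATE TABLE changeloglock ("
        "id INTEGER PRIMARY KEY, locked INTEGER NOT NULL, lockedby TEXT)"
    )
    conn.execute("INSERT INTO changeloglock (id, locked, lockedby) VALUES (1, 0, NULL)")
    conn.commit()


def try_acquire(conn, owner):
    # Flip the flag only if nobody holds the lock; the WHERE clause makes
    # the check and the update a single statement.
    cur = conn.execute(
        "UPDATE changeloglock SET locked = 1, lockedby = ? "
        "WHERE id = 1 AND locked = 0",
        (owner,),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn):
    # What a well-behaved client does when it is done (or what an
    # administrator does by hand to clear a stale lock).
    conn.execute("UPDATE changeloglock SET locked = 0, lockedby = NULL WHERE id = 1")
    conn.commit()


def acquire_with_retries(conn, owner, attempts=3):
    # A real client would sleep between attempts; omitted to keep the sketch fast.
    for _ in range(attempts):
        if try_acquire(conn, owner):
            return True
    return False


conn = sqlite3.connect(":memory:")
create_lock_table(conn)

first = try_acquire(conn, "webapp run 1")            # takes the lock
# ... the process is killed here, so release() is never called ...
second = acquire_with_retries(conn, "webapp run 2")  # stale lock, gives up
release(conn)                                        # manual cleanup
third = try_acquire(conn, "webapp run 3")            # works again

print(first, second, third)
```

The second run never gets the lock because nothing is left alive to release it, which is the situation described in the rest of this post.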

In my case the webapps were failing because they hung at startup: they tried to get a liquibase lock, never got one, and so never completed their startup process. Manually clearing the lock from the table and restarting the webapps made the webapps start up normally. However, the next day the webapps were failing again for the same reason: the webapps were stuck waiting for a liquibase lock.

I initially suspected errors in my code, specifically in the liquibase setup. But googling for similar problems, code examination and debugging revealed nothing. I found nothing because there was nothing to be found.  The actual cause of the problem had nothing to do with the code or with liquibase.

I run my webapps in an instance of apache karaf that is started and controlled by systemd. And I saw that karaf was restarted at 06:30 (or close to 06:30) every morning. So my next theory was that systemd for some reason decided to restart karaf at 06:30 every morning.

No google searches for similar symptoms found anything interesting.

So I presented my problem to a mailing list with network and linux/unix experts and got back two questions:

  1. Was something else started at the same time?
  2. Did that something else use a lot of memory and trigger the OOM killer?

And that turned out to be the case.

I have been using and maintaining UNIX systems since the mid to late 80s, and setting up, using and maintaining linux systems since the mid to late 90s, but this was the first time I’d heard of the OOM killer.

The OOM killer has been around for a while (the oldest mention I’ve found is from 2009), but I’ve never encountered it before.

The reason I’ve never encountered it before is that I’ve mostly dealt with physical machines. Back in the 80s I was told that having approximately two and a half times physical memory was a good rule of thumb for scaling swap space, so that’s a rule I’ve followed ever since (keeping the ratio as the number of megabytes increased, eventually turning into gigabytes).

And when you have two and a half times the physical memory as a fallback, you never encounter the conditions that make the OOM killer come alive.  Everything slows down and the computer starts thrashing before the conditions that trigger the OOM killer come into play.

The VPS on the other hand, has no swap space. And with the original minimum configuration (1 CPU core, 1GB of memory), if it had been a boat it would have been said to be riding low in the water. It was constantly running at a little less than the available 1GB. And if nothing special happened, everything ran just fine.

But when something extraordinary happened, e.g. spamassassin’s spamd starting at 06:30 and requiring more memory than was available, the OOM killer started looking for a juicy fat process to kill, and the apache karaf process was a prime candidate (perhaps because of “apache” in its name combined with the OOM killer’s notorious hatred of feathers?).

And then systemd discovered that one of its services had died and immediately tried to restart it, only to have the OOM killer shoot it down, and this continued for quite a while.

And in one of the attempted restarts, the webapp got far enough to set the databasechangeloglock before it was rudely shot down, and the next time(s) a start was attempted it got stuck waiting for a lock that would never be released.
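The OOM killer reports its kills in the kernel log, so a kill-and-restart loop like this one can be spotted by searching that log (for instance the output of `dmesg` or `journalctl -k`) for “Killed process” lines. Below is a minimal Python sketch of such a search. The sample lines are hand-written for illustration and are not copied from my server; the pids, sizes and the `karaf.service` unit name are made up, and the exact wording of the kernel message varies between kernel versions.

```python
import re

# The kernel log names the victim as "Killed process <pid> (<name>)".
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a list of (pid, process name) for every OOM kill found."""
    kills = []
    for line in log_lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Hand-written sample lines (assumed format, illustrative values).
sample = [
    "Out of memory: Killed process 1234 (java) total-vm:2461016kB, anon-rss:512000kB",
    "systemd[1]: karaf.service: Main process exited, code=killed, status=9/KILL",
    "Out of memory: Killed process 1301 (java) total-vm:2461016kB, anon-rss:498000kB",
]

kills = find_oom_kills(sample)
print(kills)
```

Several kills of the same process name with different pids, a short time apart, is the pattern that matches systemd restarting a service that keeps getting shot down.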

The solution was to bump the memory to the next step, i.e. from 1GB to 2GB. Most of the time the VPS is running at the same load as before (i.e. slightly below 1GB) but now a process that suddenly requires a lot of memory no longer triggers the OOM killer and everything’s fine.  Also the available memory is used for buff/cache and everything becomes much faster.
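How close a machine is to this situation can be read from `/proc/meminfo`. The following Python sketch parses text in that format and adds up available memory and free swap as a rough measure of headroom. The sample numbers are invented to resemble a 1GB machine without swap; they are not measurements from my VPS, and `headroom_mb` is just a name chosen for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def headroom_mb(values):
    # Memory that can be handed out without swapping, plus free swap, in MB.
    return (values["MemAvailable"] + values.get("SwapFree", 0)) // 1024


# Invented sample resembling a nearly full 1GB machine with no swap.
sample = """MemTotal:        1014000 kB
MemAvailable:      61440 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

info = parse_meminfo(sample)
print(headroom_mb(info))
```

On a live system the same functions can be fed with `open("/proc/meminfo").read()`. With no swap, the headroom is all there is before the OOM killer has to step in.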

I bumped the memory 8 weeks ago and the problem hasn’t occurred again, so it looks like (so far) the problem has been solved.

Installing debian “squeeze” with PXE boot on a Samsung N145 Plus netbook

Introduction

This article describes the steps necessary to install debian 6 “squeeze” on a Samsung N145 Plus netbook, with the following specification:

  • Intel Atom processor
  • 10.1″ display
  • 1GB RAM
  • 340GB HDD
  • Windows 7 preinstalled

Setting up netboot of the debian installer

DHCP in my home LAN is provided by dnsmasq on a desktop PC running GNU/linux debian stable (which, at the time of writing, was Debian 6 squeeze). One nice feature of dnsmasq is that it can provide PXE network boot.

So what I did was to download the i386 network boot image and put the contents in the /var/tftpd/debian-installer/i386 directory of the computer running dnsmasq, and then edit the /etc/dnsmasq.conf file in the following way:

  1. Remove the comment in front of the dhcp-boot config line:
    dhcp-boot=pxelinux.0
  2. Set the tftp-root pointing to the directory containing the pxelinux.0 file:
    tftp-root=/var/tftpd/debian-installer/i386
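A quick way to double check the edited file is to list which options are actually active, i.e. not commented out. Here is a small Python sketch that does this for dnsmasq-style config text. Note that `enable-tftp` is not part of the two steps above; it is included in the sample on the assumption that dnsmasq’s built-in TFTP server has to be switched on for `tftp-root` to matter. The function name `active_options` is made up for the example.

```python
def active_options(conf_text):
    """Return the uncommented options of a dnsmasq-style config as a dict."""
    options = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        # Options given more than once overwrite each other here,
        # which is good enough for a presence check.
        options[key.strip()] = value.strip()
    return options


# Sample config text; enable-tftp is an assumption, see above.
sample_conf = """
#dhcp-boot=pxelinux.0
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/var/tftpd/debian-installer/i386
"""

opts = active_options(sample_conf)
wanted = ("dhcp-boot", "enable-tftp", "tftp-root")
missing = [name for name in wanted if name not in opts]
print(opts["tftp-root"], missing)
```

To check the real file, pass `open("/etc/dnsmasq.conf").read()` instead of the sample text; anything listed in `missing` is still commented out or absent.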

Installing debian

Booting from the network

I connected the netbook to the switch in my home LAN with an RJ45 twisted pair cable, powered on the netbook, kept the F12 button pressed during boot, and ended up in the debian text-based installer.

I set the time zone and location of the install (Oslo, Norway), created an initial user and set the root password.

Partitioning

The netbook came with a 340GB hard disk and Windows 7 preinstalled.  The hard disk was partitioned so that the Win7 system had both a C: and a D: drive, with the operating system installed on the C: drive.

The plan was to keep the Windows 7 installation, sans its D: drive and install debian in the part of the hard disk occupied by the D: drive.

The initial partitioning table looked like this:

#1 primary 104.9 MB B ntfs
#2 primary 93.4 GB ntfs
#5 logical 138.3 GB ntfs
#4 primary 18.2 GB ntfs

I guessed that partition #1 was the boot partition, that partition #2 was the C: drive containing the Windows 7 installation, and that #4 was either some kind of Samsung software (diagnostics possibly) or something belonging to the Windows 7 installation.

I left partitions #1, #2 and #4 alone, deleted the partition containing the D: drive (partition #5), and turned that into free space:

#1 primary 104.9 MB B ntfs
#2 primary 93.4 GB ntfs
pri/log 138.3 GB FREE SPACE
#4 primary 18.2 GB ntfs

I added a swap partition twice the size of the physical memory, i.e. 2GB, added an ext3 partition using the rest of the free space, and ended up with a partitioning table looking like this:

#1 primary 104.9 MB B ntfs
#2 primary 93.4 GB ntfs
#5 logical 136.3 GB B f ext3 /
#6 logical 2.0 GB f swap swap
#4 primary 18.2 GB ntfs

I saved the partitioning table and continued.

Installing the system

After completing the partitioning, I selected the following items to install:

  • SSH server
  • Laptop
  • Base tools

I let the installer run, using defaults for all questions. I answered YES to the question of whether GRUB should be installed on MBR. The installer found the Windows 7 installation and added it to the GRUB boot menu.  When the time came to reboot, I let the installer reboot.

After the reboot I logged in as root and installed the “KDE Plasma netbook” package:

apt-get install plasma-netbook kde-l10n-nb

I opened the /etc/apt/sources.list in a text editor, and modified it:

I then updated the APT database with the new sources and added all updates to the already installed software:

apt-get update
apt-get install linuxmint-keyring
apt-get update
apt-get dist-upgrade

I then installed all software I assumed was necessary:

apt-get install ttf-mscorefonts-installer
apt-get install openoffice.org openoffice.org-l10n-nb
apt-get install firefox firefox-l10n-nb

I rebooted the laptop and then logged into the plasma desktop using the user created at the start of the installation process. The desktop was missing network support and other useful software.

I logged in as root using the “failsafe” alternative, and installed missing software in the terminal window:

apt-get install network-manager-kde update-notifier-kde
apt-get install synaptic software-center gdebi

I rebooted and logged into plasma again. I tried to plug in a USB flash drive, and discovered that the desktop had no file manager: konqueror was missing. I installed konqueror (and discovered I should have picked the package “kde-plasma-netbook”, rather than just “plasma-netbook”):

apt-get install konqueror

The plasma desktop looked great, but was way too slow on an atom processor without much in the way of graphical hardware acceleration.

So I decided to try gnome and installed gnome with the command:

apt-get install gnome

I let apt set gdm3 as the default login instead of kdm.

I rebooted and logged into the gnome desktop, and it performed a lot better than the plasma desktop.

I rebooted again and chose Windows 7 from the grub menu; Windows 7 booted and logging into the desktop worked.

Making the Fn keys adjust the display brightness

The Fn keys for adjusting the brightness didn’t work. I googled, and found two promising web pages:

  1. Fixing brightnes control, etc. on a Samsung R510 with Debian Squeeze
  2. InstallingDebianOn Samsung Samsung N150

I decided to try the first approach, and downloaded the packages created for Ubuntu Natty from https://launchpad.net/~voria/+archive/ppa

I then installed the downloaded .deb packages in the following way:

  1. Installed the easy-slow-down-manager:
    1. I let gdebi pull in all dependencies (gcc, the linux-headers, make, etc)
  2. Installed samsung-backlight:
    1. Edited /etc/default/grub changing the line GRUB_CMDLINE_LINUX_DEFAULT
      GRUB_CMDLINE_LINUX_DEFAULT="quiet"
      to
      GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_backlight=vendor"
    2. Ran the command
      update-grub
  3. Installed samsung-tools:
    1. Installed the devscripts
      apt-get install devscripts
    2. Unpacked the samsung-tools tarball
      cd /tmp
      tar zxvf samsung-tools_1.4~ppa3~loms~natty.tar.gz
      cd /tmp/samsung-tools_1.4~ppa3~loms~natty
      dch -l sb
      1. Added “Compiled for debian squeeze” as the final comment
    3. Built the deb package
      cd /tmp/samsung-tools-1.4~ppa3~loms~nattysb1
      dpkg-buildpackage -rfakeroot -us -uc
    4. Installed the deb package
      gdebi /tmp/samsung-tools_1.4~ppa3~loms~nattysb1_all.deb
      1. I let gdebi install all of the required dependencies
  4. Rebooted

After the reboot I tried the Fn+Up and Fn+Down keys to adjust the display brightness and the keys worked fine.
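The edit to /etc/default/grub in the samsung-backlight step was done by hand, but the same change can be scripted. The following is a minimal Python sketch that adds a kernel parameter to the GRUB_CMDLINE_LINUX_DEFAULT line of grub config text, leaving the text alone if the parameter is already there. The function name `add_kernel_param` and the sample text are made up for this example, and update-grub still has to be run afterwards.

```python
import re


def add_kernel_param(grub_text, param):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""

    def patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + '"'

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)"',
        patch,
        grub_text,
        flags=re.MULTILINE,
    )


# Sample text standing in for the contents of /etc/default/grub.
before = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
after = add_kernel_param(before, "acpi_backlight=vendor")
print(after)
```

Running the function a second time on its own output changes nothing, so it is safe to apply more than once.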