postgresql inside kubernetes / no space left on device
Running PostgreSQL inside Kubernetes? Getting occasional "No space left on device" errors? Know that 64MB is not enough for everyone.
With more and more services running inside Kubernetes, we're running into new issues and complexities specific to containerization. For instance, to solve the problem of regular file backups of distributed filesystems, we've resorted to using rsync wrapped inside a pod (or sidecar). And now, for containerized PostgreSQL, we've hit an artificial shared memory limit that needs fixing.
Manifestation
The issue manifests itself like this:
ERROR: could not resize shared memory segment "/PostgreSQL.491173048"
to 4194304 bytes: No space left on device
This shared memory that PostgreSQL speaks of is the shared memory made available to it through /dev/shm.
On your development machine, it may look like this:
$ mount | grep shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
$ df -h | sed -ne '1p;/shm/p'
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            16G  948M   15G   6% /dev/shm
That's fine: 16GiB is plenty of space. But in Kubernetes we get a default of a measly 64MiB and no means to change the shm-size. So, inside the pod with the PostgreSQL daemon, things look like this:
$ mount | grep shm
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
$ df -h | sed -ne '1p;/shm/p'
Filesystem      Size  Used Avail Use% Mounted on
shm              64M     0   64M   0% /dev/shm
For a bunch of database operations, that is definitely too little. Any PostgreSQL database doing any serious work will quickly use up that much temporary space. (And run into this error.)
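If you want to watch the pressure build up, keep an eye on /dev/shm from a shell inside the pod while a parallel query runs. A minimal sketch, assuming a hypothetical large table big_table and a planner that decides to go parallel:

# in one shell inside the pod, refresh the usage every second:
$ watch -n1 df -h /dev/shm

# in another shell, something that is likely to end up as a parallel hash join:
$ psql -c 'SELECT COUNT(*) FROM big_table a JOIN big_table b USING (id);'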
According to Thomas Munro on the postgrespro.com mailing list:
PostgreSQL creates segments in /dev/shm for parallel queries (via shm_open()), not for shared buffers. The amount used is controlled by work_mem. Queries can use up to work_mem for each node you see in the EXPLAIN plan, and for each process, so it can be quite a lot if you have lots of parallel worker processes and/or lots of tables/partitions being sorted or hashed in your query.
Basically what they're saying is: you need sufficient space in /dev/shm, period!
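A quick back-of-the-envelope calculation makes that concrete: with, say, work_mem = 64MB and four parallel workers plus the leader, a single hashed plan node may already claim 5 × 64MB = 320MB of /dev/shm. You can look up your own values with:

$ psql -c 'SHOW work_mem;'
$ psql -c 'SHOW max_parallel_workers_per_gather;'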
On the docker-library postgres page it is documented that you may want to increase the --shm-size (ShmSize). That is quite doable for direct Docker or docker-compose instantiations. But for PostgreSQL daemon pods in Kubernetes, resizing shm does not seem to be possible.
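For comparison, when you do control the container runtime directly it is a single knob; a sketch with plain Docker (the 1g is just an example value, and docker-compose has an equivalent shm_size key):

$ docker run --shm-size=1g -e POSTGRES_PASSWORD=secret postgres:13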
Any other fixes then?
Well, I'm glad you asked! /dev/shm is just one of the ways the PostgreSQL daemon can be configured to allocate its dynamic shared memory:
dynamic_shared_memory_type (enum)
    Specifies the dynamic shared memory implementation that the server should use. Possible values are posix (for POSIX shared memory allocated using shm_open), sysv (for System V shared memory allocated via shmget), windows (for Windows shared memory), and mmap (to simulate shared memory using memory-mapped files stored in the data directory). [...]
(from PostgreSQL runtime config)
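You can check which implementation a running server is using from any psql prompt:

$ psql -c 'SHOW dynamic_shared_memory_type;'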
When using the posix shm_open(), we're directly opening files in /dev/shm. If we however opt to use the (old-fashioned) sysv shmget(), the memory allocation is not pinned to this filesystem and it is not limited (unless someone has been touching /proc/sys/kernel/shm*).
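You can check whether anyone has in fact been touching those knobs; on recent kernels the size limits default to effectively unlimited values:

$ sysctl kernel.shmmax kernel.shmall kernel.shmmni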
Technical details of using System V shared memory
Using System V shared memory is a bit more convoluted than using POSIX shm. For POSIX shared memory, calling shm_open() is basically the same as opening a (mmap-able) file in /dev/shm. For System V however, you're looking at an incantation like this shmdemo.c example:
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE (size_t)(512 * 1024 * 1024UL) /* 512MiB */

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        return 1;
    }

    /* The file here is used as a "pointer to memory". The key is
     * calculated based on the inode number and non-zero 8 bits: */
    if ((key = ftok("./pointer-to-memory.txt", 1 /* project_id */)) == -1) {
        fprintf(stderr, "please create './pointer-to-memory.txt'\n");
        return 2;
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1)
        return 3;
    if ((data = shmat(shmid, NULL, 0)) == (char *)(-1)) /* attach */
        return 4;

    /* read or modify the segment, based on the command line: */
    if (argc == 2) {
        printf("writing to segment %#x: \"%s\"\n", key, argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else {
        printf("segment %#x contained: \"%s\"\n", key, data);
        shmctl(shmid, IPC_RMID, NULL); /* free the memory */
    }

    if (shmdt(data) == -1) /* detach */
        return 5;
    return 0;
}
(Luckily the PostgreSQL programmers concerned themselves with these awkward semantics, so we won't have to.)
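Building the sample requires nothing beyond a C compiler:

$ cc -Wall -o shmdemo shmdemo.c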
If you want to confirm that you have access to sufficient System V shared memory inside your pod, you could try the above code sample to test. Invoking it looks like:
$ ./shmdemo
please create './pointer-to-memory.txt'
$ touch ./pointer-to-memory.txt
$ ./shmdemo
segment 0x1010dd5 contained: ""
$ ./shmdemo 'please store this in shm'
writing to segment 0x1010dd5: "please store this in shm"
$ ./shmdemo
segment 0x1010dd5 contained: "please store this in shm"
$ ./shmdemo
segment 0x1010dd5 contained: ""
And if you skipped/forgot the IPC_RMID, you can see the leftovers using ipcs:
$ ipcs | awk '{if(int($6)==0)print}'
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x52010e16 688235     walter     644        536870912  0
0x52010e19 688238     walter     644        536870912  0
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
And remove them with ipcrm:
$ ipcrm -M 0x52010e16
$ ipcrm -M 0x52010e19
But you probably did not come here for lessons in ancient IPC. Quickly moving on to the next paragraph...
Configuring sysv dynamic_shared_memory_type in stolon
For stolon, the Kubernetes PostgreSQL manager that we're using, you can configure different parameters through the pgParameters setting. It keeps the configuration in a configMap:
$ kubectl -n NS get cm stolon-cluster-mycluster -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":...}'
    stolon-clusterdata: '{"formatVersion":1,...}'
...
Where the stolon-clusterdata holds both the configuration and the current state:
{
  "formatVersion": 1,
  "changeTime": "2021-01-15T10:17:54.297700008Z",
  "cluster": {
    ...
    "spec": {
      ...
      "pgParameters": {
        "datestyle": "iso, mdy",
        "default_text_search_config": "pg_catalog.english",
        "dynamic_shared_memory_type": "posix",
        ...
You should not be editing this directly, but it can be educational to look at.
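If you only want to peek at the effective pgParameters, a one-liner like this does the trick (assuming you have jq at hand):

$ kubectl -n NS get cm stolon-cluster-mycluster \
    -o jsonpath='{.metadata.annotations.stolon-clusterdata}' \
    | jq '.cluster.spec.pgParameters'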
To edit the pgParameters you'll be using stolonctl from inside a stolon-proxy, as specified in the cluster specification patching docs:
$ stolonctl --cluster-name=mycluster --store-backend=kubernetes \
    --kube-resource-kind=configmap update --patch \
    '{"pgParameters": {"dynamic_shared_memory_type": "sysv"}}'
$ stolonctl --cluster-name=mycluster --store-backend=kubernetes \
    --kube-resource-kind=configmap update --patch \
    '{"pgParameters": {"shared_buffers": "6144MB"}}'
And a restart:
$ kubectl -n NS rollout restart sts stolon-keeper
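Once the keepers are back you can verify that the new setting took effect; the exact pod name and psql invocation depend on your setup:

$ kubectl -n NS exec -it stolon-keeper-0 -- \
    psql -U postgres -c 'SHOW dynamic_shared_memory_type;'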
And that, my friends, should get rid of that pesky 64MiB limit.