postgresql inside kubernetes / no space left on device
Running PostgreSQL inside Kubernetes? Getting occasional "No space left on device" errors? Know that 64MB is not enough for everyone.
With more and more services running inside Kubernetes, we're running into new issues and complexities specific to containerization. For instance, to solve the problem of regular file backups of distributed filesystems, we've resorted to using rsync wrapped inside a pod (or sidecar). And now, for containerized PostgreSQL, we've hit an artificial shared memory limit that needs fixing.
Manifestation
The issue manifests itself like this:
ERROR: could not resize shared memory segment "/PostgreSQL.491173048"
to 4194304 bytes: No space left on device
This shared memory that PostgreSQL speaks of is the shared memory made available to it through /dev/shm.
On your development machine, it may look like this:
$ mount | grep shm
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
$ df -h | sed -ne '1p;/shm/p'
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            16G  948M   15G   6% /dev/shm
That's fine: 16GiB is plenty of space. But in Kubernetes we get a default of a measly 64MiB and no means to change the shm-size. So, inside the pod with the PostgreSQL daemon, things look like this:
$ mount | grep shm
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
$ df -h | sed -ne '1p;/shm/p'
Filesystem      Size  Used Avail Use% Mounted on
shm              64M     0   64M   0% /dev/shm
For a bunch of database operations, that is definitely too little. Any PostgreSQL database doing any serious work will quickly use up that much temporary space. (And run into this error.)
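If you want to watch the pressure build up, keep an eye on /dev/shm from a shell inside the pod while a parallel query runs. A minimal sketch, assuming a hypothetical large table big_table and a planner that decides to go parallel:

# in one shell inside the pod, refresh the usage every second:
$ watch -n1 df -h /dev/shm

# in another shell, something that is likely to end up as a parallel hash join:
$ psql -c 'SELECT COUNT(*) FROM big_table a JOIN big_table b USING (id);'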
According to Thomas Munro on the postgrespro.com mailing list:
PostgreSQL creates segments in /dev/shm for parallel queries (via shm_open()), not for shared buffers. The amount used is controlled by work_mem. Queries can use up to work_mem for each node you see in the EXPLAIN plan, and for each process, so it can be quite a lot if you have lots of parallel worker processes and/or lots of tables/partitions being sorted or hashed in your query.
Basically what they're saying is: you need sufficient space in /dev/shm, period!
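A quick back-of-the-envelope calculation makes that concrete: with, say, work_mem = 64MB and four parallel workers plus the leader, a single hashed plan node may already claim 5 × 64MB = 320MB of /dev/shm. You can look up your own values with:

$ psql -c 'SHOW work_mem;'
$ psql -c 'SHOW max_parallel_workers_per_gather;'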
On the docker-library postgres page it is documented that you may want to increase the --shm-size (ShmSize). That is quite doable for direct Docker or docker-compose instantiations. But for PostgreSQL daemon pods in Kubernetes, resizing shm does not seem to be possible.
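For comparison, when you do control the container runtime directly it is a single knob; a sketch with plain Docker (the 1g is just an example value, and docker-compose has an equivalent shm_size key):

$ docker run --shm-size=1g -e POSTGRES_PASSWORD=secret postgres:13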
Any other fixes then?
Well, I'm glad you asked! /dev/shm is just one of the ways the PostgreSQL daemon can be configured to allocate its dynamic shared memory:
dynamic_shared_memory_type (enum)
    Specifies the dynamic shared memory implementation that the server should use. Possible values are posix (for POSIX shared memory allocated using shm_open), sysv (for System V shared memory allocated via shmget), windows (for Windows shared memory), and mmap (to simulate shared memory using memory-mapped files stored in the data directory). [...]
(from PostgreSQL runtime config)
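You can check which implementation a running server is using from any psql prompt:

$ psql -c 'SHOW dynamic_shared_memory_type;'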
When using the posix shm_open(), we're directly opening files in /dev/shm. If we however opt to use the (old-fashioned) sysv shmget(), the memory allocation is not pinned to this filesystem and it is not limited (unless someone has been touching /proc/sys/kernel/shm*).
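You can check whether anyone has in fact been touching those knobs; on recent kernels the size limits default to effectively unlimited values:

$ sysctl kernel.shmmax kernel.shmall kernel.shmmni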
Technical details of using System V shared memory
Using System V shared memory is a bit more convoluted than using POSIX shm. For POSIX shared memory, calling shm_open() is basically the same as opening a (mmap-able) file in /dev/shm. For System V however, you're looking at an incantation like this shmdemo.c example:
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SHM_SIZE (size_t)(512 * 1024 * 1024UL) /* 512MiB */

int main(int argc, char *argv[])
{
    key_t key;
    int shmid;
    char *data;

    if (argc > 2) {
        fprintf(stderr, "usage: shmdemo [data_to_write]\n");
        return 1;
    }

    /* The file here is used as a "pointer to memory". The key is
     * calculated based on the inode number and non-zero 8 bits: */
    if ((key = ftok("./pointer-to-memory.txt", 1 /* project_id */)) == -1) {
        fprintf(stderr, "please create './pointer-to-memory.txt'\n");
        return 2;
    }
    if ((shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT)) == -1)
        return 3;
    if ((data = shmat(shmid, NULL, 0)) == (char *)(-1)) /* attach */
        return 4;

    /* read or modify the segment, based on the command line: */
    if (argc == 2) {
        printf("writing to segment %#x: \"%s\"\n", key, argv[1]);
        strncpy(data, argv[1], SHM_SIZE);
    } else {
        printf("segment %#x contained: \"%s\"\n", key, data);
        shmctl(shmid, IPC_RMID, NULL); /* free the memory */
    }

    if (shmdt(data) == -1) /* detach */
        return 5;
    return 0;
}
(Luckily the PostgreSQL programmers concerned themselves with these awkward semantics, so we won't have to.)
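Building the sample requires nothing beyond a C compiler:

$ cc -Wall -o shmdemo shmdemo.c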
If you want to confirm that you have access to sufficient System V shared memory inside your pod, you could try the above code sample to test. Invoking it looks like:
$ ./shmdemo
please create './pointer-to-memory.txt'
$ touch ./pointer-to-memory.txt
$ ./shmdemo
segment 0x1010dd5 contained: ""
$ ./shmdemo 'please store this in shm'
writing to segment 0x1010dd5: "please store this in shm"
$ ./shmdemo
segment 0x1010dd5 contained: "please store this in shm"
$ ./shmdemo
segment 0x1010dd5 contained: ""
And if you skipped/forgot the IPC_RMID, you can see the leftovers using ipcs:
$ ipcs | awk '{if(int($6)==0)print}'
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x52010e16 688235     walter     644        536870912  0
0x52010e19 688238     walter     644        536870912  0
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
And remove them with ipcrm:
$ ipcrm -M 0x52010e16
$ ipcrm -M 0x52010e19
But you probably did not come here for lessons in ancient IPC. Quickly moving on to the next paragraph...
Configuring sysv dynamic_shared_memory_type in stolon
For stolon, the Kubernetes PostgreSQL manager that we're using, you can configure different parameters through the pgParameters setting. It keeps the configuration in a configMap:
$ kubectl -n NS get cm stolon-cluster-mycluster -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":...}'
    stolon-clusterdata: '{"formatVersion":1,...}'
...
Where the stolon-clusterdata holds both the configuration and the current state:
{
  "formatVersion": 1,
  "changeTime": "2021-01-15T10:17:54.297700008Z",
  "cluster": {
    ...
    "spec": {
      ...
      "pgParameters": {
        "datestyle": "iso, mdy",
        "default_text_search_config": "pg_catalog.english",
        "dynamic_shared_memory_type": "posix",
        ...
You should not be editing this directly, but it can be educational to look at.
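If you only want to peek at the effective pgParameters, a one-liner like this does the trick (assuming you have jq at hand):

$ kubectl -n NS get cm stolon-cluster-mycluster \
    -o jsonpath='{.metadata.annotations.stolon-clusterdata}' \
    | jq '.cluster.spec.pgParameters'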
To edit the pgParameters you'll be using stolonctl from inside a stolon-proxy, as specified in the cluster specification patching docs:
$ stolonctl --cluster-name=mycluster --store-backend=kubernetes \
    --kube-resource-kind=configmap update --patch \
    '{"pgParameters": {"dynamic_shared_memory_type": "sysv"}}'
$ stolonctl --cluster-name=mycluster --store-backend=kubernetes \
    --kube-resource-kind=configmap update --patch \
    '{"pgParameters": {"shared_buffers": "6144MB"}}'
And a restart:
$ kubectl -n NS rollout restart sts stolon-keeper
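Once the keepers are back you can verify that the new setting took effect; the exact pod name and psql invocation depend on your setup:

$ kubectl -n NS exec -it stolon-keeper-0 -- \
    psql -U postgres -c 'SHOW dynamic_shared_memory_type;'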
And that, my friends, should get rid of that pesky 64MiB limit.