Terminating Containers in a Responsive and Graceful Way
Container Context and PID 1
In a container context, teardown works as follows:
- docker stop sends a SIGTERM to the main process inside the container and, after a grace period (default 10s), a SIGKILL.
- Kubernetes sends a SIGTERM signal to the main process of each container in the Pod and, after a grace period (default 30s), a SIGKILL.
- In interactive mode, pressing Ctrl+C causes the system to send a SIGINT signal to the main process.
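The stop sequence in the list above can be sketched in Go. This is a minimal illustration, not how Docker or the kubelet is implemented: the function names stopWithGrace and stopSleep are hypothetical, the workload is a plain sleep 60, and a sleep binary on PATH is assumed.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// stopWithGrace sends SIGTERM, waits up to grace for the process to exit,
// then falls back to SIGKILL. It reports whether SIGKILL was needed.
// cmd must already be started.
func stopWithGrace(cmd *exec.Cmd, grace time.Duration) (bool, error) {
	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
		return false, err
	}
	done := make(chan struct{})
	go func() {
		_ = cmd.Wait() // reap the child; its exit status is not needed here
		close(done)
	}()
	select {
	case <-done:
		return false, nil
	case <-time.After(grace):
		err := cmd.Process.Kill() // SIGKILL cannot be caught or ignored
		if err != nil && !errors.Is(err, os.ErrProcessDone) {
			return false, err
		}
		<-done
		return err == nil, nil
	}
}

// stopSleep starts a hypothetical workload (sleep 60) and stops it with a
// one-second grace period.
func stopSleep() (bool, error) {
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		return false, err
	}
	return stopWithGrace(cmd, time.Second)
}

func main() {
	killed, err := stopSleep()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("SIGKILL needed:", killed)
}
```

A process that handles or dies on SIGTERM returns well within the grace period; one that ignores it costs the full grace period on every stop.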
To terminate in a graceful and responsive way, processes inside the container should handle SIGTERM and SIGINT. Many complaints about containers that are slow or impossible to stop can be found online[1][2]. Let’s dig into why.
Applications shipped in minimal container images, such as Distroless container images or FROM scratch static binaries, usually have an entrypoint like /app and run directly as PID 1 in the container’s PID namespace.
But PID 1 is treated specially by Linux[3][4][5]:
- The process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
- Indeed, from inside its PID namespace it is unkillable: signals that would terminate a regular process do not kill it.
- When the process with PID 1 dies for any reason, all other processes in its PID namespace are killed with the KILL signal.
- When any process that has children dies for any reason, its children are reparented to the process with PID 1.
- PID 1 has a unique responsibility, which is to reap zombie processes.
The test program is a basic Rust application that prints its PID and lives for 60 seconds. The full code is here.
Let’s build and run it in a container, then send SIGTERM or SIGINT to the sleep process. Our process, with PID 41, is a child of /bin/sh, so it exits immediately with code 143 (SIGTERM) or 130 (SIGINT).
This is much like running a program in a terminal, testing it, and pressing Ctrl-C to stop it.
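The codes 143 and 130 follow the shell convention of reporting a signal-terminated child as 128 plus the signal number (SIGTERM is 15, SIGINT is 2). A small Go sketch of that arithmetic, assuming a Unix-like host with a sleep binary on PATH; exitCodeAfterSignal is a hypothetical helper name:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// exitCodeAfterSignal starts "sleep 60" as a regular child (not PID 1),
// sends it sig, and returns the code a shell would report for it:
// 128 + signal number if the child was terminated by a signal.
func exitCodeAfterSignal(sig syscall.Signal) (int, error) {
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	if err := cmd.Process.Signal(sig); err != nil {
		return 0, err
	}
	_ = cmd.Wait() // a signaled child yields *exec.ExitError; status is read below
	if cmd.ProcessState == nil {
		return 0, errors.New("child was not waited for")
	}
	ws, ok := cmd.ProcessState.Sys().(syscall.WaitStatus)
	if !ok {
		return 0, errors.New("unexpected wait status type")
	}
	if ws.Signaled() {
		return 128 + int(ws.Signal()), nil
	}
	return ws.ExitStatus(), nil
}

func main() {
	for _, sig := range []syscall.Signal{syscall.SIGTERM, syscall.SIGINT} {
		code, err := exitCodeAfterSignal(sig)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("signal %d -> exit code %d\n", int(sig), code)
	}
}
```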
PID 1 Behavior
When run as PID 1, it is unstoppable from within its PID namespace. None of SIGINT, SIGTERM, or SIGKILL will work.
Because container processes are just normal processes in the host PID namespace, sending SIGKILL from the host works as expected. This is how Docker or the kubelet sends signals to PID 1 in every container.
The process won’t respond to SIGTERM or SIGINT because it is not coded to do so.
Solution when the entrypoint is an application binary
The solution to this problem depends on the behavior expected. If responsiveness to SIGINT (Ctrl-C) or SIGTERM is the only requirement, nothing needs to be done for languages whose runtime handles these signals by default, such as Golang, which aborts directly when it receives SIGTERM or SIGINT.
For languages whose runtime does not, like Rust, wrapping the container entrypoint with tini or dumb-init is the fastest fix.
tini or dumb-init acts as PID 1 in the container and immediately spawns the command as a child process, taking care to properly handle and forward signals as they are received.
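The forwarding part of that job can be sketched in Go. This is a simplified illustration under stated assumptions, not the implementation of tini or dumb-init: runAndForward is a hypothetical name, only four signals are relayed, and orphaned zombies are not reaped.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runAndForward starts argv as a child process, relays incoming signals
// to it, and returns the exit code the child finished with (128 + signal
// number if the child was terminated by a signal).
func runAndForward(argv []string) (int, error) {
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr

	// Installing handlers is what lets this process react to signals
	// even when it runs as PID 1.
	sigs := make(chan os.Signal, 16)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT, syscall.SIGHUP, syscall.SIGQUIT)
	defer signal.Stop(sigs)

	if err := cmd.Start(); err != nil {
		return 0, err
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	for {
		select {
		case s := <-sigs:
			_ = cmd.Process.Signal(s) // forward to the child
		case err := <-done:
			if cmd.ProcessState == nil {
				return 0, err
			}
			if ws, ok := cmd.ProcessState.Sys().(syscall.WaitStatus); ok && ws.Signaled() {
				return 128 + int(ws.Signal()), nil
			}
			return cmd.ProcessState.ExitCode(), nil
		}
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: mini-init COMMAND [ARGS...]")
		os.Exit(2)
	}
	code, err := runAndForward(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Exit(code)
}
```

Because the wrapper itself installs signal handlers, it stays responsive as PID 1, while the wrapped application runs as an ordinary child where the default signal actions apply.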
Note that zengxu/alpine:init is built from a custom Dockerfile.
The above solution works in any runtime context, including containerd, Docker, Podman, and Kubernetes.
When the runtime is Docker, tini is already included. Adding the --init flag to the run command replaces the target’s entrypoint with /sbin/docker-init -- /src/sleep.
Solution for entrypoint script
What about applications that must be started from a shell script? Bash and other shells don’t forward signals like SIGTERM to the process they are currently waiting on[6].
This is why the annoying scenario in [1] happens: the container can’t be killed by Ctrl-C.
As pointed out by the answers in [6], using exec so that the process replaces the shell process solves this problem. Writing a signal handler in the script works best, but can be a little complex.
Child reaping
In some cases, an application uses Unix fork to perform specific tasks, and the duty of reaping the children falls to the entrypoint process. Not every application carefully reaps its child processes by installing a SIGCHLD signal handler and calling the wait syscall in the parent process, so these children may become long-lived zombie processes. A large number of long-lived zombies is harmful to a Unix system, because they exhaust PID resources and the process table[7].
A Golang sample that tries to create a zombie every second demonstrates this.
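A minimal sketch of such a zombie generator follows. It is not the author's full code: the names isZombie and zombiesAfter are hypothetical, and it assumes a Linux host with /proc mounted and a true binary on PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// isZombie reports whether /proc/<pid>/stat lists the process in state Z.
func isZombie(pid int) bool {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return false
	}
	s := string(data)
	// The state letter follows the ") " that closes the command name.
	i := strings.LastIndex(s, ")")
	return i >= 0 && i+2 < len(s) && s[i+2] == 'Z'
}

// zombiesAfter starts n short-lived children, one per second, never waits
// on any of them, and returns how many are zombies at the end.
func zombiesAfter(n int) (int, error) {
	pids := make([]int, 0, n)
	for i := 0; i < n; i++ {
		cmd := exec.Command("true")
		if err := cmd.Start(); err != nil {
			return 0, err
		}
		pids = append(pids, cmd.Process.Pid) // deliberately no cmd.Wait()
		time.Sleep(time.Second)
	}
	zombies := 0
	for _, pid := range pids {
		if isZombie(pid) {
			zombies++
		}
	}
	return zombies, nil
}

func main() {
	n, err := zombiesAfter(60)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("zombie children:", n)
}
```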
If the application doesn’t reap its children, there are 60 zombies in the end. The full code is here.
A Rust version of the demo is here.
For such cases, a container init system such as tini or dumb-init is a good choice. More at what-is-advantage-of-Tini?
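The reaping such an init process performs can be sketched in Go as a SIGCHLD-driven loop around a non-blocking wait. This is an illustration under assumptions, not the code of tini or dumb-init: reapAll and spawnAndReap are hypothetical names, and a true binary on PATH is assumed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// reapAll collects every child that has already exited, without blocking,
// and returns how many were reaped.
func reapAll() int {
	n := 0
	for {
		var ws syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue
		}
		if err != nil || pid <= 0 {
			return n // no children left, or none have exited yet
		}
		n++
	}
}

// spawnAndReap starts n children that exit at once and reaps them whenever
// SIGCHLD arrives, giving up after five seconds.
func spawnAndReap(n int) (int, error) {
	sigchld := make(chan os.Signal, 1)
	signal.Notify(sigchld, syscall.SIGCHLD)
	defer signal.Stop(sigchld)

	for i := 0; i < n; i++ {
		if err := exec.Command("true").Start(); err != nil {
			return 0, err
		}
	}
	reaped := 0
	deadline := time.After(5 * time.Second)
	for reaped < n {
		select {
		case <-sigchld:
			// One SIGCHLD may stand for several exits, so drain them all.
			reaped += reapAll()
		case <-deadline:
			return reaped, fmt.Errorf("timed out after reaping %d of %d children", reaped, n)
		}
	}
	return reaped, nil
}

func main() {
	n, err := spawnAndReap(3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reaped children:", n)
}
```

Waiting on pid -1 collects any child, including ones the program started and intends to wait on elsewhere, which is one reason this logic usually lives in a dedicated init process.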
As pointed out previously, in Docker you can simply use docker run --init to solve this problem. In K8s, the pause process can help reap child processes when spec.shareProcessNamespace: true is set. More details at share-process-namespace and the-almighty-pause-container.
Compared with relying on the runtime, building the init into the container image is the better choice. Things are then handled by default, without relying on correct configuration at runtime.
Graceful shutdown guides
An application that should shut down gracefully must be coded to catch SIGTERM or SIGINT, do cleanup such as closing connections, and finally exit with 0. For Rust, this guide (handling-unix-kill-signals-in-rust) can be followed. For Golang, this one (how-to-stop-http-listenandserve) can be followed. For other languages you are on your own, but the solutions are quite similar.
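For Golang, the pattern can be sketched with the standard library alone. This is one possible shape under assumptions, not the content of the linked guide: serveUntil is a hypothetical helper, and the port 8080 and the five-second cleanup window are arbitrary choices.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// grace bounds how long in-flight requests may take to finish.
const grace = 5 * time.Second

// serveUntil runs an HTTP server on addr until done is closed, then shuts
// it down gracefully: stop accepting, let active requests finish, return.
func serveUntil(done <-chan struct{}, addr string) error {
	srv := &http.Server{Addr: addr}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err // the server failed before any shutdown was requested
	case <-done:
	}
	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	return srv.Shutdown(ctx)
}

func main() {
	// ctx is cancelled when SIGTERM or SIGINT arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	if err := serveUntil(ctx.Done(), ":8080"); err != nil {
		fmt.Fprintln(os.Stderr, "shutdown error:", err)
		os.Exit(1)
	}
	// Returning from main exits with status 0.
}
```

Keeping the cleanup window shorter than the runtime's grace period (10s for docker stop, 30s for Kubernetes by default) leaves room to finish before SIGKILL arrives.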
Author Zeng Xu
LastMod 2023-02-27 08:00
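License This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Please include a link to the original article when reposting.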