Make sure osmo-gsm-tester gets killed eventually, even if a bug causes
it to run forever or if it is aborted manually.
* add a name to the docker container
* kill the docker container with docker-cleanup.sh if it has been
running for longer than 24h
* rename fix_permissions_trap to clean_up_trap and use it to kill the
container if it is still running, both before the job starts and after
it is done (in my testing this did not kill it after pressing abort,
but it would then be killed either at the start of the next job running
on the same jenkins node, or after 24h by docker-cleanup.sh)
Related: OS#6304
Change-Id: I6fc874d319d74aabdc33c10910cbcca2978d5bbb
Remove containers whose names start with "jenkins-" or contain "ttcn3",
if they have been running for more than 24 hours. Such leftover
containers can occur with the ttcn3 testsuites, as they typically start
multiple docker containers in the background (one per Osmocom program)
before they start the testsuite docker container in the foreground.
Usually the clean up trap makes sure that all containers get killed,
but we have seen a few containers that had been running for several
months. One reason for this could be a temporary loss of connection
between the jenkins server and the node running the job.
Extend the clean script to remove the containers that were not properly
removed by the clean up trap.
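The selection rule above can be modelled as a small pure function. A
Python sketch, assuming the container names and start times have
already been collected (e.g. from the State.StartedAt field of "docker
container inspect"); the function names are hypothetical:

```python
from datetime import timedelta

# Containers running longer than this are considered leftovers.
MAX_AGE = timedelta(hours=24)


def is_cleanup_candidate(name):
    """Match the naming rule: 'jenkins-' prefix or 'ttcn3' anywhere."""
    return name.startswith("jenkins-") or "ttcn3" in name


def containers_to_kill(containers, now, max_age=MAX_AGE):
    """Return names of matching containers running for more than max_age.

    `containers` is an iterable of (name, started_at) pairs using
    timezone-aware datetimes; the input order is preserved.
    """
    return [
        name
        for name, started_at in containers
        if is_cleanup_candidate(name) and now - started_at > max_age
    ]
```

Containers that do not match the naming rule are never selected, no
matter how long they have been running.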
Historically, we killed docker containers with the same name before
starting a testsuite, but this had the downside that we could not run
the same testsuite multiple times in parallel. This was refactored in
docker-playground Ifcd384272c56d585e220e2588f2186dc110902ed.
Change-Id: I58c17b57c998eaba411658e83b7295d7cfcf9a23
Remove the fallback clean up code, as it may also lead to images being
removed right before we need to use them. Besides that, it should be
dead code by now, since docuum should be running on all of our jenkins
nodes to clean up old images based on their last use date.
Change-Id: I9ca0c2ba245bdd75d9fb8eaf341055e8c2ab1b55
Don't delete images while they are being used. This fixes the
following error, which we see from time to time in the middle of
"docker build" on jenkins:
unknown parent image ID sha256:1b072e35048cd8b680eddabdc641ac678edb1184d222d5e7b3fbe0b3c333129a
This happens because "docker build" creates so-called dangling images
for each processed step of a Dockerfile. The "docker system prune" call
deletes these dangling images (among other things).
Remove the "docker system prune" call. We already have the docuum daemon
to deal with unused images (dangling and non-dangling): it removes them
based on their last use date, so that the used space always stays below
a configured limit. As it deletes the least recently used images first
when it reaches the limit, it will not cause the problem explained
above.
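The policy described above is least-recently-used eviction. A
simplified Python model of it (an illustration of the policy only, not
docuum's actual code; image IDs, sizes and timestamps are made up):

```python
def images_to_evict(images, limit_bytes):
    """Return image IDs to delete, least recently used first.

    `images` is an iterable of (image_id, size_bytes, last_used) tuples,
    where last_used is any sortable timestamp (e.g. Unix time).
    Deletion stops as soon as the total size is within limit_bytes.
    """
    images = list(images)  # allow generators; we iterate twice
    total = sum(size for _, size, _ in images)
    to_delete = []
    # Oldest last-use date first.
    for image_id, size, _ in sorted(images, key=lambda img: img[2]):
        if total <= limit_bytes:
            break
        to_delete.append(image_id)
        total -= size
    return to_delete
```

In this model, an image that was just used sorts last and is therefore
the last candidate for removal, whereas "docker system prune" removes
every dangling image regardless of when it was created.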
Besides images, "docker system prune" also removes stopped containers
(e.g. instances of images created with 'docker run' without --rm) and
unused networks. Add "docker container prune" and "docker network
prune" commands to remove those from now on.
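A sketch of the replacement commands, driven from Python for
illustration (docker-cleanup.sh itself is a shell script; the function
name is hypothetical). Unlike "docker system prune", neither command
touches images:

```python
import subprocess

# --force skips the interactive y/N confirmation prompt.
PRUNE_COMMANDS = [
    ["docker", "container", "prune", "--force"],  # stopped containers
    ["docker", "network", "prune", "--force"],  # unused networks
]


def prune_containers_and_networks(run=subprocess.run):
    """Remove stopped containers and unused networks, leaving images alone.

    `run` is injectable for testing; check=True raises on failure.
    """
    for cmd in PRUNE_COMMANDS:
        run(cmd, check=True)
```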
Also remove the redundant container removal logic (previously it was
redundant with "docker system prune", now it is redundant with "docker
container prune").
Related: https://docs.docker.com/config/pruning/
Change-Id: Ia1b466eea43dd135373949e8e3e6b005c169ea0c
Only run the simple image clean code if docuum is not running. It works
well enough in most cases, but has the drawbacks that it never deletes
"latest" images or images not matching "^osmocom-build", and that it may
delete images that are still being used (OS#5447). With docuum, all
images are considered for removal, and the ones that have not been used
for the longest time are removed first.
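The selection behaviour of the simple clean code and the new docuum
guard can be sketched in Python. This assumes that the "^osmocom-build"
pattern is matched against the repository name, and that the list of
running process names is obtained elsewhere (on Linux, e.g. from
/proc/<pid>/comm); how the script actually detects docuum is not stated
here, and all names below are hypothetical:

```python
import re

OSMOCOM_BUILD = re.compile(r"^osmocom-build")


def simple_clean_candidates(images):
    """Return "repo:tag" names the simple clean code would consider.

    Mirrors the drawbacks described above: "latest" tags and
    repositories not matching ^osmocom-build are never selected, and
    nothing here checks whether an image is still in use.
    """
    return [
        f"{repo}:{tag}"
        for repo, tag in images
        if OSMOCOM_BUILD.match(repo) and tag != "latest"
    ]


def should_run_simple_clean(process_names):
    """Only fall back to the simple clean code if docuum is not running."""
    return "docuum" not in process_names
```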
Related: OS#5477, OS#5066, SYS#5827
Change-Id: I1cef0833c096de0fa5acf77156bb5dd362e2ef9c
Clean up not only dangling images, but also containers, volumes and
networks.
Related: SYS#5827
Change-Id: If441b251de50063f0229d36fb1bc260a4cb1dd87
This script should be executed regularly on all build slaves that
have docker in order to discard unused images/layers. It would
be a good idea to call "fstrim /" afterwards in order to improve SSD
performance. However, fstrim requires root access, and hence cannot be
called by the 'osmocom-build' user, and thus not by jenkins.
Maybe we should install it as a cron job or systemd periodic timer job?
Related: OS#3144
Change-Id: I688b952578507a9cc28fe682221b5c7e3a245519