11 Commits

Author SHA1 Message Date
Oliver Smith a9c93850c3 jobs/osmo-gsm-tester-virtual: kill old instances
Make sure osmo-gsm-tester gets killed eventually, even if a bug causes
it to run forever or if aborted manually.

* add a name to the docker container
* kill the docker container if it runs longer than 24h with
  docker-cleanup.sh
* rename fix_permissions_trap to clean_up_trap and kill the container
  there if it is still running, both before the job starts and after it
  is done (in my testing this did not kill it after pressing abort, but
  it would be killed either at the start of the next job running on the
  same jenkins node, or after 24h by docker-cleanup.sh)

Related: OS#6304
Change-Id: I6fc874d319d74aabdc33c10910cbcca2978d5bbb
2023-12-14 11:11:27 +01:00
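
The naming and trap approach described in this commit can be sketched roughly as follows. This is only an illustration, not the actual job script: the container name and the helper kill_container are hypothetical, and the permission fixes that the former fix_permissions_trap performed are left out.

```shell
#!/bin/sh
# Hypothetical container name; the commit does not state the real one.
CONTAINER_NAME="osmo-gsm-tester-virtual"

# Kill the named container. Errors are ignored, so calling this when no
# such container is running is harmless.
kill_container() {
    docker kill "$CONTAINER_NAME" >/dev/null 2>&1 || true
}

# Runs when the job script exits.
clean_up_trap() {
    kill_container
    # The permission fixes of the former fix_permissions_trap would go here.
}
```

A job script could call kill_container once before starting (to remove a leftover instance), register the trap with `trap clean_up_trap EXIT INT TERM`, and then start the container with `docker run --rm --name "$CONTAINER_NAME" ...`.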
Oliver Smith a13ce691d1 scripts/docker-cleanup: buildkit cache too
In newer docker versions, a buildkit cache was introduced. It gets used
while building images. Clean it up as well.

Related: https://osmocom.org/projects/osmocom-servers/wiki/Docker_cache_clean_up
Change-Id: Icf5237def75d4bcef6b0065f3f1f1da2ff362322
2023-11-21 13:01:03 +00:00
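
One way to clean the buildkit cache is `docker builder prune`. The sketch below assumes the whole cache should be dropped without a confirmation prompt; the function name is hypothetical and the flags actually used in docker-cleanup.sh may differ.

```shell
#!/bin/sh
# Remove the entire buildkit build cache, without asking for confirmation.
# --all also removes cache entries that are not dangling.
clean_buildkit_cache() {
    docker builder prune --all --force
}
```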
Oliver Smith b206b2f1d2 scripts/docker-cleanup: remove containers > 24h
Remove containers starting with jenkins- or having ttcn3 in the name, if
they have been running for more than 24 hours. This can happen with the
ttcn3 testsuites, as they typically start multiple docker containers in
the background (one per Osmocom program) before they start the testsuite
docker container in the foreground. Usually the clean up trap makes sure
that all containers get killed, but we have seen that a few containers
have been running for a few months. One reason for this could be
temporary loss of connection between the jenkins server and the node
running the job.

Extend the clean up script to remove the containers that were not
properly removed by the clean up trap.

Historically we used to kill docker containers of the same name before
starting a testsuite, but this had the downside that we could not start
the same testsuite multiple times in parallel. This was refactored in
docker-playground Ifcd384272c56d585e220e2588f2186dc110902ed.

Change-Id: I58c17b57c998eaba411658e83b7295d7cfcf9a23
2023-10-04 17:53:51 +02:00
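
The selection logic described in this commit (name starts with jenkins- or contains ttcn3, running for more than 24 hours) could look roughly like the sketch below. All function names are hypothetical, GNU date is assumed for parsing the timestamp that `docker inspect` prints, and the real script may determine the container age differently.

```shell
#!/bin/sh
MAX_AGE_SECONDS=$((24 * 60 * 60))

# Succeeds if the container name starts with "jenkins-" or contains "ttcn3".
name_matches() {
    case "$1" in
        jenkins-*|*ttcn3*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if a container started at epoch $1 exceeds the age limit at epoch $2.
is_too_old() {
    [ $(( $2 - $1 )) -gt "$MAX_AGE_SECONDS" ]
}

# Force-remove every matching container that has been running too long.
kill_old_containers() {
    now=$(date +%s)
    docker ps --format '{{.Names}}' | while read -r name; do
        name_matches "$name" || continue
        started=$(docker inspect --format '{{.State.StartedAt}}' "$name")
        # Assumption: GNU date, which can parse the RFC 3339 timestamp.
        started_epoch=$(date -d "$started" +%s) || continue
        if is_too_old "$started_epoch" "$now"; then
            docker rm --force "$name"
        fi
    done
}
```

Keeping the name check and the age check in separate functions makes both easy to try out without a docker daemon.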
Oliver Smith 21a641d6c2 scripts/docker-cleanup: remove fallback code
Remove the fallback clean up code, as it also may lead to images getting
removed right before we need to use them. Besides that, it should be
dead code by now since docuum should be running on all our jenkins nodes
to clean up old images based on last use date.

Change-Id: I9ca0c2ba245bdd75d9fb8eaf341055e8c2ab1b55
2022-12-09 10:40:58 +01:00
Oliver Smith a7df704d4d scripts/docker-cleanup: fix timing problems
Don't delete images while they are being used. This fixes the following
error, which we see from time to time in the middle of "docker build" on
jenkins:
unknown parent image ID sha256:1b072e35048cd8b680eddabdc641ac678edb1184d222d5e7b3fbe0b3c333129a

This happens because "docker build" creates so-called dangling images
for each step processed of a Dockerfile. The "docker system prune" call
deletes these dangling images (among other things).

Remove the "docker system prune" call. We already have the docuum daemon
to deal with unused images (dangling and not dangling). It removes them
based on last use date, so that the used space always stays below a
configured limit. As it deletes the images that have gone unused the
longest when it reaches the limit, it will not cause the problem
explained above.

Besides images, "docker system prune" also removes unused containers
(instances of images created with 'docker run' without --rm) and
networks. Add "docker container prune" and "docker network prune"
commands to remove them from now on.

Also remove the redundant container removal logic (previously it was
redundant with "docker system prune", now it is redundant with "docker
container prune").

Related: https://docs.docker.com/config/pruning/
Change-Id: Ia1b466eea43dd135373949e8e3e6b005c169ea0c
2022-12-09 10:40:33 +01:00
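
The replacement for "docker system prune" described in this commit amounts to two narrower prune calls. A minimal sketch, assuming the non-interactive --force flag and a hypothetical function name:

```shell
#!/bin/sh
# Clean up everything "docker system prune" used to handle, except images.
# Images are left to docuum, so dangling images of a running "docker build"
# are not deleted underneath it.
clean_unused() {
    # Stopped containers, e.g. created with 'docker run' without --rm.
    docker container prune --force
    # Networks that are not used by any container.
    docker network prune --force
}
```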
Oliver Smith 88521fbc14 scripts/docker-cleanup.sh: conditional img clean
Only run the simple image clean code if docuum is not running. It works
well enough in most cases, but has the drawbacks that it never deletes
"latest" images or images not matching "^osmocom-build", and may delete
images that are still being used (OS#5447). With the other tool, all
images are considered for removal, and the ones that have not been used
the longest time are removed first.

Related: OS#5477, OS#5066, SYS#5827
Change-Id: I1cef0833c096de0fa5acf77156bb5dd362e2ef9c
2022-02-11 15:44:44 +01:00
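
The conditional clean up described in this commit could be sketched as below. How the real script detects docuum is not stated here; this sketch assumes a host process named docuum and uses pgrep. The function names are hypothetical, and `xargs -r` is a GNU extension.

```shell
#!/bin/sh
# Assumption: docuum runs as a host process with exactly this name.
docuum_is_running() {
    pgrep -x docuum >/dev/null 2>&1
}

# Read "repository:tag" lines on stdin and print the ones the simple clean
# code would consider: matching ^osmocom-build and not tagged "latest".
select_simple_clean_candidates() {
    grep '^osmocom-build' | grep -v ':latest$'
}

clean_images() {
    if docuum_is_running; then
        echo "docuum is running, skipping simple image clean-up"
        return 0
    fi
    docker images --format '{{.Repository}}:{{.Tag}}' \
        | select_simple_clean_candidates \
        | xargs -r docker rmi
}
```

The filter shows the drawback named in the commit message: "latest" images and images outside ^osmocom-build are never selected.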
Oliver Smith b5ebf6ea6b scripts/docker-cleanup.sh: use "docker system prune"
Do not only clean up dangling images, but also containers, volumes and
networks.

Related: SYS#5827
Change-Id: If441b251de50063f0229d36fb1bc260a4cb1dd87
2022-02-11 15:44:16 +01:00
Oliver Smith 2b77e64c48 scripts/docker-cleanup.sh: delete containers too
Related: SYS#5827
Change-Id: I73b2f13875286c1aaa5424809edab2202f41768b
2022-02-11 15:44:16 +01:00
Oliver Smith e94b29e837 scripts/docker-cleanup.sh: use set -x
Change-Id: Iba170128e55a9778467c3d3bcf33a91321a8c29f
2022-02-11 15:44:16 +01:00
Alexander Couzens bd20389e49 scripts/docker-cleanup.sh: set permissions to 755
Otherwise it will not be executed by cron, because cron checks for the
executable bit.

Change-Id: Ie9d67b157d62b38b62f5e74406d14344f90d07b8
2018-04-16 16:33:08 +02:00
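
The permission change and the kind of check mentioned in this commit can be illustrated with two small helpers. Both function names are hypothetical; the check only mirrors the idea of testing the executable bit, not how cron actually performs it.

```shell
#!/bin/sh
# Give the owner rwx and everyone else r-x, as in this commit.
make_executable() {
    chmod 755 "$1"
}

# Succeeds if the file has an executable bit usable by the current user.
is_executable() {
    [ -x "$1" ]
}
```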
Harald Welte acdde1617b add docker-cleanup.sh script
This script should be executed regularly on all build slaves that
have docker in order to discard unused images/layers.  It would
be a good idea to call "fstrim /" afterwards in order to get more
SSD performance.  However, the latter requires root access, and hence
cannot be called by the 'osmocom-build' user and thus jenkins.

Maybe we should install it as a cron job or systemd periodic timer job?

Related: OS#3144
Change-Id: I688b952578507a9cc28fe682221b5c7e3a245519
2018-04-11 06:07:12 +00:00