Rescuing pods from CrashLoopBackOff

When a container in your pod keeps crashing and Kubernetes keeps restarting it, waiting a little longer between each attempt, the pod sits in the CrashLoopBackOff state. Maybe the pod holds important data, or you just need to get inside it to fix something: run a checker on a program installed there, move some files around, whatever the reason is. But because the container never stays up, you cannot simply kubectl -n namespace exec -it name -- bash into it. So what do you do? Dang you, CrashLoopBackOff!!

initContainer method

  • Advantage: the init container runs to completion before the main application starts
  • Disadvantage: the init container exits and then the main application starts, so this does not work if you need to do something interactive.

One possible way to get going is to run an initContainer. A pod can run one or more containers, so an initContainer is easy to add to the pod spec. You can have one or more initContainers per pod, and they all run before the app containers start, so if you need to fix a permission, make changes, or run some pre-script before your application container, this is perfect.

  • Init containers always run to completion
  • Each init container must complete successfully before the next one starts
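To make the ordering concrete, here is a minimal sketch of a Pod with two init containers. Everything named here (the pod name, the init container names, the /data path, the 1000:1000 owner and the emptyDir volume) is made up for illustration, so treat it as a starting point rather than a recipe.

```yaml
# Sketch only: names, paths and the UID/GID below are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod                # hypothetical pod name
spec:
  initContainers:
  # Runs first; it must exit 0 before the next init container is started.
  - name: fix-permissions        # hypothetical name
    image: busybox:1.28
    command: ['sh', '-c', 'chown -R 1000:1000 /data']   # 1000:1000 is an assumed owner
    volumeMounts:
    - name: data
      mountPath: /data
  # Runs second, only after fix-permissions has succeeded.
  - name: pre-script             # hypothetical name
    image: busybox:1.28
    command: ['sh', '-c', 'echo pre-start checks done']
  containers:
  # Starts only once every init container above has completed successfully.
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}                 # stand-in; a real pod would more likely use a PersistentVolumeClaim
```

The init containers and the app container share the same volume, which is what lets a fix made in the init step still be there when the application starts.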

If a Pod’s init container fails, the kubelet repeatedly restarts that init container until it succeeds. However, if the Pod has a restartPolicy of Never, and an init container fails during startup of that Pod, Kubernetes treats the overall Pod as failed. (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
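A small sketch of the restartPolicy case described above, assuming a throwaway Pod whose init container is made to fail on purpose. The pod and container names are hypothetical.

```yaml
# Sketch only: the init container fails deliberately to show the restartPolicy: Never case.
apiVersion: v1
kind: Pod
metadata:
  name: init-fail-demo           # hypothetical pod name
spec:
  restartPolicy: Never           # a failed init container is not retried; the Pod is treated as failed
  initContainers:
  - name: always-fails           # hypothetical name
    image: busybox:1.28
    command: ['sh', '-c', 'echo init step failed; exit 1']
  containers:
  # Never started in this sketch, because the init container above does not succeed.
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
```

With restartPolicy left at its default of Always, the same init container would instead be restarted over and over until it succeeds.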

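If you are running via a StatefulSet or Deployment, you can set up the pod template (spec.template) to run your initContainer.

One useful example is Elasticsearch, where an init container installs plugins before the main container starts. With the Elastic operator (ECK), which calls the pod template podTemplate, you would add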

    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs

If you are just changing this for a single Pod, add initContainers directly under the Pod's main spec section, next to containers

  spec:
    containers:
    - name: myapp-container
      image: busybox:1.28
      command: ['sh', '-c', 'echo The app is running! && sleep 3600']
    initContainers:
    - name: init-myservice
      image: busybox:1.28
      command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]

blocking method

  • Advantage: the container for the pod starts but the application does not, so you can make interactive changes before the application starts
  • Disadvantage: after making your fixes you need to revert the pod configuration to bring the application back up normally

This method works by changing the pod template of the ReplicationController/Deployment/DeploymentConfig/StatefulSet so that the container starts with a blocking, never-failing command like /bin/sh or /bin/cat instead of the main program. The container stays up without the application running, and you can exec into it to run your fixes.

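You will need to give it a tty and stdin. Without an open stdin, /bin/cat or /bin/sh would read end-of-file and exit right away, and the pod would go straight back to restarting.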

Edit the container entry in your spec.template.spec from something like

        name: logstash

to

        name: logstash
        command: [ "/bin/cat" ]
        tty: true
        stdin: true
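For context, here is roughly where those fields sit in a full Deployment. This is only a sketch: the labels, replica count and image reference are placeholders I made up, so substitute whatever your real manifest already has.

```yaml
# Sketch only: labels, replicas and the image reference are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash              # assumed label
  template:
    metadata:
      labels:
        app: logstash            # must match the selector above
    spec:
      containers:
      - name: logstash
        image: registry.example.com/logstash:your-tag   # hypothetical; keep your existing image
        command: [ "/bin/cat" ]  # overrides the image entrypoint, so the application does not start
        tty: true                # allocate a terminal for the container
        stdin: true              # keep stdin open so /bin/cat blocks instead of exiting
```

Two things worth checking in your own manifest: if the container also sets args, they get passed to /bin/cat as file names, and if it defines a livenessProbe, that probe will likely fail while the application is not running and restart the container. In both cases you may need to remove them temporarily as well.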

Then apply the change. The pod will be recreated and will run /bin/cat forever, so you can kubectl -n namespace exec -it podname -- bash into it, perform your work, and then exit.

Once you are done, revert the change and apply it again. The pod will be recreated with its original command and the application should start normally.


