CrashLoopBackOff indicates that a container is repeatedly crashing after restarting. A container might crash for many reasons, and checking a Pod's logs might aid in troubleshooting the root cause.
Besides the error message "Does not have minimum availability", you may see other error messages such as "Failed to pull image". I recommend identifying the error messages that are relevant to your environment. You can check them with kubectl logs <pod_name> or in the Log Viewer.
For your reference, here are explanations for pod issues:
- CrashLoopBackOff means the image was downloaded but the container failed to run
- ImagePullBackOff means the image was not downloaded
- "Does not have minimum availability" means that the Deployment has fewer available Pods than it requires. This is not necessarily caused by a lack of resources. For instance, there may be nodes available, but the Pod is not schedulable on them per the Deployment's configuration.
- "Insufficient cpu" means the nodes do not have enough CPU available for the Pod.
- "Unschedulable" indicates that your Pod cannot be scheduled because of insufficient resources or some configuration error.
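As a rough illustration of the list above, here is a minimal Python sketch that maps a status or event message to one of these explanations. The table POD_ISSUES and the function explain_pod_issue are hypothetical names made up for this example, and the matching is a plain case-insensitive substring check, not anything Kubernetes provides.

```python
# Hypothetical lookup table: substrings of a status/event message mapped to
# the explanations listed above. More specific entries come first.
POD_ISSUES = {
    "crashloopbackoff": "image was downloaded, but the container failed to run",
    "imagepullbackoff": "the image was not downloaded",
    "does not have minimum availability": "the Deployment has fewer available Pods than it requires",
    "insufficient cpu": "the nodes do not have enough CPU for the Pod",
    "unschedulable": "the Pod cannot be scheduled (resources or configuration)",
}


def explain_pod_issue(message: str) -> str:
    """Return the first explanation whose key appears in the message."""
    lowered = message.lower()
    for needle, explanation in POD_ISSUES.items():
        if needle in lowered:
            return explanation
    return "unrecognized; check the Pod's logs and events"


if __name__ == "__main__":
    print(explain_pod_issue("0/3 nodes are available: 3 Insufficient cpu."))
```

The same substrings are reasonable candidates for the text term in a log filter.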
With that in mind, here are the steps for creating a logs-based metric that you can later base an alert on.
Set up a logs-based metric using the following filter:
resource.type="k8s_pod"
severity>=WARNING
unschedulable
You can replace the filter with something more appropriate for your case.
Create a label in the metric that will allow you to identify the pod that was unschedulable (or other status). This will also help with grouping when you create the alert for a failing pod.
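To make the metric and its label more concrete, here is a minimal Python sketch that assembles a request body in the shape of the Cloud Logging LogMetric resource. The metric name pod_unschedulable and the helper names are assumptions chosen for illustration. The sketch only builds and prints JSON; it does not call any API.

```python
import json


def build_filter(resource_type="k8s_pod", min_severity="WARNING", text="unschedulable"):
    """Compose the log filter, with the three conditions joined by AND."""
    return " AND ".join([
        f'resource.type="{resource_type}"',
        f"severity>={min_severity}",
        f'"{text}"',
    ])


def build_log_metric(name, log_filter):
    """Build a LogMetric-shaped dict with a pod_name label.

    The label value is extracted from the k8s_pod resource label pod_name.
    """
    return {
        "name": name,  # hypothetical metric name
        "description": "Counts pod log entries matching the filter",
        "filter": log_filter,
        "metricDescriptor": {
            "metricKind": "DELTA",
            "valueType": "INT64",
            "labels": [{"key": "pod_name", "valueType": "STRING"}],
        },
        "labelExtractors": {"pod_name": "EXTRACT(resource.labels.pod_name)"},
    }


if __name__ == "__main__":
    metric = build_log_metric("pod_unschedulable", build_filter())
    print(json.dumps(metric, indent=2))
```

Swapping the text argument (for example to "Insufficient cpu") gives a metric for a different status, with the same pod_name label available for grouping.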
In Stackdriver Monitoring, create an alert with the following parameters:
- Set the resource type to k8s_pod
- Set the metric to the logs-based metric you created earlier
- Set Group By to pod_name (the label you created on that metric)
- In the advanced aggregation section, set the aligner to sum and the Alignment Period to 5m (or whatever you think is more appropriate).
- Configure the condition trigger's For setting to more than 1 minute to prevent the alert from firing over and over. This can also be adjusted to your requirements.
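The alert settings above could also be expressed as an alert policy definition. The following Python sketch builds a dict in the shape of the Cloud Monitoring AlertPolicy resource. Several values are assumptions and not part of the steps above: the metric name pod_unschedulable, the REDUCE_SUM reducer (added because grouping normally goes together with a reducer), the threshold of 0, and the 120-second duration standing in for "more than 1 minute". Treat it as a starting point to check against the API reference, not a verified configuration.

```python
import json


def build_alert_policy(metric_name, alignment_seconds=300, duration_seconds=120):
    """Build an AlertPolicy-shaped dict for a user-defined logs-based metric."""
    metric_filter = (
        f'metric.type="logging.googleapis.com/user/{metric_name}" '
        'AND resource.type="k8s_pod"'
    )
    return {
        "displayName": f"Pod issue: {metric_name}",
        "combiner": "OR",
        "conditions": [{
            "displayName": f"{metric_name} count per pod",
            "conditionThreshold": {
                "filter": metric_filter,
                "aggregations": [{
                    # 5m alignment period with the sum aligner
                    "alignmentPeriod": f"{alignment_seconds}s",
                    "perSeriesAligner": "ALIGN_SUM",
                    # Assumption: reducer so that grouping by pod_name applies
                    "crossSeriesReducer": "REDUCE_SUM",
                    "groupByFields": ["metric.label.pod_name"],
                }],
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0,  # assumption: any matching entry counts
                # How long the condition must hold before the alert fires
                "duration": f"{duration_seconds}s",
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(build_alert_policy("pod_unschedulable"), indent=2))
```

Raising duration_seconds makes the alert less noisy at the cost of a slower notification.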
I hope this information is helpful. If you have any questions, let me know in the comments.