feat: critical alerts by modules #263

Open
wants to merge 1 commit into
base: develop

AlexanderLukin
Contributor

Make several changes to the critical alerts engine.

  1. Critical alerts are now sent for each module individually.

  2. Introduce a new `CRITICAL_ALERTS_MIN_VAL_CSM_ABSOLUTE_COUNT` env variable. If the number of validators in the CSM module affected by the `CriticalMissedAttestations` or `CriticalNegativeDelta` alert is greater than the value of this variable, the corresponding alert is triggered. For validators in curated modules the alerting logic is unchanged (alerts are sent depending on the total number of active validators).

  3. Ignore the number of active validators for node operators in the CSM module for the `CriticalMissedProposes` alert. If any validators in the CSM module are affected by this alert, all of them are included in the alert summary regardless of the node operator's total number of validators.

  4. Add a new `nos_module_id` label to all critical alerts. This makes it possible to route alerts to different channels by module via Alertmanager.

  5. Slightly loosen the rules for sending critical alerts. Previously, alerts were sent when the number of affected validators was greater than the relevant threshold. Now alerts are sent when the number of affected validators is greater than or equal to the threshold.

  6. Add information about the module to the alert summary.

  7. Add a new `CSM_MODULE_ID` env variable. Document all new env variables in the README.

  8. Slightly change the log output for critical alerts. Logs now show the specific critical alert type together with the modules for which it was sent.
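
For reviewers, here is a minimal sketch of how items 1, 2, 4, 5 and 6 could fit together. It is not the code from this PR. All type, field and function names are hypothetical, and the curated-module rule (`curatedMinActive`, `curatedAffectedShare`) is an assumed stand-in for the existing logic based on active validators. The comparison uses `>=` as described in item 5. Item 2 is worded as a strict "greater than", so the exact operator should be checked against the diff.

```typescript
// Hypothetical shapes for illustration only; the real service may differ.
interface OperatorStats {
  name: string;
  moduleId: number;
  affected: number; // validators hit by the alert condition
  active: number;   // total active validators of the node operator
}

interface AlertConfig {
  csmModuleId: number;          // mirrors CSM_MODULE_ID
  csmMinAbsoluteCount: number;  // mirrors CRITICAL_ALERTS_MIN_VAL_CSM_ABSOLUTE_COUNT
  curatedMinActive: number;     // assumption: minimum active validators for curated operators
  curatedAffectedShare: number; // assumption: affected share of active validators
}

interface CriticalAlert {
  labels: { alertname: string; nos_module_id: string };
  summary: string;
}

// CSM operators are compared against an absolute count;
// curated operators keep a rule that depends on their active validators.
function isTriggered(op: OperatorStats, cfg: AlertConfig): boolean {
  if (op.moduleId === cfg.csmModuleId) {
    return op.affected >= cfg.csmMinAbsoluteCount;
  }
  return (
    op.active >= cfg.curatedMinActive &&
    op.affected >= op.active * cfg.curatedAffectedShare
  );
}

// One alert per module, each carrying a nos_module_id label
// and naming the module in its summary.
function buildAlerts(
  alertname: string,
  ops: OperatorStats[],
  cfg: AlertConfig,
): CriticalAlert[] {
  const byModule = new Map<number, OperatorStats[]>();
  for (const op of ops) {
    if (!isTriggered(op, cfg)) continue;
    const group = byModule.get(op.moduleId) || [];
    group.push(op);
    byModule.set(op.moduleId, group);
  }
  return Array.from(byModule.entries()).map(([moduleId, group]) => ({
    labels: { alertname, nos_module_id: String(moduleId) },
    summary:
      `${alertname} in module ${moduleId}: ` +
      group.map((o) => `${o.name} (${o.affected})`).join(", "),
  }));
}
```

Because each alert carries `nos_module_id`, an Alertmanager route can match on that label to send CSM and curated alerts to different receivers.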
