A sample service check, annotated:
host_name              dbsrp2076              <-- name of server
service_description    SSH                    <-- service being monitored
servicegroups          PROD-ssh               <-- service groups
is_volatile            0                      <-- Does this service spontaneously start and stop (always 0 for "no")
check_period           24x7                   <-- during what hours is this service checked?
max_check_attempts     10                     <-- How many failed attempts before an alert is generated?
check_interval         15                     <-- how often is this service ordinarily checked, in minutes
retry_interval         1                      <-- On failed attempts, how many minutes between retries?
contact_groups         oipeds_dba,smapigroup  <-- who is notified if there is an issue?
notification_options   w,u,c,r                <-- notification option list (see below)
notification_interval  60                     <-- when a service is failed, how often should an alert be sent out if the problem is not acknowledged? (in minutes)
notification_period    24x7                   <-- during what hours of the day are alerts sent?
check_command          check_ssh              <-- script used to execute the check.
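To make the timing directives above concrete, here is a minimal sketch that estimates how long a failure can exist before the first alert goes out. It assumes the Nagios default interval_length of 60 seconds (so intervals are in minutes) and that the first failed check counts as attempt 1, with the remaining attempts spaced by retry_interval. The function name is hypothetical and not part of the monitoring scripts.

```python
def minutes_to_first_alert(max_check_attempts, check_interval, retry_interval):
    """Return (best_case, worst_case) minutes from the failure to the first alert."""
    # After the first failed check, the remaining attempts run at retry_interval.
    retry_window = (max_check_attempts - 1) * retry_interval
    # Best case: the failure happens just before a scheduled check.
    # Worst case: it happens just after one, so a full check_interval passes first.
    return retry_window, check_interval + retry_window


# Values taken from the sample service check above.
best, worst = minutes_to_first_alert(
    max_check_attempts=10, check_interval=15, retry_interval=1
)
print(f"first alert after {best} to {worst} minutes")
```

With the sample values this works out to roughly 9 to 24 minutes, ignoring check execution time and scheduler latency.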
Standard Nagios notification options:
- w: Notify on WARNING service states
- u: Notify on UNKNOWN service states
- c: Notify on CRITICAL service states
- r: Notify on service RECOVERY (OK states)
- f: Notify when the service starts and stops FLAPPING
- n (none): Do not notify the contact on any type of service notifications
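As an illustration of how a notification_options value maps onto the list above, the following sketch expands a string such as w,u,c,r into state names. It covers only the letters listed here; the dictionary and function names are hypothetical helpers, not part of Nagios.

```python
# Letter-to-meaning map taken from the option list above.
SERVICE_NOTIFICATION_OPTIONS = {
    "w": "WARNING",
    "u": "UNKNOWN",
    "c": "CRITICAL",
    "r": "RECOVERY",
    "f": "FLAPPING",
    "n": "NONE",
}


def expand_notification_options(value):
    """Turn a string such as 'w,u,c,r' into a list of state names."""
    letters = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [l for l in letters if l not in SERVICE_NOTIFICATION_OPTIONS]
    if unknown:
        raise ValueError(f"unrecognised notification option(s): {unknown}")
    return [SERVICE_NOTIFICATION_OPTIONS[l] for l in letters]


# The value used in the sample service check.
print(expand_notification_options("w,u,c,r"))
```

For the sample check this yields WARNING, UNKNOWN, CRITICAL, and RECOVERY, so flapping notifications are not sent.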
The monitoring scripts and their current thresholds are:
- check_auto_increment: Checks every integer column in a database to see how close its values are to the maximum allowed by the column's data type. Warns at 70% of capacity; sends a critical alert at 85% of capacity. Information on integer data types and their capacity can be found here: dev.mysql.com/doc/refman/5.7/en/integer-types.html
- check_cpuram: Enumerates current CPU and RAM on the server.
- check_mysql: Checks to see if MySQL is running.
- check_mysql_active: Checks the number of active, running threads. Warns at 20, critical at 40. Does *not* count sleeping threads.
- check_mysql_cluster: Checks to ensure all three nodes of a cluster report as available. Returns the cluster configuration and nodes if OK, returns a critical alert if the cluster size is less than three. Returns an unknown error if other states are encountered.
- check_mysql_schemata: Returns a list of all schemata in the database with the exception of all system schemata.
- check_mysql_size: Returns the total size in GB of the entire database. Informational.
- check_mysql_sleep: Checks the number of sleeping threads. Warns at 500, critical at 600.
- check_mysql_version: Checks which MySQL version is running. Approved versions are >= 5.7.28 and 8.0.x. Any other version returns a warning alert.
- check_os: Returns the distribution name, major, and minor version of the operating system. Informational only.
- check_remote_disk: Checks disk usage on the following mount points: /mysql/binlog0, /mysql/tmp0, /mysql/audit0, /mysqladm, /mystemp, /mysql/data0, /mysqlshare. Warns if utilization is greater than 85%; sends a critical alert if it is >= 90%.
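The warn/critical pattern shared by several of the scripts above can be sketched as follows. This is not the scripts' actual code: the function names are hypothetical, and treating each threshold as inclusive (>=) is an assumption. check_remote_disk is left out because its warning threshold is stated as strictly greater than 85%. The exit codes 0, 1, 2, and 3 are the standard Nagios plugin return codes for OK, WARNING, CRITICAL, and UNKNOWN.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(value, warn, crit):
    """Map a measured value to a Nagios exit code (a higher value is worse)."""
    if value is None:
        return UNKNOWN  # nothing could be measured
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def percent_of_capacity(current, max_value):
    """Share of an integer type's range already used, as a percentage."""
    return 100.0 * current / max_value


# Thresholds copied from the list above: (warn, crit).
THRESHOLDS = {
    "check_auto_increment": (70, 85),  # percent of capacity
    "check_mysql_active": (20, 40),    # running threads
    "check_mysql_sleep": (500, 600),   # sleeping threads
}

SIGNED_INT_MAX = 2_147_483_647  # maximum of MySQL's signed INT type

# Illustrative inputs only; real scripts would query the server.
used = percent_of_capacity(1_600_000_000, SIGNED_INT_MAX)
print(round(used, 1), classify(used, *THRESHOLDS["check_auto_increment"]))
print(classify(25, *THRESHOLDS["check_mysql_active"]))
print(classify(450, *THRESHOLDS["check_mysql_sleep"]))
```

In this example a signed INT column whose highest value is 1,600,000,000 is about 74.5% full and maps to WARNING, 25 running threads also map to WARNING, and 450 sleeping threads map to OK.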