I have been using Nagios for a long time. Nagios is very useful for notifying you about services that have stopped working on various servers. It can send notifications by e-mail or SMS (given a connected SMS gateway), speak through the loudspeakers (by way of festival), and deliver notifications in quite a few other ways.
However, Nagios only sends notifications. It runs as an unprivileged user, and for the commands that an unprivileged user cannot run it uses sudo (under careful supervision).
Until recently I was firmly convinced that if I wanted something to check the state of a program and take action in certain situations, I had to write a script and have cron run it. I even had an example in mind (from eggdrop, remember IRC?).
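The cron-script approach mentioned above might look roughly like the sketch below. The pidfile path, the start command and the function names are hypothetical placeholders, not taken from a real setup, and the liveness test assumes the script runs as root (otherwise `kill -0` can fail for processes owned by other users).

```shell
#!/bin/sh
# Hypothetical watchdog of the kind one would run from cron every few minutes.
# PIDFILE and START_CMD are illustrative placeholders.
PIDFILE="${PIDFILE:-/var/run/myapp.pid}"
START_CMD="${START_CMD:-/etc/init.d/myapp start}"

# is_running FILE: succeed only if FILE holds the PID of a live process.
is_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# watchdog: start the program again when its process has disappeared.
watchdog() {
    is_running "$PIDFILE" || $START_CMD
}

# Uncomment the call once the script is installed as a cron job:
# watchdog
```

Every program needs its own copy of such a script, and anything beyond "is the process alive" (CPU, memory, port checks) has to be written by hand.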
Monit seems to be fairly mature software: it has reached version 5 (the Debian stable and Ubuntu LTS repositories carry version 4.8.x). It even has a web interface, which I would nevertheless not recommend enabling. If you do enable it, do not leave it open to the general public (first of all because the daemon runs as root).
Depending on various parameters of an application, such as the RAM it uses, its CPU load, or the loss of activity on a TCP or UDP port, Monit can take action: it can run commands that stop, start or restart services, or even change file permissions.
All we have to do is run the following command as root:
apt-get install monit
What happens:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  monit
0 upgraded, 1 newly installed, 0 to remove and 2 not upgraded.
Need to get 254kB of archives.
After this operation, 680kB of additional disk space will be used.
Get:1 http://ftp.nb.lug.ro hardy/universe monit 1:4.8.1-2.1 [254kB]
Fetched 254kB in 0s (1966kB/s)
Selecting previously deselected package monit.
(Reading database ... 79286 files and directories currently installed.)
Unpacking monit (from .monit_1%3a4.8.1-2.1_i386.deb) ...
Setting up monit (1:4.8.1-2.1) ...
Starting daemon monitor: -e monit won't be started/stopped unless it it's configured
please configure monit and then edit /etc/default/monit and set the "startup" variable to 1 in order to allow monit to start
Next, edit the file /etc/default/monit:
startup=1
CHECK_INTERVALS=180
Then edit /etc/monit/monitrc:
set alert email@domeniu.ro # a valid e-mail address to which notifications will be sent
include /etc/monit.d/*
Create the directory /etc/monit.d and, inside it, one configuration file for each service we are going to monitor.
mkdir /etc/monit.d
cd /etc/monit.d/
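Most of the per-service files share the same skeleton, so the basic part could be generated rather than typed. The sketch below is only an illustration: the helper name `monit_stanza` and its arguments are hypothetical, and only the directives it prints come from the configuration files in this post.

```shell
#!/bin/sh
# monit_stanza NAME PIDFILE INITSCRIPT: print a basic "check process" entry.
# Hypothetical helper; the printed directives mirror the examples in this post.
monit_stanza() {
    cat <<EOF
check process $1 with pidfile $2
  start program = "$3 start"
  stop program = "$3 stop"
  if 5 restarts within 5 cycles then timeout
EOF
}

# Example: write the entry for sshd into a file in the current directory.
# monit_stanza sshd /var/run/sshd.pid /etc/init.d/ssh > ssh
```

Service-specific tests (ports, CPU, memory) still have to be added by hand afterwards.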
Here are a few examples:
apache2:
check process apache2 with pidfile /var/run/apache2.pid
start program = "/etc/init.d/apache2 start"
stop program = "/etc/init.d/apache2 stop"
if cpu > 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 1.5 GB for 5 cycles then restart
if children > 50 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
group lamp
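To get a feel for what "for N cycles" means in wall-clock time, the small calculation below may help. It assumes that CHECK_INTERVALS from /etc/default/monit is the daemon's poll interval in seconds and that a condition has to be seen on N consecutive polls, so the figures are rough delays rather than exact ones. The function name is hypothetical.

```shell
#!/bin/sh
# Assumption: poll interval in seconds, as set via CHECK_INTERVALS earlier.
CHECK_INTERVALS=180

# cycles_to_seconds N: rough time covered by N consecutive polling cycles.
cycles_to_seconds() {
    echo $(( $1 * CHECK_INTERVALS ))
}

echo "cpu > 60% alert after about   $(cycles_to_seconds 2)s"   # 360
echo "cpu > 80% restart after about $(cycles_to_seconds 5)s"   # 900
echo "loadavg stop after about      $(cycles_to_seconds 8)s"   # 1440
```

With a 180-second interval, the "8 cycles" rule therefore reacts only after roughly 24 minutes of sustained load. A shorter interval makes every rule react faster.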
bind9:
check process named with pidfile /var/run/bind/run/named.pid
start program = "/etc/init.d/bind9 start"
stop program = "/etc/init.d/bind9 stop"
if failed host 127.0.0.1 port 53 type tcp protocol dns then alert
if failed host 127.0.0.1 port 53 type udp protocol dns then alert
if 5 restarts within 5 cycles then timeout
group dns
dovecot:
check process dovecot with pidfile /var/run/dovecot/master.pid
group mail
start program = "/etc/init.d/dovecot start"
stop program = "/etc/init.d/dovecot stop"
if 5 restarts within 5 cycles then timeout
if failed port 110 type TCP protocol POP then restart
if failed port 143 type TCP protocol IMAP then restart
mysql:
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
if cpu > 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 1.0 GB for 5 cycles then restart
if children > 50 then restart
if failed host 127.0.0.1 port 3306 protocol mysql then restart
if 5 restarts within 5 cycles then timeout
if loadavg(5min) greater than 10 for 8 cycles then stop
group lamp
postfix:
check process postfix with pidfile /var/spool/postfix/pid/master.pid
group mail
start program = "/etc/init.d/postfix start"
stop program = "/etc/init.d/postfix stop"
if failed port 25 protocol smtp then restart
if 5 restarts within 5 cycles then timeout
proftpd:
check process proftpd with pidfile /var/run/proftpd.pid
start program = "/etc/init.d/proftpd start"
stop program = "/etc/init.d/proftpd stop"
if failed port 21 protocol ftp then restart
if 5 restarts within 5 cycles then timeout
group lamp
ssh:
check process sshd with pidfile /var/run/sshd.pid
start program = "/etc/init.d/ssh start"
stop program = "/etc/init.d/ssh stop"
if failed port 22 protocol ssh then restart
if 5 restarts within 5 cycles then timeout
group ssh
webmin:
check process webmin with pidfile /var/webmin/miniserv.pid
group webmin
start program = "/etc/init.d/webmin start"
stop program = "/etc/init.d/webmin stop"
if failed host 127.0.0.1 port 8080 then restart
if 5 restarts within 5 cycles then timeout
check file webmin_rc with path /etc/init.d/webmin
group webmin
if failed checksum then unmonitor
if failed permission 755 then unmonitor
if failed uid root then unmonitor
if failed gid root then unmonitor
Finally, start the monitoring service:
/etc/init.d/monit start
Starting daemon monitor: monit.
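After any later change to the files in /etc/monit.d it may be worth checking the syntax before restarting the daemon. The sketch below wraps Monit's `-t` (syntax check) and `-c` (control file) options. The function name `verify_monit` is hypothetical, and the default path assumes the Debian/Ubuntu layout used in this post.

```shell
#!/bin/sh
# verify_monit [CONTROL_FILE]: syntax-check the Monit control file.
# Hypothetical wrapper; returns 127 when the monit binary cannot be found.
verify_monit() {
    if ! command -v monit >/dev/null 2>&1; then
        echo "monit is not installed" >&2
        return 127
    fi
    # -t parses the control file (and whatever it includes) without starting anything.
    monit -t -c "${1:-/etc/monit/monitrc}"
}
```

Run as root, `verify_monit && /etc/init.d/monit restart` would restart the daemon only when the configuration parses cleanly.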