Re: {bb} Problems monitoring SMTP Service



On Wed, 2005-08-31 at 07:54, Dirk H. Schulz wrote:
> Hi Philip,
> 
> One example from this night:
> 
> Aug 31 00:52:08 smtpmachine postfix/smtpd[17060]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:52:08 smtpmachine postfix/smtpd[17060]: disconnect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:52:26 smtpmachine postfix/smtpd[17060]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:52:26 smtpmachine postfix/smtpd[17060]: disconnect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:52:40 smtpmachine postfix/smtpd[17060]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:52:40 smtpmachine postfix/smtpd[17060]: disconnect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:56:48 smtpmachine postfix/smtpd[17208]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:56:48 smtpmachine postfix/smtpd[17208]: disconnect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:56:51 smtpmachine postfix/smtpd[17208]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:56:51 smtpmachine postfix/smtpd[17208]: disconnect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:56:53 smtpmachine postfix/smtpd[17208]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> Aug 31 00:56:53 smtpmachine postfix/smtpd[17208]: disconnect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
> 
> 
> At 00:52:07 BBDISPLAY claims smtpmachine's smtp service to be down for 
> 0:04:45. The times should be synchronized quite well since they all 
> synchronize daily against the same ntp server.

It appears that your BBNET is actually testing 3 times each BBSLEEP
period. Is this what you intended? If not, do you have this SMTP
server listed more than once in your bb-hosts file?
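
To show how I read "3 times" out of your log, here is a rough Python
sketch (not part of BB) that groups the connects from the BBNET host
into bursts. The sample text is the connect lines from your excerpt,
unwrapped; the 60-second gap used to separate one test cycle from the
next is my own assumption, as are the function names.

```python
import re
from datetime import datetime, timedelta

# Connect lines from the quoted log excerpt, one entry per line.
LOG = """\
Aug 31 00:52:08 smtpmachine postfix/smtpd[17060]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
Aug 31 00:52:26 smtpmachine postfix/smtpd[17060]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
Aug 31 00:52:40 smtpmachine postfix/smtpd[17060]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
Aug 31 00:56:48 smtpmachine postfix/smtpd[17208]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
Aug 31 00:56:51 smtpmachine postfix/smtpd[17208]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
Aug 31 00:56:53 smtpmachine postfix/smtpd[17208]: connect from bbnet.domain.tld[xxx.xxx.xxx.xxx]
"""

CONNECT = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ postfix/smtpd\[\d+\]: connect from (\S+)"
)


def connect_times(log_text, client_prefix):
    """Return the timestamps of smtpd connects from clients matching the prefix."""
    times = []
    for line in log_text.splitlines():
        m = CONNECT.match(line)
        if m and m.group(2).startswith(client_prefix):
            # Syslog lines carry no year; a fixed one is fine for computing gaps.
            times.append(datetime.strptime("2005 " + m.group(1), "%Y %b %d %H:%M:%S"))
    return times


def group_bursts(times, max_gap=timedelta(seconds=60)):
    """Group timestamps into bursts; a gap longer than max_gap starts a new burst."""
    bursts = []
    for t in sorted(times):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


bursts = group_bursts(connect_times(LOG, "bbnet."))
print([len(b) for b in bursts])  # -> [3, 3]
```

Two bursts of three connects each, roughly four minutes apart, is
what makes me suspect three tests per cycle.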

If you want to display the machine on multiple HTML pages, only the
first entry in your bb-hosts file should specify the tests that you
want. The second (and subsequent) lines for this host are only
place-holders and should only have the "noconn" directive.
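
If your bb-hosts file is long, a quick check along these lines might
help spot the problem. It is only a sketch: it assumes the usual
"IP hostname # directives" line layout, the sample addresses and the
second host are made up, and repeated_tests is a hypothetical helper,
not a BB tool.

```python
SAMPLE_BB_HOSTS = """\
192.0.2.10 smtpmachine.domain.tld # smtp ssh
192.0.2.20 www.domain.tld # http://www.domain.tld/
# entries repeated for a second HTML page
192.0.2.10 smtpmachine.domain.tld # smtp
192.0.2.20 www.domain.tld # noconn
"""


def repeated_tests(bb_hosts_text):
    """Return hosts that specify tests on more than one bb-hosts line.

    Lines whose only directive is "noconn" count as place-holders.
    """
    seen = {}
    for line in bb_hosts_text.splitlines():
        line = line.strip()
        # Host entries start with an IP address; skip comments and anything else.
        if not line or not line[0].isdigit():
            continue
        left, _, right = line.partition("#")
        fields = left.split()
        if len(fields) < 2:
            continue
        host = fields[1]
        tests = [d for d in right.split() if d != "noconn"]
        seen.setdefault(host, []).append(tests)
    return {
        host: entries
        for host, entries in seen.items()
        if sum(1 for tests in entries if tests) > 1
    }


print(sorted(repeated_tests(SAMPLE_BB_HOSTS)))  # -> ['smtpmachine.domain.tld']
```

In the sample, www.domain.tld is fine because its second line is a
"noconn" place-holder, while smtpmachine.domain.tld would be tested
from both lines.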

> It is "the internet", simply. It is two different colocation centers
> with very different upstream providers.

Because of this, you're actually "testing the internet" at the
same time. If you are doing this to receive warning of a problem,
you might want to alter your paging rules so that you only get an
alert after two or three "down" reports. If you're trying to
generate "availability" type reporting, I doubt that it is
possible to get accurate statistics without either testing
from multiple locations or testing at the co-location centre.
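
The "two or three down reports" idea amounts to the logic below. This
is a Python illustration of the principle only, not BB's paging-rule
syntax; the function name and the default threshold of 3 are my own
choices.

```python
def alert_points(statuses, threshold=3):
    """Return the indexes at which an alert would fire.

    statuses is a sequence of BB colours, oldest first. An alert fires
    once when a run of consecutive "red" reports reaches the threshold;
    any other colour resets the run.
    """
    fired = []
    streak = 0
    for i, status in enumerate(statuses):
        streak = streak + 1 if status == "red" else 0
        if streak == threshold:
            fired.append(i)
    return fired


history = ["green", "red", "green", "red", "red", "red", "red", "green"]
print(alert_points(history))  # -> [5]
```

The single "red" at index 1 is ignored as a blip; only the sustained
outage starting at index 3 produces an alert, on its third report.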

> Yes, on smtpmachine ssh is tested as well. It also has regular problems, 
> but not that much. With smtp they occur every 20 to 120 minutes, with 
> ssh every few days. And the ssh service does not get a red "down" dot, 
> but a black "unavailable".

This suggests to me that it's likely to be caused by delays on the
internet exceeding the BB time-out values. The default values work
well for tests on local networks, but you may need to increase the
values of the BBNETTIMER variables in bbdef-server.sh. If you also
have a large number of tests on local networks, it might be better
to create a secondary BBNET (with longer time-out values) just
for testing remote hosts.
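
Before changing any time-outs, it may be worth measuring how long the
SMTP greeting really takes from the BBNET host. Here is a small Python
sketch for that; it is independent of BB, and the function name and
the 30-second default are assumptions of mine.

```python
import socket
import time


def smtp_banner_time(host, port=25, timeout=30.0):
    """Connect to host:port and wait for the first chunk of the greeting.

    Returns (elapsed_seconds, banner_text). Raises an OSError subclass
    if the connect or the read takes longer than timeout seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        banner = sock.recv(512).decode("ascii", errors="replace").strip()
    return time.monotonic() - start, banner


# Example use, run from the BBNET host (hostname as in the quoted log):
#   elapsed, banner = smtp_banner_time("smtpmachine.domain.tld")
#   print(f"{elapsed:.2f}s {banner}")
```

Run it from cron for a day or so and compare the slowest results with
the time-out BB is currently using; that should show whether slow
responses can explain the red reports.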

Cheers, Phil.



-- 
Vail's Second Axiom: The amount of work to be done increases in
proportion to the amount of work already completed.
