BB Unix Network Monitor - Message


Re: {bb} response time monitoring




Hi Hiroaki,

I recently wrote a BB external script to test the speed of page delivery.  The hardest part of the scripting was dealing with arithmetic in ksh.  I'm not sure the script is really appropriate for distribution because some items are unique to my situation, but I'm quite willing to offer sections of the script to help others get started on their own tests.

To that end, here's a lengthy reply.....

Some basic ideas:  

1)  I wanted to fetch a known page, one that would not change and so would not change the fetch times.  I picked a page that was ~32K long and, with all the trimmings/decoration (jpgs, css, etc.), totalled ~55K.  I copied all of these pieces to a new location in my web server tree.

2)  I wanted to test the speed of the web service, not the speed of the network.  Therefore, this test runs on the web server itself.  It also connects on localhost (the loopback interface) to eliminate network contention.  This also required an update to the Apache configuration to have it listen on 127.0.0.1:80 for server name localhost.

3)  To get the page plus all its trimmings, I found the program wget was what I needed.  Unfortunately, wget gets copies of the pages and writes to the local disk.  With the right options, wget will NOT create the directory structure and WILL delete the files after the fetch.  I did NOT find an option that would fetch all 55K w/o writing anything -- I looked because I would also like to eliminate disk write/delete times from the test, but no go so far.

4)  I use timex to determine the transfer time.  timex reports in ss.ss format, or mm:ss.ss when more than 59.99 seconds, or hh:mm:ss.ss when the time is an hour or more.

5)  ksh does integer arithmetic.  My ksh book says some versions of ksh may be extended to include a real type, but not mine.  Hundredths of a second are a poor unit of measure, so I removed the decimal point and appended a 0 to get msec units.

6)  I wanted to get an "average" time, so I fetch the test page three times.  My performance is wildly sporadic (the whole reason to write this test) and I found that sometimes the page loaded in 0.13 seconds, sometimes in 4 minutes 27.13 seconds.  When any given retrieval exceeds 60 seconds, I KNOW I have a problem, so I stop doing further fetches and report what was collected to that point.

6a)  Yeah, you're right, given 0.13 as a real possibility, anything over 10 seconds probably indicates a REAL problem....

7)  I had to pick values for yellow and red, so yellow is 7 seconds, red is 15 (for the average time or 60 seconds for any individual time).  Just wild guesses...

8)  I am NOT a ksh whiz -- so I often use brute force where a whiz might see something elegant.  I'm willing to learn, but I'm not very concerned.  What I do works...  ;-)
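
A minimal sketch of the unit trick in item 5, assuming the time always carries exactly two decimal places as described in item 4.  The function name secs_to_msec is made up for illustration.  It also strips leading zeros, since some shells treat a number with a leading 0 as octal in arithmetic:

```shell
# Hypothetical helper: turn "ss.ss" into an integer number of msec.
# Assumes exactly two digits after the decimal point.
secs_to_msec () {
    # drop the decimal point, append a 0 (hundredths -> msec),
    # then strip leading zeros but keep at least one digit
    echo "$1" | tr -d . | sed 's/$/0/; s/^0*\([0-9]\)/\1/'
}

secs_to_msec 0.13     # prints 130
secs_to_msec 27.13    # prints 27130
```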

Script Specifics:

I set up some variables:

OFile="/tmp/wget.$$"        # a place I can write, then delete.  PID should make name unique
Host=localhost                # or $MACHINEDOTS as you wish
URL=""        # the address of the test page
Test="resptime"                # use a shorter name to make BB display nicer
Yellow=7000                        # time in milliseconds
Red=15000
WGET="/usr/opt/bin/wget"

I wrote a small function to perform the actual test:

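dotimex () {
    # timex writes its report to stderr, so capture that in $OFile;
    # -q keeps wget itself quiet
    timex $WGET -q -p -nd --delete-after -C off $URL 2> $OFile
    # take the "real" value, drop the decimal point, append 0: hundredths become msec
    Time=`awk ' /real/ { print $2 } ' $OFile | tr -d . `0
    fixtime                 # fold any mm: or hh:mm: prefix into msec
    rm -f $OFile
}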

The wget options say to suppress wget's statistics, fetch page requisites (the css, and other page pieces), do not create directories, delete files after fetching (won't delete directories, only files, thus the -nd), and tell server not to use cached files.

The fixtime function finds the mm:ss.ss and hh:mm:ss.ss formats and converts them to msec.  Note in the "Time=" statement that I remove the decimal point and concatenate a zero, so the second part of the data are already msec.  Here's fixtime:

fixtime () {
echo $Time | grep : > /dev/null
if [[ $? -eq 0 ]] ; then                # there might be one two or three fields
  P[0]=`echo $Time | cut -d ":" -f1`
  P[1]=`echo $Time | cut -d ":" -f2`
  P[2]=`echo $Time | cut -d ":" -f3`
  if [[ "X${P[1]}" = "X" ]] ; then        # if the second field is empty, there is only one field
    :                                        # no-op; ":" is the shell's do-nothing command
  elif [[ "X${P[2]}" = "X" ]] ; then        # if the third field is empty, there are only two fields
    let p1=${P[0]}*60*1000
    let p2=${P[1]}
    let Time=$p1+$p2
  else                                        # Three fields, use them all
    let p1=${P[0]}*60*60*1000
    let p2=${P[1]}*60*1000
    let p3=${P[2]}
    let Time=$p1+$p2+$p3
  fi
fi
}
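
For comparison, the same conversion can be sketched by splitting on the colons with IFS instead of calling cut three times.  This is only an illustration under assumptions: the names nozero and colons_to_msec are invented here, the last field is taken to be msec already (as in the "Time=" statement), and the shell is assumed to support $(( )) arithmetic.

```shell
# Strip leading zeros so a field such as "07130" is not read as octal
# by shells that honor a leading 0 in arithmetic.
nozero () {
    echo "$1" | sed 's/^0*\([0-9]\)/\1/'
}

# Convert "msec", "mm:msec" or "hh:mm:msec" to total milliseconds.
colons_to_msec () {
    oldifs=$IFS
    IFS=:
    set -- $1                 # split the argument on colons
    IFS=$oldifs
    case $# in
        1) h=0  ; m=0  ; s=$1 ;;
        2) h=0  ; m=$1 ; s=$2 ;;
        3) h=$1 ; m=$2 ; s=$3 ;;
        *) return 1 ;;        # unexpected number of fields
    esac
    h=`nozero $h` ; m=`nozero $m` ; s=`nozero $s`
    echo $(( ($h * 60 + $m) * 60000 + $s ))
}

colons_to_msec 4:27130        # 4 min 27.13 sec
```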

At this point, the value in "Time" is in milliseconds.  So I invoke the test, like so:

dotimex
Time1=$Time
Time1S=`msec_to_sec $Time1`
if [[ $Time1 -ge 60000 ]] ; then
  Status=red
  FLine="
Tested host $Host

The first fetch took more than 60 seconds.
Time1 was $Time1S sec.
"
  $BB $BBDISP "status $MACHINE.$Test $Status `date` $FLine"
  exit
fi

OK, so some folks complained they didn't like reading msec.  One actually supplied a function to put the decimal point back in the right place.  That's the msec_to_sec function:

function msec_to_sec {
    num=$1
    if [[ $num -lt 1000 ]]; then
        num="0000$num"
    fi
    if [[ $1 != *.* ]]; then
        num="$num."
    fi
    echo "$num" | sed 's/\([0-9][0-9][0-9]\)\./.\1/; s/^00*//; s/^\./0./'
}
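
Where printf and $(( )) arithmetic are available, the same formatting can be sketched with integer division and remainder.  The name msec_to_sec_alt is hypothetical, and the sketch assumes its argument is a plain non-negative integer without leading zeros:

```shell
# Hypothetical alternative: whole seconds from division, msec from the
# remainder, zero-padded to three digits.
msec_to_sec_alt () {
    printf '%d.%03d\n' $(( $1 / 1000 )) $(( $1 % 1000 ))
}

msec_to_sec_alt 130       # prints 0.130
msec_to_sec_alt 267130    # prints 267.130
```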

Wherever I am going to display the time value, I use the same variable name with an "S" appended, then write this second value.

And, from my use of $BBDISP, $MACHINEDOTS, etc, it is assumed that the full BB environment is defined, as it is when the script is launched by BB.

So I do the "dotimex" sequence three times, accumulating Time1, Time2, and Time3.  Each time, the amount of data returned in FLine gets a little longer.
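
The repeat-and-stop-early pattern can be sketched as a loop.  This is an illustration only: collect_samples, fake_fast and fake_slow are invented names, and the sampling command is assumed to print one elapsed time in msec (a stand-in for a dotimex variant that echoes $Time).

```shell
# Hypothetical driver: take up to three samples and stop as soon as
# any single sample reaches the 60-second limit.
collect_samples () {
    limit=60000
    samples=""
    for n in 1 2 3 ; do
        t=`"$@"`                      # run the sampling command
        samples="$samples $t"
        if [ "$t" -ge "$limit" ] ; then
            break                     # known problem, no more fetches
        fi
    done
    echo $samples
}

# Stand-ins for real measurements, for demonstration only
fake_fast () { echo 130 ; }
fake_slow () { echo 267130 ; }

collect_samples fake_fast     # prints 130 130 130
collect_samples fake_slow     # prints 267130
```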

let TOTTime=$Time1+$Time2+$Time3
TOTTimeS=`msec_to_sec $TOTTime`
let AVTime=$TOTTime/3
AVTimeS=`msec_to_sec $AVTime`
YellowS=`msec_to_sec $Yellow`
RedS=`msec_to_sec $Red`
if  [[ $AVTime -lt $Yellow ]] ; then
  Status=green
elif [[ $AVTime -lt $Red ]] ; then
  Status=yellow
else
  Status=red
fi
FLine="
Tested host $Host

Yellow and Red limits are $YellowS and $RedS
The three times are $Time1S sec $Time2S sec $Time3S sec
The average time is $AVTimeS sec
The total time is $TOTTimeS sec
"
$BB $BBDISP "status $MACHINE.$Test $Status `date` $FLine"
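
The averaging and color choice can also be sketched as two small functions, which makes it easier to average over however many samples were actually collected.  The names average_msec and pick_color are made up for illustration; the sketch assumes at least one sample, integer msec values without leading zeros, and a shell with $(( )) arithmetic.

```shell
# Hypothetical helper: integer average of all arguments (msec).
average_msec () {
    total=0
    for t in "$@" ; do
        total=$(( $total + $t ))
    done
    echo $(( $total / $# ))
}

# Hypothetical helper: args are average, yellow limit, red limit.
pick_color () {
    if [ "$1" -lt "$2" ] ; then
        echo green
    elif [ "$1" -lt "$3" ] ; then
        echo yellow
    else
        echo red
    fi
}

avg=`average_msec 130 140 150`
pick_color $avg 7000 15000    # prints green
```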

So that's the essence of my test.  

What is MISSING is error capture and recovery:  What happens when wget cannot contact the web server (as in I halted apache)?  What happens when Apache gives wget a 404 or 403 or 500 error?  Well, that's why this script is not ready for the real world!
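
One possible starting point for that missing error capture, sketched under assumptions: the fetch command is assumed to exit nonzero when it fails, as wget generally does when it cannot connect.  Whether a given wget version also exits nonzero for HTTP errors such as 404, and whether timex passes the exit status through, would need checking.  The name check_fetch is hypothetical.

```shell
# Hypothetical helper: run any fetch command and map its exit status
# to a BB color.  Output of the command itself is discarded.
check_fetch () {
    if "$@" > /dev/null 2>&1 ; then
        echo green
    else
        echo red
    fi
}

# "true" and "false" stand in for a succeeding and a failing fetch
check_fetch true      # prints green
check_fetch false     # prints red
```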

In my case, other BB tests are likely indicating the reason for these failures, so it is not worth my time hardening this script for my use.

Your mileage may vary.

Good luck,   Dan




Henrik Storner <henrik-bb@hswn.dk>
Sent by: owner-bb@bb4.com

01/20/2005 12:38 AM
Please respond to bb

       
        To:        bb@bb4.com
        cc:        
        Subject:        Re: {bb} response time monitoring



In <BHEBIPILGCPGEPFDGELJKECICEAA.hmorishita@sis.seino.co.jp> "Hiroaki  Morishita" <hmorishita@sis.seino.co.jp> writes:

>I'm running bb1.9e and bbgen3.2 on Redhat EL3.

>I'd like to monitor response time of services.
>Is there any way to make a status color different
>if it takes longer than, for example, 5 seconds to get a http response?

No, not currently. BB (and bbgen) only checks if the service
responds - it doesn't look at the response-time.

It would be a fairly simple thing to do, though, so I might
add it in a future version of bbgen (now called "Hobbit").


Regards,
Henrik

