I recently wrote a BB external script to test the speed of delivery. The hardest part of the scripting was dealing with arithmetic in ksh. I'm not sure the script is really appropriate for distribution because there are items that are unique to my situation.... but I'm quite willing to offer sections of the script to help others get started on their own tests....
To that end, here's a lengthy reply.....
Some basic ideas:
1) I wanted to fetch a known page, one that would not change and so would not cause the fetch times to change. I picked a page that was ~32K long and, with all the trimmings/decoration (jpgs, css, etc.), totalled ~55K. I copied all of these pieces to a new location in my web server tree.
2) I wanted to test the speed of the web service, not the speed of the network. Therefore, this test runs on the web server itself. It also connects on localhost (the loopback interface) to eliminate network contention. This also required an update to the Apache configuration to have it listen on 127.0.0.1:80 for server name localhost.
3) To get the page plus all its trimmings, I found the program wget was what I needed. Unfortunately, wget gets copies of the pages and writes to the local disk. With the right options, wget will NOT create the directory structure and WILL delete the files after the fetch. I did NOT find an option that would fetch all 55K w/o writing anything -- I looked because I would also like to eliminate disk write/delete times from the test, but no go so far.
4) I use timex to determine the transfer time. timex reports in ss.ss format, or mm:ss.ss when more than 59.99 seconds, or hh:mm:ss.ss when the time is an hour or more.
5) ksh does integer arithmetic. My ksh book says some versions of ksh may be extended to include a real type, but not mine. Hundredths of a second are a poor unit of measure, so I removed the decimal point and appended a 0 to get msec units.
6) I wanted to get an "average" time, so I fetch the test page three times. My performance is wildly sporadic (the whole reason to write this test) and I found that sometimes the page loaded in 0.13 seconds, sometimes in 4 minutes 27.13 seconds. When any given retrieval exceeds 60 seconds, I KNOW I have a problem, so I stop doing further fetches and report what was collected to that point.
6a) Yeah, you're right, given 0.13 as a real possibility, anything over 10 seconds probably indicates a REAL problem....
7) I had to pick values for yellow and red, so yellow is 7 seconds and red is 15 seconds for the average time (or 60 seconds for any individual time). Just wild guesses...
8) I am NOT a ksh whiz -- so I often use brute force where a whiz might see something elegant. I'm willing to learn, but I'm not very concerned. What I do works... ;-)
Script Specifics:
I set up some variables:
    OFile="/tmp/wget.$$"        # a place I can write, then delete. PID should make name unique
    Host=localhost              # or $MACHINEDOTS as you wish
    URL=""                      # the address of the test page
    Test="resptime"             # use a shorter name to make BB display nicer
    Yellow=7000                 # time in milliseconds
    Red=15000
    WGET="/usr/opt/bin/wget"
I wrote a small function to perform the actual test:
    dotimex () {
        timex $WGET -q -p -nd --delete-after -C off $URL 2> $OFile
        Time=`awk ' /real/ { print $2 } ' $OFile | tr -d . `0
        fixtime
        rm $OFile
    }
The wget options say to suppress wget's statistics, fetch page requisites (the css and other page pieces), do not create directories, delete files after fetching (won't delete directories, only files, thus the -nd), and tell the server not to use cached files.
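For readers who find the short flags cryptic, the same fetch can be written with wget's long option names. This is only a sketch: fetch_page is a hypothetical helper name, not part of the original script, and --no-cache is the spelling used by newer wget releases, whereas the older wget in this post takes -C off.

```shell
WGET=${WGET:-wget}      # path to wget; the post uses /usr/opt/bin/wget

# fetch_page is a hypothetical helper name, not part of the original script.
fetch_page () {
    # --quiet            same as -q:  no progress or statistics output
    # --page-requisites  same as -p:  also fetch css, images, etc.
    # --no-directories   same as -nd: do not recreate the remote tree locally
    # --delete-after     remove each file once it has been fetched
    # --no-cache         ask the server for fresh copies (older wget: -C off)
    "$WGET" --quiet --page-requisites --no-directories \
            --delete-after --no-cache "$1"
}
```

Calling fetch_page "$URL" in place of the bare wget command line inside dotimex would leave the timing logic unchanged.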
The fixtime function finds the mm:ss.ss and hh:mm:ss.ss formats and converts them to msec. Note in the "Time=" statement that I remove the decimal point and concatenate a zero, so the seconds field is already in msec. Here's fixtime:
    fixtime () {
        echo $Time | grep : > /dev/null
        if [[ $? -eq 0 ]] ; then
            # there might be one, two or three fields
            P[0]=`echo $Time | cut -d ":" -f1`
            P[1]=`echo $Time | cut -d ":" -f2`
            P[2]=`echo $Time | cut -d ":" -f3`
            if [[ "X${P[1]}" = "X" ]] ; then
                # if the second field is empty, there is only one field
                :    # place holder; ":" is the shell's no-op, much like a FORTRAN "continue"
            elif [[ "X${P[2]}" = "X" ]] ; then
                # if the third field is empty, there are only two fields
                let p1=${P[0]}*60*1000
                let p2=${P[1]}
                let Time=$p1+$p2
            else
                # Three fields, use them all
                let p1=${P[0]}*60*60*1000
                let p2=${P[1]}*60*1000
                let p3=${P[2]}
                let Time=$p1+$p2+$p3
            fi
        fi
    }
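Since awk is already in the pipeline, the whole conversion could instead be left to it: awk does floating-point arithmetic, so the integer-only limit of ksh stops mattering. The following is a minimal sketch under that assumption. to_msec is a hypothetical name, and it takes the raw timex value (decimal point still in place) rather than the pre-processed $Time used above.

```shell
# to_msec: hypothetical helper, not part of the original script.
# Converts ss.ss, mm:ss.ss or hh:mm:ss.ss to whole milliseconds.
to_msec () {
    echo "$1" | awk -F: '{
        secs = 0
        # Fold the fields left to right: each step promotes the running
        # total by a factor of 60 (hours -> minutes -> seconds).
        for (i = 1; i <= NF; i++) secs = secs * 60 + $i
        # Add 0.5 before truncating so float noise cannot lose a msec.
        printf "%d\n", secs * 1000 + 0.5
    }'
}

to_msec 0.13         # prints 130
to_msec 4:27.13      # prints 267130
to_msec 1:02:03.45   # prints 3723450
```

Because the result comes out of printf as a plain decimal integer, it carries no leading zeros into later arithmetic.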
At this point, the value in "Time" is in milliseconds. So I invoke the test, like so:
    dotimex
    Time1=$Time
    Time1S=`msec_to_sec $Time1`
    if [[ $Time1 -ge 60000 ]] ; then
        Status=red
        FLine="
    Tested host $Host
    The first fetch took more than 60 seconds.
    Time1 was $Time1S sec.
    "
        echo $BB $BBDISP "status $MACHINE.$Test $Status `date` $FLine"
        exit
    fi
OK, so some folks complained they didn't like reading msec. One actually supplied a function to put the decimal point back in the right place. That's the msec_to_sec function:
    function msec_to_sec {
        num=$1
        if [[ $num -lt 1000 ]]; then
            num="0000$num"
        fi
        if [[ $1 != *.* ]]; then
            num="$num."
        fi
        echo "$num" | sed 's/\([0-9][0-9][0-9]\)\./.\1/; s/^00*//; s/^\./0./'
    }
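The same conversion can also be done with integer division and remainder instead of sed. This is a minimal sketch, assuming a shell with $(( )) arithmetic and a printf command. msec_to_sec_int is a hypothetical name, and leading zeros are stripped first because some shells read a leading 0 as octal.

```shell
# msec_to_sec_int: hypothetical alternative to msec_to_sec above.
msec_to_sec_int () {
    ms=${1#"${1%%[!0]*}"}     # drop leading zeros ("0130" -> "130")
    ms=${ms:-0}               # an all-zero input becomes plain 0
    # whole seconds, a dot, then the remainder padded to three digits
    printf '%d.%03d\n' $((ms / 1000)) $((ms % 1000))
}

msec_to_sec_int 0130      # prints 0.130
msec_to_sec_int 267130    # prints 267.130
```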
Wherever I am going to display the time value, I use the same variable name with an "S" appended, then write this second value.
And, from my use of $BBDISP, $MACHINEDOTS, etc, it is assumed that the full BB environment is defined, as it is when the script is launched by BB.
So I do the "dotimex" sequence three times, accumulating Time1, Time2, and Time3. Each time, the amount of data returned in FLine gets a little longer.
    let TOTTime=$Time1+$Time2+$Time3
    TOTTimeS=`msec_to_sec $TOTTime`
    let AVTime=$TOTTime/3
    AVTimeS=`msec_to_sec $AVTime`
    if [[ $AVTime -lt $Yellow ]] ; then
        Status=green
    elif [[ $AVTime -lt $Red ]] ; then
        Status=yellow
    else
        Status=red
    fi
    FLine="
    Tested host $Host
    Yellow and Red limits are $YellowS and $RedS
    The three times are $Time1S sec $Time2S sec $Time3S sec
    The average time is $AVTimeS sec
    The total time is $TOTTimeS sec
    "
    echo $BB $BBDISP "status $MACHINE.$Test $Status `date` $FLine"
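The three-fetch sequence, the early stop and the color choice could also be folded into a loop. The sketch below is one possible arrangement, not the script as written: run_fetches and pick_color are hypothetical names, the dotimex defined here is only a stand-in that returns a canned value so the fragment runs by itself, and $Time is assumed to be a plain decimal number without leading zeros.

```shell
Yellow=7000      # msec, from the post
Red=15000        # msec, from the post
Limit=60000      # any single fetch this slow stops the test

# Stand-in for the real dotimex; it only sets a canned $Time.
dotimex () { Time=130; }

# Fetch up to three times; stop early once a fetch reaches $Limit.
run_fetches () {
    Total=0
    Count=0
    while [ "$Count" -lt 3 ]; do
        dotimex
        Count=$((Count + 1))
        Total=$((Total + Time))
        [ "$Time" -ge "$Limit" ] && return 1
    done
    return 0
}

# Map an average (msec) to a BB color using the two thresholds.
pick_color () {
    if [ "$1" -lt "$Yellow" ]; then echo green
    elif [ "$1" -lt "$Red" ]; then echo yellow
    else echo red
    fi
}

if run_fetches; then
    Status=$(pick_color $((Total / Count)))
else
    Status=red       # a single fetch reached the limit
fi
echo "$Status"
```

Dividing by $Count rather than a literal 3 keeps the average meaningful if the number of fetches is ever changed.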
So that's the essence of my test.
What is MISSING is error capture and recovery: What happens when wget cannot contact the web server (as in I halted apache)? What happens when Apache gives wget a 404 or 403 or 500 error? Well, that's why this script is not ready for the real world!
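One possible starting point for that hardening is to look at wget's exit status, which is non-zero when a fetch fails (the exact codes differ between wget versions). This is a minimal sketch under that assumption. fetch_ok and FetchErr are hypothetical names, and whether timex passes the wrapped command's exit status through is something to verify on your own system.

```shell
WGET=${WGET:-wget}      # path to wget; the post uses /usr/opt/bin/wget

# fetch_ok: hypothetical helper. Returns 0 when wget reports success,
# 1 otherwise, and leaves a short message in $FetchErr.
fetch_ok () {
    "$WGET" -q -p -nd --delete-after "$1" > /dev/null 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        FetchErr="wget exited with status $rc while fetching $1"
        return 1
    fi
    FetchErr=""
    return 0
}
```

A caller could run fetch_ok "$URL" once before the timed fetches and, on failure, send a red status carrying $FetchErr instead of a meaningless time.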
In my case, other BB tests are likely indicating the reason for these failures, so it is not worth my time hardening this script for my use.
Your mileage may vary.
Good luck, Dan
Henrik Storner <henrik-bb@hswn.dk>
Sent by: owner-bb@bb4.com
01/20/2005 12:38 AM
To: bb@bb4.com
cc:
Subject: Re: {bb} response time monitoring