BB Unix Network Monitor - Message

Re: {bb} problem monitoring kpop after upgrade, is buggy bbnet



On Sun, 26 Sep 1999, Robert-Andre Croteau wrote:
> Henrik Olsen wrote:
> > 
> > On Sat, 25 Sep 1999, Tracy J. Di Marco White wrote:
> > > Henrik Olsen wrote:
> > > }Not knowing kpop, I can't tell exactly why the delay is so long, but for
> > > }other services it's often a failing reverse dns lookup or a failing
> > > }auth/ident request, either of which might take a timeout before the
> > > }service responds.
> > Actually, my comments on the different timeouts between 1.09b and 1.2b
> > were wrong, I was apparently looking at the 1.08a source instead of
> > 1.09b:(
> 
> Well, the 1.09b and 1.2b versions I have are fairly different :(
> 
> > 
> > There have been changes to the timeout handling, but both versions use
> > the 3, 5, 12 approach; the difference between them is that 1.2b has a
> > bug in handling timeouts when not receiving any data. :)
> > 
> > Sean/Rob, it looks like Tracy found a bug in bbnet, this patch will fix
> > the problem:
> 
> I'm not sure but I'm open to all comments...
> 
> I've looked at the previous messages in this thread and something caught
> my eye:
> 
> I'm not sure if that's it:
> # time /local2/bb-1.09b/bin/bbnet mailhub:1109
> 0.013u 0.016s 0:03.14 0.6% 0+1k 0+0io 0pf+0w
>               ^^^^^^^
> # time /local2/bb-1.2b/bin/bbnet mailhub:1109
> bbnet: TIMEOUT mailhub PORT 1109...
> 0.003u 0.018s 0:20.04 0.0% 0+2k 0+0io 0pf+0w
>               ^^^^^^^
> 
> Is this elapsed ?
> 
> If so, then it makes some sense,
> 
> 1.09b would only go thru a 3seconds delay and quit while the 1.2b
> would go thru the 3, 5 and 12 seconds delay for 20 seconds.
> 
> Now why would the 1.09b take only one 3 second delay ?  hummmm, let's
> see:

It took a 3 second delay because it did:
        signal(SIGALRM, (void *)noinput);
        alarm(3);
to unconditionally exit(0) after a 3 second timeout.
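
A minimal sketch of that 1.09b-style behaviour, in case it helps to see it
in isolation. Only the handler name noinput, the signal()/alarm() pair and
the exit status 0 come from the source; the helper exit_status_after_alarm()
and the fork()/pause() scaffolding are made up for illustration, with pause()
standing in for a recv() that never returns. The sketch calls _exit() rather
than exit() because _exit() is async-signal-safe.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler in the style of 1.09b's noinput(): give up and report success. */
static void noinput(int sig)
{
    (void)sig;
    _exit(0);               /* the original called exit(0) here */
}

/*
 * Hypothetical helper: fork a child that arms a single alarm and then
 * blocks forever, the way bbnet blocks when the server sends nothing.
 * Returns the child's exit status, or -1 on error.
 */
int exit_status_after_alarm(unsigned seconds)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGALRM, noinput);
        alarm(seconds);
        for (;;)
            pause();        /* stand-in for a recv() that never returns */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a 3 second alarm the child is gone after roughly 3 seconds with status
0, which is consistent with the 0:03.14 elapsed time shown above.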

> 
> in bbnet.c on 1.09b, the variable cnt is defined in timeout() as
> 
> static int cnt;
> 
> Maybe it wasn't initialized with 0 upon execution of the code.  So if
> cnt > 2 at program startup (compiler SHOULD have initialized to 0 but
> there might also be a bug in the compiler ...) then in the 1.09b version
> the timeout() function would only be called once:
> 
> 1.09b bbnet.c
> 
> timeout()
> {
>         static int cnt;
>         ^^^^^^^^^^^^^^^  cnt should be 0 at startup
> 
>         cnt++;
>         ^^^^^^  if cnt is initialized with > 2
>         if (cnt == 1) timer = 5;
>         else if(cnt == 2) timer = 12;
>         else {
>         ^^^^^^^^^^^ then it would jump to this else immediately and
>         ^^^^^^^^^^^  report an elapsed time of 3 seconds...  cause 
>         ^^^^^^^^^^^  timer 5 & 12 were skipped
>                 fprintf(stderr, "bbnet: TIMEOUT %s PORT %d...\n",
>                         machine, port);
>                 exit(3);
>         }
>         longjmp(env,1);
> }
> 
> > 245c245
> > <       if (setjmp(env) != 0) {
> > ---
> > >       if (setjmp(env) != 0) { /* RETURN OK BUT INCLUDE MESSAGE */
> > 247,248c247,248
> > <                 debug("SETTING TIMER TO %d\n", timer);
> > <               if (timer == 99) return(99);
> > ---
> > >               printf("*** bbnet: Stop waiting for server data\n");
> > >               return(0);              /* PRETEND IT'S OK */
> 
> That's my take on it.
nononononono:)
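
As an aside, the uninitialized-counter theory can be ruled out on language
grounds: C guarantees that an object with static storage duration and no
explicit initializer starts out as zero, so "static int cnt;" and
"static int cnt = 0;" mean the same thing. A small sketch (the function
name bump() is made up for illustration, it is not from bbnet.c):

```c
/*
 * A function-local static with no initializer starts at 0,
 * so the first call returns 1, the second 2, and so on.
 */
int bump(void)
{
    static int cnt;     /* implicitly zero-initialized */

    cnt++;
    return cnt;
}
```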

The bug is that when no data is returned, the timeout lands at the setjmp
called BEFORE the first call to recv, not at the one set inside the loop:
it's recv that isn't returning, so the [sig]setjmp inside the loop is
never reached.

To say it slightly differently (and to put back the 20 second timeout
you seem to want), change
#ifdef SIGSETJMP
        if (sigsetjmp(env,1) != 0) {
#else
        if (setjmp(env) != 0) {
#endif
                debug("SETTING TIMER TO %d\n", timer);
                if (timer == 99) return(99);
        }
        signal(SIGALRM, (void *)timeout);
        debug("PAUSE FOR RETURN\n");
        alarm(timer);

        while ( (n = recv(sockfd, line, MAXLINE, 0)) > 0 ) {

to:

#ifdef SIGSETJMP
        if (sigsetjmp(env,1) != 0) {
#else
        if (setjmp(env) != 0) {
#endif
                debug("SETTING TIMER TO %d\n", timer);
                if (timer == 99) {
                        printf("*** bbnet: Stop waiting for server data\n");
                        return(0);              /* PRETEND IT'S OK */
                }
        }
        signal(SIGALRM, (void *)timeout);
        debug("PAUSE FOR RETURN\n");
        alarm(timer);

        while ( (n = recv(sockfd, line, MAXLINE, 0)) > 0 ) {

Because that is where the SIGALRM will take you, NOT to the one set inside
the while(recv()) loop.

> 
> I think the bug was in 1.09b returning too quickly and that
> version 1.2b behaves correctly but I could be wrong ;)
> 
> Tracy,  in the 1.09b code can you change
> 
> static int cnt;
> 
> 	to
> 
> static int cnt = 0;
> 
> recompile and try the bbnet test again, let us know the result
> You could even have a printf("\ncnt = %d\n",cnt); as the
> first statement in the timeout function just to make sure
> what the value of cnt is... maybe do it before the = 0 change
> to confirm the bug on 1.09b (well, sometimes printf will mask
> bugs also so I wouldn't rely on it...)
> 
> bye
> -- 
> Robert-Andre Croteau	BSD,MOTU		robert@unix.sh
> Services Conseils Informatiques MOTU Inc. 	robert@motu.ca
> (514) 465-3057					rcroteau@videotron.ca
> http://www.motu.ca/                             http://www.bb4.com
> 	Si le bonheur ne s'achete pas alors louez le.
> -
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=
> To unsubscribe from this list, or to subscribe to the bb-digest list
> send e-mail to mailto:majordomo@bb4.com with unsubscribe bb -and/or-
> subscribe bb-digest in the BODY of the message.
> 

-- 
Henrik Olsen,  Dawn Solutions I/S       URL=http://www.iaeste.dk/~henrik/
 `Can you count, Banjo?' He looked smug. `Yes, miss. On m'fingers, miss.'
 `So you can count up to ...?' Susan prompted.
 `Thirteen, miss,' said Banjo proudly.         Terry Pratchett, Hogfather

