Date: Sun, 30 Sep 2001 19:55:49 +0200 From: Marc Haber To: net-acct@CoLi.Uni-SB.DE, net-acct@exorsus.net Cc: 112696@bugs.debian.org Subject: Re: lck=1 and never 0 again Message-ID: <20010930195549.B10410@torres.ka0.zugschlus.de> References: <20010919194645.A8027@torres.ka0.zugschlus.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010919194645.A8027@torres.ka0.zugschlus.de>; from mh@zugschlus.de on Wed, Sep 19, 2001 at 07:46:45PM +0200 Sender: owner-net-acct@CoLi.Uni-SB.DE Precedence: bulk Status: RO Content-Length: 5243 Lines: 172 On Wed, Sep 19, 2001 at 07:46:45PM +0200, Marc Haber wrote: > we are using net-acct for our ISP accounting system to collect the > data. We have been using net-acct for quite some time on a variety of > systems. Recently, we have been experiencing problems that net-acct > stops writing out data and takes all available CPU. > > 18/09 21:16:24 [STATE] flushing delayed (writing == 0, lck == 1, may_write == 1) > 18/09 21:17:25 [STATE] write_log called > > This usually repeats ad infinitum, but in this rare case, the daemon > got back on track, eventually writing out the data. We have inspected > the code and didn't find any occurences where lck is written to > without debug output, but obviously, lck was set to 1 somewhen between > 21:06:24 and 21:07:24, and reset between 21:17:25 and 21:18:24. > > This behavior can be reproduced by setting "large" timeout values for > flush (300) and fdelay (120) on our main accounting host. Please find attached a patch that lets write_log return a value indicating if it was successfully able to write. Only after a successful write, the next write cycle is scheduled. This way, with lck == 1, you get one attempt to acquire the lock per second instead per flush seconds. With this patch, we are now back on track, and it rarely happens that it takes longer than ten seconds (==tries) to acquire the lock. Maybe, it would be a good idea to have write_log called from a place where lck is definetely zero, for example from the end of register_packet. But I refrained from doing so because I was afraid of overlooking some aspects of the asynchronous function call and the forks happening during the write. It is up to people knowing the code better to comment about this. The proposed patch to register_packet is in the patch, but commented out. Greetings Marc patch follows diff -ur net-acct-0.71-glibc2/src/daemon.c net-acct-0.71/src/daemon.c --- net-acct-0.71-glibc2/src/daemon.c Thu May 7 16:10:38 1998 +++ net-acct-0.71/src/daemon.c Thu Sep 20 17:08:57 2001 @@ -42,10 +42,15 @@ { DEBUG(DBG_STATE, sprintf(dbg, "entering daemon_stop\n")); - write_log(1); - - DEBUG(DBG_STATE, sprintf(dbg, "wrote final log\n")); - + if( write_log(1) == 0 ) + { + DEBUG(DBG_STATE, sprintf(dbg, "wrote final log\n")); + } + else + { + DEBUG(DBG_STATE, sprintf(dbg, "writing final log unsuccessful\n")); + }; + unlink(PID_FILE); DEBUG(DBG_STATE, sprintf(dbg, "unlinked PID_FILE\n")); diff -ur net-acct-0.71-glibc2/src/netacct.h net-acct-0.71/src/netacct.h --- net-acct-0.71-glibc2/src/netacct.h Sun Sep 30 19:46:26 2001 +++ net-acct-0.71/src/netacct.h Thu Sep 20 16:40:00 2001 @@ -239,7 +239,7 @@ /* process.c */ void register_packet(unsigned long int src,unsigned long int dst, unsigned char proto, unsigned short srcport, unsigned short dstport, int size, char *devname, char *user); -void write_log(int force); +int write_log(int force); void alarm_handler(int sig); void child_finished(int sig); void signal_debug(int sig); diff -ur net-acct-0.71-glibc2/src/process.c net-acct-0.71/src/process.c --- net-acct-0.71-glibc2/src/process.c Sun Sep 30 19:46:26 2001 +++ net-acct-0.71/src/process.c Thu Sep 20 18:23:10 2001 @@ -47,6 +47,7 @@ int err_delay, max_err_delay; volatile time_t now; +static time_t next_write_log = 0; /* statistics */ unsigned int list_compares, list_lookups; @@ -97,12 +98,17 @@ may_write = 0; save2 = lck; lck = 1; - /* end */ + /* end */ + + DEBUG(DBG_STATE, sprintf(dbg, "reopen_socket [before]: lck = 1\n")); exit_capture(); init_capture(); lck = save2; + + DEBUG(DBG_STATE, sprintf(dbg, "reopen_socket [after]: lck = %d\n",save2)); + may_write = save1; } @@ -230,6 +236,15 @@ { packets->dropped++; } +/* + if(now >= next_write_log) + { + if( write_log(0) == 0) + { + next_write_log = now + cfg -> flush; + } + } + */ } int do_write_list(FILE *f, struct ipdata *list[]) @@ -585,7 +600,6 @@ void alarm_handler(int sig) { static time_t last_check = 0; - static time_t next_write_log = 0; DEBUG( ((sig == SIGALRM) ? DBG_ANNOYING : (DBG_SIGNAL | DBG_STATE)), sprintf(dbg, "got signal %d, handling\n", sig)); @@ -609,18 +623,21 @@ if(now >= next_write_log) { - write_log(0); - next_write_log = now + cfg -> flush; + if( write_log(0) == 0) + { + next_write_log = now + cfg -> flush; + } } alarm(1); } -void write_log(int force) +int write_log(int force) { struct ipdata *p, *q; static struct ipdata *tlist; /* temp */ int i; + int ret=0; DEBUG(DBG_STATE, sprintf(dbg, "write_log called\n")); @@ -739,7 +756,9 @@ else { DEBUG(DBG_STATE, sprintf(dbg, "flushing delayed (writing == %d, lck == %d, may_write == %d)\n",writing,lck,may_write)); + ret=1; } + return ret; } void signal_debug(int sig) -- ----------------------------------------------------------------------------- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Karlsruhe, Germany | lose things." Winona Ryder | Fon: *49 721 966 32 15 Nordisch by Nature | How to make an American Quilt | Fax: *49 721 966 31 29