Thursday, 1 March 2007

Apache piped ErrorLog broken

I came across this frustrating little Apache bug today: if you configure your error log to be a pipe to an external command, the external process is not killed when doing a graceful restart of the server, so you end up with redundant processes building up (and holding old logfiles open), one per restart.


How it works is: let's say you have what appears to be a popular configuration these days, with:

ErrorLog "|/usr/sbin/cronolog /var/log/apache/error.%Y%m%d.log"

When you do a graceful restart, apache starts the new cronolog first, attaches this to its stderr (closing the handle to the old cronolog), then kills the old child process. But: the new cronolog inherits apache's original stderr, which points to the old cronolog. And instead of killing the old one, apache kills the immediate child process, which is a sh (because apache passes the ErrorLog command line to be interpreted by the shell); leaving the cronolog behind. So the old cronolog is not killed, and although apache closes the filehandle that it is reading from, the new cronolog process is still holding it open.

Once you have done a few graceful restarts, you will have a whole tree of cronolog's (or other external logging program). The first cronolog reads from the second cronolog's stderr, which is reading from the third cronolog's stderr, which is reading from … which is reading from apache's stderr. So if you kill the penultimate cronolog, the whole stack of stale processes unwinds (each cronolog exits when the last process writing to its stdin exits).

Given that cronolog has been popular for > 5 years, and apache itself ships with a similar rotatelogs script, I'm surprised this bug has lasted so long. I have tested that it occurs with Apache 1.3 on both FreeBSD and Linux. It appears to have been fixed only in October, so I guess only Apache 2.2 and up includes the fix?


But I seem to have a workaround:

ErrorLog "|exec /usr/sbin/cronolog /var/log/apache/error.%Y%m%d.log"

… a favourite trick of mine, for avoiding all those sh processes kicking around, which here happens also to make everything work: because there is now only one direct child process, so apache now kills the cronolog when it has completed its restart. Disclaimer: I have only tried this in my test setup so far.

There is just one hit on Google for "exec cronolog": and it seems related (but they haven't realised there is a problem with restarts). But it seems to me that all the sites recommending cronolog need to update to make this their suggested configuration.

0 comments:

Post a Comment