Ticket #4372 (new enhancement)

Opened 6 months ago

Last modified 7 weeks ago

Timestamp-based log rotation class

Reported by: jasonjwwilliams Owned by: jasonjwwilliams
Priority: normal Milestone:
Component: core Keywords: logrotation twistd
Cc: thijs, mithrandi@…, exarkun, itamar@…, gxti@… Branch: branches/timestamp-logfile-rotation-4372
Author: jasonjwwilliams Launchpad Bug:

Description (last modified by exarkun) (diff)

(related to ticket:3360)

Twistd blocks for a long period of time when the number of log files built up is large, thereby causing connections to freeze and (depending on the length of the rotation) time out.

An alternate rotation scheme, where the rotation name for the active log file is selected based on a timestamp could remove the need for doing ever more work as the number of old logfiles increases.

(Changes in attached diff are against Twisted 10.0.0).

Attachments

logfile.py.wrapped_write.diff Download (3.7 KB) - added by jasonjwwilliams 6 months ago.
LogFIle.write() is now aware of rotationInProgress.
logfile.py.datestamp.diff Download (0.8 KB) - added by jasonjwwilliams 6 months ago.
Alternate implementation that does not defer, but uses datestamp for rotation.
logfile.py.diff Download (4.2 KB) - added by jasonjwwilliams 4 months ago.
Latest version of rotate in thread. Includes wrapped write improvements.
logfile.py.timestamp.diff Download (3.4 KB) - added by jasonjwwilliams 4 months ago.
Timestamp patch ready for review (20100428).
test_logfile.timestamp.diff Download (8.3 KB) - added by jasonjwwilliams 4 months ago.
Timestamp patch ready for review (20100428). This is the updated unit test file for twisted/test/

Change History

Changed 6 months ago by jasonjwwilliams

LogFIle.write() is now aware of rotationInProgress.

Changed 6 months ago by jasonjwwilliams

After about 2-4 hours of heavy logging, Python 2.5.2 seg faults on the original diff attached. Analyzing the core dumps it seemed to alternate between BaseLogFile._openFile and BaseLogFile.write. Operating under the assumption that a conflict was occurring during the rotation of the current log file, I added some checks to LogFile.write(). If current log is being rotated, write() blocks temporarily.

Changed 6 months ago by jasonjwwilliams

Alternate implementation that does not defer, but uses datestamp for rotation.

Changed 6 months ago by jasonjwwilliams

Per Maarten ter Huurne's suggestion as an alternate means of resolving this problem, logfile.py.datestamp.diff removes the log rotation code and abandons the logname.log.X convention (where X is an incrementing integer) in favor of a datestamp. The current log is rotated to logname.X.Y where X is the PID of the process and Y is the current UNIX timestamp.

Has the added benefit that all logs from the same instance run are grouped in order, and the logs show up in chronological order by default when using ls on the directory.

Changed 6 months ago by thijs

  • keywords review added
  • cc thijs added
  • owner glyph deleted

Putting it up for review as described on TwistedDevelopment.

Changed 6 months ago by mithrandi

  • cc mithrandi@… added

Changed 5 months ago by exarkun

Doing rotation in a thread sounds like a pretty scary scenario to me. I think we should strongly consider other options instead.

Changing the way rotation works sounds like a potentially good path to take. We should consider the backwards compatibility issues, though. Do we want to unilaterally change the logfile naming scheme? Make it an option? A flag to be set programmatically? Deprecate the old code?

(I didn't read the patch, so this is not a review; I'm just pointing out some issues with the goals being worked towards here.)

Changed 5 months ago by jasonjwwilliams

Frankly, I'm leaning strongly towards the second patch that changes the naming scheme and dumps rotation entirely.

I've been running both patches (rotation in a thread, and new naming scheme) on a couple of production machines (machine A has the first, machine B has the second) that were experiencing the long block times with log rotation.

The "rotation in a thread" patch occasionally causes CPython 2.5.2 to core...this is running the logfile.py.wrapped_write.diff which was written to reduce the coring.

That said the "new log naming" scheme patch has been running strong for a week now with now problems, and has completely eliminated the timeouts while log rotation occurs.

Changed 5 months ago by exarkun

  • keywords review removed
  • owner set to jasonjwwilliams
  • cc exarkun added

Hi Jason, thanks for your work on this.

I'm just going to review the datestamp patch. I don't think the threading approach is viable.

  1. There are some compatibility concerns with this change.
    1. The Python API twisted.python.logfile.LogFile provides certain behavior in trunk now. Twisted takes backwards compatibility pretty seriously, and part of what that means here is that just changing the behavior of this existing API is probably not an acceptable way to fix this bug. The way to mitigate this is probably to either add a flag to LogFile to control which rotation behavior it uses, or to introduce an entirely new class similar to LogFile but with the new behavior.
    2. The behavior of the "twistd" script itself also provides certain behavior in trunk. Since this isn't a Python API, the compatibility concerns are slightly lessened, but they still need to be kept in mind. Changing the default behavior might lead to problems on existing deployments which are relying on the current logfile naming/rotation scheme. The way to mitigate this is probably to introduce a new command-line option to twistd for selecting the new rotation behavior.
  2. Testing is an important part of Twisted development
    1. The patch, as-is, breaks a couple existing unit tests. All unit tests need to be passing all the time. If the compatibility issues above are addressed, though, then these tests will probably begin passing again (and their pass/fail state is a good way to know whether the necessary compatibility is being preserved).
    2. There are no new unit tests for the new behavior implemented here. All code must be covered by unit tests in order to be included in trunk.
  3. I suspect that it would still be useful to implement the maxRotatedFiles feature with this new rotation scheme.

Please don't hesitate to ask if any of this is unclear, or if you get stuck on any of the compatibility or testing pieces. Thanks again.

Changed 4 months ago by jasonjwwilliams

Latest version of rotate in thread. Includes wrapped write improvements.

Changed 4 months ago by jasonjwwilliams

Timestamp patch ready for review (20100428).

Changed 4 months ago by jasonjwwilliams

Timestamp patch ready for review (20100428). This is the updated unit test file for twisted/test/

Changed 4 months ago by jasonjwwilliams

  • keywords review added
  • owner jasonjwwilliams deleted

Timestamp patch for logfile.py and patch for matching unit tests in test_logfile.py are attached. Passes unit tests.

Changed 4 months ago by jasonjwwilliams

The unit tests for the new class had to be overhauled a little from the regular tests. The old tests expected a specific naming suffix. The tests for the new class check to make sure the right number of log files are present for the number of rotations (and that they have the right data in them).

Changed 3 months ago by exarkun

  • branch set to branches/timestamp-logfile-rotation-4372
  • branch_author changed from jasonjwwilliams to exarkun, jasonjwwilliams

(In [29176]) Branching to 'timestamp-logfile-rotation-4372'

Changed 3 months ago by exarkun

(In [29177]) Apply logfile.py.timestamp.diff and test_logfile.timestamp.diff

refs #4372

Changed 3 months ago by exarkun

  • branch_author changed from exarkun, jasonjwwilliams to jasonjwwilliams

Changed 3 months ago by itamar

I suggest making the timestamp format configurable, or even better have overrideable function for generating the filenames. This will make the class useful for people who require specific filename structures.

Changed 3 months ago by itamar

  • cc itamar@… added

Changed 3 months ago by exarkun

I suggest making the timestamp format configurable, or even better have overrideable function for generating the filenames. This will make the class useful for people who require specific filename structures.

Meh. I'm skeptical. I don't know how much effort its worth sinking into fancy log rotation schemes. There are plenty of third-party rotation tools. Do we need to duplicate all of their functionality?

Changed 3 months ago by itamar

It seems like a fairly minor change (3 extra lines of code, 2 changed, plus whitespace?) that makes the code more flexible.

This is a general point: we should try to do small incremental efforts needed to make stuff customizable, if it allows users to reuse functionality in cases where they need slightly different output. Because otherwise people have to copy/paste or just reimplement code we've already written.

Changed 3 months ago by exarkun

  • type changed from defect to enhancement
  • description modified (diff)
  • summary changed from Log Rotation Blocks w/ Large Number of Log Files to Timestamp-based log rotation class

This is a general point: we should try to do small incremental efforts needed to make stuff customizable

Certainly. But as the current patch stands, it doesn't add any

Changed 3 months ago by itamar

"Any" what? I think you hit submit too soon or something :)

Regardless, I don't consider this a blocker for commit. If you don't want to do it now I suggest opening separate ticket to make the format customizable when you close ticket, marking it easy.

However, I do think the use of pid is problematic. If you're only running one server on the machine, the PID adds nothing and just makes it harder to figure out ordering of log files. If you're running more than one server on same machine, you would never want to put log files in same directory anyway with same base filename, because changing PIDs make it impossible to say which of the N servers a file belongs to using the PID. So, multiple servers are distinguished based on base filename, or simply by using multiple folders.

Either way, use of PID adds nothing, and makes things a bit more complicated for administrators dealing with the logs. Instead, we should just be appending to existing log files if we end up re-using one. Downside of that is you may end up appending to corrupted logfile, but given we flush on each write and writes are much smaller than blocks that is very unlikely.

Changed 3 months ago by exarkun

Meh. I just meant to change the metadata, I didn't mean to add any comment.

Changed 3 months ago by itamar

Oh, and the default timestamp output format really ought to be human readable string (in UTC, not potentially changing local timezone), not a mysterious integer.

All these observations are based on interactions with ITA ops people.

Changed 3 months ago by jasonjwwilliams

Re: timezone that's an ops decision/issue as far as I'm concerned. We enforce UTC at the OS-level across all of our infrastructure due to the many timezones we operate in. If you have to deal with many timezones and aren't doing this you're gonna run into more issues than this one.

I'd prefer not to make the filename configurable. Given this is a new feature/class, there's no legacy filenaming scheme they have to maintain compatibility with, and it increases the surface area for things to break. The rotation code makes certain assumptions about the file name scheme (as does the old rotator).

The PID is in there because we've found it very useful in our infrastructure. Often, you'll have an app acting up and restart it, having the PID in the filename allows you to focus on just the logs for the run of the app that was problematic. Further, if you're aggregating logs into a single location it helps reduce collision.

Changed 2 months ago by gxti

  • owner set to jasonjwwilliams
  • keywords review removed

I agree on not having PIDs (by default) -- they break ordering, aren't sufficiently distinct to avoid collisions and don't tell you which machine the log came from anyway.

I also agree about using a human-readable timestamp format although I agree with Jason that it should be localtime to follow the principle of least surprise. Good sysadmins will set their servers to be UTC, but it can be very confusing to find logs that are in UTC when the system is not.

Most importantly, both of these differences in opinion can be resolved amicably with a customizable filename format. Perhaps the most straightforward solution is to farm out the generation of filenames to a tiny method and let the consumer override it if they want a particular behavior (PIDs, hostname, phase of the moon) and provide a reasonable default (a filename-friendly ISO-formatted timestamp). I don't like the idea of using a python formatting string for this simply because it requires extending twisted every time someone wants some new thing in their filenames. Either that or *that* method is easily subclassable at which point you might as well have done the first thing.

This will require cleaning up listLogs to not depend so much on the filename format. Don't dissect the filename then immediately reassemble it in the caller as it just makes the code more fragile. It's best to just return a list of paths.

Changed 2 months ago by gxti

  • cc gxti@… added

Changed 7 weeks ago by exarkun

  • keywords logrotation twistd added; log rotation removed
Note: See TracTickets for help on using tickets.