Opened 10 years ago

Last modified 4 years ago

#727 enhancement new

Compression for rotated log files

Reported by: itamarst Owned by:
Priority: normal Milestone:
Component: core Keywords: twistd
Cc: itamarst, djk121, therve, exarkun Branch: branches/compressed-logfiles-727
(diff, github, buildbot, log)
Author: therve Launchpad Bug:

Description


Attachments (1)

727_logfile.diff (3.8 KB) - added by therve 8 years ago.

Download all attachments as: .zip

Change History (21)

Changed 8 years ago by therve

comment:1 Changed 8 years ago by therve

  • Cc therve added

I tried a simple implementation of this problem. Do not use this, it's just a sample of what can be done. I guess I have to use producer/consumer to create the compressed file.

TODO:

  • use consumer/producer
  • compress DailyLogFile

comment:2 Changed 7 years ago by therve

  • author set to therve
  • Branch set to branches/compressed-logfiles-727

(In [22073]) Branching to 'compressed-logfiles-727'

comment:3 Changed 7 years ago by therve

(In [22074]) Add compressed version of LogFile and DailyLogFile, for gzip and bz2.

Refs #727

comment:4 Changed 7 years ago by therve

  • Owner changed from itamarst to therve

comment:5 Changed 7 years ago by therve

(In [22084]) Try to update both log file and compressed version.

Unfortunately, it doesn't really work for bz2.

Refs #727

comment:6 Changed 7 years ago by therve

(In [22086]) Partial revert of r22084: I'll try something else.

Refs #727

comment:7 Changed 7 years ago by therve

(In [22087]) Use callInThread to call the compression.

Refs #727

comment:8 Changed 7 years ago by therve

(In [22088]) Make some cleanups

Refs #727

comment:9 Changed 7 years ago by therve

  • Keywords review added
  • Owner therve deleted
  • Priority changed from normal to highest

This is ready to review.

comment:10 follow-up: Changed 7 years ago by exarkun

  • Keywords review removed
  • Owner set to therve
  • twisted/python/logfile.py
    • I'm not sure why LogReader ever existed in the first place. Maybe it had something to do with buffering or something, but xreadlines was around a long time ago. It'd be nice to get rid of this and just say getLog returns a file-like object. Maybe that's not really related to this branch (although this does add two new LogReader subclasses). Anyhow, I'm not even really sure that getLog was ever actually useful for anyone or anything.
    • CompressorHelper seems too clever. A mixin would work just as well here, wouldn't it? Also, mixing in object to classic hierarchies is scary (it causes problems if you ever decide to make the top new-style, which I hope we will decide to do someday).
    • twisted.python can't import stuff from twisted.internet - twisted.internet.reactor import in callInThread is bad. Maybe something with a callInThread method could be required as an __init__ argument? This should make the tests simpler too, since they won't have to subclass/override.
    • The _compress method is a bit scary.
      • it has a lot of code inside a try/except.
      • it imports twisted.python.log
      • there's no behavior for handling errors, just logging it. Not that I have any suggestions for what to do if there's an error. Note that if a callable passed to callInThread raises an unhandled exception, it'll get logged anyway. As an aside, new calls to log.err() should probably all be required to specify something for the _why parameter so they're easier to debug.
    • overall, new docstrings could be a bit more precise in documenting their parameters/return values
    • Some Python installations may not have gzip or bz2. Kinda stupid, but I guess we should think about it.

comment:11 Changed 7 years ago by therve

(In [22186]) Manage missing gzip/bz2.

Refs #727

comment:12 in reply to: ↑ 10 Changed 7 years ago by therve

  • Cc exarkun added

Replying to exarkun:

I'm not sure why LogReader ever existed in the first place. Maybe it had something to do with buffering or something, but xreadlines was around a long time ago. It'd be nice to get rid of this and just say getLog returns a file-like object. Maybe that's not really related to this branch (although this does add two new LogReader subclasses). Anyhow, I'm not even really sure that getLog was ever actually useful for anyone or anything.

It's useful at least for one thing: tests. I don't think the behavior is really hard to support, so I would prefer to keep it.

CompressorHelper seems too clever. A mixin would work just as well here, wouldn't it? Also, mixing in object to classic hierarchies is scary (it causes problems if you ever decide to make the top new-style, which I hope we will decide to do someday).

I don't think I can use a mixin because the baseClass is different (LogFile and DailyLogFile). At least, I don't know how to do it. The object mixin should just be removed when we'll make the top new-style.

twisted.python can't import stuff from twisted.internet - twisted.internet.reactor import in callInThread is bad. Maybe something with a callInThread method could be required as an __init__ argument? This should make the tests simpler too, since they won't have to subclass/override.

OK, but I'll have to import the reactor at some point anyway, no ?

The _compress method is a bit scary.

  • it has a lot of code inside a try/except.
  • it imports twisted.python.log
  • there's no behavior for handling errors, just logging it. Not that I have any suggestions for what to do if there's an error. Note that if a callable passed to callInThread raises an unhandled exception, it'll get logged anyway. As an aside, new calls to log.err() should probably all be required to specify something for the _why parameter so they're easier to debug.

What do you suggest? Removing the log.err completely? This solution seems to solve all the above points :).

overall, new docstrings could be a bit more precise in documenting their parameters/return values

Right, I'll try to improve that.

Some Python installations may not have gzip or bz2. Kinda stupid, but I guess we should think about it.

Done, I think.

comment:13 Changed 7 years ago by therve

(In [22234]) Add some docstrings.

Refs #727

comment:14 Changed 7 years ago by therve

(In [22312]) Remove callInThread method, pass it as argument.

Refs #727

comment:15 Changed 7 years ago by therve

  • Keywords review added
  • Owner therve deleted

This is ready to review. Looking at it now, the tests are maybe a bit complicated, so I could refactor them if necessary. But that's for what reviews exist, after all :).

comment:16 Changed 6 years ago by glyph

  • Keywords review removed
  • Owner set to therve

The biggest overall comment:

  • In order to actually be useful, this needs to be both selectable by a twistd command-line option and accessible to applications; really, this implies #2571.

Other, more minor stuff:

  • This does stuff in threads. Why threads? Subprocesses are great, right? Maybe it's because we don't have a good process pool. Sounds like another dependency.
  • This adds more code that uses os.path rather than FilePath
  • Many of the general refactorings in this branch should be separated out from this feature.
  • I think there's a problem with some potential new error conditions being introduced in the compress step but I'm not sure. Threads plus file I/O plus random C APIs definitely makes me uncomfortable though :) The potential race with the os.access LBYL seems like it might be worse if you are spending more time "doing stuff" during compression. What happens when you run out of disk during compression?

comment:17 Changed 6 years ago by therve

  • Priority changed from highest to normal

comment:17 Changed 6 years ago by therve

  • Priority changed from highest to normal

comment:18 Changed 4 years ago by exarkun

  • Keywords twistd added

comment:19 Changed 3 years ago by <automation>

  • Owner therve deleted
Note: See TracTickets for help on using tickets.