Ticket #727 enhancement new

Opened 10 years ago

Last modified 4 years ago

Compression for rotated log files

Reported by: itamarst Owned by:
Priority: normal Milestone:
Component: core Keywords: twistd
Cc: itamarst, djk121, therve, exarkun Branch: branches/compressed-logfiles-727
(diff, github, buildbot, log)
Author: therve Launchpad Bug:

Description


Attachments

727_logfile.diff Download (3.8 KB) - added by therve 8 years ago.

Change History

Changed 8 years ago by therve

1

  Changed 8 years ago by therve

  • cc therve added

I tried a simple implementation of this problem. Do not use this, it's just a sample of what can be done. I guess I have to use producer/consumer to create the compressed file.

TODO:

  • use consumer/producer
  • compress DailyLogFile

2

  Changed 6 years ago by therve

  • branch set to branches/compressed-logfiles-727
  • author set to therve

(In [22073]) Branching to 'compressed-logfiles-727'

3

  Changed 6 years ago by therve

(In [22074]) Add compressed version of LogFile and DailyLogFile, for gzip and bz2.

Refs #727

4

  Changed 6 years ago by therve

  • owner changed from itamarst to therve

5

  Changed 6 years ago by therve

(In [22084]) Try to update both log file and compressed version.

Unfortunately, it doesn't really work for bz2.

Refs #727

6

  Changed 6 years ago by therve

(In [22086]) Partial revert of r22084: I'll try something else.

Refs #727

7

  Changed 6 years ago by therve

(In [22087]) Use callInThread to call the compression.

Refs #727

8

  Changed 6 years ago by therve

(In [22088]) Make some cleanups

Refs #727

9

  Changed 6 years ago by therve

  • owner therve deleted
  • priority changed from normal to highest
  • keywords review added

This is ready to review.

10

follow-up: ↓ 12   Changed 6 years ago by exarkun

  • keywords review removed
  • owner set to therve
  • twisted/python/logfile.py
    • I'm not sure why LogReader ever existed in the first place. Maybe it had something to do with buffering or something, but xreadlines was around a long time ago. It'd be nice to get rid of this and just say getLog returns a file-like object. Maybe that's not really related to this branch (although this does add two new LogReader subclasses). Anyhow, I'm not even really sure that getLog was ever actually useful for anyone or anything.
    • CompressorHelper seems too clever. A mixin would work just as well here, wouldn't it? Also, mixing in object to classic hierarchies is scary (it causes problems if you ever decide to make the top new-style, which I hope we will decide to do someday).
    • twisted.python can't import stuff from twisted.internet - twisted.internet.reactor import in callInThread is bad. Maybe something with a callInThread method could be required as an __init__ argument? This should make the tests simpler too, since they won't have to subclass/override.
    • The _compress method is a bit scary.
      • it has a lot of code inside a try/except.
      • it imports twisted.python.log
      • there's no behavior for handling errors, just logging it. Not that I have any suggestions for what to do if there's an error. Note that if a callable passed to callInThread raises an unhandled exception, it'll get logged anyway. As an aside, new calls to log.err() should probably all be required to specify something for the _why parameter so they're easier to debug.
    • overall, new docstrings could be a bit more precise in documenting their parameters/return values
    • Some Python installations may not have gzip or bz2. Kinda stupid, but I guess we should think about it.

11

  Changed 6 years ago by therve

(In [22186]) Manage missing gzip/bz2.

Refs #727

12

in reply to: ↑ 10   Changed 6 years ago by therve

  • cc exarkun added

Replying to exarkun:

I'm not sure why LogReader ever existed in the first place. Maybe it had something to do with buffering or something, but xreadlines was around a long time ago. It'd be nice to get rid of this and just say getLog returns a file-like object. Maybe that's not really related to this branch (although this does add two new LogReader subclasses). Anyhow, I'm not even really sure that getLog was ever actually useful for anyone or anything.

It's useful at least for one thing: tests. I don't think the behavior is really hard to support, so I would prefer to keep it.

CompressorHelper seems too clever. A mixin would work just as well here, wouldn't it? Also, mixing in object to classic hierarchies is scary (it causes problems if you ever decide to make the top new-style, which I hope we will decide to do someday).

I don't think I can use a mixin because the baseClass is different (LogFile and DailyLogFile). At least, I don't know how to do it. The object mixin should just be removed when we'll make the top new-style.

twisted.python can't import stuff from twisted.internet - twisted.internet.reactor import in callInThread is bad. Maybe something with a callInThread method could be required as an __init__ argument? This should make the tests simpler too, since they won't have to subclass/override.

OK, but I'll have to import the reactor at some point anyway, no ?

The _compress method is a bit scary. * it has a lot of code inside a try/except. * it imports twisted.python.log * there's no behavior for handling errors, just logging it. Not that I have any suggestions for what to do if there's an error. Note that if a callable passed to callInThread raises an unhandled exception, it'll get logged anyway. As an aside, new calls to log.err() should probably all be required to specify something for the _why parameter so they're easier to debug.

What do you suggest? Removing the log.err completely? This solution seems to solve all the above points :).

overall, new docstrings could be a bit more precise in documenting their parameters/return values

Right, I'll try to improve that.

Some Python installations may not have gzip or bz2. Kinda stupid, but I guess we should think about it.

Done, I think.

13

  Changed 6 years ago by therve

(In [22234]) Add some docstrings.

Refs #727

14

  Changed 6 years ago by therve

(In [22312]) Remove callInThread method, pass it as argument.

Refs #727

15

  Changed 6 years ago by therve

  • owner therve deleted
  • keywords review added

This is ready to review. Looking at it now, the tests are maybe a bit complicated, so I could refactor them if necessary. But that's for what reviews exist, after all :).

16

  Changed 6 years ago by glyph

  • keywords review removed
  • owner set to therve

The biggest overall comment:

  • In order to actually be useful, this needs to be both selectable by a twistd command-line option and accessible to applications; really, this implies #2571.

Other, more minor stuff:

  • This does stuff in threads. Why threads? Subprocesses are great, right? Maybe it's because we don't have a good process pool. Sounds like another dependency.
  • This adds more code that uses os.path rather than FilePath
  • Many of the general refactorings in this branch should be separated out from this feature.
  • I think there's a problem with some potential new error conditions being introduced in the compress step but I'm not sure. Threads plus file I/O plus random C APIs definitely makes me uncomfortable though :) The potential race with the os.access LBYL seems like it might be worse if you are spending more time "doing stuff" during compression. What happens when you run out of disk during compression?

17

  Changed 6 years ago by therve

  • priority changed from highest to normal

17

  Changed 6 years ago by therve

  • priority changed from highest to normal

18

  Changed 4 years ago by exarkun

  • keywords twistd added

20

  Changed 3 years ago by <automation>

  • owner therve deleted
Note: See TracTickets for help on using tickets.