[Twisted-Python] Where to start: log reader/analysis

Jean-Paul Calderone exarkun at divmod.com
Mon Aug 6 05:49:53 MDT 2007


On Mon, 6 Aug 2007 10:57:19 +0200, Yoann Aubineau <yoann.aubineau at wengo.com> wrote:
>Hi Andrew,
>
>I wrote a class that follows a file (eg. log file) and provides an iterator
>to walk through it. Don't know if it may be of any use for you (or others).

Hi Yoann, thanks for sharing.

>
>class FileFollower(object):
>    """Iterate through a file while it is updated.
>
>    >>> file = FileFollower("/tmp/testfile")
>    >>> file.interval = 5
>    >>> for line in file:
>    ...     print line
>    """
>
>    interval = 1
>
>    def __init__(self, filename, interval=None):
>        self.filename = filename
>        self.interval = interval or self.interval
>        self.stat = None
>        self.offset = 0
>        self.lines = []
>        self.running = True
>
>    #
>    # File following
>
>    def follow(self):
>        while self.running:
>            if self.hasChanged():
>                data = self.readChange()
>                if data:
>                    self.dataReceived(data)
>                    break
>            time.sleep(self.interval)
>
>    def hasChanged(self):
>        stat = os.stat(self.filename)
>        if stat != self.stat:
>            self.stat = stat
>            return True
>        return False
>
>    def readChange(self):
>        file = open(self.filename)
>        file.seek(self.offset)
>        data = file.read()
>        self.offset = file.tell()
>        file.close()
>        return data
>
>    #
>    # Data buffering
>
>    def dataReceived(self, data):
>        lines = data.split(os.linesep)
>        lines = lines[:-1]
>        for line in lines:
>            self.lineReceived(line)
>
>    def lineReceived(self, line):
>        self.lines.append(line)
>
>    #
>    # Iterator implementation
>
>    def __iter__(self):
>        return self
>
>    def next(self):
>        if not self.lines:
>            self.follow()
>        line = self.lines.pop(0)
>        return line
>

In order to make this class more usable within a Twisted application, I'd
make a few suggestions:

Separate the transport from the protocol.  All of the methods in the area
commented "file following" are basically transport methods: they know how
to get the underlying bytes (by polling and eventually reading).  The
protocol implementation is basically the dataReceived and lineReceived
methods.  With separation between the transport and the protocol, you
don't even need to implement these, since you can just use LineReceiver
from twisted.protocols.basic.
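
A minimal sketch of that separation, written for Python 3 and using only
the standard library so it runs without Twisted.  FilePoller and
LineCollector are hypothetical names, not part of the original class or of
Twisted.  LineCollector only stands in for LineReceiver here; the "\n"
delimiter is also an assumption (LineReceiver defaults to "\r\n").

```python
class LineCollector:
    """Protocol side: turns chunks of bytes into complete lines.

    Hypothetical stand-in for twisted.protocols.basic.LineReceiver.
    """

    delimiter = b"\n"  # assumption: Unix-style log files

    def __init__(self):
        self._buffer = b""
        self.lines = []

    def dataReceived(self, data):
        # Keep any incomplete trailing line in the buffer until the
        # rest of it arrives in a later chunk.
        self._buffer += data
        *complete, self._buffer = self._buffer.split(self.delimiter)
        for line in complete:
            self.lineReceived(line)

    def lineReceived(self, line):
        self.lines.append(line)


class FilePoller:
    """Transport side: knows how to fetch new bytes from the file."""

    def __init__(self, filename, protocol):
        self.filename = filename
        self.protocol = protocol
        self.offset = 0

    def poll(self):
        # Read whatever was appended since the last poll and hand the
        # raw bytes to the protocol; no line handling happens here.
        with open(self.filename, "rb") as f:
            f.seek(self.offset)
            data = f.read()
            self.offset = f.tell()
        if data:
            self.protocol.dataReceived(data)
```

Because FilePoller only ever calls protocol.dataReceived, any object with
that method can be plugged in, which is the point of the split.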

Do the polling in a cooperative way.  Using a while loop and a
time.sleep call has the consequence of tying up an entire thread.  This
means nothing else can happen unless you run the follow method of this
class in a new, dedicated thread.  If you use the reactor to schedule
the checks instead, then this can be used alongside other Twisted code
without having to deal with threading.  twisted.internet.task.LoopingCall
might be of particular interest.

Jean-Paul



