Opened 3 years ago

Last modified 3 years ago

#5551 defect new

struct.error: argument out of range for 4-bytes integer format

Reported by: pigmej Owned by:
Priority: normal Milestone:
Component: names Keywords:
Cc: Branch:
Author: Launchpad Bug:

Description

I have a custom Names backend and everything works at first, but some hours after startup some domains seem to stop working. In the logs I can see this traceback (see below).

2012-03-14 10:02:33+0000 [DNSDatagramProtocol (UDP)] Unhandled Error
        Traceback (most recent call last):
          File "/opt/dns/site-packages/twisted/names/server.py", line 192, in messageReceived
            self.handleQuery(message, proto, address)
          File "/opt/dns/site-packages/twisted/names/server.py", line 137, in handleQuery
            self.gotResolverResponse, protocol, message, address
          File "/opt/dns/site-packages/twisted/internet/defer.py", line 301, in addCallback
            callbackKeywords=kw)
          File "/opt/dns/site-packages/twisted/internet/defer.py", line 290, in addCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/opt/dns/site-packages/twisted/internet/defer.py", line 551, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/opt/dns/site-packages/twisted/names/server.py", line 107, in gotResolverResponse
            self.sendReply(protocol, message, address)
          File "/opt/dns/site-packages/twisted/names/server.py", line 92, in sendReply
            protocol.writeMessage(message, address)
          File "/opt/dns/site-packages/twisted/names/dns.py", line 1804, in writeMessage
            self.transport.write(message.toStr(), address)
          File "/opt/dns/site-packages/twisted/names/dns.py", line 1686, in toStr
            self.encode(strio)
          File "/opt/dns/site-packages/twisted/names/dns.py", line 1587, in encode
            q.encode(body_tmp, compDict)
          File "/opt/dns/site-packages/twisted/names/dns.py", line 497, in encode
            strio.write(struct.pack(self.fmt, self.type, self.cls, self.ttl, 0))
        struct.error: argument out of range for 4-bytes integer format

I checked the backend and it always returns the same values. This is the part where I build the A answer:

                defer.returnValue([
                    (
                        dns.RRHeader(hostname,
                                     dns.A,
                                     dns.IN,
                                     self.ttl,
                                     dns.Record_A(record['content'],
                                                  self.ttl)
                                 ),
                    ),
                    (),
                    ()
                ]
                )

When I restart the Names server, everything starts working again.

Change History (12)

comment:1 Changed 3 years ago by exarkun

What's the value of self.ttl?

comment:2 Changed 3 years ago by pigmej

self.ttl = 5

comment:3 Changed 3 years ago by exarkun

Have you ruled out hardware failure? I don't see how struct.pack could be failing like this if the values being passed to it are the expected values. A failing memory stick could cause valid values to magically transform into invalid values. A run of memtest86+ on this host might be illuminating.

comment:4 Changed 3 years ago by pigmej

Yes, I ran memtest86+ on it. No failures.

comment:5 Changed 3 years ago by exarkun

Okay. I'm out of guesses, then. If you can provide an example that reproduces this problem, it would be very helpful. Without that, I'm not sure if there's any way to proceed. Alternatively, if you can debug the problem further and track down the cause, relating that experience and what you learn from it in another comment on this ticket might be enough to let someone else fix the problem.

As another datapoint, I'm running a Twisted Names DNS server and I never encounter this error.

comment:6 Changed 3 years ago by pigmej

I added debug code... but since then Names hasn't crashed...

So we have to wait; I hope to provide the failing case then.

comment:7 Changed 3 years ago by pigmej

Ok,

It finally crashed again :)

The working DNS entries have these values:
!HHIH 1 1 1 0
while the failing ones have:
!HHIH 1 1 -90356.4524617 0
or
!HHIH 1 1 -90386.4629548 0

As you can see, the third value (the TTL) is wrong.

That's how I collected those values:

    def encode(self, strio, compDict=None):
        self.name.encode(strio, compDict)
        print self.fmt, self.type, self.cls, self.ttl, 0
        strio.write(struct.pack(self.fmt, self.type, self.cls, self.ttl, 0))

So the culprit seems to be self.ttl; the problem is... it's always set to the same value.

And again, restarting Names makes it work fine.

To me it seems that it starts to fail after some number of requests. I have no idea why, since self.ttl is hardcoded in my code.
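
The failing values can be checked in isolation. The sketch below (Python 3, standard library only; the helper name `try_pack` is made up for illustration) packs the same `!HHIH` layout with the TTLs printed above. The third field is an unsigned 32-bit integer, so a negative or fractional TTL cannot be encoded, while the working value packs to 10 bytes. The exact error text differs between Python versions.

```python
import struct

FMT = "!HHIH"  # type, class, ttl (unsigned 32-bit), rdlength


def try_pack(ttl):
    """Return the packed bytes, or None if struct rejects the TTL."""
    try:
        return struct.pack(FMT, 1, 1, ttl, 0)
    except struct.error:
        return None


# TTL values taken from the debug output in this comment.
for ttl in (1, -90356.4524617, -90386.4629548):
    packed = try_pack(ttl)
    print(ttl, "->", "ok" if packed is not None else "struct.error")
```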

comment:8 Changed 3 years ago by exarkun

Perhaps you can make the ttl attribute in your code a property (if it is an attribute of a new-style class):

    _ttl = 5

    def ttl_get(self):
        return self._ttl

    def ttl_set(self, value):
        import traceback
        print 'Changing ttl to', value, 'at:'
        traceback.print_stack()
        self._ttl = value

    ttl = property(ttl_get, ttl_set)

This should report the call stack at the point when the ttl value is set out of bounds. If you have a lot of these objects, it might spam your startup logs quite a bit, but hopefully after that it will only report problem areas.

Regardless of where this bug ends up being, a change that perhaps we could make to twisted.names is to have it report out-of-bounds values more clearly than it does now.
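
As a rough illustration of what clearer reporting might look like, here is a minimal sketch in Python 3. `check_ttl` and `MAX_TTL` are hypothetical names, not part of twisted.names; the idea is only to validate the value before it reaches struct.pack, so that the error names the offending TTL.

```python
MAX_TTL = 2 ** 32 - 1  # largest value the unsigned 32-bit "I" field can hold


def check_ttl(ttl):
    """Return ttl unchanged if it fits the wire format, else raise ValueError."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(ttl, bool) or not isinstance(ttl, int):
        raise ValueError("ttl must be an integer, got %r" % (ttl,))
    if not 0 <= ttl <= MAX_TTL:
        raise ValueError("ttl %r is outside 0..%d" % (ttl, MAX_TTL))
    return ttl
```

Called just before encoding, for example as `check_ttl(self.ttl)`, this would turn the opaque struct.error into a message that shows the bad value.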

comment:9 Changed 3 years ago by pigmej

I added it yesterday, so now we have to wait again. I will post the info here when it crashes again.

comment:10 Changed 3 years ago by jonathanj

Are you using CacheResolver when these tracebacks occur? I'm trying to decide if #5579 is a duplicate of this bug.

comment:11 Changed 3 years ago by pigmej

Yes, CacheResolver is present, but I'm not 100% sure it's the cause (it might be).

comment:12 Changed 3 years ago by pigmej

OK, it seems that the original self.ttl is never changed, but the out-of-range values still appear.
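
One way these observations could fit together, offered purely as a guess: if a caching layer rewrites each record's TTL as the original TTL minus the time since the record was cached, a stale entry yields a negative, fractional TTL even though self.ttl in the backend stays at 5. That would also be consistent with the working entry showing a TTL of 1 rather than 5. The sketch below (Python 3) is a hypothetical cache calculation, not Twisted's CacheResolver code; `remaining_ttl`, `clamped_ttl` and their parameters are illustrative names.

```python
def remaining_ttl(original_ttl, cached_at, now):
    """TTL left for a cached record; goes negative once the entry is stale."""
    return original_ttl - (now - cached_at)


def clamped_ttl(original_ttl, cached_at, now):
    """Same calculation, forced to a non-negative integer that struct can pack."""
    return max(0, int(remaining_ttl(original_ttl, cached_at, now)))


# A record cached with ttl=5 and served about 25 hours later (assumed timings).
print(remaining_ttl(5, 0.0, 90361.5))
print(clamped_ttl(5, 0.0, 90361.5))
```

Whether something like this actually happens here would need to be confirmed against the resolver chain in use.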
