Opened 2 years ago

#9738 defect new

PermissionError for lock file

Reported by: Newell Jensen
Owned by:
Priority: normal
Milestone:
Component: core
Keywords:
Cc:
Branch:
Author:

Description

MAAS uses Twisted, and we have run into the following issue. MAAS's network monitoring service uses a lock file to ensure that only one process updates the networking information. If that process gets killed, the lock file stays behind, pointing to the PID that the killed regiond process had.

What normally happens next is that another process tries to acquire the lock, sees that the lock points to the PID of a process that no longer exists, and recreates the lock.
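The acquire-or-recover behaviour described above can be sketched roughly as follows. This is an illustration only, not the Twisted implementation: `try_acquire` is a hypothetical helper, the lock is assumed to be a symlink whose target is the holder's PID (as the `ls -l` output in this ticket suggests), and race conditions between competing processes are ignored.

```python
import os


def try_acquire(lock_path):
    """Try to take a symlink-based PID lock; return True on success.

    Hypothetical helper for illustration; not part of Twisted or MAAS.
    """
    my_pid = str(os.getpid())
    try:
        # Creating a symlink fails if the name already exists,
        # so only one process can create the lock.
        os.symlink(my_pid, lock_path)
        return True
    except FileExistsError:
        pass

    # The lock exists: find out which PID it points to.
    holder = int(os.readlink(lock_path))
    try:
        os.kill(holder, 0)  # signal 0 only checks; nothing is delivered
    except ProcessLookupError:
        # No such process: the lock is stale, so recreate it.
        os.remove(lock_path)
        os.symlink(my_pid, lock_path)
        return True
    # Any other error from os.kill (for example PermissionError)
    # is not handled here and propagates to the caller.
    return False  # the holder is still running
```

Calling `try_acquire` twice on the same path from one process returns `True` and then `False`, because the second call finds its own, still-running PID in the lock.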

This normally works, but the killed PID can get recycled, so that the lock now points to the PID of a process which the maas user isn't allowed to kill. A PermissionError is then raised; the lock file implementation doesn't handle this case, and the networking monitoring service can never start.
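One possible way to handle this case is to treat "operation not permitted" as evidence that a process with that PID exists, instead of letting the exception escape. The sketch below is an assumption about how that could look, not a proposed patch: `pid_is_alive` and its `kill` parameter are hypothetical names, and the `kill` parameter exists only so that the check can be exercised without a second user account.

```python
import os


def pid_is_alive(pid, kill=os.kill):
    """Return True if a process with this PID appears to exist.

    Hypothetical helper for illustration; not part of Twisted or MAAS.
    """
    try:
        kill(pid, 0)  # signal 0 only checks; nothing is delivered
    except ProcessLookupError:
        return False  # ESRCH: no such process, so the lock is stale
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True
```

With a check like this the service would no longer crash, but it would still see the lock as held for as long as the unrelated process keeps running. Whether a lock pointing at another user's process should instead be treated as stale is a separate design decision.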

Here is the traceback:

2019-10-31 08:12:13 twisted.scripts: [info] twistd 17.9.0 (/usr/bin/python3 3.6.8) starting up.
2019-10-31 08:12:13 twisted.scripts: [info] reactor class: twisted.internet.asyncioreactor.AsyncioSelectorReactor.
2019-10-31 08:12:14 -: [critical] Unhandled Error

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/task.py", line 194, in start
    self()
  File "/usr/lib/python3/dist-packages/twisted/internet/task.py", line 239, in __call__
    d = defer.maybeDeferred(self.f, *self.a, **self.kw)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1532, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1001, in updateInterfaces
    responsible = self._assumeSoleResponsibility()
  File "/usr/lib/python3/dist-packages/provisioningserver/utils/services.py", line 1077, in _assumeSoleResponsibility
    self._lock.acquire()
  File "/usr/lib/python3/dist-packages/provisioningserver/utils/fs.py", line 467, in acquire
    if not self._fslock.lock():
  File "/usr/lib/python3/dist-packages/twisted/python/lockfile.py", line 185, in lock
    kill(int(pid), 0)
builtins.PermissionError: [Errno 1] Operation not permitted

This can be reproduced locally with steps like the following, which point the lock symlink at the PID of a process owned by another user (here a postgres process, PID 468). It might come in handy when debugging locally:

root@maas-deb:~# service maas-regiond stop
root@maas-deb:~# ls -l /var/run/lock/maas\:networks-monitoring .
lrwxrwxrwx 1 maas maas 3 Oct 31 17:05 /var/run/lock/maas:networks-monitoring -> 259

.:
total 0
root@maas-deb:~# ps aux | grep postgres
postgres 431 0.0 0.1 320496 21308 ? S 16:38 0:00 /usr/lib/postgresql/10/bin/postgres -D /var/lib/postgresql/10/main -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 468 0.0 0.8 320880 140196 ? Ss 16:38 0:01 postgres: 10/main: checkpointer process
postgres 469 0.0 0.8 320496 138396 ? Ss 16:38 0:01 postgres: 10/main: writer process
postgres 470 0.0 0.0 320496 8164 ? Ss 16:38 0:00 postgres: 10/main: wal writer process
postgres 471 0.0 0.0 320932 4964 ? Ss 16:38 0:00 postgres: 10/main: autovacuum launcher process
postgres 472 0.0 0.0 175692 3504 ? Ss 16:38 0:00 postgres: 10/main: stats collector process
postgres 473 0.0 0.0 320788 4644 ? Ss 16:38 0:00 postgres: 10/main: bgworker: logical replication launcher
postgres 2154 0.0 0.1 387244 19444 ? Ss 17:01 0:04 postgres: 10/main: autovacuum worker process maasdb
root 2634 0.0 0.0 14852 820 ? S+ 17:06 0:00 grep --color=auto postgres
root@maas-deb:~# rm /var/run/lock/maas\:networks-monitoring
root@maas-deb:~# ln -s 468 /var/run/lock/maas\:networks-monitoring
root@maas-deb:~# ls -l /var/run/lock/maas\:networks-monitoring .
lrwxrwxrwx 1 root root 3 Oct 31 17:07 /var/run/lock/maas:networks-monitoring -> 468

.:
total 0
root@maas-deb:~# service maas-regiond start

Change History (0)
