Thursday, 1 January 2015

Ceph - Debugging Python Code in gdb

Whilst playing with Ceph on AArch64, we encountered a problem whereby ./vstart.sh -n -d -x stalled on startup.

My first course of action was to get a process list, as lots of things are being executed:
$ pstree -p
[some output snipped]

           ├─screen(6609)─┬─bash(2204)───vstart.sh(23615)───python(23920)─┬─{python}(23934)
           │              │                                               ├─{python}(23941)
           │              │                                               ├─{python}(23942)
           │              │                                               ├─{python}(23943)
           │              │                                               ├─{python}(23944)
           │              │                                               ├─{python}(23945)
           │              │                                               ├─{python}(23946)
           │              │                                               ├─{python}(23947)
           │              │                                               ├─{python}(23948)
           │              │                                               ├─{python}(23965)
           │              │                                               ├─{python}(23966)
           │              │                                               ├─{python}(24063)
           │              │                                               └─{python}(24066)
           │              └─bash(6610)

[more output snipped]

So it looks like we have a hang in the launcher scripts.
Process 23920 is the parent process of the script that's holding up our launch, we have child threads (indicated with braces around the process names), which may also be of interest.

Let's throw a debugger at the main python process and dump a backtrace of all the threads:
$ gdb -p 23920
[...]
warning: Could not load shared library symbols for .libs/librados.so.2.
Do you need "set solib-search-path" or "set sysroot"?
[...]
(gdb) set solib-search-path /home/steve.capper/ceph/src/.libs/
Reading symbols from /home/steve.capper/ceph/src/.libs/librados.so.2...done.
Loaded symbols for /home/steve.capper/ceph/src/.libs/librados.so.2
(gdb) set pagination off
(gdb) thread apply all backtrace 10
Thread 14 (Thread 0x3ff7b23f1e0 (LWP 23934)):
#0  0x000003ff85a6c758 in __pthread_cond_wait (cond=0x3ff74003810, mutex=0x3ff74003780) at pthread_cond_wait.c:188
#1  0x000003ff7c3adcdc in ceph::log::Log::entry (this=0x3ff74003750) at log/Log.cc:345
#2  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74003750) at common/Thread.cc:61
#3  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#4  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 13 (Thread 0x3ff7a88f1e0 (LWP 23941)):
#0  0x000003ff85a6cb34 in __pthread_cond_timedwait (cond=0x3ff740216f8, mutex=0x3ff740216a0, abstime=0x3ff7a88e9a0) at pthread_cond_timedwait.c:199
#1  0x000003ff7c198b94 in WaitUntil (when=..., mutex=..., this=0x3ff740216f0) at ./common/Cond.h:71
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x3ff740216f0) at ./common/Cond.h:79
#3  CephContextServiceThread::entry (this=0x3ff74021670) at common/ceph_context.cc:58
#4  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74021670) at common/Thread.cc:61
#5  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#6  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 12 (Thread 0x3ff7a08f1e0 (LWP 23942)):
#0  0x000003ff8587346c in __GI___poll (fds=0x3ff7a08e890, nfds=4395833629524, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/generic/poll.c:43
#1  0x000003ff7c16358c in AdminSocket::entry (this=0x3ff74003e90) at common/admin_socket.cc:239
#2  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74003e90) at common/Thread.cc:61
#3  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#4  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 11 (Thread 0x3ff7988f1e0 (LWP 23943)):
#0  0x000003ff85a6cb34 in __pthread_cond_timedwait (cond=0x3ff740244b8, mutex=0x3ff74024450, abstime=0x3ff7988e890) at pthread_cond_timedwait.c:199
#1  0x000003ff7c17976c in WaitUntil (when=..., mutex=..., this=0x3ff740244b0) at common/Cond.h:71
#2  SafeTimer::timer_thread (this=0x3ff740244a0) at common/Timer.cc:118
#3  0x000003ff7c17a0d8 in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#4  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff740271e0) at common/Thread.cc:61
#5  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#6  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 10 (Thread 0x3ff7908f1e0 (LWP 23944)):
#0  0x000003ff85a6c758 in __pthread_cond_wait (cond=0x3ff74023a68, mutex=0x3ff74023a10) at pthread_cond_wait.c:188
#1  0x000003ff7c2a77d8 in Wait (mutex=..., this=0x3ff74023a60) at ./common/Cond.h:55
#2  DispatchQueue::entry (this=0x3ff740239f0) at msg/simple/DispatchQueue.cc:196
#3  0x000003ff7c31876c in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:103
#4  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74023b80) at common/Thread.cc:61
#5  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#6  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 9 (Thread 0x3ff7888f1e0 (LWP 23945)):
#0  0x000003ff85a6c758 in __pthread_cond_wait (cond=0x3ff74023c10, mutex=0x3ff74023bb8) at pthread_cond_wait.c:188
#1  0x000003ff7c2a8eec in Wait (mutex=..., this=0x3ff74023c08) at ./common/Cond.h:55
#2  DispatchQueue::run_local_delivery (this=0x3ff740239f0) at msg/simple/DispatchQueue.cc:114
#3  0x000003ff7c318860 in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:117
#4  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74023c60) at common/Thread.cc:61
#5  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#6  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 8 (Thread 0x3ff73fff1e0 (LWP 23946)):
#0  0x000003ff85a6c758 in __pthread_cond_wait (cond=0x3ff74023e88, mutex=0x3ff74023cd0) at pthread_cond_wait.c:188
#1  0x000003ff7c3119e8 in Wait (mutex=..., this=0x3ff74023e80) at ./common/Cond.h:55
#2  SimpleMessenger::reaper_entry (this=0x3ff74023830) at msg/simple/SimpleMessenger.cc:211
#3  0x000003ff7c318c24 in SimpleMessenger::ReaperThread::entry (this=<optimized out>) at msg/simple/SimpleMessenger.h:199
#4  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74023c90) at common/Thread.cc:61
#5  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#6  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 7 (Thread 0x3ff737ff1e0 (LWP 23947)):
#0  0x000003ff85a6cb34 in __pthread_cond_timedwait (cond=0x3ff74006810, mutex=0x3ff740067a8, abstime=0x3ff737fe890) at pthread_cond_timedwait.c:199
#1  0x000003ff7c17976c in WaitUntil (when=..., mutex=..., this=0x3ff74006808) at common/Cond.h:71
#2  SafeTimer::timer_thread (this=0x3ff740067f8) at common/Timer.cc:118
#3  0x000003ff7c17a0d8 in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#4  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74027fb0) at common/Thread.cc:61
#5  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#6  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 6 (Thread 0x3ff72fff1e0 (LWP 23948)):
#0  0x000003ff85a6c758 in __pthread_cond_wait (cond=0x3ff74006930, mutex=0x3ff740068d8) at pthread_cond_wait.c:188
#1  0x000003ff7c17ae10 in Wait (mutex=..., this=0x3ff74006928) at ./common/Cond.h:55
#2  Finisher::finisher_thread_entry (this=0x3ff740068c0) at common/Finisher.cc:81
#3  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff740069e0) at common/Thread.cc:61
#4  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#5  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 5 (Thread 0x3ff724ff1e0 (LWP 23965)):
#0  0x000003ff85a6c758 in __pthread_cond_wait (cond=0x3ff74006c90, mutex=0x3ff74006be8) at pthread_cond_wait.c:188
#1  0x000003ff7c179448 in Wait (mutex=..., this=0x3ff74006c88) at common/Cond.h:55
#2  SafeTimer::timer_thread (this=0x3ff74006c78) at common/Timer.cc:116
#3  0x000003ff7c17a0d8 in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#4  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74022fe0) at common/Thread.cc:61
#5  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#6  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 4 (Thread 0x3ff71cff1e0 (LWP 23966)):
#0  0x000003ff85a6c758 in __pthread_cond_wait (cond=0x3ff74006dd8, mutex=0x3ff74006d80) at pthread_cond_wait.c:188
#1  0x000003ff7c17ae10 in Wait (mutex=..., this=0x3ff74006dd0) at ./common/Cond.h:55
#2  Finisher::finisher_thread_entry (this=0x3ff74006d68) at common/Finisher.cc:81
#3  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff74006e88) at common/Thread.cc:61
#4  0x000003ff85a66f30 in start_thread (arg=0x3ff7ba3f8d0) at pthread_create.c:312
#5  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 3 (Thread 0x3ff725ff1e0 (LWP 24063)):
#0  0x000003ff85a6c758 in __pthread_cond_wait (cond=0x3ff60004f08, mutex=0x3ff60004e30) at pthread_cond_wait.c:188
#1  0x000003ff7c3049c4 in Wait (mutex=..., this=0x3ff60004f00) at ./common/Cond.h:55
#2  Pipe::writer (this=0x3ff60000cd0) at msg/simple/Pipe.cc:1821
#3  0x000003ff7c30df4c in Pipe::Writer::entry (this=<optimized out>) at msg/simple/Pipe.h:62
#4  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff60000d10) at common/Thread.cc:61
#5  0x000003ff85a66f30 in start_thread (arg=0x3ff737ff8d0) at pthread_create.c:312
#6  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 2 (Thread 0x3ff726ff1e0 (LWP 24066)):
#0  0x000003ff8587346c in __GI___poll (fds=0x3ff726fe648, nfds=4395835036820, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/generic/poll.c:43
#1  0x000003ff7c2f8e5c in Pipe::tcp_read_wait (this=this@entry=0x3ff60000cd0) at msg/simple/Pipe.cc:2377
#2  0x000003ff7c2f9494 in Pipe::tcp_read (this=this@entry=0x3ff60000cd0, buf=buf@entry=0x3ff726fe86f "\377(\025", len=len@entry=1) at msg/simple/Pipe.cc:2347
#3  0x000003ff7c30a830 in Pipe::reader (this=0x3ff60000cd0) at msg/simple/Pipe.cc:1497
#4  0x000003ff7c30e5a4 in Pipe::Reader::entry (this=<optimized out>) at msg/simple/Pipe.h:50
#5  0x000003ff7c1a1b54 in Thread::entry_wrapper (this=0x3ff60000ce8) at common/Thread.cc:61
#6  0x000003ff85a66f30 in start_thread (arg=0x3ff725ff8d0) at pthread_create.c:312
#7  0x000003ff8587d910 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89

Thread 1 (Thread 0x3ff85ce4ad0 (LWP 23920)):
#0  0x000003ff85a6ebc8 in do_futex_wait (isem=isem@entry=0x260dfe70) at sem_wait.c:49
#1  0x000003ff85a6ec8c in __new_sem_wait (sem=0x260dfe70) at sem_wait.c:70
#2  0x000003ff85baf124 in PyThread_acquire_lock (lock=0x260dfe70, waitflag=1) at /usr/src/debug/Python-2.7.8/Python/thread_pthread.h:324
#3  0x000003ff85bb2f94 in lock_PyThread_acquire_lock (self=0x3ff85733eb0, args=<optimized out>) at /usr/src/debug/Python-2.7.8/Modules/threadmodule.c:52
#4  0x000003ff85b250e0 in PyCFunction_Call (func=func@entry=<built-in method acquire of thread.lock object at remote 0x3ff85733eb0>,
    arg=<optimized out>, arg@entry=(), kw=kw@entry=0x0) at /usr/src/debug/Python-2.7.8/Objects/methodobject.c:101
#5  0x000003ff85b84188 in call_function (oparg=<optimized out>, pp_stack=0x3ffc820e110) at /usr/src/debug/Python-2.7.8/Python/ceval.c:4110
#6  PyEval_EvalFrameEx (f=f@entry=Frame 0x261cc180, for file /usr/lib64/python2.7/threading.py, line 339, in wait (self=<_Condition(_Verbose__verbose=False,
    _Condition__lock=<thread.lock at remote 0x3ff85733e50>, acquire=<built-in method acquire of thread.lock object at remote 0x3ff85733e50>,
    _Condition__waiters=[<thread.lock at remote 0x3ff85733eb0>], release=<built-in method release of thread.lock object at remote 0x3ff85733e50>)
    at remote 0x3ff7e423190>, timeout=None, balancing=True, waiter=<thread.lock at remote 0x3ff85733eb0>, saved_state=None), throwflag=<optimized out>)
    at /usr/src/debug/Python-2.7.8/Python/ceval.c:2753
#7  0x000003ff85b84fcc in PyEval_EvalCodeEx (co=0x3ff856da9b0, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>,
    argcount=argcount@entry=3, kws=0x3ff7e3c93d0, kwcount=<optimized out>, defs=0x3ff7ec5d410, defcount=2, closure=0x0)
    at /usr/src/debug/Python-2.7.8/Python/ceval.c:3342
#8  0x000003ff85b83f70 in fast_function (nk=<optimized out>, na=3, n=<optimized out>, pp_stack=0x3ffc820e330, func=<optimized out>)
    at /usr/src/debug/Python-2.7.8/Python/ceval.c:4206
#9  call_function (oparg=<optimized out>, pp_stack=0x3ffc820e330) at /usr/src/debug/Python-2.7.8/Python/ceval.c:4131
(More stack frames follow...)

We have one thread executing python code, and a series of service style threads. We can see from the backtrace of thread #1, that some python code is attempting to acquire a lock and waiting.
Thankfully, the python folk have quite a nice set of gdb extensions (https://docs.python.org/devguide/gdb.html) which we can use to delve deeper. I find the py-bt, py-up, py-down and py-list commands to be particularly useful, and that is what I used to delve deeper. Let's take a python backtrace (I've snipped out most of the arguments, the below is simplified).
(gdb) py-bt
#6 Frame 0x261cc180, for file /usr/lib64/python2.7/threading.py, line 339, in wait
#10 Frame 0x3ff7e3c9228, for file /usr/lib64/python2.7/threading.py, line 623, in wait
#14 Frame 0x3ff7e437b90, for file /usr/lib64/python2.7/threading.py, line 753, in start 
#17 Frame 0x3ff7e3c8248, for file /home/steve.capper/ceph/src/pybind/rados.py, line 158, in run_in_thread
#21 Frame 0x3ff7e3cb590, for file /home/steve.capper/ceph/src/pybind/rados.py, line 256, in shutdown
#24 Frame 0x3ff7ec648c0, for file /home/steve.capper/ceph/src/pybind/rados.py, line 268, in __del__ 
So we appear to be hanging in python threading code, let's navigate up to the last Ceph code that gets executed:
(gdb) py-up
#10 Frame 0x3ff7e3c9228, for file /usr/lib64/python2.7/threading.py, line 623, in wait [...]
(gdb) py-up
#14 Frame 0x3ff7e437b90, for file /usr/lib64/python2.7/threading.py, line 753, in start [...]
(gdb) py-up
#17 Frame 0x3ff7e3c8248, for file /home/steve.capper/ceph/src/pybind/rados.py, line 158, in run_in_thread [...]
(gdb) py-list
 153        # allow the main thread to exit (presumably, avoid a join() on this
 154        # subthread) before this thread terminates.  This allows SIGINT
 155        # exit of a blocked call.  See below.
 156        t.daemon = True
 157    
>158        t.start()
 159        try:
 160            # poll for thread exit
 161            while t.is_alive():
 162                t.join(POLL_TIME_INCR)
 163                if timeout and t.is_alive():
Because, I'm too lazy to leave gdb, let's expand our python list within gdb:
(gdb) py-list 134,160
 134    
 135    class RadosThread(threading.Thread):
 136        def __init__(self, target, args=None):
 137            self.args = args
 138            self.target = target
 139            threading.Thread.__init__(self)
 140    
 141        def run(self):
 142            self.retval = self.target(*self.args)
 143    
 144    # time in seconds between each call to t.join() for child thread
 145    POLL_TIME_INCR = 0.5
 146    
 147    def run_in_thread(target, args, timeout=0):
 148        interrupt = False
 149    
 150        countdown = timeout
 151        t = RadosThread(target, args)
 152    
 153        # allow the main thread to exit (presumably, avoid a join() on this
 154        # subthread) before this thread terminates.  This allows SIGINT
 155        # exit of a blocked call.  See below.
 156        t.daemon = True
 157    
>158        t.start()
 159        try:
 160            # poll for thread exit
The code in rados.py looked okay to me, let's focus on threading.py.
(gdb) py-down
#14 Frame 0x3ff7e437b90, for file /usr/lib64/python2.7/threading.py, line 753, in start [...]
(gdb) py-list
 748                _start_new_thread(self.__bootstrap, ())
 749            except Exception:
 750                with _active_limbo_lock:
 751                    del _limbo[self]
 752                raise
>753            self.__started.wait()
 754    
 755        def run(self):
 756            """Method representing the thread's activity.
 757    
 758            You may override this method in a subclass. The standard run() method
(gdb) py-down
#10 Frame 0x3ff7e3c9228, for file /usr/lib64/python2.7/threading.py, line 623, in wait [...]
(gdb) py-list
 618    
 619            """
 620            self.__cond.acquire()
 621            try:
 622                if not self.__flag:
>623                    self.__cond.wait(timeout, balancing)
 624                return self.__flag
 625            finally:
 626                self.__cond.release()
 627    
 628    # Helper to generate new thread names
(gdb) py-down
#6 Frame 0x261cc180, for file /usr/lib64/python2.7/threading.py, line 339, in wait [...]
(gdb) py-list
 334            waiter.acquire()
 335            self.__waiters.append(waiter)
 336            saved_state = self._release_save()
 337            try:    # restore state no matter what (e.g., KeyboardInterrupt)
 338                if timeout is None:
>339                    waiter.acquire()
 340                    if __debug__:
 341                        self._note("%s.wait(): got it", self)
 342                else:
 343                    # Balancing act:  We can't afford a pure busy loop, so we
 344                    # have to sleep; but if we sleep the whole timeout time,
Okay, so the Python code is waiting for some built-in code, let's go back to C.
(gdb) down
#5  0x000003ff85b84188 in call_function (oparg=<optimized out>, pp_stack=0x3ffc820e110) at /usr/src/debug/Python-2.7.8/Python/ceval.c:4110
4110             C_TRACE(x, PyCFunction_Call(func,callargs,NULL));
(gdb) down
#4  0x000003ff85b250e0 in PyCFunction_Call (func=func@entry=<built-in method acquire of thread.lock object at remote 0x3ff85733eb0>,
    arg=<optimized out>, arg@entry=(), kw=kw@entry=0x0)
    at /usr/src/debug/Python-2.7.8/Objects/methodobject.c:101
101                 return (*meth)(self, PyTuple_GET_ITEM(arg, 0));
(gdb) down
#3  0x000003ff85bb2f94 in lock_PyThread_acquire_lock (self=0x3ff85733eb0, args=<optimized out>) at /usr/src/debug/Python-2.7.8/Modules/threadmodule.c:52
52     i = PyThread_acquire_lock(self->lock_lock, i);
(gdb) down
#2  0x000003ff85baf124 in PyThread_acquire_lock (lock=0x260dfe70, waitflag=1) at /usr/src/debug/Python-2.7.8/Python/thread_pthread.h:324
324             status = fix_status(sem_wait(thelock));
(gdb) down
#1  0x000003ff85a6ec8c in __new_sem_wait (sem=0x260dfe70) at sem_wait.c:70
70       err = do_futex_wait(isem);
(gdb) p isem
$1 = (struct new_sem *) 0x260dfe70
(gdb) p *isem
$2 = {value = 0, private = 128, nwaiters = 1} 
So we have a semaphore which is being waited on by only a single waiter and has a value of zero. It looked to me like the semaphore either had a spurious additional waiter or was incorrectly initialised with a zero value.

At this point, I was confident that we were dealing with a python library issue rather than a Ceph issue. Indeed, running the same Ceph code on a different distro (with a different python version), gave no stalls on vstart. Performing some searching led me to the following:
http://bugs.python.org/issue21963

This issue has been fixed in two places:
  1. The Ceph source code was changed to not spawn threads on Python shutdown. (Commit 5ba9b8f rados.py: remove Rados.__del__(); it just causes problems).
  2. The Python code was changed s.t. spawning a thread on shutdown didn't result in a hang. This change made it into 2.7.9 (2.7.8 was affected).