mariadb / gdb / debugging shutdown deadlock / part 2
We were looking at a core dump of a MariaDB instance that was in a
deadlocked state. It was killed with SIGABRT. We would now like to get
MySQL thread info from it. Python support in gdb is here to help.
To recap: we were dissecting the core dump with gdb
(gdb /usr/sbin/mariadbd core.208161) and found that it was waiting
for one of the threads to stop before shutting down completely. We now
want to know what exactly it was waiting for.
It's too late to get a SHOW PROCESSLIST now that the process is dead.
But we might be able to fetch the data from the server_threads structure:
(gdb) print ((THD*)server_threads->threads->first)->proc_info
$7 = 0x559aca9152dc "Freeing items"
(gdb) print ((THD*)server_threads->threads->first->next)->proc_info
$8 = 0x559aca922c02 "committing"
(gdb) print ((THD*)server_threads->threads->first->next->next)->proc_info
$9 = 0x559acaac848a "wsrep applier committed"
Threads
There aren't terribly many threads:
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f70061c3840 (LWP 208161) 0x00007f700639c3db in kill
2 Thread 0x7f635435b700 (LWP 208827) futex_wait_cancelable
3 Thread 0x7f5109852700 (LWP 208868) 0x00007f700643623f in __GI___clock_nanosleep
4 Thread 0x7f4fe83de700 (LWP 279027) futex_abstimed_wait_cancelable
5 Thread 0x7f6ffe06c700 (LWP 208425) futex_abstimed_wait_cancelable
6 Thread 0x7f63543f1700 (LWP 208799) 0x00007f700639d1a2 in __GI___sigtimedwait
7 Thread 0x7f63553fd700 (LWP 208750) futex_wait_cancelable
8 Thread 0x7f6357fff700 (LWP 208749) futex_abstimed_wait_cancelable
9 Thread 0x7f6ffdc17700 (LWP 208426) futex_wait_cancelable
10 Thread 0x7f6ffc492700 (LWP 208430) futex_wait_cancelable
11 Thread 0x7f6ffc09f700 (LWP 208790) futex_wait_cancelable
12 Thread 0x7f63543a6700 (LWP 208826) futex_wait_cancelable
13 Thread 0x7f63540b8700 (LWP 208836) futex_wait_cancelable
14 Thread 0x7f6ffc054700 (LWP 208791) futex_wait_cancelable
15 Thread 0x7f510922b700 (LWP 208889) futex_wait_cancelable
16 Thread 0x7f6340436700 (LWP 208841) futex_wait_cancelable
17 Thread 0x7f4f7d31f700 (LWP 1633036) futex_abstimed_wait_cancelable
18 Thread 0x7f6ffcc93700 (LWP 1642065) futex_abstimed_wait_cancelable
But this list does not show what each thread is doing. A
thread apply all bt would have that info, but it is very verbose and not
necessarily helpful.
Let's first pursue the server_threads. But not by manually adding
->next for each thread. There must be a better way.
Enter Python
gdb comes with Python integration, and it's straightforward enough
to use: type python, then the code, then finish with end or EOF (press ^D).
(gdb) python
>print(gdb.parse_and_eval('server_threads->threads->first'))
>end
0x7f5080046088
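parse_and_eval() returns a gdb.Value, which can be cast and have its fields
read just like in a regular gdb expression. A quick check, reproducing the
manual print from earlier:

thd_type = gdb.lookup_type('THD').pointer()
first = gdb.parse_and_eval('server_threads->threads->first')
print(first.cast(thd_type)['proc_info'])  # 0x559aca9152dc "Freeing items"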
Use the following Python script:
thd_type = gdb.lookup_type('THD').pointer()
it = gdb.parse_and_eval('server_threads->threads->first')
last = gdb.parse_and_eval('server_threads->threads->last')
while it != last.address:
    thdp = it.cast(thd_type)
    print(it, thdp['proc_info'], thdp['query_string']['string']['str'])
    it = it['next']
Invoking that looks like this:
(gdb) python
>thd_type = gdb.lookup_type('THD').pointer()
>it = gdb.parse_and_eval('server_threads->threads->first')
>last = gdb.parse_and_eval('server_threads->threads->last')
>while it != last.address:
> thdp = it.cast(thd_type)
> print(it, thdp['proc_info'], thdp['query_string']['string']['str'])
> it = it['next']
>end
0x7f5080046088 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f4fc4000c58 0x559aca922c02 "committing" 0x0
0x7f5030000c58 0x559acaac848a "wsrep applier committed" 0x0
0x7f635c002228 0x559acaac848a "wsrep applier committed" 0x0
0x7f5098000c58 0x559aca922c02 "committing" 0x0
0x7f50b8000c58 0x559aca922c02 "committing" 0x0
0x7f50bc000c58 0x559aca922c02 "committing" 0x0
0x7f6360000c58 0x559acaacd584 "wsrep aborter idle" 0x0
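Typing the loop interactively every time gets tedious. The same code can
also live in a file and be registered as a user-defined command; a minimal
sketch, where the file name processlist.py and the command name
mdb-processlist are made up here:

import gdb

class MdbProcesslist(gdb.Command):
    """Walk server_threads and print a poor man's SHOW PROCESSLIST."""

    def __init__(self):
        super().__init__('mdb-processlist', gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        thd_type = gdb.lookup_type('THD').pointer()
        it = gdb.parse_and_eval('server_threads->threads->first')
        last = gdb.parse_and_eval('server_threads->threads->last')
        while it != last.address:
            thdp = it.cast(thd_type)
            print(it, thdp['proc_info'],
                  thdp['query_string']['string']['str'])
            it = it['next']

MdbProcesslist()

After a (gdb) source processlist.py, a plain mdb-processlist prints the
same list.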
No mapping to the info threads list yet. But after some digging in various
(sub)structures, I found thread in the lock_info struct:
(gdb) x/((THD*)server_threads->threads->first)->lock_info.thread
0x7f4fe83de700: -398596352
The improved code:
thd_type = gdb.lookup_type('THD').pointer()
it = gdb.parse_and_eval('server_threads->threads->first')
last = gdb.parse_and_eval('server_threads->threads->last')
while it != last.address:
    thdp = it.cast(thd_type)
    print(it, 'thread', hex(thdp['lock_info']['thread']),
          thdp['proc_info'], thdp['query_string']['string']['str'])
    it = it['next']
0x7f5080046088 thread 0x7f4fe83de700 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f4fc4000c58 thread 0x7f510922b700 0x559aca922c02 "committing" 0x0
0x7f5030000c58 thread 0x7f5109852700 0x559acaac848a "wsrep applier committed" 0x0
0x7f635c002228 thread 0x7f6340436700 0x559acaac848a "wsrep applier committed" 0x0
0x7f5098000c58 thread 0x7f63540b8700 0x559aca922c02 "committing" 0x0
0x7f50b8000c58 thread 0x7f635435b700 0x559aca922c02 "committing" 0x0
0x7f50bc000c58 thread 0x7f63543a6700 0x559aca922c02 "committing" 0x0
0x7f6360000c58 thread 0x7f6ffc492700 0x559acaacd584 "wsrep aborter idle" 0x0
Yes. These thread addresses can be mapped to the info threads list above
(manually annotated and sorted):
0x7f50b8000c58 thread [ 2] 0x559aca922c02 "committing" 0x0
0x7f5030000c58 thread [ 3] 0x559acaac848a "wsrep applier committed" 0x0
0x7f5080046088 thread [ 4] 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f6360000c58 thread [10] 0x559acaacd584 "wsrep aborter idle" 0x0
0x7f50bc000c58 thread [12] 0x559aca922c02 "committing" 0x0
0x7f5098000c58 thread [13] 0x559aca922c02 "committing" 0x0
0x7f4fc4000c58 thread [15] 0x559aca922c02 "committing" 0x0
0x7f635c002228 thread [16] 0x559acaac848a "wsrep applier committed" 0x0
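The manual annotation can be scripted too. Newer gdb (8.3+) exposes each
thread's handle, which on glibc/NPTL is the pthread_t address that
info threads shows and that lock_info.thread contains. A hedged sketch
that builds the mapping automatically; the handle-to-pthread_t assumption
is mine, so verify it on your build:

handle_to_num = {}
for t in gdb.selected_inferior().threads():
    # InferiorThread.handle() returns the raw thread handle as bytes;
    # on Linux/NPTL this is the pthread_t, i.e. the 0x7f... address.
    handle_to_num[int.from_bytes(t.handle(), 'little')] = t.num

thd_type = gdb.lookup_type('THD').pointer()
it = gdb.parse_and_eval('server_threads->threads->first')
last = gdb.parse_and_eval('server_threads->threads->last')
while it != last.address:
    thdp = it.cast(thd_type)
    gdb_num = handle_to_num.get(int(thdp['lock_info']['thread']), '?')
    print(it, 'gdb thread', gdb_num, thdp['proc_info'],
          thdp['query_string']['string']['str'])
    it = it['next']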
Deadlock?
Back to the task at hand. Now that we found something akin to
SHOW PROCESSLIST, it looks like Thread 4 could use some closer inspection.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f4fe83de700 (LWP 279027))]
#0 futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f4fe83dcd80, clockid=<optimized out>, expected=0, futex_word=0x7f50080b05c8) at ../sysdeps/nptl/futex-internal.h:320
320 ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0 futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f4fe83dcd80, clockid=<optimized out>, expected=0, futex_word=0x7f50080b05c8) at ../sysdeps/nptl/futex-internal.h:320
#1 __pthread_cond_wait_common (abstime=0x7f4fe83dcd80, clockid=<optimized out>, mutex=0x559acbbc4dd8, cond=0x7f50080b05a0) at pthread_cond_wait.c:520
#2 __pthread_cond_timedwait (cond=cond@entry=0x7f50080b05a0, mutex=mutex@entry=0x559acbbc4dd8, abstime=0x7f4fe83dcd80) at pthread_cond_wait.c:665
#3 0x00007f6ffde2e673 in gu_cond_timedwait_SYS (ts=0x7f4fe83dcd80, mutex=0x559acbbc4dd8, cond=<optimized out>) at ./galerautils/src/gu_threads.h:264
#4 gu::Lock::wait (this=<synthetic pointer>, date=<synthetic pointer>..., cond=...) at ./galerautils/src/gu_lock.hpp:64
#5 galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::wait (wait_until=<synthetic pointer>..., gtid=..., this=0x559acbbc4dd8) at ./galera/src/monitor.hpp:399
#6 galera::ReplicatorSMM::sync_wait (this=0x559acbbc3b30, upto=<optimized out>, tout=<optimized out>, gtid=0x0) at ./galera/src/replicator_smm.cpp:1605
#7 0x00007f6ffde0b514 in galera_sync_wait (wsrep=<optimized out>, upto=<optimized out>, tout=<optimized out>, gtid=<optimized out>) at ./galera/src/wsrep_provider.cpp:503
#8 0x0000559aca90406e in wsrep::wsrep_provider_v26::wait_for_gtid (this=<optimized out>, gtid=..., timeout=<optimized out>) at ./wsrep-lib/src/wsrep_provider_v26.cpp:1056
#9 0x0000559aca6049e6 in Wsrep_server_service::set_position (this=<optimized out>, c=..., gtid=...) at ./wsrep-lib/include/wsrep/server_state.hpp:320
#10 0x0000559aca8f9753 in wsrep::transaction::release_commit_order (this=0x7f508004c5c8, lock=...) at ./wsrep-lib/include/wsrep/provider.hpp:142
#11 0x0000559aca901c6f in wsrep::transaction::after_statement (this=this@entry=0x7f508004c5c8) at ./wsrep-lib/src/transaction.cpp:895
#12 0x0000559aca8e7cb2 in wsrep::client_state::after_statement (this=this@entry=0x7f508004c560) at ./wsrep-lib/src/client_state.cpp:287
#13 0x0000559aca0d9a08 in wsrep_after_statement (thd=0x7f5080046088) at ./sql/sql_class.h:5454
#14 wsrep_after_statement (thd=0x7f5080046088) at ./sql/wsrep_trans_observer.h:443
#15 wsrep_mysql_parse (thd=0x7f5080046088, rawbuf=0x7f5080055c10 "COMMIT", length=6, parser_state=0x7f4fe83dd3e0) at ./sql/sql_parse.cc:7878
#16 0x0000559aca0e7fee in dispatch_command (command=COM_QUERY, thd=0x7f5080046088, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at ./sql/sql_class.h:1388
#17 0x0000559aca0e899e in do_command (thd=0x7f5080046088, blocking=blocking@entry=true) at ./sql/sql_parse.cc:1409
#18 0x0000559aca1fd617 in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at ./sql/sql_connect.cc:1416
...
Unfortunately Thread 4 is not waiting on a lock that we can trace back to an owner:
(gdb) ptype pthread_mutex_t
type = union {
struct __pthread_mutex_s __data;
char __size[40];
long __align;
}
(gdb) print *(pthread_mutex_t*)0x559acbbc4dd8
$10 = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
(gdb) print *(__pthread_mutex_s*)0x559acbbc4dd8
$11 = {__lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
No owner. And the pthread_cond_t argument doesn't help us either.
Apparently this thread is waiting for someone to notify
cond=0x7f50080b05a0.
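For completeness: had the mutex been held, glibc stores the holder's LWP
(TID) in __owner, which maps straight back to the LWP column of
info threads. A throwaway sketch of that lookup, useless here because the
owner is 0:

owner = int(gdb.parse_and_eval(
    '((__pthread_mutex_s *)0x559acbbc4dd8)->__owner'))
if owner == 0:
    print('mutex not held')            # our case
else:
    # `thread find` matches the "LWP <n>" part of the target id
    gdb.execute('thread find %d' % owner)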
A thread apply all bt full does not reveal 0x7f50080b05a0 anywhere in the
locals of the other threads' stacks. It appears that it is waiting in this bit:
wsrep_status_t galera::ReplicatorSMM::sync_wait(wsrep_gtid_t* upto,
                                                int tout,
                                                wsrep_gtid_t* gtid)
{
    gu::GTID wait_gtid;
    // ...
    // Note: Since wsrep API 26 application may request release of
    // commit monitor before the commit actually happens (commit
    // may have been ordered/queued on application side for later
    // processing). Therefore we now rely on apply_monitor on sync
    // wait. This is sufficient since apply_monitor is always released
    // only after the whole transaction is over.
    apply_monitor_.wait(wait_gtid, wait_until); // <-- HERE

...which ends up in galera::Monitor<ApplyOrder>::wait() (frame #5, monitor.hpp):

void wait(gu::GTID& gtid, const gu::datetime::Date& wait_until)
{
    gu::Lock lock(mutex_);
    if (gtid.uuid() != uuid_)
    {
        throw gu::NotFound();
    }
    while (last_left_ < gtid.seqno())
    {
        size_t idx(indexof(gtid.seqno()));
        lock.wait(process_[idx].wait_cond_, wait_until);
    }
}
Debug info here is scarce:
(gdb) up 5
#5 galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::wait (wait_until=<synthetic pointer>..., gtid=..., this=0x559acbbc4dd8) at ./galera/src/monitor.hpp:399
(gdb) info args
wait_until = <synthetic pointer>: <optimized out>
gtid = @0x7f4fe83dcd90: {static SEQNO_UNDEFINED = -1, uuid_ = {<gu::UUID_base> = {uuid_ = {data = "s|\344\\\352\370\021\354\217G\a\177S0\226r", alignment = 17010650953019587699}}, <No data fields>}, seqno_ = 223595135}
this = 0x559acbbc4dd8
(gdb) info locals
idx = <optimized out>
cond = <optimized out>
lock = {_vptr.Lock = <optimized out>, mtx_ = @0x559acbbc4dd8}
lock = <optimized out>
idx = <optimized out>
cond = <optimized out>
idx(indexof(gtid.seqno())) can be rewritten as idx = (gtid.seqno_ & 65535),
so the lock/wait is happening on the process_ entry with index 51839:
(gdb) print this->process_[51839]
$12 = {obj_ = 0x7f6354359f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (use count 2, weak count 0) = {get() = 0x7f50080b05a0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
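A quick sanity check that the seqno from the info args output above really
maps to that slot:

# seqno_ = 223595135 (from `info args`); masking with 65535 folds it
# into the 65536-slot ring, giving the index we are looking at.
print(223595135 & 65535)   # -> 51839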
Scanning over all 65536 process_ entries (most are empty, like $13 below)
yields:
(gdb) print this->process_[51836]
$13 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_IDLE}
(gdb) print this->process_[51837]
$14 = {obj_ = 0x7f5109229f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51838]
$15 = {obj_ = 0x7f63540b6f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51839]
$16 = {obj_ = 0x7f6354359f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (use count 2, weak count 0) = {get() = 0x7f50080b05a0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51840]
$17 = {obj_ = 0x7f4fe83dce90, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51841]
$18 = {obj_ = 0x7f63543a4f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51842]
$19 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51843]
$20 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51844]
$21 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (use count 1, weak count 0) = {get() = 0x7f4fb008f520}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51845]
$22 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_IDLE}
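Those entries came from poking at indexes one by one; the scan itself can
be scripted as well. A rough sketch, assuming the selected frame is still
frame #5 (after up 5) so that this resolves to the ApplyOrder monitor; it
simply skips the empty idle slots:

for i in range(65536):
    p = gdb.parse_and_eval('this->process_[%d]' % i)
    state = str(p['state_'])
    # Print everything that is not an empty S_IDLE slot.
    if int(p['obj_']) != 0 or not state.endswith('::S_IDLE'):
        print(i, p['obj_'], p['wait_cond_'], state)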
That 0x7f50080b05a0 is indeed the pthread_cond_t that we're waiting for.
Conclusion
At this point I'm stuck. That last piece of info only confirms what we already know. No obvious reason stands out for why it is waiting on a notification that never arrives.
Searching through the MariaDB JIRA bug tracker did not yield any candidate issues. Luckily this deadlock happened two months ago, and we haven't seen it since. But if we do see it again, we'll at least have this post as a starting point.