mariadb / gdb / debugging shutdown deadlock / part 2
We were looking at a core dump of a MariaDB instance that was in a
deadlocked state. It was killed with SIGABRT. We would now like to get
MySQL thread info from it. Python support in gdb is here to help.
To recap: we were dissecting the core dump with gdb
(gdb /usr/sbin/mariadbd core.208161) and found that it was waiting
for one of the threads to stop before shutting down completely. We now
want to know what exactly it was waiting for.
It's too late to get a SHOW PROCESSLIST now that the process is dead.
But we might be able to fetch the data from the server_threads structure:
(gdb) print ((THD*)server_threads->threads->first)->proc_info
$7 = 0x559aca9152dc "Freeing items"
(gdb) print ((THD*)server_threads->threads->first->next)->proc_info
$8 = 0x559aca922c02 "committing"
(gdb) print ((THD*)server_threads->threads->first->next->next)->proc_info
$9 = 0x559acaac848a "wsrep applier committed"
Threads
There aren't terribly many threads:
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f70061c3840 (LWP 208161) 0x00007f700639c3db in kill
2 Thread 0x7f635435b700 (LWP 208827) futex_wait_cancelable
3 Thread 0x7f5109852700 (LWP 208868) 0x00007f700643623f in __GI___clock_nanosleep
4 Thread 0x7f4fe83de700 (LWP 279027) futex_abstimed_wait_cancelable
5 Thread 0x7f6ffe06c700 (LWP 208425) futex_abstimed_wait_cancelable
6 Thread 0x7f63543f1700 (LWP 208799) 0x00007f700639d1a2 in __GI___sigtimedwait
7 Thread 0x7f63553fd700 (LWP 208750) futex_wait_cancelable
8 Thread 0x7f6357fff700 (LWP 208749) futex_abstimed_wait_cancelable
9 Thread 0x7f6ffdc17700 (LWP 208426) futex_wait_cancelable
10 Thread 0x7f6ffc492700 (LWP 208430) futex_wait_cancelable
11 Thread 0x7f6ffc09f700 (LWP 208790) futex_wait_cancelable
12 Thread 0x7f63543a6700 (LWP 208826) futex_wait_cancelable
13 Thread 0x7f63540b8700 (LWP 208836) futex_wait_cancelable
14 Thread 0x7f6ffc054700 (LWP 208791) futex_wait_cancelable
15 Thread 0x7f510922b700 (LWP 208889) futex_wait_cancelable
16 Thread 0x7f6340436700 (LWP 208841) futex_wait_cancelable
17 Thread 0x7f4f7d31f700 (LWP 1633036) futex_abstimed_wait_cancelable
18 Thread 0x7f6ffcc93700 (LWP 1642065) futex_abstimed_wait_cancelable
But this list does not show what each thread is doing. A
thread apply all bt would have that info, but it is very verbose and not
necessarily helpful.
Let's first pursue the server_threads. But not by manually adding
->next for each thread. There must be a better way.
Enter Python
gdb comes with Python integration, and it's straightforward enough
to use: type python, then the code, then finish with end or EOF (press ^D).
(gdb) python
>print(gdb.parse_and_eval('server_threads->threads->first'))
>end
0x7f5080046088
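parse_and_eval() returns a gdb.Value, which can be cast and have its fields
read just like in a regular gdb expression. A quick check, reproducing the
manual print from earlier:

thd_type = gdb.lookup_type('THD').pointer()
first = gdb.parse_and_eval('server_threads->threads->first')
print(first.cast(thd_type)['proc_info'])  # 0x559aca9152dc "Freeing items"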
Use the following Python script:
thd_type = gdb.lookup_type('THD').pointer()
it = gdb.parse_and_eval('server_threads->threads->first')
last = gdb.parse_and_eval('server_threads->threads->last')
while it != last.address:
    thdp = it.cast(thd_type)
    print(it, thdp['proc_info'], thdp['query_string']['string']['str'])
    it = it['next']
Invoking that looks like this:
(gdb) python
>thd_type = gdb.lookup_type('THD').pointer()
>it = gdb.parse_and_eval('server_threads->threads->first')
>last = gdb.parse_and_eval('server_threads->threads->last')
>while it != last.address:
> thdp = it.cast(thd_type)
> print(it, thdp['proc_info'], thdp['query_string']['string']['str'])
> it = it['next']
>end
0x7f5080046088 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f4fc4000c58 0x559aca922c02 "committing" 0x0
0x7f5030000c58 0x559acaac848a "wsrep applier committed" 0x0
0x7f635c002228 0x559acaac848a "wsrep applier committed" 0x0
0x7f5098000c58 0x559aca922c02 "committing" 0x0
0x7f50b8000c58 0x559aca922c02 "committing" 0x0
0x7f50bc000c58 0x559aca922c02 "committing" 0x0
0x7f6360000c58 0x559acaacd584 "wsrep aborter idle" 0x0
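Typing the loop interactively every time gets tedious. The same code can
also live in a file and be registered as a user-defined command; a minimal
sketch, where the file name processlist.py and the command name
mdb-processlist are made up here:

import gdb

class MdbProcesslist(gdb.Command):
    """Walk server_threads and print a poor man's SHOW PROCESSLIST."""

    def __init__(self):
        super().__init__('mdb-processlist', gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        thd_type = gdb.lookup_type('THD').pointer()
        it = gdb.parse_and_eval('server_threads->threads->first')
        last = gdb.parse_and_eval('server_threads->threads->last')
        while it != last.address:
            thdp = it.cast(thd_type)
            print(it, thdp['proc_info'],
                  thdp['query_string']['string']['str'])
            it = it['next']

MdbProcesslist()

After a (gdb) source processlist.py, a plain mdb-processlist prints the
same list.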
No mapping to the info threads list yet. But after some digging in various
(sub)structures, I found thread in the lock_info struct:
(gdb) x/((THD*)server_threads->threads->first)->lock_info.thread
0x7f4fe83de700: -398596352
The improved code:
thd_type = gdb.lookup_type('THD').pointer()
it = gdb.parse_and_eval('server_threads->threads->first')
last = gdb.parse_and_eval('server_threads->threads->last')
while it != last.address:
    thdp = it.cast(thd_type)
    print(it, 'thread', hex(thdp['lock_info']['thread']),
          thdp['proc_info'], thdp['query_string']['string']['str'])
    it = it['next']
0x7f5080046088 thread 0x7f4fe83de700 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f4fc4000c58 thread 0x7f510922b700 0x559aca922c02 "committing" 0x0
0x7f5030000c58 thread 0x7f5109852700 0x559acaac848a "wsrep applier committed" 0x0
0x7f635c002228 thread 0x7f6340436700 0x559acaac848a "wsrep applier committed" 0x0
0x7f5098000c58 thread 0x7f63540b8700 0x559aca922c02 "committing" 0x0
0x7f50b8000c58 thread 0x7f635435b700 0x559aca922c02 "committing" 0x0
0x7f50bc000c58 thread 0x7f63543a6700 0x559aca922c02 "committing" 0x0
0x7f6360000c58 thread 0x7f6ffc492700 0x559acaacd584 "wsrep aborter idle" 0x0
Yes. These thread addresses can be mapped to the info threads list above
(manually annotated and sorted):
0x7f50b8000c58 thread [ 2] 0x559aca922c02 "committing" 0x0
0x7f5030000c58 thread [ 3] 0x559acaac848a "wsrep applier committed" 0x0
0x7f5080046088 thread [ 4] 0x559aca9152dc "Freeing items" 0x7f5080055c10 "COMMIT"
0x7f6360000c58 thread [10] 0x559acaacd584 "wsrep aborter idle" 0x0
0x7f50bc000c58 thread [12] 0x559aca922c02 "committing" 0x0
0x7f5098000c58 thread [13] 0x559aca922c02 "committing" 0x0
0x7f4fc4000c58 thread [15] 0x559aca922c02 "committing" 0x0
0x7f635c002228 thread [16] 0x559acaac848a "wsrep applier committed" 0x0
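The manual annotation can be scripted too. Newer gdb (8.3+) exposes each
thread's handle, which on glibc/NPTL is the pthread_t address that
info threads shows and that lock_info.thread contains. A hedged sketch
that builds the mapping automatically; the handle-to-pthread_t assumption
is mine, so verify it on your build:

handle_to_num = {}
for t in gdb.selected_inferior().threads():
    # InferiorThread.handle() returns the raw thread handle as bytes;
    # on Linux/NPTL this is the pthread_t, i.e. the 0x7f... address.
    handle_to_num[int.from_bytes(t.handle(), 'little')] = t.num

thd_type = gdb.lookup_type('THD').pointer()
it = gdb.parse_and_eval('server_threads->threads->first')
last = gdb.parse_and_eval('server_threads->threads->last')
while it != last.address:
    thdp = it.cast(thd_type)
    gdb_num = handle_to_num.get(int(thdp['lock_info']['thread']), '?')
    print(it, 'gdb thread', gdb_num, thdp['proc_info'],
          thdp['query_string']['string']['str'])
    it = it['next']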
Deadlock?
Back to the task at hand. Now that we found something akin to
SHOW PROCESSLIST, it looks like Thread 4 could use some closer inspection.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f4fe83de700 (LWP 279027))]
#0 futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f4fe83dcd80, clockid=<optimized out>, expected=0, futex_word=0x7f50080b05c8) at ../sysdeps/nptl/futex-internal.h:320
320 ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0 futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f4fe83dcd80, clockid=<optimized out>, expected=0, futex_word=0x7f50080b05c8) at ../sysdeps/nptl/futex-internal.h:320
#1 __pthread_cond_wait_common (abstime=0x7f4fe83dcd80, clockid=<optimized out>, mutex=0x559acbbc4dd8, cond=0x7f50080b05a0) at pthread_cond_wait.c:520
#2 __pthread_cond_timedwait (cond=cond@entry=0x7f50080b05a0, mutex=mutex@entry=0x559acbbc4dd8, abstime=0x7f4fe83dcd80) at pthread_cond_wait.c:665
#3 0x00007f6ffde2e673 in gu_cond_timedwait_SYS (ts=0x7f4fe83dcd80, mutex=0x559acbbc4dd8, cond=<optimized out>) at ./galerautils/src/gu_threads.h:264
#4 gu::Lock::wait (this=<synthetic pointer>, date=<synthetic pointer>..., cond=...) at ./galerautils/src/gu_lock.hpp:64
#5 galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::wait (wait_until=<synthetic pointer>..., gtid=..., this=0x559acbbc4dd8) at ./galera/src/monitor.hpp:399
#6 galera::ReplicatorSMM::sync_wait (this=0x559acbbc3b30, upto=<optimized out>, tout=<optimized out>, gtid=0x0) at ./galera/src/replicator_smm.cpp:1605
#7 0x00007f6ffde0b514 in galera_sync_wait (wsrep=<optimized out>, upto=<optimized out>, tout=<optimized out>, gtid=<optimized out>) at ./galera/src/wsrep_provider.cpp:503
#8 0x0000559aca90406e in wsrep::wsrep_provider_v26::wait_for_gtid (this=<optimized out>, gtid=..., timeout=<optimized out>) at ./wsrep-lib/src/wsrep_provider_v26.cpp:1056
#9 0x0000559aca6049e6 in Wsrep_server_service::set_position (this=<optimized out>, c=..., gtid=...) at ./wsrep-lib/include/wsrep/server_state.hpp:320
#10 0x0000559aca8f9753 in wsrep::transaction::release_commit_order (this=0x7f508004c5c8, lock=...) at ./wsrep-lib/include/wsrep/provider.hpp:142
#11 0x0000559aca901c6f in wsrep::transaction::after_statement (this=this@entry=0x7f508004c5c8) at ./wsrep-lib/src/transaction.cpp:895
#12 0x0000559aca8e7cb2 in wsrep::client_state::after_statement (this=this@entry=0x7f508004c560) at ./wsrep-lib/src/client_state.cpp:287
#13 0x0000559aca0d9a08 in wsrep_after_statement (thd=0x7f5080046088) at ./sql/sql_class.h:5454
#14 wsrep_after_statement (thd=0x7f5080046088) at ./sql/wsrep_trans_observer.h:443
#15 wsrep_mysql_parse (thd=0x7f5080046088, rawbuf=0x7f5080055c10 "COMMIT", length=6, parser_state=0x7f4fe83dd3e0) at ./sql/sql_parse.cc:7878
#16 0x0000559aca0e7fee in dispatch_command (command=COM_QUERY, thd=0x7f5080046088, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at ./sql/sql_class.h:1388
#17 0x0000559aca0e899e in do_command (thd=0x7f5080046088, blocking=blocking@entry=true) at ./sql/sql_parse.cc:1409
#18 0x0000559aca1fd617 in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at ./sql/sql_connect.cc:1416
...
Unfortunately Thread 4 is not waiting on a lock that we can trace back to an owner:
(gdb) ptype pthread_mutex_t
type = union {
struct __pthread_mutex_s __data;
char __size[40];
long __align;
}
(gdb) print *(pthread_mutex_t*)0x559acbbc4dd8
$10 = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
(gdb) print *(__pthread_mutex_s*)0x559acbbc4dd8
$11 = {__lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
No owner. And the pthread_cond_t argument doesn't help us either.
Apparently this thread is waiting for someone to notify
cond=0x7f50080b05a0.
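For completeness: had the mutex been held, glibc stores the holder's LWP
(TID) in __owner, which maps straight back to the LWP column of
info threads. A throwaway sketch of that lookup, useless here because the
owner is 0:

owner = int(gdb.parse_and_eval(
    '((__pthread_mutex_s *)0x559acbbc4dd8)->__owner'))
if owner == 0:
    print('mutex not held')            # our case
else:
    # `thread find` matches the "LWP <n>" part of the target id
    gdb.execute('thread find %d' % owner)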
A thread apply all bt full does not reveal 0x7f50080b05a0 anywhere in the
locals of the other threads' stacks. It appears that it is waiting in this bit:
wsrep_status_t galera::ReplicatorSMM::sync_wait(wsrep_gtid_t* upto,
                                                int tout,
                                                wsrep_gtid_t* gtid)
{
    gu::GTID wait_gtid;
    // ...
    // Note: Since wsrep API 26 application may request release of
    // commit monitor before the commit actually happens (commit
    // may have been ordered/queued on application side for later
    // processing). Therefore we now rely on apply_monitor on sync
    // wait. This is sufficient since apply_monitor is always released
    // only after the whole transaction is over.
    apply_monitor_.wait(wait_gtid, wait_until); // <-- HERE

...which ends up in galera::Monitor<ApplyOrder>::wait() (frame #5, monitor.hpp):

void wait(gu::GTID& gtid, const gu::datetime::Date& wait_until)
{
    gu::Lock lock(mutex_);
    if (gtid.uuid() != uuid_)
    {
        throw gu::NotFound();
    }
    while (last_left_ < gtid.seqno())
    {
        size_t idx(indexof(gtid.seqno()));
        lock.wait(process_[idx].wait_cond_, wait_until);
    }
}
Debug info here is scarce:
(gdb) up 5
#5 galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::wait (wait_until=<synthetic pointer>..., gtid=..., this=0x559acbbc4dd8) at ./galera/src/monitor.hpp:399
(gdb) info args
wait_until = <synthetic pointer>: <optimized out>
gtid = @0x7f4fe83dcd90: {static SEQNO_UNDEFINED = -1, uuid_ = {<gu::UUID_base> = {uuid_ = {data = "s|\344\\\352\370\021\354\217G\a\177S0\226r", alignment = 17010650953019587699}}, <No data fields>}, seqno_ = 223595135}
this = 0x559acbbc4dd8
(gdb) info locals
idx = <optimized out>
cond = <optimized out>
lock = {_vptr.Lock = <optimized out>, mtx_ = @0x559acbbc4dd8}
lock = <optimized out>
idx = <optimized out>
cond = <optimized out>
idx(indexof(gtid.seqno())) can be rewritten as idx = (gtid.seqno_ & 65535),
so the lock/wait is happening on the process_ entry with index 51839:
(gdb) print this->process_[51839]
$12 = {obj_ = 0x7f6354359f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (use count 2, weak count 0) = {get() = 0x7f50080b05a0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
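A quick sanity check that the seqno from the info args output above really
maps to that slot:

# seqno_ = 223595135 (from `info args`); masking with 65535 folds it
# into the 65536-slot ring, giving the index we are looking at.
print(223595135 & 65535)   # -> 51839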
Scanning over all 65536 process_ entries (most are empty, like $13 below)
yields:
(gdb) print this->process_[51836]
$13 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_IDLE}
(gdb) print this->process_[51837]
$14 = {obj_ = 0x7f5109229f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51838]
$15 = {obj_ = 0x7f63540b6f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51839]
$16 = {obj_ = 0x7f6354359f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (use count 2, weak count 0) = {get() = 0x7f50080b05a0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51840]
$17 = {obj_ = 0x7f4fe83dce90, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51841]
$18 = {obj_ = 0x7f63543a4f70, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_APPLYING}
(gdb) print this->process_[51842]
$19 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51843]
$20 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51844]
$21 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (use count 1, weak count 0) = {get() = 0x7f4fb008f520}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_FINISHED}
(gdb) print this->process_[51845]
$22 = {obj_ = 0x0, cond_ = 0x0, wait_cond_ = std::shared_ptr<gu::Cond> (empty) = {get() = 0x0}, state_ = galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::Process::S_IDLE}
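Those entries came from poking at indexes one by one; the scan itself can
be scripted as well. A rough sketch, assuming the selected frame is still
frame #5 (after up 5) so that this resolves to the ApplyOrder monitor; it
simply skips the empty idle slots:

for i in range(65536):
    p = gdb.parse_and_eval('this->process_[%d]' % i)
    state = str(p['state_'])
    # Print everything that is not an empty S_IDLE slot.
    if int(p['obj_']) != 0 or not state.endswith('::S_IDLE'):
        print(i, p['obj_'], p['wait_cond_'], state)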
That 0x7f50080b05a0 is indeed the pthread_cond_t that we're waiting for.
Conclusion
At this point I'm stuck. That last piece of info only confirms what we already know. No obvious reason stands out for why it is waiting on a notification that never arrives.
Searching through the MariaDB JIRA bug tracker did not yield any candidate issues. Luckily this deadlock happened two months ago, and we haven't seen it since. But if we do see it again, we'll at least have this post as a starting point.