Message ID | 20161023212408.cjqmnzw3547ujzil@thunk.org (mailing list archive) |
---|---|
State | New, archived |
On 10/23/2016 03:24 PM, Theodore Ts'o wrote:
> OK, perhaps this is helpful. I reverted my last patch (so I could
> repro the problem), and then added the following patch to fio/mutex.c
> (see below).
>
> When it hung, I got the following stack trace on the main thread:
>
> (gdb) where
> #0  0x00007ff8c018cf2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> #1  0x00007ff8c01b4fb4 in usleep (useconds=<optimized out>)
>     at ../sysdeps/unix/sysv/linux/usleep.c:32
> #2  0x000000000045b691 in thread_main (data=0x1bc64000) at backend.c:1738
> #3  0x000000000045d623 in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2268
> #4  0x000000000045d95d in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
> #5  0x000000000040cc18 in main (argc=2, argv=0x7fff8446f0d8, envp=<optimized out>) at fio.c:65
>
> ... and it looks like the reason why we can't take the mutex is that
> somehow some thread managed to call stat_exit(), which calls
> fio_mutex_down() followed by fio_mutex_remove():
>
> (gdb) p (void *) stat_mutex->owner
> $4 = (void *) 0x42fd60 <stat_exit+16>
>
> How this could have happened, I have no idea. The only call to
> stat_exit() appears to be at the end of fio_backend(), and the main
> thread is still in the middle of calling thread_main(), which is called
> by run_threads().
>
> Do you have any thoughts? Let me know if there's something I can try.

Wow, that is very odd. That should only be possible if the main thread
exits too early and calls stat_exit() while we still have jobs running.
That should trigger other funkiness too. I have no idea how that is
happening, but it obviously is, since your debugging patch showed that
stat_exit() owns the mutex.

But this gives me more to work from, very useful, thanks Ted! I'll pound
on this tomorrow.
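For reference, the stat_exit() sequence Ted describes boils down to the main
thread taking the stat mutex one final time and then destroying it on its way
out. A minimal sketch, paraphrased from the description in this thread rather
than copied from fio's stat.c:

	/*
	 * Rough sketch of the teardown described above: the main thread
	 * grabs the stat mutex one last time, then tears it down. Any job
	 * that is still alive and later tries to take stat_mutex will wait
	 * on a mutex that has already been destroyed.
	 */
	void stat_exit(void)
	{
		fio_mutex_down(stat_mutex);	/* wait for in-flight stats users */
		fio_mutex_remove(stat_mutex);	/* then destroy the mutex */
	}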
I enabled some more debugging and it's become more clear what's going on.
(See attached for the full log.) The main issue seems to be that once one
of the fio jobs is done, it kills off the other threads (actually, we're
using processes):

process 31848 terminate group_id=0
process 31848 setting terminate on direct_aio/31846
process 31848 setting terminate on direct_aio/31848
process 31848 setting terminate on direct_aio/31849
process 31848 setting terminate on direct_aio/31851
process 31848 setting terminate on aio-dio-verifier/31852
process 31848 setting terminate on buffered-aio-verifier/31854
process 31851 pid=31851: runstate RUNNING -> FINISHING
process 31851 terminate group_id=0
process 31851 setting terminate on direct_aio/31846
process 31851 setting terminate on direct_aio/31848
process 31851 setting terminate on direct_aio/31849
process 31851 setting terminate on direct_aio/31851
process 31851 setting terminate on aio-dio-verifier/31852
process 31851 setting terminate on buffered-aio-verifier/31854
process 31852 pid=31852: runstate RUNNING -> FINISHING
process 31846 pid=31846: runstate RUNNING -> FINISHING

... but one or more of the threads doesn't exit within 60 seconds:

fio: job 'direct_aio' (state=5) hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
process 31794 pid=31849: runstate RUNNING -> REAPED
fio: job 'buffered-aio-verifier' (state=5) hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
process 31794 pid=31854: runstate RUNNING -> REAPED
process 31794 terminate group_id=-1

The main thread then prints all of the statistics and calls stat_exit():

stat_exit called by tid: 31794    <---- debugging message which prints gettid()

Unfortunately, these processes aren't actually killed; they are marked as
reaped, but they are still in the process listing:

root@xfstests:~# ps augxww | grep fio
root      1585  0.0  0.0      0     0 ?        S<   18:45   0:00 [dm_bufio_cache]
root      7191  0.0  0.0  12732  2200 pts/1    S+   23:05   0:00 grep fio
root     31849  1.5  0.2 407208 18876 ?        Ss   22:36   0:26 /root/xfstests/bin/fio /tmp/31503.fio
root     31854  1.2  0.1 398480 10240 ?        Ssl  22:36   0:22 /root/xfstests/bin/fio /tmp/31503.fio

And if you attach to them with gdb, they are spinning trying to grab
stat_mutex, which they can't get because the main thread has already
called stat_exit() and then exited.

So these two processes did eventually return, but only some time after
the 60 seconds had passed, and then they hung waiting for stat_mutex,
which they will never get because the main thread has already called
stat_exit().

This probably also explains why you had trouble reproducing it. It
requires a disk whose performance is variable enough that, under heavy
load, it might take more than 60 seconds for the direct_aio or
buffered-aio-verifier job to close itself out. And I suspect that once
the main thread exited, it also closed out the debugging channel, so the
deadlock detector probably did trip, but we just never saw its output.

So I can imagine some possible fixes. We could make the thread timeout
configurable, and/or increase it from 60 seconds to something like 300
seconds. We could make stat_exit() a no-op --- after all, if the main
thread is exiting, there's no real point in downing and then destroying
stat_mutex. And/or we could change the forced reap to send a kill -9 to
the stuck process instead of just marking it as reaped.

Cheers,

					- Ted
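Of the three ideas above, the last one addresses this hang most directly: if
the forced reap actually killed the stuck process, it could never come back
later and block on a stat_mutex the main thread has already torn down. A
minimal, self-contained sketch of that idea follows; the function and names
are illustrative only and are not fio's actual reap code:

	#include <signal.h>
	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/wait.h>
	#include <unistd.h>

	/* Wait up to timeout_secs for a job process; if it is stuck, SIGKILL it. */
	static int reap_job(pid_t pid, int timeout_secs)
	{
		int status;

		for (int waited = 0; waited < timeout_secs; waited++) {
			if (waitpid(pid, &status, WNOHANG) == pid)
				return 0;		/* job exited on its own */
			sleep(1);
		}

		fprintf(stderr, "job %d stuck after %d seconds, killing it\n",
			(int) pid, timeout_secs);
		kill(pid, SIGKILL);			/* really kill it ... */
		waitpid(pid, &status, 0);		/* ... and reap it for real */
		return -1;
	}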
diff --git a/mutex.c b/mutex.c
index 7580922..26e2955 100644
--- a/mutex.c
+++ b/mutex.c
@@ -108,6 +108,8 @@ int __fio_mutex_init(struct fio_mutex *mutex, int value)
 
 	mutex->value = value;
 	mutex->magic = FIO_MUTEX_MAGIC;
+	if (!value)
+		mutex->owner = __builtin_return_address(0);
 
 	ret = mutex_cond_init_pshared(&mutex->lock, &mutex->cond);
 	if (ret)
@@ -128,8 +130,11 @@ struct fio_mutex *fio_mutex_init(int value)
 		return NULL;
 	}
 
-	if (!__fio_mutex_init(mutex, value))
+	if (!__fio_mutex_init(mutex, value)) {
+		if (!value)
+			mutex->owner = __builtin_return_address(0);
 		return mutex;
+	}
 
 	fio_mutex_remove(mutex);
 	return NULL;
@@ -194,7 +199,13 @@ bool fio_mutex_down_trylock(struct fio_mutex *mutex)
 
 	pthread_mutex_lock(&mutex->lock);
 	if (mutex->value) {
+		if (mutex->value > 1) {
+			log_err("Completely bogus mutex value?!? (%d, %p)\n",
+				mutex->value, mutex->owner);
+			abort();
+		}
 		mutex->value--;
+		mutex->owner = __builtin_return_address(0);
 		ret = false;
 	}
 	pthread_mutex_unlock(&mutex->lock);
@@ -215,6 +226,7 @@ void fio_mutex_down(struct fio_mutex *mutex)
 	}
 
 	mutex->value--;
+	mutex->owner = __builtin_return_address(0);
 	pthread_mutex_unlock(&mutex->lock);
 }
 
@@ -228,6 +240,8 @@ void fio_mutex_up(struct fio_mutex *mutex)
 	read_barrier();
 	if (!mutex->value && mutex->waiters)
 		do_wake = 1;
+	if (!mutex->value)
+		mutex->owner = 0;
 	mutex->value++;
 
 	pthread_mutex_unlock(&mutex->lock);
diff --git a/mutex.h b/mutex.h
index 54009ba..ea43845 100644
--- a/mutex.h
+++ b/mutex.h
@@ -13,6 +13,7 @@ struct fio_mutex {
 	int value;
 	int waiters;
 	int magic;
+	unsigned long owner;
 };
 
 struct fio_rwlock {
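With the owner field added by this debugging patch, the last holder of a
contended fio_mutex can be read straight out of gdb, which is how the culprit
was identified in the first message of this thread:

	(gdb) p (void *) stat_mutex->owner
	$4 = (void *) 0x42fd60 <stat_exit+16>

The stored __builtin_return_address(0) resolves to the call site that last
took the mutex, here stat_exit(), confirming that the main thread downed and
removed stat_mutex before the straggler jobs tried to take it.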