Message ID | 20180522162515.20650-1-bart.vanassche@wdc.com (mailing list archive) |
---|---
State | New, archived |
On 5/22/18 10:25 AM, Bart Van Assche wrote: > Recently the blk-mq timeout handling code was reworked. See also Tejun > Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018 > (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html). > This patch reworks the blk-mq timeout handling code again. The timeout > handling code is simplified by introducing a state machine per request. > This change avoids that the blk-mq timeout handling code ignores > completions that occur after blk_mq_check_expired() has been called and > before blk_mq_rq_timed_out() has been called. I'll take a look at this again, getting rid of cmpxchg64 makes me much more comfortable.
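As background for the discussion that follows, here is a minimal user-space model of the per-request state machine the patch introduces: a 32-bit word packing a generation counter and a state, updated with cmpxchg() so that the completion path and the timeout path cannot both claim the same request instance. This is only an illustrative sketch — the state names mirror the patch further down, but the mask/shift layout and the helper here are simplified stand-ins, not the kernel code (which uses a bitfield union).

```c
/* Illustrative user-space model of the per-request state machine; not the
 * kernel code. State names mirror the patch below. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum mq_rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE, MQ_RQ_TIMED_OUT };

#define STATE_MASK 0x3u		/* low 2 bits: state */
#define GEN_INC    0x4u		/* upper 30 bits: generation counter */

struct request {
	_Atomic uint32_t gstate;
};

static bool change_rq_state(struct request *rq, enum mq_rq_state old_state,
			    enum mq_rq_state new_state)
{
	uint32_t old_val = atomic_load(&rq->gstate);
	uint32_t new_val;

	if ((old_val & STATE_MASK) != (uint32_t)old_state)
		return false;
	new_val = (old_val & ~STATE_MASK) | (uint32_t)new_state;
	if (new_state == MQ_RQ_IN_FLIGHT)
		new_val += GEN_INC;	/* reissue starts a new generation */
	/* Only one of two racing callers can win this exchange. */
	return atomic_compare_exchange_strong(&rq->gstate, &old_val, new_val);
}

int main(void)
{
	struct request rq = { .gstate = MQ_RQ_IDLE };
	bool completed, timed_out;

	change_rq_state(&rq, MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT);

	/* A completion and a timeout race for the same in-flight request: */
	completed = change_rq_state(&rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE);
	timed_out = change_rq_state(&rq, MQ_RQ_IN_FLIGHT, MQ_RQ_TIMED_OUT);

	printf("completed=%d timed_out=%d\n", completed, timed_out); /* 1 0 */
	return 0;
}
```

Because the losing path sees a state that is no longer in-flight, its transition fails instead of being silently ignored — which is exactly how a completion racing with the timeout scan stops getting lost.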
On 5/22/18 10:44 AM, Jens Axboe wrote: > > On 5/22/18 10:25 AM, Bart Van Assche wrote: >> Recently the blk-mq timeout handling code was reworked. See also Tejun >> Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018 >> (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html). >> This patch reworks the blk-mq timeout handling code again. The timeout >> handling code is simplified by introducing a state machine per request. >> This change avoids that the blk-mq timeout handling code ignores >> completions that occur after blk_mq_check_expired() has been called and >> before blk_mq_rq_timed_out() has been called. > > I'll take a look at this again, getting rid of cmpxchg64 makes me > much more comfortable. FWIW, a quick pass on runtime testing works fine. As expected, it's more efficient than what's currently in the kernel, testing with both null_blk (1 and nr_cpus worth of queues), and nvme as well. A huge win is that we shrink the request size from 360 bytes to 312, and I did a small followup patch that brings that to 304. That's a 15% reduction, massive.
On 5/22/18 11:17 AM, Jens Axboe wrote: > On 5/22/18 10:44 AM, Jens Axboe wrote: >> >> On 5/22/18 10:25 AM, Bart Van Assche wrote: >>> Recently the blk-mq timeout handling code was reworked. See also Tejun >>> Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018 >>> (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html). >>> This patch reworks the blk-mq timeout handling code again. The timeout >>> handling code is simplified by introducing a state machine per request. >>> This change avoids that the blk-mq timeout handling code ignores >>> completions that occur after blk_mq_check_expired() has been called and >>> before blk_mq_rq_timed_out() has been called. >> >> I'll take a look at this again, getting rid of cmpxchg64 makes me >> much more comfortable. > > FWIW, a quick pass on runtime testing works fine. As expected, it's > more efficient than what's currently in the kernel, testing with both > null_blk (1 and nr_cpus worth of queues), and nvme as well. A huge win > is that we shrink the request size from 360 bytes to 312, and I did > a small followup patch that brings that to 304. That's a 15% reduction, > massive. Ran into this, running block/014 from blktests: [ 5744.949839] run blktests block/014 at 2018-05-22 12:41:25 [ 5750.723000] null: rq 00000000ff68f103 timed out [ 5750.728181] WARNING: CPU: 45 PID: 2480 at block/blk-mq.c:585 __blk_mq_complete_request+0xa6/0x0 [ 5750.738187] Modules linked in: null_blk(+) configfs nvme nvme_core sb_edac x86_pkg_temp_therma] [ 5750.765509] CPU: 45 PID: 2480 Comm: kworker/45:1H Not tainted 4.17.0-rc6+ #712 [ 5750.774087] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 [ 5750.783369] Workqueue: kblockd blk_mq_timeout_work [ 5750.789223] RIP: 0010:__blk_mq_complete_request+0xa6/0x110 [ 5750.795850] RSP: 0018:ffff883ffb417d68 EFLAGS: 00010202 [ 5750.802187] RAX: 0000000000000003 RBX: ffff881ff100d800 RCX: 0000000000000000 [ 5750.810649] RDX: ffff88407fd9e040 RSI: ffff88407fd956b8 RDI: ffff881ff100d800 [ 5750.819119] RBP: ffffe8ffffd91800 R08: 0000000000000000 R09: ffffffff82066eb8 [ 5750.827588] R10: ffff883ffa386138 R11: ffff883ffa385900 R12: 0000000000000001 [ 5750.836050] R13: ffff881fe7da6000 R14: 0000000000000020 R15: 0000000000000002 [ 5750.844529] FS: 0000000000000000(0000) GS:ffff88407fd80000(0000) knlGS:0000000000000000 [ 5750.854482] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5750.861397] CR2: 00007ffc92f97f68 CR3: 000000000201d005 CR4: 00000000003606e0 [ 5750.869861] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5750.878333] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5750.886805] Call Trace: [ 5750.890033] bt_iter+0x42/0x50 [ 5750.894000] blk_mq_queue_tag_busy_iter+0x12b/0x220 [ 5750.899941] ? blk_mq_tag_to_rq+0x20/0x20 [ 5750.904913] ? __rcu_read_unlock+0x50/0x50 [ 5750.909978] ? blk_mq_tag_to_rq+0x20/0x20 [ 5750.914948] blk_mq_timeout_work+0x14b/0x240 [ 5750.920220] process_one_work+0x21b/0x510 [ 5750.925197] worker_thread+0x3a/0x390 [ 5750.929781] ? process_one_work+0x510/0x510 [ 5750.934944] kthread+0x11c/0x140 [ 5750.939028] ? kthread_create_worker_on_cpu+0x50/0x50 [ 5750.945169] ret_from_fork+0x1f/0x30 [ 5750.949656] Code: 48 02 00 00 80 e6 80 74 29 8b 95 80 00 00 00 44 39 e2 75 3b 48 89 df ff 90 2 [ 5750.972139] ---[ end trace 40065cb1764bf500 ]--- which is this: WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE); hitting when blk_mq_terminate_expired() completes the request through BLK_EH_HANDLED.
On 5/22/18 12:47 PM, Jens Axboe wrote: > On 5/22/18 11:17 AM, Jens Axboe wrote: >> On 5/22/18 10:44 AM, Jens Axboe wrote: >>> >>> On 5/22/18 10:25 AM, Bart Van Assche wrote: >>>> Recently the blk-mq timeout handling code was reworked. See also Tejun >>>> Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018 >>>> (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html). >>>> This patch reworks the blk-mq timeout handling code again. The timeout >>>> handling code is simplified by introducing a state machine per request. >>>> This change avoids that the blk-mq timeout handling code ignores >>>> completions that occur after blk_mq_check_expired() has been called and >>>> before blk_mq_rq_timed_out() has been called. >>> >>> I'll take a look at this again, getting rid of cmpxchg64 makes me >>> much more comfortable. >> >> FWIW, a quick pass on runtime testing works fine. As expected, it's >> more efficient than what's currently in the kernel, testing with both >> null_blk (1 and nr_cpus worth of queues), and nvme as well. A huge win >> is that we shrink the request size from 360 bytes to 312, and I did >> a small followup patch that brings that to 304. That's a 15% reduction, >> massive. > > Ran into this, running block/014 from blktests: > > [ 5744.949839] run blktests block/014 at 2018-05-22 12:41:25 > [ 5750.723000] null: rq 00000000ff68f103 timed out > [ 5750.728181] WARNING: CPU: 45 PID: 2480 at block/blk-mq.c:585 __blk_mq_complete_request+0xa6/0x0 > [ 5750.738187] Modules linked in: null_blk(+) configfs nvme nvme_core sb_edac x86_pkg_temp_therma] > [ 5750.765509] CPU: 45 PID: 2480 Comm: kworker/45:1H Not tainted 4.17.0-rc6+ #712 > [ 5750.774087] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 > [ 5750.783369] Workqueue: kblockd blk_mq_timeout_work > [ 5750.789223] RIP: 0010:__blk_mq_complete_request+0xa6/0x110 > [ 5750.795850] RSP: 0018:ffff883ffb417d68 EFLAGS: 00010202 > [ 5750.802187] RAX: 0000000000000003 RBX: ffff881ff100d800 RCX: 0000000000000000 > [ 5750.810649] RDX: ffff88407fd9e040 RSI: ffff88407fd956b8 RDI: ffff881ff100d800 > [ 5750.819119] RBP: ffffe8ffffd91800 R08: 0000000000000000 R09: ffffffff82066eb8 > [ 5750.827588] R10: ffff883ffa386138 R11: ffff883ffa385900 R12: 0000000000000001 > [ 5750.836050] R13: ffff881fe7da6000 R14: 0000000000000020 R15: 0000000000000002 > [ 5750.844529] FS: 0000000000000000(0000) GS:ffff88407fd80000(0000) knlGS:0000000000000000 > [ 5750.854482] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 5750.861397] CR2: 00007ffc92f97f68 CR3: 000000000201d005 CR4: 00000000003606e0 > [ 5750.869861] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 5750.878333] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 5750.886805] Call Trace: > [ 5750.890033] bt_iter+0x42/0x50 > [ 5750.894000] blk_mq_queue_tag_busy_iter+0x12b/0x220 > [ 5750.899941] ? blk_mq_tag_to_rq+0x20/0x20 > [ 5750.904913] ? __rcu_read_unlock+0x50/0x50 > [ 5750.909978] ? blk_mq_tag_to_rq+0x20/0x20 > [ 5750.914948] blk_mq_timeout_work+0x14b/0x240 > [ 5750.920220] process_one_work+0x21b/0x510 > [ 5750.925197] worker_thread+0x3a/0x390 > [ 5750.929781] ? process_one_work+0x510/0x510 > [ 5750.934944] kthread+0x11c/0x140 > [ 5750.939028] ? 
kthread_create_worker_on_cpu+0x50/0x50 > [ 5750.945169] ret_from_fork+0x1f/0x30 > [ 5750.949656] Code: 48 02 00 00 80 e6 80 74 29 8b 95 80 00 00 00 44 39 e2 75 3b 48 89 df ff 90 2 > [ 5750.972139] ---[ end trace 40065cb1764bf500 ]--- > > which is this: > > WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE); That check looks wrong, since TIMED_OUT -> COMPLETE is also a valid state transition. So that check should be: WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE && blk_mq_rq_state(rq) != MQ_RQ_TIMED_OUT);
On 5/22/18 1:03 PM, Jens Axboe wrote: > On 5/22/18 12:47 PM, Jens Axboe wrote: >> On 5/22/18 11:17 AM, Jens Axboe wrote: >>> On 5/22/18 10:44 AM, Jens Axboe wrote: >>>> >>>> On 5/22/18 10:25 AM, Bart Van Assche wrote: >>>>> Recently the blk-mq timeout handling code was reworked. See also Tejun >>>>> Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018 >>>>> (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html). >>>>> This patch reworks the blk-mq timeout handling code again. The timeout >>>>> handling code is simplified by introducing a state machine per request. >>>>> This change avoids that the blk-mq timeout handling code ignores >>>>> completions that occur after blk_mq_check_expired() has been called and >>>>> before blk_mq_rq_timed_out() has been called. >>>> >>>> I'll take a look at this again, getting rid of cmpxchg64 makes me >>>> much more comfortable. >>> >>> FWIW, a quick pass on runtime testing works fine. As expected, it's >>> more efficient than what's currently in the kernel, testing with both >>> null_blk (1 and nr_cpus worth of queues), and nvme as well. A huge win >>> is that we shrink the request size from 360 bytes to 312, and I did >>> a small followup patch that brings that to 304. That's a 15% reduction, >>> massive. >> >> Ran into this, running block/014 from blktests: >> >> [ 5744.949839] run blktests block/014 at 2018-05-22 12:41:25 >> [ 5750.723000] null: rq 00000000ff68f103 timed out >> [ 5750.728181] WARNING: CPU: 45 PID: 2480 at block/blk-mq.c:585 __blk_mq_complete_request+0xa6/0x0 >> [ 5750.738187] Modules linked in: null_blk(+) configfs nvme nvme_core sb_edac x86_pkg_temp_therma] >> [ 5750.765509] CPU: 45 PID: 2480 Comm: kworker/45:1H Not tainted 4.17.0-rc6+ #712 >> [ 5750.774087] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 >> [ 5750.783369] Workqueue: kblockd blk_mq_timeout_work >> [ 5750.789223] RIP: 0010:__blk_mq_complete_request+0xa6/0x110 >> [ 5750.795850] RSP: 0018:ffff883ffb417d68 EFLAGS: 00010202 >> [ 5750.802187] RAX: 0000000000000003 RBX: ffff881ff100d800 RCX: 0000000000000000 >> [ 5750.810649] RDX: ffff88407fd9e040 RSI: ffff88407fd956b8 RDI: ffff881ff100d800 >> [ 5750.819119] RBP: ffffe8ffffd91800 R08: 0000000000000000 R09: ffffffff82066eb8 >> [ 5750.827588] R10: ffff883ffa386138 R11: ffff883ffa385900 R12: 0000000000000001 >> [ 5750.836050] R13: ffff881fe7da6000 R14: 0000000000000020 R15: 0000000000000002 >> [ 5750.844529] FS: 0000000000000000(0000) GS:ffff88407fd80000(0000) knlGS:0000000000000000 >> [ 5750.854482] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 5750.861397] CR2: 00007ffc92f97f68 CR3: 000000000201d005 CR4: 00000000003606e0 >> [ 5750.869861] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 5750.878333] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> [ 5750.886805] Call Trace: >> [ 5750.890033] bt_iter+0x42/0x50 >> [ 5750.894000] blk_mq_queue_tag_busy_iter+0x12b/0x220 >> [ 5750.899941] ? blk_mq_tag_to_rq+0x20/0x20 >> [ 5750.904913] ? __rcu_read_unlock+0x50/0x50 >> [ 5750.909978] ? blk_mq_tag_to_rq+0x20/0x20 >> [ 5750.914948] blk_mq_timeout_work+0x14b/0x240 >> [ 5750.920220] process_one_work+0x21b/0x510 >> [ 5750.925197] worker_thread+0x3a/0x390 >> [ 5750.929781] ? process_one_work+0x510/0x510 >> [ 5750.934944] kthread+0x11c/0x140 >> [ 5750.939028] ? 
kthread_create_worker_on_cpu+0x50/0x50 >> [ 5750.945169] ret_from_fork+0x1f/0x30 >> [ 5750.949656] Code: 48 02 00 00 80 e6 80 74 29 8b 95 80 00 00 00 44 39 e2 75 3b 48 89 df ff 90 2 >> [ 5750.972139] ---[ end trace 40065cb1764bf500 ]--- >> >> which is this: >> >> WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE); > > That check looks wrong, since TIMED_OUT -> COMPLETE is also a valid > state transition. So that check should be: > > WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE && > blk_mq_rq_state(rq) != MQ_RQ_TIMED_OUT); I guess it would be cleaner to actually do the transition, in blk_mq_rq_timed_out(): case BLK_EH_HANDLED: if (blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT, MQ_RQ_COMPLETE)) __blk_mq_complete_request(req); break; This works for me.
On Tue, May 22, 2018 at 01:38:06PM -0600, Jens Axboe wrote: > I guess it would be cleaner to actually do the transition, in > blk_mq_rq_timed_out(): > > case BLK_EH_HANDLED: > if (blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT, > MQ_RQ_COMPLETE)) > __blk_mq_complete_request(req); > break; > > This works for me. Works for me as well on manual fault injection tests. I think this change above goes back to Christoph's point earlier on usage of BLK_EH_HANDLED. Is the driver supposed to return BLK_EH_NOT_HANDLED when the driver actually knows the request has been completed before returning the status?
On 5/22/18 2:26 PM, Keith Busch wrote: > On Tue, May 22, 2018 at 01:38:06PM -0600, Jens Axboe wrote: >> I guess it would be cleaner to actually do the transition, in >> blk_mq_rq_timed_out(): >> >> case BLK_EH_HANDLED: >> if (blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT, >> MQ_RQ_COMPLETE)) >> __blk_mq_complete_request(req); >> break; >> >> This works for me. > > Works for me as well on manual fault injection tests. > > I think this change above goes back to Christoph's point earlier on usage > of BLK_EH_HANDLED. Is the driver supposed to return BLK_EH_NOT_HANDLED > when the driver actually knows the request has been completed before > returning the status? If the driver knows it's completed, it'd have to return BLK_EH_NOT_HANDLED. Or BLK_EH_HANDLED would work too, since the above state transition would then fail.
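To illustrate the rule Jens describes, a driver ->timeout() callback under this scheme might look roughly as follows. This is a hypothetical sketch: struct my_cmd, my_cmd_done() and my_dev_abort_cmd() are made-up driver helpers; only the callback signature and the BLK_EH_* return values come from the code discussed in this thread.

```c
/* Hypothetical driver ->timeout() callback; helper names are made up. */
static enum blk_eh_timer_return my_drv_timeout(struct request *rq, bool reserved)
{
	struct my_cmd *cmd = blk_mq_rq_to_pdu(rq);

	if (my_cmd_done(cmd)) {
		/*
		 * The hardware already finished this command and the normal
		 * completion path has been (or will be) run: tell the core
		 * the timeout needs no further action. Returning
		 * BLK_EH_HANDLED would also be safe, since the
		 * TIMED_OUT -> COMPLETE transition would simply fail.
		 */
		return BLK_EH_NOT_HANDLED;
	}

	if (my_dev_abort_cmd(cmd) == 0)
		return BLK_EH_HANDLED;	/* the core completes the request */

	return BLK_EH_RESET_TIMER;	/* give the command more time */
}
```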
On Tue, May 22, 2018 at 09:25:15AM -0700, Bart Van Assche wrote: > @@ -848,13 +843,22 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) > case BLK_EH_RESET_TIMER: > + blk_mq_add_timer(req); > /* > + * The loop is for the unlikely case of a race with the > + * completion code. There is no need to reset the deadline > + * if the transition to the in-flight state fails because > + * the deadline only matters in the in-flight state. > */ > - blk_mq_rq_update_aborted_gstate(req, 0); > - blk_add_timer(req); > + while (true) { > + if (blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT, > + MQ_RQ_IN_FLIGHT)) > + break; > + if (blk_mq_rq_state(req) == MQ_RQ_COMPLETE) { > + __blk_mq_complete_request(req); > + break; > + } > + } I'm having some trouble triggering this case where the state is already MQ_RQ_COMPLETE, so I'm just trying to figure this out through inspection. It looks like the new blk_mq_complete_request() already called __blk_mq_complete_request() when it gets the state to MQ_RQ_COMPLETE, so doing that again completes it a second time.
On Tue, 2018-05-22 at 14:33 -0600, Keith Busch wrote: > On Tue, May 22, 2018 at 09:25:15AM -0700, Bart Van Assche wrote: > > @@ -848,13 +843,22 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) > > case BLK_EH_RESET_TIMER: > > + blk_mq_add_timer(req); > > /* > > + * The loop is for the unlikely case of a race with the > > + * completion code. There is no need to reset the deadline > > + * if the transition to the in-flight state fails because > > + * the deadline only matters in the in-flight state. > > */ > > - blk_mq_rq_update_aborted_gstate(req, 0); > > - blk_add_timer(req); > > + while (true) { > > + if (blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT, > > + MQ_RQ_IN_FLIGHT)) > > + break; > > + if (blk_mq_rq_state(req) == MQ_RQ_COMPLETE) { > > + __blk_mq_complete_request(req); > > + break; > > + } > > + } > > I'm having some trouble triggering this case where the state is already > MQ_RQ_COMPLETE, so I'm just trying to figure this out through inspection. > > It looks like the new blk_mq_complete_request() already called > __blk_mq_complete_request() when it gets the state to MQ_RQ_COMPLETE, > so doing that again completes it a second time. Hello Keith, Have you noticed that if blk_mq_complete_request() encounters a request with state MQ_RQ_TIMED_OUT that it doesn't call __blk_mq_complete_request()? I think the code in blk_mq_complete_request() together with the above code guarantees that __blk_mq_complete_request() is only called once per request generation. Bart.
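Condensed from the patch below (comments added, not a verbatim copy), the two racing paths look like this; only one of them can win the transition to MQ_RQ_COMPLETE, which is what gives the once-per-generation guarantee Bart describes. Note that this relies on the state helper using cmpxchg() for the timed-out case as well, a detail Keith raises further down in the thread.

```c
/* Completion path (blk_mq_complete_request()). */
while (true) {
	if (blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) {
		__blk_mq_complete_request(rq);	/* the normal completion won */
		break;
	}
	/* Timeout handling owns the request; it will observe MQ_RQ_COMPLETE
	 * and finish the request itself (see the timeout path below). */
	if (blk_mq_change_rq_state(rq, MQ_RQ_TIMED_OUT, MQ_RQ_COMPLETE))
		break;
}

/* Timeout path (blk_mq_rq_timed_out(), BLK_EH_RESET_TIMER case). */
blk_mq_add_timer(req);
while (true) {
	/* Re-arm: a completion arriving later will finish the request. */
	if (blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT, MQ_RQ_IN_FLIGHT))
		break;
	/* A completion already raced in; finish the request here, once. */
	if (blk_mq_rq_state(req) == MQ_RQ_COMPLETE) {
		__blk_mq_complete_request(req);
		break;
	}
}
```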
On Tue, May 22, 2018 at 08:36:27PM +0000, Bart Van Assche wrote: > Have you noticed that if blk_mq_complete_request() encounters a request with > state MQ_RQ_TIMED_OUT that it doesn't call __blk_mq_complete_request()? I think > the code in blk_mq_complete_request() together with the above code guarantees > that __blk_mq_complete_request() is only called once per request generation. Right, my mistake. I noticed that when I saw your reply on the EH_HANDLED case, so looks fine.
On Tue, May 22, 2018 at 08:36:27PM +0000, Bart Van Assche wrote: > > Have you noticed that if blk_mq_complete_request() encounters a request with > state MQ_RQ_TIMED_OUT that it doesn't call __blk_mq_complete_request()? I think > the code in blk_mq_complete_request() together with the above code guarantees > that __blk_mq_complete_request() is only called once per request generation. Okay, now to the BLK_EH_NOT_HANDLED case: that's supposedly the correct status to return if the driver knows blk_mq_complete_request() was called prior to returning from the timeout handler, so we need a similar check there, right?
On Tue, 2018-05-22 at 14:44 -0600, Keith Busch wrote: > On Tue, May 22, 2018 at 08:36:27PM +0000, Bart Van Assche wrote: > > > > Have you noticed that if blk_mq_complete_request() encounters a request with > > state MQ_RQ_TIMED_OUT that it doesn't call __blk_mq_complete_request()? I think > > the code in blk_mq_complete_request() together with the above code guarantees > > that __blk_mq_complete_request() is only called once per request generation. > > Okay, now to the BLK_EH_NOT_HANDLED case: that's supposedly the correct > status to return if the driver knows blk_mq_complete_request() was called > prior to returning from the timeout handler, so we need a similar check > there, right? Good catch. To me that seems like the right place to handle that case. Bart.
On Tue, May 22, 2018 at 02:29:21PM -0600, Jens Axboe wrote: > > of BLK_EH_HANDLED. Is the driver supposed to return BLK_EH_NOT_HANDLED > > when the driver actually knows the request has been completed before > > returning the status? > > If the driver knows it's completed, it'd have to return BLK_EH_NOT_HANDLED. > Or BLK_EH_HANDLED would work too, since the above state transition would > then fail. Btw, I think we should just kill off BLK_EH_HANDLED. WIP totally untested progress toward that is here: http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-eh-ret The only real missing bit is SCSI overloading the value for internal use.
On 5/22/18 3:02 PM, Christoph Hellwig wrote: > On Tue, May 22, 2018 at 02:29:21PM -0600, Jens Axboe wrote: >>> of BLK_EH_HANDLED. Is the driver supposed to return BLK_EH_NOT_HANDLED >>> when the driver actually knows the request has been completed before >>> returning the status? >> >> If the driver knows it's completed, it'd have to return BLK_EH_NOT_HANDLED. >> Or BLK_EH_HANDLED would work too, since the above state transition would >> then fail. > > Btw, I think we should just kill off BLK_EH_HANDLED. WIP totally > untested progress toward that is here: > > http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-eh-ret > > The only real missing bit is SCSI overloading the value for internal > use. I think that's a great idea, the use cases aren't clear at all, driver writers will get this wrong.
On Tue, May 22, 2018 at 09:25:15AM -0700, Bart Van Assche wrote: > +static bool blk_mq_change_rq_state(struct request *rq, > + enum mq_rq_state old_state, > + enum mq_rq_state new_state) > +{ > + union blk_generation_and_state gstate = READ_ONCE(rq->gstate); > + union blk_generation_and_state old_val = gstate; > + union blk_generation_and_state new_val = gstate; > + > + old_val.state = old_state; > + new_val.state = new_state; > + if (new_state == MQ_RQ_IN_FLIGHT) > + new_val.generation++; > + /* > + * For transitions from state in-flight to another state cmpxchg() > + * must be used. For other state transitions it is safe to use > + * WRITE_ONCE(). > + */ > + if (old_state != MQ_RQ_IN_FLIGHT) { > + WRITE_ONCE(rq->gstate.val, new_val.val); > + return true; > + } > + return blk_mq_set_rq_state(rq, old_val, new_val); > +} <snip> > void blk_mq_complete_request(struct request *rq) > { > struct request_queue *q = rq->q; > - struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu); > - int srcu_idx; > > if (unlikely(blk_should_fake_timeout(q))) > return; > > - /* > - * If @rq->aborted_gstate equals the current instance, timeout is > - * claiming @rq and we lost. This is synchronized through > - * hctx_lock(). See blk_mq_timeout_work() for details. > - * > - * Completion path never blocks and we can directly use RCU here > - * instead of hctx_lock() which can be either RCU or SRCU. > - * However, that would complicate paths which want to synchronize > - * against us. Let stay in sync with the issue path so that > - * hctx_lock() covers both issue and completion paths. > - */ > - hctx_lock(hctx, &srcu_idx); > - if (blk_mq_rq_aborted_gstate(rq) != rq->gstate) > - __blk_mq_complete_request(rq); > - hctx_unlock(hctx, srcu_idx); > + /* The loop is for the unlikely case of a race with the timeout code. */ > + while (true) { > + if (blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, > + MQ_RQ_COMPLETE)) { > + __blk_mq_complete_request(rq); > + break; > + } > + if (blk_mq_change_rq_state(rq, MQ_RQ_TIMED_OUT, MQ_RQ_COMPLETE)) > + break; > + } > } Looks like the cmpxchg is also needed if old_state is MQ_RQ_TIMED_OUT, otherwise its guaranteed to return 'true' and there's no point to the loop and 'if' check.
On Wed, May 23, 2018 at 08:02:31AM -0600, Keith Busch wrote: > Looks like the cmpxchg is also needed if old_state is MQ_RQ_TIMED_OUT, > otherwise its guaranteed to return 'true' and there's no point to the > loop and 'if' check. And I see v14 is already posted with that fix! :)
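For reference, a sketch of what the fixed helper presumably looks like — this is a guess at the v14 change Keith mentions, not a copy of it: cmpxchg() is used for every old state from which another context may change the state concurrently (in-flight and timed-out), and the plain WRITE_ONCE() fast path is kept only for states the caller owns exclusively.

```c
/* Sketch of the fix discussed above; a guess at the v14 helper, not a copy. */
static bool blk_mq_change_rq_state(struct request *rq,
				   enum mq_rq_state old_state,
				   enum mq_rq_state new_state)
{
	union blk_generation_and_state old_val = READ_ONCE(rq->gstate);
	union blk_generation_and_state new_val = old_val;

	old_val.state = old_state;
	new_val.state = new_state;
	if (new_state == MQ_RQ_IN_FLIGHT)
		new_val.generation++;

	/* Concurrent updaters are possible in these states: use cmpxchg(). */
	if (old_state == MQ_RQ_IN_FLIGHT || old_state == MQ_RQ_TIMED_OUT)
		return blk_mq_set_rq_state(rq, old_val, new_val);

	/* No concurrent updater is possible here; a plain store suffices. */
	WRITE_ONCE(rq->gstate.val, new_val.val);
	return true;
}
```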
diff --git a/block/blk-core.c b/block/blk-core.c index 341501c5e239..42b055292cdc 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -198,12 +198,9 @@ void blk_rq_init(struct request_queue *q, struct request *rq) rq->internal_tag = -1; rq->start_time_ns = ktime_get_ns(); rq->part = NULL; - seqcount_init(&rq->gstate_seq); - u64_stats_init(&rq->aborted_gstate_sync); - /* - * See comment of blk_mq_init_request - */ - WRITE_ONCE(rq->gstate, MQ_RQ_GEN_INC); +#ifndef CONFIG_64BIT + seqcount_init(&rq->deadline_seq); +#endif } EXPORT_SYMBOL(blk_rq_init); diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 3080e18cb859..9c539ab2c0dc 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -344,15 +344,15 @@ static const char *const rqf_name[] = { RQF_NAME(STATS), RQF_NAME(SPECIAL_PAYLOAD), RQF_NAME(ZONE_WRITE_LOCKED), - RQF_NAME(MQ_TIMEOUT_EXPIRED), RQF_NAME(MQ_POLL_SLEPT), }; #undef RQF_NAME static const char *const blk_mq_rq_state_name_array[] = { [MQ_RQ_IDLE] = "idle", - [MQ_RQ_IN_FLIGHT] = "in_flight", + [MQ_RQ_IN_FLIGHT] = "in flight", [MQ_RQ_COMPLETE] = "complete", + [MQ_RQ_TIMED_OUT] = "timed out", }; static const char *blk_mq_rq_state_name(enum mq_rq_state rq_state) diff --git a/block/blk-mq.c b/block/blk-mq.c index 64630caaf27e..6bfc7679a5df 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -319,6 +319,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data, /* tag was already set */ rq->extra_len = 0; rq->__deadline = 0; + WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE); INIT_LIST_HEAD(&rq->timeout_list); rq->timeout = 0; @@ -465,6 +466,39 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, } EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx); +/** + * blk_mq_change_rq_state - atomically test and set request state + * @rq: Request pointer. + * @old_state: Old request state. + * @new_state: New request state. + * + * Returns %true if and only if the old state was @old and if the state has + * been changed into @new. + */ +static bool blk_mq_change_rq_state(struct request *rq, + enum mq_rq_state old_state, + enum mq_rq_state new_state) +{ + union blk_generation_and_state gstate = READ_ONCE(rq->gstate); + union blk_generation_and_state old_val = gstate; + union blk_generation_and_state new_val = gstate; + + old_val.state = old_state; + new_val.state = new_state; + if (new_state == MQ_RQ_IN_FLIGHT) + new_val.generation++; + /* + * For transitions from state in-flight to another state cmpxchg() + * must be used. For other state transitions it is safe to use + * WRITE_ONCE(). 
+ */ + if (old_state != MQ_RQ_IN_FLIGHT) { + WRITE_ONCE(rq->gstate.val, new_val.val); + return true; + } + return blk_mq_set_rq_state(rq, old_val, new_val); +} + void blk_mq_free_request(struct request *rq) { struct request_queue *q = rq->q; @@ -494,7 +528,8 @@ void blk_mq_free_request(struct request *rq) if (blk_rq_rl(rq)) blk_put_rl(blk_rq_rl(rq)); - blk_mq_rq_update_state(rq, MQ_RQ_IDLE); + if (!blk_mq_change_rq_state(rq, blk_mq_rq_state(rq), MQ_RQ_IDLE)) + WARN_ON_ONCE(true); if (rq->tag != -1) blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag); if (sched_tag != -1) @@ -547,8 +582,7 @@ static void __blk_mq_complete_request(struct request *rq) bool shared = false; int cpu; - WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT); - blk_mq_rq_update_state(rq, MQ_RQ_COMPLETE); + WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE); if (rq->internal_tag != -1) blk_mq_sched_completed_request(rq); @@ -593,36 +627,6 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx) *srcu_idx = srcu_read_lock(hctx->srcu); } -static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate) -{ - unsigned long flags; - - /* - * blk_mq_rq_aborted_gstate() is used from the completion path and - * can thus be called from irq context. u64_stats_fetch in the - * middle of update on the same CPU leads to lockup. Disable irq - * while updating. - */ - local_irq_save(flags); - u64_stats_update_begin(&rq->aborted_gstate_sync); - rq->aborted_gstate = gstate; - u64_stats_update_end(&rq->aborted_gstate_sync); - local_irq_restore(flags); -} - -static u64 blk_mq_rq_aborted_gstate(struct request *rq) -{ - unsigned int start; - u64 aborted_gstate; - - do { - start = u64_stats_fetch_begin(&rq->aborted_gstate_sync); - aborted_gstate = rq->aborted_gstate; - } while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start)); - - return aborted_gstate; -} - /** * blk_mq_complete_request - end I/O on a request * @rq: the request being processed @@ -634,27 +638,20 @@ static u64 blk_mq_rq_aborted_gstate(struct request *rq) void blk_mq_complete_request(struct request *rq) { struct request_queue *q = rq->q; - struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu); - int srcu_idx; if (unlikely(blk_should_fake_timeout(q))) return; - /* - * If @rq->aborted_gstate equals the current instance, timeout is - * claiming @rq and we lost. This is synchronized through - * hctx_lock(). See blk_mq_timeout_work() for details. - * - * Completion path never blocks and we can directly use RCU here - * instead of hctx_lock() which can be either RCU or SRCU. - * However, that would complicate paths which want to synchronize - * against us. Let stay in sync with the issue path so that - * hctx_lock() covers both issue and completion paths. - */ - hctx_lock(hctx, &srcu_idx); - if (blk_mq_rq_aborted_gstate(rq) != rq->gstate) - __blk_mq_complete_request(rq); - hctx_unlock(hctx, srcu_idx); + /* The loop is for the unlikely case of a race with the timeout code. */ + while (true) { + if (blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, + MQ_RQ_COMPLETE)) { + __blk_mq_complete_request(rq); + break; + } + if (blk_mq_change_rq_state(rq, MQ_RQ_TIMED_OUT, MQ_RQ_COMPLETE)) + break; + } } EXPORT_SYMBOL(blk_mq_complete_request); @@ -681,27 +678,8 @@ void blk_mq_start_request(struct request *rq) wbt_issue(q->rq_wb, rq); } - WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE); - - /* - * Mark @rq in-flight which also advances the generation number, - * and register for timeout. 
Protect with a seqcount to allow the - * timeout path to read both @rq->gstate and @rq->deadline - * coherently. - * - * This is the only place where a request is marked in-flight. If - * the timeout path reads an in-flight @rq->gstate, the - * @rq->deadline it reads together under @rq->gstate_seq is - * guaranteed to be the matching one. - */ - preempt_disable(); - write_seqcount_begin(&rq->gstate_seq); - - blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT); - blk_add_timer(rq); - - write_seqcount_end(&rq->gstate_seq); - preempt_enable(); + blk_mq_add_timer(rq); + blk_mq_change_rq_state(rq, MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT); if (q->dma_drain_size && blk_rq_bytes(rq)) { /* @@ -714,27 +692,46 @@ void blk_mq_start_request(struct request *rq) } EXPORT_SYMBOL(blk_mq_start_request); -/* - * When we reach here because queue is busy, it's safe to change the state - * to IDLE without checking @rq->aborted_gstate because we should still be - * holding the RCU read lock and thus protected against timeout. +/** + * __blk_mq_requeue_request - requeue a request + * @rq: request to be requeued + * + * This function is either called by blk_mq_requeue_request() or by the block + * layer core if .queue_rq() returned BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. + * If the request state is MQ_RQ_IN_FLIGHT and if this function is called from + * inside .queue_rq() then it is guaranteed that the timeout code won't try to + * modify the request state while this function is in progress because an RCU + * read lock is held around .queue_rq() and because the timeout code calls + * synchronize_rcu() after having marked requests and before processing + * time-outs. */ static void __blk_mq_requeue_request(struct request *rq) { struct request_queue *q = rq->q; + enum mq_rq_state old_state = blk_mq_rq_state(rq); blk_mq_put_driver_tag(rq); trace_block_rq_requeue(q, rq); wbt_requeue(q->rq_wb, rq); - if (blk_mq_rq_state(rq) != MQ_RQ_IDLE) { - blk_mq_rq_update_state(rq, MQ_RQ_IDLE); + if (old_state != MQ_RQ_IDLE) { + if (!blk_mq_change_rq_state(rq, old_state, MQ_RQ_IDLE)) + WARN_ON_ONCE(true); if (q->dma_drain_size && blk_rq_bytes(rq)) rq->nr_phys_segments--; } } +/** + * blk_mq_requeue_request - requeue a request + * @rq: request to be requeued + * @kick_requeue_list: whether or not to kick the requeue_list + * + * This function is called after a request has completed, after a request has + * timed out or from inside .queue_rq(). In the latter case the request may + * already have been started. + */ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list) { __blk_mq_requeue_request(rq); @@ -838,8 +835,6 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) const struct blk_mq_ops *ops = req->q->mq_ops; enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER; - req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED; - if (ops->timeout) ret = ops->timeout(req, reserved); @@ -848,13 +843,22 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) __blk_mq_complete_request(req); break; case BLK_EH_RESET_TIMER: + blk_mq_add_timer(req); /* - * As nothing prevents from completion happening while - * ->aborted_gstate is set, this may lead to ignored - * completions and further spurious timeouts. + * The loop is for the unlikely case of a race with the + * completion code. There is no need to reset the deadline + * if the transition to the in-flight state fails because + * the deadline only matters in the in-flight state. 
*/ - blk_mq_rq_update_aborted_gstate(req, 0); - blk_add_timer(req); + while (true) { + if (blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT, + MQ_RQ_IN_FLIGHT)) + break; + if (blk_mq_rq_state(req) == MQ_RQ_COMPLETE) { + __blk_mq_complete_request(req); + break; + } + } break; case BLK_EH_NOT_HANDLED: break; @@ -868,48 +872,60 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx, struct request *rq, void *priv, bool reserved) { struct blk_mq_timeout_data *data = priv; - unsigned long gstate, deadline; - int start; + union blk_generation_and_state gstate = READ_ONCE(rq->gstate); + unsigned long deadline; might_sleep(); - if (rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) - return; - - /* read coherent snapshots of @rq->state_gen and @rq->deadline */ +#ifdef CONFIG_64BIT + deadline = rq->__deadline; +#else while (true) { - start = read_seqcount_begin(&rq->gstate_seq); - gstate = READ_ONCE(rq->gstate); - deadline = blk_rq_deadline(rq); - if (!read_seqcount_retry(&rq->gstate_seq, start)) + int start; + + start = read_seqcount_begin(&rq->deadline_seq); + deadline = rq->__deadline; + if (!read_seqcount_retry(&rq->deadline_seq, start)) break; cond_resched(); } +#endif - /* if in-flight && overdue, mark for abortion */ - if ((gstate & MQ_RQ_STATE_MASK) == MQ_RQ_IN_FLIGHT && + /* + * Make sure that rq->aborted_gstate != rq->gstate if rq->deadline has + * not yet been reached even if a request gets recycled before + * blk_mq_terminate_expired() is called and the value of rq->deadline + * is not modified due to the request recycling. + */ + rq->aborted_gstate = gstate; + rq->aborted_gstate.generation ^= (1UL << 29); + if (gstate.state == MQ_RQ_IN_FLIGHT && time_after_eq(jiffies, deadline)) { - blk_mq_rq_update_aborted_gstate(rq, gstate); + /* request timed out */ + rq->aborted_gstate = gstate; data->nr_expired++; hctx->nr_expired++; } else if (!data->next_set || time_after(data->next, deadline)) { data->next = deadline; data->next_set = 1; } + } static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx, struct request *rq, void *priv, bool reserved) { + union blk_generation_and_state old_val = rq->aborted_gstate; + union blk_generation_and_state new_val = old_val; + + new_val.state = MQ_RQ_TIMED_OUT; + /* - * We marked @rq->aborted_gstate and waited for RCU. If there were - * completions that we lost to, they would have finished and - * updated @rq->gstate by now; otherwise, the completion path is - * now guaranteed to see @rq->aborted_gstate and yield. If - * @rq->aborted_gstate still matches @rq->gstate, @rq is ours. + * We marked @rq->aborted_gstate and waited for ongoing .queue_rq() + * calls. If rq->gstate has not changed that means that it + * is now safe to change the request state and to handle the timeout. */ - if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) && - READ_ONCE(rq->gstate) == rq->aborted_gstate) + if (blk_mq_set_rq_state(rq, old_val, new_val)) blk_mq_rq_timed_out(rq, reserved); } @@ -948,10 +964,12 @@ static void blk_mq_timeout_work(struct work_struct *work) bool has_rcu = false; /* - * Wait till everyone sees ->aborted_gstate. The - * sequential waits for SRCUs aren't ideal. If this ever - * becomes a problem, we can add per-hw_ctx rcu_head and - * wait in parallel. + * For very short timeouts it can happen that + * blk_mq_check_expired() modifies the state of a request + * while .queue_rq() is still in progress. Hence wait until + * these .queue_rq() calls have finished. 
This is also + * necessary to avoid races with blk_mq_requeue_request() for + * requests that have already been started. */ queue_for_each_hw_ctx(q, hctx, i) { if (!hctx->nr_expired) @@ -967,7 +985,7 @@ static void blk_mq_timeout_work(struct work_struct *work) if (has_rcu) synchronize_rcu(); - /* terminate the ones we won */ + /* Terminate the requests marked by blk_mq_check_expired(). */ blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL); } @@ -2063,14 +2081,9 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq, return ret; } - seqcount_init(&rq->gstate_seq); - u64_stats_init(&rq->aborted_gstate_sync); - /* - * start gstate with gen 1 instead of 0, otherwise it will be equal - * to aborted_gstate, and be identified timed out by - * blk_mq_terminate_expired. - */ - WRITE_ONCE(rq->gstate, MQ_RQ_GEN_INC); +#ifndef CONFIG_64BIT + seqcount_init(&rq->deadline_seq); +#endif return 0; } @@ -3105,7 +3118,8 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q, hrtimer_init_sleeper(&hs, current); do { - if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE) + if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE || + blk_mq_rq_state(rq) == MQ_RQ_TIMED_OUT) break; set_current_state(TASK_UNINTERRUPTIBLE); hrtimer_start_expires(&hs.timer, mode); diff --git a/block/blk-mq.h b/block/blk-mq.h index e1bb420dc5d6..7b810c934732 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -2,6 +2,7 @@ #ifndef INT_BLK_MQ_H #define INT_BLK_MQ_H +#include <asm/cmpxchg.h> #include "blk-stat.h" #include "blk-mq-tag.h" @@ -30,18 +31,25 @@ struct blk_mq_ctx { struct kobject kobj; } ____cacheline_aligned_in_smp; -/* - * Bits for request->gstate. The lower two bits carry MQ_RQ_* state value - * and the upper bits the generation number. +/** + * enum mq_rq_state - blk-mq request state + * + * The legal state transitions are: + * - idle -> in-flight: blk_mq_start_request() + * - in-flight -> complete: blk_mq_complete_request() + * - timed out -> complete: blk_mq_complete_request() + * - in-flight -> timed out: request times out + * - complete/tmo -> idle: blk_mq_requeue_request() or blk_mq_free_request() + * - in-flight -> idle: blk_mq_requeue_request() or blk_mq_free_request() + * - timed out -> in-flight: request restart due to BLK_EH_RESET_TIMER + * + * See also blk_generation_and_state.state. */ enum mq_rq_state { MQ_RQ_IDLE = 0, MQ_RQ_IN_FLIGHT = 1, MQ_RQ_COMPLETE = 2, - - MQ_RQ_STATE_BITS = 2, - MQ_RQ_STATE_MASK = (1 << MQ_RQ_STATE_BITS) - 1, - MQ_RQ_GEN_INC = 1 << MQ_RQ_STATE_BITS, + MQ_RQ_TIMED_OUT = 3, }; void blk_mq_freeze_queue(struct request_queue *q); @@ -103,37 +111,21 @@ extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx); void blk_mq_release(struct request_queue *q); -/** - * blk_mq_rq_state() - read the current MQ_RQ_* state of a request - * @rq: target request. - */ -static inline int blk_mq_rq_state(struct request *rq) +static inline bool blk_mq_set_rq_state(struct request *rq, + union blk_generation_and_state old_val, + union blk_generation_and_state new_val) { - return READ_ONCE(rq->gstate) & MQ_RQ_STATE_MASK; + return cmpxchg(&rq->gstate.val, old_val.val, new_val.val) == + old_val.val; } /** - * blk_mq_rq_update_state() - set the current MQ_RQ_* state of a request + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request * @rq: target request. - * @state: new state to set. - * - * Set @rq's state to @state. The caller is responsible for ensuring that - * there are no other updaters. 
A request can transition into IN_FLIGHT - * only from IDLE and doing so increments the generation number. */ -static inline void blk_mq_rq_update_state(struct request *rq, - enum mq_rq_state state) +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq) { - u64 old_val = READ_ONCE(rq->gstate); - u64 new_val = (old_val & ~MQ_RQ_STATE_MASK) | state; - - if (state == MQ_RQ_IN_FLIGHT) { - WARN_ON_ONCE((old_val & MQ_RQ_STATE_MASK) != MQ_RQ_IDLE); - new_val += MQ_RQ_GEN_INC; - } - - /* avoid exposing interim values */ - WRITE_ONCE(rq->gstate, new_val); + return READ_ONCE(rq->gstate).state; } static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q, diff --git a/block/blk-timeout.c b/block/blk-timeout.c index 652d4d4d3e97..3abbaa332a91 100644 --- a/block/blk-timeout.c +++ b/block/blk-timeout.c @@ -145,6 +145,22 @@ void blk_timeout_work(struct work_struct *work) spin_unlock_irqrestore(q->queue_lock, flags); } +/* Update deadline to @time. May be called from interrupt context. */ +static void blk_mq_rq_set_deadline(struct request *rq, unsigned long new_time) +{ +#ifdef CONFIG_64BIT + rq->__deadline = new_time; +#else + unsigned long flags; + + local_irq_save(flags); + write_seqcount_begin(&rq->deadline_seq); + rq->__deadline = new_time; + write_seqcount_end(&rq->deadline_seq); + local_irq_restore(flags); +#endif +} + /** * blk_abort_request -- Request request recovery for the specified command * @req: pointer to the request of interest @@ -158,11 +174,10 @@ void blk_abort_request(struct request *req) { if (req->q->mq_ops) { /* - * All we need to ensure is that timeout scan takes place - * immediately and that scan sees the new timeout value. - * No need for fancy synchronizations. + * Ensure that a timeout scan takes place immediately and that + * that scan sees the new timeout value. */ - blk_rq_set_deadline(req, jiffies); + blk_mq_rq_set_deadline(req, jiffies); kblockd_schedule_work(&req->q->timeout_work); } else { if (blk_mark_rq_complete(req)) @@ -184,52 +199,17 @@ unsigned long blk_rq_timeout(unsigned long timeout) return timeout; } -/** - * blk_add_timer - Start timeout timer for a single request - * @req: request that is about to start running. - * - * Notes: - * Each request has its own timer, and as it is added to the queue, we - * set up the timer. When the request completes, we cancel the timer. - */ -void blk_add_timer(struct request *req) +static void __blk_add_timer(struct request *req, unsigned long deadline) { struct request_queue *q = req->q; unsigned long expiry; - if (!q->mq_ops) - lockdep_assert_held(q->queue_lock); - - /* blk-mq has its own handler, so we don't need ->rq_timed_out_fn */ - if (!q->mq_ops && !q->rq_timed_out_fn) - return; - - BUG_ON(!list_empty(&req->timeout_list)); - - /* - * Some LLDs, like scsi, peek at the timeout to prevent a - * command from being retried forever. - */ - if (!req->timeout) - req->timeout = q->rq_timeout; - - blk_rq_set_deadline(req, jiffies + req->timeout); - req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED; - - /* - * Only the non-mq case needs to add the request to a protected list. - * For the mq case we simply scan the tag map. - */ - if (!q->mq_ops) - list_add_tail(&req->timeout_list, &req->q->timeout_list); - /* * If the timer isn't already pending or this timeout is earlier * than an existing one, modify the timer. Round up to next nearest * second. 
*/ - expiry = blk_rq_timeout(round_jiffies_up(blk_rq_deadline(req))); - + expiry = blk_rq_timeout(round_jiffies_up(deadline)); if (!timer_pending(&q->timeout) || time_before(expiry, q->timeout.expires)) { unsigned long diff = q->timeout.expires - expiry; @@ -244,5 +224,50 @@ void blk_add_timer(struct request *req) if (!timer_pending(&q->timeout) || (diff >= HZ / 2)) mod_timer(&q->timeout, expiry); } +} +/** + * blk_add_timer - Start timeout timer for a single request + * @req: request that is about to start running. + * + * Notes: + * Each request has its own timer, and as it is added to the queue, we + * set up the timer. When the request completes, we cancel the timer. + */ +void blk_add_timer(struct request *req) +{ + struct request_queue *q = req->q; + unsigned long deadline; + + lockdep_assert_held(q->queue_lock); + + if (!q->rq_timed_out_fn) + return; + if (!req->timeout) + req->timeout = q->rq_timeout; + + deadline = jiffies + req->timeout; + blk_rq_set_deadline(req, deadline); + list_add_tail(&req->timeout_list, &req->q->timeout_list); + + return __blk_add_timer(req, deadline); +} + +/** + * blk_mq_add_timer - set the deadline for a single request + * @req: request for which to set the deadline. + * + * Sets the deadline of a request. The caller must guarantee that the request + * state won't be modified while this function is in progress. + */ +void blk_mq_add_timer(struct request *req) +{ + struct request_queue *q = req->q; + unsigned long deadline; + + if (!req->timeout) + req->timeout = q->rq_timeout; + deadline = jiffies + req->timeout; + blk_mq_rq_set_deadline(req, deadline); + return __blk_add_timer(req, deadline); } diff --git a/block/blk.h b/block/blk.h index eaf1a8e87d11..ffd44cbf3095 100644 --- a/block/blk.h +++ b/block/blk.h @@ -170,6 +170,7 @@ static inline bool bio_integrity_endio(struct bio *bio) void blk_timeout_work(struct work_struct *work); unsigned long blk_rq_timeout(unsigned long timeout); void blk_add_timer(struct request *req); +void blk_mq_add_timer(struct request *req); void blk_delete_timer(struct request *); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f3999719f828..acc08806a6e5 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -7,6 +7,7 @@ #ifdef CONFIG_BLOCK +#include <linux/atomic.h> /* cmpxchg() */ #include <linux/major.h> #include <linux/genhd.h> #include <linux/list.h> @@ -28,7 +29,6 @@ #include <linux/scatterlist.h> #include <linux/blkzoned.h> #include <linux/seqlock.h> -#include <linux/u64_stats_sync.h> struct module; struct scsi_ioctl_command; @@ -125,15 +125,21 @@ typedef __u32 __bitwise req_flags_t; #define RQF_SPECIAL_PAYLOAD ((__force req_flags_t)(1 << 18)) /* The per-zone write lock is held for this request */ #define RQF_ZONE_WRITE_LOCKED ((__force req_flags_t)(1 << 19)) -/* timeout is expired */ -#define RQF_MQ_TIMEOUT_EXPIRED ((__force req_flags_t)(1 << 20)) /* already slept for hybrid poll */ -#define RQF_MQ_POLL_SLEPT ((__force req_flags_t)(1 << 21)) +#define RQF_MQ_POLL_SLEPT ((__force req_flags_t)(1 << 20)) /* flags that prevent us from merging requests: */ #define RQF_NOMERGE_FLAGS \ (RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD) +union blk_generation_and_state { + struct { + uint32_t generation:30; + uint32_t state:2; + }; + uint32_t val; +}; + /* * Try to put the fields that are referenced together in the same cacheline. 
* @@ -236,28 +242,24 @@ struct request { unsigned int extra_len; /* length of alignment and padding */ - /* - * On blk-mq, the lower bits of ->gstate (generation number and - * state) carry the MQ_RQ_* state value and the upper bits the - * generation number which is monotonically incremented and used to - * distinguish the reuse instances. - * - * ->gstate_seq allows updates to ->gstate and other fields - * (currently ->deadline) during request start to be read - * atomically from the timeout path, so that it can operate on a - * coherent set of information. - */ - seqcount_t gstate_seq; - u64 gstate; - /* * ->aborted_gstate is used by the timeout to claim a specific * recycle instance of this request. See blk_mq_timeout_work(). */ - struct u64_stats_sync aborted_gstate_sync; - u64 aborted_gstate; + union blk_generation_and_state aborted_gstate; - /* access through blk_rq_set_deadline, blk_rq_deadline */ + /* + * Access through blk_rq_deadline() and blk_rq_set_deadline(), + * blk_mark_rq_complete(), blk_clear_rq_complete() and + * blk_rq_is_complete() for legacy queues or blk_mq_rq_state(), + * blk_mq_change_rq_state() and blk_mq_rq_set_deadline() for + * blk-mq queues. deadline_seq protects __deadline in blk-mq mode + * only. + */ + union blk_generation_and_state gstate; +#ifndef CONFIG_64BIT + seqcount_t deadline_seq; +#endif unsigned long __deadline; struct list_head timeout_list;
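On 32-bit systems the patch protects __deadline with a seqcount rather than widening the cmpxchg. The following stand-alone user-space model shows the shape of that write/retry-read pattern; it is not the kernel code — the real implementation uses seqcount_t, which supplies the proper barriers and READ_ONCE()/WRITE_ONCE() annotations, while the seq_cst fences and the plain 64-bit field below merely stand in for those details.

```c
/* User-space model of a seqcount-protected 64-bit deadline on a 32-bit
 * system; not the kernel implementation. */
#include <stdatomic.h>
#include <stdint.h>

struct rq_deadline {
	_Atomic unsigned int seq;	/* odd while an update is in progress */
	uint64_t deadline;		/* too wide for one atomic store on 32-bit */
};

/* Writer, e.g. (re)arming the timeout: bump the sequence around the update. */
static void set_deadline(struct rq_deadline *d, uint64_t value)
{
	atomic_fetch_add(&d->seq, 1);		/* sequence becomes odd */
	atomic_thread_fence(memory_order_seq_cst);
	d->deadline = value;
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add(&d->seq, 1);		/* sequence becomes even again */
}

/* Reader, e.g. the timeout scan: retry until a stable, even sequence is seen. */
static uint64_t read_deadline(struct rq_deadline *d)
{
	unsigned int start;
	uint64_t value;

	do {
		start = atomic_load(&d->seq);
		atomic_thread_fence(memory_order_seq_cst);
		value = d->deadline;
		atomic_thread_fence(memory_order_seq_cst);
	} while ((start & 1) || atomic_load(&d->seq) != start);

	return value;
}
```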
Recently the blk-mq timeout handling code was reworked. See also Tejun
Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018
(https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html).
This patch reworks the blk-mq timeout handling code again. The timeout
handling code is simplified by introducing a state machine per request.
This change avoids that the blk-mq timeout handling code ignores
completions that occur after blk_mq_check_expired() has been called and
before blk_mq_rq_timed_out() has been called. Fix this race as follows:
- Reduce the gstate field from 64 to 32 bits such that cmpxchg() can be
  used to update it. Introduce deadline_seq for updating the deadline on
  32-bit systems.
- Remove the request member variables that became superfluous due to this
  change, namely gstate_seq and aborted_gstate_sync.
- Remove the request state information that became superfluous due to this
  patch, namely RQF_MQ_TIMEOUT_EXPIRED.
- Remove the code that became superfluous due to this change, namely the
  RCU lock and unlock statements in blk_mq_complete_request() and also the
  synchronize_rcu() call in the timeout handler.

Notes:
- Atomic instructions are only used to update the request state if a
  concurrent request state change could be in progress.
- blk_add_timer() has been split into two functions - one for the legacy
  block layer and one for blk-mq.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Sebastian Ott <sebott@linux.ibm.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Israel Rukshin <israelr@mellanox.com>,
Cc: Max Gurtovoy <maxg@mellanox.com>
---
Changes compared to v12:
- Switched from cmpxchg64() to cmpxchg(). This became possible because the
  deadline is now updated before the request state.
- Introduced a new request state to ensure that completions that occur
  while the timeout function is in progress are not lost.
- Left out the ARM cmpxchg64() patch.

Changes compared to v11:
- Reworked patch 1/2: instead of introducing CONFIG_ARCH_HAVE_CMPXCHG64,
  make sure that cmpxchg64() is only defined if it can be used.

Changes compared to v10:
- In patch 1/2, added "default y if 64BIT" to the "config
  ARCH_HAVE_CMPXCHG64" entry in arch/Kconfig. Left out the "select
  ARCH_HAVE_CMPXCHG64" statements that became superfluous due to this
  change (alpha, arm64, powerpc and s390).
- Also in patch 1/2, only select ARCH_HAVE_CMPXCHG64 if X86_CMPXCHG64 has
  been selected.
- In patch 2/2, moved blk_mq_change_rq_state() from blk-mq.h to blk-mq.c.
- Added a comment header above __blk_mq_requeue_request() and
  blk_mq_requeue_request().
- Documented the MQ_RQ_* state transitions in block/blk-mq.h.
- Left out the fourth argument of blk_mq_rq_set_deadline().

Changes compared to v9:
- Addressed multiple comments related to patch 1/2: added
  CONFIG_ARCH_HAVE_CMPXCHG64 for riscv, modified
  features/locking/cmpxchg64/arch-support.txt as requested and made the
  order of the symbols in the arch/*/Kconfig alphabetical where possible.

Changes compared to v8:
- Split into two patches.
- Moved the spin_lock_init() call from blk_mq_rq_ctx_init() into
  blk_mq_init_request().
- Fixed the deadline set by blk_add_timer().
- Surrounded the das_lock member with #ifndef CONFIG_ARCH_HAVE_CMPXCHG64 /
  #endif.

Changes compared to v7:
- Fixed the generation number mechanism. Note: with this patch applied the
  behavior of the block layer does not depend on the generation number.
- Added more 32-bit architectures to the list of architectures on which
  cmpxchg64() should not be used.

Changes compared to v6:
- Used a union instead of bit manipulations to store multiple values into
  a single 64-bit field.
- Reduced the size of the timeout field from 64 to 32 bits.
- Made sure that the block layer still builds with this patch applied for
  the sh and mips architectures.
- Fixed two sparse warnings that were introduced by this patch in the
  WRITE_ONCE() calls.

Changes compared to v5:
- Restored the synchronize_rcu() call between marking a request for
  timeout handling and the actual timeout handling to avoid that timeout
  handling starts while .queue_rq() is still in progress if the timeout is
  very short.
- Only use cmpxchg() if another context could attempt to change the
  request state concurrently. Use WRITE_ONCE() otherwise.

Changes compared to v4:
- Addressed multiple review comments from Christoph. The most important
  are that atomic_long_cmpxchg() has been changed into cmpxchg() and also
  that there is now a nice and clean split between the legacy and blk-mq
  versions of blk_add_timer().
- Changed the patch name and modified the patch description because there
  is disagreement about whether or not the v4.16 blk-mq core can complete
  a single request twice. Kept the "Cc: stable" tag because of
  https://bugzilla.kernel.org/show_bug.cgi?id=199077.

Changes compared to v3 (see also
https://www.mail-archive.com/linux-block@vger.kernel.org/msg20073.html):
- Removed the spinlock again that was introduced to protect the request
  state. v4 uses atomic_long_cmpxchg() instead.
- Split __deadline into two variables - one for the legacy block layer and
  one for blk-mq.

Changes compared to v2
(https://www.mail-archive.com/linux-block@vger.kernel.org/msg18338.html):
- Rebased and retested on top of kernel v4.16.

Changes compared to v1
(https://www.mail-archive.com/linux-block@vger.kernel.org/msg18089.html):
- Removed the gstate and aborted_gstate members of struct request and used
  the __deadline member to encode both the generation and state
  information.

 block/blk-core.c       |   9 +-
 block/blk-mq-debugfs.c |   4 +-
 block/blk-mq.c         | 250 ++++++++++++++++++++++++++-----------------------
 block/blk-mq.h         |  54 +++++------
 block/blk-timeout.c    | 107 +++++++++++++--------
 block/blk.h            |   1 +
 include/linux/blkdev.h |  44 ++++-----
 7 files changed, 250 insertions(+), 219 deletions(-)
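The v12 note about updating the deadline before the request state refers to this ordering in blk_mq_start_request(), excerpted from the diff above: the deadline is published first, and only then does the request become visible to the timeout scan as in-flight, so (as the patch comments suggest) the deadline no longer needs to be updated atomically together with the state — which is what allows a 32-bit cmpxchg() instead of cmpxchg64().

```c
	/* Excerpt from blk_mq_start_request() in the diff above. */
	blk_mq_add_timer(rq);					/* set the deadline first */
	blk_mq_change_rq_state(rq, MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT);	/* then mark in flight */
```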