Message ID | CAHsXFKFOBSmZeT9j7U2S6J4Af2k2ODe_pkFVGJg73v0Wp7O38A@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blk-mq: Set request mapping to NULL in blk_mq_put_driver_tag | expand |
On Tue, Dec 04, 2018 at 03:30:11PM +0530, Kashyap Desai wrote: > Problem statement : > Whenever try to get outstanding request via scsi_host_find_tag, > block layer will return stale entries instead of actual outstanding > request. Kernel panic if stale entry is inaccessible or memory is reused. > Fix : > Undo request mapping in blk_mq_put_driver_tag nce request is return. > > More detail : > Whenever each SDEV entry is created, block layer allocate separate tags > and static requestis.Those requests are not valid after SDEV is deleted > from the system. On the fly, block layer maps static rqs to rqs as below > from blk_mq_get_driver_tag() > > data.hctx->tags->rqs[rq->tag] = rq; > > Above mapping is active in-used requests and it is the same mapping which > is referred in function scsi_host_find_tag(). > After running some IOs, “data.hctx->tags->rqs[rq->tag]” will have some > entries which will never be reset in block layer. However, if rq & rq->tag is valid, data.hctx->tags->rqs[rq->tag] should have pointed to one active request instead of the stale one, right? Thanks, Ming
On 12/4/18 2:00 AM, Kashyap Desai wrote: > Problem statement : > Whenever try to get outstanding request via scsi_host_find_tag, > block layer will return stale entries instead of actual outstanding > request. Kernel panic if stale entry is inaccessible or memory is reused. > Fix : > Undo request mapping in blk_mq_put_driver_tag nce request is return. > > More detail : > Whenever each SDEV entry is created, block layer allocate separate tags > and static requestis.Those requests are not valid after SDEV is deleted > from the system. On the fly, block layer maps static rqs to rqs as below > from blk_mq_get_driver_tag() > > data.hctx->tags->rqs[rq->tag] = rq; > > Above mapping is active in-used requests and it is the same mapping which > is referred in function scsi_host_find_tag(). > After running some IOs, “data.hctx->tags->rqs[rq->tag]” will have some > entries which will never be reset in block layer. > > There would be a kernel panic, If request pointing to > “data.hctx->tags->rqs[rq->tag]” is part of “sdev” which is removed > and as part of that all the memory allocation of request associated with > that sdev might be reused or inaccessible to the driver. > Kernel panic snippet - > > BUG: unable to handle kernel paging request at ffffff8000000010 > IP: [<ffffffffc048306c>] mpt3sas_scsih_scsi_lookup_get+0x6c/0xc0 [mpt3sas] > PGD aa4414067 PUD 0 > Oops: 0000 [#1] SMP > Call Trace: > [<ffffffffc046f72f>] mpt3sas_get_st_from_smid+0x1f/0x60 [mpt3sas] > [<ffffffffc047e125>] scsih_shutdown+0x55/0x100 [mpt3sas] > > Cc: <stable@vger.kernel.org> > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> > Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> > > > --- > block/blk-mq.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/block/blk-mq.h b/block/blk-mq.h > index 9497b47..57432be 100644 > --- a/block/blk-mq.h > +++ b/block/blk-mq.h > @@ -175,6 +175,7 @@ static inline bool > blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) > static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, > struct request *rq) > { > + hctx->tags->rqs[rq->tag] = NULL; > blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); > rq->tag = -1; No SCSI driver should call scsi_host_find_tag() after a request has finished. The above patch introduces yet another race and hence can't be a proper fix. Bart.
+ Linux-scsi > > diff --git a/block/blk-mq.h b/block/blk-mq.h > > index 9497b47..57432be 100644 > > --- a/block/blk-mq.h > > +++ b/block/blk-mq.h > > @@ -175,6 +175,7 @@ static inline bool > > blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) > > static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, > > struct request *rq) > > { > > + hctx->tags->rqs[rq->tag] = NULL; > > blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); > > rq->tag = -1; > > No SCSI driver should call scsi_host_find_tag() after a request has > finished. The above patch introduces yet another race and hence can't be > a proper fix. Bart, many scsi drivers use scsi_host_find_tag() to traverse max tag_id to find out pending IO in firmware. One of the use case is - HBA firmware recovery. In case of firmware recovery, driver may require to traverse the list and return back pending scsi command to SML for retry. I quickly grep the scsi code and found that snic_scsi, qla4xxx, fnic, mpt3sas are using API scsi_host_find_tag for the same purpose. Without this patch, we hit very basic kernel panic due to page fault. This is not an issue in non-mq code path. Non-mq path use blk_map_queue_find_tag() and that particular API does not provide stale requests. Kashyap > > Bart.
> On Tue, Dec 04, 2018 at 03:30:11PM +0530, Kashyap Desai wrote: > > Problem statement : > > Whenever try to get outstanding request via scsi_host_find_tag, > > block layer will return stale entries instead of actual outstanding > > request. Kernel panic if stale entry is inaccessible or memory is > > reused. > > Fix : > > Undo request mapping in blk_mq_put_driver_tag nce request is return. > > > > More detail : > > Whenever each SDEV entry is created, block layer allocate separate tags > > and static requestis.Those requests are not valid after SDEV is deleted > > from the system. On the fly, block layer maps static rqs to rqs as below > > from blk_mq_get_driver_tag() > > > > data.hctx->tags->rqs[rq->tag] = rq; > > > > Above mapping is active in-used requests and it is the same mapping > > which > > is referred in function scsi_host_find_tag(). > > After running some IOs, “data.hctx->tags->rqs[rq->tag]” will have some > > entries which will never be reset in block layer. > > However, if rq & rq->tag is valid, data.hctx->tags->rqs[rq->tag] should > have pointed to one active request instead of the stale one, right? Yes that is my understanding and learning from this issue. Side note - At driver load whenever driver does scsi_add_host_with_dma(), it follows below code path in block layer. scsi_mq_setup_tags ->blk_mq_alloc_tag_set -> blk_mq_alloc_rq_maps -> __blk_mq_alloc_rq_maps SML create two set of request pool. One is per HBA and other is per SDEV. I was confused why SML creates request pool per HBA. > > Thanks, > Ming
On Tue, 2018-12-04 at 22:17 +0530, Kashyap Desai wrote: > + Linux-scsi > > > > diff --git a/block/blk-mq.h b/block/blk-mq.h > > > index 9497b47..57432be 100644 > > > --- a/block/blk-mq.h > > > +++ b/block/blk-mq.h > > > @@ -175,6 +175,7 @@ static inline bool > > > blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) > > > static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, > > > struct request *rq) > > > { > > > + hctx->tags->rqs[rq->tag] = NULL; > > > blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); > > > rq->tag = -1; > > > > No SCSI driver should call scsi_host_find_tag() after a request has > > finished. The above patch introduces yet another race and hence can't be > > a proper fix. > > Bart, many scsi drivers use scsi_host_find_tag() to traverse max tag_id to > find out pending IO in firmware. > One of the use case is - HBA firmware recovery. In case of firmware > recovery, driver may require to traverse the list and return back pending > scsi command to SML for retry. > I quickly grep the scsi code and found that snic_scsi, qla4xxx, fnic, > mpt3sas are using API scsi_host_find_tag for the same purpose. > > Without this patch, we hit very basic kernel panic due to page fault. This > is not an issue in non-mq code path. Non-mq path use > blk_map_queue_find_tag() and that particular API does not provide stale > requests. As I wrote before, your patch doesn't fix the race you described but only makes the race window smaller. If you want an example of how to use scsi_host_find_tag() properly, have a look at the SRP initiator driver (drivers/infiniband/ulp/srp). That driver uses scsi_host_find_tag() without triggering any NULL pointer dereferences. The approach used in that driver also works when having to support HBA firmware recovery. Bart.
> -----Original Message----- > From: Bart Van Assche [mailto:bvanassche@acm.org] > Sent: Tuesday, December 4, 2018 10:45 PM > To: Kashyap Desai; linux-block; Jens Axboe; Ming Lei; linux-scsi > Cc: Suganath Prabu Subramani; Sreekanth Reddy; Sathya Prakash Veerichetty > Subject: Re: [PATCH] blk-mq: Set request mapping to NULL in > blk_mq_put_driver_tag > > On Tue, 2018-12-04 at 22:17 +0530, Kashyap Desai wrote: > > + Linux-scsi > > > > > > diff --git a/block/blk-mq.h b/block/blk-mq.h > > > > index 9497b47..57432be 100644 > > > > --- a/block/blk-mq.h > > > > +++ b/block/blk-mq.h > > > > @@ -175,6 +175,7 @@ static inline bool > > > > blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) > > > > static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx > *hctx, > > > > struct request *rq) > > > > { > > > > + hctx->tags->rqs[rq->tag] = NULL; > > > > blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); > > > > rq->tag = -1; > > > > > > No SCSI driver should call scsi_host_find_tag() after a request has > > > finished. The above patch introduces yet another race and hence can't > > > be > > > a proper fix. > > > > Bart, many scsi drivers use scsi_host_find_tag() to traverse max tag_id > > to > > find out pending IO in firmware. > > One of the use case is - HBA firmware recovery. In case of firmware > > recovery, driver may require to traverse the list and return back > > pending > > scsi command to SML for retry. > > I quickly grep the scsi code and found that snic_scsi, qla4xxx, fnic, > > mpt3sas are using API scsi_host_find_tag for the same purpose. > > > > Without this patch, we hit very basic kernel panic due to page fault. > > This > > is not an issue in non-mq code path. Non-mq path use > > blk_map_queue_find_tag() and that particular API does not provide stale > > requests. > > As I wrote before, your patch doesn't fix the race you described but only > makes the race window smaller. Hi Bart, Let me explain the issue. It is not a race, but very straight issue. Let's say we have one scsi_device /dev/sda and total IO submitted + completed are some number 100. All the 100 IO is *completed*. Now, As part of Firmware recovery, driver tries to find our outstanding IOs using scsi_host_find_tag(). Block layer will return all the 100 commands to the driver but really those 100 commands are not outstanding. This patch will return *actual* outstanding commands. If scsi_device /dev/sda is not removed in OS, driver accessing scmd of those 100 commands are safe memory access. Now consider a case where scsi_device /dev/sda is removed and driver performs firmware recovery. This time driver will crash while accessing scmd (randomly based on memory reused.). Along with this patch, low level driver should make sure that all request queue at block layer is quiesce. If you want an example of how to use > scsi_host_find_tag() properly, have a look at the SRP initiator driver > (drivers/infiniband/ulp/srp). That driver uses scsi_host_find_tag() > without > triggering any NULL pointer dereferences. I am not able to find right context from srp, but I check the srp code and looks like that driver is getting scmd using scsi_host_find_tag() for live command. > The approach used in that driver > also works when having to support HBA firmware recovery. > > Bart.
On Tue, 2018-12-04 at 23:48 +0530, Kashyap Desai wrote: > Let me explain the issue. It is not a race, but very straight issue. Please clarify what makes you think that iterating over all requests does not race with request completion. Is request submission perhaps blocked during the firmware recovery process? Does the firmware recovery process wait for completion interrupts that are in progress to finish? > Let's > say we have one scsi_device /dev/sda and total IO submitted + completed are > some number 100. > All the 100 IO is *completed*. Now, As part of Firmware recovery, driver > tries to find our outstanding IOs using scsi_host_find_tag(). > Block layer will return all the 100 commands to the driver but really those > 100 commands are not outstanding. This patch will return *actual* > outstanding commands. Would iterating over started requests be a good alternative to the patch that you had posted? The header above blk_mq_tagset_busy_iter() is as follows (this function is used by e.g. the skd driver for firmware recovery): /** * blk_mq_tagset_busy_iter - iterate over all started requests in a tag set * @tagset: Tag set to iterate over. * @fn: Pointer to the function that will be called for each started * request. @fn will be called as follows: @fn(rq, @priv, * reserved) where rq is a pointer to a request. 'reserved' * indicates whether or not @rq is a reserved request. * @priv: Will be passed as second argument to @fn. */ > I am not able to find right context from srp, but I check the srp code and > looks like that driver is getting scmd using scsi_host_find_tag() for live > command. Sorry, my e-mail was a bit too short to be comprehensible. When running sg_reset -d /dev/sd... in a loop it is possible that the SCSI reset handler terminates a request while the SRP driver is handling a request completion for one of the terminated requests. That is why both srp_process_srp() and the reset handler call srp_claim_req() to make sure that only one of these two contexts completes a request. Bart.
On Tue, Dec 04, 2018 at 11:48:48PM +0530, Kashyap Desai wrote: > > -----Original Message----- > > From: Bart Van Assche [mailto:bvanassche@acm.org] > > Sent: Tuesday, December 4, 2018 10:45 PM > > To: Kashyap Desai; linux-block; Jens Axboe; Ming Lei; linux-scsi > > Cc: Suganath Prabu Subramani; Sreekanth Reddy; Sathya Prakash Veerichetty > > Subject: Re: [PATCH] blk-mq: Set request mapping to NULL in > > blk_mq_put_driver_tag > > > > On Tue, 2018-12-04 at 22:17 +0530, Kashyap Desai wrote: > > > + Linux-scsi > > > > > > > > diff --git a/block/blk-mq.h b/block/blk-mq.h > > > > > index 9497b47..57432be 100644 > > > > > --- a/block/blk-mq.h > > > > > +++ b/block/blk-mq.h > > > > > @@ -175,6 +175,7 @@ static inline bool > > > > > blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) > > > > > static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx > > *hctx, > > > > > struct request *rq) > > > > > { > > > > > + hctx->tags->rqs[rq->tag] = NULL; > > > > > blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); > > > > > rq->tag = -1; > > > > > > > > No SCSI driver should call scsi_host_find_tag() after a request has > > > > finished. The above patch introduces yet another race and hence can't > > > > be > > > > a proper fix. > > > > > > Bart, many scsi drivers use scsi_host_find_tag() to traverse max tag_id > > > to > > > find out pending IO in firmware. > > > One of the use case is - HBA firmware recovery. In case of firmware > > > recovery, driver may require to traverse the list and return back > > > pending > > > scsi command to SML for retry. > > > I quickly grep the scsi code and found that snic_scsi, qla4xxx, fnic, > > > mpt3sas are using API scsi_host_find_tag for the same purpose. > > > > > > Without this patch, we hit very basic kernel panic due to page fault. > > > This > > > is not an issue in non-mq code path. Non-mq path use > > > blk_map_queue_find_tag() and that particular API does not provide stale > > > requests. > > > > As I wrote before, your patch doesn't fix the race you described but only > > makes the race window smaller. > Hi Bart, > > Let me explain the issue. It is not a race, but very straight issue. Let's > say we have one scsi_device /dev/sda and total IO submitted + completed are > some number 100. > All the 100 IO is *completed*. Now, As part of Firmware recovery, driver > tries to find our outstanding IOs using scsi_host_find_tag(). If the 'tag' passed to scsi_host_find_tag() is valid, I think there shouldn't have such issue. If you want to find outstanding IOs, maybe you can try blk_mq_queue_tag_busy_iter() or blk_mq_tagset_busy_iter(), because you may not know if the passed 'tag' to scsi_host_find_tag() is valid or not. Thanks, Ming
> > If the 'tag' passed to scsi_host_find_tag() is valid, I think there > shouldn't have such issue. > > If you want to find outstanding IOs, maybe you can try > blk_mq_queue_tag_busy_iter() > or blk_mq_tagset_busy_iter(), because you may not know if the passed 'tag' > to > scsi_host_find_tag() is valid or not. We tried quick change in mpt3sas driver using blk_mq_tagset_busy_iter and it returns/callback for valid requests (no stale entries are returned). Expected. Above two APIs are only for blk-mq. What about non-mq case ? Driver should use scsi_host_find_tag for non-mq and blk_mq_tagset_busy_iter for blk-mq case ? I don't see that will be good interface. Also, blk_mq_tagset_busy_iter API does not provide control if driver wants to quit in-between or do some retry logic etc. Why can't we add single API which provides the correct output. scsi_host_find_tag () API works well in non-mq case because, blk_queue_end_tag() set bqt->tag_index[tag] = NULL;. We are missing similar reset upon request completion in blk-mq case. This patch has similar approach as non-mq and there is no race condition I can foresee. BTW - My original patch is half fix. We also need below changes - diff --git a/block/blk-mq.c b/block/blk-mq.c index 3f91c6e..d8f53ac 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -477,8 +477,10 @@ static void __blk_mq_free_request(struct request *rq) const int sched_tag = rq->internal_tag; blk_pm_mark_last_busy(rq); - if (rq->tag != -1) + if (rq->tag != -1) { + hctx->tags->rqs[rq->tag] = NULL; blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag); + } if (sched_tag != -1) blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag); blk_mq_sched_restart(hctx); > > > Thanks, > Ming
On 12/5/18 10:45 PM, Kashyap Desai wrote: >> >> If the 'tag' passed to scsi_host_find_tag() is valid, I think there >> shouldn't have such issue. >> >> If you want to find outstanding IOs, maybe you can try >> blk_mq_queue_tag_busy_iter() >> or blk_mq_tagset_busy_iter(), because you may not know if the passed > 'tag' >> to >> scsi_host_find_tag() is valid or not. > > We tried quick change in mpt3sas driver using blk_mq_tagset_busy_iter and > it returns/callback for valid requests (no stale entries are returned). > Expected. > Above two APIs are only for blk-mq. What about non-mq case ? Driver > should use scsi_host_find_tag for non-mq and blk_mq_tagset_busy_iter for > blk-mq case ? > I don't see that will be good interface. Also, blk_mq_tagset_busy_iter API > does not provide control if driver wants to quit in-between or do some > retry logic etc. > > Why can't we add single API which provides the correct output. From 4.21 and forward, there will only be blk/scsi-mq. This is exactly the problem with having to maintain two stacks, it's a huge pain.
> On 12/5/18 10:45 PM, Kashyap Desai wrote: > >> > >> If the 'tag' passed to scsi_host_find_tag() is valid, I think there > >> shouldn't have such issue. > >> > >> If you want to find outstanding IOs, maybe you can try > >> blk_mq_queue_tag_busy_iter() > >> or blk_mq_tagset_busy_iter(), because you may not know if the passed > > 'tag' > >> to > >> scsi_host_find_tag() is valid or not. > > > > We tried quick change in mpt3sas driver using blk_mq_tagset_busy_iter > > and > > it returns/callback for valid requests (no stale entries are returned). > > Expected. > > Above two APIs are only for blk-mq. What about non-mq case ? Driver > > should use scsi_host_find_tag for non-mq and blk_mq_tagset_busy_iter for > > blk-mq case ? > > I don't see that will be good interface. Also, blk_mq_tagset_busy_iter > > API > > does not provide control if driver wants to quit in-between or do some > > retry logic etc. > > > > Why can't we add single API which provides the correct output. > > From 4.21 and forward, there will only be blk/scsi-mq. This is exactly > the problem with having to maintain two stacks, it's a huge pain. Hi Jens, Fix for this issue also required to be back ported to stable kernels (which still use non-mq + mq stack). We have multiple choices to fix this. 1. Use " blk_mq_tagset_busy_iter" in *all* the affected drivers. This API has certain limitation as explained and also fix only blk-mq part. Using this API may need more code in low level drivers to handle non-mq and mq separately. 2. Driver can use internal memory for scsiio_track (driver private) and track all the outstanding IO within a driver. This is mostly a scsi mid layer interface. All the affected driver require a changes. 3. Fix blk-mq code around blk_mq_put_tag and driver can still use " scsi_host_find_tag". No driver changes are required. This is smooth to back port stable kernel and Linux Distribution which normally pick critical fixes from stable can pick the fix. This is better fix and did not change any design. In fact, I mimic the same flow as non-mq code is doing. I don't see any design or functional issue with #3 (PATCH provided in this thread.) What is your feedback for this patch ? Kashyap > > -- > Jens Axboe
On Thu, Dec 06, 2018 at 11:15:13AM +0530, Kashyap Desai wrote: > > > > If the 'tag' passed to scsi_host_find_tag() is valid, I think there > > shouldn't have such issue. > > > > If you want to find outstanding IOs, maybe you can try > > blk_mq_queue_tag_busy_iter() > > or blk_mq_tagset_busy_iter(), because you may not know if the passed > 'tag' > > to > > scsi_host_find_tag() is valid or not. > > We tried quick change in mpt3sas driver using blk_mq_tagset_busy_iter and > it returns/callback for valid requests (no stale entries are returned). > Expected. > Above two APIs are only for blk-mq. What about non-mq case ? Driver > should use scsi_host_find_tag for non-mq and blk_mq_tagset_busy_iter for > blk-mq case ? But your patch is only for blk-mq, is there same issue on non-mq case? Thanks, Ming
> -----Original Message----- > From: Ming Lei [mailto:ming.lei@redhat.com] > Sent: Friday, December 7, 2018 3:50 PM > To: Kashyap Desai > Cc: Bart Van Assche; linux-block; Jens Axboe; linux-scsi; Suganath Prabu > Subramani; Sreekanth Reddy; Sathya Prakash Veerichetty > Subject: Re: +AFs-PATCH+AF0- blk-mq: Set request mapping to NULL in > blk+AF8-mq+AF8-put+AF8-driver+AF8-tag > > On Thu, Dec 06, 2018 at 11:15:13AM +0530, Kashyap Desai wrote: > > > > > > If the 'tag' passed to scsi_host_find_tag() is valid, I think there > > > shouldn't have such issue. > > > > > > If you want to find outstanding IOs, maybe you can try > > > blk_mq_queue_tag_busy_iter() > > > or blk_mq_tagset_busy_iter(), because you may not know if the passed > > 'tag' > > > to > > > scsi_host_find_tag() is valid or not. > > > > We tried quick change in mpt3sas driver using blk_mq_tagset_busy_iter and > > it returns/callback for valid requests (no stale entries are returned). > > Expected. > > Above two APIs are only for blk-mq. What about non-mq case ? Driver > > should use scsi_host_find_tag for non-mq and blk_mq_tagset_busy_iter for > > blk-mq case ? > > But your patch is only for blk-mq, is there same issue on non-mq case? Problematic part from below function is code path which goes from " shost_use_blk_mq(shost))". Non-mq path works fine because every IO completion set bqt->tag_index[tag] = NULL from blk_queue_end_tag(). I did similar things for mq path in this patch. static inline struct scsi_cmnd *scsi_host_find_tag(struct Scsi_Host *shost, int tag) { struct request *req = NULL; if (tag == SCSI_NO_TAG) return NULL; if (shost_use_blk_mq(shost)) { u16 hwq = blk_mq_unique_tag_to_hwq(tag); if (hwq < shost->tag_set.nr_hw_queues) { req = blk_mq_tag_to_rq(shost->tag_set.tags[hwq], blk_mq_unique_tag_to_tag(tag)); } } else { req = blk_map_queue_find_tag(shost->bqt, tag); } Kashyap > > Thanks, > Ming
> > On Thu, Dec 06, 2018 at 11:15:13AM +0530, Kashyap Desai wrote: > > > > > > > > If the 'tag' passed to scsi_host_find_tag() is valid, I think there > > > > shouldn't have such issue. > > > > > > > > If you want to find outstanding IOs, maybe you can try > > > > blk_mq_queue_tag_busy_iter() > > > > or blk_mq_tagset_busy_iter(), because you may not know if the passed > > > 'tag' > > > > to > > > > scsi_host_find_tag() is valid or not. > > > > > > We tried quick change in mpt3sas driver using blk_mq_tagset_busy_iter > and > > > it returns/callback for valid requests (no stale entries are returned). > > > Expected. > > > Above two APIs are only for blk-mq. What about non-mq case ? Driver > > > should use scsi_host_find_tag for non-mq and blk_mq_tagset_busy_iter > for > > > blk-mq case ? > > > > But your patch is only for blk-mq, is there same issue on non-mq case? > > Problematic part from below function is code path which goes from " > shost_use_blk_mq(shost))". > Non-mq path works fine because every IO completion set bqt->tag_index[tag] > = NULL from blk_queue_end_tag(). > > I did similar things for mq path in this patch. > > static inline struct scsi_cmnd *scsi_host_find_tag(struct Scsi_Host *shost, > int tag) > { > struct request *req = NULL; > > if (tag == SCSI_NO_TAG) > return NULL; > > if (shost_use_blk_mq(shost)) { > u16 hwq = blk_mq_unique_tag_to_hwq(tag); > > if (hwq < shost->tag_set.nr_hw_queues) { > req = blk_mq_tag_to_rq(shost->tag_set.tags[hwq], > blk_mq_unique_tag_to_tag(tag)); > } > } else { > req = blk_map_queue_find_tag(shost->bqt, tag); > } Hi Jens, Any conclusion/feedback on this topic/patch ? As discussed, This is a safe change and good to have if no design issue. Kashyap
> > > On Thu, Dec 06, 2018 at 11:15:13AM +0530, Kashyap Desai wrote: > > > > > > > > > > If the 'tag' passed to scsi_host_find_tag() is valid, I think there > > > > > shouldn't have such issue. > > > > > > > > > > If you want to find outstanding IOs, maybe you can try > > > > > blk_mq_queue_tag_busy_iter() > > > > > or blk_mq_tagset_busy_iter(), because you may not know if the passed > > > > 'tag' > > > > > to > > > > > scsi_host_find_tag() is valid or not. > > > > > > > > We tried quick change in mpt3sas driver using blk_mq_tagset_busy_iter > > and > > > > it returns/callback for valid requests (no stale entries are returned). > > > > Expected. > > > > Above two APIs are only for blk-mq. What about non-mq case ? Driver > > > > should use scsi_host_find_tag for non-mq and blk_mq_tagset_busy_iter > > for > > > > blk-mq case ? > > > > > > But your patch is only for blk-mq, is there same issue on non-mq case? > > > > Problematic part from below function is code path which goes from " > > shost_use_blk_mq(shost))". > > Non-mq path works fine because every IO completion set bqt- > >tag_index[tag] > > = NULL from blk_queue_end_tag(). > > > > I did similar things for mq path in this patch. > > > > static inline struct scsi_cmnd *scsi_host_find_tag(struct Scsi_Host *shost, > > int tag) > > { > > struct request *req = NULL; > > > > if (tag == SCSI_NO_TAG) > > return NULL; > > > > if (shost_use_blk_mq(shost)) { > > u16 hwq = blk_mq_unique_tag_to_hwq(tag); > > > > if (hwq < shost->tag_set.nr_hw_queues) { > > req = blk_mq_tag_to_rq(shost->tag_set.tags[hwq], > > blk_mq_unique_tag_to_tag(tag)); > > } > > } else { > > req = blk_map_queue_find_tag(shost->bqt, tag); > > } > > > Hi Jens, > > Any conclusion/feedback on this topic/patch ? As discussed, This is a safe > change and good to have if no design issue. Hi, Since we have not concluded on fix, we can resume discussion on next revision of the patch. I will be posting V2 patch with complete fix. Kashyap > > Kashyap
diff --git a/block/blk-mq.h b/block/blk-mq.h index 9497b47..57432be 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -175,6 +175,7 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq) { + hctx->tags->rqs[rq->tag] = NULL; blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); rq->tag = -1;