Message ID | 20201020065420.124885-1-jefflexu@linux.alibaba.com (mailing list archive) |
---|---|
Headers | show |
Series | Add support of iopoll for dm device | expand |
On Tue, Oct 20 2020 at 2:54am -0400, Jeffle Xu <jefflexu@linux.alibaba.com> wrote: > This patch set adds support of iopoll for dm device. > > This is only an RFC patch. I'm really looking forward getting your > feedbacks on if you're interested in supporting iopoll for dm device, > or if there's a better design to implement that. > > Thanks. > > > [Purpose] > IO polling is an important mode of io_uring. Currently only mq devices > support iopoll. As for dm devices, only dm-multipath is request-base, > while others are all bio-based and have no support for iopoll. > Supporting iopoll for dm devices can be of great meaning when the > device seen by application is dm device such as dm-linear/dm-stripe, > in which case iopoll is not usable for io_uring. I appreciate you taking the initiative on this; polling support is on my TODO so your work serves as a nice reminder to pursue this more urgently. > [Design Notes] > > cookie > ------ > Let's start from cookie. Cookie is one important concept in iopoll. It > is used to identify one specific request in one specific hardware queue. > The concept of cookie is initially designed as a per-bio concept, and > thus it doesn't work well when bio-split involved. When bio is split, > the returned cookie is indeed the cookie of one of the split bio, and > the following polling on this returned cookie can only guarantee the > completion of this specific split bio, while the other split bios may > be still uncompleted. Bio-split is also resolved for dm device, though > in another form, in which case the original bio submitted to the dm > device may be split into multiple bios submitted to the underlying > devices. > > In previous discussion, Lei Ming has suggested that iopoll should be > disabled for bio-split. This works for the normal bio-split (done in > blk_mq_submit_bio->__blk_queue_split), while iopoll will never be > usable for dm devices if this also applies for dm device. > > So come back to the design of the cookie for dm devices. At the very > beginning I want to refactor the design of cookie, from 'unsigned int' > type to the pointer type for dm device, so that cookie can point to > something, e.g. a list containing all cookies of one original bio, > something like this: > > struct dm_io { // the returned cookie points to dm_io > ... > struct list_head cookies; > }; > > In this design, we can fetch all cookies of one original bio, but the > implementation can be difficult and buggy. For example, the > 'struct dm_io' structure may be already freed when the returned cookie > is used in blk_poll(). Then what if maintain a refcount in struct dm_io > so that 'struct dm_io' structure can not be freed until blk_poll() > called? Then the 'struct dm_io' structure will never be freed if the > IO polling is not used at all. I'd have to look closer at the race in the code you wrote (though you didn't share it); but we cannot avoid properly mapping a cookie to each split bio. Otherwise you resort to inefficiently polling everything. Seems your attempt to have the cookie point to a dm_io object was likely too coarse-grained (when bios are split they get their own dm_io on recursive re-entry to DM core from ->submit_bio); but isn't having a list of cookies still too imprecise for polling purposes? You could easily have a list that translates to many blk-mq queues. Possibly better than your current approach of polling everything -- but not much. > So finally I gived up refactoring the design of cookie and maintain all > cookies of one original bio. The initial design of cookie gets retained. > The returned cookie is still the cookie of one of the split bio, and thus > it is not used at all when polling dm devices. The polling of dm device, > is indeed iterating and polling on all hardware queues (in polling mode) > of all underlying target devices. > > This design is simple and easy to implement. But I'm not sure if the > performance can be an issue if one dm device mapped to too many target > devices or the dm stack depth grows. You showed 40% performance improvement but it really just does the bare minimum of implementing polling. It is so simplistic that I really don't think the design even serves as a starting point for what will be needed. So this needs further design time. What you've _done_ could serve as a stop-gap but I'd really rather we get it properly designed from the start. > REQ_NOWAIT > ---------- > The polling bio will set REQ_NOWAIT in bio->bi_flags, and the device > need to be capable of handling REQ_NOWAIT. Commit 021a24460dc2 > ("block: add QUEUE_FLAG_NOWAIT") and commit 6abc49468eea ("dm: add > support for REQ_NOWAIT and enable it for linear target") add this > support for dm device, and currently only dm-linear supports that. > > hybrid polling > -------------- > When execute hybrid polling, cookie is used to fetch the request, > and check if the request has completed or not. In the design for > dm device described above, the returned cookie is still the cookie > of one of the split bios, and thus we have no way checking if all > the split bios have completed or not. Thus in the current > implementation, hybrid polling is not supported for dm device. I'll look closer at all this. But DM targets do allow for additional target specific per-bio-data to be added to the overall per-bio-data size (via ti->per_io_data_size). DM core _could_ internalize adding additional space to per-bio-data for storing a unique cookie per split/clone bio. Conversely, block-core's bio cloning could manage this so long as it knew how to access the offset into the established per-bio-data. But that might be more challenging to do without impacting use-cases outside of DM. Thanks, Mike > [Performance] > I test 4k-randread on a dm-linear mapped to only one nvme device. > > SSD: INTEL SSDPEDME800G4 > dm-linear:dmsetup create testdev --table "0 209715200 linear /dev/nvme0n1 0" > > fio configs: > ``` > ioengine=io_uring > iodepth=128 > numjobs=1 > thread > rw=randread > direct=1 > registerfiles=1 > #hipri=1 with iopoll enabled, hipri=0 with iopoll disabled > bs=4k > size=100G > runtime=60 > time_based > group_reporting > randrepeat=0 > > [device] > filename=/dev/mapper/testdev > ``` > > The test result shows that there's ~40% performance growth when iopoll > enabled. > > test | iops | bw | avg lat > ---- | ---- | ---- | ---- > iopoll-disabled | 244k | 953MiB/s | 524us > iopoll-enabled | 345k | 1349MiB/s | 370us > > [TODO] > The store method of "io_poll"/"io_poll_delay" has not been adapted > for dm device. > > > Jeffle Xu (3): > block/mq: add iterator for polling hw queues > block: add back ->poll_fn in request queue > dm: add support for IO polling > > block/blk-mq.c | 30 ++++++++++++++++++++++++------ > drivers/md/dm-core.h | 1 + > drivers/md/dm-table.c | 30 ++++++++++++++++++++++++++++++ > drivers/md/dm.c | 40 ++++++++++++++++++++++++++++++++++++++++ > include/linux/blk-mq.h | 6 ++++++ > include/linux/blkdev.h | 3 +++ > 6 files changed, 104 insertions(+), 6 deletions(-) > > -- > 2.27.0 >
On 10/22/20 4:39 AM, Mike Snitzer wrote: > What you've _done_ could serve as a stop-gap but I'd really rather we > get it properly designed from the start. Indeed I totally agree with you that the design should be done nicely at the very beginning. And this is indeed the purpose of this RFC patch. >> This patch set adds support of iopoll for dm device. >> >> This is only an RFC patch. I'm really looking forward getting your >> feedbacks on if you're interested in supporting iopoll for dm device, >> or if there's a better design to implement that. >> >> Thanks. >> >> >> [Purpose] >> IO polling is an important mode of io_uring. Currently only mq devices >> support iopoll. As for dm devices, only dm-multipath is request-base, >> while others are all bio-based and have no support for iopoll. >> Supporting iopoll for dm devices can be of great meaning when the >> device seen by application is dm device such as dm-linear/dm-stripe, >> in which case iopoll is not usable for io_uring. > I appreciate you taking the initiative on this; polling support is on my > TODO so your work serves as a nice reminder to pursue this more > urgently. It's a good news that iopoll for DM is meaningful. > but we cannot avoid properly mapping a cookie to each > split bio. Otherwise you resort to inefficiently polling everything. Yes. At the very beginning I tried to build the mapping a cookie to each bio, but I failed with several blocking design issues. By the way maybe we could clarify these design issues here, if you'd like. > > Seems your attempt to have the cookie point to a dm_io object was likely > too coarse-grained (when bios are split they get their own dm_io on > recursive re-entry to DM core from ->submit_bio); but isn't having a > list of cookies still too imprecise for polling purposes? You could > easily have a list that translates to many blk-mq queues. Possibly > better than your current approach of polling everything -- but not > much. To make the discussion more specific, assume that dm0 is mapped to dm1/2/3, while dm1 mapped to nvme1, dm2 mapped to dm2, etc.. dm0 dm1 dm2 dm3 nvme1 nvme2 nvme3 Then the returned cookie of dm0 could be pointer pointing to dm_io object of dm0. struct dm_io { // the returned cookie points to dm_io object ... + struct list_head cookies; }; struct dm_target_io { ... /* * The corresponding polling hw queue if submitted to mq device (such as nvme1/2/3), * NULL if submitted to dm device (such as dm1/2/3) */ + struct blk_mq_hw_ctx *hctx; + struct list_head node; // add to @cookies list }; The @cookies list of dm_io object could maintain all dm_target_io objects of all **none-dm** devices, that is, all hw queues that we should poll on. returned -> @cookies list cookie of dm_io object of dm0 | +--> dm_target_io -> dm_target_io -> dm_target_io object of nvme1 object of nvme2 object of nvme3 When polling returned cookie of dm0, actually we're polling @cookies list. Once one of the dm_target_io completed (e.g. nvme2), it should be removed from the @cookies list., and thus we should only focus on hw queues that have not completed. >> [Design Notes] >> >> cookie >> ------ >> Let's start from cookie. Cookie is one important concept in iopoll. It >> is used to identify one specific request in one specific hardware queue. >> The concept of cookie is initially designed as a per-bio concept, and >> thus it doesn't work well when bio-split involved. When bio is split, >> the returned cookie is indeed the cookie of one of the split bio, and >> the following polling on this returned cookie can only guarantee the >> completion of this specific split bio, while the other split bios may >> be still uncompleted. Bio-split is also resolved for dm device, though >> in another form, in which case the original bio submitted to the dm >> device may be split into multiple bios submitted to the underlying >> devices. >> >> In previous discussion, Lei Ming has suggested that iopoll should be >> disabled for bio-split. This works for the normal bio-split (done in >> blk_mq_submit_bio->__blk_queue_split), while iopoll will never be >> usable for dm devices if this also applies for dm device. >> >> So come back to the design of the cookie for dm devices. At the very >> beginning I want to refactor the design of cookie, from 'unsigned int' >> type to the pointer type for dm device, so that cookie can point to >> something, e.g. a list containing all cookies of one original bio, >> something like this: >> >> struct dm_io { // the returned cookie points to dm_io >> ... >> struct list_head cookies; >> }; >> >> In this design, we can fetch all cookies of one original bio, but the >> implementation can be difficult and buggy. For example, the >> 'struct dm_io' structure may be already freed when the returned cookie >> is used in blk_poll(). Then what if maintain a refcount in struct dm_io >> so that 'struct dm_io' structure can not be freed until blk_poll() >> called? Then the 'struct dm_io' structure will never be freed if the >> IO polling is not used at all. > I'd have to look closer at the race in the code you wrote (though you > didn't share it); I worried that dm_target_io/dm_io objects could have been freed before/when we are polling on them, and thus could cause use-after-free when accessing @cookies list in dm_target_io. It could happen when there are multiple polling instance. io_uring has implemented per-instance polling thread. If there are two bios submitted to dm0, please consider the following race sequence: 1. race 1: dm_target_io/dm_io objects could have been freed ****when**** we are polling on them ```* * *thread1 polling on bio1 thread2 polling on bio2* fetch dm_io object by the cookie reaps completions in nvme 1/2/3, completes bio1 and frees dm_io object of bio1 by the way use-after-free when accessing dm_io object ``` Maybe we should get a refcount of dm_io of bio1 when start polling, but I'm not sure if the design is elegant or not. 2. race 2: dm_target_io/dm_io objects could have been freed ****before**** we are polling on them ``` *thread1 polling on bio1 thread2 polling on bio2* reaps completions in nvme 1/2/3, clone_endio dec_pending free_io(md, io); // free dm_io object __blkdev_direct_IO READ_ONCE(dio->waiter) // dio->waiter is still none-NULL bio_endio(io->orig_bio) //call bi_end_io() of original bio, that is blkdev_bio_end_io WRITE_ONCE(dio->waiter, NULL); blk_poll fetch dm_io object by the cookie // use-after-free! ``` Thanks. Jeffle.
On Thu, Oct 22 2020 at 1:28am -0400, JeffleXu <jefflexu@linux.alibaba.com> wrote: > > On 10/22/20 4:39 AM, Mike Snitzer wrote: > > >What you've _done_ could serve as a stop-gap but I'd really rather we > >get it properly designed from the start. > > Indeed I totally agree with you that the design should be done > nicely at the very beginning. And this > > is indeed the purpose of this RFC patch. > > > >>This patch set adds support of iopoll for dm device. > >> > >>This is only an RFC patch. I'm really looking forward getting your > >>feedbacks on if you're interested in supporting iopoll for dm device, > >>or if there's a better design to implement that. > >> > >>Thanks. > >> > >> > >>[Purpose] > >>IO polling is an important mode of io_uring. Currently only mq devices > >>support iopoll. As for dm devices, only dm-multipath is request-base, > >>while others are all bio-based and have no support for iopoll. > >>Supporting iopoll for dm devices can be of great meaning when the > >>device seen by application is dm device such as dm-linear/dm-stripe, > >>in which case iopoll is not usable for io_uring. > >I appreciate you taking the initiative on this; polling support is on my > >TODO so your work serves as a nice reminder to pursue this more > >urgently. > > It's a good news that iopoll for DM is meaningful. > > > >but we cannot avoid properly mapping a cookie to each > >split bio. Otherwise you resort to inefficiently polling everything. > > Yes. At the very beginning I tried to build the mapping a cookie to > each bio, but I failed with several > > blocking design issues. By the way maybe we could clarify these > design issues here, if you'd like. Biggest issue I'm seeing is that block core's bio-based IO submission implementation really never seriously carried the blk_qc_t changes through. The cookie return from __submit_bio is thrown away when recursion occurs in __submit_bio_noacct -- last bio submission's cookie is what is returned back to caller. That cookie could be very far removed from all the others returned earlier in the recursion. Fixing this would require quite some design and cleanup. > >Seems your attempt to have the cookie point to a dm_io object was likely > >too coarse-grained (when bios are split they get their own dm_io on > >recursive re-entry to DM core from ->submit_bio); but isn't having a > >list of cookies still too imprecise for polling purposes? You could > >easily have a list that translates to many blk-mq queues. Possibly > >better than your current approach of polling everything -- but not > >much. > > To make the discussion more specific, assume that dm0 is mapped to > dm1/2/3, while dm1 mapped to > > nvme1, dm2 mapped to dm2, etc.. > > dm0 > > dm1 dm2 dm3 > > nvme1 nvme2 nvme3 > > > Then the returned cookie of dm0 could be pointer pointing to dm_io > object of dm0. > > struct dm_io { // the returned cookie points to dm_io object > ... > + struct list_head cookies; > }; > > struct dm_target_io { > ... > /* > * The corresponding polling hw queue if submitted to mq device (such as nvme1/2/3), > * NULL if submitted to dm device (such as dm1/2/3) > */ > + struct blk_mq_hw_ctx *hctx; > + struct list_head node; // add to @cookies list > }; > > The @cookies list of dm_io object could maintain all dm_target_io objects > of all **none-dm** devices, that is, all hw queues that we should poll on. > > > returned -> @cookies list > cookie of dm_io object of dm0 > | > +--> dm_target_io -> dm_target_io -> dm_target_io > object of nvme1 object of nvme2 object of nvme3 > > When polling returned cookie of dm0, actually we're polling @cookies > list. Once one of the dm_target_io > > completed (e.g. nvme2), it should be removed from the @cookies > list., and thus we should only focus on > > hw queues that have not completed. What you detailed there isn't properly modeling what it needs to. A given dm_target_io could result in quite a few bios (e.g. for dm-striped we clone each bio for each of N stripes). So the fan-out, especially if then stacked on N layers of stacked devices, to all the various hctx at the lowest layers is like herding cats. But the recursion in block core's submit_bio path makes that challenging to say the least. So much so that any solution related to enabling proper bio-based IO polling is going to need a pretty significant investment in fixing block core (storing __submit_bio()'s cookie during recursion, possibly storing to driver provided memory location, e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM clone bio's per-bio-data). SO __submit_bio_noacct would become: retp = &ret; if (bio->submit_cookie) retp = bio->submit_cookie; *retp = __submit_bio(bio); > >>[Design Notes] > >> > >>cookie > >>------ > >>Let's start from cookie. Cookie is one important concept in iopoll. It > >>is used to identify one specific request in one specific hardware queue. > >>The concept of cookie is initially designed as a per-bio concept, and > >>thus it doesn't work well when bio-split involved. When bio is split, > >>the returned cookie is indeed the cookie of one of the split bio, and > >>the following polling on this returned cookie can only guarantee the > >>completion of this specific split bio, while the other split bios may > >>be still uncompleted. Bio-split is also resolved for dm device, though > >>in another form, in which case the original bio submitted to the dm > >>device may be split into multiple bios submitted to the underlying > >>devices. > >> > >>In previous discussion, Lei Ming has suggested that iopoll should be > >>disabled for bio-split. This works for the normal bio-split (done in > >>blk_mq_submit_bio->__blk_queue_split), while iopoll will never be > >>usable for dm devices if this also applies for dm device. > >> > >>So come back to the design of the cookie for dm devices. At the very > >>beginning I want to refactor the design of cookie, from 'unsigned int' > >>type to the pointer type for dm device, so that cookie can point to > >>something, e.g. a list containing all cookies of one original bio, > >>something like this: > >> > >>struct dm_io { // the returned cookie points to dm_io > >> ... > >> struct list_head cookies; > >>}; > >> > >>In this design, we can fetch all cookies of one original bio, but the > >>implementation can be difficult and buggy. For example, the > >>'struct dm_io' structure may be already freed when the returned cookie > >>is used in blk_poll(). Then what if maintain a refcount in struct dm_io > >>so that 'struct dm_io' structure can not be freed until blk_poll() > >>called? Then the 'struct dm_io' structure will never be freed if the > >>IO polling is not used at all. > >I'd have to look closer at the race in the code you wrote (though you > >didn't share it); > > I worried that dm_target_io/dm_io objects could have been freed > before/when we are polling on them, > > and thus could cause use-after-free when accessing @cookies list in > dm_target_io. It could happen > > when there are multiple polling instance. io_uring has implemented > per-instance polling thread. If > > there are two bios submitted to dm0, please consider the following > race sequence: The lifetime of the bios should be fine given that the cloning nature of DM requires that all clones complete before the origin may complete. I think you probably just got caught out by the recursive nature of the bio submission path -- makes creating a graph of submitted bios and their associated per-bio-data and generated cookies a mess to track (again, like herding cats). Could also be you didn't quite understand the DM code's various structures. In any case, the block core changes needed to make bio-based IO polling work is the main limiting factor right now. Not sure it is worth the investment... but I could be persuaded to try harder! ;) But then once block core is fixed to allow this, we _still_ need to link all the various 'struct dm_poll_data' structures to allow blk_poll() to call DM specific method to walk all in the list to calling blk_poll() for stored cookie and request_queue*, e.g.: struct dm_poll_data { blk_qc_t cookie; struct request_queue *queue; struct list_head list; }; Again, it is the recursive nature of submit_bio() that makes this challenging. Mike
On 10/27/20 2:53 AM, Mike Snitzer wrote: > What you detailed there isn't properly modeling what it needs to. > A given dm_target_io could result in quite a few bios (e.g. for > dm-striped we clone each bio for each of N stripes). So the fan-out, > especially if then stacked on N layers of stacked devices, to all the > various hctx at the lowest layers is like herding cats. > > But the recursion in block core's submit_bio path makes that challenging > to say the least. So much so that any solution related to enabling > proper bio-based IO polling is going to need a pretty significant > investment in fixing block core (storing __submit_bio()'s cookie during > recursion, possibly storing to driver provided memory location, > e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM > clone bio's per-bio-data). > > SO __submit_bio_noacct would become: > > retp = &ret; > if (bio->submit_cookie) > retp = bio->submit_cookie; > *retp = __submit_bio(bio); Sorry for the late reply. Exactly I missed this point before. IF you have not started working on this, I'd like to try to implement this as an RFC. > I think you probably just got caught out by the recursive nature of the bio > submission path -- makes creating a graph of submitted bios and their > associated per-bio-data and generated cookies a mess to track (again, > like herding cats). > > Could also be you didn't quite understand the DM code's various > structures. > > In any case, the block core changes needed to make bio-based IO polling > work is the main limiting factor right now. Yes the logic is kind of subtle and maybe what I'm concerned here is really should be concerned at the coding phase. Thanks. Jeffle Xu
On Sun, Nov 01 2020 at 10:14pm -0500, JeffleXu <jefflexu@linux.alibaba.com> wrote: > > On 10/27/20 2:53 AM, Mike Snitzer wrote: > >What you detailed there isn't properly modeling what it needs to. > >A given dm_target_io could result in quite a few bios (e.g. for > >dm-striped we clone each bio for each of N stripes). So the fan-out, > >especially if then stacked on N layers of stacked devices, to all the > >various hctx at the lowest layers is like herding cats. > > > >But the recursion in block core's submit_bio path makes that challenging > >to say the least. So much so that any solution related to enabling > >proper bio-based IO polling is going to need a pretty significant > >investment in fixing block core (storing __submit_bio()'s cookie during > >recursion, possibly storing to driver provided memory location, > >e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM > >clone bio's per-bio-data). > > > >SO __submit_bio_noacct would become: > > > > retp = &ret; > > if (bio->submit_cookie) > > retp = bio->submit_cookie; > > *retp = __submit_bio(bio); > > Sorry for the late reply. Exactly I missed this point before. IF you > have not started working on this, I'd like to try to implement this as > an RFC. I did start on this line of development but it needs quite a bit more work. Even the pseudo code I provided above isn't useful in the context of DM clone bios that have their own per-bio-data to assist with this implementation. Because the __submit_bio_noacct() recursive call drivers/md/dm.c:__split_and_process_bio() makes is supplying the original bio (modified to only point to remaining work). But sure, I'm not opposed to you carrying this line of work forward. I can always lend a hand if you need help later or if you need to hand it off to me. > >I think you probably just got caught out by the recursive nature of the bio > >submission path -- makes creating a graph of submitted bios and their > >associated per-bio-data and generated cookies a mess to track (again, > >like herding cats). > > > >Could also be you didn't quite understand the DM code's various > >structures. > > > >In any case, the block core changes needed to make bio-based IO polling > >work is the main limiting factor right now. > > Yes the logic is kind of subtle and maybe what I'm concerned here is > really should be concerned at the coding phase. Definitely, lots of little details and associations. Mike
On 11/2/20 11:28 PM, Mike Snitzer wrote: > On Sun, Nov 01 2020 at 10:14pm -0500, > JeffleXu <jefflexu@linux.alibaba.com> wrote: > >> On 10/27/20 2:53 AM, Mike Snitzer wrote: >>> What you detailed there isn't properly modeling what it needs to. >>> A given dm_target_io could result in quite a few bios (e.g. for >>> dm-striped we clone each bio for each of N stripes). So the fan-out, >>> especially if then stacked on N layers of stacked devices, to all the >>> various hctx at the lowest layers is like herding cats. >>> >>> But the recursion in block core's submit_bio path makes that challenging >>> to say the least. So much so that any solution related to enabling >>> proper bio-based IO polling is going to need a pretty significant >>> investment in fixing block core (storing __submit_bio()'s cookie during >>> recursion, possibly storing to driver provided memory location, >>> e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM >>> clone bio's per-bio-data). >>> >>> SO __submit_bio_noacct would become: >>> >>> retp = &ret; >>> if (bio->submit_cookie) >>> retp = bio->submit_cookie; >>> *retp = __submit_bio(bio); >> Sorry for the late reply. Exactly I missed this point before. IF you >> have not started working on this, I'd like to try to implement this as >> an RFC. > I did start on this line of development but it needs quite a bit more > work. Even the pseudo code I provided above isn't useful in the context > of DM clone bios that have their own per-bio-data to assist with this > implementation. Because the __submit_bio_noacct() recursive call > drivers/md/dm.c:__split_and_process_bio() makes is supplying the > original bio (modified to only point to remaining work). > > But sure, I'm not opposed to you carrying this line of work forward. I > can always lend a hand if you need help later or if you need to hand it > off to me. Thanks. I will continue working on this. Jeffle
On 11/2/20 11:28 PM, Mike Snitzer wrote: > On Sun, Nov 01 2020 at 10:14pm -0500, > JeffleXu <jefflexu@linux.alibaba.com> wrote: > >> On 10/27/20 2:53 AM, Mike Snitzer wrote: >>> What you detailed there isn't properly modeling what it needs to. >>> A given dm_target_io could result in quite a few bios (e.g. for >>> dm-striped we clone each bio for each of N stripes). So the fan-out, >>> especially if then stacked on N layers of stacked devices, to all the >>> various hctx at the lowest layers is like herding cats. >>> >>> But the recursion in block core's submit_bio path makes that challenging >>> to say the least. So much so that any solution related to enabling >>> proper bio-based IO polling is going to need a pretty significant >>> investment in fixing block core (storing __submit_bio()'s cookie during >>> recursion, possibly storing to driver provided memory location, >>> e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM >>> clone bio's per-bio-data). >>> >>> SO __submit_bio_noacct would become: >>> >>> retp = &ret; >>> if (bio->submit_cookie) >>> retp = bio->submit_cookie; >>> *retp = __submit_bio(bio); >> Sorry for the late reply. Exactly I missed this point before. IF you >> have not started working on this, I'd like to try to implement this as >> an RFC. > I did start on this line of development but it needs quite a bit more > work. Even the pseudo code I provided above isn't useful in the context > of DM clone bios that have their own per-bio-data to assist with this > implementation. Because the __submit_bio_noacct() recursive call > drivers/md/dm.c:__split_and_process_bio() makes is supplying the > original bio (modified to only point to remaining work). Yes I noticed this recently. Since the depth-first splitting introduced in commit 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk"), one bio to dm device can be split into multiple bios to this dm device. ``` one bio to dm device (dm0) = one dm_io (to nvme0) + one bio to this same dm device (dm0) ``` In this case we need a mechanism to track all split sub-bios of the very beginning original bio. I'm doubted if this should be implemented in block layer like: ``` struct bio { ... struct list_head cookies; }; ``` After all it's only used by bio-based queue, or more specifically only dm device currently. Another design I can come up with is to maintain a global data structure for the very beginning original bio. Currently the blocking point is that now one original bio to the dm device (@bio of dm_submit()) can correspond to multiple dm_io and thus we have nowhere to place the @cookies list. Now we have to maintain one data structure for every original bio, something like ``` struct dm_poll_instance { ... struct list_head cookies; }; ``` We can transfer this dm_poll_instance between split bios by bio->bi_private, like ``` dm_submit_bio(...) { struct dm_poll_instance *ins; if (bio->bi_private) ins = bio->bi_private; else { ins = alloc_poll_instance(); bio->bi_private = ins; } ... } ```
On Wed, Nov 04 2020 at 1:47am -0500, JeffleXu <jefflexu@linux.alibaba.com> wrote: > > On 11/2/20 11:28 PM, Mike Snitzer wrote: > >On Sun, Nov 01 2020 at 10:14pm -0500, > >JeffleXu <jefflexu@linux.alibaba.com> wrote: > > > >>On 10/27/20 2:53 AM, Mike Snitzer wrote: > >>>What you detailed there isn't properly modeling what it needs to. > >>>A given dm_target_io could result in quite a few bios (e.g. for > >>>dm-striped we clone each bio for each of N stripes). So the fan-out, > >>>especially if then stacked on N layers of stacked devices, to all the > >>>various hctx at the lowest layers is like herding cats. > >>> > >>>But the recursion in block core's submit_bio path makes that challenging > >>>to say the least. So much so that any solution related to enabling > >>>proper bio-based IO polling is going to need a pretty significant > >>>investment in fixing block core (storing __submit_bio()'s cookie during > >>>recursion, possibly storing to driver provided memory location, > >>>e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM > >>>clone bio's per-bio-data). > >>> > >>>SO __submit_bio_noacct would become: > >>> > >>> retp = &ret; > >>> if (bio->submit_cookie) > >>> retp = bio->submit_cookie; > >>> *retp = __submit_bio(bio); > >>Sorry for the late reply. Exactly I missed this point before. IF you > >>have not started working on this, I'd like to try to implement this as > >>an RFC. > >I did start on this line of development but it needs quite a bit more > >work. Even the pseudo code I provided above isn't useful in the context > >of DM clone bios that have their own per-bio-data to assist with this > >implementation. Because the __submit_bio_noacct() recursive call > >drivers/md/dm.c:__split_and_process_bio() makes is supplying the > >original bio (modified to only point to remaining work). > > Yes I noticed this recently. Since the depth-first splitting > introduced in commit 18a25da84354 > > ("dm: ensure bio submission follows a depth-first tree walk"), one > bio to dm device can be > > split into multiple bios to this dm device. > > ``` > > one bio to dm device (dm0) = one dm_io (to nvme0) + one bio to this > same dm device (dm0) > > ``` > > > In this case we need a mechanism to track all split sub-bios of the > very beginning original bio. Yes, splitting causes additional potential for sub-bios. There are other cases that cause a 1-to-many bio generation (e.g. dm-striped) or splitting cases where a DM target makes use of dm_accept_partial_bio (e.g. dm-snapshot, dm-integrity, dm-writecache, etc). > I'm doubted if this should be implemented in block layer like: > > ``` > > struct bio { > > ... > > struct list_head cookies; > > }; > > ``` > > After all it's only used by bio-based queue, or more specifically > only dm device currently. I do think this line of work really should be handled in block core because I cannot see any reason why MD or bcache or whatever bio-based device wouldn't want the ability to better support io_uring (with IO poll). > Another design I can come up with is to maintain a global data > structure for the very beginning > original bio. Currently the blocking point is that now one original > bio to the dm device (@bio of dm_submit()) can correspond to multiple > dm_io and thus we have nowhere to place the @cookies list. Yes, and that will always be the case. We need the design to handle an arbitrary sprawl of splitting from a given bio. The graph of bios resulting from that fan-out needs to be walked at various levels -- be it the top-level original bio's submit_bio() returned cookie or some intermediate point in the chain of bios. The problem is the lifetime of the data structure created for a given split bio versus layering boundaries (that come from block core's simplistic recursion via bio using submit_bio). > Now we have to maintain one data structure for every original bio, > something like > > ``` > > struct dm_poll_instance { > > ... > > struct list_head cookies; > > }; > > ``` I do think we need a hybrid where at the point of recursion we're able to make the associated data structure available across the recursion boundary so that modeling the association in a chain of split bios is possible. (e.g. struct dm_poll_data or dm_poll_instance as you named it, _but_ that struct definition would live in block core, but would be part of per-bio-data; so 'struct blk_poll_data' is more logical name when elevated to block core). It _might_ be worthwhile to see if a new BIO_ flag could be added to allow augmenting the bio_split + bio_chain pattern to also track this additional case of carrying additional data per-bio while creating bio-chains. I may not be clear yet, said differently: augmenting bio_chain to not only chain bios, but to _also_ thread/chain together per-bio-data that lives within those chained bios. SO you have the chain of bios _and_ the chain of potentially opaque void * that happens to point to a list head for a list of 'struct blk_poll_data'. Does that make sense? > We can transfer this dm_poll_instance between split bios by > bio->bi_private, like > > ``` > > dm_submit_bio(...) { > > struct dm_poll_instance *ins; > > if (bio->bi_private) > > ins = bio->bi_private; > > else { > > ins = alloc_poll_instance(); > > bio->bi_private = ins; > > } > > ... > > } > > ``` Sadly, we cannot (ab)use bi_private for this given its (ab)used via the bio_chain() interface. It's almost like we need to add a new pointer in the bio that isn't left for block core to hijack. There is the well-worn pattern of saving off the original bi_private, hooking a new endio method and then when that endio is called restoring bi_private but we really want to avoid excessive indirect function calls for this usecase. The entire point of implementing blk_poll support is for performance after all. Mike
On 11/4/20 11:08 PM, Mike Snitzer wrote: >> I'm doubted if this should be implemented in block layer like: >> >> ``` >> >> struct bio { >> >> ... >> >> struct list_head cookies; >> >> }; >> >> ``` >> >> After all it's only used by bio-based queue, or more specifically >> only dm device currently. > I do think this line of work really should be handled in block core > because I cannot see any reason why MD or bcache or whatever bio-based > device wouldn't want the ability to better support io_uring (with IO > poll). > >> Another design I can come up with is to maintain a global data >> structure for the very beginning >> original bio. Currently the blocking point is that now one original >> bio to the dm device (@bio of dm_submit()) can correspond to multiple >> dm_io and thus we have nowhere to place the @cookies list. > Yes, and that will always be the case. We need the design to handle an > arbitrary sprawl of splitting from a given bio. The graph of bios > resulting from that fan-out needs to be walked at various levels -- be > it the top-level original bio's submit_bio() returned cookie or some > intermediate point in the chain of bios. > > The problem is the lifetime of the data structure created for a given > split bio versus layering boundaries (that come from block core's > simplistic recursion via bio using submit_bio). > >> Now we have to maintain one data structure for every original bio, >> something like >> >> ``` >> >> struct dm_poll_instance { >> >> ... >> >> struct list_head cookies; >> >> }; >> >> ``` > I do think we need a hybrid where at the point of recursion we're able > to make the associated data structure available across the recursion > boundary so that modeling the association in a chain of split bios is > possible. (e.g. struct dm_poll_data or dm_poll_instance as you named it, > _but_ that struct definition would live in block core, but would be part > of per-bio-data; so 'struct blk_poll_data' is more logical name when > elevated to block core). > > It _might_ be worthwhile to see if a new BIO_ flag could be added to > allow augmenting the bio_split + bio_chain pattern to also track this > additional case of carrying additional data per-bio while creating > bio-chains. I may not be clear yet, said differently: augmenting > bio_chain to not only chain bios, but to _also_ thread/chain together > per-bio-data that lives within those chained bios. SO you have the > chain of bios _and_ the chain of potentially opaque void * that happens > to point to a list head for a list of 'struct blk_poll_data'. > > Does that make sense? I'm doubted if it really makes sense to maintain a *complete* bio chain across the recursive call boundary. Considering the following device stack: ``` dm0 dm1 dm2 dm3 nvme0 nvme1 .... .... ``` We have the following bio graph (please let me know if it's wrong or the image can't display) For example, for dm1 there are three bios containing valid cookie in the end, that is bio 9/10/11. We only need to insert the corresponding blk_poll_data (containing request_queue, cookie, etc) of these three bios into the very beginning original bio (that is bio0). Of course we can track all the sub-bios down through the device stack, layer by layer, e.g., - get bio 1/2/3 from bio 0 - get bio 4 from bio 3 - finally get bio 9 from bio 4 But I'm doubted if it's overkill to just implement the iopoll. Another issue still unclear is that, if we should implement the iopoll in a recursive way. In a recursive way, to poll dm 0, we should poll all its sub-devices, that is, bio 4/5/7/8. Oppositely we can insert only the bottom bio (bio 9/10/11 which have valid cookie) at the very beginning (at submit_bio() phase), and to poll dm 0, we only need to poll bio 9/10/11. To implement this non-recursive design, we can add a field in struct bio ``` struct bio { ... struct bio *orig; } ``` @orig points to the original bio inputted into submit_bio(), that is, bio 0. @orig field is transmitted through bio splitting. ``` bio_split() split->orig = bio->orig ? : bio dm.c: __clone_and_map_data_bio clone->orig = bio->orig ? : bio ``` Finally bio 9/10/11 can be inserted into bio 0. ``` blk-mq.c: blk_mq_submit_bio if (bio->orig) init blk_poll_data and insert it into bio->orig's @cookies list ``` > >> We can transfer this dm_poll_instance between split bios by >> bio->bi_private, like >> >> ``` >> >> dm_submit_bio(...) { >> >> struct dm_poll_instance *ins; >> >> if (bio->bi_private) >> >> ins = bio->bi_private; >> >> else { >> >> ins = alloc_poll_instance(); >> >> bio->bi_private = ins; >> >> } >> >> ... >> >> } >> >> ``` > Sadly, we cannot (ab)use bi_private for this given its (ab)used via the > bio_chain() interface. It's almost like we need to add a new pointer in > the bio that isn't left for block core to hijack. > > There is the well-worn pattern of saving off the original bi_private, > hooking a new endio method and then when that endio is called restoring > bi_private but we really want to avoid excessive indirect function calls > for this usecase. The entire point of implementing blk_poll support is > for performance after all. > > Mike > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel
On Thu, Nov 05 2020 at 9:51pm -0500, JeffleXu <jefflexu@linux.alibaba.com> wrote: > > On 11/4/20 11:08 PM, Mike Snitzer wrote: > >>I'm doubted if this should be implemented in block layer like: > >> > >>``` > >> > >>struct bio { > >> > >> ... > >> > >> struct list_head cookies; > >> > >>}; > >> > >>``` > >> > >>After all it's only used by bio-based queue, or more specifically > >>only dm device currently. > >I do think this line of work really should be handled in block core > >because I cannot see any reason why MD or bcache or whatever bio-based > >device wouldn't want the ability to better support io_uring (with IO > >poll). > > > >>Another design I can come up with is to maintain a global data > >>structure for the very beginning > >>original bio. Currently the blocking point is that now one original > >>bio to the dm device (@bio of dm_submit()) can correspond to multiple > >>dm_io and thus we have nowhere to place the @cookies list. > >Yes, and that will always be the case. We need the design to handle an > >arbitrary sprawl of splitting from a given bio. The graph of bios > >resulting from that fan-out needs to be walked at various levels -- be > >it the top-level original bio's submit_bio() returned cookie or some > >intermediate point in the chain of bios. > > > >The problem is the lifetime of the data structure created for a given > >split bio versus layering boundaries (that come from block core's > >simplistic recursion via bio using submit_bio). > > > >>Now we have to maintain one data structure for every original bio, > >>something like > >> > >>``` > >> > >>struct dm_poll_instance { > >> > >> ... > >> > >> struct list_head cookies; > >> > >>}; > >> > >>``` > >I do think we need a hybrid where at the point of recursion we're able > >to make the associated data structure available across the recursion > >boundary so that modeling the association in a chain of split bios is > >possible. (e.g. struct dm_poll_data or dm_poll_instance as you named it, > >_but_ that struct definition would live in block core, but would be part > >of per-bio-data; so 'struct blk_poll_data' is more logical name when > >elevated to block core). > > > >It _might_ be worthwhile to see if a new BIO_ flag could be added to > >allow augmenting the bio_split + bio_chain pattern to also track this > >additional case of carrying additional data per-bio while creating > >bio-chains. I may not be clear yet, said differently: augmenting > >bio_chain to not only chain bios, but to _also_ thread/chain together > >per-bio-data that lives within those chained bios. SO you have the > >chain of bios _and_ the chain of potentially opaque void * that happens > >to point to a list head for a list of 'struct blk_poll_data'. > > > >Does that make sense? > > > I'm doubted if it really makes sense to maintain a *complete* bio > chain across the recursive > > call boundary. > > > Considering the following device stack: > > ``` > > dm0 > > dm1 dm2 dm3 > > nvme0 nvme1 .... .... > > ``` > > > We have the following bio graph (please let me know if it's wrong or > the image can't display) > > > For example, for dm1 there are three bios containing valid cookie in > the end, that is > > bio 9/10/11. We only need to insert the corresponding blk_poll_data > (containing > > request_queue, cookie, etc) of these three bios into the very > beginning original > > bio (that is bio0). Of course we can track all the sub-bios down > through the device > > stack, layer by layer, e.g., > > - get bio 1/2/3 from bio 0 > > - get bio 4 from bio 3 > > - finally get bio 9 from bio 4 > > But I'm doubted if it's overkill to just implement the iopoll. > > > Another issue still unclear is that, if we should implement the > iopoll in a recursive way. > > In a recursive way, to poll dm 0, we should poll all its > sub-devices, that is, bio 4/5/7/8. > > Oppositely we can insert only the bottom bio (bio 9/10/11 which have > valid cookie) at > > the very beginning (at submit_bio() phase), and to poll dm 0, we > only need to poll bio 9/10/11. I feel we need the ability to walk the entire graph and call down to next level. The lowest level would be what you call a "valid cookie" that blk-mq returned. But the preceding cookies need to be made valid and used when walking the graph (from highest to lowest) and calling down to the underlying layers. > > > To implement this non-recursive design, we can add a field in struct bio > > ``` > > struct bio { > > ... > > struct bio *orig; > > } > > ``` > > @orig points to the original bio inputted into submit_bio(), that is, bio 0. > > > @orig field is transmitted through bio splitting. > > ``` > > bio_split() > > split->orig = bio->orig ? : bio > > > dm.c: __clone_and_map_data_bio > > clone->orig = bio->orig ? : bio > > ``` > > > Finally bio 9/10/11 can be inserted into bio 0. > > ``` > > blk-mq.c: blk_mq_submit_bio > > if (bio->orig) > > init blk_poll_data and insert it into bio->orig's @cookies list > > ``` If you feel that is doable: certainly give it a shot. But it is not clear to me how you intend to translate from cookie passed in to ->blk_poll hook (returned from submit_bio) to the saved off bio->orig. What is your cookie strategy in this non-recursive implementation? What will you be returning? Where will you be storing it? Mike
On 11/7/20 1:45 AM, Mike Snitzer wrote: > On Thu, Nov 05 2020 at 9:51pm -0500, > JeffleXu <jefflexu@linux.alibaba.com> wrote: > >> On 11/4/20 11:08 PM, Mike Snitzer wrote: >>>> I'm doubted if this should be implemented in block layer like: >>>> >>>> ``` >>>> >>>> struct bio { >>>> >>>> ... >>>> >>>> struct list_head cookies; >>>> >>>> }; >>>> >>>> ``` >>>> >>>> After all it's only used by bio-based queue, or more specifically >>>> only dm device currently. >>> I do think this line of work really should be handled in block core >>> because I cannot see any reason why MD or bcache or whatever bio-based >>> device wouldn't want the ability to better support io_uring (with IO >>> poll). >>> >>>> Another design I can come up with is to maintain a global data >>>> structure for the very beginning >>>> original bio. Currently the blocking point is that now one original >>>> bio to the dm device (@bio of dm_submit()) can correspond to multiple >>>> dm_io and thus we have nowhere to place the @cookies list. >>> Yes, and that will always be the case. We need the design to handle an >>> arbitrary sprawl of splitting from a given bio. The graph of bios >>> resulting from that fan-out needs to be walked at various levels -- be >>> it the top-level original bio's submit_bio() returned cookie or some >>> intermediate point in the chain of bios. >>> >>> The problem is the lifetime of the data structure created for a given >>> split bio versus layering boundaries (that come from block core's >>> simplistic recursion via bio using submit_bio). >>> >>>> Now we have to maintain one data structure for every original bio, >>>> something like >>>> >>>> ``` >>>> >>>> struct dm_poll_instance { >>>> >>>> ... >>>> >>>> struct list_head cookies; >>>> >>>> }; >>>> >>>> ``` >>> I do think we need a hybrid where at the point of recursion we're able >>> to make the associated data structure available across the recursion >>> boundary so that modeling the association in a chain of split bios is >>> possible. (e.g. struct dm_poll_data or dm_poll_instance as you named it, >>> _but_ that struct definition would live in block core, but would be part >>> of per-bio-data; so 'struct blk_poll_data' is more logical name when >>> elevated to block core). >>> >>> It _might_ be worthwhile to see if a new BIO_ flag could be added to >>> allow augmenting the bio_split + bio_chain pattern to also track this >>> additional case of carrying additional data per-bio while creating >>> bio-chains. I may not be clear yet, said differently: augmenting >>> bio_chain to not only chain bios, but to _also_ thread/chain together >>> per-bio-data that lives within those chained bios. SO you have the >>> chain of bios _and_ the chain of potentially opaque void * that happens >>> to point to a list head for a list of 'struct blk_poll_data'. >>> >>> Does that make sense? >> >> I'm doubted if it really makes sense to maintain a *complete* bio >> chain across the recursive >> >> call boundary. >> >> >> Considering the following device stack: >> >> ``` >> >> dm0 >> >> dm1 dm2 dm3 >> >> nvme0 nvme1 .... .... >> >> ``` >> >> >> We have the following bio graph (please let me know if it's wrong or >> the image can't display) >> >> >> For example, for dm1 there are three bios containing valid cookie in >> the end, that is >> >> bio 9/10/11. We only need to insert the corresponding blk_poll_data >> (containing >> >> request_queue, cookie, etc) of these three bios into the very >> beginning original >> >> bio (that is bio0). Of course we can track all the sub-bios down >> through the device >> >> stack, layer by layer, e.g., >> >> - get bio 1/2/3 from bio 0 >> >> - get bio 4 from bio 3 >> >> - finally get bio 9 from bio 4 >> >> But I'm doubted if it's overkill to just implement the iopoll. >> >> >> Another issue still unclear is that, if we should implement the >> iopoll in a recursive way. >> >> In a recursive way, to poll dm 0, we should poll all its >> sub-devices, that is, bio 4/5/7/8. >> >> Oppositely we can insert only the bottom bio (bio 9/10/11 which have >> valid cookie) at >> >> the very beginning (at submit_bio() phase), and to poll dm 0, we >> only need to poll bio 9/10/11. > I feel we need the ability to walk the entire graph and call down to > next level. The lowest level would be what you call a "valid cookie" > that blk-mq returned. But the preceding cookies need to be made valid > and used when walking the graph (from highest to lowest) and calling > down to the underlying layers. > >> >> To implement this non-recursive design, we can add a field in struct bio >> >> ``` >> >> struct bio { >> >> ... >> >> struct bio *orig; >> >> } >> >> ``` >> >> @orig points to the original bio inputted into submit_bio(), that is, bio 0. >> >> >> @orig field is transmitted through bio splitting. >> >> ``` >> >> bio_split() >> >> split->orig = bio->orig ? : bio >> >> >> dm.c: __clone_and_map_data_bio >> >> clone->orig = bio->orig ? : bio >> >> ``` >> >> >> Finally bio 9/10/11 can be inserted into bio 0. >> >> ``` >> >> blk-mq.c: blk_mq_submit_bio >> >> if (bio->orig) >> >> init blk_poll_data and insert it into bio->orig's @cookies list >> >> ``` > If you feel that is doable: certainly give it a shot. Make sense. > But it is not clear to me how you intend to translate from cookie passed > in to ->blk_poll hook (returned from submit_bio) to the saved off > bio->orig. > > What is your cookie strategy in this non-recursive implementation? What > will you be returning? Where will you be storing it? Actually I think it's a common issue to design the cookie returned by submit_bio() whenever it's implemented in a recursive or non-recursive way. After all you need to translate the returned cookie to the original bio even if it's implemented in a recursive way as you described. Or how could you walk through the bio graph when the returned cookie is only 'unsigned int' type? How about this: ``` typedef uintptr_t blk_qc_t; ``` or something like union ``` typedef union { unsigned int cookie; struct bio *orig; // the original bio of submit_bio() } blk_qc_t; ``` When serving for blk-mq, the integer part of blk_qc_t is used and it stores the valid cookie, while it stores a pointer to the original bio when serving bio-based device. By the way, would you mind sharing your plan and progress on this work, I mean, supporting iopoll for dm device. To be honest, I don't want to re-invent the wheel as you have started on this work, but I do want to participate in somehow. Please let me know if there's something I could do here.
On Sat, Nov 07 2020 at 8:09pm -0500, JeffleXu <jefflexu@linux.alibaba.com> wrote: > > On 11/7/20 1:45 AM, Mike Snitzer wrote: > >On Thu, Nov 05 2020 at 9:51pm -0500, > >JeffleXu <jefflexu@linux.alibaba.com> wrote: > > > >>blk-mq.c: blk_mq_submit_bio > >> > >> if (bio->orig) > >> > >> init blk_poll_data and insert it into bio->orig's @cookies list > >> > >>``` > >If you feel that is doable: certainly give it a shot. > > Make sense. > > >But it is not clear to me how you intend to translate from cookie passed > >in to ->blk_poll hook (returned from submit_bio) to the saved off > >bio->orig. > > > >What is your cookie strategy in this non-recursive implementation? What > >will you be returning? Where will you be storing it? > > Actually I think it's a common issue to design the cookie returned > by submit_bio() whenever it's implemented in a recursive or > non-recursive way. After all you need to translate the returned cookie > to the original bio even if it's implemented in a recursive way as you > described. Yes. > Or how could you walk through the bio graph when the returned cookie > is only 'unsigned int' type? You could store a pointer (to orig bio, or per-bio-data stored in split clone, or whatever makes sense for how you're associating poll data with split bios) in 'unsigned int' type but that's very clumsy -- as I discovered when trying to do that ;) > How about this: > > > ``` > > typedef uintptr_t blk_qc_t; > > ``` > > > or something like union > > ``` > > typedef union { > > unsigned int cookie; > > struct bio *orig; // the original bio of submit_bio() > > } blk_qc_t; > > ``` > When serving for blk-mq, the integer part of blk_qc_t is used and it > stores the valid cookie, while it stores a pointer to the original bio > when serving bio-based device. Union looks ideal, but maybe make it a void *? Initial implementation may store bio pointer but it _could_ evolve to be 'struct blk_poll_data *' or whatever. But not a big deal either way. > By the way, would you mind sharing your plan and progress on this > work, I mean, supporting iopoll for dm device. To be honest, I don't > want to re-invent the wheel as you have started on this work, but I do > want to participate in somehow. Please let me know if there's something > I could do here. I thought I said as much before but: I really don't have anything meaningful to share. My early exploration was more rough pseudo code that served to try to get my arms around the scope of the design problem. Please feel free to own all aspects of this work. I will gladly help analyze/test/refine your approach once you reach the point of sharing RFC patches. Thanks, Mike
On 11/10/20 2:15 AM, Mike Snitzer wrote: > On Sat, Nov 07 2020 at 8:09pm -0500, > JeffleXu <jefflexu@linux.alibaba.com> wrote: > >> On 11/7/20 1:45 AM, Mike Snitzer wrote: >>> On Thu, Nov 05 2020 at 9:51pm -0500, >>> JeffleXu <jefflexu@linux.alibaba.com> wrote: >>> >>>> blk-mq.c: blk_mq_submit_bio >>>> >>>> if (bio->orig) >>>> >>>> init blk_poll_data and insert it into bio->orig's @cookies list >>>> >>>> ``` >>> If you feel that is doable: certainly give it a shot. >> Make sense. >> >>> But it is not clear to me how you intend to translate from cookie passed >>> in to ->blk_poll hook (returned from submit_bio) to the saved off >>> bio->orig. >>> >>> What is your cookie strategy in this non-recursive implementation? What >>> will you be returning? Where will you be storing it? >> Actually I think it's a common issue to design the cookie returned >> by submit_bio() whenever it's implemented in a recursive or >> non-recursive way. After all you need to translate the returned cookie >> to the original bio even if it's implemented in a recursive way as you >> described. > Yes. > >> Or how could you walk through the bio graph when the returned cookie >> is only 'unsigned int' type? > You could store a pointer (to orig bio, or per-bio-data stored in split > clone, or whatever makes sense for how you're associating poll data with > split bios) in 'unsigned int' type but that's very clumsy -- as I > discovered when trying to do that ;) Fine, that's also what I thought. > >> How about this: >> >> >> ``` >> >> typedef uintptr_t blk_qc_t; >> >> ``` >> >> >> or something like union >> >> ``` >> >> typedef union { >> >> unsigned int cookie; >> >> struct bio *orig; // the original bio of submit_bio() >> >> } blk_qc_t; >> >> ``` >> When serving for blk-mq, the integer part of blk_qc_t is used and it >> stores the valid cookie, while it stores a pointer to the original bio >> when serving bio-based device. > Union looks ideal, but maybe make it a void *? Initial implementation > may store bio pointer but it _could_ evolve to be 'struct blk_poll_data > *' or whatever. But not a big deal either way. Of course you could define blk_qc_t as a pointer type (e.g. void *), or integer type (e.g. unsigned int), but you will get a gcc warning in each case. For example, if it's defined as "void *", then gcc will warn 'return makes pointer from integer without a cast' in request_to_qc_t() as cookie returned by mq device is actually integer type. Vice versa. So we need a type cast in request_to_qc_t(). The union is also not perfect though, as we also need type cast. So both these two designs are quite equal to me, though 'void *' may be more concise though. But one annoying issue is that the request_to_qc_t() and blk_poll() should be revised somehow if it's defined as a union or 'void *'. For example if it's defined as 'void *', then in request_to_qc_t() integer need to be cast to pointer and in blk_poll() pointer need to be cast to integer. The benefit of uintptr_t is that, it's still integer type which means the original request_to_qc_t()/ blk_poll() routine for blk-mq can retain unchanged, while the size of the data type can be large enough to contain a pointer type. So we only need type cast in bio-based routine, while keeping the request-based routine unchanged. And yes it's a trivial issue though. >> By the way, would you mind sharing your plan and progress on this >> work, I mean, supporting iopoll for dm device. To be honest, I don't >> want to re-invent the wheel as you have started on this work, but I do >> want to participate in somehow. Please let me know if there's something >> I could do here. > I thought I said as much before but: I really don't have anything > meaningful to share. My early exploration was more rough pseudo code > that served to try to get my arms around the scope of the design > problem. > > Please feel free to own all aspects of this work. > > I will gladly help analyze/test/refine your approach once you reach the > point of sharing RFC patches. Got it. Thanks for that. Really. I will continue working on this.