Message ID | 20210125121340.70459-1-jefflexu@linux.alibaba.com (mailing list archive) |
---|---|
Headers | show |
Series | dm: support IO polling for bio-based dm device | expand |
On Mon, Jan 25 2021 at 7:13am -0500, Jeffle Xu <jefflexu@linux.alibaba.com> wrote: > Since currently we have no simple but efficient way to implement the > bio-based IO polling in the split-bio tracking style, this patch set > turns to the original implementation mechanism that iterates and > polls all underlying hw queues in polling mode. One optimization is > introduced to mitigate the race of one hw queue among multiple polling > instances. > > I'm still open to the split bio tracking mechanism, if there's > reasonable way to implement it. > > > [Performance Test] > The performance is tested by fio (engine=io_uring) 4k randread on > dm-linear device. The dm-linear device is built upon nvme devices, > and every nvme device has one polling hw queue (nvme.poll_queues=1). > > Test Case | IOPS in IRQ mode | IOPS in polling mode | Diff > | (hipri=0) | (hipri=1) | > --------------------------- | ---------------- | -------------------- | ---- > 3 target nvme, num_jobs = 1 | 198k | 276k | ~40% > 3 target nvme, num_jobs = 3 | 608k | 705k | ~16% > 6 target nvme, num_jobs = 6 | 1197k | 1347k | ~13% > 3 target nvme, num_jobs = 6 | 1285k | 1293k | ~0% > > As the number of polling instances (num_jobs) increases, the > performance improvement decreases, though it's still positive > compared to the IRQ mode. I think there is serious room for improvement for DM's implementation; but the block changes for this are all we'd need for DM in the longrun anyway (famous last words). So on a block interface level I'm OK with block patches 1-3. I don't see why patch 5 is needed (said the same in reply to it; but I just saw your reason below..). Anyway, I can pick up DM patches 4 and 6 via linux-dm.git if Jens picks up patches 1-3. Jens, what do you think? > [Optimization] > To mitigate the race when iterating all the underlying hw queues, one > flag is maintained on a per-hw-queue basis. This flag is used to > indicate whether this polling hw queue currently being polled on or > not. Every polling hw queue is exclusive to one polling instance, i.e., > the polling instance will skip this polling hw queue if this hw queue > currently is being polled by another polling instance, and start > polling on the next hw queue. > > This per-hw-queue flag map is currently maintained in dm layer. In > the table load phase, a table describing all underlying polling hw > queues is built and stored in 'struct dm_table'. It is safe when > reloading the mapping table. > > > changes since v1: > - patch 1,2,4 is the same as v1 and have already been reviewed > - patch 3 is refactored a bit on the basis of suggestions from > Mike Snitzer. > - patch 5 is newly added and introduces one new queue flag > representing if the queue is capable of IO polling. This mainly > simplifies the logic in queue_poll_store(). Ah OK, don't see why we want to eat a queue flag for that though! > - patch 6 implements the core mechanism supporting IO polling. > The sanity check checking if the dm device supports IO polling is > also folded into this patch, and the queue flag will be cleared if > it doesn't support, in case of table reloading. Thanks, Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On 1/28/21 1:19 AM, Mike Snitzer wrote: > On Mon, Jan 25 2021 at 7:13am -0500, > Jeffle Xu <jefflexu@linux.alibaba.com> wrote: > >> Since currently we have no simple but efficient way to implement the >> bio-based IO polling in the split-bio tracking style, this patch set >> turns to the original implementation mechanism that iterates and >> polls all underlying hw queues in polling mode. One optimization is >> introduced to mitigate the race of one hw queue among multiple polling >> instances. >> >> I'm still open to the split bio tracking mechanism, if there's >> reasonable way to implement it. >> >> >> [Performance Test] >> The performance is tested by fio (engine=io_uring) 4k randread on >> dm-linear device. The dm-linear device is built upon nvme devices, >> and every nvme device has one polling hw queue (nvme.poll_queues=1). >> >> Test Case | IOPS in IRQ mode | IOPS in polling mode | Diff >> | (hipri=0) | (hipri=1) | >> --------------------------- | ---------------- | -------------------- | ---- >> 3 target nvme, num_jobs = 1 | 198k | 276k | ~40% >> 3 target nvme, num_jobs = 3 | 608k | 705k | ~16% >> 6 target nvme, num_jobs = 6 | 1197k | 1347k | ~13% >> 3 target nvme, num_jobs = 6 | 1285k | 1293k | ~0% >> >> As the number of polling instances (num_jobs) increases, the >> performance improvement decreases, though it's still positive >> compared to the IRQ mode. > > I think there is serious room for improvement for DM's implementation; > but the block changes for this are all we'd need for DM in the longrun > anyway (famous last words). Agreed. > So on a block interface level I'm OK with > block patches 1-3. > > I don't see why patch 5 is needed (said the same in reply to it; but I > just saw your reason below..). > > Anyway, I can pick up DM patches 4 and 6 via linux-dm.git if Jens picks > up patches 1-3. Jens, what do you think? cc Jens. Also I will send a new version later, maybe some refactor on patch5 and some typo modifications. > >> [Optimization] >> To mitigate the race when iterating all the underlying hw queues, one >> flag is maintained on a per-hw-queue basis. This flag is used to >> indicate whether this polling hw queue currently being polled on or >> not. Every polling hw queue is exclusive to one polling instance, i.e., >> the polling instance will skip this polling hw queue if this hw queue >> currently is being polled by another polling instance, and start >> polling on the next hw queue. >> >> This per-hw-queue flag map is currently maintained in dm layer. In >> the table load phase, a table describing all underlying polling hw >> queues is built and stored in 'struct dm_table'. It is safe when >> reloading the mapping table. >> >> >> changes since v1: >> - patch 1,2,4 is the same as v1 and have already been reviewed >> - patch 3 is refactored a bit on the basis of suggestions from >> Mike Snitzer. >> - patch 5 is newly added and introduces one new queue flag >> representing if the queue is capable of IO polling. This mainly >> simplifies the logic in queue_poll_store(). > > Ah OK, don't see why we want to eat a queue flag for that though! > >> - patch 6 implements the core mechanism supporting IO polling. >> The sanity check checking if the dm device supports IO polling is >> also folded into this patch, and the queue flag will be cleared if >> it doesn't support, in case of table reloading. > > Thanks, > Mike >
On Wed, Jan 27 2021 at 10:06pm -0500, JeffleXu <jefflexu@linux.alibaba.com> wrote: > > > On 1/28/21 1:19 AM, Mike Snitzer wrote: > > On Mon, Jan 25 2021 at 7:13am -0500, > > Jeffle Xu <jefflexu@linux.alibaba.com> wrote: > > > >> Since currently we have no simple but efficient way to implement the > >> bio-based IO polling in the split-bio tracking style, this patch set > >> turns to the original implementation mechanism that iterates and > >> polls all underlying hw queues in polling mode. One optimization is > >> introduced to mitigate the race of one hw queue among multiple polling > >> instances. > >> > >> I'm still open to the split bio tracking mechanism, if there's > >> reasonable way to implement it. > >> > >> > >> [Performance Test] > >> The performance is tested by fio (engine=io_uring) 4k randread on > >> dm-linear device. The dm-linear device is built upon nvme devices, > >> and every nvme device has one polling hw queue (nvme.poll_queues=1). > >> > >> Test Case | IOPS in IRQ mode | IOPS in polling mode | Diff > >> | (hipri=0) | (hipri=1) | > >> --------------------------- | ---------------- | -------------------- | ---- > >> 3 target nvme, num_jobs = 1 | 198k | 276k | ~40% > >> 3 target nvme, num_jobs = 3 | 608k | 705k | ~16% > >> 6 target nvme, num_jobs = 6 | 1197k | 1347k | ~13% > >> 3 target nvme, num_jobs = 6 | 1285k | 1293k | ~0% > >> > >> As the number of polling instances (num_jobs) increases, the > >> performance improvement decreases, though it's still positive > >> compared to the IRQ mode. > > > > I think there is serious room for improvement for DM's implementation; > > but the block changes for this are all we'd need for DM in the longrun > > anyway (famous last words). > > Agreed. > > > > So on a block interface level I'm OK with > > block patches 1-3. > > > > I don't see why patch 5 is needed (said the same in reply to it; but I > > just saw your reason below..). > > > > Anyway, I can pick up DM patches 4 and 6 via linux-dm.git if Jens picks > > up patches 1-3. Jens, what do you think? > > cc Jens. > > Also I will send a new version later, maybe some refactor on patch5 and > some typo modifications. Thinking further, there is no benefit to Jens picking up the block core changes until the DM changes are ready. While I think the refactoring to expose the blk_poll (in patch 3) that supports blk-mq and bio-based is reasonable -- Christoph correctly points out there is extra branching that blk-mq must tolerate as implemented. So even that needs followup work as suggested here: https://www.redhat.com/archives/dm-devel/2021-January/msg00397.html Also, your followup about oversights in the the latest bio-based DM io polling implementation speaks to all of this needing more time: https://www.redhat.com/archives/dm-devel/2021-January/msg00436.html You advocating going back to what is effectively the first RFC patchset you proposed (with its underwhelming bio-based polling performance) isn't a strong indication these changes are ready, or that we even have a patch forward for how to make bio-based IO polling be worthwhile. So: I retract my question to Jens about whether he'd pick up the block core changes (while I think those are close, the corresponding DM changes aren't). Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel