Message ID: 20181211233519.9350-1-sagi@grimberg.me
Series: implement nvmf read/write queue maps
On 12/11/2018 3:35 PM, Sagi Grimberg wrote:
> This set implements read/write queue maps for nvmf (implemented in tcp
> and rdma). We allow the user to pass in an nr_write_queues argument
> that maps a separate set of queues to host write I/O (or more
> correctly, non-read I/O), while read I/O goes to the set of queues
> controlled by the existing nr_io_queues.
>
> A patchset that restores nvme-rdma polling is in the pipe.
> The polling is less trivial because:
> 1. we can find non-I/O completions in the cq (e.g. memory
>    registration)
> 2. we need to start with non-polling for a sane connect and then
>    switch to polling, which is not trivial behind the cq API we use.
>
> Note that read/write separation, for rdma but especially for tcp, can
> be a very clear win, as we minimize the risk of head-of-line (hol)
> blocking for mixed workloads over a single tcp byte stream.

Sagi,

What other wins are there for this split?

I'm considering whether it's worthwhile for fc as well, but the hol
issue doesn't exist with fc. What else is being resolved?

-- james
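For reference, the queue split the cover letter describes boils down to
a map_queues callback along these lines. This is a sketch, not a
verbatim excerpt from the series: the example_opts type and the
driver_data access are illustrative, while HCTX_TYPE_DEFAULT,
HCTX_TYPE_READ and blk_mq_map_queues() come from the blk-mq multi-map
infrastructure this set builds on.

    /* Sketch only: example_opts and the driver_data use are made up. */
    struct example_opts {
            unsigned int nr_io_queues;      /* read queues */
            unsigned int nr_write_queues;   /* dedicated write queues */
    };

    static int example_map_queues(struct blk_mq_tag_set *set)
    {
            struct example_opts *opts = set->driver_data;

            if (opts->nr_write_queues) {
                    /* write queues first, read queues after them */
                    set->map[HCTX_TYPE_DEFAULT].nr_queues =
                                    opts->nr_write_queues;
                    set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
                    set->map[HCTX_TYPE_READ].nr_queues =
                                    opts->nr_io_queues;
                    set->map[HCTX_TYPE_READ].queue_offset =
                                    opts->nr_write_queues;
            } else {
                    /* no split: reads and writes share all queues */
                    set->map[HCTX_TYPE_DEFAULT].nr_queues =
                                    opts->nr_io_queues;
                    set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
                    set->map[HCTX_TYPE_READ].nr_queues =
                                    opts->nr_io_queues;
                    set->map[HCTX_TYPE_READ].queue_offset = 0;
            }
            blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
            blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
            return 0;
    }

With nr_write_queues=0 both maps cover the same queues, so behavior
degenerates to the pre-series shared setup.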
> Sagi,
>
> What other wins are there for this split?
>
> I'm considering whether it's worthwhile for fc as well, but the hol
> issue doesn't exist with fc. What else is being resolved?

I've pondered it myself, which is why I didn't add it to fc as well
(it would have been easy enough, I think). I guess that with this the
user can limit writes relative to reads by assigning fewer write
queues, as jens suggested in his patches...
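As a usage sketch of that knob: assuming nvme-cli grew a matching
option for this series' nr_write_queues fabrics parameter (the spelling
below is an assumption; verify against your nvme-cli version), a host
could ask for eight read queues but only two write queues:

    # --nr-write-queues spelling assumed to match this series'
    # nr_write_queues fabrics parameter; check your nvme-cli version
    nvme connect -t tcp -a 10.0.0.1 -s 4420 -n <subsys-nqn> \
            --nr-io-queues=8 --nr-write-queues=2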
On 12/12/2018 9:58 AM, Sagi Grimberg wrote:
>
>> Sagi,
>>
>> What other wins are there for this split?
>>
>> I'm considering whether it's worthwhile for fc as well, but the hol
>> issue doesn't exist with fc. What else is being resolved?
>
> I've pondered it myself, which is why I didn't add it to fc as well
> (it would have been easy enough, I think). I guess that with this the
> user can limit writes relative to reads by assigning fewer write
> queues, as jens suggested in his patches...

Not enough to interest me. I also would prefer not to require more
queues - it only increases the oversubscription requirements on the
target, which is already a major issue on real shared subsystems.

Thanks

-- james
On 12/12/18 6:58 PM, Sagi Grimberg wrote:
>
>> Sagi,
>>
>> What other wins are there for this split?
>>
>> I'm considering whether it's worthwhile for fc as well, but the hol
>> issue doesn't exist with fc. What else is being resolved?
>
> I've pondered it myself, which is why I didn't add it to fc as well
> (it would have been easy enough, I think). I guess that with this the
> user can limit writes relative to reads by assigning fewer write
> queues, as jens suggested in his patches...

Weelll ... isn't that what blkcg is for?

I do understand if one would want to implement something like this if
there's a real benefit from the _hardware_ side, but if there isn't we
should try to keep it in the upper layers where it belongs.

Cheers,

Hannes
On 12/13/18 5:04 AM, Hannes Reinecke wrote:
> On 12/12/18 6:58 PM, Sagi Grimberg wrote:
>>
>>> Sagi,
>>>
>>> What other wins are there for this split?
>>>
>>> I'm considering whether it's worthwhile for fc as well, but the hol
>>> issue doesn't exist with fc. What else is being resolved?
>>
>> I've pondered it myself, which is why I didn't add it to fc as well
>> (it would have been easy enough, I think). I guess that with this the
>> user can limit writes relative to reads by assigning fewer write
>> queues, as jens suggested in his patches...
>
> Weelll ... isn't that what blkcg is for?
> I do understand if one would want to implement something like this if
> there's a real benefit from the _hardware_ side, but if there isn't we
> should try to keep it in the upper layers where it belongs.

The point is that nvme round-robins queues: if you have 1 write queue
for N read queues, then you get better read servicing, as writes can't
flood more than one queue.

Obviously this isn't a replacement for any kind of real throttling,
but it's cheap and easy to do.

The multiple queue maps are more useful for polled IO, and for future
prioritized IO in general. But they are also handy for not
oversubscribing the write side.
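To make the round-robin point concrete, here is a small self-contained
userspace toy (hypothetical names, a simplification rather than the
blk-mq code): with 1 write queue and 4 read queues, every write from
every cpu lands on queue 0, while reads spread across queues 1-4.

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Toy model of the split: non-read I/O round-robins over the
     * first nr_write_queues queues, reads over the nr_io_queues
     * queues that follow them.
     */
    static unsigned int pick_queue(bool is_read, unsigned int cpu,
                                   unsigned int nr_write_queues,
                                   unsigned int nr_io_queues)
    {
            if (is_read || nr_write_queues == 0)
                    return nr_write_queues + cpu % nr_io_queues;
            return cpu % nr_write_queues;
    }

    int main(void)
    {
            unsigned int cpu;

            /* 1 write queue, 4 read queues: writes only hit queue 0 */
            for (cpu = 0; cpu < 4; cpu++)
                    printf("cpu %u: read -> q%u, write -> q%u\n", cpu,
                           pick_queue(true, cpu, 1, 4),
                           pick_queue(false, cpu, 1, 4));
            return 0;
    }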
On 12/13/18 3:02 PM, Jens Axboe wrote:
> On 12/13/18 5:04 AM, Hannes Reinecke wrote:
>> On 12/12/18 6:58 PM, Sagi Grimberg wrote:
>>>
>>>> Sagi,
>>>>
>>>> What other wins are there for this split?
>>>>
>>>> I'm considering whether it's worthwhile for fc as well, but the hol
>>>> issue doesn't exist with fc. What else is being resolved?
>>>
>>> I've pondered it myself, which is why I didn't add it to fc as well
>>> (it would have been easy enough, I think). I guess that with this
>>> the user can limit writes relative to reads by assigning fewer
>>> write queues, as jens suggested in his patches...
>>>
>> Weelll ... isn't that what blkcg is for?
>> I do understand if one would want to implement something like this if
>> there's a real benefit from the _hardware_ side, but if there isn't
>> we should try to keep it in the upper layers where it belongs.
>
> The point is that nvme round-robins queues: if you have 1 write queue
> for N read queues, then you get better read servicing, as writes
> can't flood more than one queue.
>
> Obviously this isn't a replacement for any kind of real throttling,
> but it's cheap and easy to do.
>
> The multiple queue maps are more useful for polled IO, and for future
> prioritized IO in general. But they are also handy for not
> oversubscribing the write side.

But how can you tell? It's not as if the hardware tells us "Oh,
incidentally, I'm far slower writing than reading, please use only so
many queues for writing." So any number we pick would be pretty
heuristic to start with.

Cheers,

Hannes