| Message ID | 20201204191356.2516405-3-bigeasy@linutronix.de (mailing list archive) |
|---|---|
| State | New, archived |
| Series | blk-mq: Don't complete in IRQ, use llist_head |
On 12/4/20 12:13 PM, Sebastian Andrzej Siewior wrote:
> Controllers with multiple queues have their IRQ handlers pinned to a
> CPU. The core shouldn't need to complete the request on a remote CPU.
>
> Remove this case and always raise the softirq to complete the request.

I don't like this one at all, it'll add a softirq jump for the fast path
for e.g. nvme devices. Did you run any performance testing with this? I
can give it a spin, will do so anyway, but was curious if anything but
"this still works" testing was done.
On 2020-12-07 16:52:57 [-0700], Jens Axboe wrote:
> On 12/4/20 12:13 PM, Sebastian Andrzej Siewior wrote:
> > Controllers with multiple queues have their IRQ handlers pinned to a
> > CPU. The core shouldn't need to complete the request on a remote CPU.
> >
> > Remove this case and always raise the softirq to complete the request.
>
> I don't like this one at all, it'll add a softirq jump for the fast path
> for e.g. nvme devices. Did you run any performance testing with this? I
> can give it a spin, will do so anyway, but was curious if anything but
> "this still works" testing was done.

I understood that the nvme devices have a queue per CPU and don't need
to complete on a remote CPU in general. Also, they don't jump to softirq
but invoke softirq on the return from IRQ. So if softirq is already busy
because of network, then the block layer's softirq is delayed. Otherwise
there should be no significant delay if the CPU is idle (the IRQ is
handled and the softirq runs right after it).

Sagi mentioned nvme-tcp as a user of this remote completion and Daniel
has been kind enough to run some nvme-tcp tests.

Sebastian
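For context, the "remote completion" being discussed is decided by blk_mq_complete_need_ipi(). A rough, simplified sketch of the 5.10-era logic follows — reconstructed from the hunks quoted elsewhere in this thread and from memory, not the verbatim upstream code:

```c
/*
 * Sketch (approximation, not verbatim 5.10 code): when does blk-mq punt
 * a completion to the CPU that submitted the request?
 */
static inline bool blk_mq_complete_need_ipi(struct request *rq)
{
	int cpu = raw_smp_processor_id();

	if (!IS_ENABLED(CONFIG_SMP) ||
	    !test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags))
		return false;

	/* Same CPU, or the two CPUs share a cache: complete locally. */
	if (cpu == rq->mq_ctx->cpu ||
	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
	     cpus_share_cache(cpu, rq->mq_ctx->cpu)))
		return false;

	/* Otherwise punt the completion to the submitting CPU. */
	return cpu_online(rq->mq_ctx->cpu);
}
```

This is why a per-CPU-queue NVMe device essentially never takes the remote path: its interrupt already fires on the submitting CPU, so the "same CPU" check short-circuits the IPI.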
On Tue, Dec 08, 2020 at 09:22:20AM +0100, Sebastian Andrzej Siewior wrote:
> Sagi mentioned nvme-tcp as a user of this remote completion and Daniel
> has been kind enough to run some nvme-tcp tests.

I've started with some benchmarking. The first thing I tried is to find
a setup where the remote path is taken. I found a setup with nvme-fc
with a workload which results in ca. 10% remote completions.

Setup:
  - NVMe over FC
  - 2x Emulex LPe36000 32Gb PCIe Fibre Channel Adapter
  - 8 mpaths
  - 4x E7-4820 v3, 80 cores

Workload:
  - fio --rw=randwrite --name=test --size=50M \
        --iodepth=32 --direct=0 --bs=4k --numjobs=40 \
        --time_based --runtime=1h --ioengine=libaio \
        --group_reporting

(I played around a bit with different workloads; most of them won't use
the remote completion path.)

I've annotated the code with a counter and exported it via debugfs:

@@ -671,6 +673,8 @@ bool blk_mq_complete_request_remote(struct request *rq)
 		return false;
 
 	if (blk_mq_complete_need_ipi(rq)) {
+		ctx->rq_remote++;
+
 		rq->csd.func = __blk_mq_complete_request_remote;
 		rq->csd.info = rq;
 		rq->csd.flags = 0;

and hacked a small script to collect the data.

- Baseline (5.10-rc7)

Starting 40 processes
Jobs: 40 (f=40): [w(40)][100.0%][r=0KiB/s,w=12.0GiB/s][r=0,w=3405k IOPS][eta 00m:00s]
test: (groupid=0, jobs=40): err= 0: pid=14225: Mon Dec  7 20:09:57 2020
  write: IOPS=3345k, BW=12.8GiB/s (13.7GB/s)(44.9TiB/3600002msec)
    slat (usec): min=2, max=90772, avg= 9.43, stdev=10.67
    clat (usec): min=2, max=91343, avg=371.79, stdev=119.52
     lat (usec): min=5, max=91358, avg=381.31, stdev=122.45
    clat percentiles (usec):
     |  1.00th=[  231],  5.00th=[  245], 10.00th=[  253], 20.00th=[  273],
     | 30.00th=[  293], 40.00th=[  322], 50.00th=[  351], 60.00th=[  388],
     | 70.00th=[  420], 80.00th=[  465], 90.00th=[  529], 95.00th=[  570],
     | 99.00th=[  644], 99.50th=[  676], 99.90th=[  750], 99.95th=[  783],
     | 99.99th=[  873]
   bw (  KiB/s): min=107333, max=749152, per=2.51%, avg=335200.07, stdev=87628.57, samples=288000
   iops        : min=26833, max=187286, avg=83799.66, stdev=21907.09, samples=288000
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
  lat (usec)   : 250=8.04%, 500=6.79%, 750=13.75%, 1000=0.09%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=29.14%, sys=70.83%, ctx=320219, majf=0, minf=13056
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=28.7%, >=64=0.0%
     submit    : 0=0.0%, 4=28.7%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=28.7%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,12042333583,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=12.8GiB/s (13.7GB/s), 12.8GiB/s-12.8GiB/s (13.7GB/s-13.7GB/s), io=44.9TiB (49.3TB), run=3600002-3600002msec

Disk stats (read/write):
  nvme5n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

- Patched

Jobs: 40 (f=40): [w(40)][100.0%][r=0KiB/s,w=12.9GiB/s][r=0,w=3383k IOPS][eta 00m:00s]
test: (groupid=0, jobs=40): err= 0: pid=13413: Mon Dec  7 21:31:01 2020
  write: IOPS=3371k, BW=12.9GiB/s (13.8GB/s)(45.2TiB/3600004msec)
    slat (nsec): min=1984, max=90341k, avg=9308.73, stdev=7068.58
    clat (usec): min=2, max=91259, avg=368.94, stdev=118.31
     lat (usec): min=5, max=91269, avg=378.34, stdev=121.43
    clat percentiles (usec):
     |  1.00th=[  231],  5.00th=[  245], 10.00th=[  255], 20.00th=[  277],
     | 30.00th=[  302], 40.00th=[  318], 50.00th=[  334], 60.00th=[  359],
     | 70.00th=[  392], 80.00th=[  433], 90.00th=[  562], 95.00th=[  635],
     | 99.00th=[  693], 99.50th=[  709], 99.90th=[  766], 99.95th=[  816],
     | 99.99th=[  914]
   bw (  KiB/s): min=124304, max=770204, per=2.50%, avg=337559.07, stdev=91383.66, samples=287995
   iops        : min=31076, max=192551, avg=84389.45, stdev=22845.91, samples=287995
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
  lat (usec)   : 250=7.44%, 500=7.84%, 750=13.79%, 1000=0.15%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=30.30%, sys=69.69%, ctx=179950, majf=0, minf=7403
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=29.2%, >=64=0.0%
     submit    : 0=0.0%, 4=29.2%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=29.2%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,12135617715,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=12.9GiB/s (13.8GB/s), 12.9GiB/s-12.9GiB/s (13.8GB/s-13.8GB/s), io=45.2TiB (49.7TB), run=3600004-3600004msec

Disk stats (read/write):
  nvme1n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

The remote completion counters:

- Baseline

nvme5c0n1 completed 411218 remote 38777 9.43%
nvme5c0n2 completed 0 remote 0 0.00%
nvme5c1n1 completed 411270 remote 38770 9.43%
nvme5c1n2 completed 50 remote 0 0.00%
nvme5c2n1 completed 0 remote 0 0.00%
nvme5c2n2 completed 0 remote 0 0.00%
nvme5c3n1 completed 411220 remote 38751 9.42%
nvme5c3n2 completed 0 remote 0 0.00%
nvme5c4n1 completed 0 remote 0 0.00%
nvme5c4n2 completed 0 remote 0 0.00%
nvme5c5n1 completed 0 remote 0 0.00%
nvme5c5n2 completed 0 remote 0 0.00%
nvme5c6n1 completed 411216 remote 38759 9.43%
nvme5c6n2 completed 0 remote 0 0.00%
nvme5c7n1 completed 0 remote 0 0.00%
nvme5c7n2 completed 0 remote 0 0.00%

- Patched

nvme1c0n1 completed 0 remote 0 0.00%
nvme1c0n2 completed 0 remote 0 0.00%
nvme1c1n1 completed 172202 remote 17813 10.34%
nvme1c1n2 completed 50 remote 0 0.00%
nvme1c2n1 completed 172147 remote 17831 10.36%
nvme1c2n2 completed 0 remote 0 0.00%
nvme1c3n1 completed 0 remote 0 0.00%
nvme1c3n2 completed 0 remote 0 0.00%
nvme1c4n1 completed 172159 remote 17825 10.35%
nvme1c4n2 completed 0 remote 0 0.00%
nvme1c5n1 completed 0 remote 0 0.00%
nvme1c5n2 completed 0 remote 0 0.00%
nvme1c6n1 completed 0 remote 0 0.00%
nvme1c6n2 completed 0 remote 0 0.00%
nvme1c7n1 completed 172156 remote 17781 10.33%
nvme1c7n2 completed 0 remote 0 0.00%

It looks like the patched version shows slightly better numbers for this
workload. slat seems to be one of the major differences between the two
runs, but that is the only thing I spotted that looks really off.

I'll keep going with some more testing. Let me know what other kinds of
tests you would like to see. I'll do a few plain NVMe tests next.

Thanks,
Daniel
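The mail shows the counting hunk but not the debugfs export itself. One plausible way to expose such a per-ctx counter — purely a sketch that assumes a new ctx->rq_remote field and reuses blk-mq's existing per-ctx debugfs attribute table; the names and the exact mechanism Daniel actually used are not known:

```c
/*
 * Hypothetical sketch: expose ctx->rq_remote through block/blk-mq-debugfs.c.
 * The attribute name and the aggregation Daniel's script did on top of it
 * are assumptions, not the code used in the test above.
 */
static int ctx_remote_show(void *data, struct seq_file *m)
{
	struct blk_mq_ctx *ctx = data;	/* per-CPU software queue */

	seq_printf(m, "%lu\n", ctx->rq_remote);
	return 0;
}

/* One more entry appended to the existing per-ctx attribute table: */
static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
	/* ... existing entries ... */
	{"remote", 0400, ctx_remote_show},
	{},
};
```

A collection script would then only need to walk /sys/kernel/debug/block/<dev>/hctx*/cpu*/ and sum the "completed" and "remote" files per device.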
On Tue, Dec 08, 2020 at 09:44:10AM +0100, Daniel Wagner wrote:
> On Tue, Dec 08, 2020 at 09:22:20AM +0100, Sebastian Andrzej Siewior wrote:
> > Sagi mentioned nvme-tcp as a user of this remote completion and Daniel
> > has been kind enough to run some nvme-tcp tests.
>
> I've started with some benchmarking. The first thing I tried is to find
> a setup where the remote path is taken. I found a setup with nvme-fc
> with a workload which results in ca. 10% remote completions.

Setup:
  - Dell Express Flash NVMe PM1725 800GB SFF
  - 2x Gold 6130, 64 cores

Workload:
  - fio --rw=randread --name=test --filename=/dev/nvme0n1 \
        --iodepth=64 --direct=1 --bs=4k --numjobs=32 \
        --time_based --runtime=5m --ioengine=libaio --group_reporting

(I searched for the workload with the highest IOPS, which seems to be
randread.)

Obviously, in this configuration there are no remote completions (I
verified it).

- baseline 5.10-rc7

Jobs: 32 (f=32): [r(32)][100.0%][r=2544MiB/s][r=651k IOPS][eta 00m:00s]
test: (groupid=0, jobs=32): err= 0: pid=24118: Tue Dec  8 11:33:21 2020
  read: IOPS=636k, BW=2485MiB/s (2605MB/s)(728GiB/300006msec)
    slat (nsec): min=1502, max=450956, avg=5576.99, stdev=1475.94
    clat (usec): min=195, max=59296, avg=3212.51, stdev=1640.48
     lat (usec): min=201, max=59302, avg=3218.23, stdev=1640.58
    clat percentiles (usec):
     |  1.00th=[ 2573],  5.00th=[ 2671], 10.00th=[ 2769], 20.00th=[ 2868],
     | 30.00th=[ 2933], 40.00th=[ 2999], 50.00th=[ 3064], 60.00th=[ 3163],
     | 70.00th=[ 3261], 80.00th=[ 3359], 90.00th=[ 3589], 95.00th=[ 3818],
     | 99.00th=[ 4948], 99.50th=[ 5669], 99.90th=[40633], 99.95th=[44303],
     | 99.99th=[49021]
   bw (  MiB/s): min=  444, max= 2598, per=99.99%, avg=2484.33, stdev= 8.36, samples=19200
   iops        : min=113782, max=665312, avg=635988.04, stdev=2139.63, samples=19200
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=96.85%, 10=2.96%, 20=0.01%, 50=0.16%
  lat (msec)   : 100=0.01%
  cpu          : usr=9.11%, sys=14.58%, ctx=131930047, majf=0, minf=28434
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=190817510,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=2485MiB/s (2605MB/s), 2485MiB/s-2485MiB/s (2605MB/s-2605MB/s), io=728GiB (782GB), run=300006-300006msec

Disk stats (read/write):
  nvme0n1: ios=190707084/0, merge=0/0, ticks=611781701/0, in_queue=611781702, util=100.00%

- patched

Jobs: 32 (f=32): [r(32)][100.0%][r=2548MiB/s][r=652k IOPS][eta 00m:00s]
test: (groupid=0, jobs=32): err= 0: pid=3059: Tue Dec  8 12:11:25 2020
  read: IOPS=637k, BW=2489MiB/s (2610MB/s)(729GiB/300006msec)
    slat (nsec): min=1453, max=4793.6k, avg=5662.01, stdev=1960.75
    clat (usec): min=77, max=59685, avg=3207.13, stdev=1633.85
     lat (usec): min=82, max=59696, avg=3212.92, stdev=1633.95
    clat percentiles (usec):
     |  1.00th=[ 2573],  5.00th=[ 2671], 10.00th=[ 2737], 20.00th=[ 2835],
     | 30.00th=[ 2933], 40.00th=[ 2999], 50.00th=[ 3064], 60.00th=[ 3163],
     | 70.00th=[ 3228], 80.00th=[ 3359], 90.00th=[ 3556], 95.00th=[ 3785],
     | 99.00th=[ 4948], 99.50th=[ 5669], 99.90th=[40633], 99.95th=[43779],
     | 99.99th=[49021]
   bw (  MiB/s): min=  560, max= 2617, per=99.99%, avg=2488.34, stdev= 8.39, samples=19199
   iops        : min=143452, max=670006, avg=637013.93, stdev=2148.64, samples=19199
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=96.92%, 10=2.89%, 20=0.01%, 50=0.16%
  lat (msec)   : 100=0.01%
  cpu          : usr=9.32%, sys=14.88%, ctx=130862793, majf=0, minf=38825
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=191130719,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=2489MiB/s (2610MB/s), 2489MiB/s-2489MiB/s (2610MB/s-2610MB/s), io=729GiB (783GB), run=300006-300006msec

Disk stats (read/write):
  nvme0n1: ios=191019060/0, merge=0/0, ticks=611718395/0, in_queue=611718395, util=100.00%

Again, the numbers look very much alike.

Thanks,
Daniel
On 2020-12-08 12:36:53 [+0100], Daniel Wagner wrote:
> Obviously, in this configuration there are no remote completions (I
> verified it).

Do you complete on a remote CPU if you limit the queues to one (this is
untested of course)?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3be352403839a..f35224a64a56e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2126,7 +2126,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	 * If tags are shared with admin queue (Apple bug), then
 	 * make sure we only use one IO queue.
 	 */
-	if (dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS)
+	if (1 || dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS)
 		nr_io_queues = 1;
 	else
 		nr_io_queues = min(nvme_max_io_queues(dev),

Sebastian
On Tue, Dec 08, 2020 at 12:49:36PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-08 12:36:53 [+0100], Daniel Wagner wrote:
> > Obviously, in this configuration there are no remote completions (I
> > verified it).
>
> Do you complete on a remote CPU if you limit the queues to one (this is
> untested of course)?

  nvme0n1/ completed 11913011 remote 6718563 56.40%

Yes, but how is this relevant? I thought Jens' complaint was about the
additional indirection via the softirq context

-	rq->q->mq_ops->complete(rq);
+	blk_mq_trigger_softirq(rq);

and not the remote completion path. I can benchmark it but I don't know
if it really helps the discussion.
On 2020-12-08 13:41:48 [+0100], Daniel Wagner wrote:
> On Tue, Dec 08, 2020 at 12:49:36PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2020-12-08 12:36:53 [+0100], Daniel Wagner wrote:
> > > Obviously, in this configuration there are no remote completions (I
> > > verified it).
> >
> > Do you complete on a remote CPU if you limit the queues to one (this is
> > untested of course)?
>
>   nvme0n1/ completed 11913011 remote 6718563 56.40%
>
> Yes, but how is this relevant? I thought Jens' complaint was about the
> additional indirection via the softirq context
>
> -	rq->q->mq_ops->complete(rq);
> +	blk_mq_trigger_softirq(rq);
>
> and not the remote completion path. I can benchmark it but I don't know
> if it really helps the discussion.

The only additional softirq path is for cross-CPU completion. If I
understood you correctly, then your NVMe device always completes locally
because the queue interrupt fires on the correct CPU. If you take away
the queues, then you should see cross-CPU completions: with only one
queue the request will now complete on the remote CPU in softirq context
(and not in IRQ as it used to).

If this single-queue NVMe device, which may complete on another CPU, is
not an issue / not interesting because it is already limited, then
ignore this.

Sebastian
On 2020-12-08 13:52:25 [+0100], To Daniel Wagner wrote:
> On 2020-12-08 13:41:48 [+0100], Daniel Wagner wrote:
> > On Tue, Dec 08, 2020 at 12:49:36PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2020-12-08 12:36:53 [+0100], Daniel Wagner wrote:
> > > > Obviously, in this configuration there are no remote completions (I
> > > > verified it).
> > >
> > > Do you complete on a remote CPU if you limit the queues to one (this is
> > > untested of course)?
> >
> >   nvme0n1/ completed 11913011 remote 6718563 56.40%
> >
> > Yes, but how is this relevant? I thought Jens' complaint was about the
> > additional indirection via the softirq context
> >
> > -	rq->q->mq_ops->complete(rq);
> > +	blk_mq_trigger_softirq(rq);
> >
> > and not the remote completion path. I can benchmark it but I don't know
> > if it really helps the discussion.

… blurp

Yes, you are right. Even cross-CPU completion for single-queue devices
was already done in softirq. So the only change is for multiqueue
devices, which, as you just demonstrated, does not happen.

Thank you!

Sebastian
On Mon, Dec 07, 2020 at 04:52:57PM -0700, Jens Axboe wrote:
> On 12/4/20 12:13 PM, Sebastian Andrzej Siewior wrote:
> > Controllers with multiple queues have their IRQ handlers pinned to a
> > CPU. The core shouldn't need to complete the request on a remote CPU.
> >
> > Remove this case and always raise the softirq to complete the request.
>
> I don't like this one at all, it'll add a softirq jump for the fast path
> for e.g. nvme devices.

For the real fast path, that is either a polled queue or irq-driven
queues that only map to a single CPU, we never reach this code, so I'm
not too worried. Not that I'd complain about numbers, preferably in the
commit log.
On Tue, Dec 08, 2020 at 01:57:09PM +0100, Sebastian Andrzej Siewior wrote:
> Yes, you are right. Even cross-CPU completion for single-queue devices
> was already done in softirq. So the only change is for multiqueue
> devices, which, as you just demonstrated, does not happen.

It can happen if there are fewer hardware queues than CPUs. But I'd bet
that applications which care about performance are well aware of this
and try to keep the work vertically aligned, as in the sketch below.
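One way an application can keep the work "vertically aligned" is to pin its I/O threads to the CPU(s) a queue's interrupt is affine to, so submission and completion happen on the same CPU and the remote-completion path is never taken. A minimal userspace sketch — the CPU number is an assumption; in practice it would be derived from the device's IRQ affinity (e.g. /proc/irq/*/smp_affinity_list):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Pin the calling thread to one CPU so I/O submitted from it completes
 * on the same CPU that handles the hardware queue's interrupt. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
	/* CPU 3 is only an example; pick the CPU the queue's IRQ is bound to. */
	int ret = pin_to_cpu(3);

	if (ret)
		fprintf(stderr, "pin failed: %s\n", strerror(ret));
	/* ... submit I/O from this thread (libaio, io_uring, ...) ... */
	return 0;
}
```

fio can do the equivalent per job with its CPU-affinity options, which is presumably what latency-sensitive setups rely on.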
On 2020-12-08 13:13:19 [+0000], Christoph Hellwig wrote:
> On Mon, Dec 07, 2020 at 04:52:57PM -0700, Jens Axboe wrote:
> > On 12/4/20 12:13 PM, Sebastian Andrzej Siewior wrote:
> > > Controllers with multiple queues have their IRQ handlers pinned to a
> > > CPU. The core shouldn't need to complete the request on a remote CPU.
> > >
> > > Remove this case and always raise the softirq to complete the request.
> >
> > I don't like this one at all, it'll add a softirq jump for the fast path
> > for e.g. nvme devices.
>
> For the real fast path, that is either a polled queue or irq-driven
> queues that only map to a single CPU, we never reach this code, so I'm
> not too worried. Not that I'd complain about numbers, preferably in the
> commit log.

Did Daniel provide all the numbers you/Jens were looking for, or do we
still wait for some?

Sebastian
On 12/8/20 1:44 AM, Daniel Wagner wrote:
> It looks like the patched version shows slightly better numbers for this
> workload. slat seems to be one of the major differences between the two
> runs, but that is the only thing I spotted that looks really off.

slat is the same, just one is in nsec and the other is in usec.

> I'll keep going with some more testing. Let me know what other kinds of
> tests you would like to see. I'll do a few plain NVMe tests next.

This is a good test, thanks.
On Thu, Dec 17, 2020 at 09:45:47AM -0700, Jens Axboe wrote:
> On 12/8/20 1:44 AM, Daniel Wagner wrote:
> > It looks like the patched version shows slightly better numbers for this
> > workload. slat seems to be one of the major differences between the two
> > runs, but that is the only thing I spotted that looks really off.
>
> slat is the same, just one is in nsec and the other is in usec.

Ah, good eyes. Need to remember this :)

> > I'll keep going with some more testing. Let me know what other kinds of
> > tests you would like to see. I'll do a few plain NVMe tests next.
>
> This is a good test, thanks.

Got sidetracked. Haven't started yet with these tests.
On 12/17/20 9:49 AM, Daniel Wagner wrote:
> On Thu, Dec 17, 2020 at 09:45:47AM -0700, Jens Axboe wrote:
>> On 12/8/20 1:44 AM, Daniel Wagner wrote:
>>> It looks like the patched version shows slightly better numbers for this
>>> workload. slat seems to be one of the major differences between the two
>>> runs, but that is the only thing I spotted that looks really off.
>>
>> slat is the same, just one is in nsec and the other is in usec.
>
> Ah, good eyes. Need to remember this :)
>
>>> I'll keep going with some more testing. Let me know what other kinds of
>>> tests you would like to see. I'll do a few plain NVMe tests next.
>>
>> This is a good test, thanks.
>
> Got sidetracked. Haven't started yet with these tests.

I just ran some here, and as expected, didn't see much change. Single
core IOPS at 2.8M in both cases, and not expecting any remote IPI for
that test case. So I'd say that's good enough, wasn't expecting a
change, and we don't see one even for your case with remote IPI.
On 12/17/20 9:43 AM, Sebastian Andrzej Siewior wrote:
> On 2020-12-08 13:13:19 [+0000], Christoph Hellwig wrote:
>> On Mon, Dec 07, 2020 at 04:52:57PM -0700, Jens Axboe wrote:
>>> On 12/4/20 12:13 PM, Sebastian Andrzej Siewior wrote:
>>>> Controllers with multiple queues have their IRQ handlers pinned to a
>>>> CPU. The core shouldn't need to complete the request on a remote CPU.
>>>>
>>>> Remove this case and always raise the softirq to complete the request.
>>>
>>> I don't like this one at all, it'll add a softirq jump for the fast path
>>> for e.g. nvme devices.
>>
>> For the real fast path, that is either a polled queue or irq-driven
>> queues that only map to a single CPU, we never reach this code, so I'm
>> not too worried. Not that I'd complain about numbers, preferably in the
>> commit log.
>
> Did Daniel provide all the numbers you/Jens were looking for, or do we
> still wait for some?

Yeah, I think we're good at this point. I'll queue this up for 5.11.
On 2020-12-17 09:55:08 [-0700], Jens Axboe wrote:
>
> Yeah, I think we're good at this point. I'll queue this up for 5.11.

Thank you very much.
Thank you Daniel for running all these tests.

Sebastian
On Thu, Dec 17, 2020 at 05:58:44PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-17 09:55:08 [-0700], Jens Axboe wrote:
> >
> > Yeah, I think we're good at this point. I'll queue this up for 5.11.
> Thank you very much.
> Thank you Daniel for running all these tests.

np. Glad I can contribute to the -rt project :)
On Thu, Dec 17, 2020 at 09:55:08AM -0700, Jens Axboe wrote:
> Yeah, I think we're good at this point. I'll queue this up for 5.11.
If I am not completely mistaken, you queued v2 of patch 3. Sebastian sent out
a v3 (slightly hidden):
https://lore.kernel.org/linux-block/20201214202030.izhm4byeznfjoobe@linutronix.de/
which includes the changes Christoph asked for.
On 12/17/20 11:16 AM, Daniel Wagner wrote:
> On Thu, Dec 17, 2020 at 09:55:08AM -0700, Jens Axboe wrote:
>> Yeah, I think we're good at this point. I'll queue this up for 5.11.
>
> If I am not completely mistaken, you queued v2 of patch 3. Sebastian sent out
> a v3 (slightly hidden):
>
> https://lore.kernel.org/linux-block/20201214202030.izhm4byeznfjoobe@linutronix.de/
>
> which includes the changes Christoph asked for.

Not only slightly hidden, b4 gets me v2 as well. Which isn't surprising,
since it's just a patch in a reply. I'll fix it up, but would've been
great if a v3 series had been posted, or at least a v3 of patch 3 in
that thread sent properly.
On Thu, Dec 17, 2020 at 11:22:58AM -0700, Jens Axboe wrote:
> Not only slightly hidden, b4 gets me v2 as well. Which isn't surprising,
> since it's just a patch in a reply. I'll fix it up, but would've been
> great if a v3 series had been posted, or at least a v3 of patch 3 in
> that thread sent properly.

Yep.

BTW, if you like you can add my

Reviewed-by: Daniel Wagner <dwagner@suse.de>
On 12/17/20 11:41 AM, Daniel Wagner wrote:
> On Thu, Dec 17, 2020 at 11:22:58AM -0700, Jens Axboe wrote:
>> Not only slightly hidden, b4 gets me v2 as well. Which isn't surprising,
>> since it's just a patch in a reply. I'll fix it up, but would've been
>> great if a v3 series had been posted, or at least a v3 of patch 3 in
>> that thread sent properly.
>
> Yep.
>
> BTW, if you like you can add my
>
> Reviewed-by: Daniel Wagner <dwagner@suse.de>

To the series, or just that one patch?
On 17.12.20 19:46, Jens Axboe wrote:
> On 12/17/20 11:41 AM, Daniel Wagner wrote:
>> On Thu, Dec 17, 2020 at 11:22:58AM -0700, Jens Axboe wrote:
>>> Not only slightly hidden, b4 gets me v2 as well. Which isn't surprising,
>>> since it's just a patch in a reply. I'll fix it up, but would've been
>>> great if a v3 series had been posted, or at least a v3 of patch 3 in
>>> that thread sent properly.
>>
>> Yep.
>>
>> BTW, if you like you can add my
>>
>> Reviewed-by: Daniel Wagner <dwagner@suse.de>
>
> To the series, or just that one patch?

To the series, but if I am too late for this, it's also okay if I miss
out on the party :)
On 12/17/20 12:07 PM, Daniel Wagner wrote:
> On 17.12.20 19:46, Jens Axboe wrote:
>> On 12/17/20 11:41 AM, Daniel Wagner wrote:
>>> On Thu, Dec 17, 2020 at 11:22:58AM -0700, Jens Axboe wrote:
>>>> Not only slightly hidden, b4 gets me v2 as well. Which isn't surprising,
>>>> since it's just a patch in a reply. I'll fix it up, but would've been
>>>> great if a v3 series had been posted, or at least a v3 of patch 3 in
>>>> that thread sent properly.
>>>
>>> Yep.
>>>
>>> BTW, if you like you can add my
>>>
>>> Reviewed-by: Daniel Wagner <dwagner@suse.de>
>>
>> To the series, or just that one patch?
>
> To the series, but if I am too late for this, it's also okay if I miss
> out on the party :)

Well, had to update #3 anyway, so done.
> Well, had to update #3 anyway, so done.
Thanks!
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7091bb097c63f..3c0e94913d874 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -628,19 +628,7 @@ static void __blk_mq_complete_request_remote(void *data)
 {
 	struct request *rq = data;
 
-	/*
-	 * For most of single queue controllers, there is only one irq vector
-	 * for handling I/O completion, and the only irq's affinity is set
-	 * to all possible CPUs. On most of ARCHs, this affinity means the irq
-	 * is handled on one specific CPU.
-	 *
-	 * So complete I/O requests in softirq context in case of single queue
-	 * devices to avoid degrading I/O performance due to irqsoff latency.
-	 */
-	if (rq->q->nr_hw_queues == 1)
-		blk_mq_trigger_softirq(rq);
-	else
-		rq->q->mq_ops->complete(rq);
+	blk_mq_trigger_softirq(rq);
 }
 
 static inline bool blk_mq_complete_need_ipi(struct request *rq)
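For reference on what "raise the softirq" means in this hunk: blk_mq_trigger_softirq() puts the request on a per-CPU completion list and raises BLOCK_SOFTIRQ if the list was previously empty. A rough sketch of the 5.10-era helper, simplified from memory and not the verbatim upstream code:

```c
/* Sketch only: approximation of the 5.10-era blk_mq_trigger_softirq(). */
static void blk_mq_trigger_softirq(struct request *rq)
{
	struct list_head *list;
	unsigned long flags;

	local_irq_save(flags);
	list = this_cpu_ptr(&blk_cpu_done);
	list_add_tail(&rq->ipi_list, list);

	/*
	 * If our request is the only entry, BLOCK_SOFTIRQ has not been
	 * raised on this CPU yet; the softirq then runs the driver's
	 * ->complete() callbacks on return from the interrupt.
	 */
	if (list->next == &rq->ipi_list)
		raise_softirq_irqoff(BLOCK_SOFTIRQ);
	local_irq_restore(flags);
}
```

So the cost Jens was worried about is the list add plus deferring the driver callback to softirq on IRQ exit, rather than calling ->complete() directly in hardirq context.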
Controllers with multiple queues have their IRQ handlers pinned to a
CPU. The core shouldn't need to complete the request on a remote CPU.

Remove this case and always raise the softirq to complete the request.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 block/blk-mq.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)