Message ID | cover.1563782844.git.baolin.wang@linaro.org (mailing list archive) |
---|---|
Series | Add MMC packed function |
Hi,

On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
>
> Hi All,
>
> Now some SD/MMC controllers can support packed command or packed request,
> that means it can package multiple requests to host controller to be handled
> at one time, which can improve the I/O performance. Thus this patchset is
> used to add the MMC packed function to support packed request or packed
> command.
>
> In this patch set, I implemented the SD host ADMA3 transfer mode to support
> packed request. The ADMA3 transfer mode can process a multi-block data transfer
> by using a pair of command descriptor and ADMA2 descriptor. In future we can
> easily expand the MMC packed function to support packed command.
>
> Below are some comparison data between packed request and non-packed request
> with fio tool. The fio command I used is like below with changing the
> '--rw' parameter and enabling the direct IO flag to measure the actual hardware
> transfer speed.
>
> ./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read
>
> My eMMC card working at HS400 Enhanced strobe mode:
> [ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
> [ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> [ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
> [ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
> [ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)
>
> 1. Non-packed request
> I tested 3 times for each case and output an average speed.
>
> 1) Sequential read:
> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> Average speed: 28.7MiB/s
>
> 2) Random read:
> Speed: 18.2MiB/s, 8.9MiB/s, 15.8MiB/s
> Average speed: 14.3MiB/s
>
> 3) Sequential write:
> Speed: 21.1MiB/s, 27.9MiB/s, 25MiB/s
> Average speed: 24.7MiB/s
>
> 4) Random write:
> Speed: 21.5MiB/s, 18.1MiB/s, 18.1MiB/s
> Average speed: 19.2MiB/s
>
> 2. Packed request
> In packed request mode, I set the host controller to package a maximum of 10
> requests at one time (actually I can increase the package number), and I
> enabled read/write packed request mode. Also I tested 3 times for each
> case and output an average speed.
>
> 1) Sequential read:
> Speed: 165MiB/s, 167MiB/s, 164MiB/s
> Average speed: 165.3MiB/s
>
> 2) Random read:
> Speed: 147MiB/s, 141MiB/s, 144MiB/s
> Average speed: 144MiB/s
>
> 3) Sequential write:
> Speed: 87.8MiB/s, 89.1MiB/s, 90.0MiB/s
> Average speed: 89MiB/s
>
> 4) Random write:
> Speed: 90.9MiB/s, 89.8MiB/s, 90.4MiB/s
> Average speed: 90.4MiB/s
>
> From the above data, we can see the packed request can improve the performance greatly.
> Any comments are welcome. Thanks a lot.

Any comments for this patch set? Thanks.

>
> Baolin Wang (7):
>   blk-mq: Export blk_mq_hctx_has_pending() function
>   mmc: core: Add MMC packed request function
>   mmc: host: sdhci: Introduce ADMA3 transfer mode
>   mmc: host: sdhci: Factor out the command configuration
>   mmc: host: sdhci: Remove redundant sg_count member of struct
>     sdhci_host
>   mmc: host: sdhci: Add MMC packed request support
>   mmc: host: sdhci-sprd: Add MMC packed request support
>
>  block/blk-mq.c                |    3 +-
>  drivers/mmc/core/Kconfig      |    2 +
>  drivers/mmc/core/Makefile     |    1 +
>  drivers/mmc/core/block.c      |   71 +++++-
>  drivers/mmc/core/block.h      |    3 +-
>  drivers/mmc/core/core.c       |   51 ++++
>  drivers/mmc/core/core.h       |    3 +
>  drivers/mmc/core/packed.c     |  478 ++++++++++++++++++++++++++++++++++++++
>  drivers/mmc/core/queue.c      |   28 ++-
>  drivers/mmc/host/Kconfig      |    1 +
>  drivers/mmc/host/sdhci-sprd.c |   22 +-
>  drivers/mmc/host/sdhci.c      |  513 +++++++++++++++++++++++++++++++++++------
>  drivers/mmc/host/sdhci.h      |   59 ++++-
>  include/linux/blk-mq.h        |    1 +
>  include/linux/mmc/core.h      |    1 +
>  include/linux/mmc/host.h      |    3 +
>  include/linux/mmc/packed.h    |  123 ++++++++++
>  17 files changed, 1286 insertions(+), 77 deletions(-)
>  create mode 100644 drivers/mmc/core/packed.c
>  create mode 100644 include/linux/mmc/packed.h
>
> --
> 1.7.9.5
>
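The averages quoted in the cover letter can be cross-checked, and the gain per test case made explicit, with a few lines of Python. This is only an illustrative sketch: the per-run speeds are copied from the message above, while the names `NON_PACKED`, `PACKED` and `summarize` are made up here and are not part of the patch set.

```python
from statistics import mean

# Per-run fio speeds in MiB/s, copied from the cover letter.
NON_PACKED = {
    "sequential read": [28.9, 26.4, 30.9],
    "random read": [18.2, 8.9, 15.8],
    "sequential write": [21.1, 27.9, 25.0],
    "random write": [21.5, 18.1, 18.1],
}
PACKED = {
    "sequential read": [165.0, 167.0, 164.0],
    "random read": [147.0, 141.0, 144.0],
    "sequential write": [87.8, 89.1, 90.0],
    "random write": [90.9, 89.8, 90.4],
}


def summarize(non_packed, packed):
    """Return {case: (non-packed mean, packed mean, packed/non-packed ratio)}."""
    summary = {}
    for case, runs in non_packed.items():
        base = mean(runs)          # average of the three non-packed runs
        fast = mean(packed[case])  # average of the three packed runs
        summary[case] = (base, fast, fast / base)
    return summary


if __name__ == "__main__":
    for case, (base, fast, ratio) in summarize(NON_PACKED, PACKED).items():
        print(f"{case:17s} {base:6.1f} -> {fast:6.1f} MiB/s ({ratio:.1f}x)")
```

The recomputed means agree with the reported averages, and the ratios come out at roughly 5.8x for sequential read, 10.1x for random read, 3.6x for sequential write and 4.7x for random write, all at bs=4K.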
On 12/08/19 8:20 AM, Baolin Wang wrote:
> Hi,
>
> On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
>>
>> [...]
>>
>> 1. Non-packed request
>> I tested 3 times for each case and output an average speed.
>>
>> 1) Sequential read:
>> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
>> Average speed: 28.7MiB/s

This seems surprisingly low for a HS400ES card. Do you know why that is?

>> [...]
>
> Any comments for this patch set? Thanks.

Did you consider adapting the CQE interface?
Hi Adrian,

On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 12/08/19 8:20 AM, Baolin Wang wrote:
> > Hi,
> >
> > On Mon, 22 Jul 2019 at 21:10, Baolin Wang <baolin.wang@linaro.org> wrote:
> >> [...]
> >> 1) Sequential read:
> >> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> >> Average speed: 28.7MiB/s
>
> This seems surprisingly low for a HS400ES card. Do you know why that is?

I've set the clock to 400M, but it seems the hardware did not output
the corresponding clock. I will check my hardware.

> >> [...]
> >
> > Any comments for this patch set? Thanks.
>
> Did you consider adapting the CQE interface?

I am not very familiar with CQE, since my controller does not support
it. But the MMC packed function has introduced some callbacks to help
different controllers do packed requests, so I think it is easy
to adapt the CQE interface.
On 12/08/19 12:44 PM, Baolin Wang wrote:
> Hi Adrian,
>
> On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> On 12/08/19 8:20 AM, Baolin Wang wrote:
>>> [...]
>>> Any comments for this patch set? Thanks.
>>
>> Did you consider adapting the CQE interface?
>
> I am not very familiar with CQE, since my controller does not support
> it. But the MMC packed function has introduced some callbacks to help
> different controllers do packed requests, so I think it is easy
> to adapt the CQE interface.
>

I meant did you consider using the CQE interface instead of creating another
one?
On Mon, 12 Aug 2019 at 18:52, Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 12/08/19 12:44 PM, Baolin Wang wrote:
> > Hi Adrian,
> >
> > On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >> [...]
> >> Did you consider adapting the CQE interface?
> >
> > I am not very familiar with CQE, since my controller does not support
> > it. But the MMC packed function has introduced some callbacks to help
> > different controllers do packed requests, so I think it is easy
> > to adapt the CQE interface.
> >
>
> I meant did you consider using the CQE interface instead of creating another
> one?

Sorry for the misunderstanding. I think the core/core.c modification can use
the CQE interface, but there are some differences in core/block.c, and I
think they are different mechanisms. I also want to avoid affecting CQE and
normal transfers, so I think adding MMC packed related interfaces will be
easy to read and maintain.
Hi Adrian,

On Mon, 12 Aug 2019 at 17:44, Baolin Wang <baolin.wang@linaro.org> wrote:
>
> Hi Adrian,
>
> On Mon, 12 Aug 2019 at 16:59, Adrian Hunter <adrian.hunter@intel.com> wrote:
> >
> > On 12/08/19 8:20 AM, Baolin Wang wrote:
> > > [...]
> > >> 1) Sequential read:
> > >> Speed: 28.9MiB/s, 26.4MiB/s, 30.9MiB/s
> > >> Average speed: 28.7MiB/s
> >
> > This seems surprisingly low for a HS400ES card. Do you know why that is?
>
> I've set the clock to 400M, but it seems the hardware did not output
> the corresponding clock. I will check my hardware.

I've checked my hardware and did not find any problem. The reason for the
low speed is that I set bs=4K; when I changed to bs=1M, the speed went up
to 251MiB/s.

./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=1M --size=512M --group_reporting --numjobs=20 --name=test_read

READ: bw=251MiB/s (263MB/s), 251MiB/s-251MiB/s (263MB/s-263MB/s), io=10.0GiB (10.7GB), run=40826-40826msec
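The fio summary line above is internally consistent, which a short Python check can show. The sketch assumes that `--size=512M` applies to each of the 20 jobs (matching the reported io=10.0GiB); the helper names `bandwidth_mib_s` and `requests_per_second` are invented for this illustration.

```python
MIB = 1024 * 1024  # bytes per MiB


def bandwidth_mib_s(total_bytes, runtime_ms):
    """Aggregate bandwidth in MiB/s for a run that moved total_bytes."""
    return total_bytes / MIB / (runtime_ms / 1000.0)


def requests_per_second(bandwidth_mib, block_bytes):
    """Requests per second needed to sustain bandwidth_mib at block_bytes each."""
    return bandwidth_mib * MIB / block_bytes


if __name__ == "__main__":
    # 20 jobs x 512 MiB each, finished in 40826 ms (from the fio output).
    bw = bandwidth_mib_s(20 * 512 * MIB, 40826)
    print(f"bs=1M read: {bw:.0f} MiB/s = {bw * MIB / 1e6:.0f} MB/s")

    # Request rates implied by the averages reported earlier in the thread.
    print(f"bs=4K non-packed read: {requests_per_second(28.7, 4096):.0f} req/s")
    print(f"bs=4K packed read:     {requests_per_second(165.3, 4096):.0f} req/s")
    print(f"bs=1M read:            {requests_per_second(bw, MIB):.0f} req/s")
```

One plausible reading of these numbers is that the bs=4K non-packed case is bounded by the number of requests completed per second rather than by the bus speed, which would be consistent with packing several requests per transfer helping most at small block sizes.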