Message ID | 1526918153-26387-1-git-send-email-sergeygo@mellanox.com (mailing list archive) |
---|---|
State | Accepted |
On Mon, May 21, 2018 at 06:55:53PM +0300, Sergey Gorenko wrote:
> The iSER driver reduces max_sectors. For example, if you load the
> ib_iser module with max_sectors=1024, you will see that
> /sys/class/block/<bdev>/queue/max_hw_sectors_kb is 508. It is an
> incorrect value. The expected value is (max_sectors * sector_size) /
> 1024 = 512.
>
> Reducing max_sectors can cause performance degradation due to
> unnecessary splitting of IO requests.
>
> The number of pages per MR has been fixed here, so there is no longer
> any need to reduce max_sectors.
>
> Fixes: 9c674815d346 ("IB/iser: Fix max_sectors calculation")
> Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
> Reviewed-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/infiniband/ulp/iser/iscsi_iser.c     | 12 +++++-------
>  drivers/infiniband/ulp/iser/iscsi_iser.h     |  2 ++
>  drivers/infiniband/ulp/iser/iser_initiator.c |  2 +-
>  drivers/infiniband/ulp/iser/iser_verbs.c     | 21 ++++++++++++++++++---
>  4 files changed, 26 insertions(+), 11 deletions(-)

Sagi, can you ack/nak this? Seems tricky.

Thanks,
Jason
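For readers following the arithmetic in the commit message, here is a minimal standalone sketch (not part of the patch) that reproduces the 508 KB vs. 512 KB figures. It assumes 4 KiB pages and 512-byte sectors; the "old" formula is the one the patch removes, the "new" one is the one it adds.

```c
/* Sketch only: reproduces the commit-message arithmetic, not driver code. */
#include <stdio.h>

#define PAGE_SIZE		4096	/* assumed 4 KiB pages */
#define SIZE_4K			4096
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned int iser_max_sectors = 1024;	/* ib_iser max_sectors=1024 */
	/* sg_tablesize derived the same way iser_calc_scsi_params() derives it */
	unsigned int sg_tablesize = DIV_ROUND_UP(iser_max_sectors * 512, SIZE_4K);

	unsigned int old_max_fr = ((sg_tablesize - 1) * PAGE_SIZE) >> 9;	/* dropped one segment */
	unsigned int new_max_fr = (sg_tablesize * PAGE_SIZE) >> 9;

	/* max_hw_sectors_kb = max_sectors * 512 / 1024 = max_sectors / 2 */
	printf("old: %u sectors -> %u KB\n", old_max_fr, old_max_fr / 2);	/* 1016 -> 508 */
	printf("new: %u sectors -> %u KB\n", new_max_fr, new_max_fr / 2);	/* 1024 -> 512 */
	return 0;
}
```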
On 05/21/2018 06:55 PM, Sergey Gorenko wrote:
> The iSER driver reduces max_sectors. For example, if you load the
> ib_iser module with max_sectors=1024, you will see that
> /sys/class/block/<bdev>/queue/max_hw_sectors_kb is 508. It is an
> incorrect value. The expected value is (max_sectors * sector_size) /
> 1024 = 512.
>
> Reducing max_sectors can cause performance degradation due to
> unnecessary splitting of IO requests.
>
> The number of pages per MR has been fixed here, so there is no longer
> any need to reduce max_sectors.

Thanks Sergey, this looks good,

Acked-by: Sagi Grimberg <sagi@grimberg.me>
On Thu, 2018-05-31 at 12:01 +0300, Sagi Grimberg wrote:
> On 05/21/2018 06:55 PM, Sergey Gorenko wrote:
> > The iSER driver reduces max_sectors. For example, if you load the
> > ib_iser module with max_sectors=1024, you will see that
> > /sys/class/block/<bdev>/queue/max_hw_sectors_kb is 508. It is an
> > incorrect value. The expected value is (max_sectors * sector_size) /
> > 1024 = 512.
> >
> > Reducing max_sectors can cause performance degradation due to
> > unnecessary splitting of IO requests.
> >
> > The number of pages per MR has been fixed here, so there is no longer
> > any need to reduce max_sectors.
>
> Thanks Sergey, this looks good,
>
> Acked-by: Sagi Grimberg <sagi@grimberg.me>

Thanks, applied to for-next.
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 0336643c2ed6..9a6434c31db2 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -665,19 +665,17 @@ static void iscsi_iser_cleanup_task(struct iscsi_task *task)
 		goto free_host;
 	}
 
-	/*
-	 * FRs or FMRs can only map up to a (device) page per entry, but if the
-	 * first entry is misaligned we'll end up using using two entries
-	 * (head and tail) for a single page worth data, so we have to drop
-	 * one segment from the calculation.
-	 */
-	max_fr_sectors = ((shost->sg_tablesize - 1) * PAGE_SIZE) >> 9;
+	max_fr_sectors = (shost->sg_tablesize * PAGE_SIZE) >> 9;
 	shost->max_sectors = min(iser_max_sectors, max_fr_sectors);
 
 	iser_dbg("iser_conn %p, sg_tablesize %u, max_sectors %u\n",
 		 iser_conn, shost->sg_tablesize, shost->max_sectors);
 
+	if (shost->max_sectors < iser_max_sectors)
+		iser_warn("max_sectors was reduced from %u to %u\n",
+			  iser_max_sectors, shost->max_sectors);
+
 	if (cmds_max > max_cmds) {
 		iser_info("cmds_max changed from %u to %u\n",
 			  cmds_max, max_cmds);
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index c1ae4aeae2f9..c30ceb08db21 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -498,6 +498,7 @@ struct ib_conn {
  * @rx_descs:         rx buffers array (cyclic buffer)
  * @num_rx_descs:     number of rx descriptors
  * @scsi_sg_tablesize: scsi host sg_tablesize
+ * @pages_per_mr:     maximum pages available for registration
  */
 struct iser_conn {
 	struct ib_conn		ib_conn;
@@ -520,6 +521,7 @@ struct iser_conn {
 	struct iser_rx_desc	*rx_descs;
 	u32			num_rx_descs;
 	unsigned short		scsi_sg_tablesize;
+	unsigned short		pages_per_mr;
 	bool			snd_w_inv;
 };
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index df49c4eb67f7..ca858d6bd37a 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -251,7 +251,7 @@ int iser_alloc_rx_descriptors(struct iser_conn *iser_conn,
 	iser_conn->min_posted_rx = iser_conn->qp_max_recv_dtos >> 2;
 
 	if (device->reg_ops->alloc_reg_res(ib_conn, session->scsi_cmds_max,
-					   iser_conn->scsi_sg_tablesize))
+					   iser_conn->pages_per_mr))
 		goto create_rdma_reg_res_failed;
 
 	if (iser_alloc_login_buf(iser_conn))
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 56b7240a3fc3..616d978cbf2b 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -703,19 +703,34 @@ static void iser_connect_error(struct rdma_cm_id *cma_id)
 				  unsigned int max_sectors)
 {
 	struct iser_device *device = iser_conn->ib_conn.device;
+	struct ib_device_attr *attr = &device->ib_device->attrs;
 	unsigned short sg_tablesize, sup_sg_tablesize;
+	unsigned short reserved_mr_pages;
+
+	/*
+	 * FRs without SG_GAPS or FMRs can only map up to a (device) page per
+	 * entry, but if the first entry is misaligned we'll end up using two
+	 * entries (head and tail) for a single page worth data, so one
+	 * additional entry is required.
+	 */
+	if ((attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) &&
+	    (attr->device_cap_flags & IB_DEVICE_SG_GAPS_REG))
+		reserved_mr_pages = 0;
+	else
+		reserved_mr_pages = 1;
 
 	sg_tablesize = DIV_ROUND_UP(max_sectors * 512, SIZE_4K);
-	if (device->ib_device->attrs.device_cap_flags &
-			IB_DEVICE_MEM_MGT_EXTENSIONS)
+	if (attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)
 		sup_sg_tablesize =
 			min_t(
 			 uint, ISCSI_ISER_MAX_SG_TABLESIZE,
-			 device->ib_device->attrs.max_fast_reg_page_list_len);
+			 attr->max_fast_reg_page_list_len - reserved_mr_pages);
 	else
 		sup_sg_tablesize = ISCSI_ISER_MAX_SG_TABLESIZE;
 
 	iser_conn->scsi_sg_tablesize = min(sg_tablesize, sup_sg_tablesize);
+	iser_conn->pages_per_mr =
+		iser_conn->scsi_sg_tablesize + reserved_mr_pages;
 }
 
 /**
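To illustrate the effect of the iser_verbs.c hunk, the following standalone sketch mimics the new calculation. The device limits used here (max_fast_reg_page_list_len = 256, no IB_DEVICE_SG_GAPS_REG support) and the ISCSI_ISER_MAX_SG_TABLESIZE value are illustrative assumptions, not taken from a specific HCA. The point it shows: the head/tail reserve now comes out of the MR page list (pages_per_mr) rather than out of the SCSI sg_tablesize, which is why max_sectors no longer needs to be reduced.

```c
/* Sketch only: mirrors the shape of the new iser_calc_scsi_params() logic. */
#include <stdio.h>

#define ISCSI_ISER_MAX_SG_TABLESIZE	1024	/* illustrative value */
#define SIZE_4K				4096
#define DIV_ROUND_UP(n, d)		(((n) + (d) - 1) / (d))
#define MIN(a, b)			((a) < (b) ? (a) : (b))

int main(void)
{
	unsigned int max_sectors = 1024;		/* module parameter */
	unsigned int max_fast_reg_page_list_len = 256;	/* assumed device limit */
	unsigned int reserved_mr_pages = 1;		/* no SG_GAPS -> reserve one page */

	unsigned int sg_tablesize = DIV_ROUND_UP(max_sectors * 512, SIZE_4K);
	unsigned int sup_sg_tablesize = MIN(ISCSI_ISER_MAX_SG_TABLESIZE,
					    max_fast_reg_page_list_len - reserved_mr_pages);
	unsigned int scsi_sg_tablesize = MIN(sg_tablesize, sup_sg_tablesize);
	unsigned int pages_per_mr = scsi_sg_tablesize + reserved_mr_pages;

	/* Expected: scsi_sg_tablesize=128, pages_per_mr=129 */
	printf("scsi_sg_tablesize=%u pages_per_mr=%u\n",
	       scsi_sg_tablesize, pages_per_mr);
	return 0;
}
```

With these assumed numbers the full 128-entry scatterlist (1024 sectors, 512 KB) is still exposed to the SCSI layer, and the possible extra head/tail entry is absorbed by sizing each MR for 129 pages instead.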