Message ID | 20210603065024.1051-4-anand.a.khoje@oracle.com |
---|---|
State | Superseded |
Series | IB/core: Obtaining subnet_prefix from cache in IB devices. |
On Thu, Jun 03, 2021 at 12:20:24PM +0530, Anand Khoje wrote:
> ib_query_port() calls device->ops.query_port() to get the port
> attributes. The method of querying is device driver specific.
> The same function calls device->ops.query_gid() to get the GID and
> extract the subnet_prefix (gid_prefix).
>
> The GID and subnet_prefix are stored in a cache. But they do not get
> read from the cache if the device is an Infiniband device. The
> following change takes advantage of the cached subnet_prefix.
> Testing with RDBMS has shown a significant improvement in performance
> with this change.
>
> The function ib_cache_is_initialised() is introduced because
> ib_query_port() gets called early in the stage when the cache is not
> built while reading port immutable property.
>
> In that case, the default GID still gets read from HCA for IB link-
> layer devices.
>
> Fixes: fad61ad ("IB/core: Add subnet prefix to port info")
> Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
> Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com>
> ---
>  drivers/infiniband/core/cache.c  | 7 ++++++-
>  drivers/infiniband/core/device.c | 9 +++++++++
>  include/rdma/ib_cache.h          | 6 ++++++
>  include/rdma/ib_verbs.h          | 6 ++++++
>  4 files changed, 27 insertions(+), 1 deletion(-)

Can you please help me to understand how cache is updated?

There are a lot of calls to ib_query_port() and I wonder how callers can
get new GID after it was changed in already initialized cache.

Thanks
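The lookup order described in the commit message can be modelled in user space. This is a minimal sketch under stated assumptions, not the kernel code: struct port_model, hca_query_subnet_prefix() and query_subnet_prefix() are hypothetical names, while the real patch calls ib_cache_is_initialised() and ib_get_cached_subnet_prefix() on a struct ib_device. The model counts simulated query_gid() calls to show when the HCA is consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one IB port; not the real struct ib_port_data. */
struct port_model {
	bool cache_initialized;   /* models IB_PORT_CACHE_INITIALIZED          */
	uint64_t cached_prefix;   /* models the cached subnet_prefix           */
	uint64_t hca_prefix;      /* what the HCA would report for GID index 0 */
	unsigned int hca_queries; /* number of simulated query_gid() calls     */
};

/* Models device->ops.query_gid(device, port_num, 0, &gid): the slow path. */
static uint64_t hca_query_subnet_prefix(struct port_model *p)
{
	p->hca_queries++;
	return p->hca_prefix;
}

/* Models the patched tail of __ib_query_port() for an IB link-layer port. */
static uint64_t query_subnet_prefix(struct port_model *p)
{
	assert(p != NULL);
	if (!p->cache_initialized)
		return hca_query_subnet_prefix(p); /* cache not built yet */
	return p->cached_prefix;                   /* served without touching the HCA */
}
```

Before the cache is built (the early port immutable read mentioned in the commit message), a query still goes to the HCA; once the flag is set, repeated queries are answered from memory without a driver round trip.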
> On 3 Jun 2021, at 11:07, Leon Romanovsky <leon@kernel.org> wrote:
>
> Can you please help me to understand how cache is updated?
>
> There are a lot of calls to ib_query_port() and I wonder how callers can
> get new GID after it was changed in already initialized cache.

The cache is initialized when it is created, just before the bit
IB_PORT_CACHE_INITIALIZED is set in flags.

After commit d58c23c92548 ("IB/core: Only update PKEY and GID caches on
respective events"), the GID portion of the cache is updated when a
IB_EVENT_GID_CHANGE event is received.

Before said commit, it was updated on any event.

Thxs, Håkon
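The refresh policy described in this reply can be sketched as a small event handler. This is a hypothetical model: the model_event values only mirror IB_EVENT_PORT_ACTIVE, IB_EVENT_PKEY_CHANGE and IB_EVENT_GID_CHANGE, handle_event() is not a kernel function, and the update_on_any_event parameter switches between the behaviour before and after commit d58c23c92548 as the reply describes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of events; the real enum ib_event_type has more. */
enum model_event {
	MODEL_EVENT_PORT_ACTIVE,
	MODEL_EVENT_PKEY_CHANGE,
	MODEL_EVENT_GID_CHANGE,
};

struct gid_cache_model {
	uint64_t hca_prefix;    /* current value in the HCA      */
	uint64_t cached_prefix; /* value served from the cache   */
};

/*
 * update_on_any_event == true:  refresh on every event (before the commit).
 * update_on_any_event == false: refresh only on a GID change (after it).
 */
static void handle_event(struct gid_cache_model *c, enum model_event ev,
			 bool update_on_any_event)
{
	assert(c != NULL);
	if (update_on_any_event || ev == MODEL_EVENT_GID_CHANGE)
		c->cached_prefix = c->hca_prefix;
}
```

In this model the cache lags the HCA from the moment the hardware value changes until the matching event is handled, which is the window the next question in the thread is about.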
On Thu, Jun 03, 2021 at 09:29:32AM +0000, Haakon Bugge wrote:
> > On 3 Jun 2021, at 11:07, Leon Romanovsky <leon@kernel.org> wrote:
> >
> > Can you please help me to understand how cache is updated?
> >
> > There are a lot of calls to ib_query_port() and I wonder how callers can
> > get new GID after it was changed in already initialized cache.
>
> The cache is initialized when it is created, just before the bit
> IB_PORT_CACHE_INITIALIZED is set in flags.
>
> After commit d58c23c92548 ("IB/core: Only update PKEY and GID caches on
> respective events"), the GID portion of the cache is updated when a
> IB_EVENT_GID_CHANGE event is received.
>
> Before said commit, it was updated on any event.

This part is clear to me, the missing piece is to understand what will
happen if cache and GID are not in sync because of asynchronous nature of
events.

Thanks
> On 3 Jun 2021, at 12:16, Leon Romanovsky <leon@kernel.org> wrote:
>
> On Thu, Jun 03, 2021 at 09:29:32AM +0000, Haakon Bugge wrote:
>> After commit d58c23c92548 ("IB/core: Only update PKEY and GID caches on
>> respective events"), the GID portion of the cache is updated when a
>> IB_EVENT_GID_CHANGE event is received.
>>
>> Before said commit, it was updated on any event.
>
> This part is clear to me, the missing piece is to understand what will
> happen if cache and GID are not in sync because of asynchronous nature of
> events.

The calls to ib_query_port() are asynchronous with GID change. Consider
the time line:

Time | HCA | cache
---|---|---
t0 | GIDa | GIDa
t1 | |
t2 | GIDb | GIDa
t3 | |
t4 | GIDb | GIDb
t5 | |

Prior to this commit, if ib_query_port() was called at t1 or at t3, two
different GIDs would be retrieved.

With this commit, if ib_query_port() was called at t3 or t5, two different
GIDs would be retrieved.

The scenario is the same, only skewed in time.

Thxs, Håkon
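The timeline can be replayed as a table lookup to check the symmetry argument. The 'a'/'b' encoding and query_at() are illustrative assumptions; rows t1, t3 and t5 repeat the state of the preceding step, since nothing changes at those query points.

```c
#include <assert.h>
#include <stddef.h>

/* One step of the timeline: the GID held by the HCA and by the cache. */
struct timeline_row {
	char hca;
	char cache;
};

/* 'a' = GIDa, 'b' = GIDb; array index = time step t0..t5. */
static const struct timeline_row timeline[] = {
	{ 'a', 'a' }, /* t0 */
	{ 'a', 'a' }, /* t1: query point */
	{ 'b', 'a' }, /* t2: HCA GID changes, event not yet handled */
	{ 'b', 'a' }, /* t3: query point */
	{ 'b', 'b' }, /* t4: IB_EVENT_GID_CHANGE handled, cache updated */
	{ 'b', 'b' }, /* t5: query point */
};

/* GID seen by a query at step t, reading the HCA (before the patch)
 * or the cache (with the patch). */
static char query_at(size_t t, int from_cache)
{
	assert(t < sizeof(timeline) / sizeof(timeline[0]));
	return from_cache ? timeline[t].cache : timeline[t].hca;
}
```

Reading the HCA at t1 and t3 yields GIDa then GIDb; reading the cache at t3 and t5 yields GIDa then GIDb as well. In both cases the two queries straddle one change, which is the point of the reply: the inconsistency window moves in time rather than appearing anew.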
On 6/3/2021 2:50 PM, Anand Khoje wrote:
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index b6700ad..724ac0e 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -1624,6 +1624,8 @@ int ib_cache_setup_one(struct ib_device *device)
>  		err = ib_cache_update(device, p, true);
>  		if (err)
>  			return err;
> +		set_bit(IB_PORT_CACHE_INITIALIZED,
> +			&device->port_data[p].flags);
>  	}
>
>  	return 0;
> @@ -1639,8 +1641,11 @@ void ib_cache_release_one(struct ib_device *device)
>  	 * all the device's resources when the cache could no
>  	 * longer be accessed.
>  	 */
> -	rdma_for_each_port (device, p)
> +	rdma_for_each_port (device, p) {
> +		clear_bit(IB_PORT_CACHE_INITIALIZED,
> +			  &device->port_data[p].flags);
>  		kfree(device->port_data[p].cache.pkey);
> +	}
>
>  	gid_table_release_one(device);
>  }

Do we need to clear it in gid_table_cleanup_one()?
> On 3 Jun 2021, at 14:10, Mark Zhang <markzhang@nvidia.com> wrote:
>
> On 6/3/2021 2:50 PM, Anand Khoje wrote:
>> @@ -1639,8 +1641,11 @@ void ib_cache_release_one(struct ib_device *device)
>>  	 * all the device's resources when the cache could no
>>  	 * longer be accessed.
>>  	 */
>> -	rdma_for_each_port (device, p)
>> +	rdma_for_each_port (device, p) {
>> +		clear_bit(IB_PORT_CACHE_INITIALIZED,
>> +			  &device->port_data[p].flags);
>>  		kfree(device->port_data[p].cache.pkey);
>> +	}
>>
>>  	gid_table_release_one(device);
>>  }
>
> Do we need to clear it in gid_table_cleanup_one()?

Good point. Is it feasible that ib_query_port() can be called on a device
that has been removed? If yes, we need it in gid_table_cleanup_one() as
well.

Thxs, Håkon
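Whether the flag should also be cleared in gid_table_cleanup_one() comes down to the order of teardown. The sketch is a hypothetical model that tracks only the IB_PORT_CACHE_INITIALIZED flag: setup_one(), cleanup_one() and release_one() are stand-ins named after the functions in the thread, not the kernel functions, and clear_in_cleanup toggles the change Mark Zhang asks about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port lifecycle model; only the flag is tracked. */
struct port_lifecycle {
	bool cache_initialized; /* models IB_PORT_CACHE_INITIALIZED  */
	bool clear_in_cleanup;  /* also clear it during GID cleanup? */
};

/* Stand-in for ib_cache_setup_one(): cache built, flag set. */
static void setup_one(struct port_lifecycle *p)
{
	p->cache_initialized = true;
}

/* Stand-in for gid_table_cleanup_one(): clears the flag only if asked. */
static void cleanup_one(struct port_lifecycle *p)
{
	if (p->clear_in_cleanup)
		p->cache_initialized = false;
}

/* Stand-in for ib_cache_release_one(): the patch clears the flag here. */
static void release_one(struct port_lifecycle *p)
{
	p->cache_initialized = false;
}

/* True if a query issued now would be answered from the cache. */
static bool served_from_cache(const struct port_lifecycle *p)
{
	assert(p != NULL);
	return p->cache_initialized;
}
```

In this model, with the patch as posted a query issued between cleanup and release is still served from the cache; clearing the flag during cleanup makes such a query fall back to the HCA. Whether such a query can occur at all is the open question in the reply.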
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index b6700ad..724ac0e 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1624,6 +1624,8 @@ int ib_cache_setup_one(struct ib_device *device)
 		err = ib_cache_update(device, p, true);
 		if (err)
 			return err;
+		set_bit(IB_PORT_CACHE_INITIALIZED,
+			&device->port_data[p].flags);
 	}
 
 	return 0;
@@ -1639,8 +1641,11 @@ void ib_cache_release_one(struct ib_device *device)
 	 * all the device's resources when the cache could no
 	 * longer be accessed.
 	 */
-	rdma_for_each_port (device, p)
+	rdma_for_each_port (device, p) {
+		clear_bit(IB_PORT_CACHE_INITIALIZED,
+			  &device->port_data[p].flags);
 		kfree(device->port_data[p].cache.pkey);
+	}
 
 	gid_table_release_one(device);
 }
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c2fa592..b3e20ac 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2060,6 +2060,15 @@ static int __ib_query_port(struct ib_device *device,
 	    IB_LINK_LAYER_INFINIBAND)
 		return 0;
 
+	if (!ib_cache_is_initialised(device, port_num))
+		goto query_gid_from_device;
+
+	ib_get_cached_subnet_prefix(device, port_num,
+				    &port_attr->subnet_prefix);
+
+	return 0;
+
+query_gid_from_device:
 	err = device->ops.query_gid(device, port_num, 0, &gid);
 	if (err)
 		return err;
diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index 226ae37..1526fc6 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -114,4 +114,10 @@ ssize_t rdma_query_gid_table(struct ib_device *device,
 			     struct ib_uverbs_gid_entry *entries,
 			     size_t max_entries);
 
+static inline bool ib_cache_is_initialised(struct ib_device *device,
+					    u8 port_num)
+{
+	return test_bit(IB_PORT_CACHE_INITIALIZED,
+			&device->port_data[port_num].flags);
+}
 #endif /* _IB_CACHE_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 41cbec5..ad2a55e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2169,6 +2169,10 @@ struct ib_port_immutable {
 	u32 max_mad_size;
 };
 
+enum ib_port_data_flags {
+	IB_PORT_CACHE_INITIALIZED = 1 << 0,
+};
+
 struct ib_port_data {
 	struct ib_device *ib_dev;
 
@@ -2178,6 +2182,8 @@ struct ib_port_data {
 
 	spinlock_t netdev_lock;
 
+	unsigned long flags;
+
 	struct list_head pkey_list;
 
 	struct ib_port_cache cache;
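One detail of the flag handling in the patch is worth spelling out: the kernel's set_bit(), clear_bit() and test_bit() take a bit number, not a mask. With IB_PORT_CACHE_INITIALIZED defined as 1 << 0 (that is, 1), the patch therefore uses bit 1 of flags. The sketch uses hypothetical, non-atomic user-space stand-ins (model_set_bit(), model_clear_bit(), model_test_bit()) to show the resulting value; because every call site goes through the same bitops, set, test and clear still agree.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-ins for the kernel bitops; like the originals,
 * they take a bit *number* nr, not a mask. */
static void model_set_bit(unsigned int nr, unsigned long *addr)
{
	assert(nr < MODEL_BITS_PER_LONG);
	*addr |= 1UL << nr;
}

static void model_clear_bit(unsigned int nr, unsigned long *addr)
{
	assert(nr < MODEL_BITS_PER_LONG);
	*addr &= ~(1UL << nr);
}

static int model_test_bit(unsigned int nr, const unsigned long *addr)
{
	assert(nr < MODEL_BITS_PER_LONG);
	return (int)((*addr >> nr) & 1UL);
}

/* Same definition as in the ib_verbs.h hunk of the patch. */
enum ib_port_data_flags {
	IB_PORT_CACHE_INITIALIZED = 1 << 0,
};
```

Setting the flag this way leaves flags equal to 2, not 1. That stays harmless as long as only the bitops touch the field; a test written as `flags & IB_PORT_CACHE_INITIALIZED` would check bit 0 instead and report the cache as not initialised.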