Message ID | 89baeee29503df46dd28a6a5edbad9ec1a1d86f1.1635055496.git.leonro@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Two IB/core fixes | expand |
On Sun, Oct 24, 2021 at 09:08:21AM +0300, Leon Romanovsky wrote: > From: Mark Zhang <markzhang@nvidia.com> > > Initialize the rdma_hw_stats "lock" field when do allocation, to fix the > warning below. Then we don't need to initialize it in sysfs, remove it. This is a fine cleanup, but this does not describe the bug properly, or have the right fixes line.. The issue is here: static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port, struct ib_qp *qp, enum rdma_nl_counter_mode mode) { counter->stats = dev->ops.counter_alloc_stats(counter); if (!counter->stats) goto err_stats; Which does not init counter->stat's mutex. And trim the oops reports, don't include the usless ? fns, timestamps or other junk. Jason
On Mon, Oct 25, 2021 at 11:50:43AM -0300, Jason Gunthorpe wrote: > On Sun, Oct 24, 2021 at 09:08:21AM +0300, Leon Romanovsky wrote: > > From: Mark Zhang <markzhang@nvidia.com> > > > > Initialize the rdma_hw_stats "lock" field when do allocation, to fix the > > warning below. Then we don't need to initialize it in sysfs, remove it. > > This is a fine cleanup, but this does not describe the bug properly, > or have the right fixes line.. I think that this Fixes line should be instead. Fixes: 0a0800ce2a6a ("RDMA/core: Add a helper API rdma_free_hw_stats_struct") > > The issue is here: > > static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port, > struct ib_qp *qp, > enum rdma_nl_counter_mode mode) > { > counter->stats = dev->ops.counter_alloc_stats(counter); > if (!counter->stats) > goto err_stats; > > Which does not init counter->stat's mutex. This is exactly what Mark is doing here. alloc_and_bind() -> dev->ops.counter_alloc_stats -> mlx5_ib_counter_alloc_stats -> do_alloc_stats() -> rdma_alloc_hw_stats_struct() -> mutex_init(&stats->lock); <- Mark's change. > > And trim the oops reports, don't include the usless ? fns, timestamps > or other junk. I don't like when people "beatify" kernel reports, many times whey are removing too much information. Thanks > > Jason
On Tue, Oct 26, 2021 at 11:27:57AM +0300, Leon Romanovsky wrote: > On Mon, Oct 25, 2021 at 11:50:43AM -0300, Jason Gunthorpe wrote: > > On Sun, Oct 24, 2021 at 09:08:21AM +0300, Leon Romanovsky wrote: > > > From: Mark Zhang <markzhang@nvidia.com> > > > > > > Initialize the rdma_hw_stats "lock" field when do allocation, to fix the > > > warning below. Then we don't need to initialize it in sysfs, remove it. > > > > This is a fine cleanup, but this does not describe the bug properly, > > or have the right fixes line.. > > I think that this Fixes line should be instead. > Fixes: 0a0800ce2a6a ("RDMA/core: Add a helper API rdma_free_hw_stats_struct") No, I don't think so, it should be the commit that added alloc_and_bind() > > The issue is here: > > > > static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port, > > struct ib_qp *qp, > > enum rdma_nl_counter_mode mode) > > { > > counter->stats = dev->ops.counter_alloc_stats(counter); > > if (!counter->stats) > > goto err_stats; > > > > Which does not init counter->stat's mutex. > > This is exactly what Mark is doing here. > > alloc_and_bind() > -> dev->ops.counter_alloc_stats > -> mlx5_ib_counter_alloc_stats > -> do_alloc_stats() > -> rdma_alloc_hw_stats_struct() > -> mutex_init(&stats->lock); <- Mark's change. Yes, I know, the patch is fine, the commit message just needs to be accurate > > And trim the oops reports, don't include the usless ? fns, timestamps > > or other junk. > > I don't like when people "beatify" kernel reports, many times whey are > removing too much information. There is too much junk in the raw oops messages Jason
On Tue, Oct 26, 2021 at 09:05:54AM -0300, Jason Gunthorpe wrote: > On Tue, Oct 26, 2021 at 11:27:57AM +0300, Leon Romanovsky wrote: > > On Mon, Oct 25, 2021 at 11:50:43AM -0300, Jason Gunthorpe wrote: > > > On Sun, Oct 24, 2021 at 09:08:21AM +0300, Leon Romanovsky wrote: > > > > From: Mark Zhang <markzhang@nvidia.com> > > > > > > > > Initialize the rdma_hw_stats "lock" field when do allocation, to fix the > > > > warning below. Then we don't need to initialize it in sysfs, remove it. > > > > > > This is a fine cleanup, but this does not describe the bug properly, > > > or have the right fixes line.. > > > > I think that this Fixes line should be instead. > > Fixes: 0a0800ce2a6a ("RDMA/core: Add a helper API rdma_free_hw_stats_struct") > > No, I don't think so, it should be the commit that added > alloc_and_bind() The alloc_and_bind() is a merge between other functions. The issue existed even before. It is worth to stop at some point to dig and IMHO proposed Fixes line is good enough (at least for me). > > > > The issue is here: > > > > > > static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port, > > > struct ib_qp *qp, > > > enum rdma_nl_counter_mode mode) > > > { > > > counter->stats = dev->ops.counter_alloc_stats(counter); > > > if (!counter->stats) > > > goto err_stats; > > > > > > Which does not init counter->stat's mutex. > > > > This is exactly what Mark is doing here. > > > > alloc_and_bind() > > -> dev->ops.counter_alloc_stats > > -> mlx5_ib_counter_alloc_stats > > -> do_alloc_stats() > > -> rdma_alloc_hw_stats_struct() > > -> mutex_init(&stats->lock); <- Mark's change. > > Yes, I know, the patch is fine, the commit message just needs to be > accurate > > > > And trim the oops reports, don't include the usless ? fns, timestamps > > > or other junk. > > > > I don't like when people "beatify" kernel reports, many times whey are > > removing too much information. > > There is too much junk in the raw oops messages Right, but it is better to have more info than to much trimmed one. Thanks > > Jason
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c index 8626dfbf2199..a3f84b50c46a 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -911,7 +911,6 @@ alloc_hw_stats_device(struct ib_device *ibdev) if (!data->group.attrs) goto err_free_data; - mutex_init(&stats->lock); data->group.name = "hw_counters"; data->stats = stats; return data; @@ -1018,7 +1017,6 @@ alloc_hw_stats_port(struct ib_port *port, struct attribute_group *group) if (!group->attrs) goto err_free_data; - mutex_init(&stats->lock); group->name = "hw_counters"; data->stats = stats; return data; diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 47cf273d0678..692d5ff657df 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -3002,6 +3002,7 @@ struct rdma_hw_stats *rdma_alloc_hw_stats_struct( stats->descs = descs; stats->num_counters = num_counters; stats->lifespan = msecs_to_jiffies(lifespan); + mutex_init(&stats->lock); return stats;