Message ID | 3211fe8a-33fb-37ca-e192-ad1f116f4acd@huawei.com (mailing list archive) |
---|---|
Headers | show |
Series | fix serverl issues reported by Coverity | expand |
Hi, I actually just ran into the NULL deref issue that is fixed here. Bu I have a question for the experts: what might cause libndctl to run into the NULL deref like below ? Program terminated with signal 11, Segmentation fault. #0 ndctl_pfn_get_bus (pfn=pfn@entry=0x0) at libndctl.c:5540 5540 return pfn->region->bus; (gdb) print pfn $1 = (struct ndctl_pfn *) 0x0 (gdb) frame 4 #4 0x000000000040ca70 in setup_namespace (region=region@entry=0x109d910, ndns=ndns@entry=0x10a7d40, p=p@entry=0x7ffd8ff73b90) at namespace.c:570 570 try(ndctl_dax, set_uuid, dax, uuid); (gdb) info locals __rc = <optimized out> dax = 0x0 What I did was to let 2 threads run "create-namespace all" in a tight loop, and 2 other threads run "destroy-namespace all" in a tight loop, while chasing an year old issue that randomly resurfaces - "nd_region region1: allocation underrun: 0x0 of 0x40000000 bytes" In addition, there are kmemleaks, # cat /sys/kernel/debug/kmemleak [..] unreferenced object 0xffff976bd46f6240 (size 64): comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00 .......... .7... ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00 ....8........... backtrace: [<00000000064003cf>] __kmalloc_track_caller+0x136/0x379 [<00000000d85e3c52>] krealloc+0x67/0x92 [<00000000d7d3ba8a>] __alloc_dev_dax_range+0x73/0x25c [<0000000027d58626>] devm_create_dev_dax+0x27d/0x416 [<00000000434abd43>] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core] [<0000000083726c1c>] dax_pmem_probe+0x10/0x1f [dax_pmem] [<00000000b5f2319c>] nvdimm_bus_probe+0x9d/0x340 [libnvdimm] [<00000000c055e544>] really_probe+0x230/0x48d [<000000006cabd38e>] driver_probe_device+0x122/0x13b [<0000000029c7b95a>] device_driver_attach+0x5b/0x60 [<0000000053e5659b>] bind_store+0xb7/0xc3 [<00000000d3bdaadc>] drv_attr_store+0x27/0x31 [<00000000949069c5>] sysfs_kf_write+0x4a/0x57 [<000000004a8b5adf>] kernfs_fop_write+0x150/0x1e5 [<00000000bded60f0>] __vfs_write+0x1b/0x34 [<00000000b92900f0>] vfs_write+0xd8/0x1d1 thanks, -jane On 11/24/2020 5:00 PM, Zhiqiang Liu wrote: > Changes: V1->V2 > - add one empty line in 1/8 patch as suggested by Jeff Moyer <jmoyer@redhat.com>. > > > Recently, we use Coverity to analysis the ndctl package. > Several issues should be resolved to make Coverity happy. > > Zhiqiang Liu (8): > namespace: check whether pfn|dax|btt is NULL in setup_namespace > lib/libndctl: fix memory leakage problem in add_bus > libdaxctl: fix memory leakage in add_dax_region() > dimm: fix potential fd leakage in dimm_action() > util/help: check whether strdup returns NULL in exec_man_konqueror > lib/inject: check whether cmd is created successfully > libndctl: check whether ndctl_btt_get_namespace returns NULL in > callers > namespace: check whether seed is NULL in validate_namespace_options > > daxctl/lib/libdaxctl.c | 3 +++ > ndctl/dimm.c | 12 +++++++----- > ndctl/lib/inject.c | 8 ++++++++ > ndctl/lib/libndctl.c | 1 + > ndctl/namespace.c | 23 ++++++++++++++++++----- > test/libndctl.c | 16 +++++++++++----- > test/parent-uuid.c | 2 +- > util/help.c | 8 +++++++- > util/json.c | 3 +++ > 9 files changed, 59 insertions(+), 17 deletions(-) >
On Wed, 2020-11-25 at 09:00 +0800, Zhiqiang Liu wrote: > Changes: V1->V2 > - add one empty line in 1/8 patch as suggested by Jeff Moyer <jmoyer@redhat.com>. > > > Recently, we use Coverity to analysis the ndctl package. > Several issues should be resolved to make Coverity happy. > > Zhiqiang Liu (8): > namespace: check whether pfn|dax|btt is NULL in setup_namespace > lib/libndctl: fix memory leakage problem in add_bus > libdaxctl: fix memory leakage in add_dax_region() > dimm: fix potential fd leakage in dimm_action() > util/help: check whether strdup returns NULL in exec_man_konqueror > lib/inject: check whether cmd is created successfully > libndctl: check whether ndctl_btt_get_namespace returns NULL in > callers > namespace: check whether seed is NULL in validate_namespace_options > > daxctl/lib/libdaxctl.c | 3 +++ > ndctl/dimm.c | 12 +++++++----- > ndctl/lib/inject.c | 8 ++++++++ > ndctl/lib/libndctl.c | 1 + > ndctl/namespace.c | 23 ++++++++++++++++++----- > test/libndctl.c | 16 +++++++++++----- > test/parent-uuid.c | 2 +- > util/help.c | 8 +++++++- > util/json.c | 3 +++ > 9 files changed, 59 insertions(+), 17 deletions(-) > Hi Zhiquiang, The patches look good, and I've applied them for v71. However one thing to note: If you're sending a v2, it is preferable to respin the whole series, even if you're only changing a subset of (even a single) patch in the series. That allows tools like 'b4' to just Do The Right Thing, and make sure all the latest patches are grabbed. In this case, especially, your cover letter promises 8 patches (0/8), but there is only one that follows. This confuses 'b4': ERROR: missing [2/8]! ERROR: missing [3/8]! ERROR: missing [4/8]! ...etc I've fixed it up manually for this, but just some things to consider for the future. Thanks, -Vishal
On 2020/12/17 11:41, Verma, Vishal L wrote: > On Wed, 2020-11-25 at 09:00 +0800, Zhiqiang Liu wrote: >> Changes: V1->V2 >> - add one empty line in 1/8 patch as suggested by Jeff Moyer <jmoyer@redhat.com>. >> >> >> Recently, we use Coverity to analysis the ndctl package. >> Several issues should be resolved to make Coverity happy. >> >> Zhiqiang Liu (8): >> namespace: check whether pfn|dax|btt is NULL in setup_namespace >> lib/libndctl: fix memory leakage problem in add_bus >> libdaxctl: fix memory leakage in add_dax_region() >> dimm: fix potential fd leakage in dimm_action() >> util/help: check whether strdup returns NULL in exec_man_konqueror >> lib/inject: check whether cmd is created successfully >> libndctl: check whether ndctl_btt_get_namespace returns NULL in >> callers >> namespace: check whether seed is NULL in validate_namespace_options >> >> daxctl/lib/libdaxctl.c | 3 +++ >> ndctl/dimm.c | 12 +++++++----- >> ndctl/lib/inject.c | 8 ++++++++ >> ndctl/lib/libndctl.c | 1 + >> ndctl/namespace.c | 23 ++++++++++++++++++----- >> test/libndctl.c | 16 +++++++++++----- >> test/parent-uuid.c | 2 +- >> util/help.c | 8 +++++++- >> util/json.c | 3 +++ >> 9 files changed, 59 insertions(+), 17 deletions(-) >> > Hi Zhiquiang, > > The patches look good, and I've applied them for v71. However one thing > to note: > > If you're sending a v2, it is preferable to respin the whole series, > even if you're only changing a subset of (even a single) patch in the > series. That allows tools like 'b4' to just Do The Right Thing, and make > sure all the latest patches are grabbed. > > In this case, especially, your cover letter promises 8 patches (0/8), > but there is only one that follows. This confuses 'b4': > > ERROR: missing [2/8]! > ERROR: missing [3/8]! > ERROR: missing [4/8]! > ...etc > > I've fixed it up manually for this, but just some things to consider for > the future. > Thanks for the suggestion。 Regards Zhiqiang Liu > Thanks, > -Vishal >
On Tue, Dec 8, 2020 at 4:20 PM Jane Chu <jane.chu@oracle.com> wrote: > > Hi, > > I actually just ran into the NULL deref issue that is fixed here. > > Bu I have a question for the experts: > what might cause libndctl to run into the NULL deref like below ? > > Program terminated with signal 11, Segmentation fault. > #0 ndctl_pfn_get_bus (pfn=pfn@entry=0x0) at libndctl.c:5540 > 5540 return pfn->region->bus; > > (gdb) print pfn > $1 = (struct ndctl_pfn *) 0x0 > (gdb) frame 4 > #4 0x000000000040ca70 in setup_namespace (region=region@entry=0x109d910, > ndns=ndns@entry=0x10a7d40, p=p@entry=0x7ffd8ff73b90) at namespace.c:570 > 570 try(ndctl_dax, set_uuid, dax, uuid); > (gdb) info locals > __rc = <optimized out> > dax = 0x0 > > What I did was to let 2 threads run "create-namespace all" in a tight > loop, and 2 other threads run "destroy-namespace all" in a tight loop, This will definitely cause libndctl to get confused the expectation is that only one ndctl instance is operating on one region at a time. What likely happened is TOCTOU in ndctl_region_get_pfn_seed() where the seed device name is destroyed by the time it tries to convert it to an ndctl object. The reason libndctl does not perform its own locking is to keep the library stateless and allow locking to imposed from a higher level. Two ndctl instances must not be allowed to operate in the same region otherwise the 2 libndctl instances will get out of sync. This is no different than 2 fdisk processes running at the same time, they are going to invalidate each other's view of the cached partition state. The fix is not for fdisk to implement locking internally instead it requires the admin to arrange for only one fdisk to run against one disk at a time. > while chasing an year old issue that randomly resurfaces - > "nd_region region1: allocation underrun: 0x0 of 0x40000000 bytes" It would be interesting to get more data on the sequence of allocations that lead up to this event, and a dump of the resource tree when this happens: for (i = 0; i < nd_region->ndr_mappings; i++) ndd = to_ndd(&nd_region->mapping[i]); for_each_dpa_resource(...) nd_dbg_dpa(...) > > In addition, there are kmemleaks, > # cat /sys/kernel/debug/kmemleak > [..] > unreferenced object 0xffff976bd46f6240 (size 64): > comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s) > hex dump (first 32 bytes): > 00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00 .......... .7... > ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00 ....8........... > backtrace: > [<00000000064003cf>] __kmalloc_track_caller+0x136/0x379 > [<00000000d85e3c52>] krealloc+0x67/0x92 > [<00000000d7d3ba8a>] __alloc_dev_dax_range+0x73/0x25c > [<0000000027d58626>] devm_create_dev_dax+0x27d/0x416 > [<00000000434abd43>] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core] > [<0000000083726c1c>] dax_pmem_probe+0x10/0x1f [dax_pmem] > [<00000000b5f2319c>] nvdimm_bus_probe+0x9d/0x340 [libnvdimm] > [<00000000c055e544>] really_probe+0x230/0x48d > [<000000006cabd38e>] driver_probe_device+0x122/0x13b > [<0000000029c7b95a>] device_driver_attach+0x5b/0x60 > [<0000000053e5659b>] bind_store+0xb7/0xc3 > [<00000000d3bdaadc>] drv_attr_store+0x27/0x31 > [<00000000949069c5>] sysfs_kf_write+0x4a/0x57 > [<000000004a8b5adf>] kernfs_fop_write+0x150/0x1e5 > [<00000000bded60f0>] __vfs_write+0x1b/0x34 > [<00000000b92900f0>] vfs_write+0xd8/0x1d1 Hmm... maybe this? diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 27513d311242..506549235e03 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -1393,6 +1393,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) err_pgmap: free_dev_dax_ranges(dev_dax); err_range: + kfree(dev_dax->ranges); free_dev_dax_id(dev_dax); err_id: kfree(dev_dax);