Message ID | 20240522114346.42951-1-yangxingui@huawei.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
Series | driver core: Add log when devtmpfs create node failed | expand |
On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote: > Currently, no exception information is output when devtmpfs create node > failed, so add log info for it. Why? Who is going to do something with this? > > Signed-off-by: Xingui Yang <yangxingui@huawei.com> > --- > drivers/base/core.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index 5f4e03336e68..32a41e0472b2 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -3691,7 +3691,10 @@ int device_add(struct device *dev) > if (error) > goto SysEntryError; > > - devtmpfs_create_node(dev); > + error = devtmpfs_create_node(dev); > + if (error) > + pr_info("devtmpfs create node for %s failed: %d\n", > + dev_name(dev), error); Why is an error message pr_info()? And again, why is this needed? If this needs to be checked, why are you now checking it but ignoring the error? What would this help with? thanks, greg k-h
Hi, Greg On 2024/5/22 20:23, Greg KH wrote: > On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote: >> Currently, no exception information is output when devtmpfs create node >> failed, so add log info for it. > > Why? Who is going to do something with this? We execute the lsscsi command after the disk is connected, we occasionally find that some disks do not have dev nodes and these disks cannot be used. However, there is no abnormal log output during disk scanning. We analyze that it may be caused by the failure of devtmpfs create dev node, so the log is added here. The lscsi command query results and kernel logs as follows: [root@localhost]# lsscsi [9:0:4:0] disk ATA ST10000NM0086-2A SN05 - kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 link_rate=10(sata) kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1 (5000000000000900) kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731 kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0 phy_state=0xe kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0 down kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 link_rate=10(sata) kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max UDMA/133 kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ (depth 32), AA kernel: [586670.027471] ata19.00: configured for UDMA/133 kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1 kernel: [586670.037541] sas: ata19: end_device-9:0: model:ST10000NM0086-2AA101 serial: ZA2B3PR2 kernel: [586670.100856] scsi 9:0:4:0: Direct-Access ATA ST10000NM0086-2A SN05 PQ: 0 ANSI: 5 kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB) kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0 kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0 kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0 already(1)! kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk > >> >> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >> --- >> drivers/base/core.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/base/core.c b/drivers/base/core.c >> index 5f4e03336e68..32a41e0472b2 100644 >> --- a/drivers/base/core.c >> +++ b/drivers/base/core.c >> @@ -3691,7 +3691,10 @@ int device_add(struct device *dev) >> if (error) >> goto SysEntryError; >> >> - devtmpfs_create_node(dev); >> + error = devtmpfs_create_node(dev); >> + if (error) >> + pr_info("devtmpfs create node for %s failed: %d\n", >> + dev_name(dev), error); > > Why is an error message pr_info()? Do you recommend using pr_err()? > > And again, why is this needed? If this needs to be checked, why are you > now checking it but ignoring the error? > > What would this help with? As above, we want to get the error info when the dev node fails to be created. We currently haven't figured out how to handle this exception well. But judging from the problems we are currently encountering, some may be because the corresponding dev node already exists, causing the creation to fail, but the node information is incorrect and the device cannot be used. as follows: [root@localhost]# ll /dev/sdk -rw-------. 1 root root 5368709120 Jul 8 09:51 /dev/sdk Do you have any suggestions? Thanks, Xingui
On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote: > Hi, Greg > > On 2024/5/22 20:23, Greg KH wrote: > > On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote: > > > Currently, no exception information is output when devtmpfs create node > > > failed, so add log info for it. > > > > Why? Who is going to do something with this? > We execute the lsscsi command after the disk is connected, we occasionally > find that some disks do not have dev nodes and these disks cannot be used. Ok, but why do you think that devtmpfs create failed? > However, there is no abnormal log output during disk scanning. We analyze > that it may be caused by the failure of devtmpfs create dev node, so the log > is added here. But is that the case? Why is devtmpfs failing? Shouldn't we fix that instead? > The lscsi command query results and kernel logs as follows: > > [root@localhost]# lsscsi > [9:0:4:0] disk ATA ST10000NM0086-2A SN05 - > > kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 > link_rate=10(sata) > kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1 > (5000000000000900) > kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731 > kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found > kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 > kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0 > phy_state=0xe > kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0 > down > kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 > link_rate=10(sata) > kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max > UDMA/133 > kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ > (depth 32), AA > kernel: [586670.027471] ata19.00: configured for UDMA/133 > kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: > 0 tries: 1 > kernel: [586670.037541] sas: ata19: end_device-9:0: > model:ST10000NM0086-2AA101 serial: ZA2B3PR2 > kernel: [586670.100856] scsi 9:0:4:0: Direct-Access ATA ST10000NM0086-2A > SN05 PQ: 0 ANSI: 5 > kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical > blocks: (10.0 TB/9.10 TiB) > kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks > kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off > kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache: > enabled, doesn't support DPO or FUA > kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0 > kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0 > kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0 > already(1)! > kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk Looks like sdk was found properly, what's the problem? > > > > > > > > > Signed-off-by: Xingui Yang <yangxingui@huawei.com> > > > --- > > > drivers/base/core.c | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > > index 5f4e03336e68..32a41e0472b2 100644 > > > --- a/drivers/base/core.c > > > +++ b/drivers/base/core.c > > > @@ -3691,7 +3691,10 @@ int device_add(struct device *dev) > > > if (error) > > > goto SysEntryError; > > > - devtmpfs_create_node(dev); > > > + error = devtmpfs_create_node(dev); > > > + if (error) > > > + pr_info("devtmpfs create node for %s failed: %d\n", > > > + dev_name(dev), error); > > > > Why is an error message pr_info()? > Do you recommend using pr_err()? Do not print errors at the information level :) > > And again, why is this needed? If this needs to be checked, why are you > > now checking it but ignoring the error? > > > > What would this help with? > As above, we want to get the error info when the dev node fails to be > created. We currently haven't figured out how to handle this exception well. > But judging from the problems we are currently encountering, some may be > because the corresponding dev node already exists, causing the creation to > fail, but the node information is incorrect and the device cannot be used. > as follows: > [root@localhost]# ll /dev/sdk > -rw-------. 1 root root 5368709120 Jul 8 09:51 /dev/sdk Looks like the device node is created to me. What is incorrect about it, the values? What is 'll' an alias for? And are you sure that other tools aren't getting the device node creation uevent and doing something with it in userspace? How do you know this is the kernel failing? Wait, is /dev/sdk really a device node and not a file? Perhaps something else wrote to it first, before it was created? And that's why devtmpfs couldn't create it. That sounds like a userspace error, nothing the kernel can do about it. thanks, greg k-h
Hi Greg, On 2024/5/23 15:25, Greg KH wrote: > On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote: >> Hi, Greg >> >> On 2024/5/22 20:23, Greg KH wrote: >>> On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote: >>>> Currently, no exception information is output when devtmpfs create node >>>> failed, so add log info for it. >>> >>> Why? Who is going to do something with this? >> We execute the lsscsi command after the disk is connected, we occasionally >> find that some disks do not have dev nodes and these disks cannot be used. > > Ok, but why do you think that devtmpfs create failed? I found that lsscsi will traverse the dev node and obtain device major and min. If no matching dev node is found, it will display "- ". > >> However, there is no abnormal log output during disk scanning. We analyze >> that it may be caused by the failure of devtmpfs create dev node, so the log >> is added here. > > But is that the case? Why is devtmpfs failing? Shouldn't we fix that > instead? My subsequent reply touches on these points. > >> The lscsi command query results and kernel logs as follows: >> >> [root@localhost]# lsscsi >> [9:0:4:0] disk ATA ST10000NM0086-2A SN05 - >> >> kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 >> link_rate=10(sata) >> kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1 >> (5000000000000900) >> kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731 >> kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found >> kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 >> kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0 >> phy_state=0xe >> kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0 >> down >> kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 >> link_rate=10(sata) >> kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max >> UDMA/133 >> kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ >> (depth 32), AA >> kernel: [586670.027471] ata19.00: configured for UDMA/133 >> kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: >> 0 tries: 1 >> kernel: [586670.037541] sas: ata19: end_device-9:0: >> model:ST10000NM0086-2AA101 serial: ZA2B3PR2 >> kernel: [586670.100856] scsi 9:0:4:0: Direct-Access ATA ST10000NM0086-2A >> SN05 PQ: 0 ANSI: 5 >> kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical >> blocks: (10.0 TB/9.10 TiB) >> kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks >> kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off >> kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache: >> enabled, doesn't support DPO or FUA >> kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0 >> kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0 >> kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0 >> already(1)! >> kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk > > Looks like sdk was found properly, what's the problem? Yes, this problem occurs occasionally. There is no exception log when scanning the disk, but the disk cannot be used. It has been confirmed that it is related to fio testing. When the dev node does not exist, fio may actively create this file. > >> >>> >>>> >>>> Signed-off-by: Xingui Yang <yangxingui@huawei.com> >>>> --- >>>> drivers/base/core.c | 5 ++++- >>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/base/core.c b/drivers/base/core.c >>>> index 5f4e03336e68..32a41e0472b2 100644 >>>> --- a/drivers/base/core.c >>>> +++ b/drivers/base/core.c >>>> @@ -3691,7 +3691,10 @@ int device_add(struct device *dev) >>>> if (error) >>>> goto SysEntryError; >>>> - devtmpfs_create_node(dev); >>>> + error = devtmpfs_create_node(dev); >>>> + if (error) >>>> + pr_info("devtmpfs create node for %s failed: %d\n", >>>> + dev_name(dev), error); >>> >>> Why is an error message pr_info()? >> Do you recommend using pr_err()? > > Do not print errors at the information level :) Ok. > >>> And again, why is this needed? If this needs to be checked, why are you >>> now checking it but ignoring the error? >>> >>> What would this help with? >> As above, we want to get the error info when the dev node fails to be >> created. We currently haven't figured out how to handle this exception well. >> But judging from the problems we are currently encountering, some may be >> because the corresponding dev node already exists, causing the creation to >> fail, but the node information is incorrect and the device cannot be used. >> as follows: >> [root@localhost]# ll /dev/sdk >> -rw-------. 1 root root 5368709120 Jul 8 09:51 /dev/sdk > > Looks like the device node is created to me. What is incorrect about > it, the values? What is 'll' an alias for? And are you sure that other > tools aren't getting the device node creation uevent and doing something > with it in userspace? How do you know this is the kernel failing? > > Wait, is /dev/sdk really a device node and not a file? Perhaps > something else wrote to it first, before it was created? And that's why > devtmpfs couldn't create it. That sounds like a userspace error, > nothing the kernel can do about it. Yes, /dev/sdk is a file that may create by fio. and I tried to do the same and could reproduce the problem, and after adding the log, it shows that the file already exists. # ll /dev/sdc -rw-------. 1 root root 5 Jul 22 00:04 /dev/sdc [ 366.306045] sas: ata8: end_device-4:1: model:ST2000NM0055-1V4104 serial: ZC21FDJ3 [ 366.306991] scsi 4:0:8:0: Direct-Access ATA ST2000NM0055-1V4 TN05 PQ: 0 ANSI: 5 [ 366.307204] sd 4:0:8:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB) [ 366.307212] sd 4:0:8:0: [sdc] Write Protect is off [ 366.307213] sd 4:0:8:0: [sdc] Mode Sense: 00 3a 00 00 [ 366.307222] sd 4:0:8:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 366.307308] sd 4:0:8:0: Attached scsi generic sg2 type 0 [ 366.307354] sas: DONE DISCOVERY on port 1, pid:8, result:0 [ 366.307364] sas: sas_form_port: phy1 belongs to port1 already(1)! [ 366.307384] devtmpfs create node for sdc failed: -17 [ 366.307880] sd 4:0:8:0: [sdc] Attached SCSI disk # lsscsi [4:0:8:0] disk ATA ST2000NM0055-1V4 TN05 - If we want to solve this problem, should we delete the existing files first when creating a dev node? Or just print a prompt indicating that the dev node creation failed. Thanks, Xingui .
On Thu, May 23, 2024 at 05:23:07PM +0800, yangxingui wrote: > Hi Greg, > > On 2024/5/23 15:25, Greg KH wrote: > > On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote: > > > Hi, Greg > > > > > > On 2024/5/22 20:23, Greg KH wrote: > > > > On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote: > > > > > Currently, no exception information is output when devtmpfs create node > > > > > failed, so add log info for it. > > > > > > > > Why? Who is going to do something with this? > > > We execute the lsscsi command after the disk is connected, we occasionally > > > find that some disks do not have dev nodes and these disks cannot be used. > > > > Ok, but why do you think that devtmpfs create failed? > I found that lsscsi will traverse the dev node and obtain device major and > min. If no matching dev node is found, it will display "- ". > > > > > However, there is no abnormal log output during disk scanning. We analyze > > > that it may be caused by the failure of devtmpfs create dev node, so the log > > > is added here. > > > > But is that the case? Why is devtmpfs failing? Shouldn't we fix that > > instead? > My subsequent reply touches on these points. > > > > > The lscsi command query results and kernel logs as follows: > > > > > > [root@localhost]# lsscsi > > > [9:0:4:0] disk ATA ST10000NM0086-2A SN05 - > > > > > > kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 > > > link_rate=10(sata) > > > kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1 > > > (5000000000000900) > > > kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731 > > > kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found > > > kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 > > > kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0 > > > phy_state=0xe > > > kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0 > > > down > > > kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 > > > link_rate=10(sata) > > > kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max > > > UDMA/133 > > > kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ > > > (depth 32), AA > > > kernel: [586670.027471] ata19.00: configured for UDMA/133 > > > kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: > > > 0 tries: 1 > > > kernel: [586670.037541] sas: ata19: end_device-9:0: > > > model:ST10000NM0086-2AA101 serial: ZA2B3PR2 > > > kernel: [586670.100856] scsi 9:0:4:0: Direct-Access ATA ST10000NM0086-2A > > > SN05 PQ: 0 ANSI: 5 > > > kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical > > > blocks: (10.0 TB/9.10 TiB) > > > kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks > > > kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off > > > kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache: > > > enabled, doesn't support DPO or FUA > > > kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0 > > > kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0 > > > kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0 > > > already(1)! > > > kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk > > > > Looks like sdk was found properly, what's the problem? > > Yes, this problem occurs occasionally. There is no exception log when > scanning the disk, but the disk cannot be used. It has been confirmed that > it is related to fio testing. When the dev node does not exist, fio may > actively create this file. So that's a userspace issue. If a device node is to be created, and the file is already present with that name, yes, we will fail to create it as obviously userspace did not want us to do so. It's not the kernel's job to protect userspace from doing foolish things itself, right? :) > If we want to solve this problem, should we delete the existing files first > when creating a dev node? No. > Or just print a prompt indicating that the dev node creation failed. We can do that, but will that cause error messages to be printed out for normal situations today where userspace does this on purpose? Again, this isn't fixing the root problem here (which is userspace doing something it shouldn't be doing), adding kernel log messages might be just noise at this point in time given that it has been operating this way for many years, if not decades. thanks, greg k-h
On 2024/5/23 17:35, Greg KH wrote: > On Thu, May 23, 2024 at 05:23:07PM +0800, yangxingui wrote: >> Hi Greg, >> >> On 2024/5/23 15:25, Greg KH wrote: >>> On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote: >>>> Hi, Greg >>>> >>>> On 2024/5/22 20:23, Greg KH wrote: >>>>> On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote: >>>>>> Currently, no exception information is output when devtmpfs create node >>>>>> failed, so add log info for it. >>>>> >>>>> Why? Who is going to do something with this? >>>> We execute the lsscsi command after the disk is connected, we occasionally >>>> find that some disks do not have dev nodes and these disks cannot be used. >>> >>> Ok, but why do you think that devtmpfs create failed? >> I found that lsscsi will traverse the dev node and obtain device major and >> min. If no matching dev node is found, it will display "- ". >>> >>>> However, there is no abnormal log output during disk scanning. We analyze >>>> that it may be caused by the failure of devtmpfs create dev node, so the log >>>> is added here. >>> >>> But is that the case? Why is devtmpfs failing? Shouldn't we fix that >>> instead? >> My subsequent reply touches on these points. >>> >>>> The lscsi command query results and kernel logs as follows: >>>> >>>> [root@localhost]# lsscsi >>>> [9:0:4:0] disk ATA ST10000NM0086-2A SN05 - >>>> >>>> kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 >>>> link_rate=10(sata) >>>> kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1 >>>> (5000000000000900) >>>> kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731 >>>> kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found >>>> kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 >>>> kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0 >>>> phy_state=0xe >>>> kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0 >>>> down >>>> kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 >>>> link_rate=10(sata) >>>> kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max >>>> UDMA/133 >>>> kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ >>>> (depth 32), AA >>>> kernel: [586670.027471] ata19.00: configured for UDMA/133 >>>> kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: >>>> 0 tries: 1 >>>> kernel: [586670.037541] sas: ata19: end_device-9:0: >>>> model:ST10000NM0086-2AA101 serial: ZA2B3PR2 >>>> kernel: [586670.100856] scsi 9:0:4:0: Direct-Access ATA ST10000NM0086-2A >>>> SN05 PQ: 0 ANSI: 5 >>>> kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical >>>> blocks: (10.0 TB/9.10 TiB) >>>> kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks >>>> kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off >>>> kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache: >>>> enabled, doesn't support DPO or FUA >>>> kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0 >>>> kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0 >>>> kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0 >>>> already(1)! >>>> kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk >>> >>> Looks like sdk was found properly, what's the problem? >> >> Yes, this problem occurs occasionally. There is no exception log when >> scanning the disk, but the disk cannot be used. It has been confirmed that >> it is related to fio testing. When the dev node does not exist, fio may >> actively create this file. > > So that's a userspace issue. If a device node is to be created, and the > file is already present with that name, yes, we will fail to create it > as obviously userspace did not want us to do so. > > It's not the kernel's job to protect userspace from doing foolish things > itself, right? :) Yes. > >> If we want to solve this problem, should we delete the existing files first >> when creating a dev node? > > No. Ok. > >> Or just print a prompt indicating that the dev node creation failed. > > We can do that, but will that cause error messages to be printed out for > normal situations today where userspace does this on purpose? > > Again, this isn't fixing the root problem here (which is userspace doing > something it shouldn't be doing), adding kernel log messages might be > just noise at this point in time given that it has been operating this > way for many years, if not decades. Yes, there is currently no fix for the problem, and it doesn't usually happen. Once it occurs, the device will be unavailable and difficult to locate. In addition, there are many possibilities for the failure of devtmpfs to create a dev node, including currently recognized scenarios and memory allocation failures, etc. Thanks, Xingui .
Hello, kernel test robot noticed "sysfs: cannot create duplicate filename '/devices/virtual/misc/btrfs-control'" on: commit: 691a646007c85103cbd339985b08bd336230f611 ("[PATCH] driver core: Add log when devtmpfs create node failed") url: https://github.com/intel-lab-lkp/linux/commits/Xingui-Yang/driver-core-Add-log-when-devtmpfs-create-node-failed/20240522-194825 base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git 880a746fa3ea5916a012fa320fdfbcd3f331bea3 patch link: https://lore.kernel.org/all/20240522114346.42951-1-yangxingui@huawei.com/ patch subject: [PATCH] driver core: Add log when devtmpfs create node failed in testcase: xfstests version: xfstests-x86_64-98379713-1_20240527 with following parameters: disk: 4HDD fs: btrfs test: generic-group-16 compiler: gcc-13 test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Skylake) with 28G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202405291031.54c11515-oliver.sang@intel.com The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240529/202405291031.54c11515-oliver.sang@intel.com [ 40.509648][ T1244] sysfs: cannot create duplicate filename '/devices/virtual/misc/btrfs-control' [ 40.518489][ T1244] CPU: 0 PID: 1244 Comm: modprobe Not tainted 6.9.0-rc5-00014-g691a646007c8 #1 [ 40.527223][ T1244] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016 [ 40.535263][ T1244] Call Trace: [ 40.538384][ T1244] <TASK> [ 40.541162][ T1244] dump_stack_lvl+0x53/0x70 [ 40.545496][ T1244] sysfs_warn_dup+0x75/0x90 [ 40.549830][ T1244] sysfs_create_dir_ns+0x208/0x280 [ 40.554766][ T1244] ? __pfx_sysfs_create_dir_ns+0x10/0x10 [ 40.560221][ T1244] ? _raw_spin_lock+0x85/0xe0 [ 40.564727][ T1244] kobject_add_internal+0x272/0x890 [ 40.569752][ T1244] kobject_add+0x13b/0x200 [ 40.573999][ T1244] ? __pfx_kobject_add+0x10/0x10 [ 40.578762][ T1244] ? __pfx_mutex_unlock+0x10/0x10 [ 40.583611][ T1244] ? kobject_get+0x50/0xe0 [ 40.587856][ T1244] ? kobject_put+0x50/0xd0 [ 40.592101][ T1244] device_add+0x218/0x1220 [ 40.596348][ T1244] ? __pfx_device_add+0x10/0x10 [ 40.601028][ T1244] device_create_groups_vargs+0x1be/0x230 [ 40.606570][ T1244] device_create_with_groups+0xab/0xe0 [ 40.611854][ T1244] ? __pfx_device_create_with_groups+0x10/0x10 [ 40.617828][ T1244] ? mutex_lock+0xa2/0xf0 [ 40.621987][ T1244] ? __pfx_mutex_lock+0x10/0x10 [ 40.626664][ T1244] ? __kmem_cache_create+0x37/0x60 [ 40.631602][ T1244] misc_register+0x1d3/0x680 [ 40.636022][ T1244] init_btrfs_fs+0xa6/0x260 [btrfs] [ 40.641139][ T1244] ? __pfx_init_btrfs_fs+0x10/0x10 [btrfs] [ 40.646852][ T1244] do_one_initcall+0xa4/0x3a0 [ 40.651361][ T1244] ? __pfx_do_one_initcall+0x10/0x10 [ 40.656483][ T1244] ? kasan_unpoison+0x44/0x70 [ 40.660997][ T1244] do_init_module+0x238/0x740 [ 40.665513][ T1244] init_module_from_file+0xd1/0x130 [ 40.670543][ T1244] ? __pfx_init_module_from_file+0x10/0x10 [ 40.676180][ T1244] ? __pfx_userfaultfd_unmap_complete+0x10/0x10 [ 40.682252][ T1244] ? __pfx__raw_spin_lock+0x10/0x10 [ 40.687282][ T1244] idempotent_init_module+0x23b/0x650 [ 40.692478][ T1244] ? __pfx_idempotent_init_module+0x10/0x10 [ 40.698193][ T1244] ? __fget_light+0x57/0x410 [ 40.702610][ T1244] ? security_capable+0x71/0xb0 [ 40.707290][ T1244] __x64_sys_finit_module+0xbe/0x130 [ 40.712399][ T1244] do_syscall_64+0x60/0x170 [ 40.716731][ T1244] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 40.722447][ T1244] RIP: 0033:0x7f3baa34e719 [ 40.726694][ T1244] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48 [ 40.746041][ T1244] RSP: 002b:00007ffd80ff8248 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 40.754257][ T1244] RAX: ffffffffffffffda RBX: 00005612a1b66d60 RCX: 00007f3baa34e719 [ 40.762039][ T1244] RDX: 0000000000000000 RSI: 00005612915e94a0 RDI: 0000000000000000 [ 40.769820][ T1244] RBP: 00005612915e94a0 R08: 0000000000000000 R09: 00005612a1b69c60 [ 40.777601][ T1244] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000040000 [ 40.785382][ T1244] R13: 0000000000000000 R14: 00005612a1b67030 R15: 0000000000000000 [ 40.793165][ T1244] </TASK> [ 40.796080][ T1244] kobject: kobject_add_internal failed for btrfs-control with -EEXIST, don't try to register things with the same name in the same directory. [ 41.142508][ T1243] request_module fs-btrfs succeeded, but still no fs?
diff --git a/drivers/base/core.c b/drivers/base/core.c index 5f4e03336e68..32a41e0472b2 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -3691,7 +3691,10 @@ int device_add(struct device *dev) if (error) goto SysEntryError; - devtmpfs_create_node(dev); + error = devtmpfs_create_node(dev); + if (error) + pr_info("devtmpfs create node for %s failed: %d\n", + dev_name(dev), error); } /* Notify clients of device addition. This call must come
Currently, no exception information is output when devtmpfs create node failed, so add log info for it. Signed-off-by: Xingui Yang <yangxingui@huawei.com> --- drivers/base/core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)