diff mbox series

driver core: Add log when devtmpfs create node failed

Message ID 20240522114346.42951-1-yangxingui@huawei.com (mailing list archive)
State Rejected
Headers show
Series driver core: Add log when devtmpfs create node failed | expand

Commit Message

yangxingui May 22, 2024, 11:43 a.m. UTC
Currently, no exception information is output when devtmpfs create node
failed, so add log info for it.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
---
 drivers/base/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Greg KH May 22, 2024, 12:23 p.m. UTC | #1
On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
> Currently, no exception information is output when devtmpfs create node
> failed, so add log info for it.

Why?  Who is going to do something with this?

> 
> Signed-off-by: Xingui Yang <yangxingui@huawei.com>
> ---
>  drivers/base/core.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 5f4e03336e68..32a41e0472b2 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3691,7 +3691,10 @@ int device_add(struct device *dev)
>  		if (error)
>  			goto SysEntryError;
>  
> -		devtmpfs_create_node(dev);
> +		error = devtmpfs_create_node(dev);
> +		if (error)
> +			pr_info("devtmpfs create node for %s failed: %d\n",
> +				dev_name(dev), error);

Why is an error message pr_info()?

And again, why is this needed?  If this needs to be checked, why are you
now checking it but ignoring the error?

What would this help with?

thanks,

greg k-h
yangxingui May 23, 2024, 1:50 a.m. UTC | #2
Hi, Greg

On 2024/5/22 20:23, Greg KH wrote:
> On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
>> Currently, no exception information is output when devtmpfs create node
>> failed, so add log info for it.
> 
> Why?  Who is going to do something with this?
We execute the lsscsi command after the disk is connected, we 
occasionally find that some disks do not have dev nodes and these disks 
cannot be used.
However, there is no abnormal log output during disk scanning. We 
analyze that it may be caused by the failure of devtmpfs create dev 
node, so the log is added here.
The lscsi command query results and kernel logs as follows:

[root@localhost]# lsscsi
[9:0:4:0]	disk	ATA	ST10000NM0086-2A SN05	-

kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 
link_rate=10(sata)
kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1 
(5000000000000900)
kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731
kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found
kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0 
phy_state=0xe
kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0 
down
kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 
link_rate=10(sata)
kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, 
max UDMA/133
kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 
NCQ (depth 32), AA
kernel: [586670.027471] ata19.00: configured for UDMA/133
kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 
failed: 0 tries: 1
kernel: [586670.037541] sas: ata19: end_device-9:0: 
model:ST10000NM0086-2AA101 serial:            ZA2B3PR2
kernel: [586670.100856] scsi 9:0:4:0: Direct-Access     ATA 
ST10000NM0086-2A SN05 PQ: 0 ANSI: 5
kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical 
blocks: (10.0 TB/9.10 TiB)
kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks
kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off
kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read 
cache: enabled, doesn't support DPO or FUA
kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0
kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0
kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0 
already(1)!
kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk

> 
>>
>> Signed-off-by: Xingui Yang <yangxingui@huawei.com>
>> ---
>>   drivers/base/core.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 5f4e03336e68..32a41e0472b2 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -3691,7 +3691,10 @@ int device_add(struct device *dev)
>>   		if (error)
>>   			goto SysEntryError;
>>   
>> -		devtmpfs_create_node(dev);
>> +		error = devtmpfs_create_node(dev);
>> +		if (error)
>> +			pr_info("devtmpfs create node for %s failed: %d\n",
>> +				dev_name(dev), error);
> 
> Why is an error message pr_info()?
Do you recommend using pr_err()?
> 
> And again, why is this needed?  If this needs to be checked, why are you
> now checking it but ignoring the error?
> 
> What would this help with?
As above, we want to get the error info when the dev node fails to be 
created. We currently haven't figured out how to handle this exception 
well. But judging from the problems we are currently encountering, some 
may be because the corresponding dev node already exists, causing the 
creation to fail, but the node information is incorrect and the device 
cannot be used. as follows:
[root@localhost]# ll /dev/sdk
-rw-------. 1 root root 5368709120 Jul 8 09:51 /dev/sdk

Do you have any suggestions?

Thanks,
Xingui
Greg KH May 23, 2024, 7:25 a.m. UTC | #3
On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote:
> Hi, Greg
> 
> On 2024/5/22 20:23, Greg KH wrote:
> > On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
> > > Currently, no exception information is output when devtmpfs create node
> > > failed, so add log info for it.
> > 
> > Why?  Who is going to do something with this?
> We execute the lsscsi command after the disk is connected, we occasionally
> find that some disks do not have dev nodes and these disks cannot be used.

Ok, but why do you think that devtmpfs create failed?

> However, there is no abnormal log output during disk scanning. We analyze
> that it may be caused by the failure of devtmpfs create dev node, so the log
> is added here.

But is that the case?  Why is devtmpfs failing?  Shouldn't we fix that
instead?

> The lscsi command query results and kernel logs as follows:
> 
> [root@localhost]# lsscsi
> [9:0:4:0]	disk	ATA	ST10000NM0086-2A SN05	-
> 
> kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
> link_rate=10(sata)
> kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1
> (5000000000000900)
> kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731
> kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found
> kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0
> phy_state=0xe
> kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0
> down
> kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
> link_rate=10(sata)
> kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max
> UDMA/133
> kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ
> (depth 32), AA
> kernel: [586670.027471] ata19.00: configured for UDMA/133
> kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed:
> 0 tries: 1
> kernel: [586670.037541] sas: ata19: end_device-9:0:
> model:ST10000NM0086-2AA101 serial:            ZA2B3PR2
> kernel: [586670.100856] scsi 9:0:4:0: Direct-Access     ATA ST10000NM0086-2A
> SN05 PQ: 0 ANSI: 5
> kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical
> blocks: (10.0 TB/9.10 TiB)
> kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks
> kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off
> kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0
> kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0
> kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0
> already(1)!
> kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk

Looks like sdk was found properly, what's the problem?

> 
> > 
> > > 
> > > Signed-off-by: Xingui Yang <yangxingui@huawei.com>
> > > ---
> > >   drivers/base/core.c | 5 ++++-
> > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index 5f4e03336e68..32a41e0472b2 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -3691,7 +3691,10 @@ int device_add(struct device *dev)
> > >   		if (error)
> > >   			goto SysEntryError;
> > > -		devtmpfs_create_node(dev);
> > > +		error = devtmpfs_create_node(dev);
> > > +		if (error)
> > > +			pr_info("devtmpfs create node for %s failed: %d\n",
> > > +				dev_name(dev), error);
> > 
> > Why is an error message pr_info()?
> Do you recommend using pr_err()?

Do not print errors at the information level :)

> > And again, why is this needed?  If this needs to be checked, why are you
> > now checking it but ignoring the error?
> > 
> > What would this help with?
> As above, we want to get the error info when the dev node fails to be
> created. We currently haven't figured out how to handle this exception well.
> But judging from the problems we are currently encountering, some may be
> because the corresponding dev node already exists, causing the creation to
> fail, but the node information is incorrect and the device cannot be used.
> as follows:
> [root@localhost]# ll /dev/sdk
> -rw-------. 1 root root 5368709120 Jul 8 09:51 /dev/sdk

Looks like the device node is created to me.  What is incorrect about
it, the values?  What is 'll' an alias for?  And are you sure that other
tools aren't getting the device node creation uevent and doing something
with it in userspace?  How do you know this is the kernel failing?

Wait, is /dev/sdk really a device node and not a file?  Perhaps
something else wrote to it first, before it was created?  And that's why
devtmpfs couldn't create it.  That sounds like a userspace error,
nothing the kernel can do about it.

thanks,

greg k-h
yangxingui May 23, 2024, 9:23 a.m. UTC | #4
Hi Greg,

On 2024/5/23 15:25, Greg KH wrote:
> On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote:
>> Hi, Greg
>>
>> On 2024/5/22 20:23, Greg KH wrote:
>>> On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
>>>> Currently, no exception information is output when devtmpfs create node
>>>> failed, so add log info for it.
>>>
>>> Why?  Who is going to do something with this?
>> We execute the lsscsi command after the disk is connected, we occasionally
>> find that some disks do not have dev nodes and these disks cannot be used.
> 
> Ok, but why do you think that devtmpfs create failed?
I found that lsscsi will traverse the dev node and obtain device major 
and min. If no matching dev node is found, it will display "-       ".
> 
>> However, there is no abnormal log output during disk scanning. We analyze
>> that it may be caused by the failure of devtmpfs create dev node, so the log
>> is added here.
> 
> But is that the case?  Why is devtmpfs failing?  Shouldn't we fix that
> instead?
My subsequent reply touches on these points.
> 
>> The lscsi command query results and kernel logs as follows:
>>
>> [root@localhost]# lsscsi
>> [9:0:4:0]	disk	ATA	ST10000NM0086-2A SN05	-
>>
>> kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
>> link_rate=10(sata)
>> kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1
>> (5000000000000900)
>> kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731
>> kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found
>> kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
>> kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0
>> phy_state=0xe
>> kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0
>> down
>> kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
>> link_rate=10(sata)
>> kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max
>> UDMA/133
>> kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ
>> (depth 32), AA
>> kernel: [586670.027471] ata19.00: configured for UDMA/133
>> kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed:
>> 0 tries: 1
>> kernel: [586670.037541] sas: ata19: end_device-9:0:
>> model:ST10000NM0086-2AA101 serial:            ZA2B3PR2
>> kernel: [586670.100856] scsi 9:0:4:0: Direct-Access     ATA ST10000NM0086-2A
>> SN05 PQ: 0 ANSI: 5
>> kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical
>> blocks: (10.0 TB/9.10 TiB)
>> kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks
>> kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off
>> kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache:
>> enabled, doesn't support DPO or FUA
>> kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0
>> kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0
>> kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0
>> already(1)!
>> kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk
> 
> Looks like sdk was found properly, what's the problem?

Yes, this problem occurs occasionally. There is no exception log when 
scanning the disk, but the disk cannot be used. It has been confirmed 
that it is related to fio testing. When the dev node does not exist, fio 
may actively create this file.

> 
>>
>>>
>>>>
>>>> Signed-off-by: Xingui Yang <yangxingui@huawei.com>
>>>> ---
>>>>    drivers/base/core.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>> index 5f4e03336e68..32a41e0472b2 100644
>>>> --- a/drivers/base/core.c
>>>> +++ b/drivers/base/core.c
>>>> @@ -3691,7 +3691,10 @@ int device_add(struct device *dev)
>>>>    		if (error)
>>>>    			goto SysEntryError;
>>>> -		devtmpfs_create_node(dev);
>>>> +		error = devtmpfs_create_node(dev);
>>>> +		if (error)
>>>> +			pr_info("devtmpfs create node for %s failed: %d\n",
>>>> +				dev_name(dev), error);
>>>
>>> Why is an error message pr_info()?
>> Do you recommend using pr_err()?
> 
> Do not print errors at the information level :)
Ok.
> 
>>> And again, why is this needed?  If this needs to be checked, why are you
>>> now checking it but ignoring the error?
>>>
>>> What would this help with?
>> As above, we want to get the error info when the dev node fails to be
>> created. We currently haven't figured out how to handle this exception well.
>> But judging from the problems we are currently encountering, some may be
>> because the corresponding dev node already exists, causing the creation to
>> fail, but the node information is incorrect and the device cannot be used.
>> as follows:
>> [root@localhost]# ll /dev/sdk
>> -rw-------. 1 root root 5368709120 Jul 8 09:51 /dev/sdk
> 
> Looks like the device node is created to me.  What is incorrect about
> it, the values?  What is 'll' an alias for?  And are you sure that other
> tools aren't getting the device node creation uevent and doing something
> with it in userspace?  How do you know this is the kernel failing?
> 
> Wait, is /dev/sdk really a device node and not a file?  Perhaps
> something else wrote to it first, before it was created?  And that's why
> devtmpfs couldn't create it.  That sounds like a userspace error,
> nothing the kernel can do about it.
Yes, /dev/sdk is a file that may create by fio. and I tried to do the 
same and could reproduce the problem, and after adding the log, it shows 
that the file already exists.

# ll /dev/sdc
-rw-------. 1 root root 5 Jul 22 00:04 /dev/sdc

[  366.306045] sas: ata8: end_device-4:1: model:ST2000NM0055-1V4104 
serial:            ZC21FDJ3
[  366.306991] scsi 4:0:8:0: Direct-Access     ATA      ST2000NM0055-1V4 
TN05 PQ: 0 ANSI: 5
[  366.307204] sd 4:0:8:0: [sdc] 3907029168 512-byte logical blocks: 
(2.00 TB/1.82 TiB)
[  366.307212] sd 4:0:8:0: [sdc] Write Protect is off
[  366.307213] sd 4:0:8:0: [sdc] Mode Sense: 00 3a 00 00
[  366.307222] sd 4:0:8:0: [sdc] Write cache: enabled, read cache: 
enabled, doesn't support DPO or FUA
[  366.307308] sd 4:0:8:0: Attached scsi generic sg2 type 0
[  366.307354] sas: DONE DISCOVERY on port 1, pid:8, result:0
[  366.307364] sas: sas_form_port: phy1 belongs to port1 already(1)!
[  366.307384] devtmpfs create node for sdc failed: -17
[  366.307880] sd 4:0:8:0: [sdc] Attached SCSI disk

# lsscsi
[4:0:8:0]    disk    ATA      ST2000NM0055-1V4 TN05  -


If we want to solve this problem, should we delete the existing files 
first when creating a dev node? Or just print a prompt indicating that 
the dev node creation failed.

Thanks,
Xingui
.
Greg KH May 23, 2024, 9:35 a.m. UTC | #5
On Thu, May 23, 2024 at 05:23:07PM +0800, yangxingui wrote:
> Hi Greg,
> 
> On 2024/5/23 15:25, Greg KH wrote:
> > On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote:
> > > Hi, Greg
> > > 
> > > On 2024/5/22 20:23, Greg KH wrote:
> > > > On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
> > > > > Currently, no exception information is output when devtmpfs create node
> > > > > failed, so add log info for it.
> > > > 
> > > > Why?  Who is going to do something with this?
> > > We execute the lsscsi command after the disk is connected, we occasionally
> > > find that some disks do not have dev nodes and these disks cannot be used.
> > 
> > Ok, but why do you think that devtmpfs create failed?
> I found that lsscsi will traverse the dev node and obtain device major and
> min. If no matching dev node is found, it will display "-       ".
> > 
> > > However, there is no abnormal log output during disk scanning. We analyze
> > > that it may be caused by the failure of devtmpfs create dev node, so the log
> > > is added here.
> > 
> > But is that the case?  Why is devtmpfs failing?  Shouldn't we fix that
> > instead?
> My subsequent reply touches on these points.
> > 
> > > The lscsi command query results and kernel logs as follows:
> > > 
> > > [root@localhost]# lsscsi
> > > [9:0:4:0]	disk	ATA	ST10000NM0086-2A SN05	-
> > > 
> > > kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
> > > link_rate=10(sata)
> > > kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1
> > > (5000000000000900)
> > > kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731
> > > kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found
> > > kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> > > kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0
> > > phy_state=0xe
> > > kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0
> > > down
> > > kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
> > > link_rate=10(sata)
> > > kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max
> > > UDMA/133
> > > kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ
> > > (depth 32), AA
> > > kernel: [586670.027471] ata19.00: configured for UDMA/133
> > > kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed:
> > > 0 tries: 1
> > > kernel: [586670.037541] sas: ata19: end_device-9:0:
> > > model:ST10000NM0086-2AA101 serial:            ZA2B3PR2
> > > kernel: [586670.100856] scsi 9:0:4:0: Direct-Access     ATA ST10000NM0086-2A
> > > SN05 PQ: 0 ANSI: 5
> > > kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical
> > > blocks: (10.0 TB/9.10 TiB)
> > > kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks
> > > kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off
> > > kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache:
> > > enabled, doesn't support DPO or FUA
> > > kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0
> > > kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0
> > > kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0
> > > already(1)!
> > > kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk
> > 
> > Looks like sdk was found properly, what's the problem?
> 
> Yes, this problem occurs occasionally. There is no exception log when
> scanning the disk, but the disk cannot be used. It has been confirmed that
> it is related to fio testing. When the dev node does not exist, fio may
> actively create this file.

So that's a userspace issue.  If a device node is to be created, and the
file is already present with that name, yes, we will fail to create it
as obviously userspace did not want us to do so.

It's not the kernel's job to protect userspace from doing foolish things
itself, right?  :)

> If we want to solve this problem, should we delete the existing files first
> when creating a dev node?

No.

> Or just print a prompt indicating that the dev node creation failed.

We can do that, but will that cause error messages to be printed out for
normal situations today where userspace does this on purpose?

Again, this isn't fixing the root problem here (which is userspace doing
something it shouldn't be doing), adding kernel log messages might be
just noise at this point in time given that it has been operating this
way for many years, if not decades.

thanks,

greg k-h
yangxingui May 23, 2024, 11:01 a.m. UTC | #6
On 2024/5/23 17:35, Greg KH wrote:
> On Thu, May 23, 2024 at 05:23:07PM +0800, yangxingui wrote:
>> Hi Greg,
>>
>> On 2024/5/23 15:25, Greg KH wrote:
>>> On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote:
>>>> Hi, Greg
>>>>
>>>> On 2024/5/22 20:23, Greg KH wrote:
>>>>> On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
>>>>>> Currently, no exception information is output when devtmpfs create node
>>>>>> failed, so add log info for it.
>>>>>
>>>>> Why?  Who is going to do something with this?
>>>> We execute the lsscsi command after the disk is connected, we occasionally
>>>> find that some disks do not have dev nodes and these disks cannot be used.
>>>
>>> Ok, but why do you think that devtmpfs create failed?
>> I found that lsscsi will traverse the dev node and obtain device major and
>> min. If no matching dev node is found, it will display "-       ".
>>>
>>>> However, there is no abnormal log output during disk scanning. We analyze
>>>> that it may be caused by the failure of devtmpfs create dev node, so the log
>>>> is added here.
>>>
>>> But is that the case?  Why is devtmpfs failing?  Shouldn't we fix that
>>> instead?
>> My subsequent reply touches on these points.
>>>
>>>> The lscsi command query results and kernel logs as follows:
>>>>
>>>> [root@localhost]# lsscsi
>>>> [9:0:4:0]	disk	ATA	ST10000NM0086-2A SN05	-
>>>>
>>>> kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
>>>> link_rate=10(sata)
>>>> kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1
>>>> (5000000000000900)
>>>> kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731
>>>> kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found
>>>> kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
>>>> kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0
>>>> phy_state=0xe
>>>> kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0
>>>> down
>>>> kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
>>>> link_rate=10(sata)
>>>> kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max
>>>> UDMA/133
>>>> kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ
>>>> (depth 32), AA
>>>> kernel: [586670.027471] ata19.00: configured for UDMA/133
>>>> kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed:
>>>> 0 tries: 1
>>>> kernel: [586670.037541] sas: ata19: end_device-9:0:
>>>> model:ST10000NM0086-2AA101 serial:            ZA2B3PR2
>>>> kernel: [586670.100856] scsi 9:0:4:0: Direct-Access     ATA ST10000NM0086-2A
>>>> SN05 PQ: 0 ANSI: 5
>>>> kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical
>>>> blocks: (10.0 TB/9.10 TiB)
>>>> kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks
>>>> kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off
>>>> kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache:
>>>> enabled, doesn't support DPO or FUA
>>>> kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0
>>>> kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0
>>>> kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0
>>>> already(1)!
>>>> kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk
>>>
>>> Looks like sdk was found properly, what's the problem?
>>
>> Yes, this problem occurs occasionally. There is no exception log when
>> scanning the disk, but the disk cannot be used. It has been confirmed that
>> it is related to fio testing. When the dev node does not exist, fio may
>> actively create this file.
> 
> So that's a userspace issue.  If a device node is to be created, and the
> file is already present with that name, yes, we will fail to create it
> as obviously userspace did not want us to do so.
> 
> It's not the kernel's job to protect userspace from doing foolish things
> itself, right?  :)
Yes.
> 
>> If we want to solve this problem, should we delete the existing files first
>> when creating a dev node?
> 
> No.
Ok.
> 
>> Or just print a prompt indicating that the dev node creation failed.
> 
> We can do that, but will that cause error messages to be printed out for
> normal situations today where userspace does this on purpose?
> 
> Again, this isn't fixing the root problem here (which is userspace doing
> something it shouldn't be doing), adding kernel log messages might be
> just noise at this point in time given that it has been operating this
> way for many years, if not decades.
Yes, there is currently no fix for the problem, and it doesn't usually 
happen. Once it occurs, the device will be unavailable and difficult to 
locate. In addition, there are many possibilities for the failure of 
devtmpfs to create a dev node, including currently recognized scenarios 
and memory allocation failures, etc.

Thanks,
Xingui
.
kernel test robot May 29, 2024, 2:49 a.m. UTC | #7
Hello,

kernel test robot noticed "sysfs: cannot create duplicate filename '/devices/virtual/misc/btrfs-control'" on:

commit: 691a646007c85103cbd339985b08bd336230f611 ("[PATCH] driver core: Add log when devtmpfs create node failed")
url: https://github.com/intel-lab-lkp/linux/commits/Xingui-Yang/driver-core-Add-log-when-devtmpfs-create-node-failed/20240522-194825
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git 880a746fa3ea5916a012fa320fdfbcd3f331bea3
patch link: https://lore.kernel.org/all/20240522114346.42951-1-yangxingui@huawei.com/
patch subject: [PATCH] driver core: Add log when devtmpfs create node failed

in testcase: xfstests
version: xfstests-x86_64-98379713-1_20240527
with following parameters:

	disk: 4HDD
	fs: btrfs
	test: generic-group-16



compiler: gcc-13
test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Skylake) with 28G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405291031.54c11515-oliver.sang@intel.com



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240529/202405291031.54c11515-oliver.sang@intel.com



[   40.509648][ T1244] sysfs: cannot create duplicate filename '/devices/virtual/misc/btrfs-control'
[   40.518489][ T1244] CPU: 0 PID: 1244 Comm: modprobe Not tainted 6.9.0-rc5-00014-g691a646007c8 #1
[   40.527223][ T1244] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016
[   40.535263][ T1244] Call Trace:
[   40.538384][ T1244]  <TASK>
[   40.541162][ T1244]  dump_stack_lvl+0x53/0x70
[   40.545496][ T1244]  sysfs_warn_dup+0x75/0x90
[   40.549830][ T1244]  sysfs_create_dir_ns+0x208/0x280
[   40.554766][ T1244]  ? __pfx_sysfs_create_dir_ns+0x10/0x10
[   40.560221][ T1244]  ? _raw_spin_lock+0x85/0xe0
[   40.564727][ T1244]  kobject_add_internal+0x272/0x890
[   40.569752][ T1244]  kobject_add+0x13b/0x200
[   40.573999][ T1244]  ? __pfx_kobject_add+0x10/0x10
[   40.578762][ T1244]  ? __pfx_mutex_unlock+0x10/0x10
[   40.583611][ T1244]  ? kobject_get+0x50/0xe0
[   40.587856][ T1244]  ? kobject_put+0x50/0xd0
[   40.592101][ T1244]  device_add+0x218/0x1220
[   40.596348][ T1244]  ? __pfx_device_add+0x10/0x10
[   40.601028][ T1244]  device_create_groups_vargs+0x1be/0x230
[   40.606570][ T1244]  device_create_with_groups+0xab/0xe0
[   40.611854][ T1244]  ? __pfx_device_create_with_groups+0x10/0x10
[   40.617828][ T1244]  ? mutex_lock+0xa2/0xf0
[   40.621987][ T1244]  ? __pfx_mutex_lock+0x10/0x10
[   40.626664][ T1244]  ? __kmem_cache_create+0x37/0x60
[   40.631602][ T1244]  misc_register+0x1d3/0x680
[   40.636022][ T1244]  init_btrfs_fs+0xa6/0x260 [btrfs]
[   40.641139][ T1244]  ? __pfx_init_btrfs_fs+0x10/0x10 [btrfs]
[   40.646852][ T1244]  do_one_initcall+0xa4/0x3a0
[   40.651361][ T1244]  ? __pfx_do_one_initcall+0x10/0x10
[   40.656483][ T1244]  ? kasan_unpoison+0x44/0x70
[   40.660997][ T1244]  do_init_module+0x238/0x740
[   40.665513][ T1244]  init_module_from_file+0xd1/0x130
[   40.670543][ T1244]  ? __pfx_init_module_from_file+0x10/0x10
[   40.676180][ T1244]  ? __pfx_userfaultfd_unmap_complete+0x10/0x10
[   40.682252][ T1244]  ? __pfx__raw_spin_lock+0x10/0x10
[   40.687282][ T1244]  idempotent_init_module+0x23b/0x650
[   40.692478][ T1244]  ? __pfx_idempotent_init_module+0x10/0x10
[   40.698193][ T1244]  ? __fget_light+0x57/0x410
[   40.702610][ T1244]  ? security_capable+0x71/0xb0
[   40.707290][ T1244]  __x64_sys_finit_module+0xbe/0x130
[   40.712399][ T1244]  do_syscall_64+0x60/0x170
[   40.716731][ T1244]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   40.722447][ T1244] RIP: 0033:0x7f3baa34e719
[   40.726694][ T1244] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
[   40.746041][ T1244] RSP: 002b:00007ffd80ff8248 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   40.754257][ T1244] RAX: ffffffffffffffda RBX: 00005612a1b66d60 RCX: 00007f3baa34e719
[   40.762039][ T1244] RDX: 0000000000000000 RSI: 00005612915e94a0 RDI: 0000000000000000
[   40.769820][ T1244] RBP: 00005612915e94a0 R08: 0000000000000000 R09: 00005612a1b69c60
[   40.777601][ T1244] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000040000
[   40.785382][ T1244] R13: 0000000000000000 R14: 00005612a1b67030 R15: 0000000000000000
[   40.793165][ T1244]  </TASK>
[   40.796080][ T1244] kobject: kobject_add_internal failed for btrfs-control with -EEXIST, don't try to register things with the same name in the same directory.
[   41.142508][ T1243] request_module fs-btrfs succeeded, but still no fs?
diff mbox series

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 5f4e03336e68..32a41e0472b2 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3691,7 +3691,10 @@  int device_add(struct device *dev)
 		if (error)
 			goto SysEntryError;
 
-		devtmpfs_create_node(dev);
+		error = devtmpfs_create_node(dev);
+		if (error)
+			pr_info("devtmpfs create node for %s failed: %d\n",
+				dev_name(dev), error);
 	}
 
 	/* Notify clients of device addition.  This call must come