PCI/AER: Fix the uninitialized aer_fifo

Message ID 1579767991-103898-1-git-send-email-liudongdong3@huawei.com (mailing list archive)
State Mainlined, archived
Commit d95f20c4f07020ebc605f3b46af4b6db9eb5fc99
Delegated to: Bjorn Helgaas

Commit Message

Dongdong Liu Jan. 23, 2020, 8:26 a.m. UTC
The current code does not call INIT_KFIFO() to initialize aer_fifo, so
kfifo_put() sometimes returns 0. A return of 0 means the fifo is full, but
in fact it is not. The problem is easy to reproduce with aer_inject:
aer_inject -s :82:00.0 multiple-corr-nonfatal
The content of the multiple-corr-nonfatal file is as follows:
AER
COR RCVR
HL 0 1 2 3
AER
UNCOR POISON_TLP
HL 4 5 6 7

Fixes: 27c1ce8bbed7 ("PCI/AER: Use kfifo for tracking events instead of reimplementing it")
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
---
 drivers/pci/pcie/aer.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Bjorn Helgaas Jan. 23, 2020, 10:25 p.m. UTC | #1
On Thu, Jan 23, 2020 at 04:26:31PM +0800, Dongdong Liu wrote:
> Current code do not call INIT_KFIFO() to init aer_fifo. This will lead to
> kfifo_put() sometimes return 0. This means the fifo was full. In fact, it
> is not. 

It's definitely a problem that we don't call INIT_KFIFO().  But I'm
curious about why this would only be a problem "sometimes".  The kfifo
is allocated with devm_kzalloc(), so it should be zero-filled and I
would think it would fail consistently, every time.  But I guess not?

> It is easy to reproduce the problem by using aer_inject.

I assume maybe you mean "aer-inject" (not "aer_inject"), from
https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git/ ?
At least, that's what's mentioned in Documentation/PCI/pcieaer-howto.rst.

> aer_inject -s :82:00.0 multiple-corr-nonfatal
> The content of multiple-corr-nonfatal file is as below.
> AER
> COR RCVR
> HL 0 1 2 3
> AER
> UNCOR POISON_TLP
> HL 4 5 6 7
> 
> Fixes: 27c1ce8bbed7 ("PCI/AER: Use kfifo for tracking events instead of reimplementing it")
> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
> ---
>  drivers/pci/pcie/aer.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1ca86f2..4a818b0 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1445,6 +1445,7 @@ static int aer_probe(struct pcie_device *dev)
>  		return -ENOMEM;
>  
>  	rpc->rpd = port;
> +	INIT_KFIFO(rpc->aer_fifo);
>  	set_service_data(dev, rpc);
>  
>  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> -- 
> 1.9.1
>
Dongdong Liu Jan. 29, 2020, 8:16 a.m. UTC | #2
Hi Bjorn

Many thanks for your review.
It is the Spring Festival holiday here, so my reply is late.

On 2020/1/24 6:25 AM, Bjorn Helgaas wrote:
> On Thu, Jan 23, 2020 at 04:26:31PM +0800, Dongdong Liu wrote:
>> Current code do not call INIT_KFIFO() to init aer_fifo. This will lead to
>> kfifo_put() sometimes return 0. This means the fifo was full. In fact, it
>> is not.
> 
> It's definitely a problem that we don't call INIT_KFIFO().  But I'm
> curious about why this would only be a problem "sometimes".  The kfifo
> is allocated with devm_kzalloc(), so it should be zero-filled and I
> would think it would fail consistently, every time.  But I guess not?

Yes, once the failure has appeared, it fails consistently every time.
But after running echo 15 > /proc/sys/kernel/printk,
"aer_inject -s 82:00.0 multiple-corr-nonfatal" executes correctly.
I think this depends on the relative timing of the kfifo_put() and
kfifo_get() calls.
case 1:
kfifo_put()--->kfifo_get()--->kfifo_put() // the fifo is not reported full
case 2:
kfifo_put()--->kfifo_put()--->kfifo_get()
the fifo is reported full on the second call to kfifo_put().
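
The two cases can be tried out in a small userspace model. This is only a
sketch, not the kernel code: struct model_fifo, model_put(), model_get() and
MODEL_FIFO_SIZE are hypothetical names, and the model assumes that kfifo
treats the fifo as full when (in - out) > mask and that a zero-filled,
never-initialized fifo therefore has mask == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_FIFO_SIZE 4	/* hypothetical size; a power of two */

/* Simplified stand-in for the kfifo bookkeeping fields (assumption). */
struct model_fifo {
	unsigned int in;	/* number of elements ever written */
	unsigned int out;	/* number of elements ever read */
	unsigned int mask;	/* size - 1 once initialized; 0 if only zero-filled */
	int buf[MODEL_FIFO_SIZE];
};

/* The fifo counts as full when the element count exceeds mask. */
static bool model_is_full(const struct model_fifo *f)
{
	return (f->in - f->out) > f->mask;
}

/* Returns 1 on success and 0 when the fifo reports full. */
static int model_put(struct model_fifo *f, int val)
{
	if (model_is_full(f))
		return 0;
	f->buf[f->in & f->mask] = val;
	f->in++;
	return 1;
}

/* Returns 1 and stores the oldest element, or 0 when the fifo is empty. */
static int model_get(struct model_fifo *f, int *val)
{
	if (f->in == f->out)
		return 0;
	*val = f->buf[f->out & f->mask];
	f->out++;
	return 1;
}

/* Case 1: put, get, put. Returns the result of the second put. */
static int model_case1(void)
{
	struct model_fifo f;
	int v;

	memset(&f, 0, sizeof(f));	/* zero-filled, never initialized */
	model_put(&f, 1);
	model_get(&f, &v);
	return model_put(&f, 2);
}

/* Case 2: put, put, get. Returns the result of the second put. */
static int model_case2(void)
{
	struct model_fifo f;
	int v, second;

	memset(&f, 0, sizeof(f));	/* zero-filled, never initialized */
	model_put(&f, 1);
	second = model_put(&f, 2);
	model_get(&f, &v);
	return second;
}

/* Checks both orderings; call this from main() to run the model. */
static void model_demo(void)
{
	assert(model_case1() == 1);	/* interleaved get: second put succeeds */
	assert(model_case2() == 0);	/* back-to-back puts: second put fails */
}
```

Under these assumptions a zero-filled fifo behaves like a fifo with room for
one element, which would explain why the failure depends on whether
kfifo_get() runs between the two kfifo_put() calls.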
> 
>> It is easy to reproduce the problem by using aer_inject.
> 
> I assume maybe you mean "aer-inject" (not "aer_inject"), from
> https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git/ ?
> At least, that's what's mentioned in Documentation/PCI/pcieaer-howto.rst.

Yes, you are right, I mean aer-inject.

Thanks,
Dongdong
> 
>> aer_inject -s :82:00.0 multiple-corr-nonfatal
>> The content of multiple-corr-nonfatal file is as below.
>> AER
>> COR RCVR
>> HL 0 1 2 3
>> AER
>> UNCOR POISON_TLP
>> HL 4 5 6 7
>>
>> Fixes: 27c1ce8bbed7 ("PCI/AER: Use kfifo for tracking events instead of reimplementing it")
>> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
>> ---
>>   drivers/pci/pcie/aer.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 1ca86f2..4a818b0 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1445,6 +1445,7 @@ static int aer_probe(struct pcie_device *dev)
>>   		return -ENOMEM;
>>   
>>   	rpc->rpd = port;
>> +	INIT_KFIFO(rpc->aer_fifo);
>>   	set_service_data(dev, rpc);
>>   
>>   	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
>> -- 
>> 1.9.1
>>
Bjorn Helgaas Jan. 29, 2020, 2:29 p.m. UTC | #3
On Thu, Jan 23, 2020 at 04:26:31PM +0800, Dongdong Liu wrote:
> Current code do not call INIT_KFIFO() to init aer_fifo. This will lead to
> kfifo_put() sometimes return 0. This means the fifo was full. In fact, it
> is not. It is easy to reproduce the problem by using aer_inject.
> aer_inject -s :82:00.0 multiple-corr-nonfatal
> The content of multiple-corr-nonfatal file is as below.
> AER
> COR RCVR
> HL 0 1 2 3
> AER
> UNCOR POISON_TLP
> HL 4 5 6 7
> 
> Fixes: 27c1ce8bbed7 ("PCI/AER: Use kfifo for tracking events instead of reimplementing it")
> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>

Applied to pci/aer for v5.6, thanks!

I tweaked the commit log for s/aer_inject/aer-inject/

> ---
>  drivers/pci/pcie/aer.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1ca86f2..4a818b0 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1445,6 +1445,7 @@ static int aer_probe(struct pcie_device *dev)
>  		return -ENOMEM;
>  
>  	rpc->rpd = port;
> +	INIT_KFIFO(rpc->aer_fifo);
>  	set_service_data(dev, rpc);
>  
>  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> -- 
> 1.9.1
>

Patch

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 1ca86f2..4a818b0 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1445,6 +1445,7 @@  static int aer_probe(struct pcie_device *dev)
 		return -ENOMEM;
 
 	rpc->rpd = port;
+	INIT_KFIFO(rpc->aer_fifo);
 	set_service_data(dev, rpc);
 
 	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
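
As a rough illustration of what the added INIT_KFIFO() call changes, the
sketch below compares how many puts succeed on a zero-filled fifo and on an
initialized one. It is a userspace model under stated assumptions, not the
kernel implementation: model_fifo, model_init(), model_fill() and
MODEL_FIFO_SIZE are hypothetical names, and model_init() only mimics the
part of the initialization that sets mask to size - 1.

```c
#include <assert.h>
#include <string.h>

#define MODEL_FIFO_SIZE 8	/* hypothetical size; a power of two */

/* Simplified stand-in for the kfifo bookkeeping fields (assumption). */
struct model_fifo {
	unsigned int in;	/* number of elements ever written */
	unsigned int out;	/* number of elements ever read */
	unsigned int mask;	/* size - 1 once initialized */
	int buf[MODEL_FIFO_SIZE];
};

/* Mimics the initialization step: reset the indices and set the mask. */
static void model_init(struct model_fifo *f)
{
	f->in = 0;
	f->out = 0;
	f->mask = MODEL_FIFO_SIZE - 1;
}

/* Returns 1 on success and 0 when the fifo reports full. */
static int model_put(struct model_fifo *f, int val)
{
	if ((f->in - f->out) > f->mask)
		return 0;
	f->buf[f->in & f->mask] = val;
	f->in++;
	return 1;
}

/* Puts elements until the fifo reports full; returns how many fit. */
static unsigned int model_fill(struct model_fifo *f)
{
	unsigned int count = 0;

	while (count < 2 * MODEL_FIFO_SIZE && model_put(f, (int)count))
		count++;
	return count;
}

/* Capacity seen on a fifo that was only zero-filled. */
static unsigned int capacity_zeroed(void)
{
	struct model_fifo f;

	memset(&f, 0, sizeof(f));
	return model_fill(&f);
}

/* Capacity seen after the initialization step has run. */
static unsigned int capacity_initialized(void)
{
	struct model_fifo f;

	memset(&f, 0, sizeof(f));
	model_init(&f);
	return model_fill(&f);
}

/* Checks both capacities; call this from main() to run the model. */
static void capacity_demo(void)
{
	assert(capacity_zeroed() == 1);
	assert(capacity_initialized() == MODEL_FIFO_SIZE);
}
```

In this model the zero-filled fifo accepts a single element before reporting
full, while the initialized fifo accepts MODEL_FIFO_SIZE elements.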