Message ID | 20241030132352.154488-1-islituo@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | chcr_ktls: fix a possible null-pointer dereference in chcr_ktls_dev_add() | expand |
On Thu, Oct 31, 2024 at 12:23:52AM +1100, Tuo Li wrote: > There is a possible null-pointer dereference related to the wait-completion > synchronization mechanism in the function chcr_ktls_dev_add(). > > Consider the following execution scenario: > > chcr_ktls_cpl_act_open_rpl() //641 > u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //686 > if (u_ctx) { //687 > complete(&tx_info->completion); //704 > > The variable u_ctx is checked by an if statement at Line 687, which means > it can be NULL. Then, complete() is called at Line 704, which will wake > up wait_for_completion_xxx(). > > Consider the wait_for_completion_timeout() in chcr_ktls_dev_add(): > > chcr_ktls_dev_add() //412 > u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //432 > wait_for_completion_timeout(&tx_info->completion, 30 * HZ); //551 > xa_erase(&u_ctx->tid_list, tx_info->tid); //580 > > The variable u_ctx is dereferenced without being rechecked at Line 580 > after the wait_for_completion_timeout(), which can introduce a null-pointer > dereference. Besides, the variable u_ctx is also checked at Line 442 in > chcr_ktls_dev_add(), which indicates that u_ctx is likely to be NULL in > some execution contexts. > > To fix this possible null-pointer dereference, a NULL check is put ahead of > the call to xa_erase(). > > This potential bug was discovered using an experimental static analysis > tool developed by our team. The tool deduces complete() and > wait_for_completion() pairs using alias analysis. It then applies data > flow analysis to detect null-pointer dereferences across synchronization > points. > > Fixes: 65e302a9bd57 ("cxgb4/ch_ktls: Clear resources when pf4 device is removed") > Signed-off-by: Tuo Li <islituo@gmail.com> Hi Tuo Li, I do see that the checking of u_ctx is inconsistent, but it is not clear to me that is because one part is too defensive or, OTOH, there is a bug as you suggest. And I think that we need more analysis to determine which case it is. Also, if it is the case that there is a bug as you suggest, after a quick search, I think it also exists in at least one other place in this file. ...
Hi Simon, Thanks for your reply! It is very helpful. Any further feedback will be appreciated. Thank you very much! Sincerely, Tuo Li On 2024/11/5 0:07, Simon Horman wrote: > On Thu, Oct 31, 2024 at 12:23:52AM +1100, Tuo Li wrote: >> There is a possible null-pointer dereference related to the wait-completion >> synchronization mechanism in the function chcr_ktls_dev_add(). >> >> Consider the following execution scenario: >> >> chcr_ktls_cpl_act_open_rpl() //641 >> u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //686 >> if (u_ctx) { //687 >> complete(&tx_info->completion); //704 >> >> The variable u_ctx is checked by an if statement at Line 687, which means >> it can be NULL. Then, complete() is called at Line 704, which will wake >> up wait_for_completion_xxx(). >> >> Consider the wait_for_completion_timeout() in chcr_ktls_dev_add(): >> >> chcr_ktls_dev_add() //412 >> u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //432 >> wait_for_completion_timeout(&tx_info->completion, 30 * HZ); //551 >> xa_erase(&u_ctx->tid_list, tx_info->tid); //580 >> >> The variable u_ctx is dereferenced without being rechecked at Line 580 >> after the wait_for_completion_timeout(), which can introduce a null-pointer >> dereference. Besides, the variable u_ctx is also checked at Line 442 in >> chcr_ktls_dev_add(), which indicates that u_ctx is likely to be NULL in >> some execution contexts. >> >> To fix this possible null-pointer dereference, a NULL check is put ahead of >> the call to xa_erase(). >> >> This potential bug was discovered using an experimental static analysis >> tool developed by our team. The tool deduces complete() and >> wait_for_completion() pairs using alias analysis. It then applies data >> flow analysis to detect null-pointer dereferences across synchronization >> points. >> >> Fixes: 65e302a9bd57 ("cxgb4/ch_ktls: Clear resources when pf4 device is removed") >> Signed-off-by: Tuo Li <islituo@gmail.com> > > Hi Tuo Li, > > I do see that the checking of u_ctx is inconsistent, > but it is not clear to me that is because one part is too defensive > or, OTOH, there is a bug as you suggest. And I think that we need > more analysis to determine which case it is. > > Also, if it is the case that there is a bug as you suggest, after a quick > search, I think it also exists in at least one other place in this file. > > ...
On Tue, Nov 05, 2024 at 07:32:31AM +0800, Tuo Li wrote: > Hi Simon, > > Thanks for your reply! It is very helpful. > Any further feedback will be appreciated. Hi Tuo Li, 1. Please don't top-post on Linux mailing lists. 2. I think you need to do some careful analysis to understand if the condition you are concerned about can occur or not: can u_ctx ever be NULL in these code paths?
… > Consider the following execution scenario: > > chcr_ktls_cpl_act_open_rpl() //641 > u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //686 > if (u_ctx) { //687 > complete(&tx_info->completion); //704 > > The variable u_ctx is checked by an if statement at Line 687, which means > it can be NULL. Then, complete() is called at Line 704, which will wake > up wait_for_completion_xxx(). … To which software revision would you like to refer here? How does the presented information fit to a statement like the following? https://elixir.bootlin.com/linux/v6.12-rc6/source/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c#L442 if (u_ctx && u_ctx->detach) goto out; Would you eventually like to trace the control flow back any further for the data structure member “handle”? Regards, Markus
On 2024/11/8 21:34, Simon Horman wrote: > On Tue, Nov 05, 2024 at 07:32:31AM +0800, Tuo Li wrote: >> Hi Simon, >> >> Thanks for your reply! It is very helpful. >> Any further feedback will be appreciated. > > Hi Tuo Li, > > 1. Please don't top-post on Linux mailing lists. Apologies for the top-posting; I will avoid that in the future. > 2. I think you need to do some careful analysis to understand > if the condition you are concerned about can occur or not: > can u_ctx ever be NULL in these code paths? I have carefully reviewed the code paths, but I could not find any instance where the variable u_ctx is assigned a NULL value. It might be a defensive check to handle potential NULL values.
On 2024/11/8 23:00, Markus Elfring wrote: > … >> Consider the following execution scenario: >> >> chcr_ktls_cpl_act_open_rpl() //641 >> u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //686 >> if (u_ctx) { //687 >> complete(&tx_info->completion); //704 >> >> The variable u_ctx is checked by an if statement at Line 687, which means >> it can be NULL. Then, complete() is called at Line 704, which will wake >> up wait_for_completion_xxx(). > … > > To which software revision would you like to refer here? > > How does the presented information fit to a statement like the following? > https://elixir.bootlin.com/linux/v6.12-rc6/source/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c#L442 > > if (u_ctx && u_ctx->detach) > goto out; We have run our tool on Linux 6.11, and the line numbers correspond to the code in that version. > Would you eventually like to trace the control flow back any further > for the data structure member “handle”? > I have traced the control flow further for the data structure member 'handle,' but I have not found where the member is assigned a NULL value. I am not sure if I might have overlooked something.
> We have run our tool on Linux 6.11, and the line numbers correspond to the > code in that version. Would you like to share any source code analysis results for more recent software versions? Regards, Markus
On 2024/11/14 20:26, Markus Elfring wrote: >> We have run our tool on Linux 6.11, and the line numbers correspond to the >> code in that version. > > Would you like to share any source code analysis results for more recent software versions? Hi Elfring, Thanks for your reply. I ran our tool on Linux 6.12-rc7 (https://elixir.bootlin.com/linux/v6.12-rc7/source), and the same issue persists. The line number is identical to that on Linux 6.11. chcr_ktls_cpl_act_open_rpl() //641 u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //686 if (u_ctx) { //687 complete(&tx_info->completion); //704 chcr_ktls_dev_add() //412 u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //432 wait_for_completion_timeout(&tx_info->completion, 30 * HZ); //551 xa_erase(&u_ctx->tid_list, tx_info->tid); //580 Any further feedback would be appreciated! Sincerely, Tuo Li
… > chcr_ktls_cpl_act_open_rpl() //641 … > chcr_ktls_dev_add() //412… * How do you think about to improve your change description another bit? * Why do you try to refer to two different function implementations here? * Will further adjustment possibilities become interesting? Regards, Markus
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c index e8e460a92e0e..524c8e032bc8 100644 --- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c +++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c @@ -577,7 +577,8 @@ static int chcr_ktls_dev_add(struct net_device *netdev, struct sock *sk, cxgb4_remove_tid(&tx_info->adap->tids, tx_info->tx_chan, tx_info->tid, tx_info->ip_family); - xa_erase(&u_ctx->tid_list, tx_info->tid); + if (u_ctx) + xa_erase(&u_ctx->tid_list, tx_info->tid); put_module: /* release module refcount */
There is a possible null-pointer dereference related to the wait-completion synchronization mechanism in the function chcr_ktls_dev_add(). Consider the following execution scenario: chcr_ktls_cpl_act_open_rpl() //641 u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //686 if (u_ctx) { //687 complete(&tx_info->completion); //704 The variable u_ctx is checked by an if statement at Line 687, which means it can be NULL. Then, complete() is called at Line 704, which will wake up wait_for_completion_xxx(). Consider the wait_for_completion_timeout() in chcr_ktls_dev_add(): chcr_ktls_dev_add() //412 u_ctx = adap->uld[CXGB4_ULD_KTLS].handle; //432 wait_for_completion_timeout(&tx_info->completion, 30 * HZ); //551 xa_erase(&u_ctx->tid_list, tx_info->tid); //580 The variable u_ctx is dereferenced without being rechecked at Line 580 after the wait_for_completion_timeout(), which can introduce a null-pointer dereference. Besides, the variable u_ctx is also checked at Line 442 in chcr_ktls_dev_add(), which indicates that u_ctx is likely to be NULL in some execution contexts. To fix this possible null-pointer dereference, a NULL check is put ahead of the call to xa_erase(). This potential bug was discovered using an experimental static analysis tool developed by our team. The tool deduces complete() and wait_for_completion() pairs using alias analysis. It then applies data flow analysis to detect null-pointer dereferences across synchronization points. Fixes: 65e302a9bd57 ("cxgb4/ch_ktls: Clear resources when pf4 device is removed") Signed-off-by: Tuo Li <islituo@gmail.com> --- drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)