diff mbox series

chcr_ktls: fix a possible null-pointer dereference in chcr_ktls_dev_add()

Message ID 20241030132352.154488-1-islituo@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series chcr_ktls: fix a possible null-pointer dereference in chcr_ktls_dev_add() | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 5 this patch: 5
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-11-03--21-00 (tests: 781)

Commit Message

Tuo Li Oct. 30, 2024, 1:23 p.m. UTC
There is a possible null-pointer dereference related to the wait-completion
synchronization mechanism in the function chcr_ktls_dev_add().

Consider the following execution scenario:

  chcr_ktls_cpl_act_open_rpl()   //641
    u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;   //686
    if (u_ctx) {  //687
    complete(&tx_info->completion);  //704

The variable u_ctx is checked by an if statement at Line 687, which means
it can be NULL. Then, complete() is called at Line 704, which will wake
up wait_for_completion_xxx().

Consider the wait_for_completion_timeout() in chcr_ktls_dev_add():

  chcr_ktls_dev_add()  //412
    u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;  //432
    wait_for_completion_timeout(&tx_info->completion, 30 * HZ); //551
    xa_erase(&u_ctx->tid_list, tx_info->tid);  //580

The variable u_ctx is dereferenced without being rechecked at Line 580
after the wait_for_completion_timeout(), which can introduce a null-pointer
dereference. Besides, the variable u_ctx is also checked at Line 442 in
chcr_ktls_dev_add(), which indicates that u_ctx is likely to be NULL in
some execution contexts.

To fix this possible null-pointer dereference, a NULL check is put ahead of
the call to xa_erase().

This potential bug was discovered using an experimental static analysis
tool developed by our team. The tool deduces complete() and
wait_for_completion() pairs using alias analysis. It then applies data
flow analysis to detect null-pointer dereferences across synchronization
points.

Fixes: 65e302a9bd57 ("cxgb4/ch_ktls: Clear resources when pf4 device is removed") 
Signed-off-by: Tuo Li <islituo@gmail.com>
---
 drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Simon Horman Nov. 4, 2024, 4:07 p.m. UTC | #1
On Thu, Oct 31, 2024 at 12:23:52AM +1100, Tuo Li wrote:
> There is a possible null-pointer dereference related to the wait-completion
> synchronization mechanism in the function chcr_ktls_dev_add().
> 
> Consider the following execution scenario:
> 
>   chcr_ktls_cpl_act_open_rpl()   //641
>     u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;   //686
>     if (u_ctx) {  //687
>     complete(&tx_info->completion);  //704
> 
> The variable u_ctx is checked by an if statement at Line 687, which means
> it can be NULL. Then, complete() is called at Line 704, which will wake
> up wait_for_completion_xxx().
> 
> Consider the wait_for_completion_timeout() in chcr_ktls_dev_add():
> 
>   chcr_ktls_dev_add()  //412
>     u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;  //432
>     wait_for_completion_timeout(&tx_info->completion, 30 * HZ); //551
>     xa_erase(&u_ctx->tid_list, tx_info->tid);  //580
> 
> The variable u_ctx is dereferenced without being rechecked at Line 580
> after the wait_for_completion_timeout(), which can introduce a null-pointer
> dereference. Besides, the variable u_ctx is also checked at Line 442 in
> chcr_ktls_dev_add(), which indicates that u_ctx is likely to be NULL in
> some execution contexts.
> 
> To fix this possible null-pointer dereference, a NULL check is put ahead of
> the call to xa_erase().
> 
> This potential bug was discovered using an experimental static analysis
> tool developed by our team. The tool deduces complete() and
> wait_for_completion() pairs using alias analysis. It then applies data
> flow analysis to detect null-pointer dereferences across synchronization
> points.
> 
> Fixes: 65e302a9bd57 ("cxgb4/ch_ktls: Clear resources when pf4 device is removed") 
> Signed-off-by: Tuo Li <islituo@gmail.com>

Hi Tuo Li,

I do see that the checking of u_ctx is inconsistent,
but it is not clear to me that is because one part is too defensive
or, OTOH, there is a bug as you suggest. And I think that we need
more analysis to determine which case it is.

Also, if it is the case that there is a bug as you suggest, after a quick
search, I think it also exists in at least one other place in this file.

...
Tuo Li Nov. 4, 2024, 11:32 p.m. UTC | #2
Hi Simon,

Thanks for your reply! It is very helpful.
Any further feedback will be appreciated.

Thank you very much!

Sincerely,
Tuo Li




On 2024/11/5 0:07, Simon Horman wrote:
> On Thu, Oct 31, 2024 at 12:23:52AM +1100, Tuo Li wrote:
>> There is a possible null-pointer dereference related to the wait-completion
>> synchronization mechanism in the function chcr_ktls_dev_add().
>>
>> Consider the following execution scenario:
>>
>>   chcr_ktls_cpl_act_open_rpl()   //641
>>     u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;   //686
>>     if (u_ctx) {  //687
>>     complete(&tx_info->completion);  //704
>>
>> The variable u_ctx is checked by an if statement at Line 687, which means
>> it can be NULL. Then, complete() is called at Line 704, which will wake
>> up wait_for_completion_xxx().
>>
>> Consider the wait_for_completion_timeout() in chcr_ktls_dev_add():
>>
>>   chcr_ktls_dev_add()  //412
>>     u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;  //432
>>     wait_for_completion_timeout(&tx_info->completion, 30 * HZ); //551
>>     xa_erase(&u_ctx->tid_list, tx_info->tid);  //580
>>
>> The variable u_ctx is dereferenced without being rechecked at Line 580
>> after the wait_for_completion_timeout(), which can introduce a null-pointer
>> dereference. Besides, the variable u_ctx is also checked at Line 442 in
>> chcr_ktls_dev_add(), which indicates that u_ctx is likely to be NULL in
>> some execution contexts.
>>
>> To fix this possible null-pointer dereference, a NULL check is put ahead of
>> the call to xa_erase().
>>
>> This potential bug was discovered using an experimental static analysis
>> tool developed by our team. The tool deduces complete() and
>> wait_for_completion() pairs using alias analysis. It then applies data
>> flow analysis to detect null-pointer dereferences across synchronization
>> points.
>>
>> Fixes: 65e302a9bd57 ("cxgb4/ch_ktls: Clear resources when pf4 device is removed")
>> Signed-off-by: Tuo Li <islituo@gmail.com>
>
> Hi Tuo Li,
>
> I do see that the checking of u_ctx is inconsistent,
> but it is not clear to me that is because one part is too defensive
> or, OTOH, there is a bug as you suggest. And I think that we need
> more analysis to determine which case it is.
>
> Also, if it is the case that there is a bug as you suggest, after a quick
> search, I think it also exists in at least one other place in this file.
>
> ...
Simon Horman Nov. 8, 2024, 1:34 p.m. UTC | #3
On Tue, Nov 05, 2024 at 07:32:31AM +0800, Tuo Li wrote:
> Hi Simon,
> 
> Thanks for your reply! It is very helpful.
> Any further feedback will be appreciated.

Hi Tuo Li,

1. Please don't top-post on Linux mailing lists.
2. I think you need to do some careful analysis to understand
   if the condition you are concerned about can occur or not:
   can u_ctx ever be NULL in these code paths?
Markus Elfring Nov. 8, 2024, 3 p.m. UTC | #4
> Consider the following execution scenario:
>
>   chcr_ktls_cpl_act_open_rpl()   //641
>     u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;   //686
>     if (u_ctx) {  //687
>     complete(&tx_info->completion);  //704
>
> The variable u_ctx is checked by an if statement at Line 687, which means
> it can be NULL. Then, complete() is called at Line 704, which will wake
> up wait_for_completion_xxx().
…

To which software revision would you like to refer here?

How does the presented information fit to a statement like the following?
https://elixir.bootlin.com/linux/v6.12-rc6/source/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c#L442

	if (u_ctx && u_ctx->detach)
		goto out;


Would you eventually like to trace the control flow back any further
for the data structure member “handle”?

Regards,
Markus
Tuo Li Nov. 14, 2024, 10:30 a.m. UTC | #5
On 2024/11/8 21:34, Simon Horman wrote:
> On Tue, Nov 05, 2024 at 07:32:31AM +0800, Tuo Li wrote:
>> Hi Simon,
>>
>> Thanks for your reply! It is very helpful.
>> Any further feedback will be appreciated.
> 
> Hi Tuo Li,
> 
> 1. Please don't top-post on Linux mailing lists.

Apologies for the top-posting; I will avoid that in the future.

> 2. I think you need to do some careful analysis to understand
>    if the condition you are concerned about can occur or not:
>    can u_ctx ever be NULL in these code paths?

I have carefully reviewed the code paths, but I could not find any instance
where the variable u_ctx is assigned a NULL value. It might be a defensive
check to handle potential NULL values.
Tuo Li Nov. 14, 2024, 10:37 a.m. UTC | #6
On 2024/11/8 23:00, Markus Elfring wrote:
> …
>> Consider the following execution scenario:
>>
>>   chcr_ktls_cpl_act_open_rpl()   //641
>>     u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;   //686
>>     if (u_ctx) {  //687
>>     complete(&tx_info->completion);  //704
>>
>> The variable u_ctx is checked by an if statement at Line 687, which means
>> it can be NULL. Then, complete() is called at Line 704, which will wake
>> up wait_for_completion_xxx().
> …
> 
> To which software revision would you like to refer here?
> 
> How does the presented information fit to a statement like the following?
> https://elixir.bootlin.com/linux/v6.12-rc6/source/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c#L442
> 
> 	if (u_ctx && u_ctx->detach)
> 		goto out;

We have run our tool on Linux 6.11, and the line numbers correspond to the
code in that version.

> Would you eventually like to trace the control flow back any further
> for the data structure member “handle”?
> 

I have traced the control flow further for the data structure member
'handle,' but I have not found where the member is assigned a NULL value. I
am not sure if I might have overlooked something.
Markus Elfring Nov. 14, 2024, 12:26 p.m. UTC | #7
> We have run our tool on Linux 6.11, and the line numbers correspond to the
> code in that version.

Would you like to share any source code analysis results for more recent software versions?

Regards,
Markus
Tuo Li Nov. 14, 2024, 2:22 p.m. UTC | #8
On 2024/11/14 20:26, Markus Elfring wrote:
>> We have run our tool on Linux 6.11, and the line numbers correspond to the
>> code in that version.
> 
> Would you like to share any source code analysis results for more recent software versions?

Hi Elfring,

Thanks for your reply.

I ran our tool on Linux 6.12-rc7
(https://elixir.bootlin.com/linux/v6.12-rc7/source), and the same issue
persists. The line number is identical to that on Linux 6.11.

  chcr_ktls_cpl_act_open_rpl()   //641
    u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;   //686
    if (u_ctx) {  //687
    complete(&tx_info->completion);  //704

  chcr_ktls_dev_add()  //412
    u_ctx = adap->uld[CXGB4_ULD_KTLS].handle;  //432
    wait_for_completion_timeout(&tx_info->completion, 30 * HZ); //551
    xa_erase(&u_ctx->tid_list, tx_info->tid);  //580

Any further feedback would be appreciated!

Sincerely,
Tuo Li
Markus Elfring Nov. 15, 2024, 9:15 a.m. UTC | #9
>   chcr_ktls_cpl_act_open_rpl()   //641>   chcr_ktls_dev_add()  //412…

* How do you think about to improve your change description another bit?

* Why do you try to refer to two different function implementations here?

* Will further adjustment possibilities become interesting?


Regards,
Markus
diff mbox series

Patch

diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
index e8e460a92e0e..524c8e032bc8 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
@@ -577,7 +577,8 @@  static int chcr_ktls_dev_add(struct net_device *netdev, struct sock *sk,
 	cxgb4_remove_tid(&tx_info->adap->tids, tx_info->tx_chan,
 			 tx_info->tid, tx_info->ip_family);
 
-	xa_erase(&u_ctx->tid_list, tx_info->tid);
+	if (u_ctx)
+		xa_erase(&u_ctx->tid_list, tx_info->tid);
 
 put_module:
 	/* release module refcount */