[v2,1/2] THP: avoid lock when check whether THP is in deferred list

Message ID 20230425084627.3573866-2-fengwei.yin@intel.com (mailing list archive)
State New
Series Reduce lock contention related with large folio

Commit Message

Yin Fengwei April 25, 2023, 8:46 a.m. UTC
free_transhuge_page() acquires the split queue lock and then checks
whether the THP was added to the deferred list or not.

It's safe to check whether the THP is in the deferred list without the lock:
   When code hits free_transhuge_page(), nobody else can still be trying
   to update the folio's _deferred_list.

   If the folio is not in deferred_list, it's safe to check without
   acquiring the lock.

   If the folio is in deferred_list, adding/deleting other nodes in
   deferred_list doesn't impact the return value of
   list_empty(@folio->_deferred_list).

Running page_fault1 of will-it-scale + order 2 folio for anonymous
mapping with 96 processes on an Ice Lake 48C/96T test box, we could
see the 61% split_queue_lock contention:
-   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
    release_pages
   - 70.93% release_pages
      - 61.42% free_transhuge_page
         + 60.77% _raw_spin_lock_irqsave

With this patch applied, the split_queue_lock contention is less
than 1%.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/huge_memory.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

Comments

Kirill A. Shutemov April 25, 2023, 12:38 p.m. UTC | #1
On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
> free_transhuge_page() acquires split queue lock then check
> whether the THP was added to deferred list or not.
> 
> It's safe to check whether the THP is in deferred list or not.
>    When code hit free_transhuge_page(), there is no one tries
>    to update the folio's _deferred_list.
> 
>    If folio is not in deferred_list, it's safe to check without
>    acquiring lock.
> 
>    If folio is in deferred_list, the other node in deferred_list
>    adding/deleteing doesn't impact the return value of
>    list_epmty(@folio->_deferred_list).

Typo.

> 
> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> see the 61% split_queue_lock contention:
> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>     release_pages
>    - 70.93% release_pages
>       - 61.42% free_transhuge_page
>          + 60.77% _raw_spin_lock_irqsave
> 
> With this patch applied, the split_queue_lock contention is less
> than 1%.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/huge_memory.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..c620f1f12247 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	if (!list_empty(&folio->_deferred_list)) {
> +	/*
> +	 * At this point, there is no one trying to queue the folio
> +	 * to deferred_list. folio->_deferred_list is not possible
> +	 * being updated.
> +	 *
> +	 * If folio is already added to deferred_list, add/delete to/from
> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> +	 * acquiring the lock.
> +	 *
> +	 * If folio is not in deferred_list, it's safe to check without
> +	 * acquiring the lock.
> +	 */
> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);

Recheck under lock?

>  		ds_queue->split_queue_len--;
>  		list_del(&folio->_deferred_list);
> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);
>  }
>  
> -- 
> 2.30.2
> 
>
Huang, Ying April 26, 2023, 1:13 a.m. UTC | #2
Yin Fengwei <fengwei.yin@intel.com> writes:

> free_transhuge_page() acquires split queue lock then check
> whether the THP was added to deferred list or not.
>
> It's safe to check whether the THP is in deferred list or not.
>    When code hit free_transhuge_page(), there is no one tries
>    to update the folio's _deferred_list.

I think that it's clearer to enumerate all places pages are added and
removed from deferred list.  Then we can find out whether there's code
path that may race with this.

Take a glance at the search result of `grep split_queue_lock -r mm`.  It
seems that deferred_split_scan() may race with free_transhuge_page(), so
we need to recheck with the lock held as Kirill pointed out.

Best Regards,
Huang, Ying

>    If folio is not in deferred_list, it's safe to check without
>    acquiring lock.
>
>    If folio is in deferred_list, the other node in deferred_list
>    adding/deleteing doesn't impact the return value of
>    list_epmty(@folio->_deferred_list).
>
> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> see the 61% split_queue_lock contention:
> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>     release_pages
>    - 70.93% release_pages
>       - 61.42% free_transhuge_page
>          + 60.77% _raw_spin_lock_irqsave
>
> With this patch applied, the split_queue_lock contention is less
> than 1%.
>
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/huge_memory.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..c620f1f12247 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	if (!list_empty(&folio->_deferred_list)) {
> +	/*
> +	 * At this point, there is no one trying to queue the folio
> +	 * to deferred_list. folio->_deferred_list is not possible
> +	 * being updated.
> +	 *
> +	 * If folio is already added to deferred_list, add/delete to/from
> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> +	 * acquiring the lock.
> +	 *
> +	 * If folio is not in deferred_list, it's safe to check without
> +	 * acquiring the lock.
> +	 */
> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  		ds_queue->split_queue_len--;
>  		list_del(&folio->_deferred_list);
> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);
>  }
Yin Fengwei April 26, 2023, 1:47 a.m. UTC | #3
On 4/25/23 20:38, Kirill A. Shutemov wrote:
> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>> free_transhuge_page() acquires split queue lock then check
>> whether the THP was added to deferred list or not.
>>
>> It's safe to check whether the THP is in deferred list or not.
>>    When code hit free_transhuge_page(), there is no one tries
>>    to update the folio's _deferred_list.
>>
>>    If folio is not in deferred_list, it's safe to check without
>>    acquiring lock.
>>
>>    If folio is in deferred_list, the other node in deferred_list
>>    adding/deleteing doesn't impact the return value of
>>    list_epmty(@folio->_deferred_list).
> 
> Typo.
Oops. Will correct it in the next version.

> 
>>
>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>     release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, there is no one trying to queue the folio
>> +	 * to deferred_list. folio->_deferred_list is not possible
>> +	 * being updated.
>> +	 *
>> +	 * If folio is already added to deferred_list, add/delete to/from
>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>> +	 * acquiring the lock.
>> +	 *
>> +	 * If folio is not in deferred_list, it's safe to check without
>> +	 * acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> 
> Recheck under lock?
My understanding is that even if there is a race, it does not affect the
correctness of list_empty(&folio->_deferred_list):
  - If the folio is not in deferred_list, list_empty() always returns
    true.
  - If the folio is in deferred_list, list_empty() always returns false,
    even while a neighbouring element is being added to or removed from
    deferred_list.

There is one precondition:
  No other user adds/removes the folio itself to/from deferred_list concurrently.

I think that holds for free_transhuge_page(), so rechecking after taking the
lock is not necessary. Thanks

Regards
Yin, Fengwei

> 
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
>>  
>> -- 
>> 2.30.2
>>
>>
>
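
The argument above hinges only on how list_empty() works: it reads the
folio's own next pointer, so adding or removing *other* nodes never changes
its result; only list_add()/list_del_init() on the folio's own
_deferred_list node does. A minimal, single-threaded user-space sketch with
a simplified list_head (not the kernel headers; the kernel's list_empty()
additionally uses READ_ONCE() for exactly this kind of lockless read)
illustrates the behaviour:

#include <assert.h>
#include <stdbool.h>

/* Simplified copy of the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *l) { l->next = l->prev = l; }
static bool list_empty(const struct list_head *l) { return l->next == l; }

static void list_add(struct list_head *new, struct list_head *head)
{
        new->next = head->next;
        new->prev = head;
        head->next->prev = new;
        head->next = new;
}

static void list_del_init(struct list_head *entry)
{
        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        INIT_LIST_HEAD(entry);
}

int main(void)
{
        struct list_head queue, folio_a, folio_b;

        INIT_LIST_HEAD(&queue);
        INIT_LIST_HEAD(&folio_a);
        INIT_LIST_HEAD(&folio_b);
        assert(list_empty(&folio_a));   /* not queued yet: empty */

        list_add(&folio_a, &queue);
        list_add(&folio_b, &queue);
        assert(!list_empty(&folio_a));  /* queued: not empty */

        list_del_init(&folio_b);        /* neighbour removed ...  */
        assert(!list_empty(&folio_a));  /* ... folio_a unaffected */

        list_del_init(&folio_a);        /* only touching folio_a itself */
        assert(list_empty(&folio_a));   /* ... flips its own state */
        return 0;
}
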
Yin Fengwei April 26, 2023, 1:48 a.m. UTC | #4
On 4/26/23 09:13, Huang, Ying wrote:
> Yin Fengwei <fengwei.yin@intel.com> writes:
> 
>> free_transhuge_page() acquires split queue lock then check
>> whether the THP was added to deferred list or not.
>>
>> It's safe to check whether the THP is in deferred list or not.
>>    When code hit free_transhuge_page(), there is no one tries
>>    to update the folio's _deferred_list.
> 
> I think that it's clearer to enumerate all places pages are added and
> removed from deferred list.  Then we can find out whether there's code
> path that may race with this.
> 
> Take a glance at the search result of `grep split_queue_lock -r mm`.  It
> seems that deferred_split_scan() may race with free_transhuge_page(), so
> we need to recheck with the lock held as Kirill pointed out.
My understanding is that the recheck after taking the lock is not necessary.
See my reply to Kirill. Thanks.


Regards
Yin, Fengwei

> 
> Best Regards,
> Huang, Ying
> 
>>    If folio is not in deferred_list, it's safe to check without
>>    acquiring lock.
>>
>>    If folio is in deferred_list, the other node in deferred_list
>>    adding/deleteing doesn't impact the return value of
>>    list_epmty(@folio->_deferred_list).
>>
>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>     release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, there is no one trying to queue the folio
>> +	 * to deferred_list. folio->_deferred_list is not possible
>> +	 * being updated.
>> +	 *
>> +	 * If folio is already added to deferred_list, add/delete to/from
>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>> +	 * acquiring the lock.
>> +	 *
>> +	 * If folio is not in deferred_list, it's safe to check without
>> +	 * acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
Yin Fengwei April 26, 2023, 2:08 a.m. UTC | #5
On 4/25/23 20:38, Kirill A. Shutemov wrote:
> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>> free_transhuge_page() acquires split queue lock then check
>> whether the THP was added to deferred list or not.
>>
>> It's safe to check whether the THP is in deferred list or not.
>>    When code hit free_transhuge_page(), there is no one tries
>>    to update the folio's _deferred_list.
>>
>>    If folio is not in deferred_list, it's safe to check without
>>    acquiring lock.
>>
>>    If folio is in deferred_list, the other node in deferred_list
>>    adding/deleteing doesn't impact the return value of
>>    list_epmty(@folio->_deferred_list).
> 
> Typo.
> 
>>
>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>     release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, there is no one trying to queue the folio
>> +	 * to deferred_list. folio->_deferred_list is not possible
>> +	 * being updated.
>> +	 *
>> +	 * If folio is already added to deferred_list, add/delete to/from
>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>> +	 * acquiring the lock.
>> +	 *
>> +	 * If folio is not in deferred_list, it's safe to check without
>> +	 * acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> 
> Recheck under lock?
Huang Ying pointed out the race with deferred_split_scan(). And yes, we need
to recheck under the lock. Will update in the next version.


Regards
Yin, Fengwei

> 
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
>>  
>> -- 
>> 2.30.2
>>
>>
>
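
For reference, a sketch of the recheck-under-lock shape being agreed on here
(illustrative only, not the actual v3 patch, and assuming the rest of
free_transhuge_page() stays as in v2): the unlocked list_empty() remains a
cheap filter for folios that were never deferred, and the state is validated
again under the lock so a folio that deferred_split_scan() already emptied
is not unlinked and accounted a second time.

void free_transhuge_page(struct page *page)
{
        struct folio *folio = (struct folio *)page;
        struct deferred_split *ds_queue = get_deferred_split_queue(folio);
        unsigned long flags;

        /* Unlocked check: cheap filter for the common, never-deferred case. */
        if (data_race(!list_empty(&folio->_deferred_list))) {
                spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
                /*
                 * Recheck under the lock: deferred_split_scan() may have
                 * emptied folio->_deferred_list after the check above.
                 */
                if (!list_empty(&folio->_deferred_list)) {
                        ds_queue->split_queue_len--;
                        list_del(&folio->_deferred_list);
                }
                spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
        }
        free_compound_page(page);
}
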
Ryan Roberts April 26, 2023, 8:11 a.m. UTC | #6
On 25/04/2023 09:46, Yin Fengwei wrote:
> free_transhuge_page() acquires split queue lock then check
> whether the THP was added to deferred list or not.
> 
> It's safe to check whether the THP is in deferred list or not.
>    When code hit free_transhuge_page(), there is no one tries
>    to update the folio's _deferred_list.
> 
>    If folio is not in deferred_list, it's safe to check without
>    acquiring lock.
> 
>    If folio is in deferred_list, the other node in deferred_list
>    adding/deleteing doesn't impact the return value of
>    list_epmty(@folio->_deferred_list).
> 
> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> see the 61% split_queue_lock contention:
> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>     release_pages
>    - 70.93% release_pages
>       - 61.42% free_transhuge_page
>          + 60.77% _raw_spin_lock_irqsave
> 
> With this patch applied, the split_queue_lock contention is less
> than 1%.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/huge_memory.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..c620f1f12247 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	if (!list_empty(&folio->_deferred_list)) {
> +	/*
> +	 * At this point, there is no one trying to queue the folio
> +	 * to deferred_list. folio->_deferred_list is not possible
> +	 * being updated.
> +	 *
> +	 * If folio is already added to deferred_list, add/delete to/from
> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> +	 * acquiring the lock.
> +	 *
> +	 * If folio is not in deferred_list, it's safe to check without
> +	 * acquiring the lock.
> +	 */
> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  		ds_queue->split_queue_len--;
>  		list_del(&folio->_deferred_list);

I wonder if there is a race here? Could the folio have been in the deferred list
when checking, but then something removed it from the list before the lock is
taken? In this case, I guess split_queue_len would be out of sync with the
number of folios in the queue? Perhaps recheck list_empty() after taking the lock?

Thanks,
Ryan


> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);
>  }
>
Ryan Roberts April 26, 2023, 8:17 a.m. UTC | #7
On 26/04/2023 03:08, Yin Fengwei wrote:
> 
> 
> On 4/25/23 20:38, Kirill A. Shutemov wrote:
>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>> free_transhuge_page() acquires split queue lock then check
>>> whether the THP was added to deferred list or not.
>>>
>>> It's safe to check whether the THP is in deferred list or not.
>>>    When code hit free_transhuge_page(), there is no one tries
>>>    to update the folio's _deferred_list.
>>>
>>>    If folio is not in deferred_list, it's safe to check without
>>>    acquiring lock.
>>>
>>>    If folio is in deferred_list, the other node in deferred_list
>>>    adding/deleteing doesn't impact the return value of
>>>    list_epmty(@folio->_deferred_list).
>>
>> Typo.
>>
>>>
>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>> see the 61% split_queue_lock contention:
>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>>     release_pages
>>>    - 70.93% release_pages
>>>       - 61.42% free_transhuge_page
>>>          + 60.77% _raw_spin_lock_irqsave
>>>
>>> With this patch applied, the split_queue_lock contention is less
>>> than 1%.
>>>
>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 032fb0ef9cd1..c620f1f12247 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>  	unsigned long flags;
>>>  
>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>> -	if (!list_empty(&folio->_deferred_list)) {
>>> +	/*
>>> +	 * At this point, there is no one trying to queue the folio
>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>> +	 * being updated.
>>> +	 *
>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>> +	 * acquiring the lock.
>>> +	 *
>>> +	 * If folio is not in deferred_list, it's safe to check without
>>> +	 * acquiring the lock.
>>> +	 */
>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>
>> Recheck under lock?
> Huang Ying pointed out the race with deferred_split_scan(). And Yes. Need
> recheck under lock. Will update in next version.

Oops sorry - I see this was already pointed out. Disregard my previous mail.

Thanks,
Ryan


> 
> 
> Regards
> Yin, Fengwei
> 
>>
>>>  		ds_queue->split_queue_len--;
>>>  		list_del(&folio->_deferred_list);
>>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>>  	}
>>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>>  	free_compound_page(page);
>>>  }
>>>  
>>> -- 
>>> 2.30.2
>>>
>>>
>>
Yin Fengwei April 28, 2023, 6:28 a.m. UTC | #8
Hi Kirill,

On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>> free_transhuge_page() acquires split queue lock then check
>> whether the THP was added to deferred list or not.
>>
>> It's safe to check whether the THP is in deferred list or not.
>>    When code hit free_transhuge_page(), there is no one tries
>>    to update the folio's _deferred_list.
>>
>>    If folio is not in deferred_list, it's safe to check without
>>    acquiring lock.
>>
>>    If folio is in deferred_list, the other node in deferred_list
>>    adding/deleteing doesn't impact the return value of
>>    list_epmty(@folio->_deferred_list).
> 
> Typo.
> 
>>
>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>     release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, there is no one trying to queue the folio
>> +	 * to deferred_list. folio->_deferred_list is not possible
>> +	 * being updated.
>> +	 *
>> +	 * If folio is already added to deferred_list, add/delete to/from
>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>> +	 * acquiring the lock.
>> +	 *
>> +	 * If folio is not in deferred_list, it's safe to check without
>> +	 * acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> 
> Recheck under lock?
In function deferred_split_scan(), there is the following code block:
                if (folio_try_get(folio)) {
                        list_move(&folio->_deferred_list, &list);
                } else {
                        /* We lost race with folio_put() */
                        list_del_init(&folio->_deferred_list);
                        ds_queue->split_queue_len--;
                }

I am wondering what kind of "lost race with folio_put()" this can be.

My understanding is that it's not necessary to handle this case here
because free_transhuge_page() will handle it once the folio's refcount
reaches zero. But I must be missing something here. Thanks.


Regards
Yin, Fengwei

> 
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
>>  
>> -- 
>> 2.30.2
>>
>>
>
Kirill A. Shutemov April 28, 2023, 2:02 p.m. UTC | #9
On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
> Hi Kirill,
> 
> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
> > On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
> >> free_transhuge_page() acquires split queue lock then check
> >> whether the THP was added to deferred list or not.
> >>
> >> It's safe to check whether the THP is in deferred list or not.
> >>    When code hit free_transhuge_page(), there is no one tries
> >>    to update the folio's _deferred_list.
> >>
> >>    If folio is not in deferred_list, it's safe to check without
> >>    acquiring lock.
> >>
> >>    If folio is in deferred_list, the other node in deferred_list
> >>    adding/deleteing doesn't impact the return value of
> >>    list_epmty(@folio->_deferred_list).
> > 
> > Typo.
> > 
> >>
> >> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> >> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> >> see the 61% split_queue_lock contention:
> >> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
> >>     release_pages
> >>    - 70.93% release_pages
> >>       - 61.42% free_transhuge_page
> >>          + 60.77% _raw_spin_lock_irqsave
> >>
> >> With this patch applied, the split_queue_lock contention is less
> >> than 1%.
> >>
> >> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> >> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> >> ---
> >>  mm/huge_memory.c | 19 ++++++++++++++++---
> >>  1 file changed, 16 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 032fb0ef9cd1..c620f1f12247 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
> >>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >>  	unsigned long flags;
> >>  
> >> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >> -	if (!list_empty(&folio->_deferred_list)) {
> >> +	/*
> >> +	 * At this point, there is no one trying to queue the folio
> >> +	 * to deferred_list. folio->_deferred_list is not possible
> >> +	 * being updated.
> >> +	 *
> >> +	 * If folio is already added to deferred_list, add/delete to/from
> >> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> >> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> >> +	 * acquiring the lock.
> >> +	 *
> >> +	 * If folio is not in deferred_list, it's safe to check without
> >> +	 * acquiring the lock.
> >> +	 */
> >> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> >> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> > 
> > Recheck under lock?
> In function deferred_split_scan(), there is following code block:
>                 if (folio_try_get(folio)) {
>                         list_move(&folio->_deferred_list, &list);
>                 } else {
>                         /* We lost race with folio_put() */
>                         list_del_init(&folio->_deferred_list);
>                         ds_queue->split_queue_len--;
>                 }
> 
> I am wondering what kind of "lost race with folio_put()" can be.
> 
> My understanding is that it's not necessary to handle this case here
> because free_transhuge_page() will handle it once folio get zero ref.
> But I must miss something here. Thanks.

free_transhuge_page() only gets called when the refcount is already zero.
Both deferred_split_scan() and free_transhuge_page() can see the page with
zero refcount. The check makes deferred_split_scan() leave the page to
free_transhuge_page().
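
The detection works because folio_try_get() is a get-unless-zero operation
(essentially get_page_unless_zero()): once the last reference has been
dropped, no new reference can be taken, so deferred_split_scan() can tell
that free_transhuge_page() is already (or about to be) responsible for this
folio. A user-space sketch of just that refcount semantic (illustrative;
the kernel uses its page_ref helpers, not C11 atomics):

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object still has one (get-unless-zero). */
static bool try_get(atomic_int *refcount)
{
        int old = atomic_load(refcount);

        while (old != 0) {
                if (atomic_compare_exchange_weak(refcount, &old, old + 1))
                        return true;    /* reference taken */
        }
        return false;                   /* already zero: we lost the race */
}

int main(void)
{
        atomic_int ref = 1;             /* one remaining reference */

        assert(try_get(&ref));          /* scan wins: refcount 1 -> 2 */
        atomic_fetch_sub(&ref, 1);      /* scan drops its reference   */
        atomic_fetch_sub(&ref, 1);      /* last put: refcount hits 0  */
        assert(!try_get(&ref));         /* scan loses: leave the folio
                                           to free_transhuge_page()   */
        return 0;
}
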
Yin Fengwei April 29, 2023, 8:32 a.m. UTC | #10
Hi Kirill,

On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
> On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
>> Hi Kirill,
>>
>> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
>>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>>> free_transhuge_page() acquires split queue lock then check
>>>> whether the THP was added to deferred list or not.
>>>>
>>>> It's safe to check whether the THP is in deferred list or not.
>>>>    When code hit free_transhuge_page(), there is no one tries
>>>>    to update the folio's _deferred_list.
>>>>
>>>>    If folio is not in deferred_list, it's safe to check without
>>>>    acquiring lock.
>>>>
>>>>    If folio is in deferred_list, the other node in deferred_list
>>>>    adding/deleteing doesn't impact the return value of
>>>>    list_epmty(@folio->_deferred_list).
>>>
>>> Typo.
>>>
>>>>
>>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>>> see the 61% split_queue_lock contention:
>>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>>>     release_pages
>>>>    - 70.93% release_pages
>>>>       - 61.42% free_transhuge_page
>>>>          + 60.77% _raw_spin_lock_irqsave
>>>>
>>>> With this patch applied, the split_queue_lock contention is less
>>>> than 1%.
>>>>
>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 032fb0ef9cd1..c620f1f12247 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>  	unsigned long flags;
>>>>  
>>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>> -	if (!list_empty(&folio->_deferred_list)) {
>>>> +	/*
>>>> +	 * At this point, there is no one trying to queue the folio
>>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>>> +	 * being updated.
>>>> +	 *
>>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>>> +	 * acquiring the lock.
>>>> +	 *
>>>> +	 * If folio is not in deferred_list, it's safe to check without
>>>> +	 * acquiring the lock.
>>>> +	 */
>>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>
>>> Recheck under lock?
>> In function deferred_split_scan(), there is following code block:
>>                 if (folio_try_get(folio)) {
>>                         list_move(&folio->_deferred_list, &list);
>>                 } else {
>>                         /* We lost race with folio_put() */
>>                         list_del_init(&folio->_deferred_list);
>>                         ds_queue->split_queue_len--;
>>                 }
>>
>> I am wondering what kind of "lost race with folio_put()" can be.
>>
>> My understanding is that it's not necessary to handle this case here
>> because free_transhuge_page() will handle it once folio get zero ref.
>> But I must miss something here. Thanks.
> 
> free_transhuge_page() got when refcount is already zero. Both
> deferred_split_scan() and free_transhuge_page() can see the page with zero
> refcount. The check makes deferred_split_scan() to leave the page to the
> free_transhuge_page().
> 
If deferred_split_scan() leaves the page to free_transhuge_page(), is it
necessary to do
        list_del_init(&folio->_deferred_list);
        ds_queue->split_queue_len--;

Could these two lines be left to free_transhuge_page() as well? Thanks.

Regards
Yin, Fengwei
Kirill A. Shutemov April 29, 2023, 8:46 a.m. UTC | #11
On Sat, Apr 29, 2023 at 04:32:34PM +0800, Yin, Fengwei wrote:
> Hi Kirill,
> 
> On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
> > On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
> >> Hi Kirill,
> >>
> >> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
> >>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
> >>>> free_transhuge_page() acquires split queue lock then check
> >>>> whether the THP was added to deferred list or not.
> >>>>
> >>>> It's safe to check whether the THP is in deferred list or not.
> >>>>    When code hit free_transhuge_page(), there is no one tries
> >>>>    to update the folio's _deferred_list.
> >>>>
> >>>>    If folio is not in deferred_list, it's safe to check without
> >>>>    acquiring lock.
> >>>>
> >>>>    If folio is in deferred_list, the other node in deferred_list
> >>>>    adding/deleteing doesn't impact the return value of
> >>>>    list_epmty(@folio->_deferred_list).
> >>>
> >>> Typo.
> >>>
> >>>>
> >>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> >>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> >>>> see the 61% split_queue_lock contention:
> >>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
> >>>>     release_pages
> >>>>    - 70.93% release_pages
> >>>>       - 61.42% free_transhuge_page
> >>>>          + 60.77% _raw_spin_lock_irqsave
> >>>>
> >>>> With this patch applied, the split_queue_lock contention is less
> >>>> than 1%.
> >>>>
> >>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> >>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> >>>> ---
> >>>>  mm/huge_memory.c | 19 ++++++++++++++++---
> >>>>  1 file changed, 16 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 032fb0ef9cd1..c620f1f12247 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
> >>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >>>>  	unsigned long flags;
> >>>>  
> >>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >>>> -	if (!list_empty(&folio->_deferred_list)) {
> >>>> +	/*
> >>>> +	 * At this point, there is no one trying to queue the folio
> >>>> +	 * to deferred_list. folio->_deferred_list is not possible
> >>>> +	 * being updated.
> >>>> +	 *
> >>>> +	 * If folio is already added to deferred_list, add/delete to/from
> >>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> >>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> >>>> +	 * acquiring the lock.
> >>>> +	 *
> >>>> +	 * If folio is not in deferred_list, it's safe to check without
> >>>> +	 * acquiring the lock.
> >>>> +	 */
> >>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> >>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >>>
> >>> Recheck under lock?
> >> In function deferred_split_scan(), there is following code block:
> >>                 if (folio_try_get(folio)) {
> >>                         list_move(&folio->_deferred_list, &list);
> >>                 } else {
> >>                         /* We lost race with folio_put() */
> >>                         list_del_init(&folio->_deferred_list);
> >>                         ds_queue->split_queue_len--;
> >>                 }
> >>
> >> I am wondering what kind of "lost race with folio_put()" can be.
> >>
> >> My understanding is that it's not necessary to handle this case here
> >> because free_transhuge_page() will handle it once folio get zero ref.
> >> But I must miss something here. Thanks.
> > 
> > free_transhuge_page() got when refcount is already zero. Both
> > deferred_split_scan() and free_transhuge_page() can see the page with zero
> > refcount. The check makes deferred_split_scan() to leave the page to the
> > free_transhuge_page().
> > 
> If deferred_split_scan() leaves the page to free_transhuge_page(), is it
> necessary to do
>         list_del_init(&folio->_deferred_list);
>         ds_queue->split_queue_len--;
> 
> Can these two line be left to free_transhuge_page() either? Thanks.

I *think* (my cache is cold on deferred split) we can. But since we
already hold the lock, why not take care of it? It makes your change more
efficient.
Yin Fengwei May 1, 2023, 5:50 a.m. UTC | #12
Hi Kirill,

On 4/29/2023 4:46 PM, Kirill A. Shutemov wrote:
> On Sat, Apr 29, 2023 at 04:32:34PM +0800, Yin, Fengwei wrote:
>> Hi Kirill,
>>
>> On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
>>> On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
>>>> Hi Kirill,
>>>>
>>>> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
>>>>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>>>>> free_transhuge_page() acquires split queue lock then check
>>>>>> whether the THP was added to deferred list or not.
>>>>>>
>>>>>> It's safe to check whether the THP is in deferred list or not.
>>>>>>    When code hit free_transhuge_page(), there is no one tries
>>>>>>    to update the folio's _deferred_list.
>>>>>>
>>>>>>    If folio is not in deferred_list, it's safe to check without
>>>>>>    acquiring lock.
>>>>>>
>>>>>>    If folio is in deferred_list, the other node in deferred_list
>>>>>>    adding/deleteing doesn't impact the return value of
>>>>>>    list_epmty(@folio->_deferred_list).
>>>>>
>>>>> Typo.
>>>>>
>>>>>>
>>>>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>>>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>>>>> see the 61% split_queue_lock contention:
>>>>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>>>>>     release_pages
>>>>>>    - 70.93% release_pages
>>>>>>       - 61.42% free_transhuge_page
>>>>>>          + 60.77% _raw_spin_lock_irqsave
>>>>>>
>>>>>> With this patch applied, the split_queue_lock contention is less
>>>>>> than 1%.
>>>>>>
>>>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>> ---
>>>>>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 032fb0ef9cd1..c620f1f12247 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>>>  	unsigned long flags;
>>>>>>  
>>>>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>>>> -	if (!list_empty(&folio->_deferred_list)) {
>>>>>> +	/*
>>>>>> +	 * At this point, there is no one trying to queue the folio
>>>>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>>>>> +	 * being updated.
>>>>>> +	 *
>>>>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>>>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>>>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>>>>> +	 * acquiring the lock.
>>>>>> +	 *
>>>>>> +	 * If folio is not in deferred_list, it's safe to check without
>>>>>> +	 * acquiring the lock.
>>>>>> +	 */
>>>>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>>>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>>>
>>>>> Recheck under lock?
>>>> In function deferred_split_scan(), there is following code block:
>>>>                 if (folio_try_get(folio)) {
>>>>                         list_move(&folio->_deferred_list, &list);
>>>>                 } else {
>>>>                         /* We lost race with folio_put() */
>>>>                         list_del_init(&folio->_deferred_list);
>>>>                         ds_queue->split_queue_len--;
>>>>                 }
>>>>
>>>> I am wondering what kind of "lost race with folio_put()" can be.
>>>>
>>>> My understanding is that it's not necessary to handle this case here
>>>> because free_transhuge_page() will handle it once folio get zero ref.
>>>> But I must miss something here. Thanks.
>>>
>>> free_transhuge_page() got when refcount is already zero. Both
>>> deferred_split_scan() and free_transhuge_page() can see the page with zero
>>> refcount. The check makes deferred_split_scan() to leave the page to the
>>> free_transhuge_page().
>>>
>> If deferred_split_scan() leaves the page to free_transhuge_page(), is it
>> necessary to do
>>         list_del_init(&folio->_deferred_list);
>>         ds_queue->split_queue_len--;
>>
>> Can these two line be left to free_transhuge_page() either? Thanks.
> 
> I *think* (my cache is cold on deferred split) we can. But since we
> already hold the lock, why not take care of it? It makes your change more
> efficient.
Thanks a lot for your confirmation. I just wanted to make sure I understand
the race here correctly (I didn't notice this part of the code before Ying
pointed it out).


Regards
Yin, Fengwei

>

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..c620f1f12247 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2799,12 +2799,25 @@  void free_transhuge_page(struct page *page)
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 	unsigned long flags;
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	if (!list_empty(&folio->_deferred_list)) {
+	/*
+	 * At this point, there is no one trying to queue the folio
+	 * to deferred_list. folio->_deferred_list is not possible
+	 * being updated.
+	 *
+	 * If folio is already added to deferred_list, add/delete to/from
+	 * deferred_list will not impact list_empty(&folio->_deferred_list).
+	 * It's safe to check list_empty(&folio->_deferred_list) without
+	 * acquiring the lock.
+	 *
+	 * If folio is not in deferred_list, it's safe to check without
+	 * acquiring the lock.
+	 */
+	if (data_race(!list_empty(&folio->_deferred_list))) {
+		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 		ds_queue->split_queue_len--;
 		list_del(&folio->_deferred_list);
+		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);
 }