
[RFC,3/3] mm: Implement for_each_valid_pfn() for CONFIG_SPARSEMEM

Message ID 20250402201841.3245371-3-dwmw2@infradead.org (mailing list archive)
State New
Series [RFC,1/3] mm: Introduce for_each_valid_pfn() and use it from reserve_bootmem_region()

Commit Message

David Woodhouse April 2, 2025, 8:18 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

Introduce a first_valid_pfn() helper which takes a pointer to a PFN and
updates it to point to the first valid PFN at or after that point, and
returns true if a valid PFN was found.

This largely mirrors pfn_valid(), calling into a pfn_section_first_valid()
helper which is trivial for the !CONFIG_SPARSEMEM_VMEMMAP case, and in
the VMEMMAP case will skip to the next subsection as needed.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 include/linux/mmzone.h | 65 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

Comments

Mike Rapoport April 3, 2025, 6:24 a.m. UTC | #1
On Wed, Apr 02, 2025 at 09:18:41PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Introduce a first_valid_pfn() helper which takes a pointer to a PFN and
> updates it to point to the first valid PFN at or after that point, and
> returns true if a valid PFN was found.
> 
> This largely mirrors pfn_valid(), calling into a pfn_section_first_valid()
> helper which is trivial for the !CONFIG_SPARSEMEM_VMEMMAP case, and in
> the VMEMMAP case will skip to the next subsection as needed.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

with a small nit below

> +static inline bool first_valid_pfn(unsigned long *p_pfn)
> +{
> +	unsigned long pfn = *p_pfn;
> +	unsigned long nr = pfn_to_section_nr(pfn);
> +	struct mem_section *ms;
> +	bool ret = false;
> +
> +	ms = __pfn_to_section(pfn);
> +
> +	rcu_read_lock_sched();
> +
> +	while (!ret && nr <= __highest_present_section_nr) {

This could be just for (;;); we break anyway when ret becomes true or
when we get past the last present section.

> +		if (valid_section(ms) &&
> +		    (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
> +			ret = true;
> +			break;
> +		}
> +
> +		nr++;
> +		if (nr > __highest_present_section_nr)
> +			break;
> +
> +		pfn = section_nr_to_pfn(nr);
> +		ms = __pfn_to_section(pfn);
> +	}
> +
> +	rcu_read_unlock_sched();
> +
> +	*p_pfn = pfn;
> +
> +	return ret;
> +}
> +
> +#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn)	       \
> +	for ((_pfn) = (_start_pfn);			       \
> +	     first_valid_pfn(&(_pfn)) && (_pfn) < (_end_pfn);  \
> +	     (_pfn)++)
> +
>  #endif
>  
>  static inline int pfn_in_present_section(unsigned long pfn)
> -- 
> 2.49.0
>
David Woodhouse April 3, 2025, 7:07 a.m. UTC | #2
On Thu, 2025-04-03 at 09:24 +0300, Mike Rapoport wrote:
> On Wed, Apr 02, 2025 at 09:18:41PM +0100, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > Introduce a first_valid_pfn() helper which takes a pointer to a
> > PFN and updates it to point to the first valid PFN at or after
> > that point, and returns true if a valid PFN was found.
> > 
> > This largely mirrors pfn_valid(), calling into a
> > pfn_section_first_valid() helper which is trivial for the
> > !CONFIG_SPARSEMEM_VMEMMAP case, and in the VMEMMAP case will
> > skip to the next subsection as needed.
> > 
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> 
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Thanks.

> with a small nit below
> 
> > +static inline bool first_valid_pfn(unsigned long *p_pfn)
> > +{
> > +	unsigned long pfn = *p_pfn;
> > +	unsigned long nr = pfn_to_section_nr(pfn);
> > +	struct mem_section *ms;
> > +	bool ret = false;
> > +
> > +	ms = __pfn_to_section(pfn);
> > +
> > +	rcu_read_lock_sched();
> > +
> > +	while (!ret && nr <= __highest_present_section_nr) {
> 
> This could be just for (;;); we break anyway when ret becomes true or
> when we get past the last present section.

True for the 'ret' part but not *nicely* for the last present section.
If the original pfn is higher than the last present section it could
trigger that check before entering the loop.

Yes, in that case 'ms' will be NULL, valid_section(NULL) is false and
you're right that it'll make it through to the check in the loop
without crashing. So it would currently be harmless, but I didn't like
it. It's relying on the loop not to do the wrong thing with an input
which is arguably invalid.

I'll see if I can make it neater. I may drop the 'ret' variable
completely and just turn the match clause into unlock-and-return-true.
I *like* having a single unlock site. But I think I like simpler loop
code more than that.

FWIW I think the check for (PHYS_PFN(PFN_PHYS(pfn)) != pfn) at the
start of pfn_valid() a few lines above is similarly redundant. Because
if the high bits are set in the PFN then pfn_to_section_nr(pfn) is
surely going to be higher than NR_MEM_SECTIONS and it'll get thrown out
at the very next check, won't it?

I care because I didn't bother to duplicate that 'redundant' check in
my first_valid_pfn(), so if there's a reason for it that I'm missing, I
should take a closer look.
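
To make that argument concrete, here is a small userspace model of the
two checks. The constants are illustrative assumptions (roughly
x86_64-like), not authoritative values for any particular kernel config:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, x86_64-like values for illustration only. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27
#define MAX_PHYSMEM_BITS  46
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_MEM_SECTIONS   (1ULL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))

/* Userspace stand-ins for PFN_PHYS()/PHYS_PFN()/pfn_to_section_nr(). */
static inline uint64_t pfn_phys(uint64_t pfn)  { return pfn << PAGE_SHIFT; }
static inline uint64_t phys_pfn(uint64_t phys) { return phys >> PAGE_SHIFT; }
static inline uint64_t pfn_to_section_nr(uint64_t pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}
```

A pfn can only fail the round-trip check if it has bits set at or above
(64 - PAGE_SHIFT), and the section number of such a pfn is necessarily
far above NR_MEM_SECTIONS, so the bounds check that follows would reject
it anyway.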

I'm also missing the reason why the FLATMEM code in memory_model.h does
'unsigned long pfn_offset = ARCH_PFN_OFFSET' and then uses its local
pfn_offset variable, instead of just using ARCH_PFN_OFFSET directly as
I do in the FLATMEM for_each_valid_pfn() macro.

> > +		if (valid_section(ms) &&
> > +		    (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
> > +			ret = true;
> > +			break;
> > +		}
> > +
> > +		nr++;
> > +		if (nr > __highest_present_section_nr)
> > +			break;
> > +
> > +		pfn = section_nr_to_pfn(nr);
> > +		ms = __pfn_to_section(pfn);
> > +	}
> > +
> > +	rcu_read_unlock_sched();
> > +
> > +	*p_pfn = pfn;
> > +
> > +	return ret;
> > +}
> > +
> > +#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn)	       \
> > +	for ((_pfn) = (_start_pfn);			       \
> > +	     first_valid_pfn(&(_pfn)) && (_pfn) < (_end_pfn);  \
> > +	     (_pfn)++)
> > +
> >  #endif
> >  
> >  static inline int pfn_in_present_section(unsigned long pfn)
> > -- 
> > 2.49.0
> > 
>
David Woodhouse April 3, 2025, 7:15 a.m. UTC | #3
On Thu, 2025-04-03 at 08:07 +0100, David Woodhouse wrote:
> 
> I'll see if I can make it neater. I may drop the 'ret' variable
> completely and just turn the match clause into unlock-and-return-true.
> I *like* having a single unlock site. But I think I like simpler loop
> code more than that.

That's better (IMO).

And I note that pfn_valid() already doesn't follow the modern fetish
for having only one unlock site even when it makes the surrounding code
more complex to do so.

static inline bool first_valid_pfn(unsigned long *p_pfn)
{
	unsigned long pfn = *p_pfn;
	unsigned long nr = pfn_to_section_nr(pfn);
	struct mem_section *ms;

	rcu_read_lock_sched();

	while (nr <= __highest_present_section_nr) {
		ms = __pfn_to_section(pfn);

		if (valid_section(ms) &&
		    (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
			*p_pfn = pfn;
			rcu_read_unlock_sched();
			return true;
		}

		/* Nothing left in this section? Skip to next section */
		nr++;
		pfn = section_nr_to_pfn(nr);
	}

	rcu_read_unlock_sched();

	return false;
}
Mike Rapoport April 3, 2025, 2:10 p.m. UTC | #4
On Thu, Apr 03, 2025 at 08:07:22AM +0100, David Woodhouse wrote:
> On Thu, 2025-04-03 at 09:24 +0300, Mike Rapoport wrote:
> > with a small nit below
> > 
> > > +static inline bool first_valid_pfn(unsigned long *p_pfn)
> > > +{
> > > +	unsigned long pfn = *p_pfn;
> > > +	unsigned long nr = pfn_to_section_nr(pfn);
> > > +	struct mem_section *ms;
> > > +	bool ret = false;
> > > +
> > > +	ms = __pfn_to_section(pfn);
> > > +
> > > +	rcu_read_lock_sched();
> > > +
> > > +	while (!ret && nr <= __highest_present_section_nr) {
> > 
> > This could be just for (;;); we break anyway when ret becomes true or
> > when we get past the last present section.
> 
> True for the 'ret' part but not *nicely* for the last present section.
> If the original pfn is higher than the last present section it could
> trigger that check before entering the loop.
> 
> Yes, in that case 'ms' will be NULL, valid_section(NULL) is false and
> you're right that it'll make it through to the check in the loop
> without crashing. So it would currently be harmless, but I didn't like
> it. It's relying on the loop not to do the wrong thing with an input
> which is arguably invalid.
> 
> I'll see if I can make it neater. I may drop the 'ret' variable
> completely and just turn the match clause into unlock-and-return-true.
> I *like* having a single unlock site. But I think I like simpler loop
> code more than that.
> 
> FWIW I think the check for (PHYS_PFN(PFN_PHYS(pfn)) != pfn) at the
> start of pfn_valid() a few lines above is similarly redundant. Because
> if the high bits are set in the PFN then pfn_to_section_nr(pfn) is
> surely going to be higher than NR_MEM_SECTIONS and it'll get thrown out
> at the very next check, won't it?

I believe the check for (PHYS_PFN(PFN_PHYS(pfn)) != pfn) got to the generic
version from arm64::pfn_valid() that historically supported both FLATMEM
and SPARSEMEM.

I can't think of a configuration in which (PHYS_PFN(PFN_PHYS(pfn)) != pfn)
and pfn_to_section_nr(pfn) won't be higher than NR_MEM_SECTIONS, but with
all variants that arm64 has for PAGE_SHIFT and ARM64_PA_BITS I could miss
something.
 
> I care because I didn't bother to duplicate that 'redundant' check in
> my first_valid_pfn(), so if there's a reason for it that I'm missing, I
> should take a closer look.
> 
> I'm also missing the reason why the FLATMEM code in memory_model.h does
> 'unsigned long pfn_offset = ARCH_PFN_OFFSET' and then uses its local
> pfn_offset variable, instead of just using ARCH_PFN_OFFSET directly as
> I do in the FLATMEM for_each_valid_pfn() macro.

Don't remember now, but I surely had some $REASON for that :)
Mike Rapoport April 3, 2025, 2:13 p.m. UTC | #5
On Thu, Apr 03, 2025 at 08:15:41AM +0100, David Woodhouse wrote:
> On Thu, 2025-04-03 at 08:07 +0100, David Woodhouse wrote:
> > 
> > I'll see if I can make it neater. I may drop the 'ret' variable
> > completely and just turn the match clause into unlock-and-return-true.
> > I *like* having a single unlock site. But I think I like simpler loop
> > code more than that.
> 
> That's better (IMO).
> 
> And I note that pfn_valid() already doesn't follow the modern fetish
> for having only one unlock site even when it makes the surrounding code
> more complex to do so.
> 
> static inline bool first_valid_pfn(unsigned long *p_pfn)
> {
> 	unsigned long pfn = *p_pfn;
> 	unsigned long nr = pfn_to_section_nr(pfn);
> 	struct mem_section *ms;
> 
> 	rcu_read_lock_sched();
> 
> 	while (nr <= __highest_present_section_nr) {
> 		ms = __pfn_to_section(pfn);

Maybe move the declaration here:

 	struct mem_section *ms = __pfn_to_section(pfn);

> 
> 		if (valid_section(ms) &&
> 		    (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
> 			*p_pfn = pfn;
> 			rcu_read_unlock_sched();
> 			return true;
> 		}
> 
> 		/* Nothing left in this section? Skip to next section */
> 		nr++;
> 		pfn = section_nr_to_pfn(nr);
> 	}
> 
> 	rcu_read_unlock_sched();
> 
> 	return false;
> }
David Woodhouse April 3, 2025, 2:17 p.m. UTC | #6
On Thu, 2025-04-03 at 17:13 +0300, Mike Rapoport wrote:
>  
> > static inline bool first_valid_pfn(unsigned long *p_pfn)
> > {
> >  	unsigned long pfn = *p_pfn;
> >  	unsigned long nr = pfn_to_section_nr(pfn);
> >  	struct mem_section *ms;
> > 
> >  	rcu_read_lock_sched();
> > 
> >  	while (nr <= __highest_present_section_nr) {
> >  		ms = __pfn_to_section(pfn);
> 
> Maybe move the declaration here:
> 
>  	struct mem_section *ms = __pfn_to_section(pfn);

Ack.

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/for_each_valid_pfn
Mike Rapoport April 3, 2025, 2:25 p.m. UTC | #7
On Thu, Apr 03, 2025 at 03:17:44PM +0100, David Woodhouse wrote:
> On Thu, 2025-04-03 at 17:13 +0300, Mike Rapoport wrote:
> >  
> > > static inline bool first_valid_pfn(unsigned long *p_pfn)
> > > {
> > >  	unsigned long pfn = *p_pfn;
> > >  	unsigned long nr = pfn_to_section_nr(pfn);
> > >  	struct mem_section *ms;
> > > 
> > >  	rcu_read_lock_sched();
> > > 
> > >  	while (nr <= __highest_present_section_nr) {
> > >  		ms = __pfn_to_section(pfn);
> > 
> > Maybe move the declaration here:
> > 
> >  	struct mem_section *ms = __pfn_to_section(pfn);
> 
> Ack.
> 
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/for_each_valid_pfn

Fine with me, keep the RB tag :)

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32ecb5cadbaf..a389d1857b85 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2074,11 +2074,37 @@  static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
 
 	return usage ? test_bit(idx, usage->subsection_map) : 0;
 }
+
+static inline bool pfn_section_first_valid(struct mem_section *ms, unsigned long *pfn)
+{
+	struct mem_section_usage *usage = READ_ONCE(ms->usage);
+	int idx = subsection_map_index(*pfn);
+	unsigned long bit;
+
+	if (!usage)
+		return false;
+
+	if (test_bit(idx, usage->subsection_map))
+		return true;
+
+	/* Find the next subsection that exists */
+	bit = find_next_bit(usage->subsection_map, SUBSECTIONS_PER_SECTION, idx);
+	if (bit == SUBSECTIONS_PER_SECTION)
+		return false;
+
+	*pfn = (*pfn & PAGE_SECTION_MASK) + (bit * PAGES_PER_SUBSECTION);
+	return true;
+}
 #else
 static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
 {
 	return 1;
 }
+
+static inline bool pfn_section_first_valid(struct mem_section *ms, unsigned long *pfn)
+{
+	return true;
+}
 #endif
 
 void sparse_init_early_section(int nid, struct page *map, unsigned long pnum,
@@ -2127,6 +2153,45 @@  static inline int pfn_valid(unsigned long pfn)
 
 	return ret;
 }
+
+static inline bool first_valid_pfn(unsigned long *p_pfn)
+{
+	unsigned long pfn = *p_pfn;
+	unsigned long nr = pfn_to_section_nr(pfn);
+	struct mem_section *ms;
+	bool ret = false;
+
+	ms = __pfn_to_section(pfn);
+
+	rcu_read_lock_sched();
+
+	while (!ret && nr <= __highest_present_section_nr) {
+		if (valid_section(ms) &&
+		    (early_section(ms) || pfn_section_first_valid(ms, &pfn))) {
+			ret = true;
+			break;
+		}
+
+		nr++;
+		if (nr > __highest_present_section_nr)
+			break;
+
+		pfn = section_nr_to_pfn(nr);
+		ms = __pfn_to_section(pfn);
+	}
+
+	rcu_read_unlock_sched();
+
+	*p_pfn = pfn;
+
+	return ret;
+}
+
+#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn)	       \
+	for ((_pfn) = (_start_pfn);			       \
+	     first_valid_pfn(&(_pfn)) && (_pfn) < (_end_pfn);  \
+	     (_pfn)++)
+
 #endif
 
 static inline int pfn_in_present_section(unsigned long pfn)
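
For illustration, the iteration contract of the for_each_valid_pfn()
macro can be modelled in userspace, with a toy validity bitmap standing
in for the real section/subsection machinery. This is a sketch of the
macro's semantics, not the kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: 64 PFNs whose validity is tracked in one bitmap word. */
static unsigned long valid_map;

/* Advance *p_pfn to the first valid PFN at or after its current
 * value; return false if no valid PFN remains. */
static bool first_valid_pfn(unsigned long *p_pfn)
{
	for (unsigned long pfn = *p_pfn; pfn < 64; pfn++) {
		if (valid_map & (1UL << pfn)) {
			*p_pfn = pfn;
			return true;
		}
	}
	return false;
}

/* Same shape as the macro in the patch: skip to the next valid PFN,
 * stop once the cursor reaches _end_pfn. */
#define for_each_valid_pfn(_pfn, _start_pfn, _end_pfn)	       \
	for ((_pfn) = (_start_pfn);			       \
	     first_valid_pfn(&(_pfn)) && (_pfn) < (_end_pfn);  \
	     (_pfn)++)
```

Note that first_valid_pfn() may legitimately advance the cursor past
_end_pfn; the (_pfn) < (_end_pfn) test then terminates the loop, which
is why the bound is checked after the helper runs rather than before.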