xfs: require an rcu grace period before inode recycle

Message ID 20220121142454.1994916-1-bfoster@redhat.com (mailing list archive)
State Superseded, archived
Series xfs: require an rcu grace period before inode recycle

Commit Message

Brian Foster Jan. 21, 2022, 2:24 p.m. UTC
The XFS inode allocation algorithm aggressively reuses recently
freed inodes. This is historical behavior that has been in place for
quite some time, since XFS was imported to mainline Linux. Once the
VFS adopted RCUwalk path lookups (also some time ago), this behavior
became slightly incompatible because the inode recycle path doesn't
isolate concurrent access to the inode from the VFS.

This has recently manifested as problems in the VFS when XFS happens
to change the type or properties of a recently unlinked inode while
still involved in an RCU lookup. For example, if the VFS refers to a
previous incarnation of a symlink inode, obtains the ->get_link()
callback from inode_operations, and the latter happens to change to
a non-symlink type via a recycle event, the ->get_link() callback
pointer is reset to NULL and the lookup results in a crash.

To avoid this class of problem, isolate in-core inodes for recycling
with an RCU grace period. This is the same level of protection the
VFS expects for inactivated inodes that are never reused, and so
guarantees no further concurrent access before the type or
properties of the inode change. We don't want an unconditional
synchronize_rcu() event here because that would result in a
significant performance impact to mixed inode allocation workloads.

Fortunately, we can take advantage of the recently added deferred
inactivation mechanism to mitigate the need for an RCU wait in most
cases. Deferred inactivation queues and batches the on-disk freeing
of recently destroyed inodes, and so significantly increases the
likelihood that a grace period has elapsed by the time an inode is
freed and observable by the allocation code as a reuse candidate.
Capture the current RCU grace period cookie at inode destroy time
and refer to it at allocation time to conditionally wait for an RCU
grace period if one hadn't expired in the meantime.  Since only
unlinked inodes are recycle candidates and unlinked inodes always
require inactivation, we only need to poll and assign RCU state in
the inactivation codepath. Slightly adjust struct xfs_inode to fit
the new field into padding holes that conveniently preexist in the
same cacheline as the deferred inactivation list.

Finally, note that the ideal long term solution here is to
rearchitect bits of XFS' internal inode lifecycle management such
that this additional stall point is not required, but this requires
more thought, time and work to address. This approach restores
functional correctness in the meantime.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Hi all,

Here's the RCU fixup patch for inode reuse that I've been playing with,
re: the vfs patch discussion [1]. I've put it in pretty much the most
basic form, but I think there are a couple aspects worth thinking about:

1. Use and frequency of start_poll_synchronize_rcu() (vs.
get_state_synchronize_rcu()). The former is a bit more active than the
latter in that it triggers the start of a grace period, when necessary.
This is currently invoked per inode, which is the ideal frequency in
theory, but could be reduced (e.g. associated with the xfs_inodegc
thresholds in some manner) if there is good reason to do that.

2. The rcu cookie lifecycle. This variant updates it at inactivation
queue time and nowhere else because the RCU docs imply that counter rollover
is not a significant problem. In practice, I think this means that if an
inode is stamped at least once, and the counter rolls over, future
(non-inactivation, non-unlinked) eviction -> repopulation cycles could
trigger rcu syncs. I think this would require repeated
eviction/reinstantiation cycles within a small window to be noticeable,
so I'm not sure how likely this is to occur. We could be more defensive
by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
at recycle time, unconditionally refresh at destroy time (using
get_state_synchronize_rcu() for non-inactivation), etc.; a rough sketch of
that option follows below.
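
Illustrative only (the helpers and call sites below are assumptions for
the sake of discussion, not part of this patch):

static inline void
xfs_inode_set_destroy_gp(
	struct xfs_inode	*ip,
	bool			need_inactive)
{
	/*
	 * Record a cookie on every destroy. Only force a new grace period
	 * to start for the inactivation case; otherwise just sample the
	 * current grace-period state.
	 */
	ip->i_destroy_gp = need_inactive ? start_poll_synchronize_rcu() :
					   get_state_synchronize_rcu();
}

static inline void
xfs_inode_clear_destroy_gp(
	struct xfs_inode	*ip)
{
	/* Recycle time: wait if needed, then drop the stale cookie. */
	cond_synchronize_rcu(ip->i_destroy_gp);
	ip->i_destroy_gp = 0;
}

Unconditionally refreshing at destroy time should keep an ancient cookie
from surviving long enough for rollover to matter.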

Otherwise testing is ongoing, but this version at least survives an
fstests regression run.

Brian

[1] https://lore.kernel.org/linux-fsdevel/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/

 fs/xfs/xfs_icache.c | 11 +++++++++++
 fs/xfs/xfs_inode.h  |  3 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Jan. 21, 2022, 5:26 p.m. UTC | #1
On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> The XFS inode allocation algorithm aggressively reuses recently
> freed inodes. This is historical behavior that has been in place for
> quite some time, since XFS was imported to mainline Linux. Once the
> VFS adopted RCUwalk path lookups (also some time ago), this behavior
> became slightly incompatible because the inode recycle path doesn't
> isolate concurrent access to the inode from the VFS.
> 
> This has recently manifested as problems in the VFS when XFS happens
> to change the type or properties of a recently unlinked inode while
> still involved in an RCU lookup. For example, if the VFS refers to a
> previous incarnation of a symlink inode, obtains the ->get_link()
> callback from inode_operations, and the latter happens to change to
> a non-symlink type via a recycle event, the ->get_link() callback
> pointer is reset to NULL and the lookup results in a crash.

Hmm, so I guess what you're saying is that if the memory buffer
allocation in ->get_link is slow enough, some other thread can free the
inode, drop it, reallocate it, and reinstantiate it (not as a symlink
this time) all before ->get_link's memory allocation call returns, after
which Bad Things Happen(tm)?

Can the lookup thread end up with the wrong inode->i_ops too?

> To avoid this class of problem, isolate in-core inodes for recycling
> with an RCU grace period. This is the same level of protection the
> VFS expects for inactivated inodes that are never reused, and so
> guarantees no further concurrent access before the type or
> properties of the inode change. We don't want an unconditional
> synchronize_rcu() event here because that would result in a
> significant performance impact to mixed inode allocation workloads.
> 
> Fortunately, we can take advantage of the recently added deferred
> inactivation mechanism to mitigate the need for an RCU wait in most
> cases. Deferred inactivation queues and batches the on-disk freeing
> of recently destroyed inodes, and so significantly increases the
> likelihood that a grace period has elapsed by the time an inode is
> freed and observable by the allocation code as a reuse candidate.
> Capture the current RCU grace period cookie at inode destroy time
> and refer to it at allocation time to conditionally wait for an RCU
> grace period if one hadn't expired in the meantime.  Since only
> unlinked inodes are recycle candidates and unlinked inodes always
> require inactivation,

Any inode can become a recycle candidate (i.e. RECLAIMABLE but otherwise
idle) but I think your point here is that unlinked inodes that become
recycling candidates can cause lookup threads to trip over symlinks, and
that's why we need to assign RCU state and poll on it, right?

(That wasn't a challenge, I'm just making sure I understand this
correctly.)

> we only need to poll and assign RCU state in
> the inactivation codepath. Slightly adjust struct xfs_inode to fit
> the new field into padding holes that conveniently preexist in the
> same cacheline as the deferred inactivation list.
> 
> Finally, note that the ideal long term solution here is to
> rearchitect bits of XFS' internal inode lifecycle management such
> that this additional stall point is not required, but this requires
> more thought, time and work to address. This approach restores
> functional correctness in the meantime.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Hi all,
> 
> Here's the RCU fixup patch for inode reuse that I've been playing with,
> re: the vfs patch discussion [1]. I've put it in pretty much the most
> basic form, but I think there are a couple aspects worth thinking about:
> 
> 1. Use and frequency of start_poll_synchronize_rcu() (vs.
> get_state_synchronize_rcu()). The former is a bit more active than the
> latter in that it triggers the start of a grace period, when necessary.
> This currently invokes per inode, which is the ideal frequency in
> theory, but could be reduced, associated with the xfs_inogegc thresholds
> in some manner, etc., if there is good reason to do that.

If you rm -rf $path, do each of the inodes get a separate rcu state, or
do they share?

> 2. The rcu cookie lifecycle. This variant updates it on inactivation
> queue and nowhere else because the RCU docs imply that counter rollover
> is not a significant problem. In practice, I think this means that if an
> inode is stamped at least once, and the counter rolls over, future
> (non-inactivation, non-unlinked) eviction -> repopulation cycles could
> trigger rcu syncs. I think this would require repeated
> eviction/reinstantiation cycles within a small window to be noticeable,
> so I'm not sure how likely this is to occur. We could be more defensive
> by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
> at recycle time, unconditionally refresh at destroy time (using
> get_state_synchronize_rcu() for non-inactivation), etc.
> 
> Otherwise testing is ongoing, but this version at least survives an
> fstests regression run.
> 
> Brian
> 
> [1] https://lore.kernel.org/linux-fsdevel/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/
> 
>  fs/xfs/xfs_icache.c | 11 +++++++++++
>  fs/xfs/xfs_inode.h  |  3 ++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index d019c98eb839..4931daa45ca4 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -349,6 +349,16 @@ xfs_iget_recycle(
>  	spin_unlock(&ip->i_flags_lock);
>  	rcu_read_unlock();
>  
> +	/*
> +	 * VFS RCU pathwalk lookups dictate the same lifecycle rules for an
> +	 * inode recycle as for freeing an inode. I.e., we cannot repurpose the
> +	 * inode until a grace period has elapsed from the time the previous
> +	 * version of the inode was destroyed. In most cases a grace period has
> +	 * already elapsed if the inode was (deferred) inactivated, but
> +	 * synchronize here as a last resort to guarantee correctness.
> +	 */
> +	cond_synchronize_rcu(ip->i_destroy_gp);
> +
>  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
>  	error = xfs_reinit_inode(mp, inode);
>  	if (error) {
> @@ -2019,6 +2029,7 @@ xfs_inodegc_queue(
>  	trace_xfs_inode_set_need_inactive(ip);
>  	spin_lock(&ip->i_flags_lock);
>  	ip->i_flags |= XFS_NEED_INACTIVE;
> +	ip->i_destroy_gp = start_poll_synchronize_rcu();

Hmm.  The description says that we only need the rcu synchronization
when we're freeing an inode after its link count drops to zero, because
that's the vector for (say) the VFS inode ops actually changing due to
free/inactivate/reallocate/recycle while someone else is doing a lookup.

I'm a bit puzzled why this unconditionally starts an rcu grace period,
instead of doing so only if i_nlink==0; and why we call cond_synchronize_rcu
above unconditionally instead of checking for i_mode==0 (or whatever
state the cached inode is left in after it's freed)?

--D

>  	spin_unlock(&ip->i_flags_lock);
>  
>  	gc = get_cpu_ptr(mp->m_inodegc);
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index c447bf04205a..2153e3edbb86 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -40,8 +40,9 @@ typedef struct xfs_inode {
>  	/* Transaction and locking information. */
>  	struct xfs_inode_log_item *i_itemp;	/* logging information */
>  	mrlock_t		i_lock;		/* inode lock */
> -	atomic_t		i_pincount;	/* inode pin count */
>  	struct llist_node	i_gclist;	/* deferred inactivation list */
> +	unsigned long		i_destroy_gp;	/* destroy rcugp cookie */
> +	atomic_t		i_pincount;	/* inode pin count */
>  
>  	/*
>  	 * Bitsets of inode metadata that have been checked and/or are sick.
> -- 
> 2.31.1
>
Brian Foster Jan. 21, 2022, 6:33 p.m. UTC | #2
On Fri, Jan 21, 2022 at 09:26:03AM -0800, Darrick J. Wong wrote:
> On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> > The XFS inode allocation algorithm aggressively reuses recently
> > freed inodes. This is historical behavior that has been in place for
> > quite some time, since XFS was imported to mainline Linux. Once the
> > VFS adopted RCUwalk path lookups (also some time ago), this behavior
> > became slightly incompatible because the inode recycle path doesn't
> > isolate concurrent access to the inode from the VFS.
> > 
> > This has recently manifested as problems in the VFS when XFS happens
> > to change the type or properties of a recently unlinked inode while
> > still involved in an RCU lookup. For example, if the VFS refers to a
> > previous incarnation of a symlink inode, obtains the ->get_link()
> > callback from inode_operations, and the latter happens to change to
> > a non-symlink type via a recycle event, the ->get_link() callback
> > pointer is reset to NULL and the lookup results in a crash.
> 
> Hmm, so I guess what you're saying is that if the memory buffer
> allocation in ->get_link is slow enough, some other thread can free the
> inode, drop it, reallocate it, and reinstantiate it (not as a symlink
> this time) all before ->get_link's memory allocation call returns, after
> which Bad Things Happen(tm)?
> 
> Can the lookup thread end up with the wrong inode->i_ops too?
> 

We really don't need to even get into the XFS symlink code to reason
about the fundamental form of this issue. Consider that an RCU walk
starts, locates a symlink inode, meanwhile XFS recycles that inode into
something completely different, then the VFS loads and calls
->get_link() (which is now NULL) on said inode and explodes. So the
presumption is that the VFS uses RCU protection to rely on some form of
stability of the inode (i.e., that the inode memory isn't freed,
callback vectors don't change, etc.).

Validity of the symlink content is a variant of that class of problem,
likely already addressed by the recent inline symlink change, but that
doesn't address the broader issue.

> > To avoid this class of problem, isolate in-core inodes for recycling
> > with an RCU grace period. This is the same level of protection the
> > VFS expects for inactivated inodes that are never reused, and so
> > guarantees no further concurrent access before the type or
> > properties of the inode change. We don't want an unconditional
> > synchronize_rcu() event here because that would result in a
> > significant performance impact to mixed inode allocation workloads.
> > 
> > Fortunately, we can take advantage of the recently added deferred
> > inactivation mechanism to mitigate the need for an RCU wait in most
> > cases. Deferred inactivation queues and batches the on-disk freeing
> > of recently destroyed inodes, and so significantly increases the
> > likelihood that a grace period has elapsed by the time an inode is
> > freed and observable by the allocation code as a reuse candidate.
> > Capture the current RCU grace period cookie at inode destroy time
> > and refer to it at allocation time to conditionally wait for an RCU
> > grace period if one hadn't expired in the meantime.  Since only
> > unlinked inodes are recycle candidates and unlinked inodes always
> > require inactivation,
> 
> Any inode can become a recycle candidate (i.e. RECLAIMABLE but otherwise
> idle) but I think your point here is that unlinked inodes that become
> recycling candidates can cause lookup threads to trip over symlinks, and
> that's why we need to assign RCU state and poll on it, right?
> 

Good point. When I wrote the commit log I was thinking of recycled
inodes as "reincarnated" inodes, so that wording could probably be
improved. But yes, the code is written minimally/simply so I was trying
to document that it's unlinked -> freed -> reallocated inodes that we
really care about here.

WRT symlinks, I was trying to use that as an example and not
necessarily as the general reason for the patch. I.e., the general
reason is that the VFS uses rcu protection for inode stability (just as
for the inode free path), and the symlink thing is just an example of
how things can go wrong in the current implementation without it.

> (That wasn't a challenge, I'm just making sure I understand this
> correctly.)
> 
> > we only need to poll and assign RCU state in
> > the inactivation codepath. Slightly adjust struct xfs_inode to fit
> > the new field into padding holes that conveniently preexist in the
> > same cacheline as the deferred inactivation list.
> > 
> > Finally, note that the ideal long term solution here is to
> > rearchitect bits of XFS' internal inode lifecycle management such
> > that this additional stall point is not required, but this requires
> > more thought, time and work to address. This approach restores
> > functional correctness in the meantime.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > Hi all,
> > 
> > Here's the RCU fixup patch for inode reuse that I've been playing with,
> > re: the vfs patch discussion [1]. I've put it in pretty much the most
> > basic form, but I think there are a couple aspects worth thinking about:
> > 
> > 1. Use and frequency of start_poll_synchronize_rcu() (vs.
> > get_state_synchronize_rcu()). The former is a bit more active than the
> > latter in that it triggers the start of a grace period, when necessary.
> > This currently invokes per inode, which is the ideal frequency in
> > theory, but could be reduced, associated with the xfs_inogegc thresholds
> > in some manner, etc., if there is good reason to do that.
> 
> If you rm -rf $path, do each of the inodes get a separate rcu state, or
> do they share?
> 

My previous experiments on a teardown grace period had me thinking
batching would occur, but I don't recall which RCU call I was using at
the time so I'd probably have to throw a tracepoint in there to dump
some of the grace period values and double check to be sure. (If this is
not the case, that might be a good reason to tweak things as discussed
above).

> > 2. The rcu cookie lifecycle. This variant updates it on inactivation
> > queue and nowhere else because the RCU docs imply that counter rollover
> > is not a significant problem. In practice, I think this means that if an
> > inode is stamped at least once, and the counter rolls over, future
> > (non-inactivation, non-unlinked) eviction -> repopulation cycles could
> > trigger rcu syncs. I think this would require repeated
> > eviction/reinstantiation cycles within a small window to be noticeable,
> > so I'm not sure how likely this is to occur. We could be more defensive
> > by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
> > at recycle time, unconditionally refresh at destroy time (using
> > get_state_synchronize_rcu() for non-inactivation), etc.
> > 
> > Otherwise testing is ongoing, but this version at least survives an
> > fstests regression run.
> > 
> > Brian
> > 
> > [1] https://lore.kernel.org/linux-fsdevel/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/
> > 
> >  fs/xfs/xfs_icache.c | 11 +++++++++++
> >  fs/xfs/xfs_inode.h  |  3 ++-
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index d019c98eb839..4931daa45ca4 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -349,6 +349,16 @@ xfs_iget_recycle(
> >  	spin_unlock(&ip->i_flags_lock);
> >  	rcu_read_unlock();
> >  
> > +	/*
> > +	 * VFS RCU pathwalk lookups dictate the same lifecycle rules for an
> > +	 * inode recycle as for freeing an inode. I.e., we cannot repurpose the
> > +	 * inode until a grace period has elapsed from the time the previous
> > +	 * version of the inode was destroyed. In most cases a grace period has
> > +	 * already elapsed if the inode was (deferred) inactivated, but
> > +	 * synchronize here as a last resort to guarantee correctness.
> > +	 */
> > +	cond_synchronize_rcu(ip->i_destroy_gp);
> > +
> >  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> >  	error = xfs_reinit_inode(mp, inode);
> >  	if (error) {
> > @@ -2019,6 +2029,7 @@ xfs_inodegc_queue(
> >  	trace_xfs_inode_set_need_inactive(ip);
> >  	spin_lock(&ip->i_flags_lock);
> >  	ip->i_flags |= XFS_NEED_INACTIVE;
> > +	ip->i_destroy_gp = start_poll_synchronize_rcu();
> 
> Hmm.  The description says that we only need the rcu synchronization
> when we're freeing an inode after its link count drops to zero, because
> that's the vector for (say) the VFS inode ops actually changing due to
> free/inactivate/reallocate/recycle while someone else is doing a lookup.
> 

Right..

> I'm a bit puzzled why this unconditionally starts an rcu grace period,
> instead of done only if i_nlink==0; and why we call cond_synchronize_rcu
> above unconditionally instead of checking for i_mode==0 (or whatever
> state the cached inode is left in after it's freed)?
> 

Just an attempt to start simple and/or make any performance
test/problems more blatant. I probably could have tagged this RFC. My
primary goal with this patch was to establish whether the general
approach is sane/viable/acceptable or we need to move in another
direction.

That aside, I think it's reasonable to have explicit logic around the
unlinked case if we want to keep it restricted to that, though I would
probably implement that as a conditional i_destroy_gp assignment and let
the consumer context key off whether that field is set rather than
attempt to infer unlinked logic (and then I guess reset it back to zero
so it doesn't leak across reincarnation). That also probably facilitates
a meaningful tracepoint to track the cases that do end up syncing, which
helps with your earlier question around batching, so I'll look into
those changes once I get through broader testing.
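
Roughly, as a sketch (the call sites are the ones touched by this patch;
the tracepoint is hypothetical and only meant to show where it would hook
in):

	/* xfs_inodegc_queue(), under i_flags_lock: only stamp unlinked inodes */
	ip->i_flags |= XFS_NEED_INACTIVE;
	if (VFS_I(ip)->i_nlink == 0)
		ip->i_destroy_gp = start_poll_synchronize_rcu();

	/* xfs_iget_recycle(): only wait if a cookie was recorded */
	if (ip->i_destroy_gp) {
		if (!poll_state_synchronize_rcu(ip->i_destroy_gp)) {
			trace_xfs_iget_recycle_rcuwait(ip);	/* hypothetical */
			cond_synchronize_rcu(ip->i_destroy_gp);
		}
		ip->i_destroy_gp = 0;	/* don't leak across reincarnation */
	}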

Brian

> --D
> 
> >  	spin_unlock(&ip->i_flags_lock);
> >  
> >  	gc = get_cpu_ptr(mp->m_inodegc);
> > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > index c447bf04205a..2153e3edbb86 100644
> > --- a/fs/xfs/xfs_inode.h
> > +++ b/fs/xfs/xfs_inode.h
> > @@ -40,8 +40,9 @@ typedef struct xfs_inode {
> >  	/* Transaction and locking information. */
> >  	struct xfs_inode_log_item *i_itemp;	/* logging information */
> >  	mrlock_t		i_lock;		/* inode lock */
> > -	atomic_t		i_pincount;	/* inode pin count */
> >  	struct llist_node	i_gclist;	/* deferred inactivation list */
> > +	unsigned long		i_destroy_gp;	/* destroy rcugp cookie */
> > +	atomic_t		i_pincount;	/* inode pin count */
> >  
> >  	/*
> >  	 * Bitsets of inode metadata that have been checked and/or are sick.
> > -- 
> > 2.31.1
> > 
>
Paul E. McKenney Jan. 22, 2022, 5:30 a.m. UTC | #3
On Fri, Jan 21, 2022 at 01:33:46PM -0500, Brian Foster wrote:
> On Fri, Jan 21, 2022 at 09:26:03AM -0800, Darrick J. Wong wrote:
> > On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> > > The XFS inode allocation algorithm aggressively reuses recently
> > > freed inodes. This is historical behavior that has been in place for
> > > quite some time, since XFS was imported to mainline Linux. Once the
> > > VFS adopted RCUwalk path lookups (also some time ago), this behavior
> > > became slightly incompatible because the inode recycle path doesn't
> > > isolate concurrent access to the inode from the VFS.
> > > 
> > > This has recently manifested as problems in the VFS when XFS happens
> > > to change the type or properties of a recently unlinked inode while
> > > still involved in an RCU lookup. For example, if the VFS refers to a
> > > previous incarnation of a symlink inode, obtains the ->get_link()
> > > callback from inode_operations, and the latter happens to change to
> > > a non-symlink type via a recycle event, the ->get_link() callback
> > > pointer is reset to NULL and the lookup results in a crash.
> > 
> > Hmm, so I guess what you're saying is that if the memory buffer
> > allocation in ->get_link is slow enough, some other thread can free the
> > inode, drop it, reallocate it, and reinstantiate it (not as a symlink
> > this time) all before ->get_link's memory allocation call returns, after
> > which Bad Things Happen(tm)?
> > 
> > Can the lookup thread end up with the wrong inode->i_ops too?
> > 
> 
> We really don't need to even get into the XFS symlink code to reason
> about the fundamental form of this issue. Consider that an RCU walk
> starts, locates a symlink inode, meanwhile XFS recycles that inode into
> something completely different, then the VFS loads and calls
> ->get_link() (which is now NULL) on said inode and explodes. So the
> presumption is that the VFS uses RCU protection to rely on some form of
> stability of the inode (i.e., that the inode memory isn't freed,
> callback vectors don't change, etc.).
> 
> Validity of the symlink content is a variant of that class of problem,
> likely already addressed by the recent inline symlink change, but that
> doesn't address the broader issue.
> 
> > > To avoid this class of problem, isolate in-core inodes for recycling
> > > with an RCU grace period. This is the same level of protection the
> > > VFS expects for inactivated inodes that are never reused, and so
> > > guarantees no further concurrent access before the type or
> > > properties of the inode change. We don't want an unconditional
> > > synchronize_rcu() event here because that would result in a
> > > significant performance impact to mixed inode allocation workloads.
> > > 
> > > Fortunately, we can take advantage of the recently added deferred
> > > inactivation mechanism to mitigate the need for an RCU wait in most
> > > cases. Deferred inactivation queues and batches the on-disk freeing
> > > of recently destroyed inodes, and so significantly increases the
> > > likelihood that a grace period has elapsed by the time an inode is
> > > freed and observable by the allocation code as a reuse candidate.
> > > Capture the current RCU grace period cookie at inode destroy time
> > > and refer to it at allocation time to conditionally wait for an RCU
> > > grace period if one hadn't expired in the meantime.  Since only
> > > unlinked inodes are recycle candidates and unlinked inodes always
> > > require inactivation,
> > 
> > Any inode can become a recycle candidate (i.e. RECLAIMABLE but otherwise
> > idle) but I think your point here is that unlinked inodes that become
> > recycling candidates can cause lookup threads to trip over symlinks, and
> > that's why we need to assign RCU state and poll on it, right?
> > 
> 
> Good point. When I wrote the commit log I was thinking of recycled
> inodes as "reincarnated" inodes, so that wording could probably be
> improved. But yes, the code is written minimally/simply so I was trying
> to document that it's unlinked -> freed -> reallocated inodes that we
> really care about here.
> 
> WRT to symlinks, I was trying to use that as an example and not
> necessarily as the general reason for the patch. I.e., the general
> reason is that the VFS uses rcu protection for inode stability (just as
> for the inode free path), and the symlink thing is just an example of
> how things can go wrong in the current implementation without it.
> 
> > (That wasn't a challenge, I'm just making sure I understand this
> > correctly.)
> > 
> > > we only need to poll and assign RCU state in
> > > the inactivation codepath. Slightly adjust struct xfs_inode to fit
> > > the new field into padding holes that conveniently preexist in the
> > > same cacheline as the deferred inactivation list.
> > > 
> > > Finally, note that the ideal long term solution here is to
> > > rearchitect bits of XFS' internal inode lifecycle management such
> > > that this additional stall point is not required, but this requires
> > > more thought, time and work to address. This approach restores
> > > functional correctness in the meantime.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > 
> > > Hi all,
> > > 
> > > Here's the RCU fixup patch for inode reuse that I've been playing with,
> > > re: the vfs patch discussion [1]. I've put it in pretty much the most
> > > basic form, but I think there are a couple aspects worth thinking about:
> > > 
> > > 1. Use and frequency of start_poll_synchronize_rcu() (vs.
> > > get_state_synchronize_rcu()). The former is a bit more active than the
> > > latter in that it triggers the start of a grace period, when necessary.
> > > This currently invokes per inode, which is the ideal frequency in
> > > theory, but could be reduced, associated with the xfs_inogegc thresholds
> > > in some manner, etc., if there is good reason to do that.
> > 
> > If you rm -rf $path, do each of the inodes get a separate rcu state, or
> > do they share?
> 
> My previous experiments on a teardown grace period had me thinking
> batching would occur, but I don't recall which RCU call I was using at
> the time so I'd probably have to throw a tracepoint in there to dump
> some of the grace period values and double check to be sure. (If this is
> not the case, that might be a good reason to tweak things as discussed
> above).

An RCU grace period typically takes some milliseconds to complete, so a
great many inodes would end up being tagged for the same grace period.
For example, if "rm -rf" could delete one file per microsecond, the
first few thousand files would be tagged with one grace period,
the next few thousand with the next grace period, and so on.

In the unlikely event that RCU was totally idle when the "rm -rf"
started, the very first file might get its own grace period, but
they would batch in the thousands thereafter.

On start_poll_synchronize_rcu() vs. get_state_synchronize_rcu(), if
there is always other RCU update activity, get_state_synchronize_rcu()
is just fine.  So if XFS does a call_rcu() or synchronize_rcu() every
so often, all you need here is get_state_synchronize_rcu().

Another approach is to do a start_poll_synchronize_rcu() every 1,000
events, and use get_state_synchronize_rcu() otherwise.  And there are
a lot of possible variations on that theme.

But why not just try always doing start_poll_synchronize_rcu() and
only bother with get_state_synchronize_rcu() if that turns out to
be too slow?
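
As a concrete sketch of that hybrid (the counter and the batch size are
arbitrary and purely illustrative):

static atomic_t xfs_destroy_gp_count = ATOMIC_INIT(0);

static unsigned long
xfs_destroy_gp_cookie(void)
{
	/*
	 * Kick a new grace period at most once per 1000 destroy events;
	 * otherwise just sample the current grace-period state.
	 */
	if (atomic_inc_return(&xfs_destroy_gp_count) % 1000 == 1)
		return start_poll_synchronize_rcu();
	return get_state_synchronize_rcu();
}

The periodic start_poll_synchronize_rcu() is what guarantees the sampled
cookies eventually expire even if there is no other RCU update activity.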

> > > 2. The rcu cookie lifecycle. This variant updates it on inactivation
> > > queue and nowhere else because the RCU docs imply that counter rollover
> > > is not a significant problem. In practice, I think this means that if an
> > > inode is stamped at least once, and the counter rolls over, future
> > > (non-inactivation, non-unlinked) eviction -> repopulation cycles could
> > > trigger rcu syncs. I think this would require repeated
> > > eviction/reinstantiation cycles within a small window to be noticeable,
> > > so I'm not sure how likely this is to occur. We could be more defensive
> > > by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
> > > at recycle time, unconditionally refresh at destroy time (using
> > > get_state_synchronize_rcu() for non-inactivation), etc.

Even on a 32-bit system that is running RCU grace periods as fast as they
will go, it will take about 12 days to overflow that counter.  But if
you have an inode sitting on the list for that long, yes, you could
see unnecessary synchronous grace-period waits.

Would it help if there was an API that gave you a special cookie value
that cond_synchronize_rcu() and friends recognized as "already expired"?
That way if poll_state_synchronize_rcu() says that original cookie
has expired, you could replace that cookie value with one that would
stay expired.  Maybe a get_expired_synchronize_rcu() or some such?

							Thanx, Paul

> > > Otherwise testing is ongoing, but this version at least survives an
> > > fstests regression run.
> > > 
> > > Brian
> > > 
> > > [1] https://lore.kernel.org/linux-fsdevel/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/
> > > 
> > >  fs/xfs/xfs_icache.c | 11 +++++++++++
> > >  fs/xfs/xfs_inode.h  |  3 ++-
> > >  2 files changed, 13 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index d019c98eb839..4931daa45ca4 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > > @@ -349,6 +349,16 @@ xfs_iget_recycle(
> > >  	spin_unlock(&ip->i_flags_lock);
> > >  	rcu_read_unlock();
> > >  
> > > +	/*
> > > +	 * VFS RCU pathwalk lookups dictate the same lifecycle rules for an
> > > +	 * inode recycle as for freeing an inode. I.e., we cannot repurpose the
> > > +	 * inode until a grace period has elapsed from the time the previous
> > > +	 * version of the inode was destroyed. In most cases a grace period has
> > > +	 * already elapsed if the inode was (deferred) inactivated, but
> > > +	 * synchronize here as a last resort to guarantee correctness.
> > > +	 */
> > > +	cond_synchronize_rcu(ip->i_destroy_gp);
> > > +
> > >  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> > >  	error = xfs_reinit_inode(mp, inode);
> > >  	if (error) {
> > > @@ -2019,6 +2029,7 @@ xfs_inodegc_queue(
> > >  	trace_xfs_inode_set_need_inactive(ip);
> > >  	spin_lock(&ip->i_flags_lock);
> > >  	ip->i_flags |= XFS_NEED_INACTIVE;
> > > +	ip->i_destroy_gp = start_poll_synchronize_rcu();
> > 
> > Hmm.  The description says that we only need the rcu synchronization
> > when we're freeing an inode after its link count drops to zero, because
> > that's the vector for (say) the VFS inode ops actually changing due to
> > free/inactivate/reallocate/recycle while someone else is doing a lookup.
> > 
> 
> Right..
> 
> > I'm a bit puzzled why this unconditionally starts an rcu grace period,
> > instead of done only if i_nlink==0; and why we call cond_synchronize_rcu
> > above unconditionally instead of checking for i_mode==0 (or whatever
> > state the cached inode is left in after it's freed)?
> > 
> 
> Just an attempt to start simple and/or make any performance
> test/problems more blatant. I probably could have tagged this RFC. My
> primary goal with this patch was to establish whether the general
> approach is sane/viable/acceptable or we need to move in another
> direction.
> 
> That aside, I think it's reasonable to have explicit logic around the
> unlinked case if we want to keep it restricted to that, though I would
> probably implement that as a conditional i_destroy_gp assignment and let
> the consumer context key off whether that field is set rather than
> attempt to infer unlinked logic (and then I guess reset it back to zero
> so it doesn't leak across reincarnation). That also probably facilitates
> a meaningful tracepoint to track the cases that do end up syncing, which
> helps with your earlier question around batching, so I'll look into
> those changes once I get through broader testing
> 
> Brian
> 
> > --D
> > 
> > >  	spin_unlock(&ip->i_flags_lock);
> > >  
> > >  	gc = get_cpu_ptr(mp->m_inodegc);
> > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > > index c447bf04205a..2153e3edbb86 100644
> > > --- a/fs/xfs/xfs_inode.h
> > > +++ b/fs/xfs/xfs_inode.h
> > > @@ -40,8 +40,9 @@ typedef struct xfs_inode {
> > >  	/* Transaction and locking information. */
> > >  	struct xfs_inode_log_item *i_itemp;	/* logging information */
> > >  	mrlock_t		i_lock;		/* inode lock */
> > > -	atomic_t		i_pincount;	/* inode pin count */
> > >  	struct llist_node	i_gclist;	/* deferred inactivation list */
> > > +	unsigned long		i_destroy_gp;	/* destroy rcugp cookie */
> > > +	atomic_t		i_pincount;	/* inode pin count */
> > >  
> > >  	/*
> > >  	 * Bitsets of inode metadata that have been checked and/or are sick.
> > > -- 
> > > 2.31.1
> > > 
> > 
>
Paul E. McKenney Jan. 22, 2022, 4:55 p.m. UTC | #4
On Fri, Jan 21, 2022 at 09:30:19PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 21, 2022 at 01:33:46PM -0500, Brian Foster wrote:

[ . . . ]

> > My previous experiments on a teardown grace period had me thinking
> > batching would occur, but I don't recall which RCU call I was using at
> > the time so I'd probably have to throw a tracepoint in there to dump
> > some of the grace period values and double check to be sure. (If this is
> > not the case, that might be a good reason to tweak things as discussed
> > above).
> 
> An RCU grace period typically takes some milliseconds to complete, so a
> great many inodes would end up being tagged for the same grace period.
> For example, if "rm -rf" could delete one file per microsecond, the
> first few thousand files would be tagged with one grace period,
> the next few thousand with the next grace period, and so on.
> 
> In the unlikely event that RCU was totally idle when the "rm -rf"
> started, the very first file might get its own grace period, but
> they would batch in the thousands thereafter.
> 
> On start_poll_synchronize_rcu() vs. get_state_synchronize_rcu(), if
> there is always other RCU update activity, get_state_synchronize_rcu()
> is just fine.  So if XFS does a call_rcu() or synchronize_rcu() every
> so often, all you need here is get_state_synchronize_rcu()().
> 
> Another approach is to do a start_poll_synchronize_rcu() every 1,000
> events, and use get_state_synchronize_rcu() otherwise.  And there are
> a lot of possible variations on that theme.
> 
> But why not just try always doing start_poll_synchronize_rcu() and
> only bother with get_state_synchronize_rcu() if that turns out to
> be too slow?

Plus there are a few optimizations I could apply that would speed up
get_state_synchronize_rcu(), for example, reducing lock contention.
But I would of course have to see a need before increasing complexity.

							Thanx, Paul
Dave Chinner Jan. 23, 2022, 10:43 p.m. UTC | #5
On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> The XFS inode allocation algorithm aggressively reuses recently
> freed inodes. This is historical behavior that has been in place for
> quite some time, since XFS was imported to mainline Linux. Once the
> VFS adopted RCUwalk path lookups (also some time ago), this behavior
> became slightly incompatible because the inode recycle path doesn't
> isolate concurrent access to the inode from the VFS.
> 
> This has recently manifested as problems in the VFS when XFS happens
> to change the type or properties of a recently unlinked inode while
> still involved in an RCU lookup. For example, if the VFS refers to a
> previous incarnation of a symlink inode, obtains the ->get_link()
> callback from inode_operations, and the latter happens to change to
> a non-symlink type via a recycle event, the ->get_link() callback
> pointer is reset to NULL and the lookup results in a crash.
> 
> To avoid this class of problem, isolate in-core inodes for recycling
> with an RCU grace period. This is the same level of protection the
> VFS expects for inactivated inodes that are never reused, and so
> guarantees no further concurrent access before the type or
> properties of the inode change. We don't want an unconditional
> synchronize_rcu() event here because that would result in a
> significant performance impact to mixed inode allocation workloads.
> 
> Fortunately, we can take advantage of the recently added deferred
> inactivation mechanism to mitigate the need for an RCU wait in most
> cases. Deferred inactivation queues and batches the on-disk freeing
> of recently destroyed inodes, and so significantly increases the
> likelihood that a grace period has elapsed by the time an inode is
> freed and observable by the allocation code as a reuse candidate.
> Capture the current RCU grace period cookie at inode destroy time
> and refer to it at allocation time to conditionally wait for an RCU
> grace period if one hadn't expired in the meantime.  Since only
> unlinked inodes are recycle candidates and unlinked inodes always
> require inactivation, we only need to poll and assign RCU state in
> the inactivation codepath.

I think this assertion is incorrect.

Recycling can occur on any inode that has been evicted from the VFS
cache. i.e. while the inode is sitting in XFS_IRECLAIMABLE state
waiting for the background inodegc to run (every ~5s by default) a
->lookup from the VFS can occur and we find that same inode sitting
there in XFS_IRECLAIMABLE state. This lookup then hits the recycle
path.

In this case, even though we re-instantiate the inode into the same
identity, it goes through a transient state where the inode has it's
identity returned to the default initial "just allocated" VFS state
and this transient state can be visible from RCU lookups within the
RCU grace period the inode was evicted from. This means the RCU
lookup could see the inode with i_ops having been reset to
&empty_ops, which means any method called on the inode at this time
(e.g. ->get_link) will hit a NULL pointer dereference.

This requires multiple concurrent lookups on the same inode that
just got evicted, some of which find the old stale dentry/inode pair
via the RCU pathwalk and others that don't find that old pair. This is
much harder to trip over but, IIRC, we used to see this quite a lot
with NFS server workloads when multiple operations on a single inode
could come in from multiple clients and be processed in parallel by
knfsd threads. This was quite a hot path before the NFS server had an
open-file cache added to it, and it probably still is if the NFS
server OFC is not large enough for the working set of files being
accessed...

Hence we have to ensure that RCU lookups can't find an evicted inode
through anything other than xfs_iget() while we are re-instantiating
the VFS inode state in xfs_iget_recycle().  Hence the RCU state
sampling needs to be done unconditionally for all inodes going
through ->destroy_inode so we can ensure grace periods expire for
all inodes being recycled, not just those that required
inactivation...
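
Concretely, something like the following (sketch only; hooking
xfs_inode_mark_reclaimable() is an assumption about where the
->destroy_inode path lands, and the rest of the function is elided):

void
xfs_inode_mark_reclaimable(
	struct xfs_inode	*ip)
{
	/*
	 * Every evicted inode is a potential recycle candidate, so sample
	 * the grace-period state here regardless of whether the inode is
	 * queued for inactivation or goes straight to XFS_IRECLAIMABLE.
	 * get_state_synchronize_rcu() may be the cheaper choice if other
	 * RCU activity keeps grace periods moving.
	 */
	ip->i_destroy_gp = start_poll_synchronize_rcu();

	/* ... existing NEED_INACTIVE vs. reclaimable decision unchanged ... */
}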

Cheers,

Dave.
Brian Foster Jan. 24, 2022, 3:02 p.m. UTC | #6
On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> The XFS inode allocation algorithm aggressively reuses recently
> freed inodes. This is historical behavior that has been in place for
> quite some time, since XFS was imported to mainline Linux. Once the
> VFS adopted RCUwalk path lookups (also some time ago), this behavior
> became slightly incompatible because the inode recycle path doesn't
> isolate concurrent access to the inode from the VFS.
> 
> This has recently manifested as problems in the VFS when XFS happens
> to change the type or properties of a recently unlinked inode while
> still involved in an RCU lookup. For example, if the VFS refers to a
> previous incarnation of a symlink inode, obtains the ->get_link()
> callback from inode_operations, and the latter happens to change to
> a non-symlink type via a recycle event, the ->get_link() callback
> pointer is reset to NULL and the lookup results in a crash.
> 
> To avoid this class of problem, isolate in-core inodes for recycling
> with an RCU grace period. This is the same level of protection the
> VFS expects for inactivated inodes that are never reused, and so
> guarantees no further concurrent access before the type or
> properties of the inode change. We don't want an unconditional
> synchronize_rcu() event here because that would result in a
> significant performance impact to mixed inode allocation workloads.
> 
> Fortunately, we can take advantage of the recently added deferred
> inactivation mechanism to mitigate the need for an RCU wait in most
> cases. Deferred inactivation queues and batches the on-disk freeing
> of recently destroyed inodes, and so significantly increases the
> likelihood that a grace period has elapsed by the time an inode is
> freed and observable by the allocation code as a reuse candidate.
> Capture the current RCU grace period cookie at inode destroy time
> and refer to it at allocation time to conditionally wait for an RCU
> grace period if one hadn't expired in the meantime.  Since only
> unlinked inodes are recycle candidates and unlinked inodes always
> require inactivation, we only need to poll and assign RCU state in
> the inactivation codepath. Slightly adjust struct xfs_inode to fit
> the new field into padding holes that conveniently preexist in the
> same cacheline as the deferred inactivation list.
> 
> Finally, note that the ideal long term solution here is to
> rearchitect bits of XFS' internal inode lifecycle management such
> that this additional stall point is not required, but this requires
> more thought, time and work to address. This approach restores
> functional correctness in the meantime.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Hi all,
> 
> Here's the RCU fixup patch for inode reuse that I've been playing with,
> re: the vfs patch discussion [1]. I've put it in pretty much the most
> basic form, but I think there are a couple aspects worth thinking about:
> 
> 1. Use and frequency of start_poll_synchronize_rcu() (vs.
> get_state_synchronize_rcu()). The former is a bit more active than the
> latter in that it triggers the start of a grace period, when necessary.
> This currently invokes per inode, which is the ideal frequency in
> theory, but could be reduced, associated with the xfs_inogegc thresholds
> in some manner, etc., if there is good reason to do that.
> 
> 2. The rcu cookie lifecycle. This variant updates it on inactivation
> queue and nowhere else because the RCU docs imply that counter rollover
> is not a significant problem. In practice, I think this means that if an
> inode is stamped at least once, and the counter rolls over, future
> (non-inactivation, non-unlinked) eviction -> repopulation cycles could
> trigger rcu syncs. I think this would require repeated
> eviction/reinstantiation cycles within a small window to be noticeable,
> so I'm not sure how likely this is to occur. We could be more defensive
> by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
> at recycle time, unconditionally refresh at destroy time (using
> get_state_synchronize_rcu() for non-inactivation), etc.
> 
> Otherwise testing is ongoing, but this version at least survives an
> fstests regression run.
> 

FYI, I modified my repeated alloc/free test to do some batching and turned
it into something better able to measure the potential side effect / cost
of the grace period sync. The test is a single-threaded file alloc/free
loop using a variable per-iteration batch size. The test runs for ~60s
and reports how many total files were allocated/freed in that period
with the specified batch size. Note that this particular test ran
without any background workload. Results are as follows:

	files		baseline	test

	1		38480		38437
	4		126055		111080
	8		218299		134469
	16		306619		141968
	32		397909		152267
	64		418603		200875
	128		469077		289365
	256		684117		566016
	512		931328		878933
	1024		1126741		1118891

The first column shows the batch size of the test run while the second
and third show results (averaged across three test runs) for the
baseline (5.16.0-rc5) and test kernels. This basically shows that as the
inactivation queue more efficiently batches removals, the number of
stalls on the allocation side increase accordingly and thus slow the
task down. This becomes significant by around 8 files per alloc/free
iteration and seems to recover at around 512 files per iteration.
Outside of those values, the additional overhead appears to be mostly
masked.
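
For reference, the test is roughly equivalent to the following (a
reconstruction of the idea for illustration, not the actual harness; run
it from a directory on the test filesystem with the batch size as the
only argument):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int batch = argc > 1 ? atoi(argv[1]) : 1;
	long total = 0;
	time_t stop = time(NULL) + 60;
	char name[64];

	while (time(NULL) < stop) {
		for (int i = 0; i < batch; i++) {
			int fd;

			snprintf(name, sizeof(name), "tmp%d", i);
			fd = open(name, O_CREAT | O_EXCL | O_WRONLY, 0644);
			if (fd < 0)
				return 1;
			close(fd);
		}
		for (int i = 0; i < batch; i++) {
			snprintf(name, sizeof(name), "tmp%d", i);
			unlink(name);
		}
		total += batch;
	}
	printf("%ld files allocated/freed\n", total);
	return 0;
}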

I'm not sure how realistic this sort of symmetric/predictable workload
is in the wild, but this is more designed to show potential impact of
the change. The delay cost can be shifted to the remove side to some
degree if we wanted to go that route. E.g., a quick experiment to add an
rcu sync in the inactivation path right before the inode is freed allows
this test to behave much more in line with baseline up through about the
256 file mark, after which point results start to fall off as I suspect
we start to measure stalls in the remove side.
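
Something along these lines, for illustration (placing the wait at the
top of xfs_inactive_ifree() is an assumption, not necessarily what that
quick experiment actually did):

	/*
	 * Pay the grace-period wait on the inactivation side, before
	 * xfs_ifree() makes the inode allocatable again, instead of in the
	 * allocation-side recycle path.
	 */
	cond_synchronize_rcu(ip->i_destroy_gp);

	/* ... existing transaction allocation and xfs_ifree() call ... */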

That's just a test of a quick hack, however. Since there is no real
urgency to inactivate an unlinked inode (it has no potential users until
it's freed), I suspect that result can be further optimized to absorb
the cost of an rcu delay by deferring the steps that make the inode
available for reallocation in the first place. In theory if that can be
made completely asynchronous, then there is no real latency cost at all
because nothing can use the inode until it's ultimately free on disk.
However in reality we must have thresholds and whatnot to ensure the
outstanding queue cannot grow out of control. My previous experiments
suggest that an RCU delay on the inactivation side is measurable via a
simple 'rm -rf' with the current thresholds, but can be mitigated if the
pipeline/thresholds are tuned up a bit to accommodate the added delay.
This has more complexity and tradeoffs, but IMO, this is something we
should be thinking about at least as a next step to something like this
patch.

Brian

> Brian
> 
> [1] https://lore.kernel.org/linux-fsdevel/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/
> 
>  fs/xfs/xfs_icache.c | 11 +++++++++++
>  fs/xfs/xfs_inode.h  |  3 ++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index d019c98eb839..4931daa45ca4 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -349,6 +349,16 @@ xfs_iget_recycle(
>  	spin_unlock(&ip->i_flags_lock);
>  	rcu_read_unlock();
>  
> +	/*
> +	 * VFS RCU pathwalk lookups dictate the same lifecycle rules for an
> +	 * inode recycle as for freeing an inode. I.e., we cannot repurpose the
> +	 * inode until a grace period has elapsed from the time the previous
> +	 * version of the inode was destroyed. In most cases a grace period has
> +	 * already elapsed if the inode was (deferred) inactivated, but
> +	 * synchronize here as a last resort to guarantee correctness.
> +	 */
> +	cond_synchronize_rcu(ip->i_destroy_gp);
> +
>  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
>  	error = xfs_reinit_inode(mp, inode);
>  	if (error) {
> @@ -2019,6 +2029,7 @@ xfs_inodegc_queue(
>  	trace_xfs_inode_set_need_inactive(ip);
>  	spin_lock(&ip->i_flags_lock);
>  	ip->i_flags |= XFS_NEED_INACTIVE;
> +	ip->i_destroy_gp = start_poll_synchronize_rcu();
>  	spin_unlock(&ip->i_flags_lock);
>  
>  	gc = get_cpu_ptr(mp->m_inodegc);
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index c447bf04205a..2153e3edbb86 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -40,8 +40,9 @@ typedef struct xfs_inode {
>  	/* Transaction and locking information. */
>  	struct xfs_inode_log_item *i_itemp;	/* logging information */
>  	mrlock_t		i_lock;		/* inode lock */
> -	atomic_t		i_pincount;	/* inode pin count */
>  	struct llist_node	i_gclist;	/* deferred inactivation list */
> +	unsigned long		i_destroy_gp;	/* destroy rcugp cookie */
> +	atomic_t		i_pincount;	/* inode pin count */
>  
>  	/*
>  	 * Bitsets of inode metadata that have been checked and/or are sick.
> -- 
> 2.31.1
>
Brian Foster Jan. 24, 2022, 3:06 p.m. UTC | #7
On Mon, Jan 24, 2022 at 09:43:46AM +1100, Dave Chinner wrote:
> On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> > The XFS inode allocation algorithm aggressively reuses recently
> > freed inodes. This is historical behavior that has been in place for
> > quite some time, since XFS was imported to mainline Linux. Once the
> > VFS adopted RCUwalk path lookups (also some time ago), this behavior
> > became slightly incompatible because the inode recycle path doesn't
> > isolate concurrent access to the inode from the VFS.
> > 
> > This has recently manifested as problems in the VFS when XFS happens
> > to change the type or properties of a recently unlinked inode while
> > still involved in an RCU lookup. For example, if the VFS refers to a
> > previous incarnation of a symlink inode, obtains the ->get_link()
> > callback from inode_operations, and the latter happens to change to
> > a non-symlink type via a recycle event, the ->get_link() callback
> > pointer is reset to NULL and the lookup results in a crash.
> > 
> > To avoid this class of problem, isolate in-core inodes for recycling
> > with an RCU grace period. This is the same level of protection the
> > VFS expects for inactivated inodes that are never reused, and so
> > guarantees no further concurrent access before the type or
> > properties of the inode change. We don't want an unconditional
> > synchronize_rcu() event here because that would result in a
> > significant performance impact to mixed inode allocation workloads.
> > 
> > Fortunately, we can take advantage of the recently added deferred
> > inactivation mechanism to mitigate the need for an RCU wait in most
> > cases. Deferred inactivation queues and batches the on-disk freeing
> > of recently destroyed inodes, and so significantly increases the
> > likelihood that a grace period has elapsed by the time an inode is
> > freed and observable by the allocation code as a reuse candidate.
> > Capture the current RCU grace period cookie at inode destroy time
> > and refer to it at allocation time to conditionally wait for an RCU
> > grace period if one hadn't expired in the meantime.  Since only
> > unlinked inodes are recycle candidates and unlinked inodes always
> > require inactivation, we only need to poll and assign RCU state in
> > the inactivation codepath.
> 
> I think this assertion is incorrect.
> 
> Recycling can occur on any inode that has been evicted from the VFS
> cache. i.e. while the inode is sitting in XFS_IRECLAIMABLE state
> waiting for the background inodegc to run (every ~5s by default) a
> ->lookup from the VFS can occur and we find that same inode sitting
> there in XFS_IRECLAIMABLE state. This lookup then hits the recycle
> path.
> 

See my reply to Darrick wrt the poor wording. I'm aware of the
eviction -> recycle case, just didn't think we needed to deal with it
here.

> In this case, even though we re-instantiate the inode into the same
> identity, it goes through a transient state where the inode has it's
> identity returned to the default initial "just allocated" VFS state
> and this transient state can be visible from RCU lookups within the
> RCU grace period the inode was evicted from. This means the RCU
> lookup could see the inode with i_ops having been reset to
> &empty_ops, which means any method called on the inode at this time
> (e.g. ->get_link) will hit a NULL pointer dereference.
> 

Hmm, good point.

> This requires multiple concurrent lookups on the same inode that
> just got evicted, some which the RCU pathwalk finds the old stale
> dentry/inode pair, and others that don't find that old pair. This is
> much harder to trip over but, IIRC, we used to see this quite a lot
> with NFS server workloads when multiple operations on a single inode
> could come in from multiple clients and be processed in parallel by
> knfsd threads. This was quite a hot path before the NFS server had an
> open-file cache added to it, and it probably still is if the NFS
> server OFC is not large enough for the working set of files being
> accessed...
> 
> Hence we have to ensure that RCU lookups can't find an evicted inode
> through anything other than xfs_iget() while we are re-instantiating
> the VFS inode state in xfs_iget_recycle().  Hence the RCU state
> sampling needs to be done unconditionally for all inodes going
> through ->destroy_inode so we can ensure grace periods expire for
> all inodes being recycled, not just those that required
> inactivation...
> 

Yeah, that makes sense. So this means we don't want to filter to
unlinked inodes, but OTOH Paul's feedback suggests the RCU calls should
be fairly efficient on a per-inode basis. On top of that, the
non-unlinked eviction case doesn't have such a direct impact on a mixed
workload the way the unlinked case does (i.e. inactivation populating a
free inode record for the next inode allocation to discover), so this is
probably less significant of a change.

Personally, my general takeaway from the just-posted test results is
that we really should be thinking about how to shift the allocation path
cost over to the inactivation side, even if not done from the start.
This changes things a bit because we know we need an rcu sync in the
iget path for the (non-unlinked) eviction case regardless, so perhaps
the right approach is to get the basic functional fix in place to start,
then revisit potential optimizations in the inactivation path for the
unlinked inode case. IOW, a conditional, asynchronous rcu delay in the
inactivation path (only) for unlinked inodes doesn't remove the need for
an iget rcu sync in general, but it would still improve inode allocation
performance if we ensure those inodes aren't reallocatable until a grace
period has elapsed. We just have to implement it in a way that doesn't
unreasonably impact sustained removal performance. Thoughts?
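To make that a bit more concrete, the sort of thing I'm imagining on the
inactivation side is roughly the sketch below. This is purely hypothetical
and untested; the helper name is made up, and only i_destroy_gp and the RCU
polling interfaces come from the patch/discussion:

static bool
xfs_inodegc_want_defer(
	struct xfs_inode	*ip)
{
	/* Linked inodes aren't reallocation candidates; process as usual. */
	if (VFS_I(ip)->i_nlink)
		return false;

	/*
	 * Unlinked inode: don't let the worker free it on disk (and thus
	 * make it reallocatable) until the grace period sampled at destroy
	 * time has expired. A caller would requeue rather than block here.
	 */
	return !poll_state_synchronize_rcu(ip->i_destroy_gp);
}

(Where exactly the requeue would happen is obviously the hand-wavy part.)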

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Brian Foster Jan. 24, 2022, 3:12 p.m. UTC | #8
On Fri, Jan 21, 2022 at 09:30:19PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 21, 2022 at 01:33:46PM -0500, Brian Foster wrote:
> > On Fri, Jan 21, 2022 at 09:26:03AM -0800, Darrick J. Wong wrote:
> > > On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> > > > The XFS inode allocation algorithm aggressively reuses recently
> > > > freed inodes. This is historical behavior that has been in place for
> > > > quite some time, since XFS was imported to mainline Linux. Once the
> > > > VFS adopted RCUwalk path lookups (also some time ago), this behavior
> > > > became slightly incompatible because the inode recycle path doesn't
> > > > isolate concurrent access to the inode from the VFS.
> > > > 
> > > > This has recently manifested as problems in the VFS when XFS happens
> > > > to change the type or properties of a recently unlinked inode while
> > > > still involved in an RCU lookup. For example, if the VFS refers to a
> > > > previous incarnation of a symlink inode, obtains the ->get_link()
> > > > callback from inode_operations, and the latter happens to change to
> > > > a non-symlink type via a recycle event, the ->get_link() callback
> > > > pointer is reset to NULL and the lookup results in a crash.
> > > 
> > > Hmm, so I guess what you're saying is that if the memory buffer
> > > allocation in ->get_link is slow enough, some other thread can free the
> > > inode, drop it, reallocate it, and reinstantiate it (not as a symlink
> > > this time) all before ->get_link's memory allocation call returns, after
> > > which Bad Things Happen(tm)?
> > > 
> > > Can the lookup thread end up with the wrong inode->i_ops too?
> > > 
> > 
> > We really don't need to even get into the XFS symlink code to reason
> > about the fundamental form of this issue. Consider that an RCU walk
> > starts, locates a symlink inode, meanwhile XFS recycles that inode into
> > something completely different, then the VFS loads and calls
> > ->get_link() (which is now NULL) on said inode and explodes. So the
> > presumption is that the VFS uses RCU protection to rely on some form of
> > stability of the inode (i.e., that the inode memory isn't freed,
> > callback vectors don't change, etc.).
> > 
> > Validity of the symlink content is a variant of that class of problem,
> > likely already addressed by the recent inline symlink change, but that
> > doesn't address the broader issue.
> > 
> > > > To avoid this class of problem, isolate in-core inodes for recycling
> > > > with an RCU grace period. This is the same level of protection the
> > > > VFS expects for inactivated inodes that are never reused, and so
> > > > guarantees no further concurrent access before the type or
> > > > properties of the inode change. We don't want an unconditional
> > > > synchronize_rcu() event here because that would result in a
> > > > significant performance impact to mixed inode allocation workloads.
> > > > 
> > > > Fortunately, we can take advantage of the recently added deferred
> > > > inactivation mechanism to mitigate the need for an RCU wait in most
> > > > cases. Deferred inactivation queues and batches the on-disk freeing
> > > > of recently destroyed inodes, and so significantly increases the
> > > > likelihood that a grace period has elapsed by the time an inode is
> > > > freed and observable by the allocation code as a reuse candidate.
> > > > Capture the current RCU grace period cookie at inode destroy time
> > > > and refer to it at allocation time to conditionally wait for an RCU
> > > > grace period if one hadn't expired in the meantime.  Since only
> > > > unlinked inodes are recycle candidates and unlinked inodes always
> > > > require inactivation,
> > > 
> > > Any inode can become a recycle candidate (i.e. RECLAIMABLE but otherwise
> > > idle) but I think your point here is that unlinked inodes that become
> > > recycling candidates can cause lookup threads to trip over symlinks, and
> > > that's why we need to assign RCU state and poll on it, right?
> > > 
> > 
> > Good point. When I wrote the commit log I was thinking of recycled
> > inodes as "reincarnated" inodes, so that wording could probably be
> > improved. But yes, the code is written minimally/simply so I was trying
> > to document that it's unlinked -> freed -> reallocated inodes that we
> > really care about here.
> > 
> > WRT to symlinks, I was trying to use that as an example and not
> > necessarily as the general reason for the patch. I.e., the general
> > reason is that the VFS uses rcu protection for inode stability (just as
> > for the inode free path), and the symlink thing is just an example of
> > how things can go wrong in the current implementation without it.
> > 
> > > (That wasn't a challenge, I'm just making sure I understand this
> > > correctly.)
> > > 
> > > > we only need to poll and assign RCU state in
> > > > the inactivation codepath. Slightly adjust struct xfs_inode to fit
> > > > the new field into padding holes that conveniently preexist in the
> > > > same cacheline as the deferred inactivation list.
> > > > 
> > > > Finally, note that the ideal long term solution here is to
> > > > rearchitect bits of XFS' internal inode lifecycle management such
> > > > that this additional stall point is not required, but this requires
> > > > more thought, time and work to address. This approach restores
> > > > functional correctness in the meantime.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > > 
> > > > Hi all,
> > > > 
> > > > Here's the RCU fixup patch for inode reuse that I've been playing with,
> > > > re: the vfs patch discussion [1]. I've put it in pretty much the most
> > > > basic form, but I think there are a couple aspects worth thinking about:
> > > > 
> > > > 1. Use and frequency of start_poll_synchronize_rcu() (vs.
> > > > get_state_synchronize_rcu()). The former is a bit more active than the
> > > > latter in that it triggers the start of a grace period, when necessary.
> > > > This currently invokes per inode, which is the ideal frequency in
> > > > theory, but could be reduced, associated with the xfs_inogegc thresholds
> > > > in some manner, etc., if there is good reason to do that.
> > > 
> > > If you rm -rf $path, do each of the inodes get a separate rcu state, or
> > > do they share?
> > 
> > My previous experiments on a teardown grace period had me thinking
> > batching would occur, but I don't recall which RCU call I was using at
> > the time so I'd probably have to throw a tracepoint in there to dump
> > some of the grace period values and double check to be sure. (If this is
> > not the case, that might be a good reason to tweak things as discussed
> > above).
> 
> An RCU grace period typically takes some milliseconds to complete, so a
> great many inodes would end up being tagged for the same grace period.
> For example, if "rm -rf" could delete one file per microsecond, the
> first few thousand files would be tagged with one grace period,
> the next few thousand with the next grace period, and so on.
> 
> In the unlikely event that RCU was totally idle when the "rm -rf"
> started, the very first file might get its own grace period, but
> they would batch in the thousands thereafter.
> 

Great, thanks for the info.

> On start_poll_synchronize_rcu() vs. get_state_synchronize_rcu(), if
> there is always other RCU update activity, get_state_synchronize_rcu()
> is just fine.  So if XFS does a call_rcu() or synchronize_rcu() every
> so often, all you need here is get_state_synchronize_rcu().
> 
> Another approach is to do a start_poll_synchronize_rcu() every 1,000
> events, and use get_state_synchronize_rcu() otherwise.  And there are
> a lot of possible variations on that theme.
> 
> But why not just try always doing start_poll_synchronize_rcu() and
> only bother with get_state_synchronize_rcu() if that turns out to
> be too slow?
> 

Ack, that makes sense to me. We use call_rcu() to free inode memory and
obviously will have a sync in the lookup path after this patch, but that
is a consequence of the polling we add at the same time. I'm not sure
that's enough activity on our own so I'd probably prefer to keep things
simple, use the start_poll_*() variant from the start, and then consider
further start/get filtering like you describe above if it ever becomes a
problem.
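
If it did become a problem, I'd expect the filtering to be about as simple as
the sketch below. The helper and the 1,000-destroy threshold are made up
purely for illustration, and a real version would presumably use a per-cpu
counter rather than a shared atomic:

static unsigned long
xfs_destroy_gp_cookie(void)
{
	static atomic_t	destroys = ATOMIC_INIT(0);

	/*
	 * Only force a new grace period to start every ~1000 destroys;
	 * otherwise just sample whatever grace period is already current.
	 */
	if (atomic_inc_return(&destroys) % 1000 == 0)
		return start_poll_synchronize_rcu();
	return get_state_synchronize_rcu();
}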

> > > > 2. The rcu cookie lifecycle. This variant updates it on inactivation
> > > > queue and nowhere else because the RCU docs imply that counter rollover
> > > > is not a significant problem. In practice, I think this means that if an
> > > > inode is stamped at least once, and the counter rolls over, future
> > > > (non-inactivation, non-unlinked) eviction -> repopulation cycles could
> > > > trigger rcu syncs. I think this would require repeated
> > > > eviction/reinstantiation cycles within a small window to be noticeable,
> > > > so I'm not sure how likely this is to occur. We could be more defensive
> > > > by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
> > > > at recycle time, unconditionally refresh at destroy time (using
> > > > get_state_synchronize_rcu() for non-inactivation), etc.
> 
> Even on a 32-bit system that is running RCU grace periods as fast as they
> will go, it will take about 12 days to overflow that counter.  But if
> you have an inode sitting on the list for that long, yes, you could
> see unnecessary synchronous grace-period waits.
> 
> Would it help if there was an API that gave you a special cookie value
> that cond_synchronize_rcu() and friends recognized as "already expired"?
> That way if poll_state_synchronize_rcu() says that original cookie
> has expired, you could replace that cookie value with one that would
> stay expired.  Maybe a get_expired_synchronize_rcu() or some such?
> 

Hmm.. so I think this would be helpful if we were to stamp the inode
conditionally (i.e. unlinked inodes only) on eviction because then we
wouldn't have to worry about clearing the cookie if said inode happens
to be reallocated and then run through one or more eviction -> recycle
sequences after a rollover of the grace period counter. With that sort
of scheme, the inode could be sitting in cache for who knows how long
with a counter that was conditionally synced against many days (or
weeks?) prior, from whenever it was initially reallocated.

However, since Dave points out that we probably want to poll RCU state on
every inode eviction, I suspect that means this is less of an issue. An
inode must be evicted for it to become a recycle candidate, and so if we
update the inode unconditionally on every eviction, then I think the
recycle code should always see the most recent cookie value and we don't
have to worry much about clearing it.
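Roughly like this (sketch only; I'm assuming xfs_inode_mark_reclaimable() is
the right spot for the unconditional stamp, and eliding the existing
teardown):

void
xfs_inode_mark_reclaimable(
	struct xfs_inode	*ip)
{
	/*
	 * Stamp the grace period cookie for every evicted inode, not just
	 * those that require inactivation, and do it before the inode is
	 * tagged reclaimable so a racing recycle can't see a stale value.
	 */
	ip->i_destroy_gp = start_poll_synchronize_rcu();

	/* ... existing inactivation/reclaim queueing as today ... */
}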

I think it's technically possible for an inode to sit in an inactivation
queue for that sort of time period, but that would probably require the
filesystem go idle or drop to low enough activity that a spurious rcu
sync here or there is probably not a big deal. So all in all, I suspect
if we already had such a special cookie variant of the API that was
otherwise functionally equivalent, I'd probably use it to cover that
potential case, but it's not clear to me atm that this use case
necessarily warrants introduction of such an API...

Brian

> 							Thanx, Paul
> 
> > > > Otherwise testing is ongoing, but this version at least survives an
> > > > fstests regression run.
> > > > 
> > > > Brian
> > > > 
> > > > [1] https://lore.kernel.org/linux-fsdevel/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/
> > > > 
> > > >  fs/xfs/xfs_icache.c | 11 +++++++++++
> > > >  fs/xfs/xfs_inode.h  |  3 ++-
> > > >  2 files changed, 13 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > index d019c98eb839..4931daa45ca4 100644
> > > > --- a/fs/xfs/xfs_icache.c
> > > > +++ b/fs/xfs/xfs_icache.c
> > > > @@ -349,6 +349,16 @@ xfs_iget_recycle(
> > > >  	spin_unlock(&ip->i_flags_lock);
> > > >  	rcu_read_unlock();
> > > >  
> > > > +	/*
> > > > +	 * VFS RCU pathwalk lookups dictate the same lifecycle rules for an
> > > > +	 * inode recycle as for freeing an inode. I.e., we cannot repurpose the
> > > > +	 * inode until a grace period has elapsed from the time the previous
> > > > +	 * version of the inode was destroyed. In most cases a grace period has
> > > > +	 * already elapsed if the inode was (deferred) inactivated, but
> > > > +	 * synchronize here as a last resort to guarantee correctness.
> > > > +	 */
> > > > +	cond_synchronize_rcu(ip->i_destroy_gp);
> > > > +
> > > >  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> > > >  	error = xfs_reinit_inode(mp, inode);
> > > >  	if (error) {
> > > > @@ -2019,6 +2029,7 @@ xfs_inodegc_queue(
> > > >  	trace_xfs_inode_set_need_inactive(ip);
> > > >  	spin_lock(&ip->i_flags_lock);
> > > >  	ip->i_flags |= XFS_NEED_INACTIVE;
> > > > +	ip->i_destroy_gp = start_poll_synchronize_rcu();
> > > 
> > > Hmm.  The description says that we only need the rcu synchronization
> > > when we're freeing an inode after its link count drops to zero, because
> > > that's the vector for (say) the VFS inode ops actually changing due to
> > > free/inactivate/reallocate/recycle while someone else is doing a lookup.
> > > 
> > 
> > Right..
> > 
> > > I'm a bit puzzled why this unconditionally starts an rcu grace period,
> > > instead of done only if i_nlink==0; and why we call cond_synchronize_rcu
> > > above unconditionally instead of checking for i_mode==0 (or whatever
> > > state the cached inode is left in after it's freed)?
> > > 
> > 
> > Just an attempt to start simple and/or make any performance
> > test/problems more blatant. I probably could have tagged this RFC. My
> > primary goal with this patch was to establish whether the general
> > approach is sane/viable/acceptable or we need to move in another
> > direction.
> > 
> > That aside, I think it's reasonable to have explicit logic around the
> > unlinked case if we want to keep it restricted to that, though I would
> > probably implement that as a conditional i_destroy_gp assignment and let
> > the consumer context key off whether that field is set rather than
> > attempt to infer unlinked logic (and then I guess reset it back to zero
> > so it doesn't leak across reincarnation). That also probably facilitates
> > a meaningful tracepoint to track the cases that do end up syncing, which
> > helps with your earlier question around batching, so I'll look into
> > those changes once I get through broader testing
> > 
> > Brian
> > 
> > > --D
> > > 
> > > >  	spin_unlock(&ip->i_flags_lock);
> > > >  
> > > >  	gc = get_cpu_ptr(mp->m_inodegc);
> > > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > > > index c447bf04205a..2153e3edbb86 100644
> > > > --- a/fs/xfs/xfs_inode.h
> > > > +++ b/fs/xfs/xfs_inode.h
> > > > @@ -40,8 +40,9 @@ typedef struct xfs_inode {
> > > >  	/* Transaction and locking information. */
> > > >  	struct xfs_inode_log_item *i_itemp;	/* logging information */
> > > >  	mrlock_t		i_lock;		/* inode lock */
> > > > -	atomic_t		i_pincount;	/* inode pin count */
> > > >  	struct llist_node	i_gclist;	/* deferred inactivation list */
> > > > +	unsigned long		i_destroy_gp;	/* destroy rcugp cookie */
> > > > +	atomic_t		i_pincount;	/* inode pin count */
> > > >  
> > > >  	/*
> > > >  	 * Bitsets of inode metadata that have been checked and/or are sick.
> > > > -- 
> > > > 2.31.1
> > > > 
> > > 
> > 
>
Paul E. McKenney Jan. 24, 2022, 4:40 p.m. UTC | #9
On Mon, Jan 24, 2022 at 10:12:45AM -0500, Brian Foster wrote:
> On Fri, Jan 21, 2022 at 09:30:19PM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 21, 2022 at 01:33:46PM -0500, Brian Foster wrote:
> > > On Fri, Jan 21, 2022 at 09:26:03AM -0800, Darrick J. Wong wrote:
> > > > On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> > > > > The XFS inode allocation algorithm aggressively reuses recently
> > > > > freed inodes. This is historical behavior that has been in place for
> > > > > quite some time, since XFS was imported to mainline Linux. Once the
> > > > > VFS adopted RCUwalk path lookups (also some time ago), this behavior
> > > > > became slightly incompatible because the inode recycle path doesn't
> > > > > isolate concurrent access to the inode from the VFS.
> > > > > 
> > > > > This has recently manifested as problems in the VFS when XFS happens
> > > > > to change the type or properties of a recently unlinked inode while
> > > > > still involved in an RCU lookup. For example, if the VFS refers to a
> > > > > previous incarnation of a symlink inode, obtains the ->get_link()
> > > > > callback from inode_operations, and the latter happens to change to
> > > > > a non-symlink type via a recycle event, the ->get_link() callback
> > > > > pointer is reset to NULL and the lookup results in a crash.
> > > > 
> > > > Hmm, so I guess what you're saying is that if the memory buffer
> > > > allocation in ->get_link is slow enough, some other thread can free the
> > > > inode, drop it, reallocate it, and reinstantiate it (not as a symlink
> > > > this time) all before ->get_link's memory allocation call returns, after
> > > > which Bad Things Happen(tm)?
> > > > 
> > > > Can the lookup thread end up with the wrong inode->i_ops too?
> > > > 
> > > 
> > > We really don't need to even get into the XFS symlink code to reason
> > > about the fundamental form of this issue. Consider that an RCU walk
> > > starts, locates a symlink inode, meanwhile XFS recycles that inode into
> > > something completely different, then the VFS loads and calls
> > > ->get_link() (which is now NULL) on said inode and explodes. So the
> > > presumption is that the VFS uses RCU protection to rely on some form of
> > > stability of the inode (i.e., that the inode memory isn't freed,
> > > callback vectors don't change, etc.).
> > > 
> > > Validity of the symlink content is a variant of that class of problem,
> > > likely already addressed by the recent inline symlink change, but that
> > > doesn't address the broader issue.
> > > 
> > > > > To avoid this class of problem, isolate in-core inodes for recycling
> > > > > with an RCU grace period. This is the same level of protection the
> > > > > VFS expects for inactivated inodes that are never reused, and so
> > > > > guarantees no further concurrent access before the type or
> > > > > properties of the inode change. We don't want an unconditional
> > > > > synchronize_rcu() event here because that would result in a
> > > > > significant performance impact to mixed inode allocation workloads.
> > > > > 
> > > > > Fortunately, we can take advantage of the recently added deferred
> > > > > inactivation mechanism to mitigate the need for an RCU wait in most
> > > > > cases. Deferred inactivation queues and batches the on-disk freeing
> > > > > of recently destroyed inodes, and so significantly increases the
> > > > > likelihood that a grace period has elapsed by the time an inode is
> > > > > freed and observable by the allocation code as a reuse candidate.
> > > > > Capture the current RCU grace period cookie at inode destroy time
> > > > > and refer to it at allocation time to conditionally wait for an RCU
> > > > > grace period if one hadn't expired in the meantime.  Since only
> > > > > unlinked inodes are recycle candidates and unlinked inodes always
> > > > > require inactivation,
> > > > 
> > > > Any inode can become a recycle candidate (i.e. RECLAIMABLE but otherwise
> > > > idle) but I think your point here is that unlinked inodes that become
> > > > recycling candidates can cause lookup threads to trip over symlinks, and
> > > > that's why we need to assign RCU state and poll on it, right?
> > > > 
> > > 
> > > Good point. When I wrote the commit log I was thinking of recycled
> > > inodes as "reincarnated" inodes, so that wording could probably be
> > > improved. But yes, the code is written minimally/simply so I was trying
> > > to document that it's unlinked -> freed -> reallocated inodes that we
> > > really care about here.
> > > 
> > > WRT to symlinks, I was trying to use that as an example and not
> > > necessarily as the general reason for the patch. I.e., the general
> > > reason is that the VFS uses rcu protection for inode stability (just as
> > > for the inode free path), and the symlink thing is just an example of
> > > how things can go wrong in the current implementation without it.
> > > 
> > > > (That wasn't a challenge, I'm just making sure I understand this
> > > > correctly.)
> > > > 
> > > > > we only need to poll and assign RCU state in
> > > > > the inactivation codepath. Slightly adjust struct xfs_inode to fit
> > > > > the new field into padding holes that conveniently preexist in the
> > > > > same cacheline as the deferred inactivation list.
> > > > > 
> > > > > Finally, note that the ideal long term solution here is to
> > > > > rearchitect bits of XFS' internal inode lifecycle management such
> > > > > that this additional stall point is not required, but this requires
> > > > > more thought, time and work to address. This approach restores
> > > > > functional correctness in the meantime.
> > > > > 
> > > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > > ---
> > > > > 
> > > > > Hi all,
> > > > > 
> > > > > Here's the RCU fixup patch for inode reuse that I've been playing with,
> > > > > re: the vfs patch discussion [1]. I've put it in pretty much the most
> > > > > basic form, but I think there are a couple aspects worth thinking about:
> > > > > 
> > > > > 1. Use and frequency of start_poll_synchronize_rcu() (vs.
> > > > > get_state_synchronize_rcu()). The former is a bit more active than the
> > > > > latter in that it triggers the start of a grace period, when necessary.
> > > > > This currently invokes per inode, which is the ideal frequency in
> > > > > theory, but could be reduced, associated with the xfs_inogegc thresholds
> > > > > in some manner, etc., if there is good reason to do that.
> > > > 
> > > > If you rm -rf $path, do each of the inodes get a separate rcu state, or
> > > > do they share?
> > > 
> > > My previous experiments on a teardown grace period had me thinking
> > > batching would occur, but I don't recall which RCU call I was using at
> > > the time so I'd probably have to throw a tracepoint in there to dump
> > > some of the grace period values and double check to be sure. (If this is
> > > not the case, that might be a good reason to tweak things as discussed
> > > above).
> > 
> > An RCU grace period typically takes some milliseconds to complete, so a
> > great many inodes would end up being tagged for the same grace period.
> > For example, if "rm -rf" could delete one file per microsecond, the
> > first few thousand files would be tagged with one grace period,
> > the next few thousand with the next grace period, and so on.
> > 
> > In the unlikely event that RCU was totally idle when the "rm -rf"
> > started, the very first file might get its own grace period, but
> > they would batch in the thousands thereafter.
> > 
> 
> Great, thanks for the info.
> 
> > On start_poll_synchronize_rcu() vs. get_state_synchronize_rcu(), if
> > there is always other RCU update activity, get_state_synchronize_rcu()
> > is just fine.  So if XFS does a call_rcu() or synchronize_rcu() every
> > so often, all you need here is get_state_synchronize_rcu().
> > 
> > Another approach is to do a start_poll_synchronize_rcu() every 1,000
> > events, and use get_state_synchronize_rcu() otherwise.  And there are
> > a lot of possible variations on that theme.
> > 
> > But why not just try always doing start_poll_synchronize_rcu() and
> > only bother with get_state_synchronize_rcu() if that turns out to
> > be too slow?
> > 
> 
> Ack, that makes sense to me. We use call_rcu() to free inode memory and
> obviously will have a sync in the lookup path after this patch, but that
> is a consequence of the polling we add at the same time. I'm not sure
> that's enough activity on our own so I'd probably prefer to keep things
> simple, use the start_poll_*() variant from the start, and then consider
> further start/get filtering like you describe above if it ever becomes a
> problem.
> 
> > > > > 2. The rcu cookie lifecycle. This variant updates it on inactivation
> > > > > queue and nowhere else because the RCU docs imply that counter rollover
> > > > > is not a significant problem. In practice, I think this means that if an
> > > > > inode is stamped at least once, and the counter rolls over, future
> > > > > (non-inactivation, non-unlinked) eviction -> repopulation cycles could
> > > > > trigger rcu syncs. I think this would require repeated
> > > > > eviction/reinstantiation cycles within a small window to be noticeable,
> > > > > so I'm not sure how likely this is to occur. We could be more defensive
> > > > > by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
> > > > > at recycle time, unconditionally refresh at destroy time (using
> > > > > get_state_synchronize_rcu() for non-inactivation), etc.
> > 
> > Even on a 32-bit system that is running RCU grace periods as fast as they
> > will go, it will take about 12 days to overflow that counter.  But if
> > you have an inode sitting on the list for that long, yes, you could
> > see unnecessary synchronous grace-period waits.
> > 
> > Would it help if there was an API that gave you a special cookie value
> > that cond_synchronize_rcu() and friends recognized as "already expired"?
> > That way if poll_state_synchronize_rcu() says that original cookie
> > has expired, you could replace that cookie value with one that would
> > stay expired.  Maybe a get_expired_synchronize_rcu() or some such?
> > 
> 
> Hmm.. so I think this would be helpful if we were to stamp the inode
> conditionally (i.e. unlinked inodes only) on eviction because then we
> wouldn't have to worry about clearing the cookie if said inode happens
> to be reallocated and then run through one or more eviction -> recycle
> sequences after a rollover of the grace period counter. With that sort
> of scheme, the inode could be sitting in cache for who knows how long
> with a counter that was conditionally synced against many days (or
> weeks?) prior, from whenever it was initially reallocated.
> 
> However, since Dave points out that we probably want to poll RCU state on
> every inode eviction, I suspect that means this is less of an issue. An
> inode must be evicted for it to become a recycle candidate, and so if we
> update the inode unconditionally on every eviction, then I think the
> recycle code should always see the most recent cookie value and we don't
> have to worry much about clearing it.
> 
> I think it's technically possible for an inode to sit in an inactivation
> queue for that sort of time period, but that would probably require the
> filesystem go idle or drop to low enough activity that a spurious rcu
> sync here or there is probably not a big deal. So all in all, I suspect
> if we already had such a special cookie variant of the API that was
> otherwise functionally equivalent, I'd probably use it to cover that
> potential case, but it's not clear to me atm that this use case
> necessarily warrants introduction of such an API...

If you need it, it happens to be easy to provide.  If you don't need it,
I am of course happy to avoid adding another RCU API member.  ;-)

							Thanx, Paul

> > > > > Otherwise testing is ongoing, but this version at least survives an
> > > > > fstests regression run.
> > > > > 
> > > > > Brian
> > > > > 
> > > > > [1] https://lore.kernel.org/linux-fsdevel/164180589176.86426.501271559065590169.stgit@mickey.themaw.net/
> > > > > 
> > > > >  fs/xfs/xfs_icache.c | 11 +++++++++++
> > > > >  fs/xfs/xfs_inode.h  |  3 ++-
> > > > >  2 files changed, 13 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > > index d019c98eb839..4931daa45ca4 100644
> > > > > --- a/fs/xfs/xfs_icache.c
> > > > > +++ b/fs/xfs/xfs_icache.c
> > > > > @@ -349,6 +349,16 @@ xfs_iget_recycle(
> > > > >  	spin_unlock(&ip->i_flags_lock);
> > > > >  	rcu_read_unlock();
> > > > >  
> > > > > +	/*
> > > > > +	 * VFS RCU pathwalk lookups dictate the same lifecycle rules for an
> > > > > +	 * inode recycle as for freeing an inode. I.e., we cannot repurpose the
> > > > > +	 * inode until a grace period has elapsed from the time the previous
> > > > > +	 * version of the inode was destroyed. In most cases a grace period has
> > > > > +	 * already elapsed if the inode was (deferred) inactivated, but
> > > > > +	 * synchronize here as a last resort to guarantee correctness.
> > > > > +	 */
> > > > > +	cond_synchronize_rcu(ip->i_destroy_gp);
> > > > > +
> > > > >  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> > > > >  	error = xfs_reinit_inode(mp, inode);
> > > > >  	if (error) {
> > > > > @@ -2019,6 +2029,7 @@ xfs_inodegc_queue(
> > > > >  	trace_xfs_inode_set_need_inactive(ip);
> > > > >  	spin_lock(&ip->i_flags_lock);
> > > > >  	ip->i_flags |= XFS_NEED_INACTIVE;
> > > > > +	ip->i_destroy_gp = start_poll_synchronize_rcu();
> > > > 
> > > > Hmm.  The description says that we only need the rcu synchronization
> > > > when we're freeing an inode after its link count drops to zero, because
> > > > that's the vector for (say) the VFS inode ops actually changing due to
> > > > free/inactivate/reallocate/recycle while someone else is doing a lookup.
> > > > 
> > > 
> > > Right..
> > > 
> > > > I'm a bit puzzled why this unconditionally starts an rcu grace period,
> > > > instead of done only if i_nlink==0; and why we call cond_synchronize_rcu
> > > > above unconditionally instead of checking for i_mode==0 (or whatever
> > > > state the cached inode is left in after it's freed)?
> > > > 
> > > 
> > > Just an attempt to start simple and/or make any performance
> > > test/problems more blatant. I probably could have tagged this RFC. My
> > > primary goal with this patch was to establish whether the general
> > > approach is sane/viable/acceptable or we need to move in another
> > > direction.
> > > 
> > > That aside, I think it's reasonable to have explicit logic around the
> > > unlinked case if we want to keep it restricted to that, though I would
> > > probably implement that as a conditional i_destroy_gp assignment and let
> > > the consumer context key off whether that field is set rather than
> > > attempt to infer unlinked logic (and then I guess reset it back to zero
> > > so it doesn't leak across reincarnation). That also probably facilitates
> > > a meaningful tracepoint to track the cases that do end up syncing, which
> > > helps with your earlier question around batching, so I'll look into
> > > those changes once I get through broader testing
> > > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > >  	spin_unlock(&ip->i_flags_lock);
> > > > >  
> > > > >  	gc = get_cpu_ptr(mp->m_inodegc);
> > > > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > > > > index c447bf04205a..2153e3edbb86 100644
> > > > > --- a/fs/xfs/xfs_inode.h
> > > > > +++ b/fs/xfs/xfs_inode.h
> > > > > @@ -40,8 +40,9 @@ typedef struct xfs_inode {
> > > > >  	/* Transaction and locking information. */
> > > > >  	struct xfs_inode_log_item *i_itemp;	/* logging information */
> > > > >  	mrlock_t		i_lock;		/* inode lock */
> > > > > -	atomic_t		i_pincount;	/* inode pin count */
> > > > >  	struct llist_node	i_gclist;	/* deferred inactivation list */
> > > > > +	unsigned long		i_destroy_gp;	/* destroy rcugp cookie */
> > > > > +	atomic_t		i_pincount;	/* inode pin count */
> > > > >  
> > > > >  	/*
> > > > >  	 * Bitsets of inode metadata that have been checked and/or are sick.
> > > > > -- 
> > > > > 2.31.1
> > > > > 
> > > > 
> > > 
> > 
>
Dave Chinner Jan. 24, 2022, 10:08 p.m. UTC | #10
On Mon, Jan 24, 2022 at 10:02:27AM -0500, Brian Foster wrote:
> On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> > The XFS inode allocation algorithm aggressively reuses recently
> > freed inodes. This is historical behavior that has been in place for
> > quite some time, since XFS was imported to mainline Linux. Once the
> > VFS adopted RCUwalk path lookups (also some time ago), this behavior
> > became slightly incompatible because the inode recycle path doesn't
> > isolate concurrent access to the inode from the VFS.
> > 
> > This has recently manifested as problems in the VFS when XFS happens
> > to change the type or properties of a recently unlinked inode while
> > still involved in an RCU lookup. For example, if the VFS refers to a
> > previous incarnation of a symlink inode, obtains the ->get_link()
> > callback from inode_operations, and the latter happens to change to
> > a non-symlink type via a recycle event, the ->get_link() callback
> > pointer is reset to NULL and the lookup results in a crash.
> > 
> > To avoid this class of problem, isolate in-core inodes for recycling
> > with an RCU grace period. This is the same level of protection the
> > VFS expects for inactivated inodes that are never reused, and so
> > guarantees no further concurrent access before the type or
> > properties of the inode change. We don't want an unconditional
> > synchronize_rcu() event here because that would result in a
> > significant performance impact to mixed inode allocation workloads.
> > 
> > Fortunately, we can take advantage of the recently added deferred
> > inactivation mechanism to mitigate the need for an RCU wait in most
> > cases. Deferred inactivation queues and batches the on-disk freeing
> > of recently destroyed inodes, and so significantly increases the
> > likelihood that a grace period has elapsed by the time an inode is
> > freed and observable by the allocation code as a reuse candidate.
> > Capture the current RCU grace period cookie at inode destroy time
> > and refer to it at allocation time to conditionally wait for an RCU
> > grace period if one hadn't expired in the meantime.  Since only
> > unlinked inodes are recycle candidates and unlinked inodes always
> > require inactivation, we only need to poll and assign RCU state in
> > the inactivation codepath. Slightly adjust struct xfs_inode to fit
> > the new field into padding holes that conveniently preexist in the
> > same cacheline as the deferred inactivation list.
> > 
> > Finally, note that the ideal long term solution here is to
> > rearchitect bits of XFS' internal inode lifecycle management such
> > that this additional stall point is not required, but this requires
> > more thought, time and work to address. This approach restores
> > functional correctness in the meantime.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > Hi all,
> > 
> > Here's the RCU fixup patch for inode reuse that I've been playing with,
> > re: the vfs patch discussion [1]. I've put it in pretty much the most
> > basic form, but I think there are a couple aspects worth thinking about:
> > 
> > 1. Use and frequency of start_poll_synchronize_rcu() (vs.
> > get_state_synchronize_rcu()). The former is a bit more active than the
> > latter in that it triggers the start of a grace period, when necessary.
> > This currently invokes per inode, which is the ideal frequency in
> > theory, but could be reduced, associated with the xfs_inogegc thresholds
> > in some manner, etc., if there is good reason to do that.
> > 
> > 2. The rcu cookie lifecycle. This variant updates it on inactivation
> > queue and nowhere else because the RCU docs imply that counter rollover
> > is not a significant problem. In practice, I think this means that if an
> > inode is stamped at least once, and the counter rolls over, future
> > (non-inactivation, non-unlinked) eviction -> repopulation cycles could
> > trigger rcu syncs. I think this would require repeated
> > eviction/reinstantiation cycles within a small window to be noticeable,
> > so I'm not sure how likely this is to occur. We could be more defensive
> > by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
> > at recycle time, unconditionally refresh at destroy time (using
> > get_state_synchronize_rcu() for non-inactivation), etc.
> > 
> > Otherwise testing is ongoing, but this version at least survives an
> > fstests regression run.
> > 
> 
> FYI, I modified my repeated alloc/free test to do some batching and form
> it into something more able to measure the potential side effect / cost
> of the grace period sync. The test is a single threaded, file alloc/free
> loop using a variable per iteration batch size. The test runs for ~60s
> and reports how many total files were allocated/freed in that period
> with the specified batch size. Note that this particular test ran
> without any background workload. Results are as follows:
> 
> 	files		baseline	test
> 
> 	1		38480		38437
> 	4		126055		111080
> 	8		218299		134469
> 	16		306619		141968
> 	32		397909		152267
> 	64		418603		200875
> 	128		469077		289365
> 	256		684117		566016
> 	512		931328		878933
> 	1024		1126741		1118891

Can you post the test code, because 38,000 alloc/unlinks in 60s is
extremely slow for a single tight open-unlink-close loop. I'd be
expecting at least ~10,000 alloc/unlink iterations per second, not
650/second.

A quick test here with "batch size == 1" main loop on a vanilla
5.17-rc1 kernel:

        for (i = 0; i < iters; i++) {
                int fd = open(file, O_CREAT|O_RDWR, 0777);

                if (fd < 0) {
                        perror("open");
                        exit(1);
                }
                unlink(file);
                close(fd);
        }


$ time ./open-unlink 10000 /mnt/scratch/blah

real    0m0.962s
user    0m0.022s
sys     0m0.775s

Shows pretty much 10,000 alloc/unlinks a second without any specific
batching on my slow machine. And my "fast" machine (3yr old 2.1GHz
Xeons)

$ time sudo ./open-unlink 40000 /mnt/scratch/foo

real    0m0.958s
user    0m0.033s
sys     0m0.770s

Runs single loop iterations at 40,000 alloc/unlink iterations per
second.

So I'm either not understanding the test you are running and/or the
kernel/patches that you are comparing here. Is the "baseline" just a
vanilla, unmodified upstream kernel, or something else?

> That's just a test of a quick hack, however. Since there is no real
> urgency to inactivate an unlinked inode (it has no potential users until
> it's freed),

On the contrary, there is extreme urgency to inactivate inodes
quickly.

Darrick made the original assumption that we could delay
inactivation indefinitely and so he allowed really deep queues of up
to 64k deferred inactivations. But with queues this deep, we could
never get that background inactivation code to perform anywhere near
the original synchronous background inactivation code. e.g. I
measured 60-70% performance degradations on my scalability tests,
and nothing stood out in the profiles until I started looking at
CPU data cache misses.

What we found was that if we don't run the background inactivation
while the inodes are still hot in the CPU cache, the cost of bringing
the inodes back into the CPU cache at a later time is extremely
expensive and cannot be avoided. That's where all the performance
was lost and so this is exactly what the current per-cpu background
inactivation implementation avoids. i.e. we have shallow queues,
early throttling and CPU affinity to ensure that the inodes are
processed before they are evicted from the CPU caches and ensure we
don't take a performance hit.

IOWs, the deferred inactivation queues are designed to minimise
inactivation delay, generally trying to delay inactivation for a
couple of milliseconds at most during typical fast-path
inactivations (i.e. an extent or two per inode needing to be freed,
plus maybe the inode itself). Such inactivations generally take
50-100us of CPU time each to process, and we try to keep the
inactivation batch size down to 32 inodes...

> I suspect that result can be further optimized to absorb
> the cost of an rcu delay by deferring the steps that make the inode
> available for reallocation in the first place.

A typical RCU grace period delay is longer than the latency we
require to keep the inodes hot in cache for efficient background
inactivation. We can't move the "we need to RCU delay inactivation"
overhead to the background inactivation code without taking a
global performance hit to the filesystem performance due to the CPU
cache thrashing it will introduce....

Cheers,

Dave.
Brian Foster Jan. 24, 2022, 11:29 p.m. UTC | #11
On Tue, Jan 25, 2022 at 09:08:53AM +1100, Dave Chinner wrote:
> On Mon, Jan 24, 2022 at 10:02:27AM -0500, Brian Foster wrote:
> > On Fri, Jan 21, 2022 at 09:24:54AM -0500, Brian Foster wrote:
> > > The XFS inode allocation algorithm aggressively reuses recently
> > > freed inodes. This is historical behavior that has been in place for
> > > quite some time, since XFS was imported to mainline Linux. Once the
> > > VFS adopted RCUwalk path lookups (also some time ago), this behavior
> > > became slightly incompatible because the inode recycle path doesn't
> > > isolate concurrent access to the inode from the VFS.
> > > 
> > > This has recently manifested as problems in the VFS when XFS happens
> > > to change the type or properties of a recently unlinked inode while
> > > still involved in an RCU lookup. For example, if the VFS refers to a
> > > previous incarnation of a symlink inode, obtains the ->get_link()
> > > callback from inode_operations, and the latter happens to change to
> > > a non-symlink type via a recycle event, the ->get_link() callback
> > > pointer is reset to NULL and the lookup results in a crash.
> > > 
> > > To avoid this class of problem, isolate in-core inodes for recycling
> > > with an RCU grace period. This is the same level of protection the
> > > VFS expects for inactivated inodes that are never reused, and so
> > > guarantees no further concurrent access before the type or
> > > properties of the inode change. We don't want an unconditional
> > > synchronize_rcu() event here because that would result in a
> > > significant performance impact to mixed inode allocation workloads.
> > > 
> > > Fortunately, we can take advantage of the recently added deferred
> > > inactivation mechanism to mitigate the need for an RCU wait in most
> > > cases. Deferred inactivation queues and batches the on-disk freeing
> > > of recently destroyed inodes, and so significantly increases the
> > > likelihood that a grace period has elapsed by the time an inode is
> > > freed and observable by the allocation code as a reuse candidate.
> > > Capture the current RCU grace period cookie at inode destroy time
> > > and refer to it at allocation time to conditionally wait for an RCU
> > > grace period if one hadn't expired in the meantime.  Since only
> > > unlinked inodes are recycle candidates and unlinked inodes always
> > > require inactivation, we only need to poll and assign RCU state in
> > > the inactivation codepath. Slightly adjust struct xfs_inode to fit
> > > the new field into padding holes that conveniently preexist in the
> > > same cacheline as the deferred inactivation list.
> > > 
> > > Finally, note that the ideal long term solution here is to
> > > rearchitect bits of XFS' internal inode lifecycle management such
> > > that this additional stall point is not required, but this requires
> > > more thought, time and work to address. This approach restores
> > > functional correctness in the meantime.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > 
> > > Hi all,
> > > 
> > > Here's the RCU fixup patch for inode reuse that I've been playing with,
> > > re: the vfs patch discussion [1]. I've put it in pretty much the most
> > > basic form, but I think there are a couple aspects worth thinking about:
> > > 
> > > 1. Use and frequency of start_poll_synchronize_rcu() (vs.
> > > get_state_synchronize_rcu()). The former is a bit more active than the
> > > latter in that it triggers the start of a grace period, when necessary.
> > > This currently invokes per inode, which is the ideal frequency in
> > > theory, but could be reduced, associated with the xfs_inogegc thresholds
> > > in some manner, etc., if there is good reason to do that.
> > > 
> > > 2. The rcu cookie lifecycle. This variant updates it on inactivation
> > > queue and nowhere else because the RCU docs imply that counter rollover
> > > is not a significant problem. In practice, I think this means that if an
> > > inode is stamped at least once, and the counter rolls over, future
> > > (non-inactivation, non-unlinked) eviction -> repopulation cycles could
> > > trigger rcu syncs. I think this would require repeated
> > > eviction/reinstantiation cycles within a small window to be noticeable,
> > > so I'm not sure how likely this is to occur. We could be more defensive
> > > by resetting or refreshing the cookie. E.g., refresh (or reset to zero)
> > > at recycle time, unconditionally refresh at destroy time (using
> > > get_state_synchronize_rcu() for non-inactivation), etc.
> > > 
> > > Otherwise testing is ongoing, but this version at least survives an
> > > fstests regression run.
> > > 
> > 
> > FYI, I modified my repeated alloc/free test to do some batching and form
> > it into something more able to measure the potential side effect / cost
> > of the grace period sync. The test is a single threaded, file alloc/free
> > loop using a variable per iteration batch size. The test runs for ~60s
> > and reports how many total files were allocated/freed in that period
> > with the specified batch size. Note that this particular test ran
> > without any background workload. Results are as follows:
> > 
> > 	files		baseline	test
> > 
> > 	1		38480		38437
> > 	4		126055		111080
> > 	8		218299		134469
> > 	16		306619		141968
> > 	32		397909		152267
> > 	64		418603		200875
> > 	128		469077		289365
> > 	256		684117		566016
> > 	512		931328		878933
> > 	1024		1126741		1118891
> 
> Can you post the test code, because 38,000 alloc/unlinks in 60s is
> extremely slow for a single tight open-unlink-close loop. I'd be
> expecting at least ~10,000 alloc/unlink iterations per second, not
> 650/second.
> 

Hm, Ok. My test was just a bash script doing a 'touch <files>; rm
<files>' loop. I know there was application overhead because if I
tweaked the script to open an fd directly rather than use touch, the
single file performance jumped up a bit, but it seemed to wash away as I
increased the file count so I kept running it with larger sizes. This
seems off so I'll port it over to C code and see how much the numbers
change.
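
For reference, the C port I have in mind is basically your loop below wrapped
in trivial argument handling, i.e. something like this (untested as written):

/* open-unlink.c: repeatedly create and unlink a single file */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int
main(int argc, char **argv)
{
	long	iters, i;
	char	*file;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <iters> <file>\n", argv[0]);
		return 1;
	}
	iters = atol(argv[1]);
	file = argv[2];

	for (i = 0; i < iters; i++) {
		int fd = open(file, O_CREAT|O_RDWR, 0777);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		unlink(file);
		close(fd);
	}
	return 0;
}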

> A quick test here with "batch size == 1" main loop on a vanilla
> 5.17-rc1 kernel:
> 
>         for (i = 0; i < iters; i++) {
>                 int fd = open(file, O_CREAT|O_RDWR, 0777);
> 
>                 if (fd < 0) {
>                         perror("open");
>                         exit(1);
>                 }
>                 unlink(file);
>                 close(fd);
>         }
> 
> 
> $ time ./open-unlink 10000 /mnt/scratch/blah
> 
> real    0m0.962s
> user    0m0.022s
> sys     0m0.775s
> 
> Shows pretty much 10,000 alloc/unlinks a second without any specific
> batching on my slow machine. And my "fast" machine (3yr old 2.1GHz
> Xeons)
> 
> $ time sudo ./open-unlink 40000 /mnt/scratch/foo
> 
> real    0m0.958s
> user    0m0.033s
> sys     0m0.770s
> 
> Runs single loop iterations at 40,000 alloc/unlink iterations per
> second.
> 
> So I'm either not understanding the test you are running and/or the
> kernel/patches that you are comparing here. Is the "baseline" just a
> vanilla, unmodified upstream kernel, or something else?
> 

Yeah, the baseline was just the XFS for-next branch.

> > That's just a test of a quick hack, however. Since there is no real
> > urgency to inactivate an unlinked inode (it has no potential users until
> > it's freed),
> 
> On the contrary, there is extreme urgency to inactivate inodes
> quickly.
> 

Ok, I think we're talking about slightly different things. What I mean
above is that if a task removes a file and goes off doing unrelated
$work, that inode will just sit on the percpu queue indefinitely. That's
fine, as there's no functional need for us to process it immediately
unless we're around -ENOSPC thresholds or some such that demand reclaim
of the inode. It sounds like what you're talking about is specifically
the behavior/performance of sustained file removal (which is important
obviously), where apparently there is a notable degradation if the
queues become deep enough to push the inode batches out of CPU cache. So
that makes sense...

> Darrick made the original assumption that we could delay
> inactivation indefinitely and so he allowed really deep queues of up
> to 64k deferred inactivations. But with queues this deep, we could
> never get that background inactivation code to perform anywhere near
> the original synchronous background inactivation code. e.g. I
> measured 60-70% performance degradations on my scalability tests,
> and nothing stood out in the profiles until I started looking at
> CPU data cache misses.
> 

... but could you elaborate on the scalability tests involved here so I
can get a better sense of it in practice and perhaps observe the impact
of changes in this path?

Brian

> What we found was that if we don't run the background inactivation
> while the inodes are still hot in the CPU cache, the cost of bringing
> the inodes back into the CPU cache at a later time is extremely
> expensive and cannot be avoided. That's where all the performance
> was lost and so this is exactly what the current per-cpu background
> inactivation implementation avoids. i.e. we have shallow queues,
> early throttling and CPU affinity to ensure that the inodes are
> processed before they are evicted from the CPU caches and ensure we
> don't take a performance hit.
> 
> IOWs, the deferred inactivation queues are designed to minimise
> inactivation delay, generally trying to delay inactivation for a
> couple of milliseconds at most during typical fast-path
> inactivations (i.e. an extent or two per inode needing to be freed,
> plus maybe the inode itself). Such inactivations generally take
> 50-100us of CPU time each to process, and we try to keep the
> inactivation batch size down to 32 inodes...
> 
> > I suspect that result can be further optimized to absorb
> > the cost of an rcu delay by deferring the steps that make the inode
> > available for reallocation in the first place.
> 
> A typical RCU grace period delay is longer than the latency we
> require to keep the inodes hot in cache for efficient background
> inactivation. We can't move the "we need to RCU delay inactivation"
> overhead to the background inactivation code without taking a
> global performance hit to the filesystem performance due to the CPU
> cache thrashing it will introduce....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Dave Chinner Jan. 25, 2022, 12:31 a.m. UTC | #12
On Mon, Jan 24, 2022 at 06:29:18PM -0500, Brian Foster wrote:
> On Tue, Jan 25, 2022 at 09:08:53AM +1100, Dave Chinner wrote:
> > > FYI, I modified my repeated alloc/free test to do some batching and form
> > > it into something more able to measure the potential side effect / cost
> > > of the grace period sync. The test is a single threaded, file alloc/free
> > > loop using a variable per iteration batch size. The test runs for ~60s
> > > and reports how many total files were allocated/freed in that period
> > > with the specified batch size. Note that this particular test ran
> > > without any background workload. Results are as follows:
> > > 
> > > 	files		baseline	test
> > > 
> > > 	1		38480		38437
> > > 	4		126055		111080
> > > 	8		218299		134469
> > > 	16		306619		141968
> > > 	32		397909		152267
> > > 	64		418603		200875
> > > 	128		469077		289365
> > > 	256		684117		566016
> > > 	512		931328		878933
> > > 	1024		1126741		1118891
> > 
> > Can you post the test code, because 38,000 alloc/unlinks in 60s is
> > extremely slow for a single tight open-unlink-close loop. I'd be
> > expecting at least ~10,000 alloc/unlink iterations per second, not
> > 650/second.
> > 
> 
> Hm, Ok. My test was just a bash script doing a 'touch <files>; rm
> <files>' loop. I know there was application overhead because if I
> tweaked the script to open an fd directly rather than use touch, the
> single file performance jumped up a bit, but it seemed to wash away as I
> increased the file count so I kept running it with larger sizes. This
> seems off so I'll port it over to C code and see how much the numbers
> change.

Yeah, using touch/rm becomes fork/exec bound very quickly. You'll
find that using "echo > <file>" is much faster than "touch <file>"
because it runs a shell built-in operation without fork/exec
overhead to create the file. But you can't play tricks like that to
replace rm:

$ time for ((i=0;i<1000;i++)); do touch /mnt/scratch/foo; rm /mnt/scratch/foo ; done

real    0m2.653s
user    0m0.910s
sys     0m2.051s
$ time for ((i=0;i<1000;i++)); do echo > /mnt/scratch/foo; rm /mnt/scratch/foo ; done

real    0m1.260s
user    0m0.452s
sys     0m0.913s
$ time ./open-unlink 1000 /mnt/scratch/foo

real    0m0.037s
user    0m0.001s
sys     0m0.030s
$

Note the difference in system time between the three operations -
almost all the difference in system CPU time is the overhead of
fork/exec to run the touch/rm binaries, not do the filesystem
operations....

> > > That's just a test of a quick hack, however. Since there is no real
> > > urgency to inactivate an unlinked inode (it has no potential users until
> > > it's freed),
> > 
> > On the contrary, there is extreme urgency to inactivate inodes
> > quickly.
> > 
> 
> Ok, I think we're talking about slightly different things. What I mean
> above is that if a task removes a file and goes off doing unrelated
> $work, that inode will just sit on the percpu queue indefinitely. That's
> fine, as there's no functional need for us to process it immediately
> unless we're around -ENOSPC thresholds or some such that demand reclaim
> of the inode.

Yup, an occasional unlink sitting around for a while on an unlinked
list isn't going to cause a performance problem.  Indeed, such
workloads are more likely to benefit from the reduced unlink()
syscall overhead and won't even notice the increase in background
CPU overhead for inactivation of those occasional inodes.

> It sounds like what you're talking about is specifically
> the behavior/performance of sustained file removal (which is important
> obviously), where apparently there is a notable degradation if the
> queues become deep enough to push the inode batches out of CPU cache. So
> that makes sense...

Yup, sustained bulk throughput is where cache residency really
matters. And for unlink, sustained unlink workloads are quite
common; they are often something people wait for at the command line,
or they make up a performance-critical component of a highly concurrent
workload, so it's pretty important to get this part right.

> > Darrick made the original assumption that we could delay
> > inactivation indefinitely and so he allowed really deep queues of up
> > to 64k deferred inactivations. But with queues this deep, we could
> > never get that background inactivation code to perform anywhere near
> > the original synchronous background inactivation code. e.g. I
> > measured 60-70% performance degradations on my scalability tests,
> > and nothing stood out in the profiles until I started looking at
> > CPU data cache misses.
> > 
> 
> ... but could you elaborate on the scalability tests involved here so I
> can get a better sense of it in practice and perhaps observe the impact
> of changes in this path?

The same concurrent fsmark create/traverse/unlink workloads I've
been running for the past decade+ demonstrate it pretty simply. I
also saw regressions with dbench (both op latency and throughput) as
the client count (concurrency) increased, and with compilebench.  I
didn't look much further because all the common benchmarks I ran
showed perf degradations with arbitrary delays that went away with
the current code we have.  ISTR that parts of aim7/reaim scalability
workloads that the intel zero-day infrastructure runs are quite
sensitive to background inactivation delays as well because that's a
CPU bound workload and hence any reduction in cache residency
results in a reduction of the number of concurrent jobs that can be
run.

Cheers,

Dave.
kernel test robot Jan. 25, 2022, 8:16 a.m. UTC | #13
Greetings,

FYI, we noticed a -62.2% regression of aim7.jobs-per-min due to commit:


commit: a7f4e88080f3d50511400259cc613a666d297227 ("[PATCH] xfs: require an rcu grace period before inode recycle")
url: https://github.com/0day-ci/linux/commits/Brian-Foster/xfs-require-an-rcu-grace-period-before-inode-recycle/20220121-222536
base: https://git.kernel.org/cgit/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/linux-xfs/20220121142454.1994916-1-bfoster@redhat.com

in testcase: aim7
on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with following parameters:

	disk: 4BRD_12G
	md: RAID1
	fs: xfs
	test: disk_wrt
	load: 3000
	cpufreq_governor: performance
	ucode: 0x5003006

test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser systems.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <oliver.sang@intel.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/4BRD_12G/xfs/x86_64-rhel-8.3/3000/RAID1/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/disk_wrt/aim7/0x5003006

commit: 
  6191cf3ad5 ("xfs: flush inodegc workqueue tasks before cancel")
  a7f4e88080 ("xfs: require an rcu grace period before inode recycle")

6191cf3ad59fda59 a7f4e88080f3d50511400259cc6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    443599           -62.2%     167677 ±  6%  aim7.jobs-per-min
     40.85          +164.6%     108.09 ±  6%  aim7.time.elapsed_time
     40.85          +164.6%     108.09 ±  6%  aim7.time.elapsed_time.max
 2.527e+09 ±  7%    +230.1%  8.344e+09 ± 10%  cpuidle..time
   6069098 ±  6%    +198.6%   18124131 ±  9%  cpuidle..usage
     68.58 ±  8%     +26.7%      86.91 ±  3%  iostat.cpu.idle
     30.77 ± 18%     -58.3%      12.83 ± 20%  iostat.cpu.system
     80.29 ±  2%     +82.3%     146.34 ±  6%  uptime.boot
      5466 ±  2%    +105.6%      11240 ±  8%  uptime.idle
     65862 ±  3%     +80.6%     118976 ±  5%  meminfo.AnonHugePages
     14084 ± 39%     -60.9%       5502 ± 41%  meminfo.Dirty
     15002 ± 37%     -56.2%       6573 ± 35%  meminfo.Inactive(file)
     67.80 ±  8%     +27.6%      86.50 ±  2%  vmstat.cpu.id
     15808 ±  5%     -94.6%     849.00 ± 38%  vmstat.io.bo
     27.20 ± 16%     -62.0%      10.33 ± 39%  vmstat.procs.r
     43806 ±  5%     -61.7%      16761        vmstat.system.cs
    186207            -5.2%     176590        vmstat.system.in
     67.25 ±  8%     +19.5       86.72 ±  3%  mpstat.cpu.all.idle%
      0.02 ± 55%      -0.0        0.00 ±209%  mpstat.cpu.all.iowait%
      1.03 ± 10%      +0.3        1.36 ±  2%  mpstat.cpu.all.irq%
      0.11 ±  7%      +0.0        0.13 ±  3%  mpstat.cpu.all.soft%
     30.98 ± 19%     -19.4       11.55 ± 23%  mpstat.cpu.all.sys%
      0.61 ±  3%      -0.4        0.25 ±  7%  mpstat.cpu.all.usr%
     59890 ±  5%     +91.1%     114450 ±  3%  numa-meminfo.node0.AnonHugePages
     79172 ± 22%     +19.1%      94295 ±  4%  numa-meminfo.node0.KReclaimable
     79172 ± 22%     +19.1%      94295 ±  4%  numa-meminfo.node0.SReclaimable
    260929 ± 21%     +16.3%     303472 ±  4%  numa-meminfo.node0.Slab
      6556 ± 35%     -59.2%       2677 ± 43%  numa-meminfo.node1.Dirty
      6650 ± 32%     -60.7%       2611 ± 44%  numa-meminfo.node1.Inactive(file)
     88809            +3.8%      92151        proc-vmstat.nr_anon_pages
     92000            +3.3%      95019        proc-vmstat.nr_inactive_anon
     60719            +2.7%      62331        proc-vmstat.nr_kernel_stack
     33987            +6.0%      36013        proc-vmstat.nr_slab_reclaimable
     73664            +2.5%      75505        proc-vmstat.nr_slab_unreclaimable
     92000            +3.3%      95019        proc-vmstat.nr_zone_inactive_anon
    452639           +43.9%     651425 ±  7%  proc-vmstat.pgfault
    701322 ±  4%     -86.2%      96665 ± 45%  proc-vmstat.pgpgout
     16001 ±  2%     +79.0%      28642 ±  5%  proc-vmstat.pgreuse
      1651            +3.9%       1716 ±  2%  proc-vmstat.unevictable_pgs_culled
    908.60 ± 17%     -60.6%     358.00 ± 21%  turbostat.Avg_MHz
     33.06 ± 17%     -18.9       14.13 ± 19%  turbostat.Busy%
      2753            -8.2%       2527 ±  2%  turbostat.Bzy_MHz
      0.99 ±153%      -0.9        0.08 ± 62%  turbostat.C1%
   4107068 ± 27%    +282.8%   15723747 ± 15%  turbostat.C1E
     41.79 ± 26%     +28.9       70.70 ± 18%  turbostat.C1E%
     65.82 ±  7%     +29.9%      85.50 ±  3%  turbostat.CPU%c1
   8238417 ±  2%    +138.2%   19624535 ±  6%  turbostat.IRQ
    174.45 ±  2%     -23.0%     134.26 ±  2%  turbostat.PkgWatt
     57.67            -7.8%      53.17        turbostat.RAMWatt
  14410648 ±  2%     +23.0%   17723721        numa-vmstat.node0.nr_dirtied
      1789 ± 38%     -55.7%     791.83 ± 47%  numa-vmstat.node0.nr_dirty
     19789 ± 22%     +19.1%      23574 ±  4%  numa-vmstat.node0.nr_slab_reclaimable
      4888 ± 22%     -65.7%       1676 ± 90%  numa-vmstat.node0.nr_written
      1779 ± 39%     -55.6%     791.00 ± 46%  numa-vmstat.node0.nr_zone_write_pending
  16042664 ±  3%     +21.7%   19525710 ±  2%  numa-vmstat.node0.numa_hit
  16006466 ±  3%     +21.7%   19486525 ±  2%  numa-vmstat.node0.numa_local
  14478628 ±  2%     +15.4%   16706134 ±  2%  numa-vmstat.node1.nr_dirtied
      1641 ± 36%     -61.9%     626.33 ± 37%  numa-vmstat.node1.nr_dirty
      1664 ± 33%     -63.1%     614.17 ± 38%  numa-vmstat.node1.nr_inactive_file
      1671 ± 32%     -63.4%     611.33 ± 39%  numa-vmstat.node1.nr_zone_inactive_file
      1654 ± 35%     -62.0%     629.00 ± 37%  numa-vmstat.node1.nr_zone_write_pending
  15322749 ±  3%     +13.7%   17428068 ±  2%  numa-vmstat.node1.numa_hit
  15280012 ±  3%     +13.7%   17371951 ±  2%  numa-vmstat.node1.numa_local
  8.05e+09           -61.1%  3.131e+09 ±  5%  perf-stat.i.branch-instructions
      1.09 ± 45%      -0.5        0.63 ±  3%  perf-stat.i.branch-miss-rate%
  38127328 ±  3%     -56.6%   16548357 ±  6%  perf-stat.i.branch-misses
  44350095 ± 17%     -61.7%   16990385 ± 20%  perf-stat.i.cache-misses
 1.541e+08 ± 15%     -59.8%   61896928 ± 19%  perf-stat.i.cache-references
     45484 ±  5%     -63.2%      16759        perf-stat.i.context-switches
 8.062e+10 ± 18%     -61.5%  3.103e+10 ± 21%  perf-stat.i.cpu-cycles
      2460 ± 21%     -71.8%     694.46 ± 14%  perf-stat.i.cpu-migrations
   1013046 ± 15%     -51.4%     491922 ± 36%  perf-stat.i.dTLB-load-misses
  1.16e+10           -61.2%  4.505e+09 ±  5%  perf-stat.i.dTLB-loads
    139405 ± 13%     -57.2%      59703 ± 30%  perf-stat.i.dTLB-store-misses
  5.54e+09           -60.3%  2.198e+09 ±  5%  perf-stat.i.dTLB-stores
  19147689 ± 10%     -59.8%    7688166 ± 11%  perf-stat.i.iTLB-load-misses
   3979444 ±  2%     -41.1%    2345331        perf-stat.i.iTLB-loads
 4.052e+10           -61.2%  1.574e+10 ±  5%  perf-stat.i.instructions
     54.62 ± 28%     -62.7%      20.38 ± 24%  perf-stat.i.major-faults
      0.92 ± 18%     -61.5%       0.35 ± 21%  perf-stat.i.metric.GHz
    307.44 ± 12%    +149.5%     767.15 ± 15%  perf-stat.i.metric.K/sec
    287.96           -61.2%     111.77 ±  5%  perf-stat.i.metric.M/sec
      7763 ±  3%     -38.3%       4792 ±  3%  perf-stat.i.minor-faults
   8023842 ± 16%     -63.0%    2972243 ± 20%  perf-stat.i.node-load-misses
   2359076 ±  3%     -61.2%     914463 ±  6%  perf-stat.i.node-loads
   2709777 ±  4%     -63.9%     979535 ±  9%  perf-stat.i.node-store-misses
   3915550           -63.9%    1412477 ±  5%  perf-stat.i.node-stores
      7818 ±  3%     -38.4%       4813 ±  3%  perf-stat.i.page-faults
      0.47 ±  3%      +0.1        0.53 ±  3%  perf-stat.overall.branch-miss-rate%
     82.68            -6.2       76.44 ±  2%  perf-stat.overall.iTLB-load-miss-rate%
 7.877e+09           -60.6%  3.102e+09 ±  5%  perf-stat.ps.branch-instructions
  37245481 ±  3%     -56.0%   16391709 ±  6%  perf-stat.ps.branch-misses
  43407496 ± 17%     -61.2%   16831298 ± 20%  perf-stat.ps.cache-misses
 1.508e+08 ± 15%     -59.3%   61321022 ± 19%  perf-stat.ps.cache-references
     44501 ±  5%     -62.7%      16604        perf-stat.ps.context-switches
     85898            +1.5%      87191        perf-stat.ps.cpu-clock
 7.891e+10 ± 18%     -61.0%  3.074e+10 ± 21%  perf-stat.ps.cpu-cycles
      2407 ± 21%     -71.4%     687.91 ± 14%  perf-stat.ps.cpu-migrations
    990172 ± 15%     -50.8%     487318 ± 36%  perf-stat.ps.dTLB-load-misses
 1.135e+10           -60.7%  4.463e+09 ±  5%  perf-stat.ps.dTLB-loads
    136151 ± 13%     -56.6%      59137 ± 30%  perf-stat.ps.dTLB-store-misses
 5.421e+09           -59.8%  2.177e+09 ±  5%  perf-stat.ps.dTLB-stores
  18737344 ± 10%     -59.3%    7617127 ± 11%  perf-stat.ps.iTLB-load-misses
   3890634 ±  2%     -40.3%    2323525        perf-stat.ps.iTLB-loads
 3.965e+10           -60.7%   1.56e+10 ±  5%  perf-stat.ps.instructions
     52.95 ± 28%     -61.9%      20.17 ± 24%  perf-stat.ps.major-faults
      7543 ±  3%     -37.1%       4745 ±  3%  perf-stat.ps.minor-faults
   7853349 ± 16%     -62.5%    2944346 ± 20%  perf-stat.ps.node-load-misses
   2308573 ±  3%     -60.8%     905994 ±  6%  perf-stat.ps.node-loads
   2651779 ±  4%     -63.4%     970377 ±  9%  perf-stat.ps.node-store-misses
   3831581           -63.5%    1399572 ±  5%  perf-stat.ps.node-stores
      7596 ±  3%     -37.3%       4765 ±  3%  perf-stat.ps.page-faults
     85898            +1.5%      87191        perf-stat.ps.task-clock
      0.00            +1.1        1.09 ± 25%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      0.00            +1.1        1.10 ± 25%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      0.12 ±200%      +1.6        1.76 ± 26%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
      0.12 ±200%      +1.8        1.96 ± 26%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
      0.07 ± 16%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.down
      0.07 ± 16%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.__down
      0.00            +0.1        0.06 ± 17%  perf-profile.children.cycles-pp.tick_irq_enter
      0.00            +0.1        0.06 ± 19%  perf-profile.children.cycles-pp.rcu_sched_clock_irq
      0.00            +0.1        0.07 ± 25%  perf-profile.children.cycles-pp.update_rq_clock
      0.00            +0.1        0.07 ± 27%  perf-profile.children.cycles-pp.io_serial_in
      0.00            +0.1        0.07 ± 20%  perf-profile.children.cycles-pp.irq_enter_rcu
      0.00            +0.1        0.08 ± 14%  perf-profile.children.cycles-pp.irqtime_account_irq
      0.00            +0.1        0.08 ± 28%  perf-profile.children.cycles-pp.ktime_get_update_offsets_now
      0.00            +0.1        0.08 ± 21%  perf-profile.children.cycles-pp.read_tsc
      0.00            +0.1        0.08 ± 22%  perf-profile.children.cycles-pp.serial8250_console_putchar
      0.00            +0.1        0.09 ± 24%  perf-profile.children.cycles-pp.wait_for_xmitr
      0.01 ±200%      +0.1        0.10 ± 14%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.00            +0.1        0.09 ± 26%  perf-profile.children.cycles-pp.uart_console_write
      0.00            +0.1        0.09 ± 23%  perf-profile.children.cycles-pp.native_sched_clock
      0.00            +0.1        0.09 ± 23%  perf-profile.children.cycles-pp.serial8250_console_write
      0.00            +0.1        0.09 ± 25%  perf-profile.children.cycles-pp.irq_work_run_list
      0.00            +0.1        0.09 ± 25%  perf-profile.children.cycles-pp.asm_sysvec_irq_work
      0.00            +0.1        0.09 ± 25%  perf-profile.children.cycles-pp.sysvec_irq_work
      0.00            +0.1        0.09 ± 25%  perf-profile.children.cycles-pp.__sysvec_irq_work
      0.00            +0.1        0.09 ± 25%  perf-profile.children.cycles-pp.irq_work_run
      0.00            +0.1        0.09 ± 25%  perf-profile.children.cycles-pp.irq_work_single
      0.00            +0.1        0.10 ± 22%  perf-profile.children.cycles-pp._printk
      0.00            +0.1        0.10 ± 22%  perf-profile.children.cycles-pp.vprintk_emit
      0.00            +0.1        0.10 ± 22%  perf-profile.children.cycles-pp.console_unlock
      0.02 ±122%      +0.1        0.12 ± 34%  perf-profile.children.cycles-pp.rebalance_domains
      0.00            +0.1        0.11 ± 33%  perf-profile.children.cycles-pp.tick_nohz_irq_exit
      0.01 ±200%      +0.1        0.12 ± 26%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.00            +0.1        0.11 ± 21%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.00            +0.1        0.13 ± 28%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.01 ±200%      +0.1        0.14 ± 26%  perf-profile.children.cycles-pp.update_blocked_averages
      0.00            +0.1        0.14 ± 20%  perf-profile.children.cycles-pp.lapic_next_deadline
      0.00            +0.1        0.15 ± 23%  perf-profile.children.cycles-pp.run_rebalance_domains
      0.20 ± 10%      +0.2        0.35 ± 18%  perf-profile.children.cycles-pp.scheduler_tick
      0.00            +0.2        0.16 ± 27%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.32 ±  7%      +0.2        0.50 ± 41%  perf-profile.children.cycles-pp._raw_spin_lock
      0.17 ± 22%      +0.2        0.42 ± 28%  perf-profile.children.cycles-pp.__softirqentry_text_start
      0.28 ± 15%      +0.3        0.55 ± 19%  perf-profile.children.cycles-pp.update_process_times
      0.29 ± 14%      +0.3        0.56 ± 20%  perf-profile.children.cycles-pp.tick_sched_handle
      0.20 ± 21%      +0.3        0.48 ± 27%  perf-profile.children.cycles-pp.irq_exit_rcu
      0.30 ± 14%      +0.3        0.60 ± 19%  perf-profile.children.cycles-pp.tick_sched_timer
      0.07 ± 65%      +0.3        0.38 ± 28%  perf-profile.children.cycles-pp.menu_select
      0.15 ± 35%      +0.3        0.45 ± 24%  perf-profile.children.cycles-pp.clockevents_program_event
      0.14 ± 40%      +0.3        0.47 ± 25%  perf-profile.children.cycles-pp.ktime_get
      0.38 ± 16%      +0.4        0.82 ± 19%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.58 ± 20%      +0.9        1.44 ± 20%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.58 ± 21%      +0.9        1.46 ± 20%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      1.16 ± 45%      +1.3        2.47 ± 21%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.84 ± 21%      +1.3        2.18 ± 22%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.02 ±122%      +0.1        0.09 ± 33%  perf-profile.self.cycles-pp.update_process_times
      0.00            +0.1        0.07 ± 27%  perf-profile.self.cycles-pp.io_serial_in
      0.00            +0.1        0.08 ± 21%  perf-profile.self.cycles-pp.read_tsc
      0.00            +0.1        0.08 ± 25%  perf-profile.self.cycles-pp.native_sched_clock
      0.00            +0.1        0.09 ± 22%  perf-profile.self.cycles-pp.update_blocked_averages
      0.01 ±200%      +0.1        0.12 ± 26%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.00            +0.1        0.14 ± 20%  perf-profile.self.cycles-pp.lapic_next_deadline
      0.03 ±122%      +0.2        0.19 ± 35%  perf-profile.self.cycles-pp.menu_select
      0.01 ±200%      +0.3        0.26 ± 25%  perf-profile.self.cycles-pp.cpuidle_enter_state
      0.13 ± 41%      +0.3        0.41 ± 25%  perf-profile.self.cycles-pp.ktime_get
      9220 ± 13%    +124.1%      20659 ±  8%  softirqs.CPU0.RCU
     11092 ±  5%     +70.2%      18876 ±  9%  softirqs.CPU0.SCHED
      8791 ± 23%    +106.1%      18118 ± 12%  softirqs.CPU1.RCU
      7976 ±  9%    +104.7%      16331 ± 10%  softirqs.CPU1.SCHED
      7028 ± 15%    +150.4%      17600 ±  9%  softirqs.CPU10.RCU
      6664 ±  3%    +118.2%      14542 ± 10%  softirqs.CPU10.SCHED
      7043 ± 17%    +155.5%      17998 ±  8%  softirqs.CPU11.RCU
      6804 ±  9%    +127.0%      15445 ± 11%  softirqs.CPU11.SCHED
      6995 ±  9%    +149.7%      17467 ±  9%  softirqs.CPU12.RCU
      6734 ±  6%    +117.9%      14676 ±  9%  softirqs.CPU12.SCHED
      6768 ± 15%    +164.5%      17905 ± 11%  softirqs.CPU13.RCU
      6458 ±  9%    +128.7%      14770 ±  9%  softirqs.CPU13.SCHED
      7216 ± 14%    +145.9%      17744 ± 10%  softirqs.CPU14.RCU
      7202 ±  6%    +105.4%      14792 ± 10%  softirqs.CPU14.SCHED
      6438 ± 15%    +168.9%      17311 ±  7%  softirqs.CPU15.RCU
      6599 ±  6%    +125.2%      14859 ±  9%  softirqs.CPU15.SCHED
      6712 ± 16%    +155.3%      17135 ± 12%  softirqs.CPU16.RCU
      6576 ±  9%    +126.4%      14885 ±  9%  softirqs.CPU16.SCHED
      6863 ± 13%    +151.9%      17290 ±  7%  softirqs.CPU17.RCU
      6749 ±  9%    +119.8%      14832 ±  8%  softirqs.CPU17.SCHED
      6711 ± 14%    +158.3%      17335 ±  9%  softirqs.CPU18.RCU
      6715 ±  5%    +116.0%      14504 ± 11%  softirqs.CPU18.SCHED
      6749 ± 20%    +149.0%      16807 ±  8%  softirqs.CPU19.RCU
      6358 ±  4%    +128.4%      14523 ±  9%  softirqs.CPU19.SCHED
      7638 ± 13%    +147.4%      18898 ±  8%  softirqs.CPU2.RCU
      7850 ±  5%     +97.5%      15505 ±  8%  softirqs.CPU2.SCHED
      7678 ± 24%    +118.3%      16759 ± 12%  softirqs.CPU20.RCU
      6652 ±  7%    +121.4%      14726 ±  8%  softirqs.CPU20.SCHED
      6882 ± 13%    +153.2%      17426 ± 10%  softirqs.CPU21.RCU
      6865 ±  7%    +113.9%      14685 ± 10%  softirqs.CPU21.SCHED
      6487 ± 11%    +169.3%      17469 ±  7%  softirqs.CPU22.RCU
      6482 ±  5%    +128.2%      14791 ±  8%  softirqs.CPU22.SCHED
      6276 ± 14%    +174.6%      17235 ± 10%  softirqs.CPU23.RCU
      6511 ±  6%    +129.5%      14941 ± 10%  softirqs.CPU23.SCHED
      6415 ± 14%    +163.2%      16885 ±  9%  softirqs.CPU24.RCU
      6805 ±  7%    +113.6%      14535 ±  8%  softirqs.CPU24.SCHED
      7473 ± 31%    +125.5%      16849 ±  8%  softirqs.CPU25.RCU
      6941 ±  9%    +107.5%      14405 ±  7%  softirqs.CPU25.SCHED
      6465 ± 13%    +160.9%      16868 ±  8%  softirqs.CPU26.RCU
      6886 ± 11%    +108.4%      14350 ±  8%  softirqs.CPU26.SCHED
      6352 ± 14%    +169.2%      17102 ±  8%  softirqs.CPU27.RCU
      6487 ±  5%    +124.8%      14581 ±  8%  softirqs.CPU27.SCHED
      6425 ± 13%    +160.1%      16712 ±  7%  softirqs.CPU28.RCU
      6749 ±  9%    +118.0%      14710 ±  6%  softirqs.CPU28.SCHED
      6178 ± 15%    +171.7%      16788 ±  9%  softirqs.CPU29.RCU
      6368 ±  5%    +125.1%      14333 ±  8%  softirqs.CPU29.SCHED
      7033 ± 12%    +173.8%      19259 ±  8%  softirqs.CPU3.RCU
      7148 ±  6%    +115.4%      15400 ±  8%  softirqs.CPU3.SCHED
      6646 ± 17%    +158.0%      17149 ±  7%  softirqs.CPU30.RCU
      6886 ±  9%    +110.3%      14483 ±  8%  softirqs.CPU30.SCHED
      6258 ± 14%    +166.0%      16650 ± 14%  softirqs.CPU31.RCU
      6873 ± 12%    +108.1%      14302 ±  6%  softirqs.CPU31.SCHED
      6406 ± 13%    +178.8%      17860 ±  7%  softirqs.CPU32.RCU
      6475 ±  6%    +123.9%      14497 ±  8%  softirqs.CPU32.SCHED
      6372 ± 13%    +174.3%      17479 ±  7%  softirqs.CPU33.RCU
      6438 ±  6%    +126.4%      14574 ±  8%  softirqs.CPU33.SCHED
      6324 ± 13%    +176.7%      17497 ±  7%  softirqs.CPU34.RCU
      6448 ±  6%    +124.4%      14468 ±  8%  softirqs.CPU34.SCHED
      6268 ± 13%    +171.2%      16996 ±  6%  softirqs.CPU35.RCU
      6432 ±  4%    +128.6%      14703 ± 10%  softirqs.CPU35.SCHED
      6358 ± 13%    +174.9%      17481 ±  7%  softirqs.CPU36.RCU
      6465 ±  5%    +123.1%      14425 ±  8%  softirqs.CPU36.SCHED
      6529 ± 12%    +175.3%      17978 ±  9%  softirqs.CPU37.RCU
      6437 ±  5%    +134.0%      15061 ± 11%  softirqs.CPU37.SCHED
      6292 ± 12%    +178.4%      17517 ±  7%  softirqs.CPU38.RCU
      6353 ±  5%    +127.4%      14444 ±  8%  softirqs.CPU38.SCHED
      6230 ± 13%    +178.7%      17367 ±  7%  softirqs.CPU39.RCU
      6408 ±  6%    +126.1%      14492 ±  8%  softirqs.CPU39.SCHED
      7067 ± 10%    +163.0%      18585 ±  9%  softirqs.CPU4.RCU
      7419 ±  6%    +108.0%      15431 ± 10%  softirqs.CPU4.SCHED
      6412 ± 15%    +167.6%      17158 ±  7%  softirqs.CPU40.RCU
      6567 ±  8%    +120.1%      14451 ±  8%  softirqs.CPU40.SCHED
      6366 ± 13%    +175.9%      17564 ±  7%  softirqs.CPU41.RCU
      6356 ±  6%    +127.1%      14434 ±  9%  softirqs.CPU41.SCHED
      6539 ± 14%    +166.9%      17451 ±  8%  softirqs.CPU42.RCU
      6393 ± 12%    +122.0%      14193 ± 10%  softirqs.CPU42.SCHED
      6722 ± 14%    +165.4%      17838 ±  7%  softirqs.CPU43.RCU
      6211 ± 13%    +112.6%      13202 ± 16%  softirqs.CPU43.SCHED
      6069 ± 13%    +137.6%      14420 ±  9%  softirqs.CPU44.RCU
      6805 ± 11%    +105.3%      13973 ±  6%  softirqs.CPU44.SCHED
      5881 ± 13%    +181.0%      16528 ± 14%  softirqs.CPU45.RCU
      6423 ±  7%    +123.6%      14363 ±  8%  softirqs.CPU45.SCHED
      6465 ± 15%    +186.6%      18531 ± 16%  softirqs.CPU46.RCU
      6812 ± 11%    +110.6%      14348 ±  9%  softirqs.CPU46.SCHED
      6796 ± 13%    +169.6%      18326 ±  7%  softirqs.CPU47.RCU
      6515 ±  2%    +118.8%      14256 ±  5%  softirqs.CPU47.SCHED
      7226 ± 14%    +154.4%      18382 ± 12%  softirqs.CPU48.RCU
      6735 ± 13%    +107.0%      13944 ±  7%  softirqs.CPU48.SCHED
      6878 ±  9%    +162.7%      18072 ±  8%  softirqs.CPU49.RCU
      6506 ±  6%    +121.5%      14411 ±  9%  softirqs.CPU49.SCHED
      7494 ± 17%    +145.7%      18412 ± 10%  softirqs.CPU5.RCU
      7326 ±  8%    +106.2%      15108 ±  9%  softirqs.CPU5.SCHED
      6872 ± 19%    +159.3%      17816 ±  9%  softirqs.CPU50.RCU
      6587 ±  5%    +120.2%      14504 ± 10%  softirqs.CPU50.SCHED
      7329 ± 19%    +149.4%      18281 ±  8%  softirqs.CPU51.RCU
      6767 ± 10%    +110.8%      14264 ±  8%  softirqs.CPU51.SCHED
      6699 ± 14%    +172.5%      18258 ±  7%  softirqs.CPU52.RCU
      6386 ±  8%    +133.9%      14937 ±  8%  softirqs.CPU52.SCHED
      7339 ± 17%    +139.7%      17593 ±  8%  softirqs.CPU53.RCU
      6669 ±  5%    +116.3%      14425 ±  6%  softirqs.CPU53.SCHED
      6991 ± 10%    +160.2%      18192 ±  8%  softirqs.CPU54.RCU
      6563 ±  5%    +122.5%      14602 ± 12%  softirqs.CPU54.SCHED
      6846 ± 12%    +171.7%      18603 ± 10%  softirqs.CPU55.RCU
      6658 ±  7%    +126.8%      15099 ±  8%  softirqs.CPU55.SCHED
      7738 ± 17%    +142.6%      18773 ±  8%  softirqs.CPU56.RCU
      6670 ±  5%    +122.1%      14816 ± 10%  softirqs.CPU56.SCHED
      7737 ± 30%    +133.9%      18095 ± 11%  softirqs.CPU57.RCU
      6514 ±  4%    +121.7%      14440 ± 10%  softirqs.CPU57.SCHED
      6821 ± 16%    +161.3%      17826 ±  9%  softirqs.CPU58.RCU
      7124 ±  9%    +104.1%      14545 ±  8%  softirqs.CPU58.SCHED
      6987 ± 14%    +156.1%      17892 ±  8%  softirqs.CPU59.RCU
      7062 ±  6%    +108.9%      14754 ±  8%  softirqs.CPU59.SCHED
      6589 ± 10%    +169.3%      17743 ±  7%  softirqs.CPU6.RCU
      7181 ± 10%    +105.4%      14754 ±  8%  softirqs.CPU6.SCHED
      6439 ± 13%    +159.0%      16675 ± 13%  softirqs.CPU60.RCU
      6752 ±  6%    +119.4%      14814 ±  9%  softirqs.CPU60.SCHED
      6680 ± 16%    +158.8%      17290 ±  9%  softirqs.CPU61.RCU
      6887 ±  5%    +116.6%      14920 ±  9%  softirqs.CPU61.SCHED
      6898 ± 14%    +159.5%      17905 ± 16%  softirqs.CPU62.RCU
      6753 ±  6%    +118.9%      14786 ±  9%  softirqs.CPU62.SCHED
      6823 ± 15%    +150.8%      17112 ±  7%  softirqs.CPU63.RCU
      7047 ±  6%    +102.3%      14258 ±  5%  softirqs.CPU63.SCHED
      7056 ± 15%    +147.5%      17467 ±  7%  softirqs.CPU64.RCU
      6983 ±  6%    +111.8%      14794 ±  9%  softirqs.CPU64.SCHED
      7279 ± 13%    +137.4%      17280 ±  8%  softirqs.CPU65.RCU
      6898 ±  4%    +110.4%      14515 ± 10%  softirqs.CPU65.SCHED
      6694 ± 15%    +161.4%      17501 ±  8%  softirqs.CPU66.RCU
      6532 ±  6%    +120.9%      14428 ±  7%  softirqs.CPU66.SCHED
      6428 ± 14%    +169.1%      17301 ±  7%  softirqs.CPU67.RCU
      6479 ±  6%    +124.3%      14535 ±  7%  softirqs.CPU67.SCHED
      6442 ± 13%    +165.6%      17107 ±  9%  softirqs.CPU68.RCU
      6399 ±  5%    +125.8%      14452 ±  8%  softirqs.CPU68.SCHED
      6834 ± 11%    +154.6%      17396 ±  7%  softirqs.CPU69.RCU
      6753 ±  5%    +114.2%      14467 ±  8%  softirqs.CPU69.SCHED
      7107 ±  9%    +154.7%      18099 ±  9%  softirqs.CPU7.RCU
      6628 ±  7%    +125.4%      14939 ±  8%  softirqs.CPU7.SCHED
      6655 ± 13%    +159.8%      17288 ±  9%  softirqs.CPU70.RCU
      6766 ± 12%    +111.6%      14313 ±  8%  softirqs.CPU70.SCHED
      6582 ± 14%    +163.7%      17356 ±  7%  softirqs.CPU71.RCU
      6514 ±  7%    +117.7%      14183 ± 11%  softirqs.CPU71.SCHED
      6305 ± 14%    +169.1%      16965 ±  6%  softirqs.CPU72.RCU
      6478 ±  6%    +120.4%      14277 ±  7%  softirqs.CPU72.SCHED
      6291 ± 14%    +171.9%      17108 ±  7%  softirqs.CPU73.RCU
      6437 ±  6%    +117.5%      14003 ±  9%  softirqs.CPU73.SCHED
      7039 ± 21%    +142.0%      17031 ±  7%  softirqs.CPU74.RCU
      7555 ± 25%     +84.3%      13925 ± 10%  softirqs.CPU74.SCHED
      5988 ± 15%    +181.7%      16871 ±  7%  softirqs.CPU75.RCU
      6516 ±  6%    +133.5%      15214 ± 14%  softirqs.CPU75.SCHED
      6047 ± 11%    +186.9%      17347 ±  8%  softirqs.CPU76.RCU
      6498 ±  6%    +127.1%      14758 ± 11%  softirqs.CPU76.SCHED
      6209 ± 10%    +162.5%      16298 ±  8%  softirqs.CPU77.RCU
      6492 ±  6%    +118.4%      14182 ±  9%  softirqs.CPU77.SCHED
      6159 ± 11%    +169.3%      16585 ±  7%  softirqs.CPU78.RCU
      6426 ±  6%    +123.4%      14357 ±  8%  softirqs.CPU78.SCHED
      6052 ± 13%    +162.7%      15898 ±  7%  softirqs.CPU79.RCU
      6510 ±  6%    +123.1%      14525 ±  8%  softirqs.CPU79.SCHED
      6848 ± 12%    +166.0%      18217 ± 11%  softirqs.CPU8.RCU
      6924 ±  5%    +109.2%      14485 ±  9%  softirqs.CPU8.SCHED
      6143 ± 13%    +171.4%      16673 ±  7%  softirqs.CPU80.RCU
      6475 ±  6%    +121.7%      14357 ±  7%  softirqs.CPU80.SCHED
      6180 ± 11%    +173.2%      16885 ±  6%  softirqs.CPU81.RCU
      6691 ±  6%    +119.6%      14696 ±  7%  softirqs.CPU81.SCHED
      6215 ± 13%    +168.7%      16703 ±  8%  softirqs.CPU82.RCU
      6442 ±  6%    +121.3%      14254 ±  8%  softirqs.CPU82.SCHED
      6275 ± 15%    +164.8%      16617 ±  7%  softirqs.CPU83.RCU
      7139 ± 16%     +98.7%      14188 ±  7%  softirqs.CPU83.SCHED
      6210 ± 14%    +167.1%      16589 ±  8%  softirqs.CPU84.RCU
      6537 ±  7%    +119.4%      14345 ±  8%  softirqs.CPU84.SCHED
      6558 ± 11%    +156.6%      16828 ±  7%  softirqs.CPU85.RCU
      6953 ± 18%    +106.3%      14345 ±  8%  softirqs.CPU85.SCHED
      6326 ± 16%    +166.0%      16825 ±  7%  softirqs.CPU86.RCU
      6458 ±  6%    +123.3%      14423 ±  9%  softirqs.CPU86.SCHED
      6216 ± 15%    +175.5%      17130 ±  8%  softirqs.CPU87.RCU
      6004 ±  3%    +132.4%      13953 ±  9%  softirqs.CPU87.SCHED
      6675 ± 13%    +168.6%      17930 ±  9%  softirqs.CPU9.RCU
      6738 ±  9%    +119.2%      14772 ±  9%  softirqs.CPU9.SCHED
    591090 ± 12%    +159.9%    1536265 ±  8%  softirqs.RCU
    593854 ±  5%    +116.8%    1287218 ±  8%  softirqs.SCHED
     16454 ±  4%     +63.0%      26822 ±  4%  softirqs.TIMER




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 5.16.0-rc5 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc-9 (Debian 9.3.0-22) 9.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=90300
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23502
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23502
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_GENERIC_IRQ_INJECTION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_CONTEXT_TRACKING=y
# CONFIG_CONTEXT_TRACKING_FORCE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
# CONFIG_BPF_PRELOAD is not set
# CONFIG_BPF_LSM is not set
# end of BPF subsystem

CONFIG_PREEMPT_VOLUNTARY_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
# CONFIG_PREEMPT_DYNAMIC is not set
# CONFIG_SCHED_CORE is not set

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_SCHED_AVG_IRQ=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# end of RCU Subsystem

CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=20
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_MISC is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
# CONFIG_BOOT_CONFIG is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_LD_ORPHAN_WARN=y
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
# CONFIG_EXPERT is not set
CONFIG_UID16=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_HAVE_ARCH_USERFAULTFD_WP=y
CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_USERFAULTFD=y
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
CONFIG_KCMP=y
CONFIG_RSEQ=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLAB_MERGE_DEFAULT=y
CONFIG_SLAB_FREELIST_RANDOM=y
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_FILTER_PGPROT=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_NR_GPIO=1024
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_AUDIT_ARCH=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

#
# Processor type and features
#
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_X2APIC=y
CONFIG_X86_MPPARSE=y
# CONFIG_GOLDFISH is not set
CONFIG_RETPOLINE=y
# CONFIG_X86_CPU_RESCTRL is not set
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_NUMACHIP is not set
# CONFIG_X86_VSMP is not set
CONFIG_X86_UV=y
# CONFIG_X86_GOLDFISH is not set
# CONFIG_X86_INTEL_MID is not set
CONFIG_X86_INTEL_LPSS=y
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
CONFIG_IOSF_MBI=y
# CONFIG_IOSF_MBI_DEBUG is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_PARAVIRT_SPINLOCKS=y
CONFIG_X86_HV_CALLBACK_VECTOR=y
# CONFIG_XEN is not set
CONFIG_KVM_GUEST=y
CONFIG_ARCH_CPUIDLE_HALTPOLL=y
# CONFIG_PVH is not set
CONFIG_PARAVIRT_TIME_ACCOUNTING=y
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_JAILHOUSE_GUEST is not set
# CONFIG_ACRN_GUEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_GART_IOMMU is not set
CONFIG_MAXSMP=y
CONFIG_NR_CPUS_RANGE_BEGIN=8192
CONFIG_NR_CPUS_RANGE_END=8192
CONFIG_NR_CPUS_DEFAULT=8192
CONFIG_NR_CPUS=8192
CONFIG_SCHED_CLUSTER=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_MC_PRIO=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCELOG_LEGACY=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_MCE_INJECT=m

#
# Performance monitoring
#
CONFIG_PERF_EVENTS_INTEL_UNCORE=m
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
# CONFIG_PERF_EVENTS_AMD_POWER is not set
CONFIG_PERF_EVENTS_AMD_UNCORE=y
# end of Performance monitoring

CONFIG_X86_16BIT=y
CONFIG_X86_ESPFIX64=y
CONFIG_X86_VSYSCALL_EMULATION=y
CONFIG_X86_IOPL_IOPERM=y
CONFIG_I8K=m
CONFIG_MICROCODE=y
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_5LEVEL=y
CONFIG_X86_DIRECT_GBPAGES=y
# CONFIG_X86_CPA_STATISTICS is not set
# CONFIG_AMD_MEM_ENCRYPT is not set
CONFIG_NUMA=y
# CONFIG_AMD_NUMA is not set
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=10
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
# CONFIG_ARCH_MEMORY_PROBE is not set
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_X86_PMEM_LEGACY_DEVICE=y
CONFIG_X86_PMEM_LEGACY=m
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
# CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK is not set
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
CONFIG_X86_UMIP=y
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
CONFIG_X86_INTEL_TSX_MODE_OFF=y
# CONFIG_X86_INTEL_TSX_MODE_ON is not set
# CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
# CONFIG_X86_SGX is not set
CONFIG_EFI=y
CONFIG_EFI_STUB=y
CONFIG_EFI_MIXED=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
# CONFIG_KEXEC_SIG is not set
CONFIG_CRASH_DUMP=y
CONFIG_KEXEC_JUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_X86_NEED_RELOCS=y
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_DYNAMIC_MEMORY_LAYOUT=y
CONFIG_RANDOMIZE_MEMORY=y
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0xa
CONFIG_HOTPLUG_CPU=y
CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
# CONFIG_COMPAT_VDSO is not set
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_LEGACY_VSYSCALL_XONLY is not set
# CONFIG_LEGACY_VSYSCALL_NONE is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_MODIFY_LDT_SYSCALL=y
# CONFIG_STRICT_SIGALTSTACK_SIZE is not set
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y
# end of Processor type and features

CONFIG_ARCH_HAS_ADD_PAGES=y
CONFIG_ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_HIBERNATION_SNAPSHOT_DEV=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_ADVANCED_DEBUG is not set
# CONFIG_PM_TEST_SUSPEND is not set
CONFIG_PM_SLEEP_DEBUG=y
# CONFIG_PM_TRACE_RTC is not set
CONFIG_PM_CLK=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
# CONFIG_ENERGY_MODEL is not set
CONFIG_ARCH_SUPPORTS_ACPI=y
CONFIG_ACPI=y
CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_SPCR_TABLE=y
# CONFIG_ACPI_FPDT is not set
CONFIG_ACPI_LPIT=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_REV_OVERRIDE_POSSIBLE=y
CONFIG_ACPI_EC_DEBUGFS=m
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_TAD=m
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_CPU_FREQ_PSS=y
CONFIG_ACPI_PROCESSOR_CSTATE=y
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_ACPI_CPPC_LIB=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_IPMI=m
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_PROCESSOR_AGGREGATOR=m
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_PLATFORM_PROFILE=m
CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
CONFIG_ACPI_TABLE_UPGRADE=y
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_ACPI_SBS=m
CONFIG_ACPI_HED=y
# CONFIG_ACPI_CUSTOM_METHOD is not set
CONFIG_ACPI_BGRT=y
CONFIG_ACPI_NFIT=m
# CONFIG_NFIT_SECURITY_DEBUG is not set
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_HMAT is not set
CONFIG_HAVE_ACPI_APEI=y
CONFIG_HAVE_ACPI_APEI_NMI=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_ACPI_APEI_MEMORY_FAILURE=y
CONFIG_ACPI_APEI_EINJ=m
# CONFIG_ACPI_APEI_ERST_DEBUG is not set
# CONFIG_ACPI_DPTF is not set
CONFIG_ACPI_WATCHDOG=y
CONFIG_ACPI_EXTLOG=m
CONFIG_ACPI_ADXL=y
# CONFIG_ACPI_CONFIGFS is not set
CONFIG_PMIC_OPREGION=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_PRMT=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y

#
# CPU frequency scaling drivers
#
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_ACPI_CPUFREQ_CPB=y
CONFIG_X86_POWERNOW_K8=m
# CONFIG_X86_AMD_FREQ_SENSITIVITY is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_P4_CLOCKMOD=m

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=m
# end of CPU Frequency scaling

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
# CONFIG_CPU_IDLE_GOV_LADDER is not set
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
# CONFIG_CPU_IDLE_GOV_HALTPOLL is not set
CONFIG_HALTPOLL_CPUIDLE=y
# end of CPU Idle

CONFIG_INTEL_IDLE=y
# end of Power management and ACPI options

#
# Bus options (PCI etc.)
#
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_MMCONF_FAM10H=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
# end of Bus options (PCI etc.)

#
# Binary Emulations
#
CONFIG_IA32_EMULATION=y
# CONFIG_X86_X32 is not set
CONFIG_COMPAT_32=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
# end of Binary Emulations

CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
CONFIG_KVM_COMPAT=y
CONFIG_HAVE_KVM_IRQ_BYPASS=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_KVM_XFER_TO_GUEST_WORK=y
CONFIG_HAVE_KVM_PM_NOTIFIER=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
# CONFIG_KVM_AMD is not set
# CONFIG_KVM_XEN is not set
CONFIG_KVM_MMU_AUDIT=y
CONFIG_AS_AVX512=y
CONFIG_AS_SHA1_NI=y
CONFIG_AS_SHA256_NI=y
CONFIG_AS_TPAUSE=y

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_HOTPLUG_SMT=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
# CONFIG_STATIC_CALL_SELFTEST is not set
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_HAS_SET_DIRECT_MAP=y
CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_HAVE_ARCH_STACKLEAK=y
CONFIG_HAVE_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_LTO_NONE=y
CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES=y
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_MOVE_PUD=y
CONFIG_HAVE_MOVE_PMD=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
CONFIG_HAVE_EXIT_THREAD=y
CONFIG_ARCH_MMAP_RND_BITS=28
CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES=y
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_HAVE_STACK_VALIDATION=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS=y
CONFIG_ARCH_USE_MEMREMAP_PROT=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_HAVE_STATIC_CALL=y
CONFIG_HAVE_STATIC_CALL_INLINE=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_ARCH_WANT_LD_ORPHAN_WARN=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_ARCH_HAS_ELFCORE_COMPAT=y
CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
CONFIG_DYNAMIC_SIGFRAME=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG_COMMON=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=m
# CONFIG_BLK_DEV_ZONED is not set
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
CONFIG_BLK_WBT=y
CONFIG_BLK_WBT_MQ=y
# CONFIG_BLK_CGROUP_IOLATENCY is not set
# CONFIG_BLK_CGROUP_FC_APPID is not set
# CONFIG_BLK_CGROUP_IOCOST is not set
# CONFIG_BLK_CGROUP_IOPRIO is not set
CONFIG_BLK_DEBUG_FS=y
# CONFIG_BLK_SED_OPAL is not set
# CONFIG_BLK_INLINE_ENCRYPTION is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_PM=y
CONFIG_BLOCK_HOLDER_DEPRECATED=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_EXCLUSIVE_SYSTEM_RAM=y
CONFIG_HAVE_BOOTMEM_INFO_NODE=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_MHP_MEMMAP_ON_MEMORY=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_ARCH_ENABLE_THP_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=m
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_ARCH_WANTS_THP_SWAP=y
CONFIG_THP_SWAP=y
CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
# CONFIG_CMA is not set
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
# CONFIG_ZSWAP_DEFAULT_ON is not set
CONFIG_ZPOOL=y
CONFIG_ZBUD=y
# CONFIG_Z3FOLD is not set
CONFIG_ZSMALLOC=y
CONFIG_ZSMALLOC_STAT=y
CONFIG_GENERIC_EARLY_IOREMAP=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_PAGE_IDLE_FLAG=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_PTE_DEVMAP=y
CONFIG_ZONE_DMA=y
CONFIG_ZONE_DMA32=y
CONFIG_ZONE_DEVICE=y
CONFIG_DEV_PAGEMAP_OPS=y
CONFIG_DEVICE_PRIVATE=y
CONFIG_VMAP_PFN=y
CONFIG_ARCH_USES_HIGH_VMA_FLAGS=y
CONFIG_ARCH_HAS_PKEYS=y
# CONFIG_PERCPU_STATS is not set
# CONFIG_GUP_TEST is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_SECRETMEM=y

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_AF_UNIX_OOB=y
CONFIG_UNIX_DIAG=m
CONFIG_TLS=m
CONFIG_TLS_DEVICE=y
# CONFIG_TLS_TOE is not set
CONFIG_XFRM=y
CONFIG_XFRM_OFFLOAD=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_USER_COMPAT is not set
# CONFIG_XFRM_INTERFACE is not set
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
CONFIG_XDP_SOCKETS=y
# CONFIG_XDP_SOCKETS_DIAG is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
# CONFIG_INET_ESPINTCP is not set
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
CONFIG_INET_RAW_DIAG=m
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_NV=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
CONFIG_TCP_CONG_DCTCP=m
# CONFIG_TCP_CONG_CDG is not set
CONFIG_TCP_CONG_BBR=m
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_ESP_OFFLOAD=m
# CONFIG_INET6_ESPINTCP is not set
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
# CONFIG_IPV6_ILA is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_SIT_6RD=y
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
# CONFIG_IPV6_RPL_LWTUNNEL is not set
# CONFIG_IPV6_IOAM6_LWTUNNEL is not set
CONFIG_NETLABEL=y
# CONFIG_MPTCP is not set
CONFIG_NETWORK_SECMARK=y
CONFIG_NET_PTP_CLASSIFY=y
CONFIG_NETWORK_PHY_TIMESTAMPING=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_EGRESS=y
CONFIG_NETFILTER_SKIP_EGRESS=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
# CONFIG_NETFILTER_NETLINK_HOOK is not set
# CONFIG_NETFILTER_NETLINK_ACCT is not set
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NETFILTER_NETLINK_OSF=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_SYSLOG=m
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
CONFIG_NF_CT_NETLINK_HELPER=m
CONFIG_NETFILTER_NETLINK_GLUE_CT=y
CONFIG_NF_NAT=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_REDIRECT=y
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NETFILTER_SYNPROXY=m
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_COUNTER=m
CONFIG_NFT_CONNLIMIT=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_OBJREF=m
CONFIG_NFT_QUEUE=m
CONFIG_NFT_QUOTA=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_INET=m
# CONFIG_NFT_XFRM is not set
CONFIG_NFT_SOCKET=m
# CONFIG_NFT_OSF is not set
# CONFIG_NFT_TPROXY is not set
# CONFIG_NFT_SYNPROXY is not set
CONFIG_NF_DUP_NETDEV=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
# CONFIG_NFT_REJECT_NETDEV is not set
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XTABLES_COMPAT=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
# CONFIG_NETFILTER_XT_TARGET_LED is not set
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
CONFIG_NETFILTER_XT_TARGET_NETMAP=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_REDIRECT=m
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
# CONFIG_NETFILTER_XT_MATCH_L2TP is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
# CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# end of Core Netfilter Configuration

CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_IPMAC=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
CONFIG_IP_VS_IPV6=y
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_FO=m
CONFIG_IP_VS_OVF=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
# CONFIG_IP_VS_MH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
# CONFIG_IP_VS_TWOS is not set

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_SOCKET_IPV4=m
CONFIG_NF_TPROXY_IPV4=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_REJECT_IPV4=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
CONFIG_NF_LOG_ARP=m
CONFIG_NF_LOG_IPV4=m
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_MANGLE=m
# CONFIG_IP_NF_TARGET_CLUSTERIP is not set
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_SOCKET_IPV6=m
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
CONFIG_NFT_DUP_IPV6=m
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
# CONFIG_IP6_NF_MATCH_SRH is not set
# CONFIG_IP6_NF_TARGET_HL is not set
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_TARGET_SYNPROXY=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
# CONFIG_NFT_BRIDGE_META is not set
CONFIG_NFT_BRIDGE_REJECT=m
# CONFIG_NF_CONNTRACK_BRIDGE is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
CONFIG_BRIDGE_EBT_IP6=m
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_NFLOG=m
# CONFIG_BPFILTER is not set
CONFIG_IP_DCCP=y
CONFIG_INET_DCCP_DIAG=m

#
# DCCP CCIDs Configuration
#
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
CONFIG_IP_DCCP_CCID3=y
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_TFRC_LIB=y
# end of DCCP CCIDs Configuration

#
# DCCP Kernel Hacking
#
# CONFIG_IP_DCCP_DEBUG is not set
# end of DCCP Kernel Hacking

CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5 is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
CONFIG_SCTP_COOKIE_HMAC_SHA1=y
CONFIG_INET_SCTP_DIAG=m
# CONFIG_RDS is not set
CONFIG_TIPC=m
CONFIG_TIPC_MEDIA_UDP=y
CONFIG_TIPC_CRYPTO=y
CONFIG_TIPC_DIAG=m
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_MRP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_BRIDGE_VLAN_FILTERING=y
# CONFIG_BRIDGE_MRP is not set
# CONFIG_BRIDGE_CFM is not set
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_VLAN_8021Q_MVRP=y
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
CONFIG_6LOWPAN=m
# CONFIG_6LOWPAN_DEBUGFS is not set
# CONFIG_6LOWPAN_NHC is not set
CONFIG_IEEE802154=m
# CONFIG_IEEE802154_NL802154_EXPERIMENTAL is not set
CONFIG_IEEE802154_SOCKET=m
CONFIG_IEEE802154_6LOWPAN=m
CONFIG_MAC802154=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=y
# CONFIG_NET_SCH_CAKE is not set
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_HHF=m
CONFIG_NET_SCH_PIE=m
# CONFIG_NET_SCH_FQ_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
# CONFIG_NET_SCH_ETS is not set
CONFIG_NET_SCH_DEFAULT=y
# CONFIG_DEFAULT_FQ is not set
# CONFIG_DEFAULT_CODEL is not set
CONFIG_DEFAULT_FQ_CODEL=y
# CONFIG_DEFAULT_SFQ is not set
# CONFIG_DEFAULT_PFIFO_FAST is not set
CONFIG_DEFAULT_NET_SCH="fq_codel"

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_CLS_MATCHALL=m
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
# CONFIG_NET_EMATCH_CANID is not set
CONFIG_NET_EMATCH_IPSET=m
# CONFIG_NET_EMATCH_IPT is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_SAMPLE=m
# CONFIG_NET_ACT_IPT is not set
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_MPLS is not set
CONFIG_NET_ACT_VLAN=m
CONFIG_NET_ACT_BPF=m
# CONFIG_NET_ACT_CONNMARK is not set
# CONFIG_NET_ACT_CTINFO is not set
CONFIG_NET_ACT_SKBMOD=m
# CONFIG_NET_ACT_IFE is not set
CONFIG_NET_ACT_TUNNEL_KEY=m
# CONFIG_NET_ACT_GATE is not set
# CONFIG_NET_TC_SKB_EXT is not set
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y
CONFIG_DNS_RESOLVER=m
# CONFIG_BATMAN_ADV is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=m
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VMWARE_VMCI_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_HYPERV_VSOCKETS=m
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=y
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m
CONFIG_NET_NSH=y
# CONFIG_HSR is not set
CONFIG_NET_SWITCHDEV=y
CONFIG_NET_L3_MASTER_DEV=y
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options

# CONFIG_HAMRADIO is not set
CONFIG_CAN=m
CONFIG_CAN_RAW=m
CONFIG_CAN_BCM=m
CONFIG_CAN_GW=m
# CONFIG_CAN_J1939 is not set
# CONFIG_CAN_ISOTP is not set

#
# CAN Device Drivers
#
CONFIG_CAN_VCAN=m
# CONFIG_CAN_VXCAN is not set
CONFIG_CAN_SLCAN=m
CONFIG_CAN_DEV=m
CONFIG_CAN_CALC_BITTIMING=y
# CONFIG_CAN_KVASER_PCIEFD is not set
CONFIG_CAN_C_CAN=m
CONFIG_CAN_C_CAN_PLATFORM=m
CONFIG_CAN_C_CAN_PCI=m
CONFIG_CAN_CC770=m
# CONFIG_CAN_CC770_ISA is not set
CONFIG_CAN_CC770_PLATFORM=m
# CONFIG_CAN_IFI_CANFD is not set
# CONFIG_CAN_M_CAN is not set
# CONFIG_CAN_PEAK_PCIEFD is not set
CONFIG_CAN_SJA1000=m
CONFIG_CAN_EMS_PCI=m
# CONFIG_CAN_F81601 is not set
CONFIG_CAN_KVASER_PCI=m
CONFIG_CAN_PEAK_PCI=m
CONFIG_CAN_PEAK_PCIEC=y
CONFIG_CAN_PLX_PCI=m
# CONFIG_CAN_SJA1000_ISA is not set
CONFIG_CAN_SJA1000_PLATFORM=m
CONFIG_CAN_SOFTING=m

#
# CAN SPI interfaces
#
# CONFIG_CAN_HI311X is not set
# CONFIG_CAN_MCP251X is not set
# CONFIG_CAN_MCP251XFD is not set
# end of CAN SPI interfaces

#
# CAN USB interfaces
#
# CONFIG_CAN_8DEV_USB is not set
# CONFIG_CAN_EMS_USB is not set
# CONFIG_CAN_ESD_USB2 is not set
# CONFIG_CAN_ETAS_ES58X is not set
# CONFIG_CAN_GS_USB is not set
# CONFIG_CAN_KVASER_USB is not set
# CONFIG_CAN_MCBA_USB is not set
# CONFIG_CAN_PEAK_USB is not set
# CONFIG_CAN_UCAN is not set
# end of CAN USB interfaces

# CONFIG_CAN_DEBUG_DEVICES is not set
# end of CAN Device Drivers

CONFIG_BT=m
CONFIG_BT_BREDR=y
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_HIDP=m
CONFIG_BT_HS=y
CONFIG_BT_LE=y
# CONFIG_BT_6LOWPAN is not set
# CONFIG_BT_LEDS is not set
# CONFIG_BT_MSFTEXT is not set
# CONFIG_BT_AOSPEXT is not set
CONFIG_BT_DEBUGFS=y
# CONFIG_BT_SELFTEST is not set

#
# Bluetooth device drivers
#
# CONFIG_BT_HCIBTUSB is not set
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
CONFIG_BT_HCIUART_ATH3K=y
# CONFIG_BT_HCIUART_INTEL is not set
# CONFIG_BT_HCIUART_AG6XX is not set
# CONFIG_BT_HCIBCM203X is not set
# CONFIG_BT_HCIBPA10X is not set
# CONFIG_BT_HCIBFUSB is not set
CONFIG_BT_HCIVHCI=m
CONFIG_BT_MRVL=m
# CONFIG_BT_MRVL_SDIO is not set
# CONFIG_BT_MTKSDIO is not set
# CONFIG_BT_VIRTIO is not set
# end of Bluetooth device drivers

# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
CONFIG_STREAM_PARSER=y
# CONFIG_MCTP is not set
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_CFG80211=m
# CONFIG_NL80211_TESTMODE is not set
# CONFIG_CFG80211_DEVELOPER_WARNINGS is not set
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
CONFIG_CFG80211_DEFAULT_PS=y
# CONFIG_CFG80211_DEBUGFS is not set
CONFIG_CFG80211_CRDA_SUPPORT=y
# CONFIG_CFG80211_WEXT is not set
CONFIG_MAC80211=m
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
# CONFIG_MAC80211_MESH is not set
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_DEBUGFS=y
# CONFIG_MAC80211_MESSAGE_TRACING is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
CONFIG_RFKILL=m
CONFIG_RFKILL_LEDS=y
CONFIG_RFKILL_INPUT=y
# CONFIG_RFKILL_GPIO is not set
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
# CONFIG_NET_9P_DEBUG is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=m
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
# CONFIG_NFC is not set
CONFIG_PSAMPLE=m
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_SOCK_VALIDATE_XMIT=y
CONFIG_NET_SELFTESTS=y
CONFIG_NET_SOCK_MSG=y
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y

#
# Device Drivers
#
CONFIG_HAVE_EISA=y
# CONFIG_EISA is not set
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEAER=y
CONFIG_PCIEAER_INJECT=m
CONFIG_PCIE_ECRC=y
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
CONFIG_PCIE_PME=y
CONFIG_PCIE_DPC=y
# CONFIG_PCIE_PTM is not set
# CONFIG_PCIE_EDR is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_QUIRKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
CONFIG_PCI_STUB=y
CONFIG_PCI_PF_STUB=m
CONFIG_PCI_ATS=y
CONFIG_PCI_LOCKLESS_CONFIG=y
CONFIG_PCI_IOV=y
CONFIG_PCI_PRI=y
CONFIG_PCI_PASID=y
# CONFIG_PCI_P2PDMA is not set
CONFIG_PCI_LABEL=y
CONFIG_PCI_HYPERV=m
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=y

#
# PCI controller drivers
#
CONFIG_VMD=y
CONFIG_PCI_HYPERV_INTERFACE=m

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
# CONFIG_PCCARD is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
CONFIG_AUXILIARY_BUS=y
# CONFIG_UEVENT_HELPER is not set
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_PAGED_BUF=y
CONFIG_EXTRA_FIRMWARE=""
CONFIG_FW_LOADER_USER_HELPER=y
# CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
# CONFIG_FW_LOADER_COMPRESS is not set
CONFIG_FW_CACHE=y
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_REGMAP=y
CONFIG_REGMAP_I2C=m
CONFIG_REGMAP_SPI=m
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DMIID=y
CONFIG_DMI_SYSFS=y
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y
# CONFIG_ISCSI_IBFT is not set
CONFIG_FW_CFG_SYSFS=y
# CONFIG_FW_CFG_SYSFS_CMDLINE is not set
CONFIG_SYSFB=y
# CONFIG_SYSFB_SIMPLEFB is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# EFI (Extensible Firmware Interface) Support
#
CONFIG_EFI_VARS=y
CONFIG_EFI_ESRT=y
CONFIG_EFI_VARS_PSTORE=y
CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y
CONFIG_EFI_RUNTIME_MAP=y
# CONFIG_EFI_FAKE_MEMMAP is not set
CONFIG_EFI_RUNTIME_WRAPPERS=y
CONFIG_EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER=y
# CONFIG_EFI_BOOTLOADER_CONTROL is not set
# CONFIG_EFI_CAPSULE_LOADER is not set
# CONFIG_EFI_TEST is not set
CONFIG_APPLE_PROPERTIES=y
# CONFIG_RESET_ATTACK_MITIGATION is not set
# CONFIG_EFI_RCI2_TABLE is not set
# CONFIG_EFI_DISABLE_PCI_DMA is not set
# end of EFI (Extensible Firmware Interface) Support

CONFIG_UEFI_CPER=y
CONFIG_UEFI_CPER_X86=y
CONFIG_EFI_DEV_PATH_PARSER=y
CONFIG_EFI_EARLYCON=y
CONFIG_EFI_CUSTOM_SSDT_OVERLAYS=y

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_NULL_BLK=m
# CONFIG_BLK_DEV_FD is not set
CONFIG_CDROM=m
# CONFIG_PARIDE is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_ZRAM is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
# CONFIG_BLK_DEV_DRBD is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
CONFIG_VIRTIO_BLK=m
CONFIG_BLK_DEV_RBD=m
# CONFIG_BLK_DEV_RSXX is not set

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
CONFIG_NVME_MULTIPATH=y
# CONFIG_NVME_HWMON is not set
CONFIG_NVME_FABRICS=m
CONFIG_NVME_FC=m
# CONFIG_NVME_TCP is not set
CONFIG_NVME_TARGET=m
# CONFIG_NVME_TARGET_PASSTHRU is not set
CONFIG_NVME_TARGET_LOOP=m
CONFIG_NVME_TARGET_FC=m
CONFIG_NVME_TARGET_FCLOOP=m
# CONFIG_NVME_TARGET_TCP is not set
# end of NVME Support

#
# Misc devices
#
CONFIG_SENSORS_LIS3LV02D=m
# CONFIG_AD525X_DPOT is not set
# CONFIG_DUMMY_IRQ is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
CONFIG_TIFM_CORE=m
CONFIG_TIFM_7XX1=m
# CONFIG_ICS932S401 is not set
CONFIG_ENCLOSURE_SERVICES=m
CONFIG_SGI_XP=m
CONFIG_HP_ILO=m
CONFIG_SGI_GRU=m
# CONFIG_SGI_GRU_DEBUG is not set
CONFIG_APDS9802ALS=m
CONFIG_ISL29003=m
CONFIG_ISL29020=m
CONFIG_SENSORS_TSL2550=m
CONFIG_SENSORS_BH1770=m
CONFIG_SENSORS_APDS990X=m
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
CONFIG_VMWARE_BALLOON=m
# CONFIG_LATTICE_ECP3_CONFIG is not set
# CONFIG_SRAM is not set
# CONFIG_DW_XDATA_PCIE is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
CONFIG_MISC_RTSX=m
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_AT25 is not set
CONFIG_EEPROM_LEGACY=m
CONFIG_EEPROM_MAX6875=m
CONFIG_EEPROM_93CX6=m
# CONFIG_EEPROM_93XX46 is not set
# CONFIG_EEPROM_IDT_89HPESX is not set
# CONFIG_EEPROM_EE1004 is not set
# end of EEPROM support

CONFIG_CB710_CORE=m
# CONFIG_CB710_DEBUG is not set
CONFIG_CB710_DEBUG_ASSUMPTIONS=y

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# end of Texas Instruments shared transport line discipline

CONFIG_SENSORS_LIS3_I2C=m
CONFIG_ALTERA_STAPL=m
CONFIG_INTEL_MEI=m
CONFIG_INTEL_MEI_ME=m
# CONFIG_INTEL_MEI_TXE is not set
# CONFIG_INTEL_MEI_HDCP is not set
# CONFIG_INTEL_MEI_PXP is not set
CONFIG_VMWARE_VMCI=m
# CONFIG_GENWQE is not set
# CONFIG_ECHO is not set
# CONFIG_BCM_VK is not set
# CONFIG_MISC_ALCOR_PCI is not set
CONFIG_MISC_RTSX_PCI=m
# CONFIG_MISC_RTSX_USB is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
CONFIG_PVPANIC=y
# CONFIG_PVPANIC_MMIO is not set
# CONFIG_PVPANIC_PCI is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI_COMMON=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_BLK_DEV_BSG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
CONFIG_SCSI_MPT3SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT3SAS_MAX_SGE=128
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_MPI3MR is not set
# CONFIG_SCSI_SMARTPQI is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_UFS_HWMON is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_SCSI_MYRS is not set
# CONFIG_VMWARE_PVSCSI is not set
CONFIG_HYPERV_STORAGE=m
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
CONFIG_SCSI_ISCI=m
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_EFCT is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_VIRTIO is not set
# CONFIG_SCSI_CHELSIO_FCOE is not set
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=y
CONFIG_SCSI_DH_HP_SW=y
CONFIG_SCSI_DH_EMC=y
CONFIG_SCSI_DH_ALUA=y
# end of SCSI device support

CONFIG_ATA=m
CONFIG_SATA_HOST=y
CONFIG_PATA_TIMINGS=y
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_FORCE=y
CONFIG_ATA_ACPI=y
# CONFIG_SATA_ZPODD is not set
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
CONFIG_SATA_AHCI=m
CONFIG_SATA_MOBILE_LPM_POLICY=0
CONFIG_SATA_AHCI_PLATFORM=m
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
CONFIG_ATA_PIIX=m
# CONFIG_SATA_DWC is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SCH is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
# CONFIG_PATA_ACPI is not set
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
# CONFIG_MD_MULTIPATH is not set
CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_BUFIO=m
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
# CONFIG_DM_UNSTRIPED is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_SMQ=m
CONFIG_DM_WRITECACHE=m
# CONFIG_DM_EBS is not set
CONFIG_DM_ERA=m
# CONFIG_DM_CLONE is not set
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
# CONFIG_DM_MULTIPATH_HST is not set
# CONFIG_DM_MULTIPATH_IOA is not set
CONFIG_DM_DELAY=m
# CONFIG_DM_DUST is not set
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
# CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG is not set
# CONFIG_DM_VERITY_FEC is not set
CONFIG_DM_SWITCH=m
CONFIG_DM_LOG_WRITES=m
CONFIG_DM_INTEGRITY=m
CONFIG_DM_AUDIT=y
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
CONFIG_TCM_FILEIO=m
CONFIG_TCM_PSCSI=m
CONFIG_TCM_USER2=m
CONFIG_LOOPBACK_TARGET=m
CONFIG_ISCSI_TARGET=m
# CONFIG_SBP_TARGET is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_SBP2=m
CONFIG_FIREWIRE_NET=m
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_MACINTOSH_DRIVERS=y
CONFIG_MAC_EMUMOUSEBTN=y
CONFIG_NETDEVICES=y
CONFIG_MII=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_WIREGUARD is not set
# CONFIG_EQUALIZER is not set
# CONFIG_NET_FC is not set
# CONFIG_IFB is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_IPVLAN is not set
# CONFIG_VXLAN is not set
# CONFIG_GENEVE is not set
# CONFIG_BAREUDP is not set
# CONFIG_GTP is not set
# CONFIG_AMT is not set
# CONFIG_MACSEC is not set
CONFIG_NETCONSOLE=m
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_TUN=m
# CONFIG_TUN_VNET_CROSS_LE is not set
# CONFIG_VETH is not set
CONFIG_VIRTIO_NET=m
# CONFIG_NLMON is not set
# CONFIG_NET_VRF is not set
# CONFIG_VSOCKMON is not set
# CONFIG_ARCNET is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
# CONFIG_ATM_TCP is not set
# CONFIG_ATM_LANAI is not set
# CONFIG_ATM_ENI is not set
# CONFIG_ATM_FIRESTREAM is not set
# CONFIG_ATM_ZATM is not set
# CONFIG_ATM_NICSTAR is not set
# CONFIG_ATM_IDT77252 is not set
# CONFIG_ATM_AMBASSADOR is not set
# CONFIG_ATM_HORIZON is not set
# CONFIG_ATM_IA is not set
# CONFIG_ATM_FORE200E is not set
# CONFIG_ATM_HE is not set
# CONFIG_ATM_SOLOS is not set
CONFIG_ETHERNET=y
CONFIG_MDIO=y
# CONFIG_NET_VENDOR_3COM is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_AGERE=y
# CONFIG_ET131X is not set
CONFIG_NET_VENDOR_ALACRITECH=y
# CONFIG_SLICOSS is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
# CONFIG_ALTERA_TSE is not set
CONFIG_NET_VENDOR_AMAZON=y
# CONFIG_ENA_ETHERNET is not set
CONFIG_NET_VENDOR_AMD=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_PCNET32 is not set
# CONFIG_AMD_XGBE is not set
CONFIG_NET_VENDOR_AQUANTIA=y
# CONFIG_AQTION is not set
CONFIG_NET_VENDOR_ARC=y
CONFIG_NET_VENDOR_ASIX=y
# CONFIG_SPI_AX88796C is not set
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_ALX is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BCMGENET is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
# CONFIG_TIGON3 is not set
# CONFIG_BNX2X is not set
# CONFIG_SYSTEMPORT is not set
# CONFIG_BNXT is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
CONFIG_NET_VENDOR_CADENCE=y
# CONFIG_MACB is not set
CONFIG_NET_VENDOR_CAVIUM=y
# CONFIG_THUNDER_NIC_PF is not set
# CONFIG_THUNDER_NIC_VF is not set
# CONFIG_THUNDER_NIC_BGX is not set
# CONFIG_THUNDER_NIC_RGX is not set
CONFIG_CAVIUM_PTP=y
# CONFIG_LIQUIDIO is not set
# CONFIG_LIQUIDIO_VF is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
CONFIG_NET_VENDOR_CORTINA=y
# CONFIG_CX_ECAT is not set
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
# CONFIG_NET_TULIP is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_EZCHIP=y
CONFIG_NET_VENDOR_GOOGLE=y
# CONFIG_GVE is not set
CONFIG_NET_VENDOR_HUAWEI=y
# CONFIG_HINIC is not set
CONFIG_NET_VENDOR_I825XX=y
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_E1000E_HWTS=y
CONFIG_IGB=y
CONFIG_IGB_HWMON=y
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
CONFIG_IXGBE=y
CONFIG_IXGBE_HWMON=y
# CONFIG_IXGBE_DCB is not set
CONFIG_IXGBE_IPSEC=y
# CONFIG_IXGBEVF is not set
CONFIG_I40E=y
# CONFIG_I40E_DCB is not set
# CONFIG_I40EVF is not set
# CONFIG_ICE is not set
# CONFIG_FM10K is not set
CONFIG_IGC=y
CONFIG_NET_VENDOR_MICROSOFT=y
# CONFIG_MICROSOFT_MANA is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_LITEX=y
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_MVMDIO is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_PRESTERA is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX5_CORE is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8842 is not set
# CONFIG_KS8851 is not set
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MICROCHIP=y
# CONFIG_ENC28J60 is not set
# CONFIG_ENCX24J600 is not set
# CONFIG_LAN743X is not set
CONFIG_NET_VENDOR_MICROSEMI=y
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_NETERION=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_NETRONOME=y
# CONFIG_NFP is not set
CONFIG_NET_VENDOR_NI=y
# CONFIG_NI_XGE_MANAGEMENT_ENET is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_NE2K_PCI is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_ETHOC is not set
CONFIG_NET_VENDOR_PACKET_ENGINES=y
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_NET_VENDOR_PENSANDO=y
# CONFIG_IONIC is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_QED is not set
CONFIG_NET_VENDOR_QUALCOMM=y
# CONFIG_QCOM_EMAC is not set
# CONFIG_RMNET is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_ATP is not set
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
CONFIG_R8169=y
CONFIG_NET_VENDOR_RENESAS=y
CONFIG_NET_VENDOR_ROCKER=y
# CONFIG_ROCKER is not set
CONFIG_NET_VENDOR_SAMSUNG=y
# CONFIG_SXGBE_ETH is not set
CONFIG_NET_VENDOR_SEEQ=y
CONFIG_NET_VENDOR_SOLARFLARE=y
# CONFIG_SFC is not set
# CONFIG_SFC_FALCON is not set
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_EPIC100 is not set
# CONFIG_SMSC911X is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_SOCIONEXT=y
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_SYNOPSYS=y
# CONFIG_DWC_XLGMAC is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TI_CPSW_PHY_SEL is not set
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XILINX=y
# CONFIG_XILINX_EMACLITE is not set
# CONFIG_XILINX_AXI_EMAC is not set
# CONFIG_XILINX_LL_TEMAC is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_PHYLIB=y
CONFIG_SWPHY=y
# CONFIG_LED_TRIGGER_PHY is not set
CONFIG_FIXED_PHY=y

#
# MII PHY device drivers
#
# CONFIG_AMD_PHY is not set
# CONFIG_ADIN_PHY is not set
# CONFIG_AQUANTIA_PHY is not set
CONFIG_AX88796B_PHY=y
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM54140_PHY is not set
# CONFIG_BCM7XXX_PHY is not set
# CONFIG_BCM84881_PHY is not set
# CONFIG_BCM87XX_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_CORTINA_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_INTEL_XWAY_PHY is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MARVELL_PHY is not set
# CONFIG_MARVELL_10G_PHY is not set
# CONFIG_MARVELL_88X2222_PHY is not set
# CONFIG_MAXLINEAR_GPHY is not set
# CONFIG_MEDIATEK_GE_PHY is not set
# CONFIG_MICREL_PHY is not set
# CONFIG_MICROCHIP_PHY is not set
# CONFIG_MICROCHIP_T1_PHY is not set
# CONFIG_MICROSEMI_PHY is not set
# CONFIG_MOTORCOMM_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_NXP_C45_TJA11XX_PHY is not set
# CONFIG_NXP_TJA11XX_PHY is not set
# CONFIG_QSEMI_PHY is not set
CONFIG_REALTEK_PHY=y
# CONFIG_RENESAS_PHY is not set
# CONFIG_ROCKCHIP_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_TERANETICS_PHY is not set
# CONFIG_DP83822_PHY is not set
# CONFIG_DP83TC811_PHY is not set
# CONFIG_DP83848_PHY is not set
# CONFIG_DP83867_PHY is not set
# CONFIG_DP83869_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_XILINX_GMII2RGMII is not set
# CONFIG_MICREL_KS8995MA is not set
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
CONFIG_FWNODE_MDIO=y
CONFIG_ACPI_MDIO=y
CONFIG_MDIO_DEVRES=y
# CONFIG_MDIO_BITBANG is not set
# CONFIG_MDIO_BCM_UNIMAC is not set
# CONFIG_MDIO_MVUSB is not set
# CONFIG_MDIO_MSCC_MIIM is not set
# CONFIG_MDIO_THUNDER is not set

#
# MDIO Multiplexers
#

#
# PCS device drivers
#
# CONFIG_PCS_XPCS is not set
# end of PCS device drivers

# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
CONFIG_USB_NET_DRIVERS=y
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
CONFIG_USB_RTL8152=y
# CONFIG_USB_LAN78XX is not set
CONFIG_USB_USBNET=y
CONFIG_USB_NET_AX8817X=y
CONFIG_USB_NET_AX88179_178A=y
# CONFIG_USB_NET_CDCETHER is not set
# CONFIG_USB_NET_CDC_EEM is not set
# CONFIG_USB_NET_CDC_NCM is not set
# CONFIG_USB_NET_HUAWEI_CDC_NCM is not set
# CONFIG_USB_NET_CDC_MBIM is not set
# CONFIG_USB_NET_DM9601 is not set
# CONFIG_USB_NET_SR9700 is not set
# CONFIG_USB_NET_SR9800 is not set
# CONFIG_USB_NET_SMSC75XX is not set
# CONFIG_USB_NET_SMSC95XX is not set
# CONFIG_USB_NET_GL620A is not set
# CONFIG_USB_NET_NET1080 is not set
# CONFIG_USB_NET_PLUSB is not set
# CONFIG_USB_NET_MCS7830 is not set
# CONFIG_USB_NET_RNDIS_HOST is not set
# CONFIG_USB_NET_CDC_SUBSET is not set
# CONFIG_USB_NET_ZAURUS is not set
# CONFIG_USB_NET_CX82310_ETH is not set
# CONFIG_USB_NET_KALMIA is not set
# CONFIG_USB_NET_QMI_WWAN is not set
# CONFIG_USB_HSO is not set
# CONFIG_USB_NET_INT51X1 is not set
# CONFIG_USB_IPHETH is not set
# CONFIG_USB_SIERRA_NET is not set
# CONFIG_USB_NET_CH9200 is not set
# CONFIG_USB_NET_AQC111 is not set
CONFIG_WLAN=y
CONFIG_WLAN_VENDOR_ADMTEK=y
# CONFIG_ADM8211 is not set
CONFIG_WLAN_VENDOR_ATH=y
# CONFIG_ATH_DEBUG is not set
# CONFIG_ATH5K is not set
# CONFIG_ATH5K_PCI is not set
# CONFIG_ATH9K is not set
# CONFIG_ATH9K_HTC is not set
# CONFIG_CARL9170 is not set
# CONFIG_ATH6KL is not set
# CONFIG_AR5523 is not set
# CONFIG_WIL6210 is not set
# CONFIG_ATH10K is not set
# CONFIG_WCN36XX is not set
# CONFIG_ATH11K is not set
CONFIG_WLAN_VENDOR_ATMEL=y
# CONFIG_ATMEL is not set
# CONFIG_AT76C50X_USB is not set
CONFIG_WLAN_VENDOR_BROADCOM=y
# CONFIG_B43 is not set
# CONFIG_B43LEGACY is not set
# CONFIG_BRCMSMAC is not set
# CONFIG_BRCMFMAC is not set
CONFIG_WLAN_VENDOR_CISCO=y
# CONFIG_AIRO is not set
CONFIG_WLAN_VENDOR_INTEL=y
# CONFIG_IPW2100 is not set
# CONFIG_IPW2200 is not set
# CONFIG_IWL4965 is not set
# CONFIG_IWL3945 is not set
# CONFIG_IWLWIFI is not set
CONFIG_WLAN_VENDOR_INTERSIL=y
# CONFIG_HOSTAP is not set
# CONFIG_HERMES is not set
# CONFIG_P54_COMMON is not set
CONFIG_WLAN_VENDOR_MARVELL=y
# CONFIG_LIBERTAS is not set
# CONFIG_LIBERTAS_THINFIRM is not set
# CONFIG_MWIFIEX is not set
# CONFIG_MWL8K is not set
# CONFIG_WLAN_VENDOR_MEDIATEK is not set
CONFIG_WLAN_VENDOR_MICROCHIP=y
# CONFIG_WILC1000_SDIO is not set
# CONFIG_WILC1000_SPI is not set
CONFIG_WLAN_VENDOR_RALINK=y
# CONFIG_RT2X00 is not set
CONFIG_WLAN_VENDOR_REALTEK=y
# CONFIG_RTL8180 is not set
# CONFIG_RTL8187 is not set
CONFIG_RTL_CARDS=m
# CONFIG_RTL8192CE is not set
# CONFIG_RTL8192SE is not set
# CONFIG_RTL8192DE is not set
# CONFIG_RTL8723AE is not set
# CONFIG_RTL8723BE is not set
# CONFIG_RTL8188EE is not set
# CONFIG_RTL8192EE is not set
# CONFIG_RTL8821AE is not set
# CONFIG_RTL8192CU is not set
# CONFIG_RTL8XXXU is not set
# CONFIG_RTW88 is not set
# CONFIG_RTW89 is not set
CONFIG_WLAN_VENDOR_RSI=y
# CONFIG_RSI_91X is not set
CONFIG_WLAN_VENDOR_ST=y
# CONFIG_CW1200 is not set
CONFIG_WLAN_VENDOR_TI=y
# CONFIG_WL1251 is not set
# CONFIG_WL12XX is not set
# CONFIG_WL18XX is not set
# CONFIG_WLCORE is not set
CONFIG_WLAN_VENDOR_ZYDAS=y
# CONFIG_USB_ZD1201 is not set
# CONFIG_ZD1211RW is not set
CONFIG_WLAN_VENDOR_QUANTENNA=y
# CONFIG_QTNFMAC_PCIE is not set
# CONFIG_MAC80211_HWSIM is not set
# CONFIG_USB_NET_RNDIS_WLAN is not set
# CONFIG_VIRT_WIFI is not set
# CONFIG_WAN is not set
CONFIG_IEEE802154_DRIVERS=m
# CONFIG_IEEE802154_FAKELB is not set
# CONFIG_IEEE802154_AT86RF230 is not set
# CONFIG_IEEE802154_MRF24J40 is not set
# CONFIG_IEEE802154_CC2520 is not set
# CONFIG_IEEE802154_ATUSB is not set
# CONFIG_IEEE802154_ADF7242 is not set
# CONFIG_IEEE802154_CA8210 is not set
# CONFIG_IEEE802154_MCR20A is not set
# CONFIG_IEEE802154_HWSIM is not set

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

# CONFIG_VMXNET3 is not set
# CONFIG_FUJITSU_ES is not set
# CONFIG_HYPERV_NET is not set
# CONFIG_NETDEVSIM is not set
CONFIG_NET_FAILOVER=m
# CONFIG_ISDN is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_LEDS=y
CONFIG_INPUT_FF_MEMLESS=m
CONFIG_INPUT_SPARSEKMAP=m
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
# CONFIG_KEYBOARD_APPLESPI is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1050 is not set
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_DLINK_DIR685 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_SAMSUNG is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_TM2_TOUCHKEY is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_CYPRESS_SF is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_BYD=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_SYNAPTICS_SMBUS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_MOUSE_PS2_ELANTECH_SMBUS=y
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_PS2_FOCALTECH=y
CONFIG_MOUSE_PS2_VMMOUSE=y
CONFIG_MOUSE_PS2_SMBUS=y
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
CONFIG_MOUSE_CYAPA=m
CONFIG_MOUSE_ELAN_I2C=m
CONFIG_MOUSE_ELAN_I2C_I2C=y
CONFIG_MOUSE_ELAN_I2C_SMBUS=y
CONFIG_MOUSE_VSXXXAA=m
# CONFIG_MOUSE_GPIO is not set
CONFIG_MOUSE_SYNAPTICS_I2C=m
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
CONFIG_RMI4_CORE=m
CONFIG_RMI4_I2C=m
CONFIG_RMI4_SPI=m
CONFIG_RMI4_SMB=m
CONFIG_RMI4_F03=y
CONFIG_RMI4_F03_SERIO=m
CONFIG_RMI4_2D_SENSOR=y
CONFIG_RMI4_F11=y
CONFIG_RMI4_F12=y
CONFIG_RMI4_F30=y
CONFIG_RMI4_F34=y
# CONFIG_RMI4_F3A is not set
# CONFIG_RMI4_F54 is not set
CONFIG_RMI4_F55=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_SERIO_ALTERA_PS2=m
# CONFIG_SERIO_PS2MULT is not set
CONFIG_SERIO_ARC_PS2=m
CONFIG_HYPERV_KEYBOARD=m
# CONFIG_SERIO_GPIO_PS2 is not set
# CONFIG_USERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_PNP=y
# CONFIG_SERIAL_8250_16550A_VARIANTS is not set
# CONFIG_SERIAL_8250_FINTEK is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_DMA=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_EXAR=y
CONFIG_SERIAL_8250_NR_UARTS=64
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
CONFIG_SERIAL_8250_RSA=y
CONFIG_SERIAL_8250_DWLIB=y
CONFIG_SERIAL_8250_DW=y
# CONFIG_SERIAL_8250_RT288X is not set
CONFIG_SERIAL_8250_LPSS=y
CONFIG_SERIAL_8250_MID=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MAX3100 is not set
# CONFIG_SERIAL_MAX310X is not set
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
# CONFIG_SERIAL_LANTIQ is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_BCM63XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
CONFIG_SERIAL_ARC=m
CONFIG_SERIAL_ARC_NR_PORTS=1
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# CONFIG_SERIAL_SPRD is not set
# end of Serial drivers

CONFIG_SERIAL_MCTRL_GPIO=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
CONFIG_SYNCLINK_GT=m
CONFIG_N_HDLC=m
CONFIG_N_GSM=m
CONFIG_NOZOMI=m
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
# CONFIG_SERIAL_DEV_BUS is not set
CONFIG_PRINTER=m
# CONFIG_LP_CONSOLE is not set
CONFIG_PPDEV=m
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DMI_DECODE=y
CONFIG_IPMI_PLAT_DATA=y
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_VIRTIO=y
# CONFIG_HW_RANDOM_XIPHERA is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
CONFIG_DEVMEM=y
CONFIG_NVRAM=y
CONFIG_DEVPORT=y
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_MMAP_DEFAULT is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_UV_MMTIMER=m
CONFIG_TCG_TPM=y
CONFIG_HW_RANDOM_TPM=y
CONFIG_TCG_TIS_CORE=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_SPI is not set
# CONFIG_TCG_TIS_I2C_CR50 is not set
CONFIG_TCG_TIS_I2C_ATMEL=m
CONFIG_TCG_TIS_I2C_INFINEON=m
CONFIG_TCG_TIS_I2C_NUVOTON=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
CONFIG_TCG_CRB=y
# CONFIG_TCG_VTPM_PROXY is not set
CONFIG_TCG_TIS_ST33ZP24=m
CONFIG_TCG_TIS_ST33ZP24_I2C=m
# CONFIG_TCG_TIS_ST33ZP24_SPI is not set
CONFIG_TELCLOCK=m
# CONFIG_XILLYBUS is not set
# CONFIG_XILLYUSB is not set
# CONFIG_RANDOM_TRUST_CPU is not set
# CONFIG_RANDOM_TRUST_BOOTLOADER is not set
# end of Character devices

#
# I2C support
#
CONFIG_I2C=y
CONFIG_ACPI_I2C_OPREGION=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_MUX=m

#
# Multiplexer I2C Chip support
#
# CONFIG_I2C_MUX_GPIO is not set
# CONFIG_I2C_MUX_LTC4306 is not set
# CONFIG_I2C_MUX_PCA9541 is not set
# CONFIG_I2C_MUX_PCA954x is not set
# CONFIG_I2C_MUX_REG is not set
CONFIG_I2C_MUX_MLXCPLD=m
# end of Multiplexer I2C Chip support

CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_SMBUS=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
# CONFIG_I2C_AMD_MP2 is not set
CONFIG_I2C_I801=y
CONFIG_I2C_ISCH=m
CONFIG_I2C_ISMT=m
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_NFORCE2_S4985=m
# CONFIG_I2C_NVIDIA_GPU is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# ACPI drivers
#
CONFIG_I2C_SCMI=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_CBUS_GPIO is not set
CONFIG_I2C_DESIGNWARE_CORE=m
# CONFIG_I2C_DESIGNWARE_SLAVE is not set
CONFIG_I2C_DESIGNWARE_PLATFORM=m
CONFIG_I2C_DESIGNWARE_BAYTRAIL=y
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EMEV2 is not set
# CONFIG_I2C_GPIO is not set
# CONFIG_I2C_OCORES is not set
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_SIMTEC=m
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
# CONFIG_I2C_CP2615 is not set
CONFIG_I2C_PARPORT=m
# CONFIG_I2C_ROBOTFUZZ_OSIF is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_MLXCPLD=m
# CONFIG_I2C_VIRTIO is not set
# end of I2C Hardware Bus support

CONFIG_I2C_STUB=m
# CONFIG_I2C_SLAVE is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# end of I2C support

# CONFIG_I3C is not set
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_MASTER=y
# CONFIG_SPI_MEM is not set

#
# SPI Master Controller Drivers
#
# CONFIG_SPI_ALTERA is not set
# CONFIG_SPI_AXI_SPI_ENGINE is not set
# CONFIG_SPI_BITBANG is not set
# CONFIG_SPI_BUTTERFLY is not set
# CONFIG_SPI_CADENCE is not set
# CONFIG_SPI_DESIGNWARE is not set
# CONFIG_SPI_NXP_FLEXSPI is not set
# CONFIG_SPI_GPIO is not set
# CONFIG_SPI_LM70_LLP is not set
# CONFIG_SPI_LANTIQ_SSC is not set
# CONFIG_SPI_OC_TINY is not set
# CONFIG_SPI_PXA2XX is not set
# CONFIG_SPI_ROCKCHIP is not set
# CONFIG_SPI_SC18IS602 is not set
# CONFIG_SPI_SIFIVE is not set
# CONFIG_SPI_MXIC is not set
# CONFIG_SPI_XCOMM is not set
# CONFIG_SPI_XILINX is not set
# CONFIG_SPI_ZYNQMP_GQSPI is not set
# CONFIG_SPI_AMD is not set

#
# SPI Multiplexer support
#
# CONFIG_SPI_MUX is not set

#
# SPI Protocol Masters
#
# CONFIG_SPI_SPIDEV is not set
# CONFIG_SPI_LOOPBACK_TEST is not set
# CONFIG_SPI_TLE62X0 is not set
# CONFIG_SPI_SLAVE is not set
CONFIG_SPI_DYNAMIC=y
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
CONFIG_PPS=y
# CONFIG_PPS_DEBUG is not set

#
# PPS clients support
#
# CONFIG_PPS_CLIENT_KTIMER is not set
CONFIG_PPS_CLIENT_LDISC=m
CONFIG_PPS_CLIENT_PARPORT=m
CONFIG_PPS_CLIENT_GPIO=m

#
# PPS generators support
#

#
# PTP clock support
#
CONFIG_PTP_1588_CLOCK=y
CONFIG_PTP_1588_CLOCK_OPTIONAL=y
# CONFIG_DP83640_PHY is not set
# CONFIG_PTP_1588_CLOCK_INES is not set
CONFIG_PTP_1588_CLOCK_KVM=m
# CONFIG_PTP_1588_CLOCK_IDT82P33 is not set
# CONFIG_PTP_1588_CLOCK_IDTCM is not set
# CONFIG_PTP_1588_CLOCK_VMW is not set
# end of PTP clock support

CONFIG_PINCTRL=y
CONFIG_PINMUX=y
CONFIG_PINCONF=y
CONFIG_GENERIC_PINCONF=y
# CONFIG_DEBUG_PINCTRL is not set
CONFIG_PINCTRL_AMD=m
# CONFIG_PINCTRL_MCP23S08 is not set
# CONFIG_PINCTRL_SX150X is not set

#
# Intel pinctrl drivers
#
CONFIG_PINCTRL_BAYTRAIL=y
# CONFIG_PINCTRL_CHERRYVIEW is not set
# CONFIG_PINCTRL_LYNXPOINT is not set
CONFIG_PINCTRL_INTEL=y
# CONFIG_PINCTRL_ALDERLAKE is not set
CONFIG_PINCTRL_BROXTON=m
CONFIG_PINCTRL_CANNONLAKE=m
CONFIG_PINCTRL_CEDARFORK=m
CONFIG_PINCTRL_DENVERTON=m
# CONFIG_PINCTRL_ELKHARTLAKE is not set
# CONFIG_PINCTRL_EMMITSBURG is not set
CONFIG_PINCTRL_GEMINILAKE=m
# CONFIG_PINCTRL_ICELAKE is not set
# CONFIG_PINCTRL_JASPERLAKE is not set
# CONFIG_PINCTRL_LAKEFIELD is not set
CONFIG_PINCTRL_LEWISBURG=m
CONFIG_PINCTRL_SUNRISEPOINT=m
# CONFIG_PINCTRL_TIGERLAKE is not set
# end of Intel pinctrl drivers

#
# Renesas pinctrl drivers
#
# end of Renesas pinctrl drivers

CONFIG_GPIOLIB=y
CONFIG_GPIOLIB_FASTPATH_LIMIT=512
CONFIG_GPIO_ACPI=y
CONFIG_GPIOLIB_IRQCHIP=y
# CONFIG_DEBUG_GPIO is not set
CONFIG_GPIO_CDEV=y
CONFIG_GPIO_CDEV_V1=y
CONFIG_GPIO_GENERIC=m

#
# Memory mapped GPIO drivers
#
CONFIG_GPIO_AMDPT=m
# CONFIG_GPIO_DWAPB is not set
# CONFIG_GPIO_EXAR is not set
# CONFIG_GPIO_GENERIC_PLATFORM is not set
CONFIG_GPIO_ICH=m
# CONFIG_GPIO_MB86S7X is not set
# CONFIG_GPIO_VX855 is not set
# CONFIG_GPIO_AMD_FCH is not set
# end of Memory mapped GPIO drivers

#
# Port-mapped I/O GPIO drivers
#
# CONFIG_GPIO_F7188X is not set
# CONFIG_GPIO_IT87 is not set
# CONFIG_GPIO_SCH is not set
# CONFIG_GPIO_SCH311X is not set
# CONFIG_GPIO_WINBOND is not set
# CONFIG_GPIO_WS16C48 is not set
# end of Port-mapped I/O GPIO drivers

#
# I2C GPIO expanders
#
# CONFIG_GPIO_ADP5588 is not set
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCA9570 is not set
# CONFIG_GPIO_PCF857X is not set
# CONFIG_GPIO_TPIC2810 is not set
# end of I2C GPIO expanders

#
# MFD GPIO expanders
#
# end of MFD GPIO expanders

#
# PCI GPIO expanders
#
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_PCI_IDIO_16 is not set
# CONFIG_GPIO_PCIE_IDIO_24 is not set
# CONFIG_GPIO_RDC321X is not set
# end of PCI GPIO expanders

#
# SPI GPIO expanders
#
# CONFIG_GPIO_MAX3191X is not set
# CONFIG_GPIO_MAX7301 is not set
# CONFIG_GPIO_MC33880 is not set
# CONFIG_GPIO_PISOSR is not set
# CONFIG_GPIO_XRA1403 is not set
# end of SPI GPIO expanders

#
# USB GPIO expanders
#
# end of USB GPIO expanders

#
# Virtual GPIO drivers
#
# CONFIG_GPIO_AGGREGATOR is not set
# CONFIG_GPIO_MOCKUP is not set
# CONFIG_GPIO_VIRTIO is not set
# end of Virtual GPIO drivers

# CONFIG_W1 is not set
CONFIG_POWER_RESET=y
# CONFIG_POWER_RESET_RESTART is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_POWER_SUPPLY_HWMON=y
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_CHARGER_ADP5061 is not set
# CONFIG_BATTERY_CW2015 is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_CHARGER_SBS is not set
# CONFIG_MANAGER_SBS is not set
# CONFIG_BATTERY_BQ27XXX is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_GPIO is not set
# CONFIG_CHARGER_LT3651 is not set
# CONFIG_CHARGER_LTC4162L is not set
# CONFIG_CHARGER_BQ2415X is not set
# CONFIG_CHARGER_BQ24257 is not set
# CONFIG_CHARGER_BQ24735 is not set
# CONFIG_CHARGER_BQ2515X is not set
# CONFIG_CHARGER_BQ25890 is not set
# CONFIG_CHARGER_BQ25980 is not set
# CONFIG_CHARGER_BQ256XX is not set
# CONFIG_BATTERY_GAUGE_LTC2941 is not set
# CONFIG_BATTERY_GOLDFISH is not set
# CONFIG_BATTERY_RT5033 is not set
# CONFIG_CHARGER_RT9455 is not set
# CONFIG_CHARGER_BD99954 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
CONFIG_SENSORS_ABITUGURU=m
CONFIG_SENSORS_ABITUGURU3=m
# CONFIG_SENSORS_AD7314 is not set
CONFIG_SENSORS_AD7414=m
CONFIG_SENSORS_AD7418=m
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
CONFIG_SENSORS_ADM1029=m
CONFIG_SENSORS_ADM1031=m
# CONFIG_SENSORS_ADM1177 is not set
CONFIG_SENSORS_ADM9240=m
CONFIG_SENSORS_ADT7X10=m
# CONFIG_SENSORS_ADT7310 is not set
CONFIG_SENSORS_ADT7410=m
CONFIG_SENSORS_ADT7411=m
CONFIG_SENSORS_ADT7462=m
CONFIG_SENSORS_ADT7470=m
CONFIG_SENSORS_ADT7475=m
# CONFIG_SENSORS_AHT10 is not set
# CONFIG_SENSORS_AQUACOMPUTER_D5NEXT is not set
# CONFIG_SENSORS_AS370 is not set
CONFIG_SENSORS_ASC7621=m
# CONFIG_SENSORS_AXI_FAN_CONTROL is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_K10TEMP=m
CONFIG_SENSORS_FAM15H_POWER=m
CONFIG_SENSORS_APPLESMC=m
CONFIG_SENSORS_ASB100=m
# CONFIG_SENSORS_ASPEED is not set
CONFIG_SENSORS_ATXP1=m
# CONFIG_SENSORS_CORSAIR_CPRO is not set
# CONFIG_SENSORS_CORSAIR_PSU is not set
# CONFIG_SENSORS_DRIVETEMP is not set
CONFIG_SENSORS_DS620=m
CONFIG_SENSORS_DS1621=m
CONFIG_SENSORS_DELL_SMM=m
CONFIG_SENSORS_I5K_AMB=m
CONFIG_SENSORS_F71805F=m
CONFIG_SENSORS_F71882FG=m
CONFIG_SENSORS_F75375S=m
CONFIG_SENSORS_FSCHMD=m
# CONFIG_SENSORS_FTSTEUTATES is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
CONFIG_SENSORS_G760A=m
# CONFIG_SENSORS_G762 is not set
# CONFIG_SENSORS_HIH6130 is not set
CONFIG_SENSORS_IBMAEM=m
CONFIG_SENSORS_IBMPEX=m
CONFIG_SENSORS_I5500=m
CONFIG_SENSORS_CORETEMP=m
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
# CONFIG_SENSORS_POWR1220 is not set
CONFIG_SENSORS_LINEAGE=m
# CONFIG_SENSORS_LTC2945 is not set
# CONFIG_SENSORS_LTC2947_I2C is not set
# CONFIG_SENSORS_LTC2947_SPI is not set
# CONFIG_SENSORS_LTC2990 is not set
# CONFIG_SENSORS_LTC2992 is not set
CONFIG_SENSORS_LTC4151=m
CONFIG_SENSORS_LTC4215=m
# CONFIG_SENSORS_LTC4222 is not set
CONFIG_SENSORS_LTC4245=m
# CONFIG_SENSORS_LTC4260 is not set
CONFIG_SENSORS_LTC4261=m
# CONFIG_SENSORS_MAX1111 is not set
# CONFIG_SENSORS_MAX127 is not set
CONFIG_SENSORS_MAX16065=m
CONFIG_SENSORS_MAX1619=m
CONFIG_SENSORS_MAX1668=m
CONFIG_SENSORS_MAX197=m
# CONFIG_SENSORS_MAX31722 is not set
# CONFIG_SENSORS_MAX31730 is not set
# CONFIG_SENSORS_MAX6620 is not set
# CONFIG_SENSORS_MAX6621 is not set
CONFIG_SENSORS_MAX6639=m
CONFIG_SENSORS_MAX6642=m
CONFIG_SENSORS_MAX6650=m
CONFIG_SENSORS_MAX6697=m
# CONFIG_SENSORS_MAX31790 is not set
CONFIG_SENSORS_MCP3021=m
# CONFIG_SENSORS_MLXREG_FAN is not set
# CONFIG_SENSORS_TC654 is not set
# CONFIG_SENSORS_TPS23861 is not set
# CONFIG_SENSORS_MR75203 is not set
# CONFIG_SENSORS_ADCXX is not set
CONFIG_SENSORS_LM63=m
# CONFIG_SENSORS_LM70 is not set
CONFIG_SENSORS_LM73=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
CONFIG_SENSORS_LM95234=m
CONFIG_SENSORS_LM95241=m
CONFIG_SENSORS_LM95245=m
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
CONFIG_SENSORS_NTC_THERMISTOR=m
# CONFIG_SENSORS_NCT6683 is not set
CONFIG_SENSORS_NCT6775=m
# CONFIG_SENSORS_NCT7802 is not set
# CONFIG_SENSORS_NCT7904 is not set
# CONFIG_SENSORS_NPCM7XX is not set
# CONFIG_SENSORS_NZXT_KRAKEN2 is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_PMBUS=m
CONFIG_SENSORS_PMBUS=m
# CONFIG_SENSORS_ADM1266 is not set
CONFIG_SENSORS_ADM1275=m
# CONFIG_SENSORS_BEL_PFE is not set
# CONFIG_SENSORS_BPA_RS600 is not set
# CONFIG_SENSORS_FSP_3Y is not set
# CONFIG_SENSORS_IBM_CFFPS is not set
# CONFIG_SENSORS_DPS920AB is not set
# CONFIG_SENSORS_INSPUR_IPSPS is not set
# CONFIG_SENSORS_IR35221 is not set
# CONFIG_SENSORS_IR36021 is not set
# CONFIG_SENSORS_IR38064 is not set
# CONFIG_SENSORS_IRPS5401 is not set
# CONFIG_SENSORS_ISL68137 is not set
CONFIG_SENSORS_LM25066=m
CONFIG_SENSORS_LTC2978=m
# CONFIG_SENSORS_LTC3815 is not set
# CONFIG_SENSORS_MAX15301 is not set
CONFIG_SENSORS_MAX16064=m
# CONFIG_SENSORS_MAX16601 is not set
# CONFIG_SENSORS_MAX20730 is not set
# CONFIG_SENSORS_MAX20751 is not set
# CONFIG_SENSORS_MAX31785 is not set
CONFIG_SENSORS_MAX34440=m
CONFIG_SENSORS_MAX8688=m
# CONFIG_SENSORS_MP2888 is not set
# CONFIG_SENSORS_MP2975 is not set
# CONFIG_SENSORS_PIM4328 is not set
# CONFIG_SENSORS_PM6764TR is not set
# CONFIG_SENSORS_PXE1610 is not set
# CONFIG_SENSORS_Q54SJ108A2 is not set
# CONFIG_SENSORS_STPDDC60 is not set
# CONFIG_SENSORS_TPS40422 is not set
# CONFIG_SENSORS_TPS53679 is not set
CONFIG_SENSORS_UCD9000=m
CONFIG_SENSORS_UCD9200=m
# CONFIG_SENSORS_XDPE122 is not set
CONFIG_SENSORS_ZL6100=m
# CONFIG_SENSORS_SBTSI is not set
# CONFIG_SENSORS_SBRMI is not set
CONFIG_SENSORS_SHT15=m
CONFIG_SENSORS_SHT21=m
# CONFIG_SENSORS_SHT3x is not set
# CONFIG_SENSORS_SHT4x is not set
# CONFIG_SENSORS_SHTC1 is not set
CONFIG_SENSORS_SIS5595=m
CONFIG_SENSORS_DME1737=m
CONFIG_SENSORS_EMC1403=m
# CONFIG_SENSORS_EMC2103 is not set
CONFIG_SENSORS_EMC6W201=m
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
CONFIG_SENSORS_SCH56XX_COMMON=m
CONFIG_SENSORS_SCH5627=m
CONFIG_SENSORS_SCH5636=m
# CONFIG_SENSORS_STTS751 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_ADC128D818 is not set
CONFIG_SENSORS_ADS7828=m
# CONFIG_SENSORS_ADS7871 is not set
CONFIG_SENSORS_AMC6821=m
CONFIG_SENSORS_INA209=m
CONFIG_SENSORS_INA2XX=m
# CONFIG_SENSORS_INA3221 is not set
# CONFIG_SENSORS_TC74 is not set
CONFIG_SENSORS_THMC50=m
CONFIG_SENSORS_TMP102=m
# CONFIG_SENSORS_TMP103 is not set
# CONFIG_SENSORS_TMP108 is not set
CONFIG_SENSORS_TMP401=m
CONFIG_SENSORS_TMP421=m
# CONFIG_SENSORS_TMP513 is not set
CONFIG_SENSORS_VIA_CPUTEMP=m
CONFIG_SENSORS_VIA686A=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_VT8231=m
# CONFIG_SENSORS_W83773G is not set
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83795=m
# CONFIG_SENSORS_W83795_FANCTRL is not set
CONFIG_SENSORS_W83L785TS=m
CONFIG_SENSORS_W83L786NG=m
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
# CONFIG_SENSORS_XGENE is not set

#
# ACPI drivers
#
CONFIG_SENSORS_ACPI_POWER=m
CONFIG_SENSORS_ATK0110=m
CONFIG_THERMAL=y
# CONFIG_THERMAL_NETLINK is not set
# CONFIG_THERMAL_STATISTICS is not set
CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS=0
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_WRITABLE_TRIPS=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
# CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set
# CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set
CONFIG_THERMAL_GOV_FAIR_SHARE=y
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_THERMAL_GOV_USER_SPACE=y
# CONFIG_THERMAL_EMULATION is not set

#
# Intel thermal drivers
#
CONFIG_INTEL_POWERCLAMP=m
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_X86_PKG_TEMP_THERMAL=m
# CONFIG_INTEL_SOC_DTS_THERMAL is not set

#
# ACPI INT340X thermal drivers
#
# CONFIG_INT340X_THERMAL is not set
# end of ACPI INT340X thermal drivers

CONFIG_INTEL_PCH_THERMAL=m
# CONFIG_INTEL_TCC_COOLING is not set
# CONFIG_INTEL_MENLOW is not set
# end of Intel thermal drivers

CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
CONFIG_WATCHDOG_SYSFS=y
# CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT is not set

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
CONFIG_WDAT_WDT=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_ZIIRAVE_WATCHDOG is not set
# CONFIG_MLX_WDT is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_EBC_C384_WDT is not set
CONFIG_F71808E_WDT=m
CONFIG_SP5100_TCO=m
CONFIG_SBC_FITPC2_WATCHDOG=m
# CONFIG_EUROTECH_WDT is not set
CONFIG_IB700_WDT=m
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=y
CONFIG_IE6XX_WDT=m
CONFIG_ITCO_WDT=y
CONFIG_ITCO_VENDOR_SUPPORT=y
CONFIG_IT8712F_WDT=m
CONFIG_IT87_WDT=m
CONFIG_HP_WATCHDOG=m
CONFIG_HPWDT_NMI_DECODING=y
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
CONFIG_NV_TCO=m
# CONFIG_60XX_WDT is not set
# CONFIG_CPU5_WDT is not set
CONFIG_SMSC_SCH311X_WDT=m
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_TQMX86_WDT is not set
CONFIG_VIA_WDT=m
CONFIG_W83627HF_WDT=m
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set
CONFIG_INTEL_MEI_WDT=m
# CONFIG_NI903X_WDT is not set
# CONFIG_NIC7018_WDT is not set
# CONFIG_MEN_A21_WDT is not set

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
CONFIG_BCMA=m
CONFIG_BCMA_HOST_PCI_POSSIBLE=y
CONFIG_BCMA_HOST_PCI=y
# CONFIG_BCMA_HOST_SOC is not set
CONFIG_BCMA_DRIVER_PCI=y
CONFIG_BCMA_DRIVER_GMAC_CMN=y
CONFIG_BCMA_DRIVER_GPIO=y
# CONFIG_BCMA_DEBUG is not set

#
# Multifunction device drivers
#
CONFIG_MFD_CORE=y
# CONFIG_MFD_AS3711 is not set
# CONFIG_PMIC_ADP5520 is not set
# CONFIG_MFD_AAT2870_CORE is not set
# CONFIG_MFD_BCM590XX is not set
# CONFIG_MFD_BD9571MWV is not set
# CONFIG_MFD_AXP20X_I2C is not set
# CONFIG_MFD_MADERA is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_DA9052_SPI is not set
# CONFIG_MFD_DA9052_I2C is not set
# CONFIG_MFD_DA9055 is not set
# CONFIG_MFD_DA9062 is not set
# CONFIG_MFD_DA9063 is not set
# CONFIG_MFD_DA9150 is not set
# CONFIG_MFD_DLN2 is not set
# CONFIG_MFD_MC13XXX_SPI is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_MFD_MP2629 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_HTC_I2CPLD is not set
# CONFIG_MFD_INTEL_QUARK_I2C_GPIO is not set
CONFIG_LPC_ICH=y
CONFIG_LPC_SCH=m
# CONFIG_INTEL_SOC_PMIC_CHTDC_TI is not set
CONFIG_MFD_INTEL_LPSS=y
CONFIG_MFD_INTEL_LPSS_ACPI=y
CONFIG_MFD_INTEL_LPSS_PCI=y
# CONFIG_MFD_INTEL_PMC_BXT is not set
# CONFIG_MFD_INTEL_PMT is not set
# CONFIG_MFD_IQS62X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_88PM800 is not set
# CONFIG_MFD_88PM805 is not set
# CONFIG_MFD_88PM860X is not set
# CONFIG_MFD_MAX14577 is not set
# CONFIG_MFD_MAX77693 is not set
# CONFIG_MFD_MAX77843 is not set
# CONFIG_MFD_MAX8907 is not set
# CONFIG_MFD_MAX8925 is not set
# CONFIG_MFD_MAX8997 is not set
# CONFIG_MFD_MAX8998 is not set
# CONFIG_MFD_MT6360 is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_MENF21BMC is not set
# CONFIG_EZX_PCAP is not set
# CONFIG_MFD_VIPERBOARD is not set
# CONFIG_MFD_RETU is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_RT4831 is not set
# CONFIG_MFD_RT5033 is not set
# CONFIG_MFD_RC5T583 is not set
# CONFIG_MFD_SI476X_CORE is not set
CONFIG_MFD_SM501=m
CONFIG_MFD_SM501_GPIO=y
# CONFIG_MFD_SKY81452 is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_LP3943 is not set
# CONFIG_MFD_LP8788 is not set
# CONFIG_MFD_TI_LMU is not set
# CONFIG_MFD_PALMAS is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65086 is not set
# CONFIG_MFD_TPS65090 is not set
# CONFIG_MFD_TI_LP873X is not set
# CONFIG_MFD_TPS6586X is not set
# CONFIG_MFD_TPS65910 is not set
# CONFIG_MFD_TPS65912_I2C is not set
# CONFIG_MFD_TPS65912_SPI is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_TWL6040_CORE is not set
# CONFIG_MFD_WL1273_CORE is not set
# CONFIG_MFD_LM3533 is not set
# CONFIG_MFD_TQMX86 is not set
CONFIG_MFD_VX855=m
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_MFD_ARIZONA_SPI is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM831X_I2C is not set
# CONFIG_MFD_WM831X_SPI is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_WM8994 is not set
# CONFIG_MFD_ATC260X_I2C is not set
# CONFIG_MFD_INTEL_M10_BMC is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
CONFIG_RC_CORE=m
CONFIG_RC_MAP=m
CONFIG_LIRC=y
CONFIG_RC_DECODERS=y
CONFIG_IR_NEC_DECODER=m
CONFIG_IR_RC5_DECODER=m
CONFIG_IR_RC6_DECODER=m
CONFIG_IR_JVC_DECODER=m
CONFIG_IR_SONY_DECODER=m
CONFIG_IR_SANYO_DECODER=m
# CONFIG_IR_SHARP_DECODER is not set
CONFIG_IR_MCE_KBD_DECODER=m
# CONFIG_IR_XMP_DECODER is not set
CONFIG_IR_IMON_DECODER=m
# CONFIG_IR_RCMM_DECODER is not set
CONFIG_RC_DEVICES=y
# CONFIG_RC_ATI_REMOTE is not set
CONFIG_IR_ENE=m
# CONFIG_IR_IMON is not set
# CONFIG_IR_IMON_RAW is not set
# CONFIG_IR_MCEUSB is not set
CONFIG_IR_ITE_CIR=m
CONFIG_IR_FINTEK=m
CONFIG_IR_NUVOTON=m
# CONFIG_IR_REDRAT3 is not set
# CONFIG_IR_STREAMZAP is not set
CONFIG_IR_WINBOND_CIR=m
# CONFIG_IR_IGORPLUGUSB is not set
# CONFIG_IR_IGUANA is not set
# CONFIG_IR_TTUSBIR is not set
# CONFIG_RC_LOOPBACK is not set
CONFIG_IR_SERIAL=m
CONFIG_IR_SERIAL_TRANSMITTER=y
# CONFIG_RC_XBOX_DVD is not set
# CONFIG_IR_TOY is not set

#
# CEC support
#
CONFIG_MEDIA_CEC_SUPPORT=y
# CONFIG_CEC_CH7322 is not set
# CONFIG_CEC_SECO is not set
# CONFIG_USB_PULSE8_CEC is not set
# CONFIG_USB_RAINSHADOW_CEC is not set
# end of CEC support

CONFIG_MEDIA_SUPPORT=m
# CONFIG_MEDIA_SUPPORT_FILTER is not set
# CONFIG_MEDIA_SUBDRV_AUTOSELECT is not set

#
# Media device types
#
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
CONFIG_MEDIA_RADIO_SUPPORT=y
CONFIG_MEDIA_SDR_SUPPORT=y
CONFIG_MEDIA_PLATFORM_SUPPORT=y
CONFIG_MEDIA_TEST_SUPPORT=y
# end of Media device types

#
# Media core support
#
CONFIG_VIDEO_DEV=m
CONFIG_MEDIA_CONTROLLER=y
CONFIG_DVB_CORE=m
# end of Media core support

#
# Video4Linux options
#
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L2_I2C=y
CONFIG_VIDEO_V4L2_SUBDEV_API=y
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
# end of Video4Linux options

#
# Media controller options
#
# CONFIG_MEDIA_CONTROLLER_DVB is not set
# end of Media controller options

#
# Digital TV options
#
# CONFIG_DVB_MMAP is not set
CONFIG_DVB_NET=y
CONFIG_DVB_MAX_ADAPTERS=16
CONFIG_DVB_DYNAMIC_MINORS=y
# CONFIG_DVB_DEMUX_SECTION_LOSS_LOG is not set
# CONFIG_DVB_ULE_DEBUG is not set
# end of Digital TV options

#
# Media drivers
#
# CONFIG_MEDIA_USB_SUPPORT is not set
# CONFIG_MEDIA_PCI_SUPPORT is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_SI470X is not set
# CONFIG_RADIO_SI4713 is not set
# CONFIG_USB_MR800 is not set
# CONFIG_USB_DSBR is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_SHARK is not set
# CONFIG_RADIO_SHARK2 is not set
# CONFIG_USB_KEENE is not set
# CONFIG_USB_RAREMONO is not set
# CONFIG_USB_MA901 is not set
# CONFIG_RADIO_TEA5764 is not set
# CONFIG_RADIO_SAA7706H is not set
# CONFIG_RADIO_TEF6862 is not set
# CONFIG_RADIO_WL1273 is not set
CONFIG_VIDEOBUF2_CORE=m
CONFIG_VIDEOBUF2_V4L2=m
CONFIG_VIDEOBUF2_MEMOPS=m
CONFIG_VIDEOBUF2_VMALLOC=m
# CONFIG_V4L_PLATFORM_DRIVERS is not set
# CONFIG_V4L_MEM2MEM_DRIVERS is not set
# CONFIG_DVB_PLATFORM_DRIVERS is not set
# CONFIG_SDR_PLATFORM_DRIVERS is not set

#
# MMC/SDIO DVB adapters
#
# CONFIG_SMS_SDIO_DRV is not set
# CONFIG_V4L_TEST_DRIVERS is not set
# CONFIG_DVB_TEST_DRIVERS is not set

#
# FireWire (IEEE 1394) Adapters
#
# CONFIG_DVB_FIREDTV is not set
# end of Media drivers

#
# Media ancillary drivers
#
CONFIG_MEDIA_ATTACH=y
CONFIG_VIDEO_IR_I2C=m

#
# Audio decoders, processors and mixers
#
# CONFIG_VIDEO_TVAUDIO is not set
# CONFIG_VIDEO_TDA7432 is not set
# CONFIG_VIDEO_TDA9840 is not set
# CONFIG_VIDEO_TEA6415C is not set
# CONFIG_VIDEO_TEA6420 is not set
# CONFIG_VIDEO_MSP3400 is not set
# CONFIG_VIDEO_CS3308 is not set
# CONFIG_VIDEO_CS5345 is not set
# CONFIG_VIDEO_CS53L32A is not set
# CONFIG_VIDEO_TLV320AIC23B is not set
# CONFIG_VIDEO_UDA1342 is not set
# CONFIG_VIDEO_WM8775 is not set
# CONFIG_VIDEO_WM8739 is not set
# CONFIG_VIDEO_VP27SMPX is not set
# CONFIG_VIDEO_SONY_BTF_MPX is not set
# end of Audio decoders, processors and mixers

#
# RDS decoders
#
# CONFIG_VIDEO_SAA6588 is not set
# end of RDS decoders

#
# Video decoders
#
# CONFIG_VIDEO_ADV7180 is not set
# CONFIG_VIDEO_ADV7183 is not set
# CONFIG_VIDEO_ADV7604 is not set
# CONFIG_VIDEO_ADV7842 is not set
# CONFIG_VIDEO_BT819 is not set
# CONFIG_VIDEO_BT856 is not set
# CONFIG_VIDEO_BT866 is not set
# CONFIG_VIDEO_KS0127 is not set
# CONFIG_VIDEO_ML86V7667 is not set
# CONFIG_VIDEO_SAA7110 is not set
# CONFIG_VIDEO_SAA711X is not set
# CONFIG_VIDEO_TC358743 is not set
# CONFIG_VIDEO_TVP514X is not set
# CONFIG_VIDEO_TVP5150 is not set
# CONFIG_VIDEO_TVP7002 is not set
# CONFIG_VIDEO_TW2804 is not set
# CONFIG_VIDEO_TW9903 is not set
# CONFIG_VIDEO_TW9906 is not set
# CONFIG_VIDEO_TW9910 is not set
# CONFIG_VIDEO_VPX3220 is not set

#
# Video and audio decoders
#
# CONFIG_VIDEO_SAA717X is not set
# CONFIG_VIDEO_CX25840 is not set
# end of Video decoders

#
# Video encoders
#
# CONFIG_VIDEO_SAA7127 is not set
# CONFIG_VIDEO_SAA7185 is not set
# CONFIG_VIDEO_ADV7170 is not set
# CONFIG_VIDEO_ADV7175 is not set
# CONFIG_VIDEO_ADV7343 is not set
# CONFIG_VIDEO_ADV7393 is not set
# CONFIG_VIDEO_ADV7511 is not set
# CONFIG_VIDEO_AD9389B is not set
# CONFIG_VIDEO_AK881X is not set
# CONFIG_VIDEO_THS8200 is not set
# end of Video encoders

#
# Video improvement chips
#
# CONFIG_VIDEO_UPD64031A is not set
# CONFIG_VIDEO_UPD64083 is not set
# end of Video improvement chips

#
# Audio/Video compression chips
#
# CONFIG_VIDEO_SAA6752HS is not set
# end of Audio/Video compression chips

#
# SDR tuner chips
#
# CONFIG_SDR_MAX2175 is not set
# end of SDR tuner chips

#
# Miscellaneous helper chips
#
# CONFIG_VIDEO_THS7303 is not set
# CONFIG_VIDEO_M52790 is not set
# CONFIG_VIDEO_I2C is not set
# CONFIG_VIDEO_ST_MIPID02 is not set
# end of Miscellaneous helper chips

#
# Camera sensor devices
#
# CONFIG_VIDEO_HI556 is not set
# CONFIG_VIDEO_HI846 is not set
# CONFIG_VIDEO_IMX208 is not set
# CONFIG_VIDEO_IMX214 is not set
# CONFIG_VIDEO_IMX219 is not set
# CONFIG_VIDEO_IMX258 is not set
# CONFIG_VIDEO_IMX274 is not set
# CONFIG_VIDEO_IMX290 is not set
# CONFIG_VIDEO_IMX319 is not set
# CONFIG_VIDEO_IMX355 is not set
# CONFIG_VIDEO_OV02A10 is not set
# CONFIG_VIDEO_OV2640 is not set
# CONFIG_VIDEO_OV2659 is not set
# CONFIG_VIDEO_OV2680 is not set
# CONFIG_VIDEO_OV2685 is not set
# CONFIG_VIDEO_OV2740 is not set
# CONFIG_VIDEO_OV5647 is not set
# CONFIG_VIDEO_OV5648 is not set
# CONFIG_VIDEO_OV6650 is not set
# CONFIG_VIDEO_OV5670 is not set
# CONFIG_VIDEO_OV5675 is not set
# CONFIG_VIDEO_OV5695 is not set
# CONFIG_VIDEO_OV7251 is not set
# CONFIG_VIDEO_OV772X is not set
# CONFIG_VIDEO_OV7640 is not set
# CONFIG_VIDEO_OV7670 is not set
# CONFIG_VIDEO_OV7740 is not set
# CONFIG_VIDEO_OV8856 is not set
# CONFIG_VIDEO_OV8865 is not set
# CONFIG_VIDEO_OV9640 is not set
# CONFIG_VIDEO_OV9650 is not set
# CONFIG_VIDEO_OV9734 is not set
# CONFIG_VIDEO_OV13858 is not set
# CONFIG_VIDEO_OV13B10 is not set
# CONFIG_VIDEO_VS6624 is not set
# CONFIG_VIDEO_MT9M001 is not set
# CONFIG_VIDEO_MT9M032 is not set
# CONFIG_VIDEO_MT9M111 is not set
# CONFIG_VIDEO_MT9P031 is not set
# CONFIG_VIDEO_MT9T001 is not set
# CONFIG_VIDEO_MT9T112 is not set
# CONFIG_VIDEO_MT9V011 is not set
# CONFIG_VIDEO_MT9V032 is not set
# CONFIG_VIDEO_MT9V111 is not set
# CONFIG_VIDEO_SR030PC30 is not set
# CONFIG_VIDEO_NOON010PC30 is not set
# CONFIG_VIDEO_M5MOLS is not set
# CONFIG_VIDEO_RDACM20 is not set
# CONFIG_VIDEO_RDACM21 is not set
# CONFIG_VIDEO_RJ54N1 is not set
# CONFIG_VIDEO_S5K6AA is not set
# CONFIG_VIDEO_S5K6A3 is not set
# CONFIG_VIDEO_S5K4ECGX is not set
# CONFIG_VIDEO_S5K5BAF is not set
# CONFIG_VIDEO_CCS is not set
# CONFIG_VIDEO_ET8EK8 is not set
# CONFIG_VIDEO_S5C73M3 is not set
# end of Camera sensor devices

#
# Lens drivers
#
# CONFIG_VIDEO_AD5820 is not set
# CONFIG_VIDEO_AK7375 is not set
# CONFIG_VIDEO_DW9714 is not set
# CONFIG_VIDEO_DW9768 is not set
# CONFIG_VIDEO_DW9807_VCM is not set
# end of Lens drivers

#
# Flash devices
#
# CONFIG_VIDEO_ADP1653 is not set
# CONFIG_VIDEO_LM3560 is not set
# CONFIG_VIDEO_LM3646 is not set
# end of Flash devices

#
# SPI helper chips
#
# CONFIG_VIDEO_GS1662 is not set
# end of SPI helper chips

#
# Media SPI Adapters
#
CONFIG_CXD2880_SPI_DRV=m
# end of Media SPI Adapters

CONFIG_MEDIA_TUNER=m

#
# Customize TV tuners
#
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA18250=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA827X=m
CONFIG_MEDIA_TUNER_TDA18271=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MSI001=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_MT2060=m
CONFIG_MEDIA_TUNER_MT2063=m
CONFIG_MEDIA_TUNER_MT2266=m
CONFIG_MEDIA_TUNER_MT2131=m
CONFIG_MEDIA_TUNER_QT1010=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_XC4000=m
CONFIG_MEDIA_TUNER_MXL5005S=m
CONFIG_MEDIA_TUNER_MXL5007T=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_MEDIA_TUNER_MAX2165=m
CONFIG_MEDIA_TUNER_TDA18218=m
CONFIG_MEDIA_TUNER_FC0011=m
CONFIG_MEDIA_TUNER_FC0012=m
CONFIG_MEDIA_TUNER_FC0013=m
CONFIG_MEDIA_TUNER_TDA18212=m
CONFIG_MEDIA_TUNER_E4000=m
CONFIG_MEDIA_TUNER_FC2580=m
CONFIG_MEDIA_TUNER_M88RS6000T=m
CONFIG_MEDIA_TUNER_TUA9001=m
CONFIG_MEDIA_TUNER_SI2157=m
CONFIG_MEDIA_TUNER_IT913X=m
CONFIG_MEDIA_TUNER_R820T=m
CONFIG_MEDIA_TUNER_MXL301RF=m
CONFIG_MEDIA_TUNER_QM1D1C0042=m
CONFIG_MEDIA_TUNER_QM1D1B0004=m
# end of Customize TV tuners

#
# Customise DVB Frontends
#

#
# Multistandard (satellite) frontends
#
CONFIG_DVB_STB0899=m
CONFIG_DVB_STB6100=m
CONFIG_DVB_STV090x=m
CONFIG_DVB_STV0910=m
CONFIG_DVB_STV6110x=m
CONFIG_DVB_STV6111=m
CONFIG_DVB_MXL5XX=m
CONFIG_DVB_M88DS3103=m

#
# Multistandard (cable + terrestrial) frontends
#
CONFIG_DVB_DRXK=m
CONFIG_DVB_TDA18271C2DD=m
CONFIG_DVB_SI2165=m
CONFIG_DVB_MN88472=m
CONFIG_DVB_MN88473=m

#
# DVB-S (satellite) frontends
#
CONFIG_DVB_CX24110=m
CONFIG_DVB_CX24123=m
CONFIG_DVB_MT312=m
CONFIG_DVB_ZL10036=m
CONFIG_DVB_ZL10039=m
CONFIG_DVB_S5H1420=m
CONFIG_DVB_STV0288=m
CONFIG_DVB_STB6000=m
CONFIG_DVB_STV0299=m
CONFIG_DVB_STV6110=m
CONFIG_DVB_STV0900=m
CONFIG_DVB_TDA8083=m
CONFIG_DVB_TDA10086=m
CONFIG_DVB_TDA8261=m
CONFIG_DVB_VES1X93=m
CONFIG_DVB_TUNER_ITD1000=m
CONFIG_DVB_TUNER_CX24113=m
CONFIG_DVB_TDA826X=m
CONFIG_DVB_TUA6100=m
CONFIG_DVB_CX24116=m
CONFIG_DVB_CX24117=m
CONFIG_DVB_CX24120=m
CONFIG_DVB_SI21XX=m
CONFIG_DVB_TS2020=m
CONFIG_DVB_DS3000=m
CONFIG_DVB_MB86A16=m
CONFIG_DVB_TDA10071=m

#
# DVB-T (terrestrial) frontends
#
CONFIG_DVB_SP887X=m
CONFIG_DVB_CX22700=m
CONFIG_DVB_CX22702=m
CONFIG_DVB_S5H1432=m
CONFIG_DVB_DRXD=m
CONFIG_DVB_L64781=m
CONFIG_DVB_TDA1004X=m
CONFIG_DVB_NXT6000=m
CONFIG_DVB_MT352=m
CONFIG_DVB_ZL10353=m
CONFIG_DVB_DIB3000MB=m
CONFIG_DVB_DIB3000MC=m
CONFIG_DVB_DIB7000M=m
CONFIG_DVB_DIB7000P=m
CONFIG_DVB_DIB9000=m
CONFIG_DVB_TDA10048=m
CONFIG_DVB_AF9013=m
CONFIG_DVB_EC100=m
CONFIG_DVB_STV0367=m
CONFIG_DVB_CXD2820R=m
CONFIG_DVB_CXD2841ER=m
CONFIG_DVB_RTL2830=m
CONFIG_DVB_RTL2832=m
CONFIG_DVB_RTL2832_SDR=m
CONFIG_DVB_SI2168=m
CONFIG_DVB_ZD1301_DEMOD=m
CONFIG_DVB_CXD2880=m

#
# DVB-C (cable) frontends
#
CONFIG_DVB_VES1820=m
CONFIG_DVB_TDA10021=m
CONFIG_DVB_TDA10023=m
CONFIG_DVB_STV0297=m

#
# ATSC (North American/Korean Terrestrial/Cable DTV) frontends
#
CONFIG_DVB_NXT200X=m
CONFIG_DVB_OR51211=m
CONFIG_DVB_OR51132=m
CONFIG_DVB_BCM3510=m
CONFIG_DVB_LGDT330X=m
CONFIG_DVB_LGDT3305=m
CONFIG_DVB_LGDT3306A=m
CONFIG_DVB_LG2160=m
CONFIG_DVB_S5H1409=m
CONFIG_DVB_AU8522=m
CONFIG_DVB_AU8522_DTV=m
CONFIG_DVB_AU8522_V4L=m
CONFIG_DVB_S5H1411=m
CONFIG_DVB_MXL692=m

#
# ISDB-T (terrestrial) frontends
#
CONFIG_DVB_S921=m
CONFIG_DVB_DIB8000=m
CONFIG_DVB_MB86A20S=m

#
# ISDB-S (satellite) & ISDB-T (terrestrial) frontends
#
CONFIG_DVB_TC90522=m
CONFIG_DVB_MN88443X=m

#
# Digital terrestrial only tuners/PLL
#
CONFIG_DVB_PLL=m
CONFIG_DVB_TUNER_DIB0070=m
CONFIG_DVB_TUNER_DIB0090=m

#
# SEC control devices for DVB-S
#
CONFIG_DVB_DRX39XYJ=m
CONFIG_DVB_LNBH25=m
CONFIG_DVB_LNBH29=m
CONFIG_DVB_LNBP21=m
CONFIG_DVB_LNBP22=m
CONFIG_DVB_ISL6405=m
CONFIG_DVB_ISL6421=m
CONFIG_DVB_ISL6423=m
CONFIG_DVB_A8293=m
CONFIG_DVB_LGS8GL5=m
CONFIG_DVB_LGS8GXX=m
CONFIG_DVB_ATBM8830=m
CONFIG_DVB_TDA665x=m
CONFIG_DVB_IX2505V=m
CONFIG_DVB_M88RS2000=m
CONFIG_DVB_AF9033=m
CONFIG_DVB_HORUS3A=m
CONFIG_DVB_ASCOT2E=m
CONFIG_DVB_HELENE=m

#
# Common Interface (EN50221) controller drivers
#
CONFIG_DVB_CXD2099=m
CONFIG_DVB_SP2=m
# end of Customise DVB Frontends

#
# Tools to develop new frontends
#
# CONFIG_DVB_DUMMY_FE is not set
# end of Media ancillary drivers

#
# Graphics support
#
# CONFIG_AGP is not set
CONFIG_INTEL_GTT=m
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=64
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=m
CONFIG_DRM_MIPI_DSI=y
CONFIG_DRM_DP_AUX_CHARDEV=y
# CONFIG_DRM_DEBUG_SELFTEST is not set
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
# CONFIG_DRM_DP_CEC is not set
CONFIG_DRM_TTM=m
CONFIG_DRM_VRAM_HELPER=m
CONFIG_DRM_TTM_HELPER=m
CONFIG_DRM_GEM_SHMEM_HELPER=y

#
# I2C encoder or helper chips
#
CONFIG_DRM_I2C_CH7006=m
CONFIG_DRM_I2C_SIL164=m
# CONFIG_DRM_I2C_NXP_TDA998X is not set
# CONFIG_DRM_I2C_NXP_TDA9950 is not set
# end of I2C encoder or helper chips

#
# ARM devices
#
# end of ARM devices

# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_AMDGPU is not set
# CONFIG_DRM_NOUVEAU is not set
CONFIG_DRM_I915=m
CONFIG_DRM_I915_FORCE_PROBE=""
CONFIG_DRM_I915_CAPTURE_ERROR=y
CONFIG_DRM_I915_COMPRESS_ERROR=y
CONFIG_DRM_I915_USERPTR=y
CONFIG_DRM_I915_GVT=y
# CONFIG_DRM_I915_GVT_KVMGT is not set
CONFIG_DRM_I915_REQUEST_TIMEOUT=20000
CONFIG_DRM_I915_FENCE_TIMEOUT=10000
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
CONFIG_DRM_I915_STOP_TIMEOUT=100
CONFIG_DRM_I915_TIMESLICE_DURATION=1
# CONFIG_DRM_VGEM is not set
# CONFIG_DRM_VKMS is not set
# CONFIG_DRM_VMWGFX is not set
CONFIG_DRM_GMA500=m
# CONFIG_DRM_UDL is not set
CONFIG_DRM_AST=m
CONFIG_DRM_MGAG200=m
CONFIG_DRM_QXL=m
CONFIG_DRM_VIRTIO_GPU=m
CONFIG_DRM_PANEL=y

#
# Display Panels
#
# CONFIG_DRM_PANEL_RASPBERRYPI_TOUCHSCREEN is not set
# CONFIG_DRM_PANEL_WIDECHIPS_WS2401 is not set
# end of Display Panels

CONFIG_DRM_BRIDGE=y
CONFIG_DRM_PANEL_BRIDGE=y

#
# Display Interface Bridges
#
# CONFIG_DRM_ANALOGIX_ANX78XX is not set
# end of Display Interface Bridges

# CONFIG_DRM_ETNAVIV is not set
CONFIG_DRM_BOCHS=m
CONFIG_DRM_CIRRUS_QEMU=m
# CONFIG_DRM_GM12U320 is not set
# CONFIG_DRM_SIMPLEDRM is not set
# CONFIG_TINYDRM_HX8357D is not set
# CONFIG_TINYDRM_ILI9225 is not set
# CONFIG_TINYDRM_ILI9341 is not set
# CONFIG_TINYDRM_ILI9486 is not set
# CONFIG_TINYDRM_MI0283QT is not set
# CONFIG_TINYDRM_REPAPER is not set
# CONFIG_TINYDRM_ST7586 is not set
# CONFIG_TINYDRM_ST7735R is not set
# CONFIG_DRM_VBOXVIDEO is not set
# CONFIG_DRM_GUD is not set
# CONFIG_DRM_HYPERV is not set
# CONFIG_DRM_LEGACY is not set
CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_MODE_HELPERS is not set
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_SM501 is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
CONFIG_FB_HYPERV=m
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SSD1307 is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_L4F00242T03 is not set
# CONFIG_LCD_LMS283GF05 is not set
# CONFIG_LCD_LTV350QV is not set
# CONFIG_LCD_ILI922X is not set
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_TDO24M is not set
# CONFIG_LCD_VGG2432A4 is not set
CONFIG_LCD_PLATFORM=m
# CONFIG_LCD_AMS369FG06 is not set
# CONFIG_LCD_LMS501KF03 is not set
# CONFIG_LCD_HX8357 is not set
# CONFIG_LCD_OTM3225A is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_KTD253 is not set
# CONFIG_BACKLIGHT_PWM is not set
CONFIG_BACKLIGHT_APPLE=m
# CONFIG_BACKLIGHT_QCOM_WLED is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3630A is not set
# CONFIG_BACKLIGHT_LM3639 is not set
CONFIG_BACKLIGHT_LP855X=m
# CONFIG_BACKLIGHT_GPIO is not set
# CONFIG_BACKLIGHT_LV5207LP is not set
# CONFIG_BACKLIGHT_BD6107 is not set
# CONFIG_BACKLIGHT_ARCXCNN is not set
# end of Backlight & LCD device support

CONFIG_HDMI=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
CONFIG_HID_BATTERY_STRENGTH=y
CONFIG_HIDRAW=y
CONFIG_UHID=m
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
# CONFIG_HID_ACCUTOUCH is not set
CONFIG_HID_ACRUX=m
# CONFIG_HID_ACRUX_FF is not set
CONFIG_HID_APPLE=m
# CONFIG_HID_APPLEIR is not set
CONFIG_HID_ASUS=m
CONFIG_HID_AUREAL=m
CONFIG_HID_BELKIN=m
# CONFIG_HID_BETOP_FF is not set
# CONFIG_HID_BIGBEN_FF is not set
CONFIG_HID_CHERRY=m
# CONFIG_HID_CHICONY is not set
# CONFIG_HID_CORSAIR is not set
# CONFIG_HID_COUGAR is not set
# CONFIG_HID_MACALLY is not set
CONFIG_HID_CMEDIA=m
# CONFIG_HID_CP2112 is not set
# CONFIG_HID_CREATIVE_SB0540 is not set
CONFIG_HID_CYPRESS=m
CONFIG_HID_DRAGONRISE=m
# CONFIG_DRAGONRISE_FF is not set
# CONFIG_HID_EMS_FF is not set
# CONFIG_HID_ELAN is not set
CONFIG_HID_ELECOM=m
# CONFIG_HID_ELO is not set
CONFIG_HID_EZKEY=m
# CONFIG_HID_FT260 is not set
CONFIG_HID_GEMBIRD=m
CONFIG_HID_GFRM=m
# CONFIG_HID_GLORIOUS is not set
# CONFIG_HID_HOLTEK is not set
# CONFIG_HID_VIVALDI is not set
# CONFIG_HID_GT683R is not set
CONFIG_HID_KEYTOUCH=m
CONFIG_HID_KYE=m
# CONFIG_HID_UCLOGIC is not set
CONFIG_HID_WALTOP=m
# CONFIG_HID_VIEWSONIC is not set
# CONFIG_HID_XIAOMI is not set
CONFIG_HID_GYRATION=m
CONFIG_HID_ICADE=m
CONFIG_HID_ITE=m
CONFIG_HID_JABRA=m
CONFIG_HID_TWINHAN=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LCPOWER=m
CONFIG_HID_LED=m
CONFIG_HID_LENOVO=m
CONFIG_HID_LOGITECH=m
CONFIG_HID_LOGITECH_DJ=m
CONFIG_HID_LOGITECH_HIDPP=m
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWHEELS_FF is not set
CONFIG_HID_MAGICMOUSE=y
# CONFIG_HID_MALTRON is not set
# CONFIG_HID_MAYFLASH is not set
# CONFIG_HID_REDRAGON is not set
CONFIG_HID_MICROSOFT=m
CONFIG_HID_MONTEREY=m
CONFIG_HID_MULTITOUCH=m
# CONFIG_HID_NINTENDO is not set
CONFIG_HID_NTI=m
# CONFIG_HID_NTRIG is not set
CONFIG_HID_ORTEK=m
CONFIG_HID_PANTHERLORD=m
# CONFIG_PANTHERLORD_FF is not set
# CONFIG_HID_PENMOUNT is not set
CONFIG_HID_PETALYNX=m
CONFIG_HID_PICOLCD=m
CONFIG_HID_PICOLCD_FB=y
CONFIG_HID_PICOLCD_BACKLIGHT=y
CONFIG_HID_PICOLCD_LCD=y
CONFIG_HID_PICOLCD_LEDS=y
CONFIG_HID_PICOLCD_CIR=y
CONFIG_HID_PLANTRONICS=m
CONFIG_HID_PRIMAX=m
# CONFIG_HID_RETRODE is not set
# CONFIG_HID_ROCCAT is not set
CONFIG_HID_SAITEK=m
CONFIG_HID_SAMSUNG=m
# CONFIG_HID_SEMITEK is not set
# CONFIG_HID_SONY is not set
CONFIG_HID_SPEEDLINK=m
# CONFIG_HID_STEAM is not set
CONFIG_HID_STEELSERIES=m
CONFIG_HID_SUNPLUS=m
CONFIG_HID_RMI=m
CONFIG_HID_GREENASIA=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_HYPERV_MOUSE=m
CONFIG_HID_SMARTJOYPLUS=m
# CONFIG_SMARTJOYPLUS_FF is not set
CONFIG_HID_TIVO=m
CONFIG_HID_TOPSEED=m
CONFIG_HID_THINGM=m
CONFIG_HID_THRUSTMASTER=m
# CONFIG_THRUSTMASTER_FF is not set
# CONFIG_HID_UDRAW_PS3 is not set
# CONFIG_HID_U2FZERO is not set
# CONFIG_HID_WACOM is not set
CONFIG_HID_WIIMOTE=m
CONFIG_HID_XINMO=m
CONFIG_HID_ZEROPLUS=m
# CONFIG_ZEROPLUS_FF is not set
CONFIG_HID_ZYDACRON=m
CONFIG_HID_SENSOR_HUB=y
CONFIG_HID_SENSOR_CUSTOM_SENSOR=m
CONFIG_HID_ALPS=m
# CONFIG_HID_MCP2221 is not set
# end of Special HID drivers

#
# USB HID support
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set
# end of USB HID support

#
# I2C HID support
#
# CONFIG_I2C_HID_ACPI is not set
# end of I2C HID support

#
# Intel ISH HID support
#
CONFIG_INTEL_ISH_HID=m
# CONFIG_INTEL_ISH_FIRMWARE_DOWNLOADER is not set
# end of Intel ISH HID support

#
# AMD SFH HID Support
#
# CONFIG_AMD_SFH_HID is not set
# end of AMD SFH HID Support
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
# CONFIG_USB_LED_TRIG is not set
# CONFIG_USB_ULPI_BUS is not set
# CONFIG_USB_CONN_GPIO is not set
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_PCI=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
CONFIG_USB_DEFAULT_PERSIST=y
# CONFIG_USB_FEW_INIT_RETRIES is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_OTG_PRODUCTLIST is not set
CONFIG_USB_LEDS_TRIGGER_USBPORT=y
CONFIG_USB_AUTOSUSPEND_DELAY=2
CONFIG_USB_MON=y

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_XHCI_HCD=y
# CONFIG_USB_XHCI_DBGCAP is not set
CONFIG_USB_XHCI_PCI=y
# CONFIG_USB_XHCI_PCI_RENESAS is not set
# CONFIG_USB_XHCI_PLATFORM is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_EHCI_PCI=y
# CONFIG_USB_EHCI_FSL is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_FOTG210_HCD is not set
# CONFIG_USB_MAX3421_HCD is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PCI=y
# CONFIG_USB_OHCI_HCD_PLATFORM is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_HCD_BCMA is not set
# CONFIG_USB_HCD_TEST_MODE is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_REALTEK is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_STORAGE_ENE_UB6250 is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USBIP_CORE is not set
# CONFIG_USB_CDNS_SUPPORT is not set
# CONFIG_USB_MUSB_HDRC is not set
# CONFIG_USB_DWC3 is not set
# CONFIG_USB_DWC2 is not set
# CONFIG_USB_CHIPIDEA is not set
# CONFIG_USB_ISP1760 is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set
CONFIG_USB_SERIAL=m
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_SIMPLE is not set
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_F81232 is not set
# CONFIG_USB_SERIAL_F8153X is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
# CONFIG_USB_SERIAL_METRO is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MXUPORT is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QCAUX is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_XSENS_MT is not set
# CONFIG_USB_SERIAL_WISHBONE is not set
# CONFIG_USB_SERIAL_SSU100 is not set
# CONFIG_USB_SERIAL_QT2 is not set
# CONFIG_USB_SERIAL_UPD78F0730 is not set
# CONFIG_USB_SERIAL_XR is not set
CONFIG_USB_SERIAL_DEBUG=m

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_APPLE_MFI_FASTCHARGE is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_EHSET_TEST_FIXTURE is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_EZUSB_FX2 is not set
# CONFIG_USB_HUB_USB251XB is not set
# CONFIG_USB_HSIC_USB3503 is not set
# CONFIG_USB_HSIC_USB4604 is not set
# CONFIG_USB_LINK_LAYER_TEST is not set
# CONFIG_USB_CHAOSKEY is not set
# CONFIG_USB_ATM is not set

#
# USB Physical Layer drivers
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_USB_ISP1301 is not set
# end of USB Physical Layer drivers

# CONFIG_USB_GADGET is not set
CONFIG_TYPEC=y
# CONFIG_TYPEC_TCPM is not set
CONFIG_TYPEC_UCSI=y
# CONFIG_UCSI_CCG is not set
CONFIG_UCSI_ACPI=y
# CONFIG_TYPEC_TPS6598X is not set
# CONFIG_TYPEC_STUSB160X is not set

#
# USB Type-C Multiplexer/DeMultiplexer Switch support
#
# CONFIG_TYPEC_MUX_PI3USB30532 is not set
# end of USB Type-C Multiplexer/DeMultiplexer Switch support

#
# USB Type-C Alternate Mode drivers
#
# CONFIG_TYPEC_DP_ALTMODE is not set
# end of USB Type-C Alternate Mode drivers

# CONFIG_USB_ROLE_SWITCH is not set
CONFIG_MMC=m
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_SDIO_UART=m
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_SDHCI=m
CONFIG_MMC_SDHCI_IO_ACCESSORS=y
CONFIG_MMC_SDHCI_PCI=m
CONFIG_MMC_RICOH_MMC=y
CONFIG_MMC_SDHCI_ACPI=m
CONFIG_MMC_SDHCI_PLTFM=m
# CONFIG_MMC_SDHCI_F_SDH30 is not set
# CONFIG_MMC_WBSD is not set
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_SPI is not set
# CONFIG_MMC_CB710 is not set
# CONFIG_MMC_VIA_SDMMC is not set
# CONFIG_MMC_VUB300 is not set
# CONFIG_MMC_USHC is not set
# CONFIG_MMC_USDHI6ROL0 is not set
# CONFIG_MMC_REALTEK_PCI is not set
CONFIG_MMC_CQHCI=m
# CONFIG_MMC_HSQ is not set
# CONFIG_MMC_TOSHIBA_PCI is not set
# CONFIG_MMC_MTK is not set
# CONFIG_MMC_SDHCI_XENON is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set
# CONFIG_LEDS_CLASS_MULTICOLOR is not set
# CONFIG_LEDS_BRIGHTNESS_HW_CHANGED is not set

#
# LED drivers
#
# CONFIG_LEDS_APU is not set
CONFIG_LEDS_LM3530=m
# CONFIG_LEDS_LM3532 is not set
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_GPIO is not set
CONFIG_LEDS_LP3944=m
# CONFIG_LEDS_LP3952 is not set
# CONFIG_LEDS_LP50XX is not set
CONFIG_LEDS_CLEVO_MAIL=m
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA963X is not set
# CONFIG_LEDS_DAC124S085 is not set
# CONFIG_LEDS_PWM is not set
# CONFIG_LEDS_BD2802 is not set
CONFIG_LEDS_INTEL_SS4200=m
CONFIG_LEDS_LT3593=m
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_TLC591XX is not set
# CONFIG_LEDS_LM355x is not set

#
# LED driver for blink(1) USB RGB LED is under Special HID drivers (HID_THINGM)
#
CONFIG_LEDS_BLINKM=m
CONFIG_LEDS_MLXCPLD=m
# CONFIG_LEDS_MLXREG is not set
# CONFIG_LEDS_USER is not set
# CONFIG_LEDS_NIC78BX is not set
# CONFIG_LEDS_TI_LMU_COMMON is not set

#
# Flash and Torch LED drivers
#

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_ONESHOT=m
# CONFIG_LEDS_TRIGGER_DISK is not set
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_LEDS_TRIGGER_BACKLIGHT=m
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_ACTIVITY is not set
CONFIG_LEDS_TRIGGER_GPIO=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_LEDS_TRIGGER_TRANSIENT=m
CONFIG_LEDS_TRIGGER_CAMERA=m
# CONFIG_LEDS_TRIGGER_PANIC is not set
# CONFIG_LEDS_TRIGGER_NETDEV is not set
# CONFIG_LEDS_TRIGGER_PATTERN is not set
CONFIG_LEDS_TRIGGER_AUDIO=m
# CONFIG_LEDS_TRIGGER_TTY is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
# CONFIG_EDAC_I10NM is not set
CONFIG_EDAC_PND2=m
# CONFIG_EDAC_IGEN6 is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_MC146818_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_SYSTOHC is not set
# CONFIG_RTC_DEBUG is not set
CONFIG_RTC_NVMEM=y

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_ABB5ZES3 is not set
# CONFIG_RTC_DRV_ABEOZ9 is not set
# CONFIG_RTC_DRV_ABX80X is not set
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1307_CENTURY is not set
CONFIG_RTC_DRV_DS1374=m
# CONFIG_RTC_DRV_DS1374_WDT is not set
CONFIG_RTC_DRV_DS1672=m
CONFIG_RTC_DRV_MAX6900=m
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_ISL12022=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8523=m
# CONFIG_RTC_DRV_PCF85063 is not set
# CONFIG_RTC_DRV_PCF85363 is not set
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
CONFIG_RTC_DRV_M41T80=m
CONFIG_RTC_DRV_M41T80_WDT=y
CONFIG_RTC_DRV_BQ32K=m
# CONFIG_RTC_DRV_S35390A is not set
CONFIG_RTC_DRV_FM3130=m
# CONFIG_RTC_DRV_RX8010 is not set
CONFIG_RTC_DRV_RX8581=m
CONFIG_RTC_DRV_RX8025=m
CONFIG_RTC_DRV_EM3027=m
# CONFIG_RTC_DRV_RV3028 is not set
# CONFIG_RTC_DRV_RV3032 is not set
# CONFIG_RTC_DRV_RV8803 is not set
# CONFIG_RTC_DRV_SD3078 is not set

#
# SPI RTC drivers
#
# CONFIG_RTC_DRV_M41T93 is not set
# CONFIG_RTC_DRV_M41T94 is not set
# CONFIG_RTC_DRV_DS1302 is not set
# CONFIG_RTC_DRV_DS1305 is not set
# CONFIG_RTC_DRV_DS1343 is not set
# CONFIG_RTC_DRV_DS1347 is not set
# CONFIG_RTC_DRV_DS1390 is not set
# CONFIG_RTC_DRV_MAX6916 is not set
# CONFIG_RTC_DRV_R9701 is not set
CONFIG_RTC_DRV_RX4581=m
# CONFIG_RTC_DRV_RS5C348 is not set
# CONFIG_RTC_DRV_MAX6902 is not set
# CONFIG_RTC_DRV_PCF2123 is not set
# CONFIG_RTC_DRV_MCP795 is not set
CONFIG_RTC_I2C_AND_SPI=y

#
# SPI and I2C RTC drivers
#
CONFIG_RTC_DRV_DS3232=m
CONFIG_RTC_DRV_DS3232_HWMON=y
# CONFIG_RTC_DRV_PCF2127 is not set
CONFIG_RTC_DRV_RV3029C2=m
# CONFIG_RTC_DRV_RV3029_HWMON is not set
# CONFIG_RTC_DRV_RX6110 is not set

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1286=m
CONFIG_RTC_DRV_DS1511=m
CONFIG_RTC_DRV_DS1553=m
# CONFIG_RTC_DRV_DS1685_FAMILY is not set
CONFIG_RTC_DRV_DS1742=m
CONFIG_RTC_DRV_DS2404=m
CONFIG_RTC_DRV_STK17TA8=m
# CONFIG_RTC_DRV_M48T86 is not set
CONFIG_RTC_DRV_M48T35=m
CONFIG_RTC_DRV_M48T59=m
CONFIG_RTC_DRV_MSM6242=m
CONFIG_RTC_DRV_BQ4802=m
CONFIG_RTC_DRV_RP5C01=m
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_RTC_DRV_FTRTC010 is not set

#
# HID Sensor RTC drivers
#
# CONFIG_RTC_DRV_GOLDFISH is not set
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
CONFIG_DMA_ENGINE=y
CONFIG_DMA_VIRTUAL_CHANNELS=y
CONFIG_DMA_ACPI=y
# CONFIG_ALTERA_MSGDMA is not set
CONFIG_INTEL_IDMA64=m
# CONFIG_INTEL_IDXD is not set
# CONFIG_INTEL_IDXD_COMPAT is not set
CONFIG_INTEL_IOATDMA=m
# CONFIG_PLX_DMA is not set
# CONFIG_AMD_PTDMA is not set
# CONFIG_QCOM_HIDMA_MGMT is not set
# CONFIG_QCOM_HIDMA is not set
CONFIG_DW_DMAC_CORE=y
CONFIG_DW_DMAC=m
CONFIG_DW_DMAC_PCI=y
# CONFIG_DW_EDMA is not set
# CONFIG_DW_EDMA_PCIE is not set
CONFIG_HSU_DMA=y
# CONFIG_SF_PDMA is not set
# CONFIG_INTEL_LDMA is not set

#
# DMA Clients
#
CONFIG_ASYNC_TX_DMA=y
CONFIG_DMATEST=m
CONFIG_DMA_ENGINE_RAID=y

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_DEBUG is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# CONFIG_DMABUF_SYSFS_STATS is not set
# end of DMABUF options

CONFIG_DCA=m
# CONFIG_AUXDISPLAY is not set
# CONFIG_PANEL is not set
CONFIG_UIO=m
CONFIG_UIO_CIF=m
CONFIG_UIO_PDRV_GENIRQ=m
# CONFIG_UIO_DMEM_GENIRQ is not set
CONFIG_UIO_AEC=m
CONFIG_UIO_SERCOS3=m
CONFIG_UIO_PCI_GENERIC=m
# CONFIG_UIO_NETX is not set
# CONFIG_UIO_PRUSS is not set
# CONFIG_UIO_MF624 is not set
CONFIG_UIO_HV_GENERIC=m
CONFIG_VFIO=m
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
CONFIG_VFIO_NOIOMMU=y
CONFIG_VFIO_PCI_CORE=m
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
CONFIG_VFIO_PCI=m
# CONFIG_VFIO_PCI_VGA is not set
# CONFIG_VFIO_PCI_IGD is not set
CONFIG_VFIO_MDEV=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI_LIB=y
CONFIG_VIRTIO_PCI_LIB_LEGACY=y
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_PCI_LEGACY=y
# CONFIG_VIRTIO_PMEM is not set
CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_MEM=m
CONFIG_VIRTIO_INPUT=m
# CONFIG_VIRTIO_MMIO is not set
CONFIG_VIRTIO_DMA_SHARED_BUFFER=m
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=m
# CONFIG_VHOST_SCSI is not set
CONFIG_VHOST_VSOCK=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
CONFIG_HYPERV=m
CONFIG_HYPERV_TIMER=y
CONFIG_HYPERV_UTILS=m
CONFIG_HYPERV_BALLOON=m
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
CONFIG_ACPI_WMI=m
CONFIG_WMI_BMOF=m
# CONFIG_HUAWEI_WMI is not set
# CONFIG_UV_SYSFS is not set
CONFIG_MXM_WMI=m
# CONFIG_PEAQ_WMI is not set
# CONFIG_NVIDIA_WMI_EC_BACKLIGHT is not set
# CONFIG_XIAOMI_WMI is not set
# CONFIG_GIGABYTE_WMI is not set
CONFIG_ACERHDF=m
# CONFIG_ACER_WIRELESS is not set
CONFIG_ACER_WMI=m
# CONFIG_AMD_PMC is not set
# CONFIG_ADV_SWBUTTON is not set
CONFIG_APPLE_GMUX=m
CONFIG_ASUS_LAPTOP=m
# CONFIG_ASUS_WIRELESS is not set
CONFIG_ASUS_WMI=m
CONFIG_ASUS_NB_WMI=m
# CONFIG_MERAKI_MX100 is not set
CONFIG_EEEPC_LAPTOP=m
CONFIG_EEEPC_WMI=m
# CONFIG_X86_PLATFORM_DRIVERS_DELL is not set
CONFIG_AMILO_RFKILL=m
CONFIG_FUJITSU_LAPTOP=m
CONFIG_FUJITSU_TABLET=m
# CONFIG_GPD_POCKET_FAN is not set
CONFIG_HP_ACCEL=m
# CONFIG_WIRELESS_HOTKEY is not set
CONFIG_HP_WMI=m
# CONFIG_IBM_RTL is not set
CONFIG_IDEAPAD_LAPTOP=m
CONFIG_SENSORS_HDAPS=m
CONFIG_THINKPAD_ACPI=m
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
# CONFIG_THINKPAD_ACPI_UNSAFE_LEDS is not set
CONFIG_THINKPAD_ACPI_VIDEO=y
CONFIG_THINKPAD_ACPI_HOTKEY_POLL=y
# CONFIG_THINKPAD_LMI is not set
CONFIG_X86_PLATFORM_DRIVERS_INTEL=y
# CONFIG_INTEL_ATOMISP2_PM is not set
# CONFIG_INTEL_SAR_INT1092 is not set
CONFIG_INTEL_PMC_CORE=m

#
# Intel Speed Select Technology interface support
#
# CONFIG_INTEL_SPEED_SELECT_INTERFACE is not set
# end of Intel Speed Select Technology interface support

CONFIG_INTEL_WMI=y
# CONFIG_INTEL_WMI_SBL_FW_UPDATE is not set
CONFIG_INTEL_WMI_THUNDERBOLT=m
CONFIG_INTEL_HID_EVENT=m
CONFIG_INTEL_VBTN=m
# CONFIG_INTEL_INT0002_VGPIO is not set
CONFIG_INTEL_OAKTRAIL=m
# CONFIG_INTEL_ISHTP_ECLITE is not set
# CONFIG_INTEL_PUNIT_IPC is not set
CONFIG_INTEL_RST=m
# CONFIG_INTEL_SMARTCONNECT is not set
CONFIG_INTEL_TURBO_MAX_3=y
# CONFIG_INTEL_UNCORE_FREQ_CONTROL is not set
CONFIG_MSI_LAPTOP=m
CONFIG_MSI_WMI=m
# CONFIG_PCENGINES_APU2 is not set
# CONFIG_BARCO_P50_GPIO is not set
CONFIG_SAMSUNG_LAPTOP=m
CONFIG_SAMSUNG_Q10=m
CONFIG_TOSHIBA_BT_RFKILL=m
# CONFIG_TOSHIBA_HAPS is not set
# CONFIG_TOSHIBA_WMI is not set
CONFIG_ACPI_CMPC=m
CONFIG_COMPAL_LAPTOP=m
# CONFIG_LG_LAPTOP is not set
CONFIG_PANASONIC_LAPTOP=m
CONFIG_SONY_LAPTOP=m
CONFIG_SONYPI_COMPAT=y
# CONFIG_SYSTEM76_ACPI is not set
CONFIG_TOPSTAR_LAPTOP=m
# CONFIG_I2C_MULTI_INSTANTIATE is not set
CONFIG_MLX_PLATFORM=m
CONFIG_INTEL_IPS=m
# CONFIG_INTEL_SCU_PCI is not set
# CONFIG_INTEL_SCU_PLATFORM is not set
CONFIG_PMC_ATOM=y
# CONFIG_CHROME_PLATFORMS is not set
CONFIG_MELLANOX_PLATFORM=y
CONFIG_MLXREG_HOTPLUG=m
# CONFIG_MLXREG_IO is not set
# CONFIG_MLXREG_LC is not set
CONFIG_SURFACE_PLATFORMS=y
# CONFIG_SURFACE3_WMI is not set
# CONFIG_SURFACE_3_POWER_OPREGION is not set
# CONFIG_SURFACE_GPE is not set
# CONFIG_SURFACE_HOTPLUG is not set
# CONFIG_SURFACE_PRO3_BUTTON is not set
CONFIG_HAVE_CLK=y
CONFIG_HAVE_CLK_PREPARE=y
CONFIG_COMMON_CLK=y
# CONFIG_LMK04832 is not set
# CONFIG_COMMON_CLK_MAX9485 is not set
# CONFIG_COMMON_CLK_SI5341 is not set
# CONFIG_COMMON_CLK_SI5351 is not set
# CONFIG_COMMON_CLK_SI544 is not set
# CONFIG_COMMON_CLK_CDCE706 is not set
# CONFIG_COMMON_CLK_CS2000_CP is not set
# CONFIG_COMMON_CLK_PWM is not set
# CONFIG_XILINX_VCU is not set
CONFIG_HWSPINLOCK=y

#
# Clock Source drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
# end of Clock Source drivers

CONFIG_MAILBOX=y
CONFIG_PCC=y
# CONFIG_ALTERA_MBOX is not set
CONFIG_IOMMU_IOVA=y
CONFIG_IOASID=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
# CONFIG_IOMMU_DEFAULT_DMA_STRICT is not set
CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_IOMMU_DMA=y
# CONFIG_AMD_IOMMU is not set
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
# CONFIG_INTEL_IOMMU_SVM is not set
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON=y
CONFIG_IRQ_REMAP=y
CONFIG_HYPERV_IOMMU=y
# CONFIG_VIRTIO_IOMMU is not set

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_QCOM_GLINK_RPM is not set
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

# CONFIG_SOUNDWIRE is not set

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
CONFIG_NTB=m
# CONFIG_NTB_MSI is not set
# CONFIG_NTB_AMD is not set
# CONFIG_NTB_IDT is not set
# CONFIG_NTB_INTEL is not set
# CONFIG_NTB_EPF is not set
# CONFIG_NTB_SWITCHTEC is not set
# CONFIG_NTB_PINGPONG is not set
# CONFIG_NTB_TOOL is not set
# CONFIG_NTB_PERF is not set
# CONFIG_NTB_TRANSPORT is not set
# CONFIG_VME_BUS is not set
CONFIG_PWM=y
CONFIG_PWM_SYSFS=y
# CONFIG_PWM_DEBUG is not set
# CONFIG_PWM_DWC is not set
CONFIG_PWM_LPSS=m
CONFIG_PWM_LPSS_PCI=m
CONFIG_PWM_LPSS_PLATFORM=m
# CONFIG_PWM_PCA9685 is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_USB_LGM_PHY is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# CONFIG_PHY_INTEL_LGM_EMMC is not set
# end of PHY Subsystem

CONFIG_POWERCAP=y
CONFIG_INTEL_RAPL_CORE=m
CONFIG_INTEL_RAPL=m
# CONFIG_IDLE_INJECT is not set
# CONFIG_DTPM is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

CONFIG_RAS=y
# CONFIG_RAS_CEC is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

CONFIG_LIBNVDIMM=m
CONFIG_BLK_DEV_PMEM=m
CONFIG_ND_BLK=m
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=m
CONFIG_BTT=y
CONFIG_ND_PFN=m
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX_DRIVER=y
CONFIG_DAX=y
CONFIG_DEV_DAX=m
CONFIG_DEV_DAX_PMEM=m
CONFIG_DEV_DAX_KMEM=m
CONFIG_DEV_DAX_PMEM_COMPAT=m
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y
# CONFIG_NVMEM_RMEM is not set

#
# HW tracing support
#
CONFIG_STM=m
# CONFIG_STM_PROTO_BASIC is not set
# CONFIG_STM_PROTO_SYS_T is not set
CONFIG_STM_DUMMY=m
CONFIG_STM_SOURCE_CONSOLE=m
CONFIG_STM_SOURCE_HEARTBEAT=m
CONFIG_STM_SOURCE_FTRACE=m
CONFIG_INTEL_TH=m
CONFIG_INTEL_TH_PCI=m
CONFIG_INTEL_TH_ACPI=m
CONFIG_INTEL_TH_GTH=m
CONFIG_INTEL_TH_STH=m
CONFIG_INTEL_TH_MSU=m
CONFIG_INTEL_TH_PTI=m
# CONFIG_INTEL_TH_DEBUG is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_TEE is not set
# CONFIG_UNISYS_VISORBUS is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# end of Device Drivers

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
CONFIG_EXT2_FS=m
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
CONFIG_JBD2=y
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_XFS_FS=m
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_ONLINE_SCRUB=y
# CONFIG_XFS_ONLINE_REPAIR is not set
CONFIG_XFS_DEBUG=y
CONFIG_XFS_ASSERT_FATAL=y
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
# CONFIG_NILFS2_FS is not set
CONFIG_F2FS_FS=m
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
# CONFIG_F2FS_FS_SECURITY is not set
# CONFIG_F2FS_CHECK_FS is not set
# CONFIG_F2FS_FAULT_INJECTION is not set
# CONFIG_F2FS_FS_COMPRESSION is not set
CONFIG_F2FS_IOSTAT=y
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_VERITY is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=y
CONFIG_AUTOFS_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
# CONFIG_VIRTIO_FS is not set
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
# CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_NETFS_SUPPORT=m
CONFIG_NETFS_STATS=y
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
# CONFIG_FSCACHE_DEBUG is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_EXFAT_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_VMCORE_DEVICE_DUMP=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_PROC_PID_ARCH_STATUS=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
# CONFIG_TMPFS_INODE64 is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_HUGETLB_PAGE_FREE_VMEMMAP=y
# CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON is not set
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_EFIVAR_FS=y
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
CONFIG_CRAMFS_BLOCKDEV=y
CONFIG_SQUASHFS=m
# CONFIG_SQUASHFS_FILE_CACHE is not set
CONFIG_SQUASHFS_FILE_DIRECT=y
# CONFIG_SQUASHFS_DECOMP_SINGLE is not set
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
# CONFIG_SQUASHFS_LZ4 is not set
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
# CONFIG_SQUASHFS_ZSTD is not set
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
# CONFIG_PSTORE_842_COMPRESS is not set
# CONFIG_PSTORE_ZSTD_COMPRESS is not set
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
# CONFIG_PSTORE_CONSOLE is not set
# CONFIG_PSTORE_PMSG is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_PSTORE_RAM=m
# CONFIG_PSTORE_BLK is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
# CONFIG_NFS_V2 is not set
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
# CONFIG_NFS_SWAP is not set
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_PNFS_FILE_LAYOUT=m
CONFIG_PNFS_BLOCK=m
CONFIG_PNFS_FLEXFILE_LAYOUT=m
CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
# CONFIG_NFS_V4_1_MIGRATION is not set
CONFIG_NFS_V4_SECURITY_LABEL=y
CONFIG_ROOT_NFS=y
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DEBUG=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
# CONFIG_NFS_V4_2_READ_PLUS is not set
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_NFSD_PNFS=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
CONFIG_NFSD_SCSILAYOUT=y
# CONFIG_NFSD_FLEXFILELAYOUT is not set
# CONFIG_NFSD_V4_2_INTER_SSC is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_GRACE_PERIOD=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_NFS_V4_2_SSC_HELPER=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_BACKCHANNEL=y
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES is not set
CONFIG_SUNRPC_DEBUG=y
CONFIG_CEPH_FS=m
# CONFIG_CEPH_FSCACHE is not set
CONFIG_CEPH_FS_POSIX_ACL=y
# CONFIG_CEPH_FS_SECURITY_LABEL is not set
CONFIG_CIFS=m
CONFIG_CIFS_STATS2=y
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_CIFS_DEBUG=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_DEBUG_DUMP_KEYS is not set
CONFIG_CIFS_DFS_UPCALL=y
# CONFIG_CIFS_SWN_UPCALL is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_SMB_SERVER is not set
CONFIG_SMBFS_COMMON=m
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_MAC_ROMAN=m
CONFIG_NLS_MAC_CELTIC=m
CONFIG_NLS_MAC_CENTEURO=m
CONFIG_NLS_MAC_CROATIAN=m
CONFIG_NLS_MAC_CYRILLIC=m
CONFIG_NLS_MAC_GAELIC=m
CONFIG_NLS_MAC_GREEK=m
CONFIG_NLS_MAC_ICELAND=m
CONFIG_NLS_MAC_INUIT=m
CONFIG_NLS_MAC_ROMANIAN=m
CONFIG_NLS_MAC_TURKISH=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y
# CONFIG_UNICODE is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
CONFIG_TRUSTED_KEYS=y
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_KEY_DH_OPERATIONS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_WRITABLE_HOOKS=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_PAGE_TABLE_ISOLATION=y
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_PATH=y
CONFIG_INTEL_TXT=y
CONFIG_LSM_MMAP_MIN_ADDR=65535
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_FORTIFY_SOURCE=y
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS=9
CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE=256
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_HASH=y
CONFIG_SECURITY_APPARMOR_HASH_DEFAULT=y
# CONFIG_SECURITY_APPARMOR_DEBUG is not set
# CONFIG_SECURITY_LOADPIN is not set
CONFIG_SECURITY_YAMA=y
# CONFIG_SECURITY_SAFESETID is not set
# CONFIG_SECURITY_LOCKDOWN_LSM is not set
# CONFIG_SECURITY_LANDLOCK is not set
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
# CONFIG_INTEGRITY_PLATFORM_KEYRING is not set
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_LSM_RULES=y
# CONFIG_IMA_TEMPLATE is not set
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
CONFIG_IMA_DEFAULT_HASH_SHA1=y
# CONFIG_IMA_DEFAULT_HASH_SHA256 is not set
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
CONFIG_IMA_DEFAULT_HASH="sha1"
# CONFIG_IMA_WRITE_POLICY is not set
# CONFIG_IMA_READ_POLICY is not set
CONFIG_IMA_APPRAISE=y
# CONFIG_IMA_ARCH_POLICY is not set
# CONFIG_IMA_APPRAISE_BUILD_POLICY is not set
CONFIG_IMA_APPRAISE_BOOTPARAM=y
# CONFIG_IMA_APPRAISE_MODSIG is not set
CONFIG_IMA_TRUSTED_KEYRING=y
# CONFIG_IMA_BLACKLIST_KEYRING is not set
# CONFIG_IMA_LOAD_X509 is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
# CONFIG_IMA_DISABLE_HTABLE is not set
CONFIG_EVM=y
CONFIG_EVM_ATTR_FSUUID=y
# CONFIG_EVM_ADD_XATTRS is not set
# CONFIG_EVM_LOAD_X509 is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_APPARMOR is not set
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
# end of Memory initialization
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_SIMD=y

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=m
CONFIG_CRYPTO_ECC=m
CONFIG_CRYPTO_ECDH=m
# CONFIG_CRYPTO_ECDSA is not set
# CONFIG_CRYPTO_ECRDSA is not set
# CONFIG_CRYPTO_SM2 is not set
# CONFIG_CRYPTO_CURVE25519 is not set
# CONFIG_CRYPTO_CURVE25519_X86 is not set

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
# CONFIG_CRYPTO_AEGIS128 is not set
# CONFIG_CRYPTO_AEGIS128_AESNI_SSE2 is not set
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=y
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_OFB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=m
# CONFIG_CRYPTO_KEYWRAP is not set
# CONFIG_CRYPTO_NHPOLY1305_SSE2 is not set
# CONFIG_CRYPTO_NHPOLY1305_AVX2 is not set
# CONFIG_CRYPTO_ADIANTUM is not set
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRC32_PCLMUL=m
CONFIG_CRYPTO_XXHASH=m
CONFIG_CRYPTO_BLAKE2B=m
# CONFIG_CRYPTO_BLAKE2S is not set
# CONFIG_CRYPTO_BLAKE2S_X86 is not set
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=m
CONFIG_CRYPTO_POLY1305_X86_64=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA1_SSSE3=y
CONFIG_CRYPTO_SHA256_SSSE3=y
CONFIG_CRYPTO_SHA512_SSSE3=m
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=m
# CONFIG_CRYPTO_SM3 is not set
# CONFIG_CRYPTO_STREEBOG is not set
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_TI is not set
CONFIG_CRYPTO_AES_NI_INTEL=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_BLOWFISH_X86_64=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAMELLIA_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST5_AVX_X86_64=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_CAST6_AVX_X86_64=m
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_DES3_EDE_X86_64 is not set
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_CHACHA20=m
CONFIG_CRYPTO_CHACHA20_X86_64=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
CONFIG_CRYPTO_SM4=m
# CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64 is not set
# CONFIG_CRYPTO_SM4_AESNI_AVX2_X86_64 is not set
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_X86_64=m
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=m
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
# CONFIG_CRYPTO_842 is not set
# CONFIG_CRYPTO_LZ4 is not set
# CONFIG_CRYPTO_LZ4HC is not set
# CONFIG_CRYPTO_ZSTD is not set

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_USER_API_RNG=y
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
CONFIG_CRYPTO_USER_API_AEAD=y
CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE=y
# CONFIG_CRYPTO_STATS is not set
CONFIG_CRYPTO_HASH_INFO=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
# CONFIG_CRYPTO_LIB_BLAKE2S is not set
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=m
# CONFIG_CRYPTO_LIB_CHACHA is not set
# CONFIG_CRYPTO_LIB_CURVE25519 is not set
CONFIG_CRYPTO_LIB_DES=m
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11
CONFIG_CRYPTO_ARCH_HAVE_LIB_POLY1305=m
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=m
# CONFIG_CRYPTO_LIB_POLY1305 is not set
# CONFIG_CRYPTO_LIB_CHACHA20POLY1305 is not set
CONFIG_CRYPTO_LIB_SHA256=y
CONFIG_CRYPTO_LIB_SM4=m
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_PADLOCK=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_DEV_PADLOCK_SHA=m
# CONFIG_CRYPTO_DEV_ATMEL_ECC is not set
# CONFIG_CRYPTO_DEV_ATMEL_SHA204A is not set
CONFIG_CRYPTO_DEV_CCP=y
CONFIG_CRYPTO_DEV_CCP_DD=m
CONFIG_CRYPTO_DEV_SP_CCP=y
CONFIG_CRYPTO_DEV_CCP_CRYPTO=m
CONFIG_CRYPTO_DEV_SP_PSP=y
# CONFIG_CRYPTO_DEV_CCP_DEBUGFS is not set
CONFIG_CRYPTO_DEV_QAT=m
CONFIG_CRYPTO_DEV_QAT_DH895xCC=m
CONFIG_CRYPTO_DEV_QAT_C3XXX=m
CONFIG_CRYPTO_DEV_QAT_C62X=m
# CONFIG_CRYPTO_DEV_QAT_4XXX is not set
CONFIG_CRYPTO_DEV_QAT_DH895xCCVF=m
CONFIG_CRYPTO_DEV_QAT_C3XXXVF=m
CONFIG_CRYPTO_DEV_QAT_C62XVF=m
CONFIG_CRYPTO_DEV_NITROX=m
CONFIG_CRYPTO_DEV_NITROX_CNN55XX=m
# CONFIG_CRYPTO_DEV_VIRTIO is not set
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
# CONFIG_ASYMMETRIC_TPM_KEY_SUBTYPE is not set
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
CONFIG_SIGNED_PE_FILE_VERIFICATION=y

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_MODULE_SIG_KEY_TYPE_RSA=y
# CONFIG_MODULE_SIG_KEY_TYPE_ECDSA is not set
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
# CONFIG_SYSTEM_REVOCATION_LIST is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=m
CONFIG_RAID6_PQ_BENCHMARK=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_CORDIC=m
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
CONFIG_CRC_CCITT=y
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC64 is not set
# CONFIG_CRC4 is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=m
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
# CONFIG_XZ_DEC_MICROLZMA is not set
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_ENC8=y
CONFIG_REED_SOLOMON_DEC8=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT_MAP=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_SWIOTLB=y
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_CPUMASK_OFFSTACK=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_SIGNATURE=y
CONFIG_OID_REGISTRY=y
CONFIG_UCS2_STRING=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_MEMREGION=y
CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE=y
CONFIG_ARCH_HAS_COPY_MC=y
CONFIG_ARCH_STACKWALK=y
CONFIG_SBITMAP=y
# end of Library routines

CONFIG_ASN1_ENCODER=y

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_PRINTK_CALLER=y
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_BOOT_PRINTK_DELAY=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DYNAMIC_DEBUG_CORE=y
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_INFO_DWARF5 is not set
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
# CONFIG_GDB_SCRIPTS is not set
CONFIG_FRAME_WARN=2048
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
CONFIG_STACK_VALIDATION=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
# end of Generic Kernel Debugging Instruments

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
# CONFIG_DEBUG_WX is not set
CONFIG_GENERIC_PTDUMP=y
# CONFIG_PTDUMP_DEBUGFS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
# CONFIG_DEBUG_VIRTUAL is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

CONFIG_DEBUG_SHIRQ=y

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=480
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_WQ_WATCHDOG=y
# CONFIG_TEST_LOCKUP is not set
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_OBJTOOL_MCOUNT=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_BOOTTIME_TRACING is not set
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_STACK_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_HWLAT_TRACER=y
# CONFIG_OSNOISE_TRACER is not set
# CONFIG_TIMERLAT_TRACER is not set
# CONFIG_MMIOTRACE is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_BPF_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
CONFIG_TRACING_MAP=y
CONFIG_SYNTH_EVENTS=y
CONFIG_HIST_TRIGGERS=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
CONFIG_RING_BUFFER_BENCHMARK=m
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_RECORD_RECURSION is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_HIST_TRIGGERS_DEBUG is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
# CONFIG_SAMPLES is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y
# CONFIG_IO_STRICT_DEVMEM is not set

#
# x86 Debugging
#
CONFIG_TRACE_IRQFLAGS_NMI_SUPPORT=y
CONFIG_EARLY_PRINTK_USB=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_USB_XDBC=y
# CONFIG_EFI_PGT_DUMP is not set
# CONFIG_DEBUG_TLBFLUSH is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_X86_DECODER_SELFTEST=y
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_CPA_DEBUG is not set
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_DEBUG_NMI_SELFTEST is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_PUNIT_ATOM_DEBUG is not set
CONFIG_UNWINDER_ORC=y
# CONFIG_UNWINDER_FRAME_POINTER is not set
# end of x86 Debugging

#
# Kernel Testing and Coverage
#
# CONFIG_KUNIT is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
# CONFIG_FAULT_INJECTION is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
# CONFIG_LKDTM is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_DIV64 is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
# CONFIG_PERCPU_TEST is not set
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_STRING_SELFTEST is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_SCANF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_OVERFLOW is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_HASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
CONFIG_TEST_BPF=m
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_MEMCAT_P is not set
# CONFIG_TEST_LIVEPATCH is not set
# CONFIG_TEST_STACKINIT is not set
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_HMM is not set
# CONFIG_TEST_FREE_PAGES is not set
# CONFIG_TEST_FPU is not set
# CONFIG_TEST_CLOCKSOURCE_WATCHDOG is not set
CONFIG_ARCH_USE_MEMTEST=y
# CONFIG_MEMTEST is not set
# CONFIG_HYPERV_TESTING is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking
#!/bin/sh

export_top_env()
{
	export suite='aim7'
	export testcase='aim7'
	export category='benchmark'
	export job_origin='aim7-fs-raid.yaml'
	export queue_cmdline_keys='branch
commit'
	export queue='validate'
	export testbox='lkp-csl-2sp9'
	export tbox_group='lkp-csl-2sp9'
	export kconfig='x86_64-rhel-8.3'
	export submit_id='61ef8eb0c0fb59091dc8d9a2'
	export job_file='/lkp/jobs/scheduled/lkp-csl-2sp9/aim7-performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006-debian-10.4-x86_64-20200603.cgz-a7f4e88080f3d50511400259cc613a-20220125-2333-1pc1wwq-3.yaml'
	export id='1175e2d184b17141364184ad52d1eb9a62648d0c'
	export queuer_version='/lkp-src'
	export model='Cascade Lake'
	export nr_node=2
	export nr_cpu=88
	export memory='128G'
	export nr_hdd_partitions=4
	export nr_ssd_partitions=1
	export hdd_partitions='/dev/disk/by-id/ata-ST4000NM0035-1V4107_ZC13Q1RD-part*'
	export ssd_partitions='/dev/disk/by-id/ata-INTEL_SSDSC2BB480G7_PHDV723200JX480BGN-part1'
	export rootfs_partition='/dev/disk/by-id/ata-INTEL_SSDSC2BB480G7_PHDV723200JX480BGN-part2'
	export brand='Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz'
	export commit='a7f4e88080f3d50511400259cc613a666d297227'
	export ucode='0x5003006'
	export need_kconfig_hw='{"I40E"=>"y"}
SATA_AHCI'
	export need_kconfig='{"BLK_DEV_RAM"=>"m"}
{"BLK_DEV"=>"y"}
{"BLOCK"=>"y"}
MD_RAID1
XFS_FS'
	export enqueue_time='2022-01-25 13:46:24 +0800'
	export _id='61ef8eb0c0fb59091dc8d9a2'
	export _rt='/result/aim7/performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006/lkp-csl-2sp9/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227'
	export user='lkp'
	export compiler='gcc-9'
	export LKP_SERVER='internal-lkp-server'
	export head_commit='dc194be385d9847f7c49c3511747cf7624df5afa'
	export base_commit='e783362eb54cd99b2cac8b3a9aeac942e6f6ac07'
	export branch='linux-review/Brian-Foster/xfs-require-an-rcu-grace-period-before-inode-recycle/20220121-222536'
	export rootfs='debian-10.4-x86_64-20200603.cgz'
	export result_root='/result/aim7/performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006/lkp-csl-2sp9/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/3'
	export scheduler_version='/lkp/lkp/.src-20220125-114532'
	export arch='x86_64'
	export max_uptime=2100
	export initrd='/osimage/debian/debian-10.4-x86_64-20200603.cgz'
	export bootloader_append='root=/dev/ram0
RESULT_ROOT=/result/aim7/performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006/lkp-csl-2sp9/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/3
BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/vmlinuz-5.16.0-rc5-00022-ga7f4e88080f3
branch=linux-review/Brian-Foster/xfs-require-an-rcu-grace-period-before-inode-recycle/20220121-222536
job=/lkp/jobs/scheduled/lkp-csl-2sp9/aim7-performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006-debian-10.4-x86_64-20200603.cgz-a7f4e88080f3d50511400259cc613a-20220125-2333-1pc1wwq-3.yaml
user=lkp
ARCH=x86_64
kconfig=x86_64-rhel-8.3
commit=a7f4e88080f3d50511400259cc613a666d297227
max_uptime=2100
LKP_SERVER=internal-lkp-server
nokaslr
selinux=0
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
ignore_loglevel
console=tty0
earlyprintk=ttyS0,115200
console=ttyS0,115200
vga=normal
rw'
	export modules_initrd='/pkg/linux/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/modules.cgz'
	export bm_initrd='/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20220105.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/perf_20220116.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/perf-x86_64-79e06c4c4950-1_20220116.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/md_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/fs_20210917.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/aim7-x86_64-1-1_20220124.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/mpstat_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/turbostat_20200721.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/turbostat-x86_64-3.7-4_20200721.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/sar-x86_64-34c92ae-1_20200702.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz'
	export ucode_initrd='/osimage/ucode/intel-ucode-20210222.cgz'
	export lkp_initrd='/osimage/user/lkp/lkp-x86_64.cgz'
	export site='inn'
	export LKP_CGI_PORT=80
	export LKP_CIFS_PORT=139
	export last_kernel='5.17.0-rc1'
	export repeat_to=6
	export schedule_notify_address=
	export kernel='/pkg/linux/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/vmlinuz-5.16.0-rc5-00022-ga7f4e88080f3'
	export dequeue_time='2022-01-25 13:50:04 +0800'
	export job_initrd='/lkp/jobs/scheduled/lkp-csl-2sp9/aim7-performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006-debian-10.4-x86_64-20200603.cgz-a7f4e88080f3d50511400259cc613a-20220125-2333-1pc1wwq-3.cgz'

	[ -n "$LKP_SRC" ] ||
	export LKP_SRC=/lkp/${user:-lkp}/src
}

run_job()
{
	echo $$ > $TMP/run-job.pid

	. $LKP_SRC/lib/http.sh
	. $LKP_SRC/lib/job.sh
	. $LKP_SRC/lib/env.sh

	export_top_env

	run_setup nr_brd=4 ramdisk_size=12884901888 $LKP_SRC/setup/disk

	run_setup raid_level='raid1' $LKP_SRC/setup/md

	run_setup fs='xfs' $LKP_SRC/setup/fs

	run_setup $LKP_SRC/setup/cpufreq_governor 'performance'

	run_monitor delay=15 $LKP_SRC/monitors/no-stdout/wrapper perf-profile
	run_monitor $LKP_SRC/monitors/wrapper kmsg
	run_monitor $LKP_SRC/monitors/no-stdout/wrapper boot-time
	run_monitor $LKP_SRC/monitors/wrapper uptime
	run_monitor $LKP_SRC/monitors/wrapper iostat
	run_monitor $LKP_SRC/monitors/wrapper heartbeat
	run_monitor $LKP_SRC/monitors/wrapper vmstat
	run_monitor $LKP_SRC/monitors/wrapper numa-numastat
	run_monitor $LKP_SRC/monitors/wrapper numa-vmstat
	run_monitor $LKP_SRC/monitors/wrapper numa-meminfo
	run_monitor $LKP_SRC/monitors/wrapper proc-vmstat
	run_monitor $LKP_SRC/monitors/wrapper proc-stat
	run_monitor $LKP_SRC/monitors/wrapper meminfo
	run_monitor $LKP_SRC/monitors/wrapper slabinfo
	run_monitor $LKP_SRC/monitors/wrapper interrupts
	run_monitor $LKP_SRC/monitors/wrapper lock_stat
	run_monitor lite_mode=1 $LKP_SRC/monitors/wrapper perf-sched
	run_monitor $LKP_SRC/monitors/wrapper softirqs
	run_monitor $LKP_SRC/monitors/one-shot/wrapper bdi_dev_mapping
	run_monitor $LKP_SRC/monitors/wrapper diskstats
	run_monitor $LKP_SRC/monitors/wrapper nfsstat
	run_monitor $LKP_SRC/monitors/wrapper cpuidle
	run_monitor $LKP_SRC/monitors/wrapper cpufreq-stats
	run_monitor $LKP_SRC/monitors/wrapper turbostat
	run_monitor $LKP_SRC/monitors/wrapper sched_debug
	run_monitor $LKP_SRC/monitors/wrapper perf-stat
	run_monitor $LKP_SRC/monitors/wrapper mpstat
	run_monitor $LKP_SRC/monitors/wrapper oom-killer
	run_monitor $LKP_SRC/monitors/plain/watchdog

	run_test test='disk_wrt' load=3000 $LKP_SRC/tests/wrapper aim7
}

extract_stats()
{
	export stats_part_begin=
	export stats_part_end=

	env delay=15 $LKP_SRC/stats/wrapper perf-profile
	env test='disk_wrt' load=3000 $LKP_SRC/stats/wrapper aim7
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper boot-time
	$LKP_SRC/stats/wrapper uptime
	$LKP_SRC/stats/wrapper iostat
	$LKP_SRC/stats/wrapper vmstat
	$LKP_SRC/stats/wrapper numa-numastat
	$LKP_SRC/stats/wrapper numa-vmstat
	$LKP_SRC/stats/wrapper numa-meminfo
	$LKP_SRC/stats/wrapper proc-vmstat
	$LKP_SRC/stats/wrapper meminfo
	$LKP_SRC/stats/wrapper slabinfo
	$LKP_SRC/stats/wrapper interrupts
	$LKP_SRC/stats/wrapper lock_stat
	env lite_mode=1 $LKP_SRC/stats/wrapper perf-sched
	$LKP_SRC/stats/wrapper softirqs
	$LKP_SRC/stats/wrapper diskstats
	$LKP_SRC/stats/wrapper nfsstat
	$LKP_SRC/stats/wrapper cpuidle
	$LKP_SRC/stats/wrapper turbostat
	$LKP_SRC/stats/wrapper sched_debug
	$LKP_SRC/stats/wrapper perf-stat
	$LKP_SRC/stats/wrapper mpstat

	$LKP_SRC/stats/wrapper time aim7.time
	$LKP_SRC/stats/wrapper dmesg
	$LKP_SRC/stats/wrapper kmsg
	$LKP_SRC/stats/wrapper last_state
	$LKP_SRC/stats/wrapper stderr
	$LKP_SRC/stats/wrapper time
}

"$@"
---
:#! jobs/aim7-fs-raid.yaml:
suite: aim7
testcase: aim7
category: benchmark
perf-profile:
  delay: 15
disk: 4BRD_12G
md: RAID1
fs: xfs
aim7:
  test: disk_wrt
  load: 3000
job_origin: aim7-fs-raid.yaml
:#! queue options:
queue_cmdline_keys:
- branch
- commit
queue: bisect
testbox: lkp-csl-2sp9
tbox_group: lkp-csl-2sp9
kconfig: x86_64-rhel-8.3
submit_id: 61ef4c01c0fb595258ef555b
job_file: "/lkp/jobs/scheduled/lkp-csl-2sp9/aim7-performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006-debian-10.4-x86_64-20200603.cgz-a7f4e88080f3d50511400259cc613a-20220125-21080-1sxyo2o-1.yaml"
id: c520994fa832c1f33a61c731eae6e76fb60bccc4
queuer_version: "/lkp-src"
:#! hosts/lkp-csl-2sp9:
model: Cascade Lake
nr_node: 2
nr_cpu: 88
memory: 128G
nr_hdd_partitions: 4
nr_ssd_partitions: 1
hdd_partitions: "/dev/disk/by-id/ata-ST4000NM0035-1V4107_ZC13Q1RD-part*"
ssd_partitions: "/dev/disk/by-id/ata-INTEL_SSDSC2BB480G7_PHDV723200JX480BGN-part1"
rootfs_partition: "/dev/disk/by-id/ata-INTEL_SSDSC2BB480G7_PHDV723200JX480BGN-part2"
brand: Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz
:#! include/category/benchmark:
kmsg:
boot-time:
uptime:
iostat:
heartbeat:
vmstat:
numa-numastat:
numa-vmstat:
numa-meminfo:
proc-vmstat:
proc-stat:
meminfo:
slabinfo:
interrupts:
lock_stat:
perf-sched:
  lite_mode: 1
softirqs:
bdi_dev_mapping:
diskstats:
nfsstat:
cpuidle:
cpufreq-stats:
turbostat:
sched_debug:
perf-stat:
mpstat:
:#! include/category/ALL:
cpufreq_governor: performance
:#! include/queue/cyclic:
commit: a7f4e88080f3d50511400259cc613a666d297227
:#! include/testbox/lkp-csl-2sp9:
ucode: '0x5003006'
need_kconfig_hw:
- I40E: y
- SATA_AHCI
:#! include/disk/nr_brd:
need_kconfig:
- BLK_DEV_RAM: m
- BLK_DEV: y
- BLOCK: y
- MD_RAID1
- XFS_FS
:#! include/md/raid_level:
:#! include/fs/OTHERS:
enqueue_time: 2022-01-25 09:01:53.351186321 +08:00
_id: 61ef4c11c0fb595258ef555c
_rt: "/result/aim7/performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006/lkp-csl-2sp9/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227"
:#! schedule options:
user: lkp
compiler: gcc-9
LKP_SERVER: internal-lkp-server
head_commit: dc194be385d9847f7c49c3511747cf7624df5afa
base_commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
branch: linux-devel/devel-hourly-20220124-165037
rootfs: debian-10.4-x86_64-20200603.cgz
result_root: "/result/aim7/performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006/lkp-csl-2sp9/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/1"
scheduler_version: "/lkp/lkp/.src-20220119-222532"
arch: x86_64
max_uptime: 2100
initrd: "/osimage/debian/debian-10.4-x86_64-20200603.cgz"
bootloader_append:
- root=/dev/ram0
- RESULT_ROOT=/result/aim7/performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006/lkp-csl-2sp9/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/1
- BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/vmlinuz-5.16.0-rc5-00022-ga7f4e88080f3
- branch=linux-devel/devel-hourly-20220124-165037
- job=/lkp/jobs/scheduled/lkp-csl-2sp9/aim7-performance-4BRD_12G-xfs-3000-RAID1-disk_wrt-ucode=0x5003006-debian-10.4-x86_64-20200603.cgz-a7f4e88080f3d50511400259cc613a-20220125-21080-1sxyo2o-1.yaml
- user=lkp
- ARCH=x86_64
- kconfig=x86_64-rhel-8.3
- commit=a7f4e88080f3d50511400259cc613a666d297227
- max_uptime=2100
- LKP_SERVER=internal-lkp-server
- nokaslr
- selinux=0
- debug
- apic=debug
- sysrq_always_enabled
- rcupdate.rcu_cpu_stall_timeout=100
- net.ifnames=0
- printk.devkmsg=on
- panic=-1
- softlockup_panic=1
- nmi_watchdog=panic
- oops=panic
- load_ramdisk=2
- prompt_ramdisk=0
- drbd.minor_count=8
- systemd.log_level=err
- ignore_loglevel
- console=tty0
- earlyprintk=ttyS0,115200
- console=ttyS0,115200
- vga=normal
- rw
modules_initrd: "/pkg/linux/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/modules.cgz"
bm_initrd: "/osimage/deps/debian-10.4-x86_64-20200603.cgz/run-ipconfig_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/lkp_20220105.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/rsync-rootfs_20200608.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/perf_20220116.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/perf-x86_64-79e06c4c4950-1_20220116.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/md_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/fs_20210917.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/aim7-x86_64-1-1_20220124.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/mpstat_20200714.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/turbostat_20200721.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/turbostat-x86_64-3.7-4_20200721.cgz,/osimage/pkg/debian-10.4-x86_64-20200603.cgz/sar-x86_64-34c92ae-1_20200702.cgz,/osimage/deps/debian-10.4-x86_64-20200603.cgz/hw_20200715.cgz"
ucode_initrd: "/osimage/ucode/intel-ucode-20210222.cgz"
lkp_initrd: "/osimage/user/lkp/lkp-x86_64.cgz"
site: inn
:#! /cephfs/db/releases/20220124151123/lkp-src/include/site/inn:
LKP_CGI_PORT: 80
LKP_CIFS_PORT: 139
oom-killer:
watchdog:
:#! runtime status:
last_kernel: 5.16.0-rc5-00022-ga7f4e88080f3
repeat_to: 3
schedule_notify_address:
:#! user overrides:
kernel: "/pkg/linux/x86_64-rhel-8.3/gcc-9/a7f4e88080f3d50511400259cc613a666d297227/vmlinuz-5.16.0-rc5-00022-ga7f4e88080f3"
dequeue_time: 2022-01-25 09:10:51.368472481 +08:00
:#! /cephfs/db/releases/20220124220107/lkp-src/include/site/inn:
job_state: finished
loadavg: 1935.18 745.16 271.00 1/902 12505
start_time: '1643073114'
end_time: '1643073216'
version: "/lkp/lkp/.src-20220119-222632:b5d83127:af6983726"
"modprobe" "-r" "brd"
 "modprobe" "brd" "rd_nr=4" "rd_size=12582912"
 "dmsetup" "remove_all"
 "wipefs" "-a" "--force" "/dev/ram0"
 "wipefs" "-a" "--force" "/dev/ram1"
 "wipefs" "-a" "--force" "/dev/ram2"
 "wipefs" "-a" "--force" "/dev/ram3"
 "mdadm" "-q" "--create" "/dev/md0" "--chunk=256" "--level=raid1" "--raid-devices=4" "--force" "--assume-clean" "/dev/ram0" "/dev/ram1" "/dev/ram2" "/dev/ram3"
wipefs -a --force /dev/md0
mkfs -t xfs -f /dev/md0
mkdir -p /fs/md0
modprobe xfs
mount -t xfs -o inode64 /dev/md0 /fs/md0

for cpu_dir in /sys/devices/system/cpu/cpu[0-9]*
do
	online_file="$cpu_dir"/online
	[ -f "$online_file" ] && [ "$(cat "$online_file")" -eq 0 ] && continue

	file="$cpu_dir"/cpufreq/scaling_governor
	[ -f "$file" ] && echo "performance" > "$file"
done

echo "500 32000 128 512" > /proc/sys/kernel/sem
cat > workfile <<EOF
FILESIZE: 1M
POOLSIZE: 10M
10 disk_wrt
EOF
echo "/fs/md0" > config

	(
		echo lkp-csl-2sp9
		echo disk_wrt

		echo 1
		echo 3000
		echo 2
		echo 3000
		echo 1
	) | ./multitask -t &
Paul E. McKenney Jan. 25, 2022, 2:40 p.m. UTC | #14
On Tue, Jan 25, 2022 at 11:31:20AM +1100, Dave Chinner wrote:
> On Mon, Jan 24, 2022 at 06:29:18PM -0500, Brian Foster wrote:
> > On Tue, Jan 25, 2022 at 09:08:53AM +1100, Dave Chinner wrote:
> > > > FYI, I modified my repeated alloc/free test to do some batching and form
> > > > it into something more able to measure the potential side effect / cost
> > > > of the grace period sync. The test is a single threaded, file alloc/free
> > > > loop using a variable per iteration batch size. The test runs for ~60s
> > > > and reports how many total files were allocated/freed in that period
> > > > with the specified batch size. Note that this particular test ran
> > > > without any background workload. Results are as follows:
> > > > 
> > > > 	files		baseline	test
> > > > 
> > > > 	1		38480		38437
> > > > 	4		126055		111080
> > > > 	8		218299		134469
> > > > 	16		306619		141968
> > > > 	32		397909		152267
> > > > 	64		418603		200875
> > > > 	128		469077		289365
> > > > 	256		684117		566016
> > > > 	512		931328		878933
> > > > 	1024		1126741		1118891
> > > 
> > > Can you post the test code, because 38,000 alloc/unlinks in 60s is
> > > extremely slow for a single tight open-unlink-close loop. I'd be
> > > expecting at least ~10,000 alloc/unlink iterations per second, not
> > > 650/second.
> > > 
> > 
> > Hm, Ok. My test was just a bash script doing a 'touch <files>; rm
> > <files>' loop. I know there was application overhead because if I
> > tweaked the script to open an fd directly rather than use touch, the
> > single file performance jumped up a bit, but it seemed to wash away as I
> > increased the file count so I kept running it with larger sizes. This
> > seems off so I'll port it over to C code and see how much the numbers
> > change.
> 
> Yeah, using touch/rm becomes fork/exec bound very quickly. You'll
> find that using "echo > <file>" is much faster than "touch <file>"
> because it runs a shell built-in operation without fork/exec
> overhead to create the file. But you can't play tricks like that to
> replace rm:
> 
> $ time for ((i=0;i<1000;i++)); do touch /mnt/scratch/foo; rm /mnt/scratch/foo ; done
> 
> real    0m2.653s
> user    0m0.910s
> sys     0m2.051s
> $ time for ((i=0;i<1000;i++)); do echo > /mnt/scratch/foo; rm /mnt/scratch/foo ; done
> 
> real    0m1.260s
> user    0m0.452s
> sys     0m0.913s
> $ time ./open-unlink 1000 /mnt/scratch/foo
> 
> real    0m0.037s
> user    0m0.001s
> sys     0m0.030s
> $
> 
> Note the difference in system time between the three operations -
> almost all the difference in system CPU time is the overhead of
> fork/exec to run the touch/rm binaries, not do the filesystem
> operations....
> 
> > > > That's just a test of a quick hack, however. Since there is no real
> > > > urgency to inactivate an unlinked inode (it has no potential users until
> > > > it's freed),
> > > 
> > > On the contrary, there is extreme urgency to inactivate inodes
> > > quickly.
> > > 
> > 
> > Ok, I think we're talking about slightly different things. What I mean
> > above is that if a task removes a file and goes off doing unrelated
> > $work, that inode will just sit on the percpu queue indefinitely. That's
> > fine, as there's no functional need for us to process it immediately
> > unless we're around -ENOSPC thresholds or some such that demand reclaim
> > of the inode.
> 
> Yup, an occasional unlink sitting around for a while on an unlinked
> list isn't going to cause a performance problem.  Indeed, such
> workloads are more likely to benefit from the reduced unlink()
> syscall overhead and won't even notice the increase in background
> CPU overhead for inactivation of those occasional inodes.
> 
> > It sounds like what you're talking about is specifically
> > the behavior/performance of sustained file removal (which is important
> > obviously), where apparently there is a notable degradation if the
> > queues become deep enough to push the inode batches out of CPU cache. So
> > that makes sense...
> 
> Yup, sustained bulk throughput is where cache residency really
> matters. And for unlink, sustained unlink workloads are quite
> common; they often are something people wait for on the command line
> or make up a performance critical component of a highly concurrent
> workload so it's pretty important to get this part right.
> 
> > > Darrick made the original assumption that we could delay
> > > inactivation indefinitely and so he allowed really deep queues of up
> > > to 64k deferred inactivations. But with queues this deep, we could
> > > never get that background inactivation code to perform anywhere near
> > > the original synchronous background inactivation code. e.g. I
> > > measured 60-70% performance degradations on my scalability tests,
> > > and nothing stood out in the profiles until I started looking at
> > > CPU data cache misses.
> > > 
> > 
> > ... but could you elaborate on the scalability tests involved here so I
> > can get a better sense of it in practice and perhaps observe the impact
> > of changes in this path?
> 
> The same concurrent fsmark create/traverse/unlink workloads I've
> been running for the past decade+ demonstrate it pretty simply. I
> also saw regressions with dbench (both op latency and throughput) as
> the client count (concurrency) increased, and with compilebench.  I
> didn't look much further because all the common benchmarks I ran
> showed perf degradations with arbitrary delays that went away with
> the current code we have.  ISTR that parts of aim7/reaim scalability
> workloads that the intel zero-day infrastructure runs are quite
> sensitive to background inactivation delays as well because that's a
> CPU bound workload and hence any reduction in cache residency
> results in a reduction of the number of concurrent jobs that can be
> run.

Curiosity and all that, but has this work produced any intuition on
the sensitivity of the performance/scalability to the delays?  As in
the effect of microseconds vs. tens of microseconds vs. hundreds of
microseconds?

							Thanx, Paul
Brian Foster Jan. 25, 2022, 6:30 p.m. UTC | #15
On Tue, Jan 25, 2022 at 11:31:20AM +1100, Dave Chinner wrote:
> On Mon, Jan 24, 2022 at 06:29:18PM -0500, Brian Foster wrote:
> > On Tue, Jan 25, 2022 at 09:08:53AM +1100, Dave Chinner wrote:
> > > > FYI, I modified my repeated alloc/free test to do some batching and form
> > > > it into something more able to measure the potential side effect / cost
> > > > of the grace period sync. The test is a single threaded, file alloc/free
> > > > loop using a variable per iteration batch size. The test runs for ~60s
> > > > and reports how many total files were allocated/freed in that period
> > > > with the specified batch size. Note that this particular test ran
> > > > without any background workload. Results are as follows:
> > > > 
> > > > 	files		baseline	test
> > > > 
> > > > 	1		38480		38437
> > > > 	4		126055		111080
> > > > 	8		218299		134469
> > > > 	16		306619		141968
> > > > 	32		397909		152267
> > > > 	64		418603		200875
> > > > 	128		469077		289365
> > > > 	256		684117		566016
> > > > 	512		931328		878933
> > > > 	1024		1126741		1118891
> > > 
> > > Can you post the test code, because 38,000 alloc/unlinks in 60s is
> > > extremely slow for a single tight open-unlink-close loop. I'd be
> > > expecting at least ~10,000 alloc/unlink iterations per second, not
> > > 650/second.
> > > 
> > 
> > Hm, Ok. My test was just a bash script doing a 'touch <files>; rm
> > <files>' loop. I know there was application overhead because if I
> > tweaked the script to open an fd directly rather than use touch, the
> > single file performance jumped up a bit, but it seemed to wash away as I
> > increased the file count so I kept running it with larger sizes. This
> > seems off so I'll port it over to C code and see how much the numbers
> > change.
> 
> Yeah, using touch/rm becomes fork/exec bound very quickly. You'll
> find that using "echo > <file>" is much faster than "touch <file>"
> because it runs a shell built-in operation without fork/exec
> overhead to create the file. But you can't play tricks like that to
> replace rm:
> 

I had used 'exec' to open an fd (same idea) in the single file case and
tested with that. The increase was consistent, and together with the
improving throughput at larger batch sizes I took that to mean the
application overhead wasn't a factor as the test scaled. That was
clearly wrong: once I ported the whole thing to a C program, the
baseline numbers were way off. What also threw me off is that the
single file numbers for the test kernel are actually fairly consistent
between the two test programs. Anyways, here's a series of (single run,
no averaging, etc.) test runs with the updated test. Note that I
reduced the runtime to 10s here since the test was running so much
faster. Otherwise this is the same batched open/close -> unlink
behavior:

                baseline        test
batch:  1       files:  893579  files:  41841
batch:  2       files:  912502  files:  41922
batch:  4       files:  930424  files:  42084
batch:  8       files:  932072  files:  41536
batch:  16      files:  930624  files:  41616
batch:  32      files:  777088  files:  41120
batch:  64      files:  567936  files:  57216
batch:  128     files:  579840  files:  96256
batch:  256     files:  548608  files:  174080
batch:  512     files:  546816  files:  246784
batch:  1024    files:  509952  files:  328704
batch:  2048    files:  505856  files:  399360
batch:  4096    files:  479232  files:  438272

So this shows that the performance delta is actually massive from the
start. For reference, a single-threaded, empty file, non-syncing
fs_mark workload stabilizes at around ~55k files/sec on this fs. Both
kernels sort of converge toward that rate as the batch size increases,
except that the baseline kernel starts much faster and settles down
while the test kernel starts much slower and improves (and still
doesn't quite hit that mark even at a 4k batch size).

My takeaway from this is that we may need to find a way to mitigate this
overhead somewhat better than what the current patch does. Otherwise,
this is a significant dropoff from even a pure allocation workload in
simple mixed workload scenarios...
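
(For reference, the updated test is roughly the loop below. This is a
minimal sketch rather than the actual program; the file naming, the
fixed 10s runtime and the assumption that it runs from a directory on
the test filesystem are all illustrative.)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int batch;
	time_t end;
	long total = 0;
	char name[64];
	int i, fd;

	if (argc < 2)
		return 1;
	batch = atoi(argv[1]);		/* files per iteration */
	end = time(NULL) + 10;		/* ~10s runtime */

	while (time(NULL) < end) {
		/* allocate a batch of empty files... */
		for (i = 0; i < batch; i++) {
			snprintf(name, sizeof(name), "file.%d", i);
			fd = open(name, O_CREAT | O_WRONLY, 0644);
			if (fd < 0) {
				perror("open");
				return 1;
			}
			close(fd);
		}
		/* ...then unlink the whole batch */
		for (i = 0; i < batch; i++) {
			snprintf(name, sizeof(name), "file.%d", i);
			unlink(name);
		}
		total += batch;
	}
	printf("batch:\t%d\tfiles:\t%ld\n", batch, total);
	return 0;
}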

> $ time for ((i=0;i<1000;i++)); do touch /mnt/scratch/foo; rm /mnt/scratch/foo ; done
> 
> real    0m2.653s
> user    0m0.910s
> sys     0m2.051s
> $ time for ((i=0;i<1000;i++)); do echo > /mnt/scratch/foo; rm /mnt/scratch/foo ; done
> 
> real    0m1.260s
> user    0m0.452s
> sys     0m0.913s
> $ time ./open-unlink 1000 /mnt/scratch/foo
> 
> real    0m0.037s
> user    0m0.001s
> sys     0m0.030s
> $
> 
> Note the difference in system time between the three operations -
> almost all the difference in system CPU time is the overhead of
> fork/exec to run the touch/rm binaries, not do the filesystem
> operations....
> 
> > > > That's just a test of a quick hack, however. Since there is no real
> > > > urgency to inactivate an unlinked inode (it has no potential users until
> > > > it's freed),
> > > 
> > > On the contrary, there is extreme urgency to inactivate inodes
> > > quickly.
> > > 
> > 
> > Ok, I think we're talking about slightly different things. What I mean
> > above is that if a task removes a file and goes off doing unrelated
> > $work, that inode will just sit on the percpu queue indefinitely. That's
> > fine, as there's no functional need for us to process it immediately
> > unless we're around -ENOSPC thresholds or some such that demand reclaim
> > of the inode.
> 
> Yup, an occasional unlink sitting around for a while on an unlinked
> list isn't going to cause a performance problem.  Indeed, such
> workloads are more likely to benefit from the reduced unlink()
> syscall overhead and won't even notice the increase in background
> CPU overhead for inactivation of those occasional inodes.
> 
> > It sounds like what you're talking about is specifically
> > the behavior/performance of sustained file removal (which is important
> > obviously), where apparently there is a notable degradation if the
> > queues become deep enough to push the inode batches out of CPU cache. So
> > that makes sense...
> 
> Yup, sustained bulk throughput is where cache residency really
> matters. And for unlink, sustained unlink workloads are quite
> common; they often are something people wait for on the command line
> or make up a performance critical component of a highly concurrent
> workload so it's pretty important to get this part right.
> 
> > > Darrick made the original assumption that we could delay
> > > inactivation indefinitely and so he allowed really deep queues of up
> > > to 64k deferred inactivations. But with queues this deep, we could
> > > never get that background inactivation code to perform anywhere near
> > > the original synchronous background inactivation code. e.g. I
> > > measured 60-70% performance degradations on my scalability tests,
> > > and nothing stood out in the profiles until I started looking at
> > > CPU data cache misses.
> > > 
> > 
> > ... but could you elaborate on the scalability tests involved here so I
> > can get a better sense of it in practice and perhaps observe the impact
> > of changes in this path?
> 
> The same concurrent fsmark create/traverse/unlink workloads I've
> been running for the past decade+ demonstrates it pretty simply. I
> also saw regressions with dbench (both op latency and throughput) as
> the client count (concurrency) increased, and with compilebench.  I
> didn't look much further because all the common benchmarks I ran
> showed perf degradations with arbitrary delays that went away with
> the current code we have.  ISTR that parts of aim7/reaim scalability
> workloads that the intel zero-day infrastructure runs are quite
> sensitive to background inactivation delays as well because that's a
> CPU bound workload and hence any reduction in cache residency
> results in a reduction of the number of concurrent jobs that can be
> run.
> 

Ok, so if I (single threaded) create (via fs_mark), sync and remove 5m
empty files, the remove takes about a minute. If I just bump out the
current queue and block thresholds by 10x and repeat, that time
increases to about ~1m24s. If I hack up a kernel to disable queueing
entirely (i.e. fully synchronous inactivation), then I'm back to about a
minute again. So I'm not producing any performance benefit with
queueing/batching in this single threaded scenario, but I suspect the
10x threshold delta is at least measuring the negative effect of poor
caching..? (Any decent way to confirm that..?).

And of course if I take the baseline kernel and stick a
cond_synchronize_rcu() in xfs_inactive_ifree() it brings the batch test
numbers right back but slows the removal test way down. What I find
interesting however is that if I hack up something more mild like invoke
cond_synchronize_rcu() on the oldest inode in the current inactivation
batch, bump out the blocking threshold as above (but leave the queueing
threshold at 32), and leave the iget side cond_sync_rcu() to catch
whatever falls through, my 5m file remove test now completes ~5-10s
faster than baseline and I see the following results from the batched
alloc/free test:

batch:  1       files:  731923
batch:  2       files:  693020
batch:  4       files:  750948
batch:  8       files:  743296
batch:  16      files:  738720
batch:  32      files:  746240
batch:  64      files:  598464
batch:  128     files:  672896
batch:  256     files:  633856
batch:  512     files:  605184
batch:  1024    files:  569344
batch:  2048    files:  555008
batch:  4096    files:  524288

Hm?

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Brian Foster Jan. 25, 2022, 8:07 p.m. UTC | #16
On Tue, Jan 25, 2022 at 01:30:36PM -0500, Brian Foster wrote:
> On Tue, Jan 25, 2022 at 11:31:20AM +1100, Dave Chinner wrote:
> > On Mon, Jan 24, 2022 at 06:29:18PM -0500, Brian Foster wrote:
> > > On Tue, Jan 25, 2022 at 09:08:53AM +1100, Dave Chinner wrote:
> > > > > FYI, I modified my repeated alloc/free test to do some batching and form
> > > > > it into something more able to measure the potential side effect / cost
> > > > > of the grace period sync. The test is a single threaded, file alloc/free
> > > > > loop using a variable per iteration batch size. The test runs for ~60s
> > > > > and reports how many total files were allocated/freed in that period
> > > > > with the specified batch size. Note that this particular test ran
> > > > > without any background workload. Results are as follows:
> > > > > 
> > > > > 	files		baseline	test
> > > > > 
> > > > > 	1		38480		38437
> > > > > 	4		126055		111080
> > > > > 	8		218299		134469
> > > > > 	16		306619		141968
> > > > > 	32		397909		152267
> > > > > 	64		418603		200875
> > > > > 	128		469077		289365
> > > > > 	256		684117		566016
> > > > > 	512		931328		878933
> > > > > 	1024		1126741		1118891
> > > > 
> > > > Can you post the test code, because 38,000 alloc/unlinks in 60s is
> > > > extremely slow for a single tight open-unlink-close loop. I'd be
> > > > expecting at least ~10,000 alloc/unlink iterations per second, not
> > > > 650/second.
> > > > 
> > > 
> > > Hm, Ok. My test was just a bash script doing a 'touch <files>; rm
> > > <files>' loop. I know there was application overhead because if I
> > > tweaked the script to open an fd directly rather than use touch, the
> > > single file performance jumped up a bit, but it seemed to wash away as I
> > > increased the file count so I kept running it with larger sizes. This
> > > seems off so I'll port it over to C code and see how much the numbers
> > > change.
> > 
> > Yeah, using touch/rm becomes fork/exec bound very quickly. You'll
> > find that using "echo > <file>" is much faster than "touch <file>"
> > because it runs a shell built-in operation without fork/exec
> > overhead to create the file. But you can't play tricks like that to
> > replace rm:
> > 
> 
> I had used 'exec' to open an fd (same idea) in the single file case and
> tested with that. The increase was consistent, and I took that, along
> with the improving performance as batch sizes increased, to mean that
> application overhead wasn't a factor as the test scaled. That was
> clearly wrong, because if I port the whole thing to a C program the
> baseline numbers are way off. I think what also threw me off is that the
> test kernel's single file numbers are actually fairly consistent between
> the two tests. Anyways, here's a series of (single run, no averaging,
> etc.) test
> runs with the updated test. Note that I reduced the runtime to 10s here
> since the test was running so much faster. Otherwise this is the same
> batched open/close -> unlink behavior:
> 
>                 baseline        test
> batch:  1       files:  893579  files:  41841
> batch:  2       files:  912502  files:  41922
> batch:  4       files:  930424  files:  42084
> batch:  8       files:  932072  files:  41536
> batch:  16      files:  930624  files:  41616
> batch:  32      files:  777088  files:  41120
> batch:  64      files:  567936  files:  57216
> batch:  128     files:  579840  files:  96256
> batch:  256     files:  548608  files:  174080
> batch:  512     files:  546816  files:  246784
> batch:  1024    files:  509952  files:  328704
> batch:  2048    files:  505856  files:  399360
> batch:  4096    files:  479232  files:  438272
> 
> So this shows that the performance delta is actually massive from the
> start. For reference, a single threaded, empty file, non syncing,
> fs_mark workload stabilizes at around ~55k files/sec on this fs. Both
> kernels sort of converge to that rate as the batch size increases, only
> the baseline kernel starts much faster and normalizes while the test
> kernel starts much slower and improves (and still really doesn't hit the
> mark even at a 4k batch size).
> 
> My takeaway from this is that we may need to find a way to mitigate this
> overhead somewhat better than what the current patch does. Otherwise,
> this is a significant dropoff from even a pure allocation workload in
> simple mixed workload scenarios...
> 
> > $ time for ((i=0;i<1000;i++)); do touch /mnt/scratch/foo; rm /mnt/scratch/foo ; done
> > 
> > real    0m2.653s
> > user    0m0.910s
> > sys     0m2.051s
> > $ time for ((i=0;i<1000;i++)); do echo > /mnt/scratch/foo; rm /mnt/scratch/foo ; done
> > 
> > real    0m1.260s
> > user    0m0.452s
> > sys     0m0.913s
> > $ time ./open-unlink 1000 /mnt/scratch/foo
> > 
> > real    0m0.037s
> > user    0m0.001s
> > sys     0m0.030s
> > $
> > 
> > Note the difference in system time between the three operations -
> > almost all the difference in system CPU time is the overhead of
> > fork/exec to run the touch/rm binaries, not do the filesystem
> > operations....
> > 
> > > > > That's just a test of a quick hack, however. Since there is no real
> > > > > urgency to inactivate an unlinked inode (it has no potential users until
> > > > > it's freed),
> > > > 
> > > > On the contrary, there is extreme urgency to inactivate inodes
> > > > quickly.
> > > > 
> > > 
> > > Ok, I think we're talking about slightly different things. What I mean
> > > above is that if a task removes a file and goes off doing unrelated
> > > $work, that inode will just sit on the percpu queue indefinitely. That's
> > > fine, as there's no functional need for us to process it immediately
> > > unless we're around -ENOSPC thresholds or some such that demand reclaim
> > > of the inode.
> > 
> > Yup, an occasional unlink sitting around for a while on an unlinked
> > list isn't going to cause a performance problem.  Indeed, such
> > workloads are more likely to benefit from the reduced unlink()
> > syscall overhead and won't even notice the increase in background
> > CPU overhead for inactivation of those occasional inodes.
> > 
> > > It sounds like what you're talking about is specifically
> > > the behavior/performance of sustained file removal (which is important
> > > obviously), where apparently there is a notable degradation if the
> > > queues become deep enough to push the inode batches out of CPU cache. So
> > > that makes sense...
> > 
> > Yup, sustained bulk throughput is where cache residency really
> > matters. And for unlink, sustained unlink workloads are quite
> > common; they often are something people wait for on the command line
> > or make up a performance critical component of a highly concurrent
> > workload so it's pretty important to get this part right.
> > 
> > > > Darrick made the original assumption that we could delay
> > > > inactivation indefinitely and so he allowed really deep queues of up
> > > > to 64k deferred inactivations. But with queues this deep, we could
> > > > never get that background inactivation code to perform anywhere near
> > > > the original synchronous background inactivation code. e.g. I
> > > > measured 60-70% performance degradations on my scalability tests,
> > > > and nothing stood out in the profiles until I started looking at
> > > > CPU data cache misses.
> > > > 
> > > 
> > > ... but could you elaborate on the scalability tests involved here so I
> > > can get a better sense of it in practice and perhaps observe the impact
> > > of changes in this path?
> > 
> > The same concurrent fsmark create/traverse/unlink workloads I've
> > been running for the past decade+ demonstrates it pretty simply. I
> > also saw regressions with dbench (both op latency and throughput) as
> > the client count (concurrency) increased, and with compilebench.  I
> > didn't look much further because all the common benchmarks I ran
> > showed perf degradations with arbitrary delays that went away with
> > the current code we have.  ISTR that parts of aim7/reaim scalability
> > workloads that the intel zero-day infrastructure runs are quite
> > sensitive to background inactivation delays as well because that's a
> > CPU bound workload and hence any reduction in cache residency
> > results in a reduction of the number of concurrent jobs that can be
> > run.
> > 
> 
> Ok, so if I (single threaded) create (via fs_mark), sync and remove 5m
> empty files, the remove takes about a minute. If I just bump out the
> current queue and block thresholds by 10x and repeat, that time
> increases to about ~1m24s. If I hack up a kernel to disable queueing
> entirely (i.e. fully synchronous inactivation), then I'm back to about a
> minute again. So I'm not producing any performance benefit with
> queueing/batching in this single threaded scenario, but I suspect the
> 10x threshold delta is at least measuring the negative effect of poor
> caching..? (Any decent way to confirm that..?).
> 
> And of course if I take the baseline kernel and stick a
> cond_synchronize_rcu() in xfs_inactive_ifree() it brings the batch test
> numbers right back but slows the removal test way down. What I find
> interesting however is that if I hack up something more mild like invoke
> cond_synchronize_rcu() on the oldest inode in the current inactivation
> batch, bump out the blocking threshold as above (but leave the queueing
> threshold at 32), and leave the iget side cond_sync_rcu() to catch
> whatever falls through, my 5m file remove test now completes ~5-10s
> faster than baseline and I see the following results from the batched
> alloc/free test:
> 
> batch:  1       files:  731923
> batch:  2       files:  693020
> batch:  4       files:  750948
> batch:  8       files:  743296
> batch:  16      files:  738720
> batch:  32      files:  746240
> batch:  64      files:  598464
> batch:  128     files:  672896
> batch:  256     files:  633856
> batch:  512     files:  605184
> batch:  1024    files:  569344
> batch:  2048    files:  555008
> batch:  4096    files:  524288
> 

This experiment had a bug that was dropping some inactivations on the
floor. With that fixed, the numbers aren't quite as good. The batch test
numbers still improve significantly from the posted patch (i.e. up in
the range of 38-45k files/sec), but still lag the normal allocation
rate, and the large rm test goes up to 1m40s (instead of 1m on
baseline).

Brian

> Hm?
> 
> Brian
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> >
Dave Chinner Jan. 25, 2022, 10:36 p.m. UTC | #17
On Tue, Jan 25, 2022 at 06:40:44AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 25, 2022 at 11:31:20AM +1100, Dave Chinner wrote:
> > > Ok, I think we're talking about slightly different things. What I mean
> > > above is that if a task removes a file and goes off doing unrelated
> > > $work, that inode will just sit on the percpu queue indefinitely. That's
> > > fine, as there's no functional need for us to process it immediately
> > > unless we're around -ENOSPC thresholds or some such that demand reclaim
> > > of the inode.
> > 
> > Yup, an occasional unlink sitting around for a while on an unlinked
> > list isn't going to cause a performance problem.  Indeed, such
> > workloads are more likely to benefit from the reduced unlink()
> > syscall overhead and won't even notice the increase in background
> > CPU overhead for inactivation of those occasional inodes.
> > 
> > > It sounds like what you're talking about is specifically
> > > the behavior/performance of sustained file removal (which is important
> > > obviously), where apparently there is a notable degradation if the
> > > queues become deep enough to push the inode batches out of CPU cache. So
> > > that makes sense...
> > 
> > Yup, sustained bulk throughput is where cache residency really
> > matters. And for unlink, sustained unlink workloads are quite
> > common; they often are something people wait for on the command line
> > or make up a performance critical component of a highly concurrent
> > workload so it's pretty important to get this part right.
> > 
> > > > Darrick made the original assumption that we could delay
> > > > inactivation indefinitely and so he allowed really deep queues of up
> > > > to 64k deferred inactivations. But with queues this deep, we could
> > > > never get that background inactivation code to perform anywhere near
> > > > the original synchronous background inactivation code. e.g. I
> > > > measured 60-70% performance degradations on my scalability tests,
> > > > and nothing stood out in the profiles until I started looking at
> > > > CPU data cache misses.
> > > > 
> > > 
> > > ... but could you elaborate on the scalability tests involved here so I
> > > can get a better sense of it in practice and perhaps observe the impact
> > > of changes in this path?
> > 
> > The same concurrent fsmark create/traverse/unlink workloads I've
> > been running for the past decade+ demonstrates it pretty simply. I
> > also saw regressions with dbench (both op latency and throughput) as
> > the client count (concurrency) increased, and with compilebench.  I
> > didn't look much further because all the common benchmarks I ran
> > showed perf degradations with arbitrary delays that went away with
> > the current code we have.  ISTR that parts of aim7/reaim scalability
> > workloads that the intel zero-day infrastructure runs are quite
> > sensitive to background inactivation delays as well because that's a
> > CPU bound workload and hence any reduction in cache residency
> > results in a reduction of the number of concurrent jobs that can be
> > run.
> 
> Curiosity and all that, but has this work produced any intuition on
> the sensitivity of the performance/scalability to the delays?  As in
> the effect of microseconds vs. tens of microsecond vs. hundreds of
> microseconds?

Some, yes.

The upper delay threshold where performance is measurably
impacted is in the order of single digit milliseconds, not
microseconds.

What I saw was that as the batch processing delay goes beyond ~5ms,
IPC starts to fall. The CPU usage profile does not change shape, nor
do the proportions of where CPU time is spent change. All I see is
data cache misses going up substantially and IPC dropping substantially. If
I read my notes correctly, typical change from "fast" to "slow" in
IPC was 0.82 to 0.39 and LLC-load-misses from 3% to 12%. The IPC
degradation was all done by the time the background batch processing
times were longer than a typical scheduler tick (10ms).

Now, I've been testing on Xeon CPUs with 36-76MB of l2-l3 caches, so
there's a fair amount of data that these can hold. I expect that
with smaller caches, the inflection point will be at smaller batch
sizes rather than more. Hence while I could have used larger batches
for background processing (e.g. 64-128 inodes rather than 32), I
chose smaller batch sizes by default so that CPUs with smaller
caches are less likely to be adversely affected by the batch size
being too large. OTOH, I started to measure noticeable degradation by
batch sizes of 256 inodes on my machines, which is why the hard
queue limit got set to 256 inodes.

Scaling the delay/batch size down towards single inode queuing also
resulted in perf degradation. This was largely because of all the
extra scheduling overhead that switching between the user task
and kernel worker task for every inode entailed. Context switch rate
went from a couple of thousand/sec to over 100,000/s for single
inode batches, and performance went backwards in proportion with the
amount of CPU then spent on context switches. It also led to
increases in buffer lock contention (hence context switches) as both
user task and kworker try to access the same buffers...

Cheers,

Dave.
Dave Chinner Jan. 25, 2022, 10:45 p.m. UTC | #18
On Tue, Jan 25, 2022 at 01:30:36PM -0500, Brian Foster wrote:
> On Tue, Jan 25, 2022 at 11:31:20AM +1100, Dave Chinner wrote:
> > On Mon, Jan 24, 2022 at 06:29:18PM -0500, Brian Foster wrote:
> > > On Tue, Jan 25, 2022 at 09:08:53AM +1100, Dave Chinner wrote:
> > > ... but could you elaborate on the scalability tests involved here so I
> > > can get a better sense of it in practice and perhaps observe the impact
> > > of changes in this path?
> > 
> > The same concurrent fsmark create/traverse/unlink workloads I've
> > been running for the past decade+ demonstrates it pretty simply. I
> > also saw regressions with dbench (both op latency and throughput) as
> > the client count (concurrency) increased, and with compilebench.  I
> > didn't look much further because all the common benchmarks I ran
> > showed perf degradations with arbitrary delays that went away with
> > the current code we have.  ISTR that parts of aim7/reaim scalability
> > workloads that the intel zero-day infrastructure runs are quite
> > sensitive to background inactivation delays as well because that's a
> > CPU bound workload and hence any reduction in cache residency
> > results in a reduction of the number of concurrent jobs that can be
> > run.
> > 
> 
> Ok, so if I (single threaded) create (via fs_mark), sync and remove 5m
> empty files, the remove takes about a minute. If I just bump out the
> current queue and block thresholds by 10x and repeat, that time
> increases to about ~1m24s. If I hack up a kernel to disable queueing
> entirely (i.e. fully synchronous inactivation), then I'm back to about a
> minute again. So I'm not producing any performance benefit with
> queueing/batching in this single threaded scenario, but I suspect the
> 10x threshold delta is at least measuring the negative effect of poor
> caching..? (Any decent way to confirm that..?).

Right, background inactivation does not improve performance - it's
necessary to get the transactions out of the evict() path. All we
wanted was to ensure that there were no performance degradations as
a result of background inactivation, not that it was faster.

If you want to confirm that there is an increase in cold cache
access when the batch size is increased, cpu profiles with 'perf
top'/'perf record/report' and CPU cache performance metric reporting
via 'perf stat -dddd' are your friend. See elsewhere in the thread
where I mention those things to Paul.

Cheers,

Dave.
Paul E. McKenney Jan. 26, 2022, 5:29 a.m. UTC | #19
On Wed, Jan 26, 2022 at 09:36:07AM +1100, Dave Chinner wrote:
> On Tue, Jan 25, 2022 at 06:40:44AM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 25, 2022 at 11:31:20AM +1100, Dave Chinner wrote:
> > > > Ok, I think we're talking about slightly different things. What I mean
> > > > above is that if a task removes a file and goes off doing unrelated
> > > > $work, that inode will just sit on the percpu queue indefinitely. That's
> > > > fine, as there's no functional need for us to process it immediately
> > > > unless we're around -ENOSPC thresholds or some such that demand reclaim
> > > > of the inode.
> > > 
> > > Yup, an occasional unlink sitting around for a while on an unlinked
> > > list isn't going to cause a performance problem.  Indeed, such
> > > workloads are more likely to benefit from the reduced unlink()
> > > syscall overhead and won't even notice the increase in background
> > > CPU overhead for inactivation of those occasional inodes.
> > > 
> > > > It sounds like what you're talking about is specifically
> > > > the behavior/performance of sustained file removal (which is important
> > > > obviously), where apparently there is a notable degradation if the
> > > > queues become deep enough to push the inode batches out of CPU cache. So
> > > > that makes sense...
> > > 
> > > Yup, sustained bulk throughput is where cache residency really
> > > matters. And for unlink, sustained unlink workloads are quite
> > > common; they often are something people wait for on the command line
> > > or make up a performance critical component of a highly concurrent
> > > workload so it's pretty important to get this part right.
> > > 
> > > > > Darrick made the original assumption that we could delay
> > > > > inactivation indefinitely and so he allowed really deep queues of up
> > > > > to 64k deferred inactivations. But with queues this deep, we could
> > > > > never get that background inactivation code to perform anywhere near
> > > > > the original synchronous background inactivation code. e.g. I
> > > > > measured 60-70% performance degradations on my scalability tests,
> > > > > and nothing stood out in the profiles until I started looking at
> > > > > CPU data cache misses.
> > > > > 
> > > > 
> > > > ... but could you elaborate on the scalability tests involved here so I
> > > > can get a better sense of it in practice and perhaps observe the impact
> > > > of changes in this path?
> > > 
> > > The same concurrent fsmark create/traverse/unlink workloads I've
> > > been running for the past decade+ demonstrates it pretty simply. I
> > > also saw regressions with dbench (both op latency and throughput) as
> > > the client count (concurrency) increased, and with compilebench.  I
> > > didn't look much further because all the common benchmarks I ran
> > > showed perf degradations with arbitrary delays that went away with
> > > the current code we have.  ISTR that parts of aim7/reaim scalability
> > > workloads that the intel zero-day infrastructure runs are quite
> > > sensitive to background inactivation delays as well because that's a
> > > CPU bound workload and hence any reduction in cache residency
> > > results in a reduction of the number of concurrent jobs that can be
> > > run.
> > 
> > Curiosity and all that, but has this work produced any intuition on
> > the sensitivity of the performance/scalability to the delays?  As in
> > the effect of microseconds vs. tens of microsecond vs. hundreds of
> > microseconds?
> 
> Some, yes.
> 
> The upper delay threshold where performance is measurably
> impacted is in the order of single digit milliseconds, not
> microseconds.
> 
> What I saw was that as the batch processing delay goes beyond ~5ms,
> IPC starts to fall. The CPU usage profile does not change shape, nor
> do the proportions of where CPU time is spent change. All I see is
> data cache misses going up substantially and IPC dropping substantially. If
> I read my notes correctly, typical change from "fast" to "slow" in
> IPC was 0.82 to 0.39 and LLC-load-misses from 3% to 12%. The IPC
> degradation was all done by the time the background batch processing
> times were longer than a typical scheduler tick (10ms).
> 
> Now, I've been testing on Xeon CPUs with 36-76MB of l2-l3 caches, so
> there's a fair amount of data that these can hold. I expect that
> with smaller caches, the inflection point will be at smaller batch
> sizes rather than more. Hence while I could have used larger batches
> for background processing (e.g. 64-128 inodes rather than 32), I
> chose smaller batch sizes by default so that CPUs with smaller
> caches are less likely to be adversely affected by the batch size
> being too large. OTOH, I started to measure noticeable degradation by
> batch sizes of 256 inodes on my machines, which is why the hard
> queue limit got set to 256 inodes.
> 
> Scaling the delay/batch size down towards single inode queuing also
> resulted in perf degradation. This was largely because of all the
> extra scheduling overhead that switching between the user task
> and kernel worker task for every inode entailed. Context switch rate
> went from a couple of thousand/sec to over 100,000/s for single
> inode batches, and performance went backwards in proportion with the
> amount of CPU then spent on context switches. It also led to
> increases in buffer lock contention (hence context switches) as both
> user task and kworker try to access the same buffers...

Makes sense.  Never a guarantee of easy answers.  ;-)

If it would help, I could create expedited-grace-period counterparts
of get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
poll_state_synchronize_rcu(), and cond_synchronize_rcu().  These would
provide sub-millisecond grace periods, in fact, sub-100-microsecond
grace periods on smaller systems.
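
(For context, the existing polling calls compose roughly as below; an
expedited variant would presumably mirror the same calling pattern.
This is only a sketch of the pattern, and the i_destroy_gp cookie
field name is hypothetical rather than what the patch actually uses.)

	/* destroy/inactivation side: record (and kick off) a grace period;
	 * get_state_synchronize_rcu() is similar but won't start one */
	ip->i_destroy_gp = start_poll_synchronize_rcu();

	/* non-blocking check, e.g. while scanning for recycle candidates */
	expired = poll_state_synchronize_rcu(ip->i_destroy_gp);

	/* blocking form at recycle time: waits only if not yet expired */
	cond_synchronize_rcu(ip->i_destroy_gp);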

Of course, nothing comes for free.  Although expedited grace periods
are way way cheaper than they used to be, they still IPI non-idle
non-nohz_full-userspace CPUs, which translates to roughly the CPU overhead
of a wakeup on each IPIed CPU.  And of course disruption to aggressive
non-nohz_full real-time applications.  Shorter latencies also translate
to fewer updates over which to amortize grace-period overhead.

But it should get well under your single-digit milliseconds of delay.

							Thanx, Paul
Brian Foster Jan. 26, 2022, 1:21 p.m. UTC | #20
On Tue, Jan 25, 2022 at 09:29:10PM -0800, Paul E. McKenney wrote:
> On Wed, Jan 26, 2022 at 09:36:07AM +1100, Dave Chinner wrote:
> > On Tue, Jan 25, 2022 at 06:40:44AM -0800, Paul E. McKenney wrote:
> > > On Tue, Jan 25, 2022 at 11:31:20AM +1100, Dave Chinner wrote:
> > > > > Ok, I think we're talking about slightly different things. What I mean
> > > > > above is that if a task removes a file and goes off doing unrelated
> > > > > $work, that inode will just sit on the percpu queue indefinitely. That's
> > > > > fine, as there's no functional need for us to process it immediately
> > > > > unless we're around -ENOSPC thresholds or some such that demand reclaim
> > > > > of the inode.
> > > > 
> > > > Yup, an occasional unlink sitting around for a while on an unlinked
> > > > list isn't going to cause a performance problem.  Indeed, such
> > > > workloads are more likely to benefit from the reduced unlink()
> > > > syscall overhead and won't even notice the increase in background
> > > > CPU overhead for inactivation of those occasional inodes.
> > > > 
> > > > > It sounds like what you're talking about is specifically
> > > > > the behavior/performance of sustained file removal (which is important
> > > > > obviously), where apparently there is a notable degradation if the
> > > > > queues become deep enough to push the inode batches out of CPU cache. So
> > > > > that makes sense...
> > > > 
> > > > Yup, sustained bulk throughput is where cache residency really
> > > > matters. And for unlink, sustained unlink workloads are quite
> > > > common; they often are something people wait for on the command line
> > > > or make up a performance critical component of a highly concurrent
> > > > workload so it's pretty important to get this part right.
> > > > 
> > > > > > Darrick made the original assumption that we could delay
> > > > > > inactivation indefinitely and so he allowed really deep queues of up
> > > > > > to 64k deferred inactivations. But with queues this deep, we could
> > > > > > never get that background inactivation code to perform anywhere near
> > > > > > the original synchronous background inactivation code. e.g. I
> > > > > > measured 60-70% performance degradations on my scalability tests,
> > > > > > and nothing stood out in the profiles until I started looking at
> > > > > > CPU data cache misses.
> > > > > > 
> > > > > 
> > > > > ... but could you elaborate on the scalability tests involved here so I
> > > > > can get a better sense of it in practice and perhaps observe the impact
> > > > > of changes in this path?
> > > > 
> > > > The same concurrent fsmark create/traverse/unlink workloads I've
> > > > been running for the past decade+ demonstrates it pretty simply. I
> > > > also saw regressions with dbench (both op latency and throughput) as
> > > > the client count (concurrency) increased, and with compilebench.  I
> > > > didn't look much further because all the common benchmarks I ran
> > > > showed perf degradations with arbitrary delays that went away with
> > > > the current code we have.  ISTR that parts of aim7/reaim scalability
> > > > workloads that the intel zero-day infrastructure runs are quite
> > > > sensitive to background inactivation delays as well because that's a
> > > > CPU bound workload and hence any reduction in cache residency
> > > > results in a reduction of the number of concurrent jobs that can be
> > > > run.
> > > 
> > > Curiosity and all that, but has this work produced any intuition on
> > > the sensitivity of the performance/scalability to the delays?  As in
> > > the effect of microseconds vs. tens of microsecond vs. hundreds of
> > > microseconds?
> > 
> > Some, yes.
> > 
> > The upper delay threshold where performance is measurably
> > impacted is in the order of single digit milliseconds, not
> > microseconds.
> > 
> > What I saw was that as the batch processing delay goes beyond ~5ms,
> > IPC starts to fall. The CPU usage profile does not change shape, nor
> > do the proportions of where CPU time is spent change. All I see is
> > data cache misses going up substantially and IPC dropping substantially. If
> > I read my notes correctly, typical change from "fast" to "slow" in
> > IPC was 0.82 to 0.39 and LLC-load-misses from 3% to 12%. The IPC
> > degradation was all done by the time the background batch processing
> > times were longer than a typical scheduler tick (10ms).
> > 
> > Now, I've been testing on Xeon CPUs with 36-76MB of l2-l3 caches, so
> > there's a fair amount of data that these can hold. I expect that
> > with smaller caches, the inflection point will be at smaller batch
> > sizes rather than more. Hence while I could have used larger batches
> > for background processing (e.g. 64-128 inodes rather than 32), I
> > chose smaller batch sizes by default so that CPUs with smaller
> > caches are less likely to be adversely affected by the batch size
> > being too large. OTOH, I started to measure noticeable degradation by
> > batch sizes of 256 inodes on my machines, which is why the hard
> > queue limit got set to 256 inodes.
> > 
> > Scaling the delay/batch size down towards single inode queuing also
> > resulted in perf degradation. This was largely because of all the
> > extra scheduling overhead that switching between the user task
> > and kernel worker task for every inode entailed. Context switch rate
> > went from a couple of thousand/sec to over 100,000/s for single
> > inode batches, and performance went backwards in proportion with the
> > amount of CPU then spent on context switches. It also led to
> > increases in buffer lock contention (hence context switches) as both
> > user task and kworker try to access the same buffers...
> 
> Makes sense.  Never a guarantee of easy answers.  ;-)
> 
> If it would help, I could create expedited-grace-period counterparts
> of get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
> poll_state_synchronize_rcu(), and cond_synchronize_rcu().  These would
> provide sub-millisecond grace periods, in fact, sub-100-microsecond
> grace periods on smaller systems.
> 

If you have something with enough basic functionality, I'd be interested
in converting this patch over to an expedited variant to run some
tests/experiments. As it is, it seems the current approach is kind of
playing whack-a-mole between disrupting allocation performance (by
populating the free inode pool with too many free but "pending rcu grace
period" inodes) and disrupting sustained remove performance (by pushing
the internal inactivation queues too deep and losing CPU cache, as Dave
describes above). So if an expedited grace period is possible that fits
within the time window on paper, it certainly seems worthwhile to test.

Otherwise the only thing that comes to mind right now is to start
playing around with the physical inode allocation algorithm to avoid
such pending inodes. I think a scanning approach may ultimately run into
the same problems with the right workload (i.e. such that all free
inodes are pending), so I suspect what this really means is either
figuring out a nice enough way to efficiently locate expired inodes (maybe
via our own internal rcu callback to explicitly tag now expired inodes
as good allocation candidates), or to determine when to proceed with
inode chunk allocations when scanning is unlikely to succeed, or
something similar along those general lines..
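
(As a rough illustration of the "internal rcu callback tags expired
inodes" idea, something like the following could work; all of the names
below (the i_recycle_rcu head, the XFS_ICI_ALLOCATABLE_TAG tag, the
helper) are made up, and the locking context of an RCU callback would
need more care than shown:)

static void
xfs_inode_mark_allocatable(struct rcu_head *head)
{
	struct xfs_inode	*ip = container_of(head, struct xfs_inode,
						   i_recycle_rcu);
	struct xfs_mount	*mp = ip->i_mount;
	struct xfs_perag	*pag;

	/* grace period has elapsed; tag this inode number as a safe
	 * allocation candidate for the inode allocator to find */
	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
	spin_lock(&pag->pag_ici_lock);
	radix_tree_tag_set(&pag->pag_ici_root,
			   XFS_INO_TO_AGINO(mp, ip->i_ino),
			   XFS_ICI_ALLOCATABLE_TAG);
	spin_unlock(&pag->pag_ici_lock);
	xfs_perag_put(pag);
}

/* queued at inactivation time, once the inode is free on disk */
call_rcu(&ip->i_recycle_rcu, xfs_inode_mark_allocatable);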

> Of course, nothing comes for free.  Although expedited grace periods
> are way way cheaper than they used to be, they still IPI non-idle
> non-nohz_full-userspace CPUs, which translates to roughly the CPU overhead
> of a wakeup on each IPIed CPU.  And of course disruption to aggressive
> non-nohz_full real-time applications.  Shorter latencies also translate
> to fewer updates over which to amortize grace-period overhead.
> 
> But it should get well under your single-digit milliseconds of delay.
> 

If the expedited variant were sufficient for the fast path case, I
suppose it might be interesting to see if we could throttle down to
non-expedited variants based either on a heuristic or on feedback from
allocation side stalls.

Brian

> 							Thanx, Paul
>
Al Viro Jan. 27, 2022, 4:19 a.m. UTC | #21
On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:

> Right, background inactivation does not improve performance - it's
> necessary to get the transactions out of the evict() path. All we
> wanted was to ensure that there were no performance degradations as
> a result of background inactivation, not that it was faster.
> 
> If you want to confirm that there is an increase in cold cache
> access when the batch size is increased, cpu profiles with 'perf
> top'/'perf record/report' and CPU cache performance metric reporting
> via 'perf stat -dddd' are your friend. See elsewhere in the thread
> where I mention those things to Paul.

Dave, do you see a plausible way to eventually drop Ian's bandaid?
I'm not asking for that to happen this cycle and for backports Ian's
patch is obviously fine.

What I really want to avoid is the situation when we are stuck with
keeping that bandaid in fs/namei.c, since all ways to avoid seeing
reused inodes would hurt XFS too badly.  And the benchmarks in this
thread do look like that.

Are there any realistic prospects of having xfs_iget() deal with
reuse case by allocating new in-core inode and flipping whatever
references you've got in XFS journalling data structures to the
new copy?  If I understood what you said on IRC correctly, that is...

Again, I'm not asking if it can be done this cycle; having a
realistic path to doing that eventually would be fine by me.
Dave Chinner Jan. 27, 2022, 5:26 a.m. UTC | #22
On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> 
> > Right, background inactivation does not improve performance - it's
> > necessary to get the transactions out of the evict() path. All we
> > wanted was to ensure that there were no performance degradations as
> > a result of background inactivation, not that it was faster.
> > 
> > If you want to confirm that there is an increase in cold cache
> > access when the batch size is increased, cpu profiles with 'perf
> > top'/'perf record/report' and CPU cache performance metric reporting
> > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > where I mention those things to Paul.
> 
> Dave, do you see a plausible way to eventually drop Ian's bandaid?
> I'm not asking for that to happen this cycle and for backports Ian's
> patch is obviously fine.

Yes, but not in the near term.

> What I really want to avoid is the situation when we are stuck with
> keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> reused inodes would hurt XFS too badly.  And the benchmarks in this
> thread do look like that.

The simplest way I think is to have the XFS inode allocation track
"busy inodes" in the same way we track "busy extents". A busy extent
is an extent that has been freed by the user, but is not yet marked
free in the journal/on disk. If we try to reallocate that busy
extent, we either select a different free extent to allocate, or if
we can't find any we force the journal to disk, wait for it to
complete (hence unbusying the extents) and retry the allocation
again.

We can do something similar for inode allocation - it's actually a
lockless tag lookup on the radix tree entry for the candidate inode
number. If we find the reclaimable radix tree tag set, then we select
a different inode. If we can't allocate a new inode, then we kick
synchronize_rcu() and retry the allocation, allowing inodes to be
recycled this time.
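
(In code terms that check is something like the sketch below; the
helper name is made up, while the per-AG inode radix tree and
XFS_ICI_RECLAIM_TAG are the existing structures:)

/*
 * Lockless "is this candidate inode number busy?" check: a gang tag
 * lookup for the reclaim tag starting at the candidate inode number.
 */
static bool
xfs_inode_unlinked_busy(struct xfs_perag *pag, xfs_agino_t agino)
{
	struct xfs_inode	*ip;
	bool			busy = false;

	rcu_read_lock();
	if (radix_tree_gang_lookup_tag(&pag->pag_ici_root, (void **)&ip,
				       agino, 1, XFS_ICI_RECLAIM_TAG))
		/* busy only if the tagged inode is this exact number */
		busy = XFS_INO_TO_AGINO(pag->pag_mount, ip->i_ino) == agino;
	rcu_read_unlock();

	return busy;
}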

> Are there any realistic prospects of having xfs_iget() deal with
> reuse case by allocating new in-core inode and flipping whatever
> references you've got in XFS journalling data structures to the
> new copy?  If I understood what you said on IRC correctly, that is...

That's ... much harder.

One of the problems is that once an inode has a log item attached to
it, it assumes that it can be accessed without specific locking,
etc. see xfs_inode_clean(), for example. So there's some life-cycle
stuff that needs to be taken care of in XFS first, and the inode <->
log item relationship is tangled.

I've been working towards removing that tangle - but that stuff is
quite a distance down my logging rework patch queue. That queue has
been stuck now for a year trying to get the first handful of rework
and scalability modifications reviewed and merged, so I'm not
holding my breath as to how long a more substantial rework of
internal logging code will take to review and merge.

Really, though, we need the inactivation stuff to be done as part of
the VFS inode lifecycle. I have some ideas on what to do here, but I
suspect we'll need some changes to iput_final()/evict() to allow us
to process final unlinks in the background and then call evict()
ourselves when the unlink completes. That way ->destroy_inode() can
just call xfs_reclaim_inode() to free it directly, which also helps
us get rid of background inode freeing and hence inode recycling
from XFS altogether. I think we _might_ be able to do this without
needing to change any of the logging code in XFS, but I haven't
looked any further than this into it as yet.

> Again, I'm not asking if it can be done this cycle; having a
> realistic path to doing that eventually would be fine by me.

We're talking a year at least, probably two, before we get there...

Cheers,

Dave.
Brian Foster Jan. 27, 2022, 7:01 p.m. UTC | #23
On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > 
> > > Right, background inactivation does not improve performance - it's
> > > necessary to get the transactions out of the evict() path. All we
> > > wanted was to ensure that there were no performance degradations as
> > > a result of background inactivation, not that it was faster.
> > > 
> > > If you want to confirm that there is an increase in cold cache
> > > access when the batch size is increased, cpu profiles with 'perf
> > > top'/'perf record/report' and CPU cache performance metric reporting
> > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > where I mention those things to Paul.
> > 
> > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > I'm not asking for that to happen this cycle and for backports Ian's
> > patch is obviously fine.
> 
> Yes, but not in the near term.
> 
> > What I really want to avoid is the situation when we are stuck with
> > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > thread do look like that.
> 
> The simplest way I think is to have the XFS inode allocation track
> "busy inodes" in the same way we track "busy extents". A busy extent
> is an extent that has been freed by the user, but is not yet marked
> free in the journal/on disk. If we try to reallocate that busy
> extent, we either select a different free extent to allocate, or if
> we can't find any we force the journal to disk, wait for it to
> complete (hence unbusying the extents) and retry the allocation
> again.
> 
> We can do something similar for inode allocation - it's actually a
> lockless tag lookup on the radix tree entry for the candidate inode
> number. If we find the reclaimable radix tree tag set, then we select
> a different inode. If we can't allocate a new inode, then we kick
> synchronize_rcu() and retry the allocation, allowing inodes to be
> recycled this time.
> 

I'm starting to poke around this area since it's become clear that the
currently proposed scheme just involves too much latency (unless Paul
chimes in with his expedited grace period variant, at which point I will
revisit) in the fast allocation/recycle path. ISTM so far that a simple
"skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
have pretty much the same pattern of behavior as this patch: one
synchronize_rcu() per batch.

IOW, background reclaim only kicks in after 30s by default, so the pool
of free inodes pretty much always consists of 100% reclaimable inodes.
On top of that, at smaller batch sizes, the pool tends to have a uniform
(!elapsed) grace period cookie, so a stall is required to be able to
allocate any of them. As the batch size increases, I do see the
population of free inodes start to contain a mix of expired and
non-expired grace period cookies. It's fairly easy to hack up an
internal icwalk scan to locate already expired inodes, but the problem
is that the recycle rate is so much faster than the grace period latency
that it doesn't really matter. We'll still have to stall by the time we
get to the non-expired inodes, and so we're back to one stall per batch
and the same general performance characteristic of this patch.

So given all of this, I'm wondering about something like the following
high level inode allocation algorithm:

1. If the AG has any reclaimable inodes, scan for one with an expired
grace period. If found, target that inode for physical allocation.

2. If the AG free inode count == the AG reclaimable count and we know
all reclaimable inodes are most likely pending a grace period (because
the previous step failed), allocate a new inode chunk (and target it in
this allocation).

3. If the AG free inode count > the reclaimable count, scan the finobt
for an inode that is not present in the radix tree (i.e. Dave's logic
above).

Each of those steps could involve some heuristics to maintain
predictable behavior and avoid large scans and such, but the general
idea is that the repeated alloc/free inode workload naturally populates
the AG with enough physical inodes to always be able to satisfy an
allocation without waiting on a grace period. IOW, this is effectively
similar behavior to if physical inode freeing was delayed to an rcu
callback, with the tradeoff of complicating the allocation path rather
than stalling in the inactivation pipeline. Thoughts?
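
(Roughly, in pseudocode; the helper names are invented and each step
would need limits/heuristics to bound the scans:)

	/* 1. prefer a reclaimable inode whose destroy-time grace period
	 *    has already expired */
	if (pag->pag_ici_reclaimable &&
	    xfs_find_expired_free_inode(pag, &agino))
		return agino;

	/* 2. every free inode is reclaimable and likely still pending a
	 *    grace period: allocate a new inode chunk and target it */
	if (pag->pagi_freecount == pag->pag_ici_reclaimable)
		return xfs_ialloc_new_chunk(pag);

	/* 3. otherwise some free inodes have no in-core counterpart: scan
	 *    the finobt for one not present in the radix tree */
	return xfs_finobt_find_uncached(pag);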

This of course is more involved than this patch (or similarly simple
variants of RCU delaying preexisting bits of code) and requires some
more investigation, but certainly shouldn't be a multi-year thing. The
question is probably more of whether it's enough complexity to justify
in the meantime...

> > Are there any realistic prospects of having xfs_iget() deal with
> > reuse case by allocating new in-core inode and flipping whatever
> > references you've got in XFS journalling data structures to the
> > new copy?  If I understood what you said on IRC correctly, that is...
> 
> That's ... much harder.
> 
> One of the problems is that once an inode has a log item attached to
> it, it assumes that it can be accessed without specific locking,
> etc. see xfs_inode_clean(), for example. So there's some life-cycle
> stuff that needs to be taken care of in XFS first, and the inode <->
> log item relationship is tangled.
> 
> I've been working towards removing that tangle - but that stuff is
> quite a distance down my logging rework patch queue. That queue has
> been stuck now for a year trying to get the first handful of rework
> and scalability modifications reviewed and merged, so I'm not
> holding my breath as to how long a more substantial rework of
> internal logging code will take to review and merge.
> 
> Really, though, we need the inactivation stuff to be done as part of
> the VFS inode lifecycle. I have some ideas on what to do here, but I
> suspect we'll need some changes to iput_final()/evict() to allow us
> to process final unlinks in the background and then call evict()
> ourselves when the unlink completes. That way ->destroy_inode() can
> just call xfs_reclaim_inode() to free it directly, which also helps
> us get rid of background inode freeing and hence inode recycling
> from XFS altogether. I think we _might_ be able to do this without
> needing to change any of the logging code in XFS, but I haven't
> looked any further than this into it as yet.
> 

... of whatever this ends up looking like.

Can you elaborate on what you mean by processing unlinks in the
background? I can see the value of being able to eliminate the recycle
code in XFS, but wouldn't we still have to limit and throttle against
background work to maintain sustained removal performance? IOW, what's
the general teardown behavior you're getting at here, aside from what
parts push into the vfs or not?

Brian

> > Again, I'm not asking if it can be done this cycle; having a
> > realistic path to doing that eventually would be fine by me.
> 
> We're talking a year at least, probably two, before we get there...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Dave Chinner Jan. 27, 2022, 10:18 p.m. UTC | #24
On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > 
> > > > Right, background inactivation does not improve performance - it's
> > > > necessary to get the transactions out of the evict() path. All we
> > > > wanted was to ensure that there were no performance degradations as
> > > > a result of background inactivation, not that it was faster.
> > > > 
> > > > If you want to confirm that there is an increase in cold cache
> > > > access when the batch size is increased, cpu profiles with 'perf
> > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > where I mention those things to Paul.
> > > 
> > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > I'm not asking for that to happen this cycle and for backports Ian's
> > > patch is obviously fine.
> > 
> > Yes, but not in the near term.
> > 
> > > What I really want to avoid is the situation when we are stuck with
> > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > thread do look like that.
> > 
> > The simplest way I think is to have the XFS inode allocation track
> > "busy inodes" in the same way we track "busy extents". A busy extent
> > is an extent that has been freed by the user, but is not yet marked
> > free in the journal/on disk. If we try to reallocate that busy
> > extent, we either select a different free extent to allocate, or if
> > we can't find any we force the journal to disk, wait for it to
> > complete (hence unbusying the extents) and retry the allocation
> > again.
> > 
> > We can do something similar for inode allocation - it's actually a
> > lockless tag lookup on the radix tree entry for the candidate inode
> > number. If we find the reclaimable radix tree tag set, then we select
> > a different inode. If we can't allocate a new inode, then we kick
> > synchronize_rcu() and retry the allocation, allowing inodes to be
> > recycled this time.
> > 
> 
> I'm starting to poke around this area since it's become clear that the
> currently proposed scheme just involves too much latency (unless Paul
> chimes in with his expedited grace period variant, at which point I will
> revisit) in the fast allocation/recycle path. ISTM so far that a simple
> "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> have pretty much the same pattern of behavior as this patch: one
> synchronize_rcu() per batch.

That's not really what I proposed - what I suggested was that if we
can't allocate a usable inode from the finobt, and we can't allocate
a new inode cluster from the AG (i.e. populate the finobt with more
inodes), only then call synchronize_rcu() and recycle an inode.

We don't need to scan the inode cache or the finobt to determine if
there are reclaimable inodes immediately available - do a gang tag
lookup on the radix tree for newino.
If it comes back with an inode number that is not
equal to the inode number we looked up, then we can allocate newino
immediately.

If it comes back with newino, then check the first inode in the
finobt. If that comes back with an inode that is not the first inode
in the finobt, we can immediately allocate the first inode in the
finobt. If not, check the last inode. If that fails, assume all
inodes in the finobt need recycling and allocate a new cluster,
pointing newino at it.

Then we get another 64 inodes starting at the newino cursor we can
allocate from while we wait for the current RCU grace period to
expire for inodes already in the reclaimable state. An algorithm
like this will allow the free inode pool to resize automatically
based on the unlink frequency of the workload and RCU grace period
latency...
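
(In pseudocode, that selection order is something like the following,
using a busy check along the lines of the xfs_inode_unlinked_busy()
sketch earlier; the finobt helpers are invented:)

	if (!xfs_inode_unlinked_busy(pag, newino))
		return newino;			/* common fast path */

	agino = xfs_finobt_first_free(pag);
	if (!xfs_inode_unlinked_busy(pag, agino))
		return agino;

	agino = xfs_finobt_last_free(pag);
	if (!xfs_inode_unlinked_busy(pag, agino))
		return agino;

	/* assume everything in the finobt is still waiting on RCU:
	 * allocate a new cluster and point the newino cursor at it */
	return xfs_ialloc_new_chunk(pag);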

> IOW, background reclaim only kicks in after 30s by default,

5 seconds, by default, not 30s.

> so the pool
> of free inodes pretty much always consists of 100% reclaimable inodes.
> On top of that, at smaller batch sizes, the pool tends to have a uniform
> (!elapsed) grace period cookie, so a stall is required to be able to
> allocate any of them. As the batch size increases, I do see the
> population of free inodes start to contain a mix of expired and
> non-expired grace period cookies. It's fairly easy to hack up an
> internal icwalk scan to locate already expired inodes,

We don't want or need to do exhaustive, exactly correct scans here.
We want *fast and loose* because this is a critical performance fast
path. We don't care if we skip the occasional recyclable inode, what
we need to do is minimise the CPU overhead and search latency for
the case where recycling will never occur.

> but the problem
> is that the recycle rate is so much faster than the grace period latency
> that it doesn't really matter. We'll still have to stall by the time we
> get to the non-expired inodes, and so we're back to one stall per batch
> and the same general performance characteristic of this patch.

Yes, but that's why I suggested that we allocate a new inode cluster
rather than calling synchronize_rcu() when we don't have a
recyclable inode candidate.

> So given all of this, I'm wondering about something like the following
> high level inode allocation algorithm:
> 
> 1. If the AG has any reclaimable inodes, scan for one with an expired
> grace period. If found, target that inode for physical allocation.

How do you efficiently discriminate between "reclaimable w/ nlink >
0" and "reclaimable w/ nlink == 0" so we don't get hung up searching
millions of reclaimable inodes for the one that has been unlinked
and has an expired grace period?

Also, this will need to be done on every inode allocation when we
have inodes in reclaimable state (which is almost always on a busy
system).  Workloads with sequential allocation (as per untar, rsync,
git checkout, cp -r, etc) will do this scan unnecessarily as they
will almost never hit this inode recycle path as there aren't a lot
of unlinks occurring while they are working.

> 2. If the AG free inode count == the AG reclaimable count and we know
> all reclaimable inodes are most likely pending a grace period (because
> the previous step failed), allocate a new inode chunk (and target it in
> this allocation).

That's good for the allocation that allocates the chunk, but...

> 3. If the AG free inode count > the reclaimable count, scan the finobt
> for an inode that is not present in the radix tree (i.e. Dave's logic
> above).

... now we are repeating the radix tree walk that we've already done
in #1 to find the newly allocated inodes we allocated in #2.

We don't need to walk the inodes in the inode radix tree to look at
individual inode state - we can use the reclaimable radix tree tag
to shortcut those walks and minimise the number of actual lookups we
need to do. By definition, an inode in the finobt and
XFS_IRECLAIMABLE state is an inode that needs recycling, so we can
just use the finobt and the inode radix tree tags to avoid inodes
that need recycling altogether.  i.e. If we fail a tag lookup, we
have no reclaimable inodes in the range we asked the lookup to
search so we can immediately allocate - we don't actually need to
look at the inode in the fast path no-recycling case at all.

Keep in mind that the fast path we really care about is not the
unlink/allocate looping case, it's the allocation case where no
recycling will ever occur and so that's the one we really have to
try hard to minimise the overhead for. The moment we get into
reclaimable inodes within the finobt range, we're hitting the "lots
of temp files" use case, so we can detect that and keep the overhead
of that algorithm as separate as we possibly can.

Hence we need the initial "can we allocate this inode number"
decision to be as fast and as low overhead as possible so we can
determine which algorithm we need to run. A lockless radix tree gang
tag lookup will give us that and if the lookup finds a reclaimable
inode only then do we move into the "recycle RCU avoidance"
algorithm path....

> > > Are there any realistic prospects of having xfs_iget() deal with
> > > reuse case by allocating new in-core inode and flipping whatever
> > > references you've got in XFS journalling data structures to the
> > > new copy?  If I understood what you said on IRC correctly, that is...
> > 
> > That's ... much harder.
> > 
> > One of the problems is that once an inode has a log item attached to
> > it, it assumes that it can be accessed without specific locking,
> > etc. see xfs_inode_clean(), for example. So there's some life-cycle
> > stuff that needs to be taken care of in XFS first, and the inode <->
> > log item relationship is tangled.
> > 
> > I've been working towards removing that tangle - but taht stuff is
> > quite a distance down my logging rework patch queue. THat queue has
> > been stuck now for a year trying to get the first handful of rework
> > and scalability modifications reviewed and merged, so I'm not
> > holding my breathe as to how long a more substantial rework of
> > internal logging code will take to review and merge.
> > 
> > Really, though, we need the inactivation stuff to be done as part of
> > the VFS inode lifecycle. I have some ideas on what to do here, but I
> > suspect we'll need some changes to iput_final()/evict() to allow us
> > to process final unlinks in the bakground and then call evict()
> > ourselves when the unlink completes. That way ->destroy_inode() can
> > just call xfs_reclaim_inode() to free it directly, which also helps
> > us get rid of background inode freeing and hence inode recycling
> > from XFS altogether. I think we _might_ be able to do this without
> > needing to change any of the logging code in XFS, but I haven't
> > looked any further than this into it as yet.
> > 
> 
> ... of whatever this ends up looking like.
> 
> Can you elaborate on what you mean by processing unlinks in the
> background? I can see the value of being able to eliminate the recycle
> code in XFS, but wouldn't we still have to limit and throttle against
> background work to maintain sustained removal performance?

Yes, but that's irrelevant because all we would be doing is slightly
changing where that throttling occurs (i.e. in
iput_final->drop_inode instead of iput_final->evict->destroy_inode).

However, moving the throttling up the stack is a good thing because
it gets rid of the current problem with the inactivation throttling
blocking the shrinker via shrinker->super_cache_scan->
prune_icache_sb->dispose_list->evict->destroy_inode->throttle on a
full inactivation queue because all the inodes need EOF block
trimming to be done.

> IOW, what's
> the general teardown behavior you're getting at here, aside from what
> parts push into the vfs or not?

->drop_inode() triggers background inactivation for both blockgc and
inode unlink. For unlink, we set I_WILL_FREE so the VFS will not
attempt to re-use it, add the inode # to the internal AG "busy
inode" tree and return drop = true and the VFS then stops processing
that inode. For blockgc, we queue the work and return drop = false
and the VFS puts it onto the LRU. Now we have asynchronous
inactivation while the inode is still present and visible at the VFS
level.

For background blockgc - that now happens while the inode is idle on
the LRU before it gets reclaimed by the shrinker. i.e. we trigger
block gc when the last reference to the inode goes away instead of
when it gets removed from memory by the shrinker.

For unlink, that now runs in the background until the inode unlink
has been journalled and the cleared inode written to the backing
inode cluster buffer. The inode is then no longer visible to the
journal and it can't be reallocated because it is still busy. We
then change the inode state from I_WILL_FREE to I_FREEING and call
evict(). The inode then gets torn down, and in ->destroy_inode we
remove the inode from the radix tree, clear the per-ag busy record
and free the inode via RCU as expected by the VFS.
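
Very roughly, something like this (the helpers are made-up names, and
it assumes the iput_final()/evict() changes above - with the current
VFS, drop = true goes straight to evict()):

static int
xfs_fs_drop_inode(
	struct inode		*inode)
{
	struct xfs_inode	*ip = XFS_I(inode);

	if (inode->i_nlink == 0) {
		/* i_lock is already held by iput_final() here */
		inode->i_state |= I_WILL_FREE;
		xfs_inode_mark_busy(ip);	/* hide the ino from allocation */
		xfs_inodegc_queue_unlink(ip);	/* journal the unlink in the bg */
		return 1;	/* drop = true: VFS stops processing the inode */
	}

	xfs_blockgc_queue_one(ip);		/* trim EOF blocks in the bg */
	return 0;	/* drop = false: inode goes onto the LRU as usual */
}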

Another possible mechanism instead of exporting evict() is that
background inactivation takes a new reference to the inode from
->drop_inode so that even if we put it on the LRU the inode cache
shrinker will skip it while we are doing background inactivation.
That would mean that when background inactivation is done, we call
iput_final() again. The inode will either then be left on the LRU or
go through the normal evict() path.

This also gets the memory demand and overhead of EOF block
trimming out of the memory reclaim path, and it also gets rid of
the need for the special superblock shrinker hooks that XFS has for
reclaiming its internal inode cache.

Overall, lifting this stuff up to the VFS is full of "less
complexity in XFS" wins if we can make it work...

Cheers,

Dave.
Brian Foster Jan. 28, 2022, 2:11 p.m. UTC | #25
On Fri, Jan 28, 2022 at 09:18:17AM +1100, Dave Chinner wrote:
> On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> > On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > > 
> > > > > Right, background inactivation does not improve performance - it's
> > > > > necessary to get the transactions out of the evict() path. All we
> > > > > wanted was to ensure that there were no performance degradations as
> > > > > a result of background inactivation, not that it was faster.
> > > > > 
> > > > > If you want to confirm that there is an increase in cold cache
> > > > > access when the batch size is increased, cpu profiles with 'perf
> > > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > > where I mention those things to Paul.
> > > > 
> > > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > > I'm not asking for that to happen this cycle and for backports Ian's
> > > > patch is obviously fine.
> > > 
> > > Yes, but not in the near term.
> > > 
> > > > What I really want to avoid is the situation when we are stuck with
> > > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > > thread do look like that.
> > > 
> > > The simplest way I think is to have the XFS inode allocation track
> > > "busy inodes" in the same way we track "busy extents". A busy extent
> > > is an extent that has been freed by the user, but is not yet marked
> > > free in the journal/on disk. If we try to reallocate that busy
> > > extent, we either select a different free extent to allocate, or if
> > > we can't find any we force the journal to disk, wait for it to
> > > complete (hence unbusying the extents) and retry the allocation
> > > again.
> > > 
> > > We can do something similar for inode allocation - it's actually a
> > > lockless tag lookup on the radix tree entry for the candidate inode
> > > number. If we find the reclaimable radix tree tag set, the we select
> > > a different inode. If we can't allocate a new inode, then we kick
> > > synchronize_rcu() and retry the allocation, allowing inodes to be
> > > recycled this time.
> > > 
> > 
> > I'm starting to poke around this area since it's become clear that the
> > currently proposed scheme just involves too much latency (unless Paul
> > chimes in with his expedited grace period variant, at which point I will
> > revisit) in the fast allocation/recycle path. ISTM so far that a simple
> > "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> > have pretty much the same pattern of behavior as this patch: one
> > synchronize_rcu() per batch.
> 
> That's not really what I proposed - what I suggested was that if we
> can't allocate a usable inode from the finobt, and we can't allocate
> a new inode cluster from the AG (i.e. populate the finobt with more
> inodes), only then call synchronize_rcu() and recycle an inode.
> 

That's not how I read it... Regardless, that was my suggestion as well,
so we're on the same page on that front.

> We don't need to scan the inode cache or the finobt to determine if
> there are reclaimable inodes immediately available - do a gang tag
> lookup on the radix tree for newino.
> If it comes back with an inode number that is not
> equal to the inode number we looked up, then we can allocate
> newino immediately.
> 
> If it comes back with newino, then check the first inode in the
> finobt. If that comes back with an inode that is not the first inode
> in the finobt, we can immediately allocate the first inode in the
> finobt. If not, check the last inode. If that fails, assume all
> inodes in the finobt need recycling and allocate a new cluster,
> pointing newino at it.
> 

Hrm, I'll have to think about this some more. I don't mind something
like this as a possible scanning allocation algorithm, but I don't love
the idea of doing a few predictable btree/radix tree lookups and
inferring broader AG state from that, particularly when I think it's
possible to get more accurate information in a way that's easier and
probably more efficient.

For example, we already have counts of the number of reclaimable and
free inodes in the perag. We could fairly easily add a counter to track
the subset of reclaimable inodes that are unlinked. With something like
that, it's easier to make higher level decisions like when to just
allocate a new inode chunk (because the free inode pool consists mostly
of reclaimable inodes) or just scanning through the finobt for a good
candidate (because there are none or very few unlinked reclaimable
inodes relative to the number of free inodes in the btree).
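
Something like this, purely to illustrate the shape of it (the counter
and helper are invented names; the counter would be bumped when an
unlinked inode is queued for inactivation and dropped when
xfs_reclaim_inode() finally frees it):

static bool
xfs_free_pool_mostly_busy(
	struct xfs_perag	*pag)
{
	/* hypothetical counter: reclaimable inodes with nlink == 0 */
	unsigned int		busy = READ_ONCE(pag->pag_unlinked_reclaimable);

	/*
	 * If (nearly) every free inode the finobt knows about still has an
	 * in-core XFS_IRECLAIMABLE counterpart, searching the finobt is
	 * pointless - allocating a new inode chunk is the better option.
	 */
	return pag->pagi_freecount <= busy;
}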

So in general I think the two obvious ends of the spectrum (i.e. the
repeated alloc/free workload I'm testing above vs. the tar/cp use case
where there are many allocs and few unlinks) are probably the most
straightforward to handle and don't require major search algorithm
changes.  It's the middle ground (i.e. a large number of free inodes
with half of them, or whatever fraction, still sitting in the radix tree) that I think
requires some more thought and I don't quite have an answer for atm. I
don't want to go off allocating new inode chunks too aggressively, but
also don't want to turn the finobt allocation algorithm into something
like the historical inobt search algorithm with poor worst case
behavior.

> Then we get another 64 inodes starting at the newino cursor we can
> allocate from while we wait for the current RCU grace period to
> expire for inodes already in the reclaimable state. An algorithm
> like this will allow the free inode pool to resize automatically
> based on the unlink frequency of the workload and RCU grace period
> latency...
> 
> > IOW, background reclaim only kicks in after 30s by default,
> 
> 5 seconds, by default, not 30s.
> 

xfs_reclaim_work_queue() keys off xfs_syncd_centisecs, which corresponds
to xfs_params.syncd_timer, which is initialized as:

        .syncd_timer    = {     1*100,          30*100,         7200*100},

Am I missing something? Not that it really matters much for this
discussion anyways. Whether it's 30s or 5s, either way the reallocation
workload is going to pretty much always recycle these inodes long before
background reclaim gets to them.

> > so the pool
> > of free inodes pretty much always consists of 100% reclaimable inodes.
> > On top of that, at smaller batch sizes, the pool tends to have a uniform
> > (!elapsed) grace period cookie, so a stall is required to be able to
> > allocate any of them. As the batch size increases, I do see the
> > population of free inodes start to contain a mix of expired and
> > non-expired grace period cookies. It's fairly easy to hack up an
> > internal icwalk scan to locate already expired inodes,
> 
> We don't want or need to do exhaustive, exactly correct scans here.
> We want *fast and loose* because this is a critical performance fast
> path. We don't care if we skip the occasional recyclable inode, what
> we need to do is minimise the CPU overhead and search latency for
> the case where recycling will never occur.
> 

Agreed. That's what I meant by my comment about having heuristics to
avoid large/long scans.

> > but the problem
> > is that the recycle rate is so much faster than the grace period latency
> > that it doesn't really matter. We'll still have to stall by the time we
> > get to the non-expired inodes, and so we're back to one stall per batch
> > and the same general performance characteristic of this patch.
> 
> Yes, but that's why I suggested that we allocate a new inode cluster
> rather than calling synchronize_rcu() when we don't have a
> recyclable inode candidate.
> 

Ok.

> > So given all of this, I'm wondering about something like the following
> > high level inode allocation algorithm:
> > 
> > 1. If the AG has any reclaimable inodes, scan for one with an expired
> > grace period. If found, target that inode for physical allocation.
> 
> How do you efficiently discriminate between "reclaimable w/ nlink >
> 0" and "reclaimable w/ nlink == 0" so we don't get hung up searching
> millions of reclaimable inodes for the one that has been unlinked
> and has an expired grace period?
> 

A counter or some other form of hinting structure..

> Also, this will need to be done on every inode allocation when we
> have inodes in reclaimable state (which is almost always on a busy
> system).  Workloads with sequential allocation (as per untar, rsync,
> git checkout, cp -r, etc) will do this scan unnecessarily as they
> will almost never hit this inode recycle path as there aren't a lot
> of unlinks occurring while they are working.
> 

I'm not necessarily suggesting a full radix tree scan per inode
allocation. I was more thinking about an occasionally updated hinting
structure to efficiently locate the least recently freed inode numbers,
or something similar. This would serve no purpose in scenarios where it
just makes more sense to allocate new chunks, but otherwise could just
serve as an allocation target, a metric to determine likelihood of
reclaimable inodes w/ expired grace periods being present, or just a
starting point for a finobt search algorithm like what you describe
above, etc.
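
The cheapest building block for any of those would just be the
per-inode grace period cookie this patch already records at destroy
time (the field name below is only for illustration; the actual name
in the patch may differ):

static bool
xfs_ireclaim_gp_expired(
	struct xfs_inode	*ip)
{
	/* cookie captured via start_poll_synchronize_rcu() at destroy time */
	return poll_state_synchronize_rcu(ip->i_destroy_gp);
}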

> > 2. If the AG free inode count == the AG reclaimable count and we know
> > all reclaimable inodes are most likely pending a grace period (because
> > the previous step failed), allocate a new inode chunk (and target it in
> > this allocation).
> 
> That's good for the allocation that allocates the chunk, but...
> 
> > 3. If the AG free inode count > the reclaimable count, scan the finobt
> > for an inode that is not present in the radix tree (i.e. Dave's logic
> > above).
> 
> ... now we are repeating the radix tree walk that we've already done
> in #1 to find the newly allocated inodes we allocated in #2.
> 
> We don't need to walk the inodes in the inode radix tree to look at
> individual inode state - we can use the reclaimable radix tree tag
> to shortcut those walks and minimise the number of actual lookups we
> need to do. By definition, an inode in the finobt and
> XFS_IRECLAIMABLE state is an inode that needs recycling, so we can
> just use the finobt and the inode radix tree tags to avoid inodes
> that need recycling altogether.  i.e. If we fail a tag lookup, we
> have no reclaimable inodes in the range we asked the lookup to
> search so we can immediately allocate - we don't actually need to
> look at the inode in the fast path no-recycling case at all.
> 

This is starting to make some odd (to me) assumptions about thus far
undefined implementation details. For example, the little bit of
experimental code I already have only scans tagged reclaimable inodes,
so the fact that you suggest doing exactly that instead of full radix
tree scans suggests to me that there are some details here that are
clearly not getting across in email. ;)

That's fine, I'm not trying to cover details. Details are easier to work
through with code, and TBH I don't have enough concrete ideas to hash
through details in email just yet anyways. The primary concepts in my
previous description were that we should prioritize allocation of new
chunks over taking RCU stalls whenever possible, and that there might be
ways to use existing radix tree state to maintain predictable worst case
performance for finobt searches (TBD). With regard to the general
principles you mention of avoiding repeated large scans, maintaining
common workload and fast path performance, etc., I think we're pretty
much on the same page.

> Keep in mind that the fast path we really care about is not the
> unlink/allocate looping case, it's the allocation case where no
> recycling will ever occur and so that's the one we really have to
> try hard to minimise the overhead for. The moment we get into
> reclaimable inodes within the finobt range  we're hitting the "lots
> of temp files" use case, so we can detect that and keep the overhead
> of that algorithm as separate as we possibly can.
> 
> Hence we need the initial "can we allocate this inode number"
> decision to be as fast and as low overhead as possible so we can
> determine which algorithm we need to run. A lockless radix tree gang
> tag lookup will give us that and if the lookup finds a reclaimable
> inode only then do we move into the "recycle RCU avoidance"
> algorithm path....
> 
> > > > Are there any realistic prospects of having xfs_iget() deal with
> > > > reuse case by allocating new in-core inode and flipping whatever
> > > > references you've got in XFS journalling data structures to the
> > > > new copy?  If I understood what you said on IRC correctly, that is...
> > > 
> > > That's ... much harder.
> > > 
> > > One of the problems is that once an inode has a log item attached to
> > > it, it assumes that it can be accessed without specific locking,
> > > etc. see xfs_inode_clean(), for example. So there's some life-cycle
> > > stuff that needs to be taken care of in XFS first, and the inode <->
> > > log item relationship is tangled.
> > > 
> > > I've been working towards removing that tangle - but taht stuff is
> > > quite a distance down my logging rework patch queue. THat queue has
> > > been stuck now for a year trying to get the first handful of rework
> > > and scalability modifications reviewed and merged, so I'm not
> > > holding my breathe as to how long a more substantial rework of
> > > internal logging code will take to review and merge.
> > > 
> > > Really, though, we need the inactivation stuff to be done as part of
> > > the VFS inode lifecycle. I have some ideas on what to do here, but I
> > > suspect we'll need some changes to iput_final()/evict() to allow us
> > > to process final unlinks in the bakground and then call evict()
> > > ourselves when the unlink completes. That way ->destroy_inode() can
> > > just call xfs_reclaim_inode() to free it directly, which also helps
> > > us get rid of background inode freeing and hence inode recycling
> > > from XFS altogether. I think we _might_ be able to do this without
> > > needing to change any of the logging code in XFS, but I haven't
> > > looked any further than this into it as yet.
> > > 
> > 
> > ... of whatever this ends up looking like.
> > 
> > Can you elaborate on what you mean by processing unlinks in the
> > background? I can see the value of being able to eliminate the recycle
> > code in XFS, but wouldn't we still have to limit and throttle against
> > background work to maintain sustained removal performance?
> 
> Yes, but that's irrelevant because all we would be doing is slightly
> changing where that throttling occurs (i.e. in
> iput_final->drop_inode instead of iput_final->evict->destroy_inode).
> 
> However, moving the throttling up the stack is a good thing because
> it gets rid of the current problem with the inactivation throttling
> blocking the shrinker via shrinker->super_cache_scan->
> prune_icache_sb->dispose_list->evict-> destroy_inode->throttle on
> full inactivation queue because all the inodes need EOF block
> trimming to be done.
> 

What I'm trying to understand is whether inodes will have cycled through
the requisite grace period before ->destroy_inode() or not, and if so,
how that is done to avoid the sustained removal performance problem
we've run into here (caused by the extra latency leading to increasing
cacheline misses)..?

> > IOW, what's
> > the general teardown behavior you're getting at here, aside from what
> > parts push into the vfs or not?
> 
> ->drop_inode() triggers background inactivation for both blockgc and
> inode unlink. For unlink, we set I_WILL_FREE so the VFS will not
> attempt to re-use it, add the inode # to the internal AG "busy
> inode" tree and return drop = true and the VFS then stops processing
> that inode. For blockgc, we queue the work and return drop = false
> and the VFS puts it onto the LRU. Now we have asynchronous
> inactivation while the inode is still present and visible at the VFS
> level.
> 
> For background blockgc - that now happens while the inode is idle on
> the LRU before it gets reclaimed by the shrinker. i.e. we trigger
> block gc when the last reference to the inode goes away instead of
> when it gets removed from memory by the shrinker.
> 
> For unlink, that now runs in the background until the inode unlink
> has been journalled and the cleared inode written to the backing
> inode cluster buffer. The inode is then no longer visible to the
> journal and it can't be reallocated because it is still busy. We
> then change the inode state from I_WILL_FREE to I_FREEING and call
> evict(). The inode then gets torn down, and in ->destroy_inode we
> remove the inode from the radix tree, clear the per-ag busy record
> and free the inode via RCU as expected by the VFS.
> 

Ok, so this sort of sounds like these are separate things. I'm all for
creating more flexibility with the VFS to allow XFS to remove or
simplify codepaths, but this still depends on some form of grace period
tracking to avoid allocation of inodes that are free in the btrees but
still might have in-core struct inodes lying around, yes?

The reason I'm asking about this is because as this patch to avoid
recycling non-expired inodes becomes more complex in order to satisfy
performance requirements, longer term usefulness becomes more relevant.
I don't want us to come up with some complex scheme to avoid RCU stalls
when there's already a plan to rip it out and replace it in a year or
so. OTOH if the resulting logic is part of that longer term strategy,
then this is less of a concern.

Brian

> Another possible mechanism instead of exporting evict() is that
> background inactivation takes a new reference to the inode from
> ->drop_inode so that even if we put it on the LRU the inode cache
> shrinker will skip it while we are doing background inactivation.
> That would mean that when background inactivation is done, we call
> iput_final() again. The inode will either then be left on the LRU or
> go through the normal evict() path.
> 
> This also gets the memory demand and overhead of EOF block
> trimming out of the memory reclaim path, and it also gets rid of
> the need for the special superblock shrinker hooks that XFS has for
> reclaiming its internal inode cache.
> 
> Overall, lifting this stuff up to the VFS is full of "less
> complexity in XFS" wins if we can make it work...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Paul E. McKenney Jan. 28, 2022, 9:39 p.m. UTC | #26
On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > 
> > > > Right, background inactivation does not improve performance - it's
> > > > necessary to get the transactions out of the evict() path. All we
> > > > wanted was to ensure that there were no performance degradations as
> > > > a result of background inactivation, not that it was faster.
> > > > 
> > > > If you want to confirm that there is an increase in cold cache
> > > > access when the batch size is increased, cpu profiles with 'perf
> > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > where I mention those things to Paul.
> > > 
> > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > I'm not asking for that to happen this cycle and for backports Ian's
> > > patch is obviously fine.
> > 
> > Yes, but not in the near term.
> > 
> > > What I really want to avoid is the situation when we are stuck with
> > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > thread do look like that.
> > 
> > The simplest way I think is to have the XFS inode allocation track
> > "busy inodes" in the same way we track "busy extents". A busy extent
> > is an extent that has been freed by the user, but is not yet marked
> > free in the journal/on disk. If we try to reallocate that busy
> > extent, we either select a different free extent to allocate, or if
> > we can't find any we force the journal to disk, wait for it to
> > complete (hence unbusying the extents) and retry the allocation
> > again.
> > 
> > We can do something similar for inode allocation - it's actually a
> > lockless tag lookup on the radix tree entry for the candidate inode
> > number. If we find the reclaimable radix tree tag set, the we select
> > a different inode. If we can't allocate a new inode, then we kick
> > synchronize_rcu() and retry the allocation, allowing inodes to be
> > recycled this time.
> 
> I'm starting to poke around this area since it's become clear that the
> currently proposed scheme just involves too much latency (unless Paul
> chimes in with his expedited grace period variant, at which point I will
> revisit) in the fast allocation/recycle path. ISTM so far that a simple
> "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> have pretty much the same pattern of behavior as this patch: one
> synchronize_rcu() per batch.

Apologies for being slow, but there have been some distractions.
One of the distractions was trying to put together a theoretically
attractive but massively overcomplicated implementation of
poll_state_synchronize_rcu_expedited().  It currently looks like a
somewhat suboptimal but much simpler approach is available.  This
assumes that XFS is not in the picture until after both the scheduler
and workqueues are operational.

And yes, the complicated version might prove necessary, but let's
see if this whole thing is even useful first.  ;-)

In the meantime, if you want to look at an extremely unbaked view,
here you go:

https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing

							Thanx, Paul

> IOW, background reclaim only kicks in after 30s by default, so the pool
> of free inodes pretty much always consists of 100% reclaimable inodes.
> On top of that, at smaller batch sizes, the pool tends to have a uniform
> (!elapsed) grace period cookie, so a stall is required to be able to
> allocate any of them. As the batch size increases, I do see the
> population of free inodes start to contain a mix of expired and
> non-expired grace period cookies. It's fairly easy to hack up an
> internal icwalk scan to locate already expired inodes, but the problem
> is that the recycle rate is so much faster than the grace period latency
> that it doesn't really matter. We'll still have to stall by the time we
> get to the non-expired inodes, and so we're back to one stall per batch
> and the same general performance characteristic of this patch.
> 
> So given all of this, I'm wondering about something like the following
> high level inode allocation algorithm:
> 
> 1. If the AG has any reclaimable inodes, scan for one with an expired
> grace period. If found, target that inode for physical allocation.
> 
> 2. If the AG free inode count == the AG reclaimable count and we know
> all reclaimable inodes are most likely pending a grace period (because
> the previous step failed), allocate a new inode chunk (and target it in
> this allocation).
> 
> 3. If the AG free inode count > the reclaimable count, scan the finobt
> for an inode that is not present in the radix tree (i.e. Dave's logic
> above).
> 
> Each of those steps could involve some heuristics to maintain
> predictable behavior and avoid large scans and such, but the general
> idea is that the repeated alloc/free inode workload naturally populates
> the AG with enough physical inodes to always be able to satisfy an
> allocation without waiting on a grace period. IOW, this is effectively
> similar behavior to if physical inode freeing was delayed to an rcu
> callback, with the tradeoff of complicating the allocation path rather
> than stalling in the inactivation pipeline. Thoughts?
> 
> This of course is more involved than this patch (or similarly simple
> variants of RCU delaying preexisting bits of code) and requires some
> more investigation, but certainly shouldn't be a multi-year thing. The
> question is probably more of whether it's enough complexity to justify
> in the meantime...
> 
> > > Are there any realistic prospects of having xfs_iget() deal with
> > > reuse case by allocating new in-core inode and flipping whatever
> > > references you've got in XFS journalling data structures to the
> > > new copy?  If I understood what you said on IRC correctly, that is...
> > 
> > That's ... much harder.
> > 
> > One of the problems is that once an inode has a log item attached to
> > it, it assumes that it can be accessed without specific locking,
> > etc. see xfs_inode_clean(), for example. So there's some life-cycle
> > stuff that needs to be taken care of in XFS first, and the inode <->
> > log item relationship is tangled.
> > 
> > I've been working towards removing that tangle - but taht stuff is
> > quite a distance down my logging rework patch queue. THat queue has
> > been stuck now for a year trying to get the first handful of rework
> > and scalability modifications reviewed and merged, so I'm not
> > holding my breathe as to how long a more substantial rework of
> > internal logging code will take to review and merge.
> > 
> > Really, though, we need the inactivation stuff to be done as part of
> > the VFS inode lifecycle. I have some ideas on what to do here, but I
> > suspect we'll need some changes to iput_final()/evict() to allow us
> > to process final unlinks in the bakground and then call evict()
> > ourselves when the unlink completes. That way ->destroy_inode() can
> > just call xfs_reclaim_inode() to free it directly, which also helps
> > us get rid of background inode freeing and hence inode recycling
> > from XFS altogether. I think we _might_ be able to do this without
> > needing to change any of the logging code in XFS, but I haven't
> > looked any further than this into it as yet.
> > 
> 
> ... of whatever this ends up looking like.
> 
> Can you elaborate on what you mean by processing unlinks in the
> background? I can see the value of being able to eliminate the recycle
> code in XFS, but wouldn't we still have to limit and throttle against
> background work to maintain sustained removal performance? IOW, what's
> the general teardown behavior you're getting at here, aside from what
> parts push into the vfs or not?
> 
> Brian
> 
> > > Again, I'm not asking if it can be done this cycle; having a
> > > realistic path to doing that eventually would be fine by me.
> > 
> > We're talking a year at least, probably two, before we get there...
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> > 
>
Dave Chinner Jan. 28, 2022, 11:53 p.m. UTC | #27
On Fri, Jan 28, 2022 at 09:11:07AM -0500, Brian Foster wrote:
> On Fri, Jan 28, 2022 at 09:18:17AM +1100, Dave Chinner wrote:
> > On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> > > On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > > > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > > > 
> > > > > > Right, background inactivation does not improve performance - it's
> > > > > > necessary to get the transactions out of the evict() path. All we
> > > > > > wanted was to ensure that there were no performance degradations as
> > > > > > a result of background inactivation, not that it was faster.
> > > > > > 
> > > > > > If you want to confirm that there is an increase in cold cache
> > > > > > access when the batch size is increased, cpu profiles with 'perf
> > > > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > > > where I mention those things to Paul.
> > > > > 
> > > > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > > > I'm not asking for that to happen this cycle and for backports Ian's
> > > > > patch is obviously fine.
> > > > 
> > > > Yes, but not in the near term.
> > > > 
> > > > > What I really want to avoid is the situation when we are stuck with
> > > > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > > > thread do look like that.
> > > > 
> > > > The simplest way I think is to have the XFS inode allocation track
> > > > "busy inodes" in the same way we track "busy extents". A busy extent
> > > > is an extent that has been freed by the user, but is not yet marked
> > > > free in the journal/on disk. If we try to reallocate that busy
> > > > extent, we either select a different free extent to allocate, or if
> > > > we can't find any we force the journal to disk, wait for it to
> > > > complete (hence unbusying the extents) and retry the allocation
> > > > again.
> > > > 
> > > > We can do something similar for inode allocation - it's actually a
> > > > lockless tag lookup on the radix tree entry for the candidate inode
> > > > number. If we find the reclaimable radix tree tag set, the we select
> > > > a different inode. If we can't allocate a new inode, then we kick
> > > > synchronize_rcu() and retry the allocation, allowing inodes to be
> > > > recycled this time.
> > > > 
> > > 
> > > I'm starting to poke around this area since it's become clear that the
> > > currently proposed scheme just involves too much latency (unless Paul
> > > chimes in with his expedited grace period variant, at which point I will
> > > revisit) in the fast allocation/recycle path. ISTM so far that a simple
> > > "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> > > have pretty much the same pattern of behavior as this patch: one
> > > synchronize_rcu() per batch.
> > 
> > That's not really what I proposed - what I suggested was that if we
> > can't allocate a usable inode from the finobt, and we can't allocate
> > a new inode cluster from the AG (i.e. populate the finobt with more
> > inodes), only then call synchronise_rcu() and recycle an inode.
> > 
> 
> That's not how I read it... Regardless, that was my suggestion as well,
> so we're on the same page on that front.
> 
> > We don't need to scan the inode cache or the finobt to determine if
> > there are reclaimable inodes immediately available - do a gang tag
> > lookup on the radix tree for newino.
> > If it comes back with an inode number that is not
> > equal to the node number we looked up, then we can allocate an
> > newino immediately.
> > 
> > If it comes back with newino, then check the first inode in the
> > finobt. If that comes back with an inode that is not the first inode
> > in the finobt, we can immediately allocate the first inode in the
> > finobt. If not, check the last inode. if that fails, assume all
> > inodes in the finobt need recycling and allocate a new cluster,
> > pointing newino at it.
> > 
> 
> Hrm, I'll have to think about this some more. I don't mind something
> like this as a possible scanning allocation algorithm, but I don't love
> the idea of doing a few predictable btree/radix tree lookups and
> inferring broader AG state from that, particularly when I think it's
> possible to get more accurate information in a way that's easier and
> probably more efficient.
> 
> For example, we already have counts of the number of reclaimable and
> free inodes in the perag. We could fairly easily add a counter to track
> the subset of reclaimable inodes that are unlinked. With something like
> that, it's easier to make higher level decisions like when to just
> allocate a new inode chunk (because the free inode pool consists mostly
> of reclaimable inodes) or just scanning through the finobt for a good
> candidate (because there are none or very few unlinked reclaimable
> inodes relative to the number of free inodes in the btree).
> 
> So in general I think the two obvious ends of the spectrum (i.e. the
> repeated alloc/free workload I'm testing above vs. the tar/cp use case
> where there are many allocs and few unlinks) are probably the most
> straightforward to handle and don't require major search algorithm
> changes.  It's the middle ground (i.e. a large number of free inodes
> with half of them, or whatever fraction, still sitting in the radix tree) that I think
> requires some more thought and I don't quite have an answer for atm. I
> don't want to go off allocating new inode chunks too aggressively, but
> also don't want to turn the finobt allocation algorithm into something
> like the historical inobt search algorithm with poor worst case
> behavior.
> 
> > Then we get another 64 inodes starting at the newino cursor we can
> > allocate from while we wait for the current RCU grace period to
> > expire for inodes already in the reclaimable state. An algorithm
> > like this will allow the free inode pool to resize automatically
> > based on the unlink frequency of the workload and RCU grace period
> > latency...
> > 
> > > IOW, background reclaim only kicks in after 30s by default,
> > 
> > 5 seconds, by default, not 30s.
> > 
> 
> xfs_reclaim_work_queue() keys off xfs_syncd_centisecs, which corresponds
> to xfs_params.syncd_timer, which is initialized as:
> 
>         .syncd_timer    = {     1*100,          30*100,         7200*100},
> 
> Am I missing something?

static void
xfs_reclaim_work_queue(
        struct xfs_mount        *mp)
{

        rcu_read_lock();
        if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
                queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
                        msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
        }
        rcu_read_unlock();
}
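
That is, with the default syncd_timer of 30*100 centiseconds shown
above:

	xfs_syncd_centisecs / 6 * 10 = 3000 / 6 * 10 = 5000ms, i.e. ~5 seconds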

....

> > > > Really, though, we need the inactivation stuff to be done as part of
> > > > the VFS inode lifecycle. I have some ideas on what to do here, but I
> > > > suspect we'll need some changes to iput_final()/evict() to allow us
> > > > to process final unlinks in the bakground and then call evict()
> > > > ourselves when the unlink completes. That way ->destroy_inode() can
> > > > just call xfs_reclaim_inode() to free it directly, which also helps
> > > > us get rid of background inode freeing and hence inode recycling
> > > > from XFS altogether. I think we _might_ be able to do this without
> > > > needing to change any of the logging code in XFS, but I haven't
> > > > looked any further than this into it as yet.
> > > > 
> > > 
> > > ... of whatever this ends up looking like.
> > > 
> > > Can you elaborate on what you mean by processing unlinks in the
> > > background? I can see the value of being able to eliminate the recycle
> > > code in XFS, but wouldn't we still have to limit and throttle against
> > > background work to maintain sustained removal performance?
> > 
> > Yes, but that's irrelevant because all we would be doing is slightly
> > changing where that throttling occurs (i.e. in
> > iput_final->drop_inode instead of iput_final->evict->destroy_inode).
> > 
> > However, moving the throttling up the stack is a good thing because
> > it gets rid of the current problem with the inactivation throttling
> > blocking the shrinker via shrinker->super_cache_scan->
> > prune_icache_sb->dispose_list->evict-> destroy_inode->throttle on
> > full inactivation queue because all the inodes need EOF block
> > trimming to be done.
> > 
> 
> What I'm trying to understand is whether inodes will have cycled through
> the requisite grace period before ->destroy_inode() or not, and if so,

The whole point of moving stuff up in the VFS is that inodes
don't get recycled by XFS at all so we don't even have to think
about RCU grace periods anywhere inside XFS.

> how that is done to avoid the sustained removal performance problem
> we've run into here (caused by the extra latency leading to increasing
> cacheline misses)..?

The background work is done _before_ evict() is called by the VFS to
get the inode freed via RCU callbacks. The perf constraints are
unchanged, we just change the layer at which the background work is
performed.

> > > IOW, what's
> > > the general teardown behavior you're getting at here, aside from what
> > > parts push into the vfs or not?
> > 
> > ->drop_inode() triggers background inactivation for both blockgc and
> > inode unlink. For unlink, we set I_WILL_FREE so the VFS will not
> > attempt to re-use it, add the inode # to the internal AG "busy
> > inode" tree and return drop = true and the VFS then stops processing
> > that inode. For blockgc, we queue the work and return drop = false
> > and the VFS puts it onto the LRU. Now we have asynchronous
> > inactivation while the inode is still present and visible at the VFS
> > level.
> > 
> > For background blockgc - that now happens while the inode is idle on
> > the LRU before it gets reclaimed by the shrinker. i.e. we trigger
> > block gc when the last reference to the inode goes away instead of
> > when it gets removed from memory by the shrinker.
> > 
> > For unlink, that now runs in the bacgrkoud until the inode unlink
> > has been journalled and the cleared inode written to the backing
> > inode cluster buffer. The inode is then no longer visisble to the
> > journal and it can't be reallocated because it is still busy. We
> > then change the inode state from I_WILL_FREE to I_FREEING and call
> > evict(). The inode then gets torn down, and in ->destroy_inode we
> > remove the inode from the radix tree, clear the per-ag busy record
> > and free the inode via RCU as expected by the VFS.
> > 
> 
> Ok, so this sort of sounds like these are separate things. I'm all for
> creating more flexibility with the VFS to allow XFS to remove or
> simplify codepaths, but this still depends on some form of grace period
> tracking to avoid allocation of inodes that are free in the btrees but
> still might have in-core struct inodes lying around, yes?

> The reason I'm asking about this is because as this patch to avoid
> recycling non-expired inodes becomes more complex in order to satisfy
> performance requirements, longer term usefulness becomes more relevant.

You say this like I haven't already thought about this....

> I don't want us to come up with some complex scheme to avoid RCU stalls
> when there's already a plan to rip it out and replace it in a year or
> so. OTOH if the resulting logic is part of that longer term strategy,
> then this is less of a concern.

.... and so maybe you haven't realised why I keep suggesting
something along the lines of a busy inode mechanism similar to busy
extent tracking?

Essentially, we can't reallocate the inode until the previous use
has been retired. Which means we'd create the busy inode record in
xfs_inactive() before we free the inode and xfs_reclaim_inode()
would remove the inode from the busy tree when it reclaims the inode
and removes it from the radix tree after marking it dead for RCU
lookup purposes. That would prevent reallocation of the inode until
we can allocate a new in-core inode structure for the inode.

In the lifted VFS case I describe, ->drop_inode() would result in
background inactivation inserting the inode into the busy tree. Once
that is all done and we call evict() on the inode, ->destroy_inode
calls xfs_reclaim_inode() directly. IOWs, the busy inode mechanism
works for both existing and future inactivation mechanisms.
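
In rough pseudocode (the tree and the xfs_busy_ino_*() helpers are
hypothetical names):

	/* xfs_inactive(): before the inode is freed on disk */
	xfs_busy_ino_insert(pag, XFS_INO_TO_AGINO(mp, ip->i_ino));

	/* xfs_reclaim_inode(): once the in-core inode is dead to RCU lookups */
	ip->i_ino = 0;					/* existing "dead" marker */
	radix_tree_delete(&pag->pag_ici_root, agino);	/* existing removal */
	xfs_busy_ino_remove(pag, agino);		/* ino may now be reused */

	/* inode allocation: skip busy inode numbers */
	if (xfs_busy_ino_lookup(pag, candidate))
		continue;	/* or fall back to a new chunk / RCU wait */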

Now, let's take a step further back from this, and consider the
current inode cache implementation.  The fast and dirty method for
tracking busy inodes is to use the fact that a busy inode is defined
as being in the finobt whilst the in-core inode is in an
IRECLAIMABLE state.

Hence, at least initially, we don't need a separate tree to
determine if an inode is "busy" efficiently. The allocation policy
that selects the inode to allocate doesn't care what mechanism we
use to determine if an inode is busy - it's just concerned with
finding a non-busy inode efficiently. Hence we can use a simple
"best, first, last" hueristic to determine if the finobt is likely
to be largely made up of busy inodes and decide to allocate new
inode chunks instead of searching the finobt for an unbusy inode.

IOWs, the "busy extent tracking" implementation will need to change
to be something more explicit as we move inactivation up in the VFS
because the IRCELAIMABLE state goes away, but that doesn't change
the allocation algorithm or heuristics that are based on detecting
busy inodes at allocation time.


Cheers,

Dave.
Brian Foster Jan. 31, 2022, 1:22 p.m. UTC | #28
On Fri, Jan 28, 2022 at 01:39:11PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> > On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > > 
> > > > > Right, background inactivation does not improve performance - it's
> > > > > necessary to get the transactions out of the evict() path. All we
> > > > > wanted was to ensure that there were no performance degradations as
> > > > > a result of background inactivation, not that it was faster.
> > > > > 
> > > > > If you want to confirm that there is an increase in cold cache
> > > > > access when the batch size is increased, cpu profiles with 'perf
> > > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > > where I mention those things to Paul.
> > > > 
> > > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > > I'm not asking for that to happen this cycle and for backports Ian's
> > > > patch is obviously fine.
> > > 
> > > Yes, but not in the near term.
> > > 
> > > > What I really want to avoid is the situation when we are stuck with
> > > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > > thread do look like that.
> > > 
> > > The simplest way I think is to have the XFS inode allocation track
> > > "busy inodes" in the same way we track "busy extents". A busy extent
> > > is an extent that has been freed by the user, but is not yet marked
> > > free in the journal/on disk. If we try to reallocate that busy
> > > extent, we either select a different free extent to allocate, or if
> > > we can't find any we force the journal to disk, wait for it to
> > > complete (hence unbusying the extents) and retry the allocation
> > > again.
> > > 
> > > We can do something similar for inode allocation - it's actually a
> > > lockless tag lookup on the radix tree entry for the candidate inode
> > > number. If we find the reclaimable radix tree tag set, the we select
> > > a different inode. If we can't allocate a new inode, then we kick
> > > synchronize_rcu() and retry the allocation, allowing inodes to be
> > > recycled this time.
> > 
> > I'm starting to poke around this area since it's become clear that the
> > currently proposed scheme just involves too much latency (unless Paul
> > chimes in with his expedited grace period variant, at which point I will
> > revisit) in the fast allocation/recycle path. ISTM so far that a simple
> > "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> > have pretty much the same pattern of behavior as this patch: one
> > synchronize_rcu() per batch.
> 
> Apologies for being slow, but there have been some distractions.
> One of the distractions was trying to put together a theoretically
> attractive but massively overcomplicated implementation of
> poll_state_synchronize_rcu_expedited().  It currently looks like a
> somewhat suboptimal but much simpler approach is available.  This
> assumes that XFS is not in the picture until after both the scheduler
> and workqueues are operational.
> 

No worries.. I don't think that would be a roadblock for us. ;)

> And yes, the complicated version might prove necessary, but let's
> see if this whole thing is even useful first.  ;-)
> 

Indeed. This patch only really requires a single poll/sync pair of
calls, so assuming the expedited grace period usage plays nice enough
with typical !expedited usage elsewhere in the kernel for some basic
tests, it would be fairly trivial to port this over and at least get an
idea of what the worst case behavior might be with expedited grace
periods, whether it satisfies the existing latency requirements, etc.
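
For reference, the pair in question boils down to (field name
illustrative):

	/* inactivation/destroy side: note the current grace period */
	ip->i_destroy_gp = start_poll_synchronize_rcu();

	/* recycle side: only blocks if that grace period hasn't expired yet */
	cond_synchronize_rcu(ip->i_destroy_gp);

so swapping in the expedited variants Paul is prototyping should be a
mechanical change at those two call sites.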

Brian

> In the meantime, if you want to look at an extremely unbaked view,
> here you go:
> 
> https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
> 
> 							Thanx, Paul
> 
> > IOW, background reclaim only kicks in after 30s by default, so the pool
> > of free inodes pretty much always consists of 100% reclaimable inodes.
> > On top of that, at smaller batch sizes, the pool tends to have a uniform
> > (!elapsed) grace period cookie, so a stall is required to be able to
> > allocate any of them. As the batch size increases, I do see the
> > population of free inodes start to contain a mix of expired and
> > non-expired grace period cookies. It's fairly easy to hack up an
> > internal icwalk scan to locate already expired inodes, but the problem
> > is that the recycle rate is so much faster than the grace period latency
> > that it doesn't really matter. We'll still have to stall by the time we
> > get to the non-expired inodes, and so we're back to one stall per batch
> > and the same general performance characteristic of this patch.
> > 
> > So given all of this, I'm wondering about something like the following
> > high level inode allocation algorithm:
> > 
> > 1. If the AG has any reclaimable inodes, scan for one with an expired
> > grace period. If found, target that inode for physical allocation.
> > 
> > 2. If the AG free inode count == the AG reclaimable count and we know
> > all reclaimable inodes are most likely pending a grace period (because
> > the previous step failed), allocate a new inode chunk (and target it in
> > this allocation).
> > 
> > 3. If the AG free inode count > the reclaimable count, scan the finobt
> > for an inode that is not present in the radix tree (i.e. Dave's logic
> > above).
> > 
> > Each of those steps could involve some heuristics to maintain
> > predictable behavior and avoid large scans and such, but the general
> > idea is that the repeated alloc/free inode workload naturally populates
> > the AG with enough physical inodes to always be able to satisfy an
> > allocation without waiting on a grace period. IOW, this is effectively
> > similar behavior to if physical inode freeing was delayed to an rcu
> > callback, with the tradeoff of complicating the allocation path rather
> > than stalling in the inactivation pipeline. Thoughts?
> > 
> > This of course is more involved than this patch (or similarly simple
> > variants of RCU delaying preexisting bits of code) and requires some
> > more investigation, but certainly shouldn't be a multi-year thing. The
> > question is probably more of whether it's enough complexity to justify
> > in the meantime...
> > 
> > > > Are there any realistic prospects of having xfs_iget() deal with
> > > > reuse case by allocating new in-core inode and flipping whatever
> > > > references you've got in XFS journalling data structures to the
> > > > new copy?  If I understood what you said on IRC correctly, that is...
> > > 
> > > That's ... much harder.
> > > 
> > > One of the problems is that once an inode has a log item attached to
> > > it, it assumes that it can be accessed without specific locking,
> > > etc. see xfs_inode_clean(), for example. So there's some life-cycle
> > > stuff that needs to be taken care of in XFS first, and the inode <->
> > > log item relationship is tangled.
> > > 
> > > I've been working towards removing that tangle - but taht stuff is
> > > quite a distance down my logging rework patch queue. THat queue has
> > > been stuck now for a year trying to get the first handful of rework
> > > and scalability modifications reviewed and merged, so I'm not
> > > holding my breathe as to how long a more substantial rework of
> > > internal logging code will take to review and merge.
> > > 
> > > Really, though, we need the inactivation stuff to be done as part of
> > > the VFS inode lifecycle. I have some ideas on what to do here, but I
> > > suspect we'll need some changes to iput_final()/evict() to allow us
> > > to process final unlinks in the bakground and then call evict()
> > > ourselves when the unlink completes. That way ->destroy_inode() can
> > > just call xfs_reclaim_inode() to free it directly, which also helps
> > > us get rid of background inode freeing and hence inode recycling
> > > from XFS altogether. I think we _might_ be able to do this without
> > > needing to change any of the logging code in XFS, but I haven't
> > > looked any further than this into it as yet.
> > > 
> > 
> > ... of whatever this ends up looking like.
> > 
> > Can you elaborate on what you mean by processing unlinks in the
> > background? I can see the value of being able to eliminate the recycle
> > code in XFS, but wouldn't we still have to limit and throttle against
> > background work to maintain sustained removal performance? IOW, what's
> > the general teardown behavior you're getting at here, aside from what
> > parts push into the vfs or not?
> > 
> > Brian
> > 
> > > > Again, I'm not asking if it can be done this cycle; having a
> > > > realistic path to doing that eventually would be fine by me.
> > > 
> > > We're talking a year at least, probably two, before we get there...
> > > 
> > > Cheers,
> > > 
> > > Dave.
> > > -- 
> > > Dave Chinner
> > > david@fromorbit.com
> > > 
> > 
>
Brian Foster Jan. 31, 2022, 1:28 p.m. UTC | #29
On Sat, Jan 29, 2022 at 10:53:13AM +1100, Dave Chinner wrote:
> On Fri, Jan 28, 2022 at 09:11:07AM -0500, Brian Foster wrote:
> > On Fri, Jan 28, 2022 at 09:18:17AM +1100, Dave Chinner wrote:
> > > On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> > > > On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > > > > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > > > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > > > > 
> > > > > > > Right, background inactivation does not improve performance - it's
> > > > > > > necessary to get the transactions out of the evict() path. All we
> > > > > > > wanted was to ensure that there were no performance degradations as
> > > > > > > a result of background inactivation, not that it was faster.
> > > > > > > 
> > > > > > > If you want to confirm that there is an increase in cold cache
> > > > > > > access when the batch size is increased, cpu profiles with 'perf
> > > > > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > > > > where I mention those things to Paul.
> > > > > > 
> > > > > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > > > > I'm not asking for that to happen this cycle and for backports Ian's
> > > > > > patch is obviously fine.
> > > > > 
> > > > > Yes, but not in the near term.
> > > > > 
> > > > > > What I really want to avoid is the situation when we are stuck with
> > > > > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > > > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > > > > thread do look like that.
> > > > > 
> > > > > The simplest way I think is to have the XFS inode allocation track
> > > > > "busy inodes" in the same way we track "busy extents". A busy extent
> > > > > is an extent that has been freed by the user, but is not yet marked
> > > > > free in the journal/on disk. If we try to reallocate that busy
> > > > > extent, we either select a different free extent to allocate, or if
> > > > > we can't find any we force the journal to disk, wait for it to
> > > > > complete (hence unbusying the extents) and retry the allocation
> > > > > again.
> > > > > 
> > > > > We can do something similar for inode allocation - it's actually a
> > > > > lockless tag lookup on the radix tree entry for the candidate inode
> > > > > number. If we find the reclaimable radix tree tag set, then we select
> > > > > a different inode. If we can't allocate a new inode, then we kick
> > > > > synchronize_rcu() and retry the allocation, allowing inodes to be
> > > > > recycled this time.
> > > > > 
> > > > 
> > > > I'm starting to poke around this area since it's become clear that the
> > > > currently proposed scheme just involves too much latency (unless Paul
> > > > chimes in with his expedited grace period variant, at which point I will
> > > > revisit) in the fast allocation/recycle path. ISTM so far that a simple
> > > > "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> > > > have pretty much the same pattern of behavior as this patch: one
> > > > synchronize_rcu() per batch.
> > > 
> > > That's not really what I proposed - what I suggested was that if we
> > > can't allocate a usable inode from the finobt, and we can't allocate
> > > a new inode cluster from the AG (i.e. populate the finobt with more
> > > inodes), only then call synchronize_rcu() and recycle an inode.
> > > 
> > 
> > That's not how I read it... Regardless, that was my suggestion as well,
> > so we're on the same page on that front.
> > 
> > > We don't need to scan the inode cache or the finobt to determine if
> > > there are reclaimable inodes immediately available - do a gang tag
> > > lookup on the radix tree for newino.
> > > If it comes back with an inode number that is not
> > > equal to the inode number we looked up, then we can allocate
> > > newino immediately.
> > > 
> > > If it comes back with newino, then check the first inode in the
> > > finobt. If that comes back with an inode that is not the first inode
> > > in the finobt, we can immediately allocate the first inode in the
> > > finobt. If not, check the last inode. If that fails, assume all
> > > inodes in the finobt need recycling and allocate a new cluster,
> > > pointing newino at it.
> > > 
> > 
> > Hrm, I'll have to think about this some more. I don't mind something
> > like this as a possible scanning allocation algorithm, but I don't love
> > the idea of doing a few predictable btree/radix tree lookups and
> > inferring broader AG state from that, particularly when I think it's
> > possible to get more accurate information in a way that's easier and
> > probably more efficient.
> > 
> > For example, we already have counts of the number of reclaimable and
> > free inodes in the perag. We could fairly easily add a counter to track
> > the subset of reclaimable inodes that are unlinked. With something like
> > that, it's easier to make higher level decisions like when to just
> > allocate a new inode chunk (because the free inode pool consists mostly
> > of reclaimable inodes) or just scanning through the finobt for a good
> > candidate (because there are none or very few unlinked reclaimable
> > inodes relative to the number of free inodes in the btree).
> > 
> > So in general I think the two obvious ends of the spectrum (i.e. the
> > repeated alloc/free workload I'm testing above vs. the tar/cp use case
> > where there are many allocs and few unlinks) are probably the most
> > straightforward to handle and don't require major search algorithm
> > changes.  It's the middle ground (i.e. a large number of free inodes
> > with half or whatever more sitting in the radix tree) that I think
> > requires some more thought and I don't quite have an answer for atm. I
> > don't want to go off allocating new inode chunks too aggressively, but
> > also don't want to turn the finobt allocation algorithm into something
> > like the historical inobt search algorithm with poor worst case
> > behavior.
> > 
> > > Then we get another 64 inodes starting at the newino cursor we can
> > > allocate from while we wait for the current RCU grace period to
> > > expire for inodes already in the reclaimable state. An algorithm
> > > like this will allow the free inode pool to resize automatically
> > > based on the unlink frequency of the workload and RCU grace period
> > > latency...
> > > 
> > > > IOW, background reclaim only kicks in after 30s by default,
> > > 
> > > 5 seconds, by default, not 30s.
> > > 
> > 
> > xfs_reclaim_work_queue() keys off xfs_syncd_centisecs, which corresponds
> > to xfs_params.syncd_timer, which is initialized as:
> > 
> >         .syncd_timer    = {     1*100,          30*100,         7200*100},
> > 
> > Am I missing something?
> 
> static void
> xfs_reclaim_work_queue(
>         struct xfs_mount        *mp)
> {
> 
>         rcu_read_lock();
>         if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
>                 queue_delayed_work(mp->m_reclaim_workqueue, &mp->m_reclaim_work,
>                         msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
>         }
>         rcu_read_unlock();
> }
> 

Ah, thanks.
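(That explains it: the 30*100 centisec default works out to 3000 / 6 * 10
= 5000ms for the reclaim work timer, i.e. the ~5s you mentioned rather
than the 30s I was reading off the syncd default.)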

> ....
> 
> > > > > Really, though, we need the inactivation stuff to be done as part of
> > > > > the VFS inode lifecycle. I have some ideas on what to do here, but I
> > > > > suspect we'll need some changes to iput_final()/evict() to allow us
> > > > > to process final unlinks in the background and then call evict()
> > > > > ourselves when the unlink completes. That way ->destroy_inode() can
> > > > > just call xfs_reclaim_inode() to free it directly, which also helps
> > > > > us get rid of background inode freeing and hence inode recycling
> > > > > from XFS altogether. I think we _might_ be able to do this without
> > > > > needing to change any of the logging code in XFS, but I haven't
> > > > > looked any further than this into it as yet.
> > > > > 
> > > > 
> > > > ... of whatever this ends up looking like.
> > > > 
> > > > Can you elaborate on what you mean by processing unlinks in the
> > > > background? I can see the value of being able to eliminate the recycle
> > > > code in XFS, but wouldn't we still have to limit and throttle against
> > > > background work to maintain sustained removal performance?
> > > 
> > > Yes, but that's irrelevant because all we would be doing is slightly
> > > changing where that throttling occurs (i.e. in
> > > iput_final->drop_inode instead of iput_final->evict->destroy_inode).
> > > 
> > > However, moving the throttling up the stack is a good thing because
> > > it gets rid of the current problem with the inactivation throttling
> > > blocking the shrinker via shrinker->super_cache_scan->
> > > prune_icache_sb->dispose_list->evict-> destroy_inode->throttle on
> > > full inactivation queue because all the inodes need EOF block
> > > trimming to be done.
> > > 
> > 
> > What I'm trying to understand is whether inodes will have cycled through
> > the requisite grace period before ->destroy_inode() or not, and if so,
> 
> The whole point of moving stuff up in the VFS is that inodes
> don't get recycled by XFS at all so we don't even have to think
> about RCU grace periods anywhere inside XFS.
> 
> > how that is done to avoid the sustained removal performance problem
> > we've run into here (caused by the extra latency leading to increasing
> > cacheline misses)..?
> 
> The background work is done _before_ evict() is called by the VFS to
> get the inode freed via RCU callbacks. The perf constraints are
> unchanged, we just change the layer at which the background work is
> performed.
> 

Ok.

> > > > IOW, what's
> > > > the general teardown behavior you're getting at here, aside from what
> > > > parts push into the vfs or not?
> > > 
> > > ->drop_inode() triggers background inactivation for both blockgc and
> > > inode unlink. For unlink, we set I_WILL_FREE so the VFS will not
> > > attempt to re-use it, add the inode # to the internal AG "busy
> > > inode" tree and return drop = true and the VFS then stops processing
> > > that inode. For blockgc, we queue the work and return drop = false
> > > and the VFS puts it onto the LRU. Now we have asynchronous
> > > inactivation while the inode is still present and visible at the VFS
> > > level.
> > > 
> > > For background blockgc - that now happens while the inode is idle on
> > > the LRU before it gets reclaimed by the shrinker. i.e. we trigger
> > > block gc when the last reference to the inode goes away instead of
> > > when it gets removed from memory by the shrinker.
> > > 
> > > For unlink, that now runs in the background until the inode unlink
> > > has been journalled and the cleared inode written to the backing
> > > inode cluster buffer. The inode is then no longer visible to the
> > > journal and it can't be reallocated because it is still busy. We
> > > then change the inode state from I_WILL_FREE to I_FREEING and call
> > > evict(). The inode then gets torn down, and in ->destroy_inode we
> > > remove the inode from the radix tree, clear the per-ag busy record
> > > and free the inode via RCU as expected by the VFS.
> > > 
> > 
> > Ok, so this sort of sounds like these are separate things. I'm all for
> > creating more flexibility with the VFS to allow XFS to remove or
> > simplify codepaths, but this still depends on some form of grace period
> > tracking to avoid allocation of inodes that are free in the btrees but
> > still might have in-core struct inode's laying around, yes?
> 
> > The reason I'm asking about this is because as this patch to avoid
> > recycling non-expired inodes becomes more complex in order to satisfy
> > performance requirements, longer term usefulness becomes more relevant.
> 
> You say this like I haven't already thought about this....
> 
> > I don't want us to come up with some complex scheme to avoid RCU stalls
> > when there's already a plan to rip it out and replace it in a year or
> > so. OTOH if the resulting logic is part of that longer term strategy,
> > then this is less of a concern.
> 
> .... and so maybe you haven't realised why I keep suggesting
> something along the lines of a busy inode mechanism similar to busy
> extent tracking?
> 
> Essentially, we can't reallocate the inode until the previous use
> has been retired. Which means we'd create the busy inode record in
> xfs_inactive() before we free the inode and xfs_reclaim_inode()
> would remove the inode from the busy tree when it reclaims the inode
> and removes it from the radix tree after marking it dead for RCU
> lookup purposes. That would prevent reallocation of the inode until
> we can allocate a new in-core inode structure for the inode.
> 
> In the lifted VFS case I describe, ->drop_inode() would result in
> background inactivation inserting the inode into the busy tree. Once
> that is all done and we call evict() on the inode, ->destroy_inode
> calls xfs-reclaim_inode() directly. IOWs, the busy inode mechanism
> works for both existing and future inactivation mechanisms.
> 

This is what I was trying to understand. The discussion to this point
around eventually moving lifecycle bits into the VFS gave the impression
that the grace period sequence would essentially be hidden from XFS, so
that's why I've been asking how we expect to accomplish that. ISTM
that's not necessarily the case... the notion of a free (on disk) inode
that cannot be used due to a pending grace period still exists, it's
just abstracted as a "busy inode" and used to implement a rule that such
inodes cannot be reallocated until the VFS indicates so. At that point
we reclaim the struct inode so this presumably eliminates the need for
the recycling logic and perhaps various other lifecycle related bits
(that I've not thought through) in XFS, providing further simplification
opportunities, etc.
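
IOW, roughly something like this at allocation time (just a sketch of
the check being described, not actual XFS code -- locking, the perag
lookup and the skip/new-chunk/synchronize retry policy are all
omitted):

/*
 * Does a previous incarnation of this candidate inode number still sit
 * in the in-core cache pending reclaim?  Keyed off the existing
 * reclaimable radix tree tag, per Dave's suggestion above.  The
 * function name is made up for illustration.
 */
static bool
xfs_candidate_ino_is_busy(
	struct xfs_perag	*pag,
	xfs_agino_t		agino)
{
	struct xfs_inode	*ip;
	int			found;

	rcu_read_lock();
	found = radix_tree_gang_lookup_tag(&pag->pag_ici_root,
			(void **)&ip, agino, 1, XFS_ICI_RECLAIM_TAG);
	rcu_read_unlock();

	/*
	 * A reclaimable inode at exactly this inode number means the
	 * previous use hasn't been retired yet, so the allocator should
	 * look elsewhere.
	 */
	return found == 1 &&
	       XFS_INO_TO_AGINO(pag->pag_mount, ip->i_ino) == agino;
}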

If I'm following the general idea correctly, this makes more sense to
me. Thanks.

Brian

> Now, lets take a step further back from this, and consider the
> current inode cache implementation.  The fast and dirty method for
> tracking busy inodes is to use the fact that a busy inode is defined
> as being in the finobt whilst the in-core inode is in an
> IRECLAIMABLE state.
> 
> Hence, at least initially, we don't need a separate tree to
> determine if an inode is "busy" efficiently. The allocation policy
> that selects the inode to allocate doesn't care what mechanism we
> use to determine if an inode is busy - it's just concerned with
> finding a non-busy inode efficiently. Hence we can use a simple
> "best, first, last" hueristic to determine if the finobt is likely
> to be largely made up of busy inodes and decide to allocate new
> inode chunks instead of searching the finobt for an unbusy inode.
> 
> IOWs, the "busy extent tracking" implementation will need to change
> to be something more explicit as we move inactivation up in the VFS
> because the IRECLAIMABLE state goes away, but that doesn't change
> the allocation algorithm or heuristics that are based on detecting
> busy inodes at allocation time.
> 
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Paul E. McKenney Feb. 1, 2022, 10 p.m. UTC | #30
On Mon, Jan 31, 2022 at 08:22:43AM -0500, Brian Foster wrote:
> On Fri, Jan 28, 2022 at 01:39:11PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> > > On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > > > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > > > 
> > > > > > Right, background inactivation does not improve performance - it's
> > > > > > necessary to get the transactions out of the evict() path. All we
> > > > > > wanted was to ensure that there were no performance degradations as
> > > > > > a result of background inactivation, not that it was faster.
> > > > > > 
> > > > > > If you want to confirm that there is an increase in cold cache
> > > > > > access when the batch size is increased, cpu profiles with 'perf
> > > > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > > > where I mention those things to Paul.
> > > > > 
> > > > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > > > I'm not asking for that to happen this cycle and for backports Ian's
> > > > > patch is obviously fine.
> > > > 
> > > > Yes, but not in the near term.
> > > > 
> > > > > What I really want to avoid is the situation when we are stuck with
> > > > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > > > thread do look like that.
> > > > 
> > > > The simplest way I think is to have the XFS inode allocation track
> > > > "busy inodes" in the same way we track "busy extents". A busy extent
> > > > is an extent that has been freed by the user, but is not yet marked
> > > > free in the journal/on disk. If we try to reallocate that busy
> > > > extent, we either select a different free extent to allocate, or if
> > > > we can't find any we force the journal to disk, wait for it to
> > > > complete (hence unbusying the extents) and retry the allocation
> > > > again.
> > > > 
> > > > We can do something similar for inode allocation - it's actually a
> > > > lockless tag lookup on the radix tree entry for the candidate inode
> > > > number. If we find the reclaimable radix tree tag set, then we select
> > > > a different inode. If we can't allocate a new inode, then we kick
> > > > synchronize_rcu() and retry the allocation, allowing inodes to be
> > > > recycled this time.
> > > 
> > > I'm starting to poke around this area since it's become clear that the
> > > currently proposed scheme just involves too much latency (unless Paul
> > > chimes in with his expedited grace period variant, at which point I will
> > > revisit) in the fast allocation/recycle path. ISTM so far that a simple
> > > "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> > > have pretty much the same pattern of behavior as this patch: one
> > > synchronize_rcu() per batch.
> > 
> > Apologies for being slow, but there have been some distractions.
> > One of the distractions was trying to put together a theoretically
> > attractive but massively overcomplicated implementation of
> > poll_state_synchronize_rcu_expedited().  It currently looks like a
> > somewhat suboptimal but much simpler approach is available.  This
> > assumes that XFS is not in the picture until after both the scheduler
> > and workqueues are operational.
> > 
> 
> No worries.. I don't think that would be a roadblock for us. ;)
> 
> > And yes, the complicated version might prove necessary, but let's
> > see if this whole thing is even useful first.  ;-)
> > 
> 
> Indeed. This patch only really requires a single poll/sync pair of
> calls, so assuming the expedited grace period usage plays nice enough
> with typical !expedited usage elsewhere in the kernel for some basic
> tests, it would be fairly trivial to port this over and at least get an
> idea of what the worst case behavior might be with expedited grace
> periods, whether it satisfies the existing latency requirements, etc.
> 
> Brian
> 
> > In the meantime, if you want to look at an extremely unbaked view,
> > here you go:
> > 
> > https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing

And here is a version that passes moderate rcutorture testing.  So no
obvious bugs.  Probably a few non-obvious ones, though!  ;-)

This commit is on -rcu's "dev" branch along with this rcutorture
addition:

cd7bd64af59f ("EXP rcutorture: Test polled expedited grace-period primitives")

I will carry these in -rcu's "dev" branch until at least the upcoming
merge window, fixing bugs as and when they become apparent.  If I don't
hear otherwise by that time, I will create a tag for it and leave
it behind.

The backport to v5.17-rc2 just requires removing:

	mutex_init(&rnp->boost_kthread_mutex);

From rcu_init_one().  This line is added by this -rcu commit:

02a50b09c31f ("rcu: Add mutex for rcu boost kthread spawning and affinity setting")
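
In case it helps, the intended usage pattern is the same as for the
existing polled API: grab a cookie when an object is retired, and pay
for a grace period at reuse time only if one hasn't already elapsed.
A purely illustrative sketch (the struct and helpers below are made up
for the example, they are not part of this patch):

struct foo {
	unsigned long	gp_cookie;
	/* ... */
};

static void foo_retire(struct foo *fp)
{
	/* Snapshot (and if need be start) an expedited grace period. */
	fp->gp_cookie = start_poll_synchronize_rcu_expedited();
}

static void foo_reuse(struct foo *fp)
{
	/*
	 * Wait (expedited) only if the grace period captured at retire
	 * time has not yet completed.  Callers that would rather skip
	 * or defer than block can instead check
	 * poll_state_synchronize_rcu_expedited() directly.
	 */
	cond_synchronize_rcu_expedited(fp->gp_cookie);
}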

Please let me know how it goes!

							Thanx, Paul

------------------------------------------------------------------------

commit dd896a86aebc5b225ceee13fcf1375c7542a5e2d
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon Jan 31 16:55:52 2022 -0800

    EXP rcu: Add polled expedited grace-period primitives
    
    This is an experimental proof of concept of polled expedited grace-period
    functions.  These functions are get_state_synchronize_rcu_expedited(),
    start_poll_synchronize_rcu_expedited(), poll_state_synchronize_rcu_expedited(),
    and cond_synchronize_rcu_expedited(), which are similar to
    get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
    poll_state_synchronize_rcu(), and cond_synchronize_rcu(), respectively.
    
    One limitation is that start_poll_synchronize_rcu_expedited() cannot
    be invoked before workqueues are initialized.
    
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 858f4d429946d..ca139b4b2d25f 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -23,6 +23,26 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
 	might_sleep();
 }
 
+static inline unsigned long get_state_synchronize_rcu_expedited(void)
+{
+	return get_state_synchronize_rcu();
+}
+
+static inline unsigned long start_poll_synchronize_rcu_expedited(void)
+{
+	return start_poll_synchronize_rcu();
+}
+
+static inline bool poll_state_synchronize_rcu_expedited(unsigned long oldstate)
+{
+	return poll_state_synchronize_rcu(oldstate);
+}
+
+static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
+{
+	cond_synchronize_rcu(oldstate);
+}
+
 extern void rcu_barrier(void);
 
 static inline void synchronize_rcu_expedited(void)
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 76665db179fa1..eb774e9be21bf 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -40,6 +40,10 @@ bool rcu_eqs_special_set(int cpu);
 void rcu_momentary_dyntick_idle(void);
 void kfree_rcu_scheduler_running(void);
 bool rcu_gp_might_be_stalled(void);
+unsigned long get_state_synchronize_rcu_expedited(void);
+unsigned long start_poll_synchronize_rcu_expedited(void);
+bool poll_state_synchronize_rcu_expedited(unsigned long oldstate);
+void cond_synchronize_rcu_expedited(unsigned long oldstate);
 unsigned long get_state_synchronize_rcu(void);
 unsigned long start_poll_synchronize_rcu(void);
 bool poll_state_synchronize_rcu(unsigned long oldstate);
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 24b5f2c2de87b..5b61cf20c91e9 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -23,6 +23,13 @@
 #define RCU_SEQ_CTR_SHIFT	2
 #define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)
 
+/*
+ * Low-order bit definitions for polled grace-period APIs.
+ */
+#define RCU_GET_STATE_FROM_EXPEDITED	0x1
+#define RCU_GET_STATE_USE_NORMAL	0x2
+#define RCU_GET_STATE_BAD_FOR_NORMAL	(RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL)
+
 /*
  * Return the counter portion of a sequence number previously returned
  * by rcu_seq_snap() or rcu_seq_current().
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e6ad532cffe78..5de36abcd7da1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3871,7 +3871,8 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
  */
 bool poll_state_synchronize_rcu(unsigned long oldstate)
 {
-	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
+	if (rcu_seq_done(&rcu_state.gp_seq, oldstate) &&
+	    !WARN_ON_ONCE(oldstate & RCU_GET_STATE_BAD_FOR_NORMAL)) {
 		smp_mb(); /* Ensure GP ends before subsequent accesses. */
 		return true;
 	}
@@ -3900,7 +3901,8 @@ EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
  */
 void cond_synchronize_rcu(unsigned long oldstate)
 {
-	if (!poll_state_synchronize_rcu(oldstate))
+	if (!poll_state_synchronize_rcu(oldstate) &&
+	    !WARN_ON_ONCE(oldstate & RCU_GET_STATE_BAD_FOR_NORMAL))
 		synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
@@ -4593,6 +4595,9 @@ static void __init rcu_init_one(void)
 			init_waitqueue_head(&rnp->exp_wq[3]);
 			spin_lock_init(&rnp->exp_lock);
 			mutex_init(&rnp->boost_kthread_mutex);
+			raw_spin_lock_init(&rnp->exp_poll_lock);
+			rnp->exp_seq_poll_rq = 0x1;
+			INIT_WORK(&rnp->exp_poll_wq, sync_rcu_do_polled_gp);
 		}
 	}
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 926673ebe355f..19fc9acce3ce2 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -128,6 +128,10 @@ struct rcu_node {
 	wait_queue_head_t exp_wq[4];
 	struct rcu_exp_work rew;
 	bool exp_need_flush;	/* Need to flush workitem? */
+	raw_spinlock_t exp_poll_lock;
+				/* Lock and data for polled expedited grace periods. */
+	unsigned long exp_seq_poll_rq;
+	struct work_struct exp_poll_wq;
 } ____cacheline_internodealigned_in_smp;
 
 /*
@@ -476,3 +480,6 @@ static void rcu_iw_handler(struct irq_work *iwp);
 static void check_cpu_stall(struct rcu_data *rdp);
 static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 				     const unsigned long gpssdelay);
+
+/* Forward declarations for tree_exp.h. */
+static void sync_rcu_do_polled_gp(struct work_struct *wp);
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 1a45667402260..728896f374fee 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -871,3 +871,154 @@ void synchronize_rcu_expedited(void)
 		destroy_work_on_stack(&rew.rew_work);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/**
+ * get_state_synchronize_rcu_expedited - Snapshot current expedited RCU state
+ *
+ * Returns a cookie to pass to a call to cond_synchronize_rcu_expedited()
+ * or poll_state_synchronize_rcu_expedited(), allowing them to determine
+ * whether or not a full expedited grace period has elapsed in the meantime.
+ */
+unsigned long get_state_synchronize_rcu_expedited(void)
+{
+	if (rcu_gp_is_normal())
+	return get_state_synchronize_rcu() |
+	       RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL;
+
+	// Any prior manipulation of RCU-protected data must happen
+	// before the load from ->expedited_sequence.
+	smp_mb();  /* ^^^ */
+	return rcu_exp_gp_seq_snap() | RCU_GET_STATE_FROM_EXPEDITED;
+}
+EXPORT_SYMBOL_GPL(get_state_synchronize_rcu_expedited);
+
+/*
+ * Ensure that start_poll_synchronize_rcu_expedited() has the expedited
+ * RCU grace periods that it needs.
+ */
+static void sync_rcu_do_polled_gp(struct work_struct *wp)
+{
+	unsigned long flags;
+	struct rcu_node *rnp = container_of(wp, struct rcu_node, exp_poll_wq);
+	unsigned long s;
+
+	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
+	s = rnp->exp_seq_poll_rq;
+	rnp->exp_seq_poll_rq |= 0x1;
+	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
+	if (s & 0x1)
+		return;
+	while (!sync_exp_work_done(s))
+		synchronize_rcu_expedited();
+	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
+	s = rnp->exp_seq_poll_rq;
+	if (!(s & 0x1) && !sync_exp_work_done(s))
+		queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
+	else
+		rnp->exp_seq_poll_rq |= 0x1;
+	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
+}
+
+/**
+ * start_poll_synchronize_rcu_expedited - Snapshot current expedited RCU state and start grace period
+ *
+ * Returns a cookie to pass to a call to cond_synchronize_rcu_expedited()
+ * or poll_state_synchronize_rcu_expedited(), allowing them to determine
+ * whether or not a full expedited grace period has elapsed in the meantime.
+ * If the needed grace period is not already slated to start, initiates
+ * that grace period.
+ */
+
+unsigned long start_poll_synchronize_rcu_expedited(void)
+{
+	unsigned long flags;
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+	unsigned long s;
+
+	if (rcu_gp_is_normal())
+		return start_poll_synchronize_rcu_expedited() |
+		       RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL;
+
+	s = rcu_exp_gp_seq_snap();
+	rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
+	rnp = rdp->mynode;
+	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
+	if ((rnp->exp_seq_poll_rq & 0x1) || ULONG_CMP_LT(rnp->exp_seq_poll_rq, s)) {
+		rnp->exp_seq_poll_rq = s;
+		queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
+	}
+	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
+
+	return s | RCU_GET_STATE_FROM_EXPEDITED;
+}
+EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_expedited);
+
+/**
+ * poll_state_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
+ *
+ * @oldstate: value from get_state_synchronize_rcu_expedited() or start_poll_synchronize_rcu_expedited()
+ *
+ * If a full expedited RCU grace period has elapsed since the earlier call
+ * from which oldstate was obtained, return @true, otherwise return @false.
+ * If @false is returned, it is the caller's responsibility to invoke
+ * this function later on until it does return @true.  Alternatively,
+ * the caller can explicitly wait for a grace period, for example, by
+ * passing @oldstate to cond_synchronize_rcu_expedited() or by directly
+ * invoking synchronize_rcu_expedited().
+ *
+ * Yes, this function does not take counter wrap into account.
+ * But counter wrap is harmless.  If the counter wraps, we have waited for
+ * more than 2 billion grace periods (and way more on a 64-bit system!).
+ * Those needing to keep oldstate values for very long time periods
+ * (several hours even on 32-bit systems) should check them occasionally
+ * and either refresh them or set a flag indicating that the grace period
+ * has completed.
+ *
+ * This function provides the same memory-ordering guarantees that would
+ * be provided by a synchronize_rcu_expedited() that was invoked at the
+ * call to the function that provided @oldstate, and that returned at the
+ * end of this function.
+ */
+bool poll_state_synchronize_rcu_expedited(unsigned long oldstate)
+{
+	WARN_ON_ONCE(!(oldstate & RCU_GET_STATE_FROM_EXPEDITED));
+	if (oldstate & RCU_GET_STATE_USE_NORMAL)
+		return poll_state_synchronize_rcu(oldstate & ~RCU_GET_STATE_BAD_FOR_NORMAL);
+	if (!rcu_exp_gp_seq_done(oldstate & ~RCU_SEQ_STATE_MASK))
+		return false;
+	smp_mb(); /* Ensure GP ends before subsequent accesses. */
+	return true;
+}
+EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu_expedited);
+
+/**
+ * cond_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
+ *
+ * @oldstate: value from get_state_synchronize_rcu_expedited() or start_poll_synchronize_rcu_expedited()
+ *
+ * If a full expedited RCU grace period has elapsed since the earlier
+ * call from which oldstate was obtained, just return.  Otherwise, invoke
+ * synchronize_rcu_expedited() to wait for a full grace period.
+ *
+ * Yes, this function does not take counter wrap into account.  But
+ * counter wrap is harmless.  If the counter wraps, we have waited for
+ * more than 2 billion grace periods (and way more on a 64-bit system!),
+ * so waiting for one additional grace period should be just fine.
+ *
+ * This function provides the same memory-ordering guarantees that would
+ * be provided by a synchronize_rcu_expedited() that was invoked at the
+ * call to the function that provided @oldstate, and that returned at the
+ * end of this function.
+ */
+void cond_synchronize_rcu_expedited(unsigned long oldstate)
+{
+	WARN_ON_ONCE(!(oldstate & RCU_GET_STATE_FROM_EXPEDITED));
+	if (poll_state_synchronize_rcu_expedited(oldstate))
+		return;
+	if (oldstate & RCU_GET_STATE_USE_NORMAL)
+		synchronize_rcu_expedited();
+	else
+		synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(cond_synchronize_rcu_expedited);
Paul E. McKenney Feb. 3, 2022, 6:49 p.m. UTC | #31
On Tue, Feb 01, 2022 at 02:00:28PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 31, 2022 at 08:22:43AM -0500, Brian Foster wrote:
> > On Fri, Jan 28, 2022 at 01:39:11PM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> > > > On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > > > > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > > > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > > > > 
> > > > > > > Right, background inactivation does not improve performance - it's
> > > > > > > necessary to get the transactions out of the evict() path. All we
> > > > > > > wanted was to ensure that there were no performance degradations as
> > > > > > > a result of background inactivation, not that it was faster.
> > > > > > > 
> > > > > > > If you want to confirm that there is an increase in cold cache
> > > > > > > access when the batch size is increased, cpu profiles with 'perf
> > > > > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > > > > where I mention those things to Paul.
> > > > > > 
> > > > > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > > > > I'm not asking for that to happen this cycle and for backports Ian's
> > > > > > patch is obviously fine.
> > > > > 
> > > > > Yes, but not in the near term.
> > > > > 
> > > > > > What I really want to avoid is the situation when we are stuck with
> > > > > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > > > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > > > > thread do look like that.
> > > > > 
> > > > > The simplest way I think is to have the XFS inode allocation track
> > > > > "busy inodes" in the same way we track "busy extents". A busy extent
> > > > > is an extent that has been freed by the user, but is not yet marked
> > > > > free in the journal/on disk. If we try to reallocate that busy
> > > > > extent, we either select a different free extent to allocate, or if
> > > > > we can't find any we force the journal to disk, wait for it to
> > > > > complete (hence unbusying the extents) and retry the allocation
> > > > > again.
> > > > > 
> > > > > We can do something similar for inode allocation - it's actually a
> > > > > lockless tag lookup on the radix tree entry for the candidate inode
> > > > > number. If we find the reclaimable radix tree tag set, then we select
> > > > > a different inode. If we can't allocate a new inode, then we kick
> > > > > synchronize_rcu() and retry the allocation, allowing inodes to be
> > > > > recycled this time.
> > > > 
> > > > I'm starting to poke around this area since it's become clear that the
> > > > currently proposed scheme just involves too much latency (unless Paul
> > > > chimes in with his expedited grace period variant, at which point I will
> > > > revisit) in the fast allocation/recycle path. ISTM so far that a simple
> > > > "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> > > > have pretty much the same pattern of behavior as this patch: one
> > > > synchronize_rcu() per batch.
> > > 
> > > Apologies for being slow, but there have been some distractions.
> > > One of the distractions was trying to put together a theoretically
> > > attractive but massively overcomplicated implementation of
> > > poll_state_synchronize_rcu_expedited().  It currently looks like a
> > > somewhat suboptimal but much simpler approach is available.  This
> > > assumes that XFS is not in the picture until after both the scheduler
> > > and workqueues are operational.
> > > 
> > 
> > No worries.. I don't think that would be a roadblock for us. ;)
> > 
> > > And yes, the complicated version might prove necessary, but let's
> > > see if this whole thing is even useful first.  ;-)
> > > 
> > 
> > Indeed. This patch only really requires a single poll/sync pair of
> > calls, so assuming the expedited grace period usage plays nice enough
> > with typical !expedited usage elsewhere in the kernel for some basic
> > tests, it would be fairly trivial to port this over and at least get an
> > idea of what the worst case behavior might be with expedited grace
> > periods, whether it satisfies the existing latency requirements, etc.
> > 
> > Brian
> > 
> > > In the meantime, if you want to look at an extremely unbaked view,
> > > here you go:
> > > 
> > > https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
> 
> And here is a version that passes moderate rcutorture testing.  So no
> obvious bugs.  Probably a few non-obvious ones, though!  ;-)
> 
> This commit is on -rcu's "dev" branch along with this rcutorture
> addition:
> 
> cd7bd64af59f ("EXP rcutorture: Test polled expedited grace-period primitives")
> 
> I will carry these in -rcu's "dev" branch until at least the upcoming
> merge window, fixing bugs as and when they become apparent.  If I don't
> hear otherwise by that time, I will create a tag for it and leave
> it behind.
> 
> The backport to v5.17-rc2 just requires removing:
> 
> 	mutex_init(&rnp->boost_kthread_mutex);
> 
> From rcu_init_one().  This line is added by this -rcu commit:
> 
> 02a50b09c31f ("rcu: Add mutex for rcu boost kthread spawning and affinity setting")

And with some alleged fixes of issues Neeraj found when reviewing this,
perhaps most notably the ability to run on real-time kernels booted
with rcupdate.rcu_normal=1.  This version passes reasonably heavy-duty
rcutorture testing.  Must mean bugs in rcutorture...  :-/

f93fa07011bd ("EXP rcu: Add polled expedited grace-period primitives")

Again, please let me know how it goes!

							Thanx, Paul

------------------------------------------------------------------------

commit f93fa07011bd2460f222e570d17968baff21fa90
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Mon Jan 31 16:55:52 2022 -0800

    EXP rcu: Add polled expedited grace-period primitives
    
    This is an experimental proof of concept of polled expedited grace-period
    functions.  These functions are get_state_synchronize_rcu_expedited(),
    start_poll_synchronize_rcu_expedited(), poll_state_synchronize_rcu_expedited(),
    and cond_synchronize_rcu_expedited(), which are similar to
    get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
    poll_state_synchronize_rcu(), and cond_synchronize_rcu(), respectively.
    
    One limitation is that start_poll_synchronize_rcu_expedited() cannot
    be invoked before workqueues are initialized.
    
    Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
    Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
    Cc: Brian Foster <bfoster@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Ian Kent <raven@themaw.net>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 858f4d429946d..ca139b4b2d25f 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -23,6 +23,26 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
 	might_sleep();
 }
 
+static inline unsigned long get_state_synchronize_rcu_expedited(void)
+{
+	return get_state_synchronize_rcu();
+}
+
+static inline unsigned long start_poll_synchronize_rcu_expedited(void)
+{
+	return start_poll_synchronize_rcu();
+}
+
+static inline bool poll_state_synchronize_rcu_expedited(unsigned long oldstate)
+{
+	return poll_state_synchronize_rcu(oldstate);
+}
+
+static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
+{
+	cond_synchronize_rcu(oldstate);
+}
+
 extern void rcu_barrier(void);
 
 static inline void synchronize_rcu_expedited(void)
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 76665db179fa1..eb774e9be21bf 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -40,6 +40,10 @@ bool rcu_eqs_special_set(int cpu);
 void rcu_momentary_dyntick_idle(void);
 void kfree_rcu_scheduler_running(void);
 bool rcu_gp_might_be_stalled(void);
+unsigned long get_state_synchronize_rcu_expedited(void);
+unsigned long start_poll_synchronize_rcu_expedited(void);
+bool poll_state_synchronize_rcu_expedited(unsigned long oldstate);
+void cond_synchronize_rcu_expedited(unsigned long oldstate);
 unsigned long get_state_synchronize_rcu(void);
 unsigned long start_poll_synchronize_rcu(void);
 bool poll_state_synchronize_rcu(unsigned long oldstate);
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 24b5f2c2de87b..5b61cf20c91e9 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -23,6 +23,13 @@
 #define RCU_SEQ_CTR_SHIFT	2
 #define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)
 
+/*
+ * Low-order bit definitions for polled grace-period APIs.
+ */
+#define RCU_GET_STATE_FROM_EXPEDITED	0x1
+#define RCU_GET_STATE_USE_NORMAL	0x2
+#define RCU_GET_STATE_BAD_FOR_NORMAL	(RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL)
+
 /*
  * Return the counter portion of a sequence number previously returned
  * by rcu_seq_snap() or rcu_seq_current().
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e6ad532cffe78..135d5e2bce879 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3871,7 +3871,8 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
  */
 bool poll_state_synchronize_rcu(unsigned long oldstate)
 {
-	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
+	if (rcu_seq_done(&rcu_state.gp_seq, oldstate) &&
+	    !WARN_ON_ONCE(oldstate & RCU_GET_STATE_BAD_FOR_NORMAL)) {
 		smp_mb(); /* Ensure GP ends before subsequent accesses. */
 		return true;
 	}
@@ -3900,7 +3901,8 @@ EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
  */
 void cond_synchronize_rcu(unsigned long oldstate)
 {
-	if (!poll_state_synchronize_rcu(oldstate))
+	if (!poll_state_synchronize_rcu(oldstate) ||
+	    WARN_ON_ONCE(oldstate & RCU_GET_STATE_BAD_FOR_NORMAL))
 		synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
@@ -4593,6 +4595,9 @@ static void __init rcu_init_one(void)
 			init_waitqueue_head(&rnp->exp_wq[3]);
 			spin_lock_init(&rnp->exp_lock);
 			mutex_init(&rnp->boost_kthread_mutex);
+			raw_spin_lock_init(&rnp->exp_poll_lock);
+			rnp->exp_seq_poll_rq = 0x1;
+			INIT_WORK(&rnp->exp_poll_wq, sync_rcu_do_polled_gp);
 		}
 	}
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 926673ebe355f..19fc9acce3ce2 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -128,6 +128,10 @@ struct rcu_node {
 	wait_queue_head_t exp_wq[4];
 	struct rcu_exp_work rew;
 	bool exp_need_flush;	/* Need to flush workitem? */
+	raw_spinlock_t exp_poll_lock;
+				/* Lock and data for polled expedited grace periods. */
+	unsigned long exp_seq_poll_rq;
+	struct work_struct exp_poll_wq;
 } ____cacheline_internodealigned_in_smp;
 
 /*
@@ -476,3 +480,6 @@ static void rcu_iw_handler(struct irq_work *iwp);
 static void check_cpu_stall(struct rcu_data *rdp);
 static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
 				     const unsigned long gpssdelay);
+
+/* Forward declarations for tree_exp.h. */
+static void sync_rcu_do_polled_gp(struct work_struct *wp);
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 1a45667402260..4041988086830 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -871,3 +871,154 @@ void synchronize_rcu_expedited(void)
 		destroy_work_on_stack(&rew.rew_work);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/**
+ * get_state_synchronize_rcu_expedited - Snapshot current expedited RCU state
+ *
+ * Returns a cookie to pass to a call to cond_synchronize_rcu_expedited()
+ * or poll_state_synchronize_rcu_expedited(), allowing them to determine
+ * whether or not a full expedited grace period has elapsed in the meantime.
+ */
+unsigned long get_state_synchronize_rcu_expedited(void)
+{
+	if (rcu_gp_is_normal())
+		return get_state_synchronize_rcu() |
+		       RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL;
+
+	// Any prior manipulation of RCU-protected data must happen
+	// before the load from ->expedited_sequence, and this ordering is
+	// provided by rcu_exp_gp_seq_snap().
+	return rcu_exp_gp_seq_snap() | RCU_GET_STATE_FROM_EXPEDITED;
+}
+EXPORT_SYMBOL_GPL(get_state_synchronize_rcu_expedited);
+
+/*
+ * Ensure that start_poll_synchronize_rcu_expedited() has the expedited
+ * RCU grace periods that it needs.
+ */
+static void sync_rcu_do_polled_gp(struct work_struct *wp)
+{
+	unsigned long flags;
+	struct rcu_node *rnp = container_of(wp, struct rcu_node, exp_poll_wq);
+	unsigned long s;
+
+	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
+	s = rnp->exp_seq_poll_rq;
+	rnp->exp_seq_poll_rq |= 0x1;
+	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
+	if (s & 0x1)
+		return;
+	while (!sync_exp_work_done(s))
+		synchronize_rcu_expedited();
+	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
+	s = rnp->exp_seq_poll_rq;
+	if (!(s & 0x1) && !sync_exp_work_done(s))
+		queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
+	else
+		rnp->exp_seq_poll_rq |= 0x1;
+	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
+}
+
+/**
+ * start_poll_synchronize_rcu_expedited - Snapshot current expedited RCU state and start grace period
+ *
+ * Returns a cookie to pass to a call to cond_synchronize_rcu_expedited()
+ * or poll_state_synchronize_rcu_expedited(), allowing them to determine
+ * whether or not a full expedited grace period has elapsed in the meantime.
+ * If the needed grace period is not already slated to start, initiates
+ * that grace period.
+ */
+
+unsigned long start_poll_synchronize_rcu_expedited(void)
+{
+	unsigned long flags;
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+	unsigned long s;
+
+	if (rcu_gp_is_normal())
+		return start_poll_synchronize_rcu() |
+		       RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL;
+
+	s = rcu_exp_gp_seq_snap();
+	rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
+	rnp = rdp->mynode;
+	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
+	if ((rnp->exp_seq_poll_rq & 0x1) || ULONG_CMP_LT(rnp->exp_seq_poll_rq, s)) {
+		rnp->exp_seq_poll_rq = s;
+		queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
+	}
+	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
+
+	return s | RCU_GET_STATE_FROM_EXPEDITED;
+}
+EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_expedited);
+
+/**
+ * poll_state_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
+ *
+ * @oldstate: value from get_state_synchronize_rcu_expedited() or start_poll_synchronize_rcu_expedited()
+ *
+ * If a full expedited RCU grace period has elapsed since the earlier call
+ * from which oldstate was obtained, return @true, otherwise return @false.
+ * If @false is returned, it is the caller's responsibility to invoke
+ * this function later on until it does return @true.  Alternatively,
+ * the caller can explicitly wait for a grace period, for example, by
+ * passing @oldstate to cond_synchronize_rcu_expedited() or by directly
+ * invoking synchronize_rcu_expedited().
+ *
+ * Yes, this function does not take counter wrap into account.
+ * But counter wrap is harmless.  If the counter wraps, we have waited for
+ * more than 2 billion grace periods (and way more on a 64-bit system!).
+ * Those needing to keep oldstate values for very long time periods
+ * (several hours even on 32-bit systems) should check them occasionally
+ * and either refresh them or set a flag indicating that the grace period
+ * has completed.
+ *
+ * This function provides the same memory-ordering guarantees that would
+ * be provided by a synchronize_rcu_expedited() that was invoked at the
+ * call to the function that provided @oldstate, and that returned at the
+ * end of this function.
+ */
+bool poll_state_synchronize_rcu_expedited(unsigned long oldstate)
+{
+	WARN_ON_ONCE(!(oldstate & RCU_GET_STATE_FROM_EXPEDITED));
+	if (oldstate & RCU_GET_STATE_USE_NORMAL)
+		return poll_state_synchronize_rcu(oldstate & ~RCU_GET_STATE_BAD_FOR_NORMAL);
+	if (!rcu_exp_gp_seq_done(oldstate & ~RCU_SEQ_STATE_MASK))
+		return false;
+	smp_mb(); /* Ensure GP ends before subsequent accesses. */
+	return true;
+}
+EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu_expedited);
+
+/**
+ * cond_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
+ *
+ * @oldstate: value from get_state_synchronize_rcu_expedited() or start_poll_synchronize_rcu_expedited()
+ *
+ * If a full expedited RCU grace period has elapsed since the earlier
+ * call from which oldstate was obtained, just return.  Otherwise, invoke
+ * synchronize_rcu_expedited() to wait for a full grace period.
+ *
+ * Yes, this function does not take counter wrap into account.  But
+ * counter wrap is harmless.  If the counter wraps, we have waited for
+ * more than 2 billion grace periods (and way more on a 64-bit system!),
+ * so waiting for one additional grace period should be just fine.
+ *
+ * This function provides the same memory-ordering guarantees that would
+ * be provided by a synchronize_rcu_expedited() that was invoked at the
+ * call to the function that provided @oldstate, and that returned at the
+ * end of this function.
+ */
+void cond_synchronize_rcu_expedited(unsigned long oldstate)
+{
+	WARN_ON_ONCE(!(oldstate & RCU_GET_STATE_FROM_EXPEDITED));
+	if (poll_state_synchronize_rcu_expedited(oldstate))
+		return;
+	if (oldstate & RCU_GET_STATE_USE_NORMAL)
+		synchronize_rcu();
+	else
+		synchronize_rcu_expedited();
+}
+EXPORT_SYMBOL_GPL(cond_synchronize_rcu_expedited);
Brian Foster Feb. 7, 2022, 1:30 p.m. UTC | #32
On Tue, Feb 01, 2022 at 02:00:28PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 31, 2022 at 08:22:43AM -0500, Brian Foster wrote:
> > On Fri, Jan 28, 2022 at 01:39:11PM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> > > > On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > > > > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > > > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > > > > 
> > > > > > > Right, background inactivation does not improve performance - it's
> > > > > > > necessary to get the transactions out of the evict() path. All we
> > > > > > > wanted was to ensure that there were no performance degradations as
> > > > > > > a result of background inactivation, not that it was faster.
> > > > > > > 
> > > > > > > If you want to confirm that there is an increase in cold cache
> > > > > > > access when the batch size is increased, cpu profiles with 'perf
> > > > > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > > > > where I mention those things to Paul.
> > > > > > 
> > > > > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > > > > I'm not asking for that to happen this cycle and for backports Ian's
> > > > > > patch is obviously fine.
> > > > > 
> > > > > Yes, but not in the near term.
> > > > > 
> > > > > > What I really want to avoid is the situation when we are stuck with
> > > > > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > > > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > > > > thread do look like that.
> > > > > 
> > > > > The simplest way I think is to have the XFS inode allocation track
> > > > > "busy inodes" in the same way we track "busy extents". A busy extent
> > > > > is an extent that has been freed by the user, but is not yet marked
> > > > > free in the journal/on disk. If we try to reallocate that busy
> > > > > extent, we either select a different free extent to allocate, or if
> > > > > we can't find any we force the journal to disk, wait for it to
> > > > > complete (hence unbusying the extents) and retry the allocation
> > > > > again.
> > > > > 
> > > > > We can do something similar for inode allocation - it's actually a
> > > > > lockless tag lookup on the radix tree entry for the candidate inode
> > > > > number. If we find the reclaimable radix tree tag set, then we select
> > > > > a different inode. If we can't allocate a new inode, then we kick
> > > > > synchronize_rcu() and retry the allocation, allowing inodes to be
> > > > > recycled this time.
> > > > 
> > > > I'm starting to poke around this area since it's become clear that the
> > > > currently proposed scheme just involves too much latency (unless Paul
> > > > chimes in with his expedited grace period variant, at which point I will
> > > > revisit) in the fast allocation/recycle path. ISTM so far that a simple
> > > > "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> > > > have pretty much the same pattern of behavior as this patch: one
> > > > synchronize_rcu() per batch.
> > > 
> > > Apologies for being slow, but there have been some distractions.
> > > One of the distractions was trying to put together a theoretically
> > > attractive but massively overcomplicated implementation of
> > > poll_state_synchronize_rcu_expedited().  It currently looks like a
> > > somewhat suboptimal but much simpler approach is available.  This
> > > assumes that XFS is not in the picture until after both the scheduler
> > > and workqueues are operational.
> > > 
> > 
> > No worries.. I don't think that would be a roadblock for us. ;)
> > 
> > > And yes, the complicated version might prove necessary, but let's
> > > see if this whole thing is even useful first.  ;-)
> > > 
> > 
> > Indeed. This patch only really requires a single poll/sync pair of
> > calls, so assuming the expedited grace period usage plays nice enough
> > with typical !expedited usage elsewhere in the kernel for some basic
> > tests, it would be fairly trivial to port this over and at least get an
> > idea of what the worst case behavior might be with expedited grace
> > periods, whether it satisfies the existing latency requirements, etc.
> > 
> > Brian
> > 
> > > In the meantime, if you want to look at an extremely unbaked view,
> > > here you go:
> > > 
> > > https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
> 
> And here is a version that passes moderate rcutorture testing.  So no
> obvious bugs.  Probably a few non-obvious ones, though!  ;-)
> 
> This commit is on -rcu's "dev" branch along with this rcutorture
> addition:
> 
> cd7bd64af59f ("EXP rcutorture: Test polled expedited grace-period primitives")
> 
> I will carry these in -rcu's "dev" branch until at least the upcoming
> merge window, fixing bugs as and when they become apparent.  If I don't
> hear otherwise by that time, I will create a tag for it and leave
> it behind.
> 
> The backport to v5.17-rc2 just requires removing:
> 
> 	mutex_init(&rnp->boost_kthread_mutex);
> 
> From rcu_init_one().  This line is added by this -rcu commit:
> 
> 02a50b09c31f ("rcu: Add mutex for rcu boost kthread spawning and affinity setting")
> 
> Please let me know how it goes!
> 

Thanks Paul. I gave this a whirl with a ported variant of this patch on
top. There is definitely a notable improvement with the expedited grace
periods. A few quick runs of the same batched alloc/free test (i.e. 10s
samples) I had run against the original version:

batch	baseline	baseline+bg	test	test+bg

1	889954		210075		552911	25540
4	879540		212740		575356	24624
8	924928		213568		496992	26080
16	922960		211504		518496	24592
32	844832		219744		524672	28608
64	579968		196544		358720	24128
128	667392		195840		397696	22400
256	624896		197888		376320	31232
512	572928		204800		382464	46080
1024	549888		174080		379904	73728
2048	522240		174080		350208	106496
4096	536576		167936		360448	131072

So this shows a major improvement in the case where the system is
otherwise idle. We still aren't quite at the baseline numbers, but
that's not really the goal here because those numbers are partly driven
by the fact that we unsafely reuse recently freed inodes in cases where
proper behavior would be to allocate new inode chunks for a period of
time. The core test numbers are much closer to the single threaded
allocation rate (55k-65k inodes/sec) on this setup, so that is quite
positive.
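
For context, the test loop has roughly the following shape (a sketch only;
the file naming, working directory and exact accounting are simplified here
and are not the actual harness):

/*
 * Sketch of one batched alloc/free cycle: create <batch> files, unlink
 * them, repeat for a 10s sample and report how many inodes were cycled.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int batch = argc > 1 ? atoi(argv[1]) : 1;
	long count = 0;
	time_t start = time(NULL);
	char name[64];

	while (time(NULL) - start < 10) {
		for (int i = 0; i < batch; i++) {
			snprintf(name, sizeof(name), "alloc-%d", i);
			int fd = open(name, O_CREAT | O_EXCL | O_WRONLY, 0644);
			if (fd < 0)
				return 1;
			close(fd);
		}
		for (int i = 0; i < batch; i++) {
			snprintf(name, sizeof(name), "alloc-%d", i);
			unlink(name);
		}
		count += batch;
	}
	printf("%ld inodes cycled in 10s (batch %d)\n", count, batch);
	return 0;
}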

The "bg" variants are the same tests with 64 tasks doing unrelated
pathwalk listings on a kernel source tree (on separate storage)
concurrently in the background. The purpose of this was just to generate
background (rcu) activity in the form of pathname lookups and whatnot
and see how that impacts the results. This clearly affects both kernels,
but the test kernel drops down closer to numbers reminiscent of the
non-expedited grace period variant. Note that this impact seems to scale
with increased background workload. With a similar test running only 8
background tasks, the test kernel is pretty consistently in the
225k-250k (per 10s) range across the set of batch sizes. That's about
half the core test rate, so still not as terrible as the original
variant. ;)
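
The background load generator is along these lines (again just a sketch;
the real load was plain directory listings of a kernel tree, so the walker,
task count and tree path below are stand-ins):

/*
 * Sketch: N tasks repeatedly walking a directory tree to generate RCU
 * pathwalk activity in the background. Runs until killed.
 */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

static int visit(const char *path, const struct stat *sb, int type,
		 struct FTW *ftwbuf)
{
	return 0;	/* touch every entry, discard the result */
}

int main(int argc, char **argv)
{
	int ntasks = argc > 1 ? atoi(argv[1]) : 64;
	const char *tree = argc > 2 ? argv[2] : "/usr/src/linux";

	for (int i = 0; i < ntasks; i++) {
		if (fork() == 0)
			for (;;)
				nftw(tree, visit, 64, FTW_PHYS);
	}
	while (wait(NULL) > 0)
		;
	return 0;
}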

In any event, this probably requires some thought/discussion (and more
testing) on whether this is considered an acceptable change or whether
we want to explore options to mitigate this further. I am still playing
with some ideas to potentially mitigate grace period latency, so it
might be worth seeing if anything useful falls out of that as well.
Thoughts appreciated...

Brian

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit dd896a86aebc5b225ceee13fcf1375c7542a5e2d
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Mon Jan 31 16:55:52 2022 -0800
> 
>     EXP rcu: Add polled expedited grace-period primitives
>     
>     This is an experimental proof of concept of polled expedited grace-period
>     functions.  These functions are get_state_synchronize_rcu_expedited(),
>     start_poll_synchronize_rcu_expedited(), poll_state_synchronize_rcu_expedited(),
>     and cond_synchronize_rcu_expedited(), which are similar to
>     get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
>     poll_state_synchronize_rcu(), and cond_synchronize_rcu(), respectively.
>     
>     One limitation is that start_poll_synchronize_rcu_expedited() cannot
>     be invoked before workqueues are initialized.
>     
>     Cc: Brian Foster <bfoster@redhat.com>
>     Cc: Dave Chinner <david@fromorbit.com>
>     Cc: Al Viro <viro@zeniv.linux.org.uk>
>     Cc: Ian Kent <raven@themaw.net>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 858f4d429946d..ca139b4b2d25f 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -23,6 +23,26 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
>  	might_sleep();
>  }
>  
> +static inline unsigned long get_state_synchronize_rcu_expedited(void)
> +{
> +	return get_state_synchronize_rcu();
> +}
> +
> +static inline unsigned long start_poll_synchronize_rcu_expedited(void)
> +{
> +	return start_poll_synchronize_rcu();
> +}
> +
> +static inline bool poll_state_synchronize_rcu_expedited(unsigned long oldstate)
> +{
> +	return poll_state_synchronize_rcu(oldstate);
> +}
> +
> +static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
> +{
> +	cond_synchronize_rcu(oldstate);
> +}
> +
>  extern void rcu_barrier(void);
>  
>  static inline void synchronize_rcu_expedited(void)
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index 76665db179fa1..eb774e9be21bf 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -40,6 +40,10 @@ bool rcu_eqs_special_set(int cpu);
>  void rcu_momentary_dyntick_idle(void);
>  void kfree_rcu_scheduler_running(void);
>  bool rcu_gp_might_be_stalled(void);
> +unsigned long get_state_synchronize_rcu_expedited(void);
> +unsigned long start_poll_synchronize_rcu_expedited(void);
> +bool poll_state_synchronize_rcu_expedited(unsigned long oldstate);
> +void cond_synchronize_rcu_expedited(unsigned long oldstate);
>  unsigned long get_state_synchronize_rcu(void);
>  unsigned long start_poll_synchronize_rcu(void);
>  bool poll_state_synchronize_rcu(unsigned long oldstate);
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 24b5f2c2de87b..5b61cf20c91e9 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -23,6 +23,13 @@
>  #define RCU_SEQ_CTR_SHIFT	2
>  #define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)
>  
> +/*
> + * Low-order bit definitions for polled grace-period APIs.
> + */
> +#define RCU_GET_STATE_FROM_EXPEDITED	0x1
> +#define RCU_GET_STATE_USE_NORMAL	0x2
> +#define RCU_GET_STATE_BAD_FOR_NORMAL	(RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL)
> +
>  /*
>   * Return the counter portion of a sequence number previously returned
>   * by rcu_seq_snap() or rcu_seq_current().
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index e6ad532cffe78..5de36abcd7da1 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3871,7 +3871,8 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
>   */
>  bool poll_state_synchronize_rcu(unsigned long oldstate)
>  {
> -	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
> +	if (rcu_seq_done(&rcu_state.gp_seq, oldstate) &&
> +	    !WARN_ON_ONCE(oldstate & RCU_GET_STATE_BAD_FOR_NORMAL)) {
>  		smp_mb(); /* Ensure GP ends before subsequent accesses. */
>  		return true;
>  	}
> @@ -3900,7 +3901,8 @@ EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
>   */
>  void cond_synchronize_rcu(unsigned long oldstate)
>  {
> -	if (!poll_state_synchronize_rcu(oldstate))
> +	if (!poll_state_synchronize_rcu(oldstate) &&
> +	    !WARN_ON_ONCE(oldstate & RCU_GET_STATE_BAD_FOR_NORMAL))
>  		synchronize_rcu();
>  }
>  EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
> @@ -4593,6 +4595,9 @@ static void __init rcu_init_one(void)
>  			init_waitqueue_head(&rnp->exp_wq[3]);
>  			spin_lock_init(&rnp->exp_lock);
>  			mutex_init(&rnp->boost_kthread_mutex);
> +			raw_spin_lock_init(&rnp->exp_poll_lock);
> +			rnp->exp_seq_poll_rq = 0x1;
> +			INIT_WORK(&rnp->exp_poll_wq, sync_rcu_do_polled_gp);
>  		}
>  	}
>  
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 926673ebe355f..19fc9acce3ce2 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -128,6 +128,10 @@ struct rcu_node {
>  	wait_queue_head_t exp_wq[4];
>  	struct rcu_exp_work rew;
>  	bool exp_need_flush;	/* Need to flush workitem? */
> +	raw_spinlock_t exp_poll_lock;
> +				/* Lock and data for polled expedited grace periods. */
> +	unsigned long exp_seq_poll_rq;
> +	struct work_struct exp_poll_wq;
>  } ____cacheline_internodealigned_in_smp;
>  
>  /*
> @@ -476,3 +480,6 @@ static void rcu_iw_handler(struct irq_work *iwp);
>  static void check_cpu_stall(struct rcu_data *rdp);
>  static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
>  				     const unsigned long gpssdelay);
> +
> +/* Forward declarations for tree_exp.h. */
> +static void sync_rcu_do_polled_gp(struct work_struct *wp);
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 1a45667402260..728896f374fee 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -871,3 +871,154 @@ void synchronize_rcu_expedited(void)
>  		destroy_work_on_stack(&rew.rew_work);
>  }
>  EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
> +
> +/**
> + * get_state_synchronize_rcu_expedited - Snapshot current expedited RCU state
> + *
> + * Returns a cookie to pass to a call to cond_synchronize_rcu_expedited()
> + * or poll_state_synchronize_rcu_expedited(), allowing them to determine
> + * whether or not a full expedited grace period has elapsed in the meantime.
> + */
> +unsigned long get_state_synchronize_rcu_expedited(void)
> +{
> +	if (rcu_gp_is_normal())
> +	return get_state_synchronize_rcu() |
> +	       RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL;
> +
> +	// Any prior manipulation of RCU-protected data must happen
> +	// before the load from ->expedited_sequence.
> +	smp_mb();  /* ^^^ */
> +	return rcu_exp_gp_seq_snap() | RCU_GET_STATE_FROM_EXPEDITED;
> +}
> +EXPORT_SYMBOL_GPL(get_state_synchronize_rcu_expedited);
> +
> +/*
> + * Ensure that start_poll_synchronize_rcu_expedited() has the expedited
> + * RCU grace periods that it needs.
> + */
> +static void sync_rcu_do_polled_gp(struct work_struct *wp)
> +{
> +	unsigned long flags;
> +	struct rcu_node *rnp = container_of(wp, struct rcu_node, exp_poll_wq);
> +	unsigned long s;
> +
> +	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
> +	s = rnp->exp_seq_poll_rq;
> +	rnp->exp_seq_poll_rq |= 0x1;
> +	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
> +	if (s & 0x1)
> +		return;
> +	while (!sync_exp_work_done(s))
> +		synchronize_rcu_expedited();
> +	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
> +	s = rnp->exp_seq_poll_rq;
> +	if (!(s & 0x1) && !sync_exp_work_done(s))
> +		queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
> +	else
> +		rnp->exp_seq_poll_rq |= 0x1;
> +	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
> +}
> +
> +/**
> + * start_poll_synchronize_rcu_expedited - Snapshot current expedited RCU state and start grace period
> + *
> + * Returns a cookie to pass to a call to cond_synchronize_rcu_expedited()
> + * or poll_state_synchronize_rcu_expedited(), allowing them to determine
> + * whether or not a full expedited grace period has elapsed in the meantime.
> + * If the needed grace period is not already slated to start, initiates
> + * that grace period.
> + */
> +
> +unsigned long start_poll_synchronize_rcu_expedited(void)
> +{
> +	unsigned long flags;
> +	struct rcu_data *rdp;
> +	struct rcu_node *rnp;
> +	unsigned long s;
> +
> +	if (rcu_gp_is_normal())
> +		return start_poll_synchronize_rcu() |
> +		       RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL;
> +
> +	s = rcu_exp_gp_seq_snap();
> +	rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
> +	rnp = rdp->mynode;
> +	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
> +	if ((rnp->exp_seq_poll_rq & 0x1) || ULONG_CMP_LT(rnp->exp_seq_poll_rq, s)) {
> +		rnp->exp_seq_poll_rq = s;
> +		queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
> +	}
> +	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
> +
> +	return s | RCU_GET_STATE_FROM_EXPEDITED;
> +}
> +EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_expedited);
> +
> +/**
> + * poll_state_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
> + *
> + * @oldstate: value from get_state_synchronize_rcu_expedited() or start_poll_synchronize_rcu_expedited()
> + *
> + * If a full expedited RCU grace period has elapsed since the earlier call
> + * from which oldstate was obtained, return @true, otherwise return @false.
> + * If @false is returned, it is the caller's responsibility to invoke
> + * this function later on until it does return @true.  Alternatively,
> + * the caller can explicitly wait for a grace period, for example, by
> + * passing @oldstate to cond_synchronize_rcu_expedited() or by directly
> + * invoking synchronize_rcu_expedited().
> + *
> + * Yes, this function does not take counter wrap into account.
> + * But counter wrap is harmless.  If the counter wraps, we have waited for
> + * more than 2 billion grace periods (and way more on a 64-bit system!).
> + * Those needing to keep oldstate values for very long time periods
> + * (several hours even on 32-bit systems) should check them occasionally
> + * and either refresh them or set a flag indicating that the grace period
> + * has completed.
> + *
> + * This function provides the same memory-ordering guarantees that would
> + * be provided by a synchronize_rcu_expedited() that was invoked at the
> + * call to the function that provided @oldstate, and that returned at the
> + * end of this function.
> + */
> +bool poll_state_synchronize_rcu_expedited(unsigned long oldstate)
> +{
> +	WARN_ON_ONCE(!(oldstate & RCU_GET_STATE_FROM_EXPEDITED));
> +	if (oldstate & RCU_GET_STATE_USE_NORMAL)
> +		return poll_state_synchronize_rcu(oldstate & ~RCU_GET_STATE_BAD_FOR_NORMAL);
> +	if (!rcu_exp_gp_seq_done(oldstate & ~RCU_SEQ_STATE_MASK))
> +		return false;
> +	smp_mb(); /* Ensure GP ends before subsequent accesses. */
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu_expedited);
> +
> +/**
> + * cond_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
> + *
> + * @oldstate: value from get_state_synchronize_rcu_expedited() or start_poll_synchronize_rcu_expedited()
> + *
> + * If a full expedited RCU grace period has elapsed since the earlier
> + * call from which oldstate was obtained, just return.  Otherwise, invoke
> + * synchronize_rcu_expedited() to wait for a full grace period.
> + *
> + * Yes, this function does not take counter wrap into account.  But
> + * counter wrap is harmless.  If the counter wraps, we have waited for
> + * more than 2 billion grace periods (and way more on a 64-bit system!),
> + * so waiting for one additional grace period should be just fine.
> + *
> + * This function provides the same memory-ordering guarantees that would
> + * be provided by a synchronize_rcu_expedited() that was invoked at the
> + * call to the function that provided @oldstate, and that returned at the
> + * end of this function.
> + */
> +void cond_synchronize_rcu_expedited(unsigned long oldstate)
> +{
> +	WARN_ON_ONCE(!(oldstate & RCU_GET_STATE_FROM_EXPEDITED));
> +	if (poll_state_synchronize_rcu_expedited(oldstate))
> +		return;
> +	if (oldstate & RCU_GET_STATE_USE_NORMAL)
> +		synchronize_rcu_expedited();
> +	else
> +		synchronize_rcu();
> +}
> +EXPORT_SYMBOL_GPL(cond_synchronize_rcu_expedited);
>
Paul E. McKenney Feb. 7, 2022, 4:36 p.m. UTC | #33
On Mon, Feb 07, 2022 at 08:30:03AM -0500, Brian Foster wrote:
> On Tue, Feb 01, 2022 at 02:00:28PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 31, 2022 at 08:22:43AM -0500, Brian Foster wrote:
> > > On Fri, Jan 28, 2022 at 01:39:11PM -0800, Paul E. McKenney wrote:
> > > > On Thu, Jan 27, 2022 at 02:01:25PM -0500, Brian Foster wrote:
> > > > > On Thu, Jan 27, 2022 at 04:26:09PM +1100, Dave Chinner wrote:
> > > > > > On Thu, Jan 27, 2022 at 04:19:34AM +0000, Al Viro wrote:
> > > > > > > On Wed, Jan 26, 2022 at 09:45:51AM +1100, Dave Chinner wrote:
> > > > > > > 
> > > > > > > > Right, background inactivation does not improve performance - it's
> > > > > > > > necessary to get the transactions out of the evict() path. All we
> > > > > > > > wanted was to ensure that there were no performance degradations as
> > > > > > > > a result of background inactivation, not that it was faster.
> > > > > > > > 
> > > > > > > > If you want to confirm that there is an increase in cold cache
> > > > > > > > access when the batch size is increased, cpu profiles with 'perf
> > > > > > > > top'/'perf record/report' and CPU cache performance metric reporting
> > > > > > > > via 'perf stat -dddd' are your friend. See elsewhere in the thread
> > > > > > > > where I mention those things to Paul.
> > > > > > > 
> > > > > > > Dave, do you see a plausible way to eventually drop Ian's bandaid?
> > > > > > > I'm not asking for that to happen this cycle and for backports Ian's
> > > > > > > patch is obviously fine.
> > > > > > 
> > > > > > Yes, but not in the near term.
> > > > > > 
> > > > > > > What I really want to avoid is the situation when we are stuck with
> > > > > > > keeping that bandaid in fs/namei.c, since all ways to avoid seeing
> > > > > > > reused inodes would hurt XFS too badly.  And the benchmarks in this
> > > > > > > thread do look like that.
> > > > > > 
> > > > > > The simplest way I think is to have the XFS inode allocation track
> > > > > > "busy inodes" in the same way we track "busy extents". A busy extent
> > > > > > is an extent that has been freed by the user, but is not yet marked
> > > > > > free in the journal/on disk. If we try to reallocate that busy
> > > > > > extent, we either select a different free extent to allocate, or if
> > > > > > we can't find any we force the journal to disk, wait for it to
> > > > > > complete (hence unbusying the extents) and retry the allocation
> > > > > > again.
> > > > > > 
> > > > > > We can do something similar for inode allocation - it's actually a
> > > > > > lockless tag lookup on the radix tree entry for the candidate inode
> > > > > > number. If we find the reclaimable radix tree tag set, then we select
> > > > > > a different inode. If we can't allocate a new inode, then we kick
> > > > > > synchronize_rcu() and retry the allocation, allowing inodes to be
> > > > > > recycled this time.
> > > > > 
> > > > > I'm starting to poke around this area since it's become clear that the
> > > > > currently proposed scheme just involves too much latency (unless Paul
> > > > > chimes in with his expedited grace period variant, at which point I will
> > > > > revisit) in the fast allocation/recycle path. ISTM so far that a simple
> > > > > "skip inodes in the radix tree, sync rcu if unsuccessful" algorithm will
> > > > > have pretty much the same pattern of behavior as this patch: one
> > > > > synchronize_rcu() per batch.
> > > > 
> > > > Apologies for being slow, but there have been some distractions.
> > > > One of the distractions was trying to put together a theoretically
> > > > attractive but massively overcomplicated implementation of
> > > > poll_state_synchronize_rcu_expedited().  It currently looks like a
> > > > somewhat suboptimal but much simpler approach is available.  This
> > > > assumes that XFS is not in the picture until after both the scheduler
> > > > and workqueues are operational.
> > > > 
> > > 
> > > No worries.. I don't think that would be a roadblock for us. ;)
> > > 
> > > > And yes, the complicated version might prove necessary, but let's
> > > > see if this whole thing is even useful first.  ;-)
> > > > 
> > > 
> > > Indeed. This patch only really requires a single poll/sync pair of
> > > calls, so assuming the expedited grace period usage plays nice enough
> > > with typical !expedited usage elsewhere in the kernel for some basic
> > > tests, it would be fairly trivial to port this over and at least get an
> > > idea of what the worst case behavior might be with expedited grace
> > > periods, whether it satisfies the existing latency requirements, etc.
> > > 
> > > Brian
> > > 
> > > > In the meantime, if you want to look at an extremely unbaked view,
> > > > here you go:
> > > > 
> > > > https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
> > 
> > And here is a version that passes moderate rcutorture testing.  So no
> > obvious bugs.  Probably a few non-obvious ones, though!  ;-)
> > 
> > This commit is on -rcu's "dev" branch along with this rcutorture
> > addition:
> > 
> > cd7bd64af59f ("EXP rcutorture: Test polled expedited grace-period primitives")
> > 
> > I will carry these in -rcu's "dev" branch until at least the upcoming
> > merge window, fixing bugs as and when they become apparent.  If I don't
> > hear otherwise by that time, I will create a tag for it and leave
> > it behind.
> > 
> > The backport to v5.17-rc2 just requires removing:
> > 
> > 	mutex_init(&rnp->boost_kthread_mutex);
> > 
> > From rcu_init_one().  This line is added by this -rcu commit:
> > 
> > 02a50b09c31f ("rcu: Add mutex for rcu boost kthread spawning and affinity setting")
> > 
> > Please let me know how it goes!
> > 
> 
> Thanks Paul. I gave this a whirl with a ported variant of this patch on
> top. There is definitely a notable improvement with the expedited grace
> periods. A few quick runs of the same batched alloc/free test (i.e. 10s
> samples) that I had run against the original version:
> 
> batch	baseline	baseline+bg	test	test+bg
> 
> 1	889954		210075		552911	25540
> 4	879540		212740		575356	24624
> 8	924928		213568		496992	26080
> 16	922960		211504		518496	24592
> 32	844832		219744		524672	28608
> 64	579968		196544		358720	24128
> 128	667392		195840		397696	22400
> 256	624896		197888		376320	31232
> 512	572928		204800		382464	46080
> 1024	549888		174080		379904	73728
> 2048	522240		174080		350208	106496
> 4096	536576		167936		360448	131072
> 
> So this shows a major improvement in the case where the system is
> otherwise idle. We still aren't quite at the baseline numbers, but
> that's not really the goal here because those numbers are partly driven
> by the fact that we unsafely reuse recently freed inodes in cases where
> proper behavior would be to allocate new inode chunks for a period of
> time. The core test numbers are much closer to the single threaded
> allocation rate (55k-65k inodes/sec) on this setup, so that is quite
> positive.
> 
> The "bg" variants are the same tests with 64 tasks doing unrelated
> pathwalk listings on a kernel source tree (on separate storage)
> concurrently in the background. The purpose of this was just to generate
> background (rcu) activity in the form of pathname lookups and whatnot
> and see how that impacts the results. This clearly affects both kernels,
> but the test kernel drops down closer to numbers reminiscent of the
> non-expedited grace period variant. Note that this impact seems to scale
> with increased background workload. With a similar test running only 8
> background tasks, the test kernel is pretty consistently in the
> 225k-250k (per 10s) range across the set of batch sizes. That's about
> half the core test rate, so still not as terrible as the original
> variant. ;)
> 
> In any event, this probably requires some thought/discussion (and more
> testing) on whether this is considered an acceptable change or whether
> we want to explore options to mitigate this further. I am still playing
> with some ideas to potentially mitigate grace period latency, so it
> might be worth seeing if anything useful falls out of that as well.
> Thoughts appreciated...

So this fixes a bug, but results in many 10s of percent performance
degradation?  Ouch...

Another approach is to use SLAB_TYPESAFE_BY_RCU.  This allows immediate
reuse of freed memory, but also requires pointer traversals to the memory
to do a revalidation operation.  (Sorry, no free lunch here!)
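
For illustration, the usual shape of that pattern (a minimal sketch with
made-up object and lookup names, nothing XFS-specific; obj_find_in_index()
is a stand-in for whatever lockless index search the caller already has):

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/rcupdate.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct obj {
	spinlock_t	lock;		/* initialized once, in the ctor */
	u64		key;		/* identity, rechecked by readers */
	refcount_t	ref;
};

static struct kmem_cache *obj_cache;

/* hypothetical lockless index search, e.g. a radix tree or hash lookup */
struct obj *obj_find_in_index(u64 key);

static void obj_ctor(void *p)
{
	struct obj *o = p;

	/*
	 * With SLAB_TYPESAFE_BY_RCU the lock may only be initialized here,
	 * never per allocation, so it remains valid across object reuse.
	 */
	spin_lock_init(&o->lock);
}

static int __init obj_cache_init(void)
{
	obj_cache = kmem_cache_create("obj_cache", sizeof(struct obj), 0,
				      SLAB_TYPESAFE_BY_RCU, obj_ctor);
	return obj_cache ? 0 : -ENOMEM;
}

/*
 * The memory can be freed and immediately handed out again, so the reader
 * must revalidate identity after the RCU-protected dereference.
 */
static struct obj *obj_get(u64 key)
{
	struct obj *o;

	rcu_read_lock();
	o = obj_find_in_index(key);
	if (o) {
		bool ok;

		spin_lock(&o->lock);
		ok = o->key == key && refcount_inc_not_zero(&o->ref);
		spin_unlock(&o->lock);
		if (!ok)
			o = NULL;
	}
	rcu_read_unlock();
	return o;
}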

							Thanx, Paul

> Brian
> 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit dd896a86aebc5b225ceee13fcf1375c7542a5e2d
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Mon Jan 31 16:55:52 2022 -0800
> > 
> >     EXP rcu: Add polled expedited grace-period primitives
> >     
> >     This is an experimental proof of concept of polled expedited grace-period
> >     functions.  These functions are get_state_synchronize_rcu_expedited(),
> >     start_poll_synchronize_rcu_expedited(), poll_state_synchronize_rcu_expedited(),
> >     and cond_synchronize_rcu_expedited(), which are similar to
> >     get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
> >     poll_state_synchronize_rcu(), and cond_synchronize_rcu(), respectively.
> >     
> >     One limitation is that start_poll_synchronize_rcu_expedited() cannot
> >     be invoked before workqueues are initialized.
> >     
> >     Cc: Brian Foster <bfoster@redhat.com>
> >     Cc: Dave Chinner <david@fromorbit.com>
> >     Cc: Al Viro <viro@zeniv.linux.org.uk>
> >     Cc: Ian Kent <raven@themaw.net>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > index 858f4d429946d..ca139b4b2d25f 100644
> > --- a/include/linux/rcutiny.h
> > +++ b/include/linux/rcutiny.h
> > @@ -23,6 +23,26 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
> >  	might_sleep();
> >  }
> >  
> > +static inline unsigned long get_state_synchronize_rcu_expedited(void)
> > +{
> > +	return get_state_synchronize_rcu();
> > +}
> > +
> > +static inline unsigned long start_poll_synchronize_rcu_expedited(void)
> > +{
> > +	return start_poll_synchronize_rcu();
> > +}
> > +
> > +static inline bool poll_state_synchronize_rcu_expedited(unsigned long oldstate)
> > +{
> > +	return poll_state_synchronize_rcu(oldstate);
> > +}
> > +
> > +static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
> > +{
> > +	cond_synchronize_rcu(oldstate);
> > +}
> > +
> >  extern void rcu_barrier(void);
> >  
> >  static inline void synchronize_rcu_expedited(void)
> > diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> > index 76665db179fa1..eb774e9be21bf 100644
> > --- a/include/linux/rcutree.h
> > +++ b/include/linux/rcutree.h
> > @@ -40,6 +40,10 @@ bool rcu_eqs_special_set(int cpu);
> >  void rcu_momentary_dyntick_idle(void);
> >  void kfree_rcu_scheduler_running(void);
> >  bool rcu_gp_might_be_stalled(void);
> > +unsigned long get_state_synchronize_rcu_expedited(void);
> > +unsigned long start_poll_synchronize_rcu_expedited(void);
> > +bool poll_state_synchronize_rcu_expedited(unsigned long oldstate);
> > +void cond_synchronize_rcu_expedited(unsigned long oldstate);
> >  unsigned long get_state_synchronize_rcu(void);
> >  unsigned long start_poll_synchronize_rcu(void);
> >  bool poll_state_synchronize_rcu(unsigned long oldstate);
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index 24b5f2c2de87b..5b61cf20c91e9 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -23,6 +23,13 @@
> >  #define RCU_SEQ_CTR_SHIFT	2
> >  #define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)
> >  
> > +/*
> > + * Low-order bit definitions for polled grace-period APIs.
> > + */
> > +#define RCU_GET_STATE_FROM_EXPEDITED	0x1
> > +#define RCU_GET_STATE_USE_NORMAL	0x2
> > +#define RCU_GET_STATE_BAD_FOR_NORMAL	(RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL)
> > +
> >  /*
> >   * Return the counter portion of a sequence number previously returned
> >   * by rcu_seq_snap() or rcu_seq_current().
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index e6ad532cffe78..5de36abcd7da1 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3871,7 +3871,8 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
> >   */
> >  bool poll_state_synchronize_rcu(unsigned long oldstate)
> >  {
> > -	if (rcu_seq_done(&rcu_state.gp_seq, oldstate)) {
> > +	if (rcu_seq_done(&rcu_state.gp_seq, oldstate) &&
> > +	    !WARN_ON_ONCE(oldstate & RCU_GET_STATE_BAD_FOR_NORMAL)) {
> >  		smp_mb(); /* Ensure GP ends before subsequent accesses. */
> >  		return true;
> >  	}
> > @@ -3900,7 +3901,8 @@ EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
> >   */
> >  void cond_synchronize_rcu(unsigned long oldstate)
> >  {
> > -	if (!poll_state_synchronize_rcu(oldstate))
> > +	if (!poll_state_synchronize_rcu(oldstate) &&
> > +	    !WARN_ON_ONCE(oldstate & RCU_GET_STATE_BAD_FOR_NORMAL))
> >  		synchronize_rcu();
> >  }
> >  EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
> > @@ -4593,6 +4595,9 @@ static void __init rcu_init_one(void)
> >  			init_waitqueue_head(&rnp->exp_wq[3]);
> >  			spin_lock_init(&rnp->exp_lock);
> >  			mutex_init(&rnp->boost_kthread_mutex);
> > +			raw_spin_lock_init(&rnp->exp_poll_lock);
> > +			rnp->exp_seq_poll_rq = 0x1;
> > +			INIT_WORK(&rnp->exp_poll_wq, sync_rcu_do_polled_gp);
> >  		}
> >  	}
> >  
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 926673ebe355f..19fc9acce3ce2 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -128,6 +128,10 @@ struct rcu_node {
> >  	wait_queue_head_t exp_wq[4];
> >  	struct rcu_exp_work rew;
> >  	bool exp_need_flush;	/* Need to flush workitem? */
> > +	raw_spinlock_t exp_poll_lock;
> > +				/* Lock and data for polled expedited grace periods. */
> > +	unsigned long exp_seq_poll_rq;
> > +	struct work_struct exp_poll_wq;
> >  } ____cacheline_internodealigned_in_smp;
> >  
> >  /*
> > @@ -476,3 +480,6 @@ static void rcu_iw_handler(struct irq_work *iwp);
> >  static void check_cpu_stall(struct rcu_data *rdp);
> >  static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
> >  				     const unsigned long gpssdelay);
> > +
> > +/* Forward declarations for tree_exp.h. */
> > +static void sync_rcu_do_polled_gp(struct work_struct *wp);
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 1a45667402260..728896f374fee 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -871,3 +871,154 @@ void synchronize_rcu_expedited(void)
> >  		destroy_work_on_stack(&rew.rew_work);
> >  }
> >  EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
> > +
> > +/**
> > + * get_state_synchronize_rcu_expedited - Snapshot current expedited RCU state
> > + *
> > + * Returns a cookie to pass to a call to cond_synchronize_rcu_expedited()
> > + * or poll_state_synchronize_rcu_expedited(), allowing them to determine
> > + * whether or not a full expedited grace period has elapsed in the meantime.
> > + */
> > +unsigned long get_state_synchronize_rcu_expedited(void)
> > +{
> > +	if (rcu_gp_is_normal())
> > +	return get_state_synchronize_rcu() |
> > +	       RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL;
> > +
> > +	// Any prior manipulation of RCU-protected data must happen
> > +	// before the load from ->expedited_sequence.
> > +	smp_mb();  /* ^^^ */
> > +	return rcu_exp_gp_seq_snap() | RCU_GET_STATE_FROM_EXPEDITED;
> > +}
> > +EXPORT_SYMBOL_GPL(get_state_synchronize_rcu_expedited);
> > +
> > +/*
> > + * Ensure that start_poll_synchronize_rcu_expedited() has the expedited
> > + * RCU grace periods that it needs.
> > + */
> > +static void sync_rcu_do_polled_gp(struct work_struct *wp)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_node *rnp = container_of(wp, struct rcu_node, exp_poll_wq);
> > +	unsigned long s;
> > +
> > +	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
> > +	s = rnp->exp_seq_poll_rq;
> > +	rnp->exp_seq_poll_rq |= 0x1;
> > +	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
> > +	if (s & 0x1)
> > +		return;
> > +	while (!sync_exp_work_done(s))
> > +		synchronize_rcu_expedited();
> > +	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
> > +	s = rnp->exp_seq_poll_rq;
> > +	if (!(s & 0x1) && !sync_exp_work_done(s))
> > +		queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
> > +	else
> > +		rnp->exp_seq_poll_rq |= 0x1;
> > +	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
> > +}
> > +
> > +/**
> > + * start_poll_synchronize_rcu_expedited - Snapshot current expedited RCU state and start grace period
> > + *
> > + * Returns a cookie to pass to a call to cond_synchronize_rcu_expedited()
> > + * or poll_state_synchronize_rcu_expedited(), allowing them to determine
> > + * whether or not a full expedited grace period has elapsed in the meantime.
> > + * If the needed grace period is not already slated to start, initiates
> > + * that grace period.
> > + */
> > +
> > +unsigned long start_poll_synchronize_rcu_expedited(void)
> > +{
> > +	unsigned long flags;
> > +	struct rcu_data *rdp;
> > +	struct rcu_node *rnp;
> > +	unsigned long s;
> > +
> > +	if (rcu_gp_is_normal())
> > +		return start_poll_synchronize_rcu() |
> > +		       RCU_GET_STATE_FROM_EXPEDITED | RCU_GET_STATE_USE_NORMAL;
> > +
> > +	s = rcu_exp_gp_seq_snap();
> > +	rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
> > +	rnp = rdp->mynode;
> > +	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
> > +	if ((rnp->exp_seq_poll_rq & 0x1) || ULONG_CMP_LT(rnp->exp_seq_poll_rq, s)) {
> > +		rnp->exp_seq_poll_rq = s;
> > +		queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
> > +	}
> > +	raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
> > +
> > +	return s | RCU_GET_STATE_FROM_EXPEDITED;
> > +}
> > +EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_expedited);
> > +
> > +/**
> > + * poll_state_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
> > + *
> > + * @oldstate: value from get_state_synchronize_rcu_expedited() or start_poll_synchronize_rcu_expedited()
> > + *
> > + * If a full expedited RCU grace period has elapsed since the earlier call
> > + * from which oldstate was obtained, return @true, otherwise return @false.
> > + * If @false is returned, it is the caller's responsibility to invoke
> > + * this function later on until it does return @true.  Alternatively,
> > + * the caller can explicitly wait for a grace period, for example, by
> > + * passing @oldstate to cond_synchronize_rcu_expedited() or by directly
> > + * invoking synchronize_rcu_expedited().
> > + *
> > + * Yes, this function does not take counter wrap into account.
> > + * But counter wrap is harmless.  If the counter wraps, we have waited for
> > + * more than 2 billion grace periods (and way more on a 64-bit system!).
> > + * Those needing to keep oldstate values for very long time periods
> > + * (several hours even on 32-bit systems) should check them occasionally
> > + * and either refresh them or set a flag indicating that the grace period
> > + * has completed.
> > + *
> > + * This function provides the same memory-ordering guarantees that would
> > + * be provided by a synchronize_rcu_expedited() that was invoked at the
> > + * call to the function that provided @oldstate, and that returned at the
> > + * end of this function.
> > + */
> > +bool poll_state_synchronize_rcu_expedited(unsigned long oldstate)
> > +{
> > +	WARN_ON_ONCE(!(oldstate & RCU_GET_STATE_FROM_EXPEDITED));
> > +	if (oldstate & RCU_GET_STATE_USE_NORMAL)
> > +		return poll_state_synchronize_rcu(oldstate & ~RCU_GET_STATE_BAD_FOR_NORMAL);
> > +	if (!rcu_exp_gp_seq_done(oldstate & ~RCU_SEQ_STATE_MASK))
> > +		return false;
> > +	smp_mb(); /* Ensure GP ends before subsequent accesses. */
> > +	return true;
> > +}
> > +EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu_expedited);
> > +
> > +/**
> > + * cond_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
> > + *
> > + * @oldstate: value from get_state_synchronize_rcu_expedited() or start_poll_synchronize_rcu_expedited()
> > + *
> > + * If a full expedited RCU grace period has elapsed since the earlier
> > + * call from which oldstate was obtained, just return.  Otherwise, invoke
> > + * synchronize_rcu_expedited() to wait for a full grace period.
> > + *
> > + * Yes, this function does not take counter wrap into account.  But
> > + * counter wrap is harmless.  If the counter wraps, we have waited for
> > + * more than 2 billion grace periods (and way more on a 64-bit system!),
> > + * so waiting for one additional grace period should be just fine.
> > + *
> > + * This function provides the same memory-ordering guarantees that would
> > + * be provided by a synchronize_rcu_expedited() that was invoked at the
> > + * call to the function that provided @oldstate, and that returned at the
> > + * end of this function.
> > + */
> > +void cond_synchronize_rcu_expedited(unsigned long oldstate)
> > +{
> > +	WARN_ON_ONCE(!(oldstate & RCU_GET_STATE_FROM_EXPEDITED));
> > +	if (poll_state_synchronize_rcu_expedited(oldstate))
> > +		return;
> > +	if (oldstate & RCU_GET_STATE_USE_NORMAL)
> > +		synchronize_rcu_expedited();
> > +	else
> > +		synchronize_rcu();
> > +}
> > +EXPORT_SYMBOL_GPL(cond_synchronize_rcu_expedited);
> > 
>
Dave Chinner Feb. 10, 2022, 4:09 a.m. UTC | #34
On Mon, Feb 07, 2022 at 08:36:21AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 07, 2022 at 08:30:03AM -0500, Brian Foster wrote:
> Another approach is to use SLAB_TYPESAFE_BY_RCU.  This allows immediate
> reuse of freed memory, but also requires pointer traversals to the memory
> to do a revalidation operation.  (Sorry, no free lunch here!)

Can't do that with inodes - newly allocated/reused inodes have to go
through inode_init_always() which is the very function that causes
the problems we have now with path-walk tripping over inodes in an
intermediate re-initialised state because we recycled it inside a
RCU grace period.

Cheers,

Dave.
Paul E. McKenney Feb. 10, 2022, 5:45 a.m. UTC | #35
On Thu, Feb 10, 2022 at 03:09:17PM +1100, Dave Chinner wrote:
> On Mon, Feb 07, 2022 at 08:36:21AM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 07, 2022 at 08:30:03AM -0500, Brian Foster wrote:
> > Another approach is to use SLAB_TYPESAFE_BY_RCU.  This allows immediate
> > reuse of freed memory, but also requires pointer traversals to the memory
> > to do a revalidation operation.  (Sorry, no free lunch here!)
> 
> Can't do that with inodes - newly allocated/reused inodes have to go
> through inode_init_always() which is the very function that causes
> the problems we have now with path-walk tripping over inodes in an
> intermediate re-initialised state because we recycled it inside a
> RCU grace period.

So not just no free lunch, but this is also not a lunch that is consistent
with the code's dietary restrictions.

From what you said earlier in this thread, I am guessing that you have
some other fix in mind.

							Thanx, Paul
Brian Foster Feb. 10, 2022, 8:47 p.m. UTC | #36
On Wed, Feb 09, 2022 at 09:45:44PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 10, 2022 at 03:09:17PM +1100, Dave Chinner wrote:
> > On Mon, Feb 07, 2022 at 08:36:21AM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 07, 2022 at 08:30:03AM -0500, Brian Foster wrote:
> > > Another approach is to use SLAB_TYPESAFE_BY_RCU.  This allows immediate
> > > reuse of freed memory, but also requires pointer traversals to the memory
> > > to do a revalidation operation.  (Sorry, no free lunch here!)
> > 
> > Can't do that with inodes - newly allocated/reused inodes have to go
> > through inode_init_always() which is the very function that causes
> > the problems we have now with path-walk tripping over inodes in an
> > intermediate re-initialised state because we recycled it inside a
> > RCU grace period.
> 
> So not just no free lunch, but this is also not a lunch that is consistent
> with the code's dietary restrictions.
> 
> From what you said earlier in this thread, I am guessing that you have
> some other fix in mind.
> 

Yeah.. I've got an experiment running that essentially tracks pending
inode grace period cookies and attempts to avoid them at allocation
time. It's crude atm, but the initial numbers I see aren't that far off
from the results produced by your expedited grace period mechanism. I
see numbers mostly in the 40-50k cycles per second ballpark. This is
somewhat expected because the current baseline behavior relies on unsafe
reuse of inodes before a grace period has elapsed. We have to rely on
more physical allocations to get around this, so the small batch
alloc/free patterns simply won't be able to spin as fast. The difference
I do see with this sort of explicit gp tracking is that the results
remain much closer to the baseline kernel when background activity is
ramped up.
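
Roughly, the idea looks like this (an illustrative fragment, not the actual
experiment -- xfs_lookup_pending() and the exact hook point in the inode
allocation path are made up; the i_destroy_gp cookie is the one recorded at
inactivation time by the patch in this thread):

/*
 * Fragment, illustrative only: skip allocation candidates whose previous
 * incarnation has not yet passed an RCU grace period.
 */
struct xfs_inode *xfs_lookup_pending(struct xfs_mount *mp, xfs_ino_t ino);

static int
xfs_dialloc_try_ino(
	struct xfs_mount	*mp,
	xfs_ino_t		ino)
{
	struct xfs_inode	*ip = xfs_lookup_pending(mp, ino);

	/* cookie recorded when the previous incarnation was destroyed */
	if (ip && !poll_state_synchronize_rcu(ip->i_destroy_gp))
		return -EAGAIN;	/* caller selects a different inode number */
	return 0;
}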

However, one of the things I'd like to experiment with is whether the
combination of this approach and expedited grace periods provides any
sort of opportunity for further optimization. For example, if we can
identify that a grace period has elapsed between the time of
->destroy_inode() and when the queue processing ultimately marks the
inode reclaimable, that might allow for some optimized allocation
behavior. I see this occur occasionally with normal grace periods, but
not quite frequent enough to make a difference.
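
Something along these lines at the point where inodegc marks the inode
reclaimable (illustrative only -- XFS_IGPDONE is a made-up flag and the
real code there does more than shown):

	/*
	 * Fragment: note whether the destroy-time grace period has already
	 * elapsed, so a later recycle could skip the cond_synchronize_rcu()
	 * stall entirely. XFS_IGPDONE is hypothetical.
	 */
	spin_lock(&ip->i_flags_lock);
	if (poll_state_synchronize_rcu(ip->i_destroy_gp))
		ip->i_flags |= XFS_IGPDONE;
	ip->i_flags |= XFS_IRECLAIMABLE;
	spin_unlock(&ip->i_flags_lock);

xfs_iget_recycle() would then only need the cond_synchronize_rcu() call when
XFS_IGPDONE isn't set.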

What I observe right now is that the same test above runs much closer
to the baseline numbers when using the ikeep mount option, so I may need
to look into ways to mitigate the chunk allocation overhead..

Brian

> 							Thanx, Paul
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index d019c98eb839..4931daa45ca4 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -349,6 +349,16 @@  xfs_iget_recycle(
 	spin_unlock(&ip->i_flags_lock);
 	rcu_read_unlock();
 
+	/*
+	 * VFS RCU pathwalk lookups dictate the same lifecycle rules for an
+	 * inode recycle as for freeing an inode. I.e., we cannot repurpose the
+	 * inode until a grace period has elapsed from the time the previous
+	 * version of the inode was destroyed. In most cases a grace period has
+	 * already elapsed if the inode was (deferred) inactivated, but
+	 * synchronize here as a last resort to guarantee correctness.
+	 */
+	cond_synchronize_rcu(ip->i_destroy_gp);
+
 	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 	error = xfs_reinit_inode(mp, inode);
 	if (error) {
@@ -2019,6 +2029,7 @@  xfs_inodegc_queue(
 	trace_xfs_inode_set_need_inactive(ip);
 	spin_lock(&ip->i_flags_lock);
 	ip->i_flags |= XFS_NEED_INACTIVE;
+	ip->i_destroy_gp = start_poll_synchronize_rcu();
 	spin_unlock(&ip->i_flags_lock);
 
 	gc = get_cpu_ptr(mp->m_inodegc);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index c447bf04205a..2153e3edbb86 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -40,8 +40,9 @@  typedef struct xfs_inode {
 	/* Transaction and locking information. */
 	struct xfs_inode_log_item *i_itemp;	/* logging information */
 	mrlock_t		i_lock;		/* inode lock */
-	atomic_t		i_pincount;	/* inode pin count */
 	struct llist_node	i_gclist;	/* deferred inactivation list */
+	unsigned long		i_destroy_gp;	/* destroy rcugp cookie */
+	atomic_t		i_pincount;	/* inode pin count */
 
 	/*
 	 * Bitsets of inode metadata that have been checked and/or are sick.