[v11,2/2] x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node

Message ID 20211103012813.670195-2-jarkko@kernel.org (mailing list archive)
State New, archived
Series [v11,1/2] x86/sgx: Rename fallback labels in sgx_init()

Commit Message

Jarkko Sakkinen Nov. 3, 2021, 1:28 a.m. UTC
The amount of SGX memory on a system is determined by the BIOS and varies
widely between systems: from tens of MBs on desktops or VMs up to many GBs
on servers.  Just as for regular memory, it is sometimes useful to know the
amount of usable SGX memory in the system.

Introduce the CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
arch-specific attribute group, and add an attribute for the amount of SGX
memory in bytes to each NUMA node:

/sys/devices/system/node/nodeX/x86/sgx_total_bytes

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v11:
* Fix documentation and the commit message.

v10:
* Change DEVICE_ATTR_RO() to static (Greg K-H)
* Rename the attribute to sgx_total_bytes, and the attribute group
  to "x86" (Dave).
* Add a new config flag HAVE_ARCH_NODE_DEV_GROUP to identify whether
  an arch exports an arch-specific attribute group for NUMA nodes.

v9:
* Fix racy initialization of sysfs attributes:
  https://lore.kernel.org/linux-sgx/YXOsx8SvFJV5R7lU@kroah.com/

v8:
* Fix a bug in sgx_numa_init(): node->dev should be only set after
  sysfs_create_group().  Otherwise, sysfs_remove_group() will issue a
  warning in sgx_numa_exit(), when sgx_create_group() is unsuccessful,
  because the group does not exist.

v7:
* Shorten memory_size to size. The prefix makes the name only longer
  but does not clarify things more than "size" would.
* Use device_attribute instead of kobj_attribute.
* Use named attribute group instead of creating raw kobject just for
  the "sgx" subdirectory.

v6:
* Initialize node->size to zero in sgx_setup_epc_section(), when the
  node is first accessed.

v5:
* A new patch based on the discussion on
  https://lore.kernel.org/linux-sgx/3a7cab4115b4f902f3509ad8652e616b91703e1d.camel@kernel.org/T/#t
---
 Documentation/ABI/stable/sysfs-devices-node |  6 ++++
 arch/Kconfig                                |  4 +++
 arch/x86/Kconfig                            |  1 +
 arch/x86/kernel/cpu/sgx/main.c              | 31 +++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h               |  2 ++
 drivers/base/node.c                         | 13 ++++++++-
 include/linux/numa.h                        |  4 +++
 7 files changed, 60 insertions(+), 1 deletion(-)
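
For readers who want to consume the new attribute, here is a minimal userspace sketch of reading it. The path comes from the commit message; the function names parse_total_bytes() and read_sgx_total_bytes() are hypothetical helpers made up for illustration, and the sketch assumes the file holds a single decimal number followed by a newline, as the sysfs_emit() format string in the patch suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a "%lu\n" style sysfs attribute.
 * Returns 0 and stores the value in *out, or -1 on malformed input.
 * (Hypothetical helper, not part of the patch.)
 */
int parse_total_bytes(const char *text, unsigned long long *out)
{
	char *end;
	unsigned long long val;

	assert(out != NULL);

	/* Require a leading digit so that signs and whitespace are rejected. */
	if (!text || *text < '0' || *text > '9')
		return -1;

	errno = 0;
	val = strtoull(text, &end, 10);
	if (errno || (*end != '\n' && *end != '\0'))
		return -1;

	*out = val;
	return 0;
}

/* Read the attribute for one NUMA node; returns -1 if it is unavailable. */
int read_sgx_total_bytes(int node, unsigned long long *out)
{
	char path[128];
	char buf[64];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/x86/sgx_total_bytes", node);

	f = fopen(path, "r");
	if (!f)
		return -1;	/* no such node, or kernel without the attribute */

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_total_bytes(buf, out);
}
```

A caller would loop over the node directories it finds under /sys/devices/system/node and treat a -1 return as "no SGX memory reported for this node".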

Comments

Greg KH Nov. 3, 2021, 8:22 a.m. UTC | #1
On Wed, Nov 03, 2021 at 03:28:13AM +0200, Jarkko Sakkinen wrote:
> The amount of SGX memory on the system is determined by the BIOS and it
> varies wildly between systems.  It can be from dozens of MB's on desktops
> or VM's, up to many GB's on servers.  Just like for regular memory, it is
> sometimes useful to know the amount of usable SGX memory in the system.
> 
> Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an arch
> specific attribute group, and add an attribute for the amount of SGX
> memory in bytes to each NUMA node:
> 
> /sys/devices/system/node/nodeX/x86/sgx_total_bytes
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> ---
> v11:
> * Fix documentation and the commit message.
> 
> v10:
> * Change DEVICE_ATTR_RO() to static (Greg K-H)
> * Change the attribute name as sgx_total_bytes, and attribute group
>   name as "x86" (Dave).
> * Add a new config flag HAVE_ARCH_NODE_DEV_GROUP to identify, whether
>   an arch exports arch specific attribute group for NUMA nodes.
> 
> v9:
> * Fix racy initialization of sysfs attributes:
>   https://lore.kernel.org/linux-sgx/YXOsx8SvFJV5R7lU@kroah.com/
> 
> v8:
> * Fix a bug in sgx_numa_init(): node->dev should be only set after
>   sysfs_create_group().  Otherwise, sysfs_remove_group() will issue a
>   warning in sgx_numa_exit(), when sgx_create_group() is unsuccessful,
>   because the group does not exist.
> 
> v7:
> * Shorten memory_size to size. The prefix makes the name only longer
>   but does not clarify things more than "size" would.
> * Use device_attribute instead of kobj_attribute.
> * Use named attribute group instead of creating raw kobject just for
>   the "sgx" subdirectory.
> 
> v6:
> * Initialize node->size to zero in sgx_setup_epc_section(), when the
>   node is first accessed.
> 
> v5
> * A new patch based on the discussion on
>   https://lore.kernel.org/linux-sgx/3a7cab4115b4f902f3509ad8652e616b91703e1d.camel@kernel.org/T/#t
> ---
>  Documentation/ABI/stable/sysfs-devices-node |  6 ++++
>  arch/Kconfig                                |  4 +++
>  arch/x86/Kconfig                            |  1 +
>  arch/x86/kernel/cpu/sgx/main.c              | 31 +++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h               |  2 ++
>  drivers/base/node.c                         | 13 ++++++++-
>  include/linux/numa.h                        |  4 +++
>  7 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 484fc04bcc25..8db67aa472f1 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -176,3 +176,9 @@ Contact:	Keith Busch <keith.busch@intel.com>
>  Description:
>  		The cache write policy: 0 for write-back, 1 for write-through,
>  		other or unknown.
> +
> +What:		/sys/devices/system/node/nodeX/x86/sgx_total_bytes
> +Date:		November 2021
> +Contact:	Jarkko Sakkinen <jarkko@kernel.org>
> +Description:
> +		The total amount of SGX physical memory in bytes.
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 98db63496bab..ca5d75b5a427 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1285,6 +1285,10 @@ config ARCH_HAS_ELFCORE_COMPAT
>  config ARCH_HAS_PARANOID_L1D_FLUSH
>  	bool
>  
> +# Select, if arch has a named attribute group bound to NUMA device nodes.
> +config HAVE_ARCH_NODE_DEV_GROUP
> +	bool
> +
>  source "kernel/gcov/Kconfig"
>  
>  source "scripts/gcc-plugins/Kconfig"
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 421fa9e38c60..8503c3bdf63f 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -266,6 +266,7 @@ config X86
>  	select HAVE_ARCH_KCSAN			if X86_64
>  	select X86_FEATURE_NAMES		if PROC_FS
>  	select PROC_PID_ARCH_STATUS		if PROC_FS
> +	select HAVE_ARCH_NODE_DEV_GROUP		if X86_SGX
>  	imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
>  
>  config INSTRUCTION_DECODER
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index a6e313f1a82d..02ebb233c511 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -714,9 +714,12 @@ static bool __init sgx_page_cache_init(void)
>  			spin_lock_init(&sgx_numa_nodes[nid].lock);
>  			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
>  			node_set(nid, sgx_numa_mask);
> +			sgx_numa_nodes[nid].size = 0;
>  		}
>  
>  		sgx_epc_sections[i].node =  &sgx_numa_nodes[nid];
> +		sgx_numa_nodes[nid].dev = &node_devices[nid]->dev;

You are saving off a pointer to an object without incrementing the
reference count of it?  What prevents it from going away in the future?

> +		sgx_numa_nodes[nid].size += size;
>  
>  		sgx_nr_epc_sections++;
>  	}
> @@ -790,6 +793,34 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
>  }
>  EXPORT_SYMBOL_GPL(sgx_set_attribute);
>  
> +#ifdef CONFIG_NUMA
> +static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	unsigned long size = 0;
> +	int nid;
> +
> +	for (nid = 0; nid < num_possible_nodes(); nid++) {
> +		if (dev == sgx_numa_nodes[nid].dev) {

Why aren't these values all just part of the device that is being used
here?  You are walking some odd array, with no locking, and no reference
counting on the objects you are looking at, just to find the same object
that you started out with???

That's not the proper thing to do here at all, these values should be
part of the device structure that this attribute lives on, in order to
properly handle all of these reference counting and locking issues
automatically.

Please fix the design of this code first, _before_ adding new
attributes.

thanks,

greg k-h
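
The reference-counting concern raised above can be illustrated with a small userspace model. This is only a sketch: struct obj, obj_get(), obj_put() and struct slot are hypothetical names standing in for a refcounted kernel object and a table that caches a pointer to it. In the kernel the corresponding calls for a struct device would be get_device() and put_device(), and the counter would be atomic rather than a plain int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as struct device. */
struct obj {
	int refcount;	/* plain int: this model is single-threaded */
	int released;	/* set on last put; real code would free here */
};

/* Take a reference before storing the pointer anywhere long-lived. */
struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcount++;
	return o;
}

/* Drop a reference; the object is released when the last one is dropped. */
void obj_put(struct obj *o)
{
	if (!o)
		return;
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->released = 1;
}

/* A table slot that caches a pointer, as sgx_numa_nodes[nid].dev does. */
struct slot {
	struct obj *cached;
};

/* Pin the object for as long as the slot points at it. */
void slot_set(struct slot *s, struct obj *o)
{
	s->cached = obj_get(o);
}

/* Unpin and forget the object. */
void slot_clear(struct slot *s)
{
	obj_put(s->cached);
	s->cached = NULL;
}
```

With this pattern the cached pointer cannot outlive the object: even if the original owner drops its reference first, the object stays alive until slot_clear() runs.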
Jarkko Sakkinen Nov. 3, 2021, 9:49 p.m. UTC | #2
On Wed, 2021-11-03 at 09:22 +0100, Greg Kroah-Hartman wrote:
> On Wed, Nov 03, 2021 at 03:28:13AM +0200, Jarkko Sakkinen wrote:
> > The amount of SGX memory on the system is determined by the BIOS and it
> > varies wildly between systems.  It can be from dozens of MB's on desktops
> > or VM's, up to many GB's on servers.  Just like for regular memory, it is
> > sometimes useful to know the amount of usable SGX memory in the system.
> > 
> > Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an arch
> > specific attribute group, and add an attribute for the amount of SGX
> > memory in bytes to each NUMA node:
> > 
> > /sys/devices/system/node/nodeX/x86/sgx_total_bytes
> > 
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > ---
> > v11:
> > * Fix documentation and the commit message.
> > 
> > v10:
> > * Change DEVICE_ATTR_RO() to static (Greg K-H)
> > * Change the attribute name as sgx_total_bytes, and attribute group
> >   name as "x86" (Dave).
> > * Add a new config flag HAVE_ARCH_NODE_DEV_GROUP to identify, whether
> >   an arch exports arch specific attribute group for NUMA nodes.
> > 
> > v9:
> > * Fix racy initialization of sysfs attributes:
> >   https://lore.kernel.org/linux-sgx/YXOsx8SvFJV5R7lU@kroah.com/
> > 
> > v8:
> > * Fix a bug in sgx_numa_init(): node->dev should be only set after
> >   sysfs_create_group().  Otherwise, sysfs_remove_group() will issue a
> >   warning in sgx_numa_exit(), when sgx_create_group() is unsuccessful,
> >   because the group does not exist.
> > 
> > v7:
> > * Shorten memory_size to size. The prefix makes the name only longer
> >   but does not clarify things more than "size" would.
> > * Use device_attribute instead of kobj_attribute.
> > * Use named attribute group instead of creating raw kobject just for
> >   the "sgx" subdirectory.
> > 
> > v6:
> > * Initialize node->size to zero in sgx_setup_epc_section(), when the
> >   node is first accessed.
> > 
> > v5
> > * A new patch based on the discussion on
> >   https://lore.kernel.org/linux-sgx/3a7cab4115b4f902f3509ad8652e616b91703e1d.camel@kernel.org/T/#t
> > ---
> >  Documentation/ABI/stable/sysfs-devices-node |  6 ++++
> >  arch/Kconfig                                |  4 +++
> >  arch/x86/Kconfig                            |  1 +
> >  arch/x86/kernel/cpu/sgx/main.c              | 31 +++++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/sgx.h               |  2 ++
> >  drivers/base/node.c                         | 13 ++++++++-
> >  include/linux/numa.h                        |  4 +++
> >  7 files changed, 60 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> > index 484fc04bcc25..8db67aa472f1 100644
> > --- a/Documentation/ABI/stable/sysfs-devices-node
> > +++ b/Documentation/ABI/stable/sysfs-devices-node
> > @@ -176,3 +176,9 @@ Contact:    Keith Busch <keith.busch@intel.com>
> >  Description:
> >                 The cache write policy: 0 for write-back, 1 for write-through,
> >                 other or unknown.
> > +
> > +What:          /sys/devices/system/node/nodeX/x86/sgx_total_bytes
> > +Date:          November 2021
> > +Contact:       Jarkko Sakkinen <jarkko@kernel.org>
> > +Description:
> > +               The total amount of SGX physical memory in bytes.
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 98db63496bab..ca5d75b5a427 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1285,6 +1285,10 @@ config ARCH_HAS_ELFCORE_COMPAT
> >  config ARCH_HAS_PARANOID_L1D_FLUSH
> >         bool
> >  
> > +# Select, if arch has a named attribute group bound to NUMA device nodes.
> > +config HAVE_ARCH_NODE_DEV_GROUP
> > +       bool
> > +
> >  source "kernel/gcov/Kconfig"
> >  
> >  source "scripts/gcc-plugins/Kconfig"
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 421fa9e38c60..8503c3bdf63f 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -266,6 +266,7 @@ config X86
> >         select HAVE_ARCH_KCSAN                  if X86_64
> >         select X86_FEATURE_NAMES                if PROC_FS
> >         select PROC_PID_ARCH_STATUS             if PROC_FS
> > +       select HAVE_ARCH_NODE_DEV_GROUP         if X86_SGX
> >         imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
> >  
> >  config INSTRUCTION_DECODER
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index a6e313f1a82d..02ebb233c511 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -714,9 +714,12 @@ static bool __init sgx_page_cache_init(void)
> >                         spin_lock_init(&sgx_numa_nodes[nid].lock);
> >                         INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
> >                         node_set(nid, sgx_numa_mask);
> > +                       sgx_numa_nodes[nid].size = 0;
> >                 }
> >  
> >                 sgx_epc_sections[i].node =  &sgx_numa_nodes[nid];
> > +               sgx_numa_nodes[nid].dev = &node_devices[nid]->dev;
> 
> You are saving off a pointer to an object without incrementing the
> reference count of it?  What prevents it from going away in the future?

Since the arch specific attribute group is part of the "groups" of that device,
I'd presume that when sgx_total_bytes_show() is called the device has not gone
away:

static const struct attribute_group *node_dev_groups[] = {
	&node_dev_group,
#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
	&arch_node_dev_group,
#endif
	NULL,
};

I mean the "dev" parameter in sgx_total_bytes_show() is probably valid, when
sysfs framework calls it, right?

> > +               sgx_numa_nodes[nid].size += size;
> >  
> >                 sgx_nr_epc_sections++;
> >         }
> > @@ -790,6 +793,34 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
> >  }
> >  EXPORT_SYMBOL_GPL(sgx_set_attribute);
> >  
> > +#ifdef CONFIG_NUMA
> > +static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
> > +{
> > +       unsigned long size = 0;
> > +       int nid;
> > +
> > +       for (nid = 0; nid < num_possible_nodes(); nid++) {
> > +               if (dev == sgx_numa_nodes[nid].dev) {
> 
> Why aren't these values all just part of the device that is being used
> here?  You are walking some odd array, with no locking, and no reference
> counting on the objects you are looking at, just to find the same object
> that you started out with???

This code looks up the struct sgx_numa_node instance, and the array is
indexed with NUMA node ID. "dev" is only used as lookup key.

> That's not the proper thing to do here at all, these values should be
> part of the device structure that this attribute lives on, in order to
> properly handle all of these reference counting and locking issues
> automatically.

The attribute group is part of the device structure, so in that sense
it is already like this. And I think that all NUMA node specific data
(most of which is unrelated to device sysfs) is best kept in the private
struct.

> Please fix the design of this code first, _before_ adding new
> attributes.

I sincerely think it is correct now.

> 
> thanks,
> 
> greg k-h

/Jarkko
Jarkko Sakkinen Nov. 3, 2021, 9:54 p.m. UTC | #3
On Wed, 2021-11-03 at 23:49 +0200, Jarkko Sakkinen wrote:
> On Wed, 2021-11-03 at 09:22 +0100, Greg Kroah-Hartman wrote:
> > That's not the proper thing to do here at all, these values should be
> > part of the device structure that this attribute lives on, in order to
> > properly handle all of these reference counting and locking issues
> > automatically.
> 
> The attribute group is part of the device structure, so in that sense
> it is already like this. And I think that all NUMA node specific data
> (most of which is unrelated to device sysfs) is best kept in the private
> struct.

If you are concerned that the SGX driver might go away when NUMA code
calls it, I can address that concern too.

The memory management code in arch/x86/kernel/cpu/sgx is not associated
with the user space facing driver. E.g. you can use the memory manager with
KVM without even having the driver enabled in the kernel.

This includes the NUMA code in the SGX subsystem. Its life cycle is the
same as the power cycle. It is guaranteed that at any possible time during
the power cycle, when that sysfs callback is called, all the data is valid.
That's why it is safe for it to be part of the attribute groups of the NUMA
device.


/Jarkko
Greg KH Nov. 4, 2021, 7:25 a.m. UTC | #4
On Wed, Nov 03, 2021 at 11:49:40PM +0200, Jarkko Sakkinen wrote:
> On Wed, 2021-11-03 at 09:22 +0100, Greg Kroah-Hartman wrote:
> > On Wed, Nov 03, 2021 at 03:28:13AM +0200, Jarkko Sakkinen wrote:
> > > The amount of SGX memory on the system is determined by the BIOS and it
> > > varies wildly between systems.  It can be from dozens of MB's on desktops
> > > or VM's, up to many GB's on servers.  Just like for regular memory, it is
> > > sometimes useful to know the amount of usable SGX memory in the system.
> > > 
> > > Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an arch
> > > specific attribute group, and add an attribute for the amount of SGX
> > > memory in bytes to each NUMA node:
> > > 
> > > /sys/devices/system/node/nodeX/x86/sgx_total_bytes
> > > 
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > ---
> > > v11:
> > > * Fix documentation and the commit message.
> > > 
> > > v10:
> > > * Change DEVICE_ATTR_RO() to static (Greg K-H)
> > > * Change the attribute name as sgx_total_bytes, and attribute group
> > >   name as "x86" (Dave).
> > > * Add a new config flag HAVE_ARCH_NODE_DEV_GROUP to identify, whether
> > >   an arch exports arch specific attribute group for NUMA nodes.
> > > 
> > > v9:
> > > * Fix racy initialization of sysfs attributes:
> > >   https://lore.kernel.org/linux-sgx/YXOsx8SvFJV5R7lU@kroah.com/
> > > 
> > > v8:
> > > * Fix a bug in sgx_numa_init(): node->dev should be only set after
> > >   sysfs_create_group().  Otherwise, sysfs_remove_group() will issue a
> > >   warning in sgx_numa_exit(), when sgx_create_group() is unsuccessful,
> > >   because the group does not exist.
> > > 
> > > v7:
> > > * Shorten memory_size to size. The prefix makes the name only longer
> > >   but does not clarify things more than "size" would.
> > > * Use device_attribute instead of kobj_attribute.
> > > * Use named attribute group instead of creating raw kobject just for
> > >   the "sgx" subdirectory.
> > > 
> > > v6:
> > > * Initialize node->size to zero in sgx_setup_epc_section(), when the
> > >   node is first accessed.
> > > 
> > > v5
> > > * A new patch based on the discussion on
> > >   https://lore.kernel.org/linux-sgx/3a7cab4115b4f902f3509ad8652e616b91703e1d.camel@kernel.org/T/#t
> > > ---
> > >  Documentation/ABI/stable/sysfs-devices-node |  6 ++++
> > >  arch/Kconfig                                |  4 +++
> > >  arch/x86/Kconfig                            |  1 +
> > >  arch/x86/kernel/cpu/sgx/main.c              | 31 +++++++++++++++++++++
> > >  arch/x86/kernel/cpu/sgx/sgx.h               |  2 ++
> > >  drivers/base/node.c                         | 13 ++++++++-
> > >  include/linux/numa.h                        |  4 +++
> > >  7 files changed, 60 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> > > index 484fc04bcc25..8db67aa472f1 100644
> > > --- a/Documentation/ABI/stable/sysfs-devices-node
> > > +++ b/Documentation/ABI/stable/sysfs-devices-node
> > > @@ -176,3 +176,9 @@ Contact:    Keith Busch <keith.busch@intel.com>
> > >  Description:
> > >                 The cache write policy: 0 for write-back, 1 for write-through,
> > >                 other or unknown.
> > > +
> > > +What:          /sys/devices/system/node/nodeX/x86/sgx_total_bytes
> > > +Date:          November 2021
> > > +Contact:       Jarkko Sakkinen <jarkko@kernel.org>
> > > +Description:
> > > +               The total amount of SGX physical memory in bytes.
> > > diff --git a/arch/Kconfig b/arch/Kconfig
> > > index 98db63496bab..ca5d75b5a427 100644
> > > --- a/arch/Kconfig
> > > +++ b/arch/Kconfig
> > > @@ -1285,6 +1285,10 @@ config ARCH_HAS_ELFCORE_COMPAT
> > >  config ARCH_HAS_PARANOID_L1D_FLUSH
> > >         bool
> > >  
> > > +# Select, if arch has a named attribute group bound to NUMA device nodes.
> > > +config HAVE_ARCH_NODE_DEV_GROUP
> > > +       bool
> > > +
> > >  source "kernel/gcov/Kconfig"
> > >  
> > >  source "scripts/gcc-plugins/Kconfig"
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index 421fa9e38c60..8503c3bdf63f 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -266,6 +266,7 @@ config X86
> > >         select HAVE_ARCH_KCSAN                  if X86_64
> > >         select X86_FEATURE_NAMES                if PROC_FS
> > >         select PROC_PID_ARCH_STATUS             if PROC_FS
> > > +       select HAVE_ARCH_NODE_DEV_GROUP         if X86_SGX
> > >         imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
> > >  
> > >  config INSTRUCTION_DECODER
> > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > > index a6e313f1a82d..02ebb233c511 100644
> > > --- a/arch/x86/kernel/cpu/sgx/main.c
> > > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > > @@ -714,9 +714,12 @@ static bool __init sgx_page_cache_init(void)
> > >                         spin_lock_init(&sgx_numa_nodes[nid].lock);
> > >                         INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
> > >                         node_set(nid, sgx_numa_mask);
> > > +                       sgx_numa_nodes[nid].size = 0;
> > >                 }
> > >  
> > >                 sgx_epc_sections[i].node =  &sgx_numa_nodes[nid];
> > > +               sgx_numa_nodes[nid].dev = &node_devices[nid]->dev;
> > 
> > You are saving off a pointer to an object without incrementing the
> > reference count of it?  What prevents it from going away in the future?
> 
> Since the arch specific attribute group is part of the "groups" of that device,
> I'd presume that when sgx_total_bytes_show() is called the device has not gone
> away:
> 
> static const struct attribute_group *node_dev_groups[] = {
> 	&node_dev_group,
> #ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
> 	&arch_node_dev_group,
> #endif
> 	NULL,
> };

Yes, that is true for the dev pointer passed to your callback, but what
about the dev pointers in this random array you are looping over?

> I mean the "dev" parameter in sgx_total_bytes_show() is probably valid, when
> sysfs framework calls it, right?

Yes.

> > > +               sgx_numa_nodes[nid].size += size;
> > >  
> > >                 sgx_nr_epc_sections++;
> > >         }
> > > @@ -790,6 +793,34 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
> > >  }
> > >  EXPORT_SYMBOL_GPL(sgx_set_attribute);
> > >  
> > > +#ifdef CONFIG_NUMA
> > > +static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
> > > +{
> > > +       unsigned long size = 0;
> > > +       int nid;
> > > +
> > > +       for (nid = 0; nid < num_possible_nodes(); nid++) {
> > > +               if (dev == sgx_numa_nodes[nid].dev) {
> > 
> > Why aren't these values all just part of the device that is being used
> > here?  You are walking some odd array, with no locking, and no reference
> > counting on the objects you are looking at, just to find the same object
> > that you started out with???
> 
> This code looks up the struct sgx_numa_node instance, and the array is
> indexed with NUMA node ID. "dev" is only used as lookup key.

And why isn't that pointer/key properly reference counted?

What happens when the memory numa node id goes away?  Who removes that
item from the list?

> > That's not the proper thing to do here at all, these values should be
> > part of the device structure that this attribute lives on, in order to
> > properly handle all of these reference counting and locking issues
> > automatically.
> 
> The attribute group is part of the device structure, so in that sense
> it is already like this. And I think that all NUMA node specific data
> (most of which is unrelated to device sysfs) is best kept in the private
> struct.

No, the data associated with a specific numa node needs to be IN the
structure that it belongs to, not in a random other data structure that
is not locked or reference counted at all.  That is not ok.

> > Please fix the design of this code first, _before_ adding new
> > attributes.
> 
> I sincerely think it is correct now.

I disagree.  Where is the locking and reference count logic that keeps
these different structures in sync properly?  Why are there different
structures at all?

Yes, you are just working with what you have now, but you have exposed
that what we have now is incorrect, and trying to use it in the manner
you are wanting to here is also incorrect.

thanks,

greg k-h
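
The design being asked for here, where per-node data lives in the same structure as the device the attribute hangs off, is usually expressed with container_of(). The following is a userspace sketch under stated assumptions: the struct definitions are cut-down stand-ins rather than the real kernel ones, and the sgx_total_bytes field and node_sgx_total_bytes() helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down stand-in; not the real struct device. */
struct device {
	int id;
};

/* Stand-in for a per-node structure that embeds its device. */
struct node {
	struct device dev;		/* embedded: one object, one lifetime */
	unsigned long sgx_total_bytes;	/* hypothetical per-node field */
};

/* Recover the enclosing node from the device passed to a show() callback. */
struct node *to_node(struct device *dev)
{
	assert(dev != NULL);
	return container_of(dev, struct node, dev);
}

/* What a show() callback would read, with no side table and no lookup. */
unsigned long node_sgx_total_bytes(struct device *dev)
{
	return to_node(dev)->sgx_total_bytes;
}
```

Because the data and the device are one allocation, whatever keeps the device alive while the callback runs also keeps the data alive, which is the "automatic" handling the review refers to.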
Jarkko Sakkinen Nov. 5, 2021, 10:14 p.m. UTC | #5
On Thu, 2021-11-04 at 08:25 +0100, Greg Kroah-Hartman wrote:
> > static const struct attribute_group *node_dev_groups[] = {
> >         &node_dev_group,
> > #ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
> >         &arch_node_dev_group,
> > #endif
> >         NULL,
> > };
> 
> Yes, that is true for the dev pointer passed to your callback, but what
> about the dev pointers in this random array you are looping over?

Right. I got what you are saying.

I think the most legit place to mark an entry in this array would be
just *before* device_register() in register_node(). It's different from
hugetlb_register_node() because hugetlb code adds its attribute group
with sysfs_create_group().

Similarly, the legit place to unmark an entry would be in
unregister_node(), right after device_unregister().

After writing this I realized something: the device ID is the same
as NUMA node ID. This means that I can rewrite my callback as

static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%lu\n", sgx_numa_nodes[dev->id].size);
}

I.e., there is no need to maintain a device pointer.

/Jarkko
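
The difference between the posted lookup and the simplification proposed in this message can be modelled in userspace as follows. This is a sketch only: MAX_NODES, struct sgx_numa_node_model, size_by_scan() and size_by_id() are hypothetical names, and it assumes, as the message states, that the device ID equals the NUMA node ID.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical bound for this model */

/* Cut-down stand-in; not the real struct device. */
struct device {
	int id;
};

struct sgx_numa_node_model {
	struct device *dev;	/* only needed by the scanning variant */
	unsigned long size;
};

/* Posted approach: scan the table for a matching device pointer. */
unsigned long size_by_scan(const struct sgx_numa_node_model *tbl,
			   const struct device *dev)
{
	for (int nid = 0; nid < MAX_NODES; nid++)
		if (tbl[nid].dev == dev)
			return tbl[nid].size;
	return 0;
}

/* Proposed approach: the device ID is the node ID, so index directly. */
unsigned long size_by_id(const struct sgx_numa_node_model *tbl,
			 const struct device *dev)
{
	assert(dev->id >= 0 && dev->id < MAX_NODES);
	return tbl[dev->id].size;
}
```

Both functions return the same value for a registered node, but the indexed variant never dereferences or compares a stored device pointer, so the table no longer needs the dev member at all.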
Patch

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 484fc04bcc25..8db67aa472f1 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -176,3 +176,9 @@  Contact:	Keith Busch <keith.busch@intel.com>
 Description:
 		The cache write policy: 0 for write-back, 1 for write-through,
 		other or unknown.
+
+What:		/sys/devices/system/node/nodeX/x86/sgx_total_bytes
+Date:		November 2021
+Contact:	Jarkko Sakkinen <jarkko@kernel.org>
+Description:
+		The total amount of SGX physical memory in bytes.
diff --git a/arch/Kconfig b/arch/Kconfig
index 98db63496bab..ca5d75b5a427 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1285,6 +1285,10 @@  config ARCH_HAS_ELFCORE_COMPAT
 config ARCH_HAS_PARANOID_L1D_FLUSH
 	bool
 
+# Select, if arch has a named attribute group bound to NUMA device nodes.
+config HAVE_ARCH_NODE_DEV_GROUP
+	bool
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 421fa9e38c60..8503c3bdf63f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -266,6 +266,7 @@  config X86
 	select HAVE_ARCH_KCSAN			if X86_64
 	select X86_FEATURE_NAMES		if PROC_FS
 	select PROC_PID_ARCH_STATUS		if PROC_FS
+	select HAVE_ARCH_NODE_DEV_GROUP		if X86_SGX
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
 
 config INSTRUCTION_DECODER
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index a6e313f1a82d..02ebb233c511 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -714,9 +714,12 @@  static bool __init sgx_page_cache_init(void)
 			spin_lock_init(&sgx_numa_nodes[nid].lock);
 			INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
 			node_set(nid, sgx_numa_mask);
+			sgx_numa_nodes[nid].size = 0;
 		}
 
 		sgx_epc_sections[i].node =  &sgx_numa_nodes[nid];
+		sgx_numa_nodes[nid].dev = &node_devices[nid]->dev;
+		sgx_numa_nodes[nid].size += size;
 
 		sgx_nr_epc_sections++;
 	}
@@ -790,6 +793,34 @@  int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+#ifdef CONFIG_NUMA
+static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	unsigned long size = 0;
+	int nid;
+
+	for (nid = 0; nid < num_possible_nodes(); nid++) {
+		if (dev == sgx_numa_nodes[nid].dev) {
+			size = sgx_numa_nodes[nid].size;
+			break;
+		}
+	}
+
+	return sysfs_emit(buf, "%lu\n", size);
+}
+static DEVICE_ATTR_RO(sgx_total_bytes);
+
+static struct attribute *arch_node_dev_attrs[] = {
+	&dev_attr_sgx_total_bytes.attr,
+	NULL,
+};
+
+const struct attribute_group arch_node_dev_group = {
+	.name = "x86",
+	.attrs = arch_node_dev_attrs,
+};
+#endif /* CONFIG_NUMA */
+
 static int __init sgx_init(void)
 {
 	int ret;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 4628acec0009..1de8c627a286 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -39,6 +39,8 @@  struct sgx_epc_page {
  */
 struct sgx_numa_node {
 	struct list_head free_page_list;
+	struct device *dev;
+	unsigned long size;
 	spinlock_t lock;
 };
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 4a4ae868ad9f..8da977895b9a 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -565,7 +565,18 @@  static struct attribute *node_dev_attrs[] = {
 	&dev_attr_vmstat.attr,
 	NULL
 };
-ATTRIBUTE_GROUPS(node_dev);
+
+static const struct attribute_group node_dev_group = {
+	.attrs = node_dev_attrs,
+};
+
+static const struct attribute_group *node_dev_groups[] = {
+	&node_dev_group,
+#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
+	&arch_node_dev_group,
+#endif
+	NULL,
+};
 
 #ifdef CONFIG_HUGETLBFS
 /*
diff --git a/include/linux/numa.h b/include/linux/numa.h
index cb44cfe2b725..59df211d051f 100644
--- a/include/linux/numa.h
+++ b/include/linux/numa.h
@@ -58,4 +58,8 @@  static inline int phys_to_target_node(u64 start)
 }
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
+extern const struct attribute_group arch_node_dev_group;
+#endif
+
 #endif /* _LINUX_NUMA_H */
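
The drivers/base/node.c hunk above replaces ATTRIBUTE_GROUPS() with an explicit NULL-terminated array so that an optional arch group can be compiled in. A userspace model of that pattern is sketched below; struct group_model, count_groups() and find_group() are hypothetical, and the config macro is defined by hand here purely to mimic an arch that selects the flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: defined by hand to mimic an arch selecting the Kconfig flag. */
#define CONFIG_HAVE_ARCH_NODE_DEV_GROUP 1

/* Cut-down stand-in for struct attribute_group; only the name is modelled. */
struct group_model {
	const char *name;	/* NULL: attributes sit directly in the device dir */
};

static const struct group_model node_dev_group = { .name = NULL };

#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
static const struct group_model arch_node_dev_group = { .name = "x86" };
#endif

/* NULL-terminated, like the groups array handed to the driver core. */
static const struct group_model *node_dev_groups[] = {
	&node_dev_group,
#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
	&arch_node_dev_group,
#endif
	NULL,
};

/* Count entries up to the NULL terminator. */
size_t count_groups(const struct group_model *const *groups)
{
	size_t n = 0;

	assert(groups != NULL);
	while (groups[n])
		n++;
	return n;
}

/* Return the named group (the sysfs subdirectory), or NULL if absent. */
const struct group_model *find_group(const struct group_model *const *groups,
				     const char *name)
{
	for (size_t i = 0; groups[i]; i++)
		if (groups[i]->name && strcmp(groups[i]->name, name) == 0)
			return groups[i];
	return NULL;
}
```

Removing the #define at the top drops the "x86" entry at compile time, leaving a one-element array, which mirrors how architectures that do not select the flag are unaffected.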