
[v3,updated] mm/demotion: Expose memory tier details via sysfs

Message ID 20220830081736.119281-1-aneesh.kumar@linux.ibm.com (mailing list archive)
State New
Series [v3,updated] mm/demotion: Expose memory tier details via sysfs

Commit Message

Aneesh Kumar K.V Aug. 30, 2022, 8:17 a.m. UTC
This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
related details can be found. All allocated memory tiers will be listed
there as /sys/devices/virtual/memory_tiering/memory_tierN/

The nodes which are part of a specific memory tier can be listed via
/sys/devices/virtual/memory_tiering/memory_tierN/nodes

The directory hierarchy looks like:
:/sys/devices/virtual/memory_tiering$ tree memory_tier4/
memory_tier4/
├── nodes
├── subsystem -> ../../../../bus/memory_tiering
└── uevent

All toptier nodes are listed via
/sys/devices/virtual/memory_tiering/toptier_nodes

:/sys/devices/virtual/memory_tiering$ cat toptier_nodes
0,2
:/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
0,2
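
For illustration only (not part of this patch), a minimal userspace C
sketch that walks the interface above and prints each tier's node list:

#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	const char *base = "/sys/devices/virtual/memory_tiering";
	char path[256], nodes[256];
	struct dirent *de;
	FILE *f;
	DIR *dir = opendir(base);

	if (!dir)
		return 1;
	while ((de = readdir(dir)) != NULL) {
		/* only the memory_tierN entries have a "nodes" file */
		if (strncmp(de->d_name, "memory_tier", strlen("memory_tier")))
			continue;
		snprintf(path, sizeof(path), "%s/%s/nodes", base, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(nodes, sizeof(nodes), f))
			printf("%s: %s", de->d_name, nodes);
		fclose(f);
	}
	closedir(dir);
	return 0;
}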

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---

Changes from v2:
* update macro to static inline
* Fix build error with CONFIG_MIGRATION disabled
* drop abstract_distance
* update commit message


 .../ABI/testing/sysfs-kernel-mm-memory-tiers  |  35 ++++
 mm/memory-tiers.c                             | 154 +++++++++++++++---
 2 files changed, 167 insertions(+), 22 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-memory-tiers

Comments

Huang, Ying Sept. 1, 2022, 7:01 a.m. UTC | #1
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
> related details can be found. All allocated memory tiers will be listed
> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>
> The nodes which are part of a specific memory tier can be listed via
> /sys/devices/virtual/memory_tiering/memory_tierN/nodes

I think "memory_tier" is a better subsystem/bus name than
memory_tiering.  Because we have a set of memory_tierN devices inside.
"memory_tier" sounds more natural.  I know this is subjective, just my
preference.

>
> A directory hierarchy looks like
> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
> memory_tier4/
> ├── nodes
> ├── subsystem -> ../../../../bus/memory_tiering
> └── uevent
>
> All toptier nodes are listed via
> /sys/devices/virtual/memory_tiering/toptier_nodes
>
> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
> 0,2
> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
> 0,2

I don't think that it is a good idea to show toptier information in the
user space interface, because it is just an in-kernel implementation
detail.  Now, we only promote pages from !toptier to toptier, but there
may be multiple memory tiers within both toptier and !toptier, and we may
change the implementation in the future.  For example, we may promote
pages from DRAM to HBM in the future.

Do we need a way to show the default memory tier in sysfs?  That is, the
memory tier that the DRAM nodes belong to.

Best Regards,
Huang, Ying

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>
> Changes from v2:
> * update macro to static inline
> * Fix build error with CONFIG_MIGRATION disabled
> * drop abstract_distance
> * update commit message
>
>

[snip]
Aneesh Kumar K.V Sept. 1, 2022, 8:24 a.m. UTC | #2
On 9/1/22 12:31 PM, Huang, Ying wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> 
>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>> related details can be found. All allocated memory tiers will be listed
>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>
>> The nodes which are part of a specific memory tier can be listed via
>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
> 
> I think "memory_tier" is a better subsystem/bus name than
> memory_tiering.  Because we have a set of memory_tierN devices inside.
> "memory_tier" sounds more natural.  I know this is subjective, just my
> preference.
> 
>>
>> A directory hierarchy looks like
>> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
>> memory_tier4/
>> ├── nodes
>> ├── subsystem -> ../../../../bus/memory_tiering
>> └── uevent
>>
>> All toptier nodes are listed via
>> /sys/devices/virtual/memory_tiering/toptier_nodes
>>
>> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
>> 0,2
>> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
>> 0,2
> 
> I don't think that it is a good idea to show toptier information in user
> space interface.  Because it is just a in kernel implementation
> details.  Now, we only promote pages from !toptier to toptier.  But
> there may be multiple memory tiers in toptier and !toptier, we may
> change the implementation in the future.  For example, we may promote
> pages from DRAM to HBM in the future.
> 


In the case you describe above and others, we will always have a list of
NUMA nodes from which memory promotion is not done.
/sys/devices/virtual/memory_tiering/toptier_nodes shows that list.



> Do we need a way to show the default memory tier in sysfs?  That is, the
> memory tier that the DRAM nodes belong to.
> 

I will hold off adding that until we have support for modifying memory tier details from
userspace. That is when userspace would want to know about the default memory tier.

For now, the user interface is a simple hierarchy of memory tiers, their associated
nodes, and the list of nodes from which promotion is not done.

-aneesh
Huang, Ying Sept. 2, 2022, 12:29 a.m. UTC | #3
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/1/22 12:31 PM, Huang, Ying wrote:
>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> 
>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>> related details can be found. All allocated memory tiers will be listed
>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>
>>> The nodes which are part of a specific memory tier can be listed via
>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>> 
>> I think "memory_tier" is a better subsystem/bus name than
>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>> "memory_tier" sounds more natural.  I know this is subjective, just my
>> preference.
>> 
>>>
>>> A directory hierarchy looks like
>>> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
>>> memory_tier4/
>>> ├── nodes
>>> ├── subsystem -> ../../../../bus/memory_tiering
>>> └── uevent
>>>
>>> All toptier nodes are listed via
>>> /sys/devices/virtual/memory_tiering/toptier_nodes
>>>
>>> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
>>> 0,2
>>> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
>>> 0,2
>> 
>> I don't think that it is a good idea to show toptier information in user
>> space interface.  Because it is just a in kernel implementation
>> details.  Now, we only promote pages from !toptier to toptier.  But
>> there may be multiple memory tiers in toptier and !toptier, we may
>> change the implementation in the future.  For example, we may promote
>> pages from DRAM to HBM in the future.
>> 
>
>
> In the case you describe above and others, we will always have a list of
> NUMA nodes from which memory promotion is not done.
> /sys/devices/virtual/memory_tiering/toptier_nodes shows that list.

I don't think we will need that interface if we don't restrict promotion
in the future.  For example, userspace can just check the memory tier with
the smallest number.

TBH, I don't know why we need that interface.  What is it for?  We
don't want to expose unnecessary information that restricts our in-kernel
implementation in the future.

So, please remove that interface, at least until we have discussed it
thoroughly.

>> Do we need a way to show the default memory tier in sysfs?  That is, the
>> memory tier that the DRAM nodes belong to.
>> 
>
> I will hold adding that until we have support for modifying memory tier details from
> userspace. That is when userspace would want to know about the default memory tier. 
>
> For now, the user interface is a simpler hierarchy of memory tiers, it's associated
> nodes and the list of nodes from which promotion is not done.

OK.

Best Regards,
Huang, Ying
Wei Xu Sept. 2, 2022, 5:09 a.m. UTC | #4
On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>
> > On 9/1/22 12:31 PM, Huang, Ying wrote:
> >> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >>
> >>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
> >>> related details can be found. All allocated memory tiers will be listed
> >>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
> >>>
> >>> The nodes which are part of a specific memory tier can be listed via
> >>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
> >>
> >> I think "memory_tier" is a better subsystem/bus name than
> >> memory_tiering.  Because we have a set of memory_tierN devices inside.
> >> "memory_tier" sounds more natural.  I know this is subjective, just my
> >> preference.
> >>
> >>>
> >>> A directory hierarchy looks like
> >>> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
> >>> memory_tier4/
> >>> ├── nodes
> >>> ├── subsystem -> ../../../../bus/memory_tiering
> >>> └── uevent
> >>>
> >>> All toptier nodes are listed via
> >>> /sys/devices/virtual/memory_tiering/toptier_nodes
> >>>
> >>> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
> >>> 0,2
> >>> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
> >>> 0,2
> >>
> >> I don't think that it is a good idea to show toptier information in user
> >> space interface.  Because it is just a in kernel implementation
> >> details.  Now, we only promote pages from !toptier to toptier.  But
> >> there may be multiple memory tiers in toptier and !toptier, we may
> >> change the implementation in the future.  For example, we may promote
> >> pages from DRAM to HBM in the future.
> >>
> >
> >
> > In the case you describe above and others, we will always have a list of
> > NUMA nodes from which memory promotion is not done.
> > /sys/devices/virtual/memory_tiering/toptier_nodes shows that list.
>
> I don't think we will need that interface if we don't restrict promotion
> in the future.  For example, he can just check the memory tier with
> smallest number.
>
> TBH, I don't know why do we need that interface.  What is it for?  We
> don't want to expose unnecessary information to restrict our in kernel
> implementation in the future.
>
> So, please remove that interface at least before we discussing it
> thoroughly.

I have asked for this interface to allow userspace to query the list
of top-tier nodes as the targets of userspace-driven promotions.  The
idea is that demotion can gradually go down tier by tier, but we
promote hot pages directly to the top tier and bypass the intermediate
tiers, as sketched below.
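
As an illustration (a hedged sketch, not a definitive implementation): a
userspace agent could read toptier_nodes and push a hot page straight to a
top-tier node with move_pages(2).  This assumes libnuma's <numaif.h>
wrapper (link with -lnuma) and simply takes the first node in the list as
the target:

#include <numaif.h>
#include <stdio.h>

/* Read the first node listed in toptier_nodes, e.g. "0,2" -> 0. */
static int first_toptier_node(void)
{
	FILE *f = fopen("/sys/devices/virtual/memory_tiering/toptier_nodes", "r");
	int node = -1;

	if (f) {
		if (fscanf(f, "%d", &node) != 1)
			node = -1;
		fclose(f);
	}
	return node;
}

/* Promote one page-aligned hot address of the current process directly to
 * a top-tier node, bypassing intermediate tiers. */
int promote_page(void *hot_page)
{
	void *pages[1] = { hot_page };
	int nodes[1] = { first_toptier_node() };
	int status[1];

	if (nodes[0] < 0)
		return -1;
	return move_pages(0 /* current process */, 1, pages, nodes, status,
			  MPOL_MF_MOVE);
}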

Certainly, this can be viewed as a policy choice.  Given that we now
have a clearly defined memory tier hierarchy in sysfs and the
toptier_nodes content can be constructed from this memory tier
hierarchy and other information from the node sysfs interfaces, I am
fine if we want to remove toptier_nodes and keep the current memory
tier sysfs interfaces minimal.

Wei Xu

> >> Do we need a way to show the default memory tier in sysfs?  That is, the
> >> memory tier that the DRAM nodes belong to.
> >>
> >
> > I will hold adding that until we have support for modifying memory tier details from
> > userspace. That is when userspace would want to know about the default memory tier.
> >
> > For now, the user interface is a simpler hierarchy of memory tiers, it's associated
> > nodes and the list of nodes from which promotion is not done.
>
> OK.
>
> Best Regards,
> Huang, Ying
Huang, Ying Sept. 2, 2022, 5:15 a.m. UTC | #5
Wei Xu <weixugc@google.com> writes:

> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>
>> > On 9/1/22 12:31 PM, Huang, Ying wrote:
>> >> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> >>
>> >>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>> >>> related details can be found. All allocated memory tiers will be listed
>> >>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>> >>>
>> >>> The nodes which are part of a specific memory tier can be listed via
>> >>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>> >>
>> >> I think "memory_tier" is a better subsystem/bus name than
>> >> memory_tiering.  Because we have a set of memory_tierN devices inside.
>> >> "memory_tier" sounds more natural.  I know this is subjective, just my
>> >> preference.
>> >>
>> >>>
>> >>> A directory hierarchy looks like
>> >>> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
>> >>> memory_tier4/
>> >>> ├── nodes
>> >>> ├── subsystem -> ../../../../bus/memory_tiering
>> >>> └── uevent
>> >>>
>> >>> All toptier nodes are listed via
>> >>> /sys/devices/virtual/memory_tiering/toptier_nodes
>> >>>
>> >>> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
>> >>> 0,2
>> >>> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
>> >>> 0,2
>> >>
>> >> I don't think that it is a good idea to show toptier information in user
>> >> space interface.  Because it is just a in kernel implementation
>> >> details.  Now, we only promote pages from !toptier to toptier.  But
>> >> there may be multiple memory tiers in toptier and !toptier, we may
>> >> change the implementation in the future.  For example, we may promote
>> >> pages from DRAM to HBM in the future.
>> >>
>> >
>> >
>> > In the case you describe above and others, we will always have a list of
>> > NUMA nodes from which memory promotion is not done.
>> > /sys/devices/virtual/memory_tiering/toptier_nodes shows that list.
>>
>> I don't think we will need that interface if we don't restrict promotion
>> in the future.  For example, he can just check the memory tier with
>> smallest number.
>>
>> TBH, I don't know why do we need that interface.  What is it for?  We
>> don't want to expose unnecessary information to restrict our in kernel
>> implementation in the future.
>>
>> So, please remove that interface at least before we discussing it
>> thoroughly.
>
> I have asked for this interface to allow the userspace to query a list
> of top-tier nodes as the targets of userspace-driven promotions.  The
> idea is that demotion can gradually go down tier by tier, but we
> promote hot pages directly to the top-tier and bypass the immediate
> tiers.
>
> Certainly, this can be viewed as a policy choice.

Yes.  It's possible for us to change this in the future.

> Given that now we have a clearly defined memory tier hierarchy in
> sysfs and the toptier_nodes content can be constructed from this
> memory tier hierarchy and other information from the node sysfs
> interfaces, I am fine if we want to remove toptier_nodes and keep the
> current memory tier sysfs interfaces to the minimal.

Thanks!

Best Regards,
Huang, Ying

>> >> Do we need a way to show the default memory tier in sysfs?  That is, the
>> >> memory tier that the DRAM nodes belong to.
>> >>
>> >
>> > I will hold adding that until we have support for modifying memory tier details from
>> > userspace. That is when userspace would want to know about the default memory tier.
>> >
>> > For now, the user interface is a simpler hierarchy of memory tiers, it's associated
>> > nodes and the list of nodes from which promotion is not done.
>>
>> OK.
>>
>> Best Regards,
>> Huang, Ying
Aneesh Kumar K.V Sept. 2, 2022, 5:23 a.m. UTC | #6
On 9/2/22 10:39 AM, Wei Xu wrote:
> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>
>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>
>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>> related details can be found. All allocated memory tiers will be listed
>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>
>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>
>>>> I think "memory_tier" is a better subsystem/bus name than
>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>> preference.
>>>>


I missed replying to this earlier. I will keep memory_tiering as the subsystem name in v4
because we would want it to be a subsystem where all memory-tiering-related details can be found,
including memory types in the future. This is as per the discussion at

https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com

>>>>>
>>>>> A directory hierarchy looks like
>>>>> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
>>>>> memory_tier4/
>>>>> ├── nodes
>>>>> ├── subsystem -> ../../../../bus/memory_tiering
>>>>> └── uevent
>>>>>
>>>>> All toptier nodes are listed via
>>>>> /sys/devices/virtual/memory_tiering/toptier_nodes
>>>>>
>>>>> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
>>>>> 0,2
>>>>> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
>>>>> 0,2
>>>>
>>>> I don't think that it is a good idea to show toptier information in user
>>>> space interface.  Because it is just a in kernel implementation
>>>> details.  Now, we only promote pages from !toptier to toptier.  But
>>>> there may be multiple memory tiers in toptier and !toptier, we may
>>>> change the implementation in the future.  For example, we may promote
>>>> pages from DRAM to HBM in the future.
>>>>
>>>
>>>
>>> In the case you describe above and others, we will always have a list of
>>> NUMA nodes from which memory promotion is not done.
>>> /sys/devices/virtual/memory_tiering/toptier_nodes shows that list.
>>
>> I don't think we will need that interface if we don't restrict promotion
>> in the future.  For example, he can just check the memory tier with
>> smallest number.
>>
>> TBH, I don't know why do we need that interface.  What is it for?  We
>> don't want to expose unnecessary information to restrict our in kernel
>> implementation in the future.
>>
>> So, please remove that interface at least before we discussing it
>> thoroughly.
> 
> I have asked for this interface to allow the userspace to query a list
> of top-tier nodes as the targets of userspace-driven promotions.  The
> idea is that demotion can gradually go down tier by tier, but we
> promote hot pages directly to the top-tier and bypass the immediate
> tiers.
> 
> Certainly, this can be viewed as a policy choice.  Given that now we
> have a clearly defined memory tier hierarchy in sysfs and the
> toptier_nodes content can be constructed from this memory tier
> hierarchy and other information from the node sysfs interfaces, I am
> fine if we want to remove toptier_nodes and keep the current memory
> tier sysfs interfaces to the minimal.
>


OK, I can do a v4 with toptier_nodes dropped.

 
-aneesh
Huang, Ying Sept. 2, 2022, 5:40 a.m. UTC | #7
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/2/22 10:39 AM, Wei Xu wrote:
>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>
>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>
>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>
>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>> preference.
>>>>>
>
>
> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4 
> because we would want it to a susbsystem where all memory tiering related details can be found
> including memory type in the future. This is as per discussion 
>
> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com

I don't think that it's a good idea to mix 2 types of devices in one
subsystem (bus).  If my understanding is correct, that breaks the
driver core convention.

>>>>>>
>>>>>> A directory hierarchy looks like
>>>>>> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
>>>>>> memory_tier4/
>>>>>> ├── nodes
>>>>>> ├── subsystem -> ../../../../bus/memory_tiering
>>>>>> └── uevent
>>>>>>
>>>>>> All toptier nodes are listed via
>>>>>> /sys/devices/virtual/memory_tiering/toptier_nodes
>>>>>>
>>>>>> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
>>>>>> 0,2
>>>>>> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
>>>>>> 0,2
>>>>>
>>>>> I don't think that it is a good idea to show toptier information in user
>>>>> space interface.  Because it is just a in kernel implementation
>>>>> details.  Now, we only promote pages from !toptier to toptier.  But
>>>>> there may be multiple memory tiers in toptier and !toptier, we may
>>>>> change the implementation in the future.  For example, we may promote
>>>>> pages from DRAM to HBM in the future.
>>>>>
>>>>
>>>>
>>>> In the case you describe above and others, we will always have a list of
>>>> NUMA nodes from which memory promotion is not done.
>>>> /sys/devices/virtual/memory_tiering/toptier_nodes shows that list.
>>>
>>> I don't think we will need that interface if we don't restrict promotion
>>> in the future.  For example, he can just check the memory tier with
>>> smallest number.
>>>
>>> TBH, I don't know why do we need that interface.  What is it for?  We
>>> don't want to expose unnecessary information to restrict our in kernel
>>> implementation in the future.
>>>
>>> So, please remove that interface at least before we discussing it
>>> thoroughly.
>> 
>> I have asked for this interface to allow the userspace to query a list
>> of top-tier nodes as the targets of userspace-driven promotions.  The
>> idea is that demotion can gradually go down tier by tier, but we
>> promote hot pages directly to the top-tier and bypass the immediate
>> tiers.
>> 
>> Certainly, this can be viewed as a policy choice.  Given that now we
>> have a clearly defined memory tier hierarchy in sysfs and the
>> toptier_nodes content can be constructed from this memory tier
>> hierarchy and other information from the node sysfs interfaces, I am
>> fine if we want to remove toptier_nodes and keep the current memory
>> tier sysfs interfaces to the minimal.
>>
>
>
> Ok I can do a v4 with toptier_nodes dropped.

Thanks!

Best Regards,
Huang, Ying
Aneesh Kumar K.V Sept. 2, 2022, 5:46 a.m. UTC | #8
On 9/2/22 11:10 AM, Huang, Ying wrote:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> 
>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>
>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>
>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>
>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>
>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>
>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>> preference.
>>>>>>
>>
>>
>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4 
>> because we would want it to a susbsystem where all memory tiering related details can be found
>> including memory type in the future. This is as per discussion 
>>
>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
> 
> I don't think that it's a good idea to mix 2 types of devices in one
> subsystem (bus).  If my understanding were correct, that breaks the
> driver core convention.
> 

All these are virtual devices. I am not sure I follow what you mean by 2 types of devices.
memory_tiering is a subsystem that represents all the details w.r.t. memory tiering. It shows
details of memory tiers and can possibly contain details of different memory types.

-aneesh
Huang, Ying Sept. 2, 2022, 6:12 a.m. UTC | #9
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/2/22 11:10 AM, Huang, Ying wrote:
>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> 
>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>
>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>
>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>
>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>
>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>> preference.
>>>>>>>
>>>
>>>
>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4 
>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>> including memory type in the future. This is as per discussion 
>>>
>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>> 
>> I don't think that it's a good idea to mix 2 types of devices in one
>> subsystem (bus).  If my understanding were correct, that breaks the
>> driver core convention.
>> 
>
> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
> details of memory tiers and can possibly contain details of different memory types .

IMHO, memory_tier and memory_type are 2 kinds of devices.  They have
almost totally different attributes (sysfs files).  So, we should create
2 buses for them, each with its own attribute group.  "virtual" itself
isn't a subsystem.

Best Regards,
Huang, Ying
Aneesh Kumar K.V Sept. 2, 2022, 6:31 a.m. UTC | #10
On 9/2/22 11:42 AM, Huang, Ying wrote:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> 
>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>
>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>
>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>
>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>
>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>
>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>> preference.
>>>>>>>>
>>>>
>>>>
>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4 
>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>> including memory type in the future. This is as per discussion 
>>>>
>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>
>>> I don't think that it's a good idea to mix 2 types of devices in one
>>> subsystem (bus).  If my understanding were correct, that breaks the
>>> driver core convention.
>>>
>>
>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>> details of memory tiers and can possibly contain details of different memory types .
> 
> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
> almost totally different attributes (sysfs file).  So, we should create
> 2 buses for them.  Each has its own attribute group.  "virtual" itself
> isn't a subsystem.

Considering both sets of details are related to memory tiering, wouldn't it be much simpler to consolidate
them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
sysfs hierarchies; it doesn't break any driver core convention, as you mentioned earlier.

/sys/devices/virtual/memory_tiering/memory_tierN
/sys/devices/virtual/memory_tiering/memory_typeN

-aneesh
Huang, Ying Sept. 2, 2022, 6:40 a.m. UTC | #11
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/2/22 11:42 AM, Huang, Ying wrote:
>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> 
>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>
>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>
>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>
>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>
>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>
>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>
>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>> preference.
>>>>>>>>>
>>>>>
>>>>>
>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4 
>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>> including memory type in the future. This is as per discussion 
>>>>>
>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>
>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>> driver core convention.
>>>>
>>>
>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>> details of memory tiers and can possibly contain details of different memory types .
>> 
>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>> almost totally different attributes (sysfs file).  So, we should create
>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>> isn't a subsystem.
>
> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier. 
>
> /sys/devices/virtual/memory_tiering/memory_tierN
> /sys/devices/virtual/memory_tiering/memory_typeN

I think we should add

 /sys/devices/virtual/memory_tier/memory_tierN
 /sys/devices/virtual/memory_type/memory_typeN

I don't think this is complex.  Devices of the same bus/subsystem should
have mostly the same attributes.  This is my understanding of the driver
core convention.

Best Regards,
Huang, Ying
Aneesh Kumar K.V Sept. 2, 2022, 6:44 a.m. UTC | #12
On 9/2/22 12:10 PM, Huang, Ying wrote:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> 
>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>
>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>
>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>
>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>
>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>
>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>> preference.
>>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4 
>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>> including memory type in the future. This is as per discussion 
>>>>>>
>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>
>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>> driver core convention.
>>>>>
>>>>
>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>> details of memory tiers and can possibly contain details of different memory types .
>>>
>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>> almost totally different attributes (sysfs file).  So, we should create
>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>> isn't a subsystem.
>>
>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier. 
>>
>> /sys/devices/virtual/memory_tiering/memory_tierN
>> /sys/devices/virtual/memory_tiering/memory_typeN
> 
> I think we should add
> 
>  /sys/devices/virtual/memory_tier/memory_tierN
>  /sys/devices/virtual/memory_type/memory_typeN
> 

I am trying to understand whether there is a technical reason to do so.

> I don't think this is complex.  Devices of same bus/subsystem should
> have mostly same attributes.  This is my understanding of driver core
> convention.
> 

I was not looking at this from a code complexity point of view. Instead of having multiple directories
with details w.r.t. memory tiering, I was looking at consolidating the details
within the directory /sys/devices/virtual/memory_tiering (similar to how all virtual devices
are consolidated within /sys/devices/virtual/).

-aneesh
Wei Xu Sept. 2, 2022, 7:02 a.m. UTC | #13
On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 9/2/22 12:10 PM, Huang, Ying wrote:
> > Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> >
> >> On 9/2/22 11:42 AM, Huang, Ying wrote:
> >>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> >>>
> >>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
> >>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> >>>>>
> >>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
> >>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
> >>>>>>>>
> >>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> >>>>>>>>
> >>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
> >>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >>>>>>>>>>
> >>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
> >>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
> >>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
> >>>>>>>>>>>
> >>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
> >>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
> >>>>>>>>>>
> >>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
> >>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
> >>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
> >>>>>>>>>> preference.
> >>>>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
> >>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
> >>>>>> including memory type in the future. This is as per discussion
> >>>>>>
> >>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
> >>>>>
> >>>>> I don't think that it's a good idea to mix 2 types of devices in one
> >>>>> subsystem (bus).  If my understanding were correct, that breaks the
> >>>>> driver core convention.
> >>>>>
> >>>>
> >>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
> >>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
> >>>> details of memory tiers and can possibly contain details of different memory types .
> >>>
> >>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
> >>> almost totally different attributes (sysfs file).  So, we should create
> >>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
> >>> isn't a subsystem.
> >>
> >> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
> >> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
> >> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
> >>
> >> /sys/devices/virtual/memory_tiering/memory_tierN
> >> /sys/devices/virtual/memory_tiering/memory_typeN
> >
> > I think we should add
> >
> >  /sys/devices/virtual/memory_tier/memory_tierN
> >  /sys/devices/virtual/memory_type/memory_typeN
> >
>
> I am trying to find if there is a technical reason to do the same?
>
> > I don't think this is complex.  Devices of same bus/subsystem should
> > have mostly same attributes.  This is my understanding of driver core
> > convention.
> >
>
> I was not looking at this from code complexity point. Instead of having multiple directories
> with details w.r.t memory tiering, I was looking at consolidating the details
> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
> are consolidated within /sys/devics/virtual/).
>
> -aneesh

Here is an example of /sys/bus/nd/devices (I know it is not under
/sys/devices/virtual, but it can still serve as a reference):

ls -1 /sys/bus/nd/devices

namespace2.0
namespace3.0
ndbus0
nmem0
nmem1
region0
region1
region2
region3

So I think it is not unreasonable if we want to group the memory-tiering-related
interfaces within a single top directory.

Wei
Huang, Ying Sept. 2, 2022, 7:57 a.m. UTC | #14
Wei Xu <weixugc@google.com> writes:

> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>> > Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> >
>> >> On 9/2/22 11:42 AM, Huang, Ying wrote:
>> >>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> >>>
>> >>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>> >>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> >>>>>
>> >>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>> >>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> >>>>>>>>
>> >>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>> >>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> >>>>>>>>>>
>> >>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>> >>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>> >>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>> >>>>>>>>>>>
>> >>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>> >>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>> >>>>>>>>>>
>> >>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>> >>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>> >>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>> >>>>>>>>>> preference.
>> >>>>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>> >>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>> >>>>>> including memory type in the future. This is as per discussion
>> >>>>>>
>> >>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>> >>>>>
>> >>>>> I don't think that it's a good idea to mix 2 types of devices in one
>> >>>>> subsystem (bus).  If my understanding were correct, that breaks the
>> >>>>> driver core convention.
>> >>>>>
>> >>>>
>> >>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>> >>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>> >>>> details of memory tiers and can possibly contain details of different memory types .
>> >>>
>> >>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>> >>> almost totally different attributes (sysfs file).  So, we should create
>> >>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>> >>> isn't a subsystem.
>> >>
>> >> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>> >> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>> >> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>> >>
>> >> /sys/devices/virtual/memory_tiering/memory_tierN
>> >> /sys/devices/virtual/memory_tiering/memory_typeN
>> >
>> > I think we should add
>> >
>> >  /sys/devices/virtual/memory_tier/memory_tierN
>> >  /sys/devices/virtual/memory_type/memory_typeN
>> >
>>
>> I am trying to find if there is a technical reason to do the same?
>>
>> > I don't think this is complex.  Devices of same bus/subsystem should
>> > have mostly same attributes.  This is my understanding of driver core
>> > convention.
>> >
>>
>> I was not looking at this from code complexity point. Instead of having multiple directories
>> with details w.r.t memory tiering, I was looking at consolidating the details
>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>> are consolidated within /sys/devics/virtual/).
>>
>> -aneesh
>
> Here is an example of /sys/bus/nd/devices (I know it is not under
> /sys/devices/virtual, but it can still serve as a reference):
>
> ls -1 /sys/bus/nd/devices
>
> namespace2.0
> namespace3.0
> ndbus0
> nmem0
> nmem1
> region0
> region1
> region2
> region3
>
> So I think it is not unreasonable if we want to group memory tiering
> related interfaces within a single top directory.

Thanks for pointing this out.  My original understanding of driver core
isn't correct.

But I still think it's better to separate memory_tier and memory_type
instead of mixing them.  Per my understanding, memory_type shows information
(abstract distance, latency, bandwidth, etc.) about memory types (and
nodes), and it can be useful even without memory tiers.  That is, memory
types describe the physical characteristics, while memory tiers reflect
the policy.

Just my 2 cents.

Best Regards,
Huang, Ying
Aneesh Kumar K.V Sept. 2, 2022, 8:48 a.m. UTC | #15
On 9/2/22 1:27 PM, Huang, Ying wrote:
> Wei Xu <weixugc@google.com> writes:
> 
>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>> <aneesh.kumar@linux.ibm.com> wrote:
>>>
>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>
>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>
>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>
>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>
>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>
>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>> driver core convention.
>>>>>>>>
>>>>>>>
>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>
>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>> isn't a subsystem.
>>>>>
>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>
>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>
>>>> I think we should add
>>>>
>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>
>>>
>>> I am trying to find if there is a technical reason to do the same?
>>>
>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>> have mostly same attributes.  This is my understanding of driver core
>>>> convention.
>>>>
>>>
>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>> are consolidated within /sys/devics/virtual/).
>>>
>>> -aneesh
>>
>> Here is an example of /sys/bus/nd/devices (I know it is not under
>> /sys/devices/virtual, but it can still serve as a reference):
>>
>> ls -1 /sys/bus/nd/devices
>>
>> namespace2.0
>> namespace3.0
>> ndbus0
>> nmem0
>> nmem1
>> region0
>> region1
>> region2
>> region3
>>
>> So I think it is not unreasonable if we want to group memory tiering
>> related interfaces within a single top directory.
> 
> Thanks for pointing this out.  My original understanding of driver core
> isn't correct.
> 
> But I still think it's better to separate instead of mixing memory_tier
> and memory_type.  Per my understanding, memory_type shows information
> (abstract distance, latency, bandwidth, etc.) of memory types (and
> nodes), it can be useful even without memory tiers.  That is, memory
> types describes the physical characteristics, while memory tier reflects
> the policy.
>

The latency and bandwidth details are already exposed via

	/sys/devices/system/node/nodeY/access0/initiators/

as documented in Documentation/admin-guide/mm/numaperf.rst.

That is the interface that libraries like libmemkind will look at to find
details w.r.t. latency/bandwidth; see the sketch below.
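
For example (illustrative only, assuming the node exposes an access class 0
hierarchy), a small C sketch reading those attributes; attribute names and
units follow numaperf.rst (latency in nanoseconds, bandwidth in MB/s):

#include <stdio.h>

/* Read one access0 performance attribute for a node, e.g. "read_latency"
 * or "read_bandwidth". Returns -1 if the attribute is not present. */
static long node_perf_attr(int node, const char *attr)
{
	char path[128];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/access0/initiators/%s",
		 node, attr);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	int node = 2;	/* example node; pick any node of interest */

	printf("node%d: read_latency=%ld ns read_bandwidth=%ld MB/s\n",
	       node, node_perf_attr(node, "read_latency"),
	       node_perf_attr(node, "read_bandwidth"));
	return 0;
}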

-aneesh
Huang, Ying Sept. 2, 2022, 9:04 a.m. UTC | #16
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/2/22 1:27 PM, Huang, Ying wrote:
>> Wei Xu <weixugc@google.com> writes:
>> 
>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>
>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>
>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>
>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>
>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>
>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>> driver core convention.
>>>>>>>>>
>>>>>>>>
>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>
>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>> isn't a subsystem.
>>>>>>
>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>
>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>
>>>>> I think we should add
>>>>>
>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>
>>>>
>>>> I am trying to find if there is a technical reason to do the same?
>>>>
>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>> convention.
>>>>>
>>>>
>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>> are consolidated within /sys/devics/virtual/).
>>>>
>>>> -aneesh
>>>
>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>> /sys/devices/virtual, but it can still serve as a reference):
>>>
>>> ls -1 /sys/bus/nd/devices
>>>
>>> namespace2.0
>>> namespace3.0
>>> ndbus0
>>> nmem0
>>> nmem1
>>> region0
>>> region1
>>> region2
>>> region3
>>>
>>> So I think it is not unreasonable if we want to group memory tiering
>>> related interfaces within a single top directory.
>> 
>> Thanks for pointing this out.  My original understanding of driver core
>> isn't correct.
>> 
>> But I still think it's better to separate instead of mixing memory_tier
>> and memory_type.  Per my understanding, memory_type shows information
>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>> nodes), it can be useful even without memory tiers.  That is, memory
>> types describes the physical characteristics, while memory tier reflects
>> the policy.
>>
>
> The latency and bandwidth details are already exposed via 
>
> 	/sys/devices/system/node/nodeY/access0/initiators/
>
> Documentation/admin-guide/mm/numaperf.rst
>
> That is the interface that libraries like libmemkind will look at for finding
> details w.r.t latency/bandwidth

Yes.  But even with that, it's still inconvenient to find out which nodes
belong to the same memory type (have the same performance, the same topology,
are managed by the same driver, etc.).  So memory types can still provide
useful information even without memory tiering.

Best Regards,
Huang, Ying
Aneesh Kumar K.V Sept. 2, 2022, 9:44 a.m. UTC | #17
On 9/2/22 2:34 PM, Huang, Ying wrote:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> 
>> On 9/2/22 1:27 PM, Huang, Ying wrote:
>>> Wei Xu <weixugc@google.com> writes:
>>>
>>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>
>>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>
>>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>
>>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>>
>>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>>
>>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>>> driver core convention.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>>
>>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>>> isn't a subsystem.
>>>>>>>
>>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>>
>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>>
>>>>>> I think we should add
>>>>>>
>>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>>
>>>>>
>>>>> I am trying to find if there is a technical reason to do the same?
>>>>>
>>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>>> convention.
>>>>>>
>>>>>
>>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>>> are consolidated within /sys/devics/virtual/).
>>>>>
>>>>> -aneesh
>>>>
>>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>>> /sys/devices/virtual, but it can still serve as a reference):
>>>>
>>>> ls -1 /sys/bus/nd/devices
>>>>
>>>> namespace2.0
>>>> namespace3.0
>>>> ndbus0
>>>> nmem0
>>>> nmem1
>>>> region0
>>>> region1
>>>> region2
>>>> region3
>>>>
>>>> So I think it is not unreasonable if we want to group memory tiering
>>>> related interfaces within a single top directory.
>>>
>>> Thanks for pointing this out.  My original understanding of driver core
>>> isn't correct.
>>>
>>> But I still think it's better to separate instead of mixing memory_tier
>>> and memory_type.  Per my understanding, memory_type shows information
>>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>>> nodes), it can be useful even without memory tiers.  That is, memory
>>> types describes the physical characteristics, while memory tier reflects
>>> the policy.
>>>
>>
>> The latency and bandwidth details are already exposed via 
>>
>> 	/sys/devices/system/node/nodeY/access0/initiators/
>>
>> Documentation/admin-guide/mm/numaperf.rst
>>
>> That is the interface that libraries like libmemkind will look at for finding
>> details w.r.t latency/bandwidth
> 
> Yes.  Only with that, it's still inconvenient to find out which nodes
> belong to same memory type (has same performance, same topology, managed
> by same driver, etc).  So memory types can still provide useful
> information even without memory tiering.
> 

I am not sure I quite follow what to conclude from your reply. I used the subsystem name
"memory_tiering" so that all memory tiering-related information can be consolidated there.
I guess you agreed with the above part, that we can consolidate things like that.


We might end up adding memory_type there if we allow changing the "abstract distance" of a
memory type from userspace later. Otherwise, I don't see a reason for memory types to be
exposed. But then we don't have to decide on this now.
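
Just to illustrate what such a userspace knob could look like under the
consolidated hierarchy (purely hypothetical, nothing like this exists today;
the file name and value are made up):

  echo 400 > /sys/devices/virtual/memory_tiering/memory_type2/abstract_distance

Writing a smaller/larger abstract distance would then move the nodes of that
memory type to a faster/slower memory tier.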


-aneesh
Huang, Ying Sept. 5, 2022, 1:52 a.m. UTC | #18
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/2/22 2:34 PM, Huang, Ying wrote:
>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> 
>>> On 9/2/22 1:27 PM, Huang, Ying wrote:
>>>> Wei Xu <weixugc@google.com> writes:
>>>>
>>>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>>
>>>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>
>>>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>
>>>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>>>
>>>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>>>
>>>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>>>> driver core convention.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>>>
>>>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>>>> isn't a subsystem.
>>>>>>>>
>>>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>>>
>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>>>
>>>>>>> I think we should add
>>>>>>>
>>>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>>>
>>>>>>
>>>>>> I am trying to find if there is a technical reason to do the same?
>>>>>>
>>>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>>>> convention.
>>>>>>>
>>>>>>
>>>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>>>> are consolidated within /sys/devics/virtual/).
>>>>>>
>>>>>> -aneesh
>>>>>
>>>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>>>> /sys/devices/virtual, but it can still serve as a reference):
>>>>>
>>>>> ls -1 /sys/bus/nd/devices
>>>>>
>>>>> namespace2.0
>>>>> namespace3.0
>>>>> ndbus0
>>>>> nmem0
>>>>> nmem1
>>>>> region0
>>>>> region1
>>>>> region2
>>>>> region3
>>>>>
>>>>> So I think it is not unreasonable if we want to group memory tiering
>>>>> related interfaces within a single top directory.
>>>>
>>>> Thanks for pointing this out.  My original understanding of driver core
>>>> isn't correct.
>>>>
>>>> But I still think it's better to separate instead of mixing memory_tier
>>>> and memory_type.  Per my understanding, memory_type shows information
>>>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>>>> nodes), it can be useful even without memory tiers.  That is, memory
>>>> types describes the physical characteristics, while memory tier reflects
>>>> the policy.
>>>>
>>>
>>> The latency and bandwidth details are already exposed via 
>>>
>>> 	/sys/devices/system/node/nodeY/access0/initiators/
>>>
>>> Documentation/admin-guide/mm/numaperf.rst
>>>
>>> That is the interface that libraries like libmemkind will look at for finding
>>> details w.r.t latency/bandwidth
>> 
>> Yes.  Only with that, it's still inconvenient to find out which nodes
>> belong to same memory type (has same performance, same topology, managed
>> by same driver, etc).  So memory types can still provide useful
>> information even without memory tiering.
>> 
>
> I am not sure i quiet follow what to conclude from your reply. I used the subsystem name
> "memory_tiering" so that all memory tiering related information can be consolidated there.
> I guess you agreed to the above part that we can consolidated things like that. 

I just prefer to separate the memory_tier and memory_type sysfs directories
personally, because memory_type describes the physical memory types and their
performance, while memory_tier is more about the policy used to group
memory_types.

> We might end up adding memory_type there if we allow changing "abstract distance" of a
> memory type from userspace later. Otherwise, I don't see a reason for memory type to be
> exposed. But then we don't have to decide on this now. 

As above, because I think memory_type can provide value even outside of
memory_tier, I personally prefer to add a memory_type sysfs interface anyway.

Best Regards,
Huang, Ying
Aneesh Kumar K.V Sept. 5, 2022, 3:50 a.m. UTC | #19
On 9/5/22 7:22 AM, Huang, Ying wrote:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> 
>> On 9/2/22 2:34 PM, Huang, Ying wrote:
>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> On 9/2/22 1:27 PM, Huang, Ying wrote:
>>>>> Wei Xu <weixugc@google.com> writes:
>>>>>
>>>>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>>>
>>>>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>
>>>>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>>>>> driver core convention.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>>>>
>>>>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>>>>> isn't a subsystem.
>>>>>>>>>
>>>>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>>>>
>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>>>>
>>>>>>>> I think we should add
>>>>>>>>
>>>>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>>>>
>>>>>>>
>>>>>>> I am trying to find if there is a technical reason to do the same?
>>>>>>>
>>>>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>>>>> convention.
>>>>>>>>
>>>>>>>
>>>>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>>>>> are consolidated within /sys/devics/virtual/).
>>>>>>>
>>>>>>> -aneesh
>>>>>>
>>>>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>>>>> /sys/devices/virtual, but it can still serve as a reference):
>>>>>>
>>>>>> ls -1 /sys/bus/nd/devices
>>>>>>
>>>>>> namespace2.0
>>>>>> namespace3.0
>>>>>> ndbus0
>>>>>> nmem0
>>>>>> nmem1
>>>>>> region0
>>>>>> region1
>>>>>> region2
>>>>>> region3
>>>>>>
>>>>>> So I think it is not unreasonable if we want to group memory tiering
>>>>>> related interfaces within a single top directory.
>>>>>
>>>>> Thanks for pointing this out.  My original understanding of driver core
>>>>> isn't correct.
>>>>>
>>>>> But I still think it's better to separate instead of mixing memory_tier
>>>>> and memory_type.  Per my understanding, memory_type shows information
>>>>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>>>>> nodes), it can be useful even without memory tiers.  That is, memory
>>>>> types describes the physical characteristics, while memory tier reflects
>>>>> the policy.
>>>>>
>>>>
>>>> The latency and bandwidth details are already exposed via 
>>>>
>>>> 	/sys/devices/system/node/nodeY/access0/initiators/
>>>>
>>>> Documentation/admin-guide/mm/numaperf.rst
>>>>
>>>> That is the interface that libraries like libmemkind will look at for finding
>>>> details w.r.t latency/bandwidth
>>>
>>> Yes.  Only with that, it's still inconvenient to find out which nodes
>>> belong to same memory type (has same performance, same topology, managed
>>> by same driver, etc).  So memory types can still provide useful
>>> information even without memory tiering.
>>>
>>
>> I am not sure i quiet follow what to conclude from your reply. I used the subsystem name
>> "memory_tiering" so that all memory tiering related information can be consolidated there.
>> I guess you agreed to the above part that we can consolidated things like that. 
> 
> I just prefer to separate memory_tier and memory_type sysfs directories
> personally.  Because memory_type describes the physical memory types and
> performance, while memory_tier is more about the policy to group
> memory_types.
>
IMHO we can decide on that based on why we end up adding memory_type details to sysfs. If that
is only for memory tier modification from userspace, we can look at adding it in the memory tiering
sysfs hierarchy.

Also, since we have precedent for consolidating things within a sysfs hierarchy, as explained in previous emails,
I think we should keep "memory_tiering" as the sysfs subsystem name. Can we get an agreement on that
for now?

-aneesh
Huang, Ying Sept. 5, 2022, 5:13 a.m. UTC | #20
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/5/22 7:22 AM, Huang, Ying wrote:
>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> 
>>> On 9/2/22 2:34 PM, Huang, Ying wrote:
>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>
>>>>> On 9/2/22 1:27 PM, Huang, Ying wrote:
>>>>>> Wei Xu <weixugc@google.com> writes:
>>>>>>
>>>>>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>>>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>>>>
>>>>>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>
>>>>>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>>>>>> driver core convention.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>>>>>
>>>>>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>>>>>> isn't a subsystem.
>>>>>>>>>>
>>>>>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>>>>>
>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>>>>>
>>>>>>>>> I think we should add
>>>>>>>>>
>>>>>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>>>>>
>>>>>>>>
>>>>>>>> I am trying to find if there is a technical reason to do the same?
>>>>>>>>
>>>>>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>>>>>> convention.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>>>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>>>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>>>>>> are consolidated within /sys/devics/virtual/).
>>>>>>>>
>>>>>>>> -aneesh
>>>>>>>
>>>>>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>>>>>> /sys/devices/virtual, but it can still serve as a reference):
>>>>>>>
>>>>>>> ls -1 /sys/bus/nd/devices
>>>>>>>
>>>>>>> namespace2.0
>>>>>>> namespace3.0
>>>>>>> ndbus0
>>>>>>> nmem0
>>>>>>> nmem1
>>>>>>> region0
>>>>>>> region1
>>>>>>> region2
>>>>>>> region3
>>>>>>>
>>>>>>> So I think it is not unreasonable if we want to group memory tiering
>>>>>>> related interfaces within a single top directory.
>>>>>>
>>>>>> Thanks for pointing this out.  My original understanding of driver core
>>>>>> isn't correct.
>>>>>>
>>>>>> But I still think it's better to separate instead of mixing memory_tier
>>>>>> and memory_type.  Per my understanding, memory_type shows information
>>>>>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>>>>>> nodes), it can be useful even without memory tiers.  That is, memory
>>>>>> types describes the physical characteristics, while memory tier reflects
>>>>>> the policy.
>>>>>>
>>>>>
>>>>> The latency and bandwidth details are already exposed via 
>>>>>
>>>>> 	/sys/devices/system/node/nodeY/access0/initiators/
>>>>>
>>>>> Documentation/admin-guide/mm/numaperf.rst
>>>>>
>>>>> That is the interface that libraries like libmemkind will look at for finding
>>>>> details w.r.t latency/bandwidth
>>>>
>>>> Yes.  Only with that, it's still inconvenient to find out which nodes
>>>> belong to same memory type (has same performance, same topology, managed
>>>> by same driver, etc).  So memory types can still provide useful
>>>> information even without memory tiering.
>>>>
>>>
>>> I am not sure i quiet follow what to conclude from your reply. I used the subsystem name
>>> "memory_tiering" so that all memory tiering related information can be consolidated there.
>>> I guess you agreed to the above part that we can consolidated things like that. 
>> 
>> I just prefer to separate memory_tier and memory_type sysfs directories
>> personally.  Because memory_type describes the physical memory types and
>> performance, while memory_tier is more about the policy to group
>> memory_types.
>>
> IMHO we can decide on that based on why we end up adding memory_type details to sysfs. If that
> is only for memory tier modification from userspace we can look at adding that in the memory tiering
> sysfs hierarchy. 
>
> Also since we have precedence of consolidating things within a sysfs hierarchy as explained in previous emails,
> I think we should keep "memory_tiering" as sysfs subsystem name? I hope we can get an agreement on that
> for now?

I prefer to separate memory_tier and memory_type, so the subsystem name
should be "memory_tier".  You prefer to consolidate memory_tier and
memory_type, so the subsystem name should be "memory_tiering".

The main reason behind my idea is that memory_type isn't tied to
memory tiering directly.  It describes a hardware property.  Even if
we don't use memory tiering, we can still use it to classify the
memory devices in the system.
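
For example (a hypothetical memory_type layout, none of these attributes
exist today), it would be possible to see at a glance which nodes are backed
by the same kind of device:

  $ cat /sys/devices/virtual/memory_type/memory_type1/nodes
  0,2
  $ cat /sys/devices/virtual/memory_type/memory_type2/nodes
  1,3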

Why do you want to consolidate them?  To have one less directory in
sysfs?

I want to get opinions from other people too.

Best Regards,
Huang, Ying
Aneesh Kumar K.V Sept. 5, 2022, 5:27 a.m. UTC | #21
On 9/5/22 10:43 AM, Huang, Ying wrote:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> 
>> On 9/5/22 7:22 AM, Huang, Ying wrote:
>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> On 9/2/22 2:34 PM, Huang, Ying wrote:
>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> On 9/2/22 1:27 PM, Huang, Ying wrote:
>>>>>>> Wei Xu <weixugc@google.com> writes:
>>>>>>>
>>>>>>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>>>>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>>>>>
>>>>>>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>>>>>>> driver core convention.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>>>>>>
>>>>>>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>>>>>>> isn't a subsystem.
>>>>>>>>>>>
>>>>>>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>>>>>>
>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>>>>>>
>>>>>>>>>> I think we should add
>>>>>>>>>>
>>>>>>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am trying to find if there is a technical reason to do the same?
>>>>>>>>>
>>>>>>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>>>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>>>>>>> convention.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>>>>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>>>>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>>>>>>> are consolidated within /sys/devics/virtual/).
>>>>>>>>>
>>>>>>>>> -aneesh
>>>>>>>>
>>>>>>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>>>>>>> /sys/devices/virtual, but it can still serve as a reference):
>>>>>>>>
>>>>>>>> ls -1 /sys/bus/nd/devices
>>>>>>>>
>>>>>>>> namespace2.0
>>>>>>>> namespace3.0
>>>>>>>> ndbus0
>>>>>>>> nmem0
>>>>>>>> nmem1
>>>>>>>> region0
>>>>>>>> region1
>>>>>>>> region2
>>>>>>>> region3
>>>>>>>>
>>>>>>>> So I think it is not unreasonable if we want to group memory tiering
>>>>>>>> related interfaces within a single top directory.
>>>>>>>
>>>>>>> Thanks for pointing this out.  My original understanding of driver core
>>>>>>> isn't correct.
>>>>>>>
>>>>>>> But I still think it's better to separate instead of mixing memory_tier
>>>>>>> and memory_type.  Per my understanding, memory_type shows information
>>>>>>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>>>>>>> nodes), it can be useful even without memory tiers.  That is, memory
>>>>>>> types describes the physical characteristics, while memory tier reflects
>>>>>>> the policy.
>>>>>>>
>>>>>>
>>>>>> The latency and bandwidth details are already exposed via 
>>>>>>
>>>>>> 	/sys/devices/system/node/nodeY/access0/initiators/
>>>>>>
>>>>>> Documentation/admin-guide/mm/numaperf.rst
>>>>>>
>>>>>> That is the interface that libraries like libmemkind will look at for finding
>>>>>> details w.r.t latency/bandwidth
>>>>>
>>>>> Yes.  Only with that, it's still inconvenient to find out which nodes
>>>>> belong to same memory type (has same performance, same topology, managed
>>>>> by same driver, etc).  So memory types can still provide useful
>>>>> information even without memory tiering.
>>>>>
>>>>
>>>> I am not sure i quiet follow what to conclude from your reply. I used the subsystem name
>>>> "memory_tiering" so that all memory tiering related information can be consolidated there.
>>>> I guess you agreed to the above part that we can consolidated things like that. 
>>>
>>> I just prefer to separate memory_tier and memory_type sysfs directories
>>> personally.  Because memory_type describes the physical memory types and
>>> performance, while memory_tier is more about the policy to group
>>> memory_types.
>>>
>> IMHO we can decide on that based on why we end up adding memory_type details to sysfs. If that
>> is only for memory tier modification from userspace we can look at adding that in the memory tiering
>> sysfs hierarchy. 
>>
>> Also since we have precedence of consolidating things within a sysfs hierarchy as explained in previous emails,
>> I think we should keep "memory_tiering" as sysfs subsystem name? I hope we can get an agreement on that
>> for now?
> 
> I prefer to separate memory_tier and memory_type, so the subsystem name
> should be "memory_tier".  You prefer to consolidate memory_tier and
> memory_type, so the subsystem name should be "memory_tiering".
> 
> The main reason behind my idea is that memory_type isn't tied with
> memory tiering directly.  It describes some hardware property.  Even if
> we don't use memory tiering, we can still use that to classify the
> memory devices in the system.
> 
> Why do you want to consolidate them?  To reduce one directory from
> sysfs?
> 

So that it is more intuitive for users to go to the memory_tiering sysfs hierarchy
to change the memory tier levels. As I mentioned earlier, the reason for consolidating things
is to accommodate the possibility of supporting changes to the abstract distance of a memory type,
so that we can change the memory tier assignment of that specific memory type. I don't
see any other reason we would want to expose memory types to userspace as of now.



> I want to get opinions from other people too.
> 
> Best Regards,
> Huang, Ying

-aneesh
Huang, Ying Sept. 5, 2022, 5:53 a.m. UTC | #22
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/5/22 10:43 AM, Huang, Ying wrote:
>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> 
>>> On 9/5/22 7:22 AM, Huang, Ying wrote:
>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>
>>>>> On 9/2/22 2:34 PM, Huang, Ying wrote:
>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>
>>>>>>> On 9/2/22 1:27 PM, Huang, Ying wrote:
>>>>>>>> Wei Xu <weixugc@google.com> writes:
>>>>>>>>
>>>>>>>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>>>>>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>>>>>>>> driver core convention.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>>>>>>>
>>>>>>>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>>>>>>>> isn't a subsystem.
>>>>>>>>>>>>
>>>>>>>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>>>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>>>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>>>>>>>
>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>>>>>>>
>>>>>>>>>>> I think we should add
>>>>>>>>>>>
>>>>>>>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>>>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I am trying to find if there is a technical reason to do the same?
>>>>>>>>>>
>>>>>>>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>>>>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>>>>>>>> convention.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>>>>>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>>>>>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>>>>>>>> are consolidated within /sys/devics/virtual/).
>>>>>>>>>>
>>>>>>>>>> -aneesh
>>>>>>>>>
>>>>>>>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>>>>>>>> /sys/devices/virtual, but it can still serve as a reference):
>>>>>>>>>
>>>>>>>>> ls -1 /sys/bus/nd/devices
>>>>>>>>>
>>>>>>>>> namespace2.0
>>>>>>>>> namespace3.0
>>>>>>>>> ndbus0
>>>>>>>>> nmem0
>>>>>>>>> nmem1
>>>>>>>>> region0
>>>>>>>>> region1
>>>>>>>>> region2
>>>>>>>>> region3
>>>>>>>>>
>>>>>>>>> So I think it is not unreasonable if we want to group memory tiering
>>>>>>>>> related interfaces within a single top directory.
>>>>>>>>
>>>>>>>> Thanks for pointing this out.  My original understanding of driver core
>>>>>>>> isn't correct.
>>>>>>>>
>>>>>>>> But I still think it's better to separate instead of mixing memory_tier
>>>>>>>> and memory_type.  Per my understanding, memory_type shows information
>>>>>>>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>>>>>>>> nodes), it can be useful even without memory tiers.  That is, memory
>>>>>>>> types describes the physical characteristics, while memory tier reflects
>>>>>>>> the policy.
>>>>>>>>
>>>>>>>
>>>>>>> The latency and bandwidth details are already exposed via 
>>>>>>>
>>>>>>> 	/sys/devices/system/node/nodeY/access0/initiators/
>>>>>>>
>>>>>>> Documentation/admin-guide/mm/numaperf.rst
>>>>>>>
>>>>>>> That is the interface that libraries like libmemkind will look at for finding
>>>>>>> details w.r.t latency/bandwidth
>>>>>>
>>>>>> Yes.  Only with that, it's still inconvenient to find out which nodes
>>>>>> belong to same memory type (has same performance, same topology, managed
>>>>>> by same driver, etc).  So memory types can still provide useful
>>>>>> information even without memory tiering.
>>>>>>
>>>>>
>>>>> I am not sure i quiet follow what to conclude from your reply. I used the subsystem name
>>>>> "memory_tiering" so that all memory tiering related information can be consolidated there.
>>>>> I guess you agreed to the above part that we can consolidated things like that. 
>>>>
>>>> I just prefer to separate memory_tier and memory_type sysfs directories
>>>> personally.  Because memory_type describes the physical memory types and
>>>> performance, while memory_tier is more about the policy to group
>>>> memory_types.
>>>>
>>> IMHO we can decide on that based on why we end up adding memory_type details to sysfs. If that
>>> is only for memory tier modification from userspace we can look at adding that in the memory tiering
>>> sysfs hierarchy. 
>>>
>>> Also since we have precedence of consolidating things within a sysfs hierarchy as explained in previous emails,
>>> I think we should keep "memory_tiering" as sysfs subsystem name? I hope we can get an agreement on that
>>> for now?
>> 
>> I prefer to separate memory_tier and memory_type, so the subsystem name
>> should be "memory_tier".  You prefer to consolidate memory_tier and
>> memory_type, so the subsystem name should be "memory_tiering".
>> 
>> The main reason behind my idea is that memory_type isn't tied with
>> memory tiering directly.  It describes some hardware property.  Even if
>> we don't use memory tiering, we can still use that to classify the
>> memory devices in the system.
>> 
>> Why do you want to consolidate them?  To reduce one directory from
>> sysfs?
>> 
>
> So that it is much intuitive for user to got to memory_tiering sysfs hierarchy
> to change the memory tier levels. As I mentioned earlier the reason for consolidating things
> is to accommodate the possibility of supporting changing abstract distance of a memory type
> so that we can change the memory tier assignment of that specific
> memory type.

If we put memory_tier and memory_type into 2 directories, will it really be
much harder to change the abstract distance of a memory_type?
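
To illustrate (hypothetical paths and value, no such attribute exists in this
series), the write is a single step in either layout:

  echo 400 > /sys/devices/virtual/memory_tiering/memory_type2/abstract_distance

versus, with separate subsystems:

  echo 400 > /sys/devices/virtual/memory_type/memory_type2/abstract_distance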

> I don't see any other reason we would want to expose memory type to
> userspace as of now.

Just like we expose the device tree to user space via sysfs, memory
types describe hardware properties directly.  Users need this hardware
information to manage their systems.

Best Regards,
Huang, Ying

>> I want to get opinions from other people too.
>> 
>> Best Regards,
>> Huang, Ying
>
> -aneesh
Aneesh Kumar K.V Sept. 5, 2022, 6:14 a.m. UTC | #23
On 9/5/22 11:23 AM, Huang, Ying wrote:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> 
>> On 9/5/22 10:43 AM, Huang, Ying wrote:
>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> On 9/5/22 7:22 AM, Huang, Ying wrote:
>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> On 9/2/22 2:34 PM, Huang, Ying wrote:
>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>
>>>>>>>> On 9/2/22 1:27 PM, Huang, Ying wrote:
>>>>>>>>> Wei Xu <weixugc@google.com> writes:
>>>>>>>>>
>>>>>>>>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>>>>>>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>>>>>>>>> driver core convention.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>>>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>>>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>>>>>>>>> isn't a subsystem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>>>>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>>>>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>>>>>>>>
>>>>>>>>>>>> I think we should add
>>>>>>>>>>>>
>>>>>>>>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>>>>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I am trying to find if there is a technical reason to do the same?
>>>>>>>>>>>
>>>>>>>>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>>>>>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>>>>>>>>> convention.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>>>>>>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>>>>>>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>>>>>>>>> are consolidated within /sys/devics/virtual/).
>>>>>>>>>>>
>>>>>>>>>>> -aneesh
>>>>>>>>>>
>>>>>>>>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>>>>>>>>> /sys/devices/virtual, but it can still serve as a reference):
>>>>>>>>>>
>>>>>>>>>> ls -1 /sys/bus/nd/devices
>>>>>>>>>>
>>>>>>>>>> namespace2.0
>>>>>>>>>> namespace3.0
>>>>>>>>>> ndbus0
>>>>>>>>>> nmem0
>>>>>>>>>> nmem1
>>>>>>>>>> region0
>>>>>>>>>> region1
>>>>>>>>>> region2
>>>>>>>>>> region3
>>>>>>>>>>
>>>>>>>>>> So I think it is not unreasonable if we want to group memory tiering
>>>>>>>>>> related interfaces within a single top directory.
>>>>>>>>>
>>>>>>>>> Thanks for pointing this out.  My original understanding of driver core
>>>>>>>>> isn't correct.
>>>>>>>>>
>>>>>>>>> But I still think it's better to separate instead of mixing memory_tier
>>>>>>>>> and memory_type.  Per my understanding, memory_type shows information
>>>>>>>>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>>>>>>>>> nodes), it can be useful even without memory tiers.  That is, memory
>>>>>>>>> types describes the physical characteristics, while memory tier reflects
>>>>>>>>> the policy.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The latency and bandwidth details are already exposed via 
>>>>>>>>
>>>>>>>> 	/sys/devices/system/node/nodeY/access0/initiators/
>>>>>>>>
>>>>>>>> Documentation/admin-guide/mm/numaperf.rst
>>>>>>>>
>>>>>>>> That is the interface that libraries like libmemkind will look at for finding
>>>>>>>> details w.r.t latency/bandwidth
>>>>>>>
>>>>>>> Yes.  Only with that, it's still inconvenient to find out which nodes
>>>>>>> belong to same memory type (has same performance, same topology, managed
>>>>>>> by same driver, etc).  So memory types can still provide useful
>>>>>>> information even without memory tiering.
>>>>>>>
>>>>>>
>>>>>> I am not sure i quiet follow what to conclude from your reply. I used the subsystem name
>>>>>> "memory_tiering" so that all memory tiering related information can be consolidated there.
>>>>>> I guess you agreed to the above part that we can consolidated things like that. 
>>>>>
>>>>> I just prefer to separate memory_tier and memory_type sysfs directories
>>>>> personally.  Because memory_type describes the physical memory types and
>>>>> performance, while memory_tier is more about the policy to group
>>>>> memory_types.
>>>>>
>>>> IMHO we can decide on that based on why we end up adding memory_type details to sysfs. If that
>>>> is only for memory tier modification from userspace we can look at adding that in the memory tiering
>>>> sysfs hierarchy. 
>>>>
>>>> Also since we have precedence of consolidating things within a sysfs hierarchy as explained in previous emails,
>>>> I think we should keep "memory_tiering" as sysfs subsystem name? I hope we can get an agreement on that
>>>> for now?
>>>
>>> I prefer to separate memory_tier and memory_type, so the subsystem name
>>> should be "memory_tier".  You prefer to consolidate memory_tier and
>>> memory_type, so the subsystem name should be "memory_tiering".
>>>
>>> The main reason behind my idea is that memory_type isn't tied with
>>> memory tiering directly.  It describes some hardware property.  Even if
>>> we don't use memory tiering, we can still use that to classify the
>>> memory devices in the system.
>>>
>>> Why do you want to consolidate them?  To reduce one directory from
>>> sysfs?
>>>
>>
>> So that it is much intuitive for user to got to memory_tiering sysfs hierarchy
>> to change the memory tier levels. As I mentioned earlier the reason for consolidating things
>> is to accommodate the possibility of supporting changing abstract distance of a memory type
>> so that we can change the memory tier assignment of that specific
>> memory type.
> 
> If we put memory_tier and memory_type into 2 directories, it will be
> much harder to change the abstract distance of a memory_type?
> 

I did explain that I believe it is more intuitive to manage memory tier levels within
the memory tiering sysfs hierarchy. You seem to be ignoring my explanation in these emails.


>> I don't see any other reason we would want to expose memory type to
>> userspace as of now.
> 
> Just like we expose the device tree to the user space via sysfs.  Memory
> types are used to describe some hardware property directly.  Users need
> these hardware information to manage their system.
> 

As already explained in earlier emails, I don't see a reason to duplicate
attributes already present in /sys/devices/system/node/nodeY/access0/initiators/.
The only reason we might end up adding memory types to sysfs is to manage memory tier levels.
Hence the suggestion to consolidate things in the memory tiering directory.

-aneesh
Huang, Ying Sept. 5, 2022, 6:24 a.m. UTC | #24
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:

> On 9/5/22 11:23 AM, Huang, Ying wrote:
>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>> 
>>> On 9/5/22 10:43 AM, Huang, Ying wrote:
>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>
>>>>> On 9/5/22 7:22 AM, Huang, Ying wrote:
>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>
>>>>>>> On 9/2/22 2:34 PM, Huang, Ying wrote:
>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>
>>>>>>>>> On 9/2/22 1:27 PM, Huang, Ying wrote:
>>>>>>>>>> Wei Xu <weixugc@google.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 1, 2022 at 11:44 PM Aneesh Kumar K V
>>>>>>>>>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 9/2/22 12:10 PM, Huang, Ying wrote:
>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 9/2/22 11:42 AM, Huang, Ying wrote:
>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 9/2/22 11:10 AM, Huang, Ying wrote:
>>>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 9/2/22 10:39 AM, Wei Xu wrote:
>>>>>>>>>>>>>>>>>>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying <ying.huang@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>>>>>>>>>>>>>>>>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>>>>>>>>>>>>>>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>>>>>>>>>>>>>>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>>>>>>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>>>>>>>>>>>>>>>>>>> memory_tiering.  Because we have a set of memory_tierN devices inside.
>>>>>>>>>>>>>>>>>>>>>> "memory_tier" sounds more natural.  I know this is subjective, just my
>>>>>>>>>>>>>>>>>>>>>> preference.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
>>>>>>>>>>>>>>>>>> because we would want it to a susbsystem where all memory tiering related details can be found
>>>>>>>>>>>>>>>>>> including memory type in the future. This is as per discussion
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't think that it's a good idea to mix 2 types of devices in one
>>>>>>>>>>>>>>>>> subsystem (bus).  If my understanding were correct, that breaks the
>>>>>>>>>>>>>>>>> driver core convention.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All these are virtual devices .I am not sure i follow what you mean by 2 types of devices.
>>>>>>>>>>>>>>>> memory_tiering is a subsystem that represents all the details w.r.t memory tiering. It shows
>>>>>>>>>>>>>>>> details of memory tiers and can possibly contain details of different memory types .
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> IMHO, memory_tier and memory_type are 2 kind of devices.  They have
>>>>>>>>>>>>>>> almost totally different attributes (sysfs file).  So, we should create
>>>>>>>>>>>>>>> 2 buses for them.  Each has its own attribute group.  "virtual" itself
>>>>>>>>>>>>>>> isn't a subsystem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Considering both the details are related to memory tiering, wouldn't it be much simpler we consolidate
>>>>>>>>>>>>>> them within the same subdirectory? I am still not clear why you are suggesting they need to be in different
>>>>>>>>>>>>>> sysfs hierarchy.  It doesn't break any driver core convention as you mentioned earlier.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN
>>>>>>>>>>>>>> /sys/devices/virtual/memory_tiering/memory_typeN
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we should add
>>>>>>>>>>>>>
>>>>>>>>>>>>>  /sys/devices/virtual/memory_tier/memory_tierN
>>>>>>>>>>>>>  /sys/devices/virtual/memory_type/memory_typeN
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to find if there is a technical reason to do the same?
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think this is complex.  Devices of same bus/subsystem should
>>>>>>>>>>>>> have mostly same attributes.  This is my understanding of driver core
>>>>>>>>>>>>> convention.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I was not looking at this from code complexity point. Instead of having multiple directories
>>>>>>>>>>>> with details w.r.t memory tiering, I was looking at consolidating the details
>>>>>>>>>>>> within the directory /sys/devices/virtual/memory_tiering. (similar to all virtual devices
>>>>>>>>>>>> are consolidated within /sys/devics/virtual/).
>>>>>>>>>>>>
>>>>>>>>>>>> -aneesh
>>>>>>>>>>>
>>>>>>>>>>> Here is an example of /sys/bus/nd/devices (I know it is not under
>>>>>>>>>>> /sys/devices/virtual, but it can still serve as a reference):
>>>>>>>>>>>
>>>>>>>>>>> ls -1 /sys/bus/nd/devices
>>>>>>>>>>>
>>>>>>>>>>> namespace2.0
>>>>>>>>>>> namespace3.0
>>>>>>>>>>> ndbus0
>>>>>>>>>>> nmem0
>>>>>>>>>>> nmem1
>>>>>>>>>>> region0
>>>>>>>>>>> region1
>>>>>>>>>>> region2
>>>>>>>>>>> region3
>>>>>>>>>>>
>>>>>>>>>>> So I think it is not unreasonable if we want to group memory tiering
>>>>>>>>>>> related interfaces within a single top directory.
>>>>>>>>>>
>>>>>>>>>> Thanks for pointing this out.  My original understanding of driver core
>>>>>>>>>> isn't correct.
>>>>>>>>>>
>>>>>>>>>> But I still think it's better to separate instead of mixing memory_tier
>>>>>>>>>> and memory_type.  Per my understanding, memory_type shows information
>>>>>>>>>> (abstract distance, latency, bandwidth, etc.) of memory types (and
>>>>>>>>>> nodes), it can be useful even without memory tiers.  That is, memory
>>>>>>>>>> types describes the physical characteristics, while memory tier reflects
>>>>>>>>>> the policy.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The latency and bandwidth details are already exposed via 
>>>>>>>>>
>>>>>>>>> 	/sys/devices/system/node/nodeY/access0/initiators/
>>>>>>>>>
>>>>>>>>> Documentation/admin-guide/mm/numaperf.rst
>>>>>>>>>
>>>>>>>>> That is the interface that libraries like libmemkind will look at for finding
>>>>>>>>> details w.r.t latency/bandwidth
>>>>>>>>
>>>>>>>> Yes.  Only with that, it's still inconvenient to find out which nodes
>>>>>>>> belong to same memory type (has same performance, same topology, managed
>>>>>>>> by same driver, etc).  So memory types can still provide useful
>>>>>>>> information even without memory tiering.
>>>>>>>>
>>>>>>>
>>>>>>> I am not sure i quiet follow what to conclude from your reply. I used the subsystem name
>>>>>>> "memory_tiering" so that all memory tiering related information can be consolidated there.
>>>>>>> I guess you agreed to the above part that we can consolidated things like that. 
>>>>>>
>>>>>> I just prefer to separate memory_tier and memory_type sysfs directories
>>>>>> personally.  Because memory_type describes the physical memory types and
>>>>>> performance, while memory_tier is more about the policy to group
>>>>>> memory_types.
>>>>>>
>>>>> IMHO we can decide on that based on why we end up adding memory_type details to sysfs. If that
>>>>> is only for memory tier modification from userspace we can look at adding that in the memory tiering
>>>>> sysfs hierarchy. 
>>>>>
>>>>> Also since we have precedence of consolidating things within a sysfs hierarchy as explained in previous emails,
>>>>> I think we should keep "memory_tiering" as sysfs subsystem name? I hope we can get an agreement on that
>>>>> for now?
>>>>
>>>> I prefer to separate memory_tier and memory_type, so the subsystem name
>>>> should be "memory_tier".  You prefer to consolidate memory_tier and
>>>> memory_type, so the subsystem name should be "memory_tiering".
>>>>
>>>> The main reason behind my idea is that memory_type isn't tied with
>>>> memory tiering directly.  It describes some hardware property.  Even if
>>>> we don't use memory tiering, we can still use that to classify the
>>>> memory devices in the system.
>>>>
>>>> Why do you want to consolidate them?  To reduce one directory from
>>>> sysfs?
>>>>
>>>
>>> So that it is much intuitive for user to got to memory_tiering sysfs hierarchy
>>> to change the memory tier levels. As I mentioned earlier the reason for consolidating things
>>> is to accommodate the possibility of supporting changing abstract distance of a memory type
>>> so that we can change the memory tier assignment of that specific
>>> memory type.
>> 
>> If we put memory_tier and memory_type into 2 directories, it will be
>> much harder to change the abstract distance of a memory_type?
>> 
>
> I did explain I believe it is more intuitive to manage memory tier levels within
> memory tiering sysfs hierarchy. You seems to be ignoring my explanation in these emails. 

I don't want to ignore your explanation.  We just have different
opinions.  You think that it is more intuitive to put them in one
hierarchy, while I think that it's clearer to separate them to
reflect their differences.

>
>>> I don't see any other reason we would want to expose memory type to
>>> userspace as of now.
>> 
>> Just like we expose the device tree to the user space via sysfs.  Memory
>> types are used to describe some hardware property directly.  Users need
>> these hardware information to manage their system.
>> 
>
> Again explained in earlier emails already, I don't see a reason to duplicate
> attribute already present in /sys/devices/system/node/nodeY/access0/initiators/.
> Only reason we might end up adding memory type to sysfs is to manage memory tier levels.
> Hence the suggestion to consolidate things memory tiering directory.

I believe that we can provide useful information via memory_type in
addition to memory tier related details.  For example, we can learn the
memory type of each node, the driver that manages the memory type, the name
of the memory type provided by the driver, its raw performance, etc.

Best Regards,
Huang, Ying
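
To make the two layouts debated in this thread concrete, a purely hypothetical
sketch; memory_typeN and its placement are not part of this patch (only
memory_tierN and toptier_nodes are added below):

Consolidated, one "memory_tiering" subsystem:
	/sys/devices/virtual/memory_tiering/memory_tierN/nodes
	/sys/devices/virtual/memory_tiering/memory_typeN/	(hypothetical future entry)
	/sys/devices/virtual/memory_tiering/toptier_nodes

Separated, one subsystem per device type:
	/sys/devices/virtual/memory_tier/memory_tierN/nodes
	/sys/devices/virtual/memory_type/memory_typeN/		(hypothetical future entry)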

Patch

diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-memory-tiers b/Documentation/ABI/testing/sysfs-kernel-mm-memory-tiers
new file mode 100644
index 000000000000..55051fcf5502
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-memory-tiers
@@ -0,0 +1,35 @@ 
+What:		/sys/devices/virtual/memory_tiering/
+Date:		August 2022
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:	A collection of all the memory tiers allocated.
+
+		Individual memory tier details are contained in subdirectories
+		named by the abstract distance of the memory tier.
+
+		/sys/devices/virtual/memory_tiering/memory_tierN/
+
+
+What:		/sys/devices/virtual/memory_tiering/memory_tierN/
+		/sys/devices/virtual/memory_tiering/memory_tierN/nodes
+Date:		August 2022
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:	Directory with details of a specific memory tier
+
+		This is the directory containing information about a particular
+		memory tier, memtierN, where N is derived based on abstract distance.
+
+		A smaller value of N implies a higher (faster) memory tier in the
+		hierarchy.
+
+		nodes: NUMA nodes that are part of this memory tier.
+
+
+What:		/sys/devices/virtual/memory_tiering/toptier_nodes
+Date:		August 2022
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:	Toptier node mask
+
+		A toptier is defined as the memory tier from which memory promotion
+		is not done by the kernel.
+
+		toptier_nodes: Union of NUMA nodes that are part of each toptier.
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index c82eb0111383..33673ed9b3dc 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -19,6 +19,7 @@  struct memory_tier {
 	 * adistance_start .. adistance_start + MEMTIER_CHUNK_SIZE
 	 */
 	int adistance_start;
+	struct device dev;
 	/* All the nodes that are part of all the lower memory tiers. */
 	nodemask_t lower_tier_mask;
 };
@@ -36,6 +37,12 @@  static DEFINE_MUTEX(memory_tier_lock);
 static LIST_HEAD(memory_tiers);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 static struct memory_dev_type *default_dram_type;
+
+static struct bus_type memory_tier_subsys = {
+	.name = "memory_tiering",
+	.dev_name = "memory_tier",
+};
+
 #ifdef CONFIG_MIGRATION
 static int top_tier_adistance;
 /*
@@ -98,8 +105,63 @@  static int top_tier_adistance;
 static struct demotion_nodes *node_demotion __read_mostly;
 #endif /* CONFIG_MIGRATION */
 
+static inline struct memory_tier *to_memory_tier(struct device *device)
+{
+	return container_of(device, struct memory_tier, dev);
+}
+
+static __always_inline nodemask_t get_memtier_nodemask(struct memory_tier *memtier)
+{
+	nodemask_t nodes = NODE_MASK_NONE;
+	struct memory_dev_type *memtype;
+
+	list_for_each_entry(memtype, &memtier->memory_types, tier_sibiling)
+		nodes_or(nodes, nodes, memtype->nodes);
+
+	return nodes;
+}
+
+static void memory_tier_device_release(struct device *dev)
+{
+	struct memory_tier *tier = to_memory_tier(dev);
+	/*
+	 * synchronize_rcu in clear_node_memory_tier makes sure
+	 * we don't have rcu access to this memory tier.
+	 */
+	kfree(tier);
+}
+
+static ssize_t nodes_show(struct device *dev,
+			  struct device_attribute *attr, char *buf)
+{
+	int ret;
+	nodemask_t nmask;
+
+	mutex_lock(&memory_tier_lock);
+	nmask = get_memtier_nodemask(to_memory_tier(dev));
+	ret = sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&nmask));
+	mutex_unlock(&memory_tier_lock);
+	return ret;
+}
+static DEVICE_ATTR_RO(nodes);
+
+static struct attribute *memtier_dev_attrs[] = {
+	&dev_attr_nodes.attr,
+	NULL
+};
+
+static const struct attribute_group memtier_dev_group = {
+	.attrs = memtier_dev_attrs,
+};
+
+static const struct attribute_group *memtier_dev_groups[] = {
+	&memtier_dev_group,
+	NULL
+};
+
 static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memtype)
 {
+	int ret;
 	bool found_slot = false;
 	struct memory_tier *memtier, *new_memtier;
 	int adistance = memtype->adistance;
@@ -123,15 +185,14 @@  static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memty
 
 	list_for_each_entry(memtier, &memory_tiers, list) {
 		if (adistance == memtier->adistance_start) {
-			list_add(&memtype->tier_sibiling, &memtier->memory_types);
-			return memtier;
+			goto link_memtype;
 		} else if (adistance < memtier->adistance_start) {
 			found_slot = true;
 			break;
 		}
 	}
 
-	new_memtier = kmalloc(sizeof(struct memory_tier), GFP_KERNEL);
+	new_memtier = kzalloc(sizeof(struct memory_tier), GFP_KERNEL);
 	if (!new_memtier)
 		return ERR_PTR(-ENOMEM);
 
@@ -142,8 +203,23 @@  static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memty
 		list_add_tail(&new_memtier->list, &memtier->list);
 	else
 		list_add_tail(&new_memtier->list, &memory_tiers);
-	list_add(&memtype->tier_sibiling, &new_memtier->memory_types);
-	return new_memtier;
+
+	new_memtier->dev.id = adistance >> MEMTIER_CHUNK_BITS;
+	new_memtier->dev.bus = &memory_tier_subsys;
+	new_memtier->dev.release = memory_tier_device_release;
+	new_memtier->dev.groups = memtier_dev_groups;
+
+	ret = device_register(&new_memtier->dev);
+	if (ret) {
+		list_del(&new_memtier->list);
+		put_device(&new_memtier->dev);
+		return ERR_PTR(ret);
+	}
+	memtier = new_memtier;
+
+link_memtype:
+	list_add(&memtype->tier_sibiling, &memtier->memory_types);
+	return memtier;
 }
 
 static struct memory_tier *__node_get_memory_tier(int node)
@@ -275,17 +351,6 @@  static void disable_all_demotion_targets(void)
 	synchronize_rcu();
 }
 
-static __always_inline nodemask_t get_memtier_nodemask(struct memory_tier *memtier)
-{
-	nodemask_t nodes = NODE_MASK_NONE;
-	struct memory_dev_type *memtype;
-
-	list_for_each_entry(memtype, &memtier->memory_types, tier_sibiling)
-		nodes_or(nodes, nodes, memtype->nodes);
-
-	return nodes;
-}
-
 /*
  * Find an automatic demotion target for all memory
  * nodes. Failing here is OK.  It might just indicate
@@ -433,11 +498,7 @@  static struct memory_tier *set_node_memory_tier(int node)
 static void destroy_memory_tier(struct memory_tier *memtier)
 {
 	list_del(&memtier->list);
-	/*
-	 * synchronize_rcu in clear_node_memory_tier makes sure
-	 * we don't have rcu access to this memory tier.
-	 */
-	kfree(memtier);
+	device_unregister(&memtier->dev);
 }
 
 static bool clear_node_memory_tier(int node)
@@ -564,11 +625,60 @@  static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 	return notifier_from_errno(0);
 }
 
+#ifdef CONFIG_MIGRATION
+static ssize_t toptier_nodes_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	int ret;
+	nodemask_t nmask, top_tier_mask = NODE_MASK_NONE;
+	struct memory_tier *memtier;
+
+	mutex_lock(&memory_tier_lock);
+	list_for_each_entry(memtier, &memory_tiers, list) {
+		if (memtier->adistance_start > top_tier_adistance)
+			break;
+		nmask = get_memtier_nodemask(memtier);
+		nodes_or(top_tier_mask, top_tier_mask, nmask);
+	}
+
+	ret = sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&top_tier_mask));
+	mutex_unlock(&memory_tier_lock);
+	return ret;
+}
+#else
+static ssize_t toptier_nodes_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	nodemask_t top_tier_mask = node_states[N_MEMORY];
+
+	return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&top_tier_mask));
+}
+#endif
+static DEVICE_ATTR_RO(toptier_nodes);
+
+static struct attribute *memtier_subsys_attrs[] = {
+	&dev_attr_toptier_nodes.attr,
+	NULL
+};
+
+static const struct attribute_group memtier_subsys_group = {
+	.attrs = memtier_subsys_attrs,
+};
+
+static const struct attribute_group *memtier_subsys_groups[] = {
+	&memtier_subsys_group,
+	NULL
+};
+
 static int __init memory_tier_init(void)
 {
-	int node;
+	int ret, node;
 	struct memory_tier *memtier;
 
+	ret = subsys_virtual_register(&memory_tier_subsys, memtier_subsys_groups);
+	if (ret)
+		panic("%s() failed to register memory tier subsystem\n", __func__);
+
 #ifdef CONFIG_MIGRATION
 	node_demotion = kcalloc(nr_node_ids, sizeof(struct demotion_nodes),
 				GFP_KERNEL);