Message ID | 20190501140438.9506-1-brian.welty@intel.com (mailing list archive) |
---|---|
Series | cgroup support for GPU devices |
On Wed, May 01, 2019 at 10:04:33AM -0400, Brian Welty wrote:
> In containerized or virtualized environments, there is desire to have
> controls in place for resources that can be consumed by users of a GPU
> device. This RFC patch series proposes a framework for integrating
> use of existing cgroup controllers into device drivers.
> The i915 driver is updated in this series as our primary use case to
> leverage this framework and to serve as an example for discussion.
>
> The patch series enables device drivers to use cgroups to control the
> following resources within a GPU (or other accelerator device):
>   * control allocation of device memory (reuse of memcg)

Count us (Mellanox) too: our RDMA devices expose special, limited-size
device memory to users, and we would like to provide an option to use
cgroup to control its exposure.

> and with future work, we could extend to:
>   * track and control share of GPU time (reuse of cpu/cpuacct)
>   * apply mask of allowed execution engines (reuse of cpusets)
>
> Instead of introducing a new cgroup subsystem for GPU devices, a new
> framework is proposed to allow devices to register with existing cgroup
> controllers, which creates per-device cgroup_subsys_state within the
> cgroup. This gives device drivers their own private cgroup controls
> (such as memory limits or other parameters) to be applied to device
> resources instead of host system resources.
> Device drivers (GPU or other) are then able to reuse the existing cgroup
> controls, instead of inventing similar ones.
>
> Per-device controls would be exposed in the cgroup filesystem as:
>   mount/<cgroup_name>/<subsys_name>.devices/<dev_name>/<subsys_files>
> such as (for example):
>   mount/<cgroup_name>/memory.devices/<dev_name>/memory.max
>   mount/<cgroup_name>/memory.devices/<dev_name>/memory.current
>   mount/<cgroup_name>/cpu.devices/<dev_name>/cpu.stat
>   mount/<cgroup_name>/cpu.devices/<dev_name>/cpu.weight
>
> The drm/i915 patch in this series is based on top of other RFC work [1]
> for i915 device memory support.
>
> AMD [2] and Intel [3] have proposed related work in this area within the
> last few years, listed below as reference. This new RFC reuses existing
> cgroup controllers and takes a different approach than prior work.
>
> Finally, some potential discussion points for this series:
>   * merge proposed <subsys_name>.devices into a single devices directory?
>   * allow devices to have multiple registrations for subsets of resources?
>   * document a 'common charging policy' for device drivers to follow?
>
> [1] https://patchwork.freedesktop.org/series/56683/
> [2] https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> [3] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
>
> Brian Welty (5):
>   cgroup: Add cgroup_subsys per-device registration framework
>   cgroup: Change kernfs_node for directories to store cgroup_subsys_state
>   memcg: Add per-device support to memory cgroup subsystem
>   drm: Add memory cgroup registration and DRIVER_CGROUPS feature bit
>   drm/i915: Use memory cgroup for enforcing device memory limit
>
>  drivers/gpu/drm/drm_drv.c                  |  12 +
>  drivers/gpu/drm/drm_gem.c                  |   7 +
>  drivers/gpu/drm/i915/i915_drv.c            |   2 +-
>  drivers/gpu/drm/i915/intel_memory_region.c |  24 +-
>  include/drm/drm_device.h                   |   3 +
>  include/drm/drm_drv.h                      |   8 +
>  include/drm/drm_gem.h                      |  11 +
>  include/linux/cgroup-defs.h                |  28 ++
>  include/linux/cgroup.h                     |   3 +
>  include/linux/memcontrol.h                 |  10 +
>  kernel/cgroup/cgroup-v1.c                  |  10 +-
>  kernel/cgroup/cgroup.c                     | 310 ++++++++++++++++++---
>  mm/memcontrol.c                            | 183 +++++++++++-
>  13 files changed, 552 insertions(+), 59 deletions(-)
>
> --
> 2.21.0
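To make the semantics the cover letter describes a little more concrete, here is a rough, self-contained C model of a per-device usage/limit pair living inside a cgroup and being charged on device-memory allocations. It is a sketch only: every struct and function name below is invented for illustration and is not a symbol from the patch series.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a per-device cgroup_subsys_state: one usage/limit pair per
 * device per cgroup, mirroring memory.devices/<dev>/memory.{max,current}. */
struct dev_css {
	const char *dev_name;
	uint64_t max;		/* limit for this device */
	uint64_t current_use;	/* current usage for this device */
};

/* Charge a device-memory allocation; fail if it would exceed the limit. */
static int dev_css_try_charge(struct dev_css *css, uint64_t bytes)
{
	if (css->current_use + bytes > css->max)
		return -1;
	css->current_use += bytes;
	return 0;
}

static void dev_css_uncharge(struct dev_css *css, uint64_t bytes)
{
	css->current_use -= bytes;
}

int main(void)
{
	/* A cgroup granted a 256 MiB limit on device "card0". */
	struct dev_css card0 = { "card0", 256ULL << 20, 0 };

	if (dev_css_try_charge(&card0, 64ULL << 20) == 0)
		printf("%s: charged 64 MiB, current=%llu bytes\n",
		       card0.dev_name, (unsigned long long)card0.current_use);

	dev_css_uncharge(&card0, 64ULL << 20);
	return 0;
}
```

The point of the proposal is that this charge/limit pair would be kept per device inside an existing controller's per-cgroup state, so a driver's allocation path charges against its own device limit rather than against host RAM.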
On 5/2/2019 1:34 AM, Leon Romanovsky wrote:
> Count us (Mellanox) too: our RDMA devices expose special, limited-size
> device memory to users, and we would like to provide an option to use
> cgroup to control its exposure.

Doesn't RDMA already have a separate cgroup? Why not implement it there?
On 5/2/2019 3:48 PM, Kenny Ho wrote:
> On 5/2/2019 1:34 AM, Leon Romanovsky wrote:
>> Count us (Mellanox) too: our RDMA devices expose special, limited-size
>> device memory to users, and we would like to provide an option to use
>> cgroup to control its exposure.

Hi Leon, great to hear, and happy to work with you and the RDMA community
to shape this framework for use by RDMA devices as well. The intent was
to support more than GPU devices.

Incidentally, I also wanted to ask about the rdma cgroup controller and
whether there is interest in updating the device registration implemented
in that controller. It could use the cgroup_device_register() that is
proposed here. But this is perhaps future work, so we can discuss it
separately.

> Doesn't RDMA already have a separate cgroup? Why not implement it there?

Hi Kenny, I can't answer for Leon, but I'm hopeful he agrees with the
rationale I gave in the cover letter: implementing this in the rdma
controller would mean duplicating the existing memcg controls there.

Is AMD interested in collaborating to help shape this framework? It is
intended to be device-neutral, so it could be leveraged by various types
of devices. If you have an alternative solution well underway, then maybe
we can work together to merge our efforts into one. In the end, the DRM
community is best served by a common solution.
On Fri, May 03, 2019 at 02:14:33PM -0700, Welty, Brian wrote:
> On 5/2/2019 3:48 PM, Kenny Ho wrote:
> > On 5/2/2019 1:34 AM, Leon Romanovsky wrote:
> >> Count us (Mellanox) too: our RDMA devices expose special, limited-size
> >> device memory to users, and we would like to provide an option to use
> >> cgroup to control its exposure.
>
> Hi Leon, great to hear, and happy to work with you and the RDMA community
> to shape this framework for use by RDMA devices as well. The intent was
> to support more than GPU devices.
>
> Incidentally, I also wanted to ask about the rdma cgroup controller and
> whether there is interest in updating the device registration implemented
> in that controller. It could use the cgroup_device_register() that is
> proposed here. But this is perhaps future work, so we can discuss it
> separately.

I'll try to take a look later this week.

> > Doesn't RDMA already have a separate cgroup? Why not implement it there?
>
> Hi Kenny, I can't answer for Leon, but I'm hopeful he agrees with the
> rationale I gave in the cover letter: implementing this in the rdma
> controller would mean duplicating the existing memcg controls there.

Exactly. I didn't feel comfortable adding the notion of "device memory"
to the RDMA cgroup and postponed that decision. RDMA operates on verbs
objects, and all of our user-space API is built around that concept. In
the end, a system administrator would have a hard time understanding the
difference between memcg memory and RDMA memory.
On Sun, May 5, 2019 at 3:14 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > Doesn't RDMA already have a separate cgroup? Why not implement it there?
> >
> > Hi Kenny, I can't answer for Leon, but I'm hopeful he agrees with the
> > rationale I gave in the cover letter: implementing this in the rdma
> > controller would mean duplicating the existing memcg controls there.
>
> Exactly. I didn't feel comfortable adding the notion of "device memory"
> to the RDMA cgroup and postponed that decision. RDMA operates on verbs
> objects, and all of our user-space API is built around that concept. In
> the end, a system administrator would have a hard time understanding the
> difference between memcg memory and RDMA memory.

Interesting. I actually don't understand this part (I've worked on the
devops/sysadmin side of things, but never with RDMA.) Don't applications
that use RDMA require some awareness of RDMA? (I mean, you mentioned verbs
and objects... or do they just use regular malloc for buffer allocation
and then send it through some function?) As a user, I would have this
question: why do I need to configure some RDMA resources under the rdma
cgroup while other RDMA resources live in a different, seemingly unrelated
cgroup?

I think we need to be careful about drawing the line between duplication
and over-coupling between subsystems. I have other thoughts and concerns
and will try to organize them into a response in the next few days.

Regards,
Kenny
On Sun, May 05, 2019 at 10:21:30AM -0400, Kenny Ho wrote:
> Interesting. I actually don't understand this part (I've worked on the
> devops/sysadmin side of things, but never with RDMA.) Don't applications
> that use RDMA require some awareness of RDMA? (I mean, you mentioned verbs
> and objects... or do they just use regular malloc for buffer allocation
> and then send it through some function?) As a user, I would have this
> question: why do I need to configure some RDMA resources under the rdma
> cgroup while other RDMA resources live in a different, seemingly unrelated
> cgroup?

We are talking about two different access patterns for this device memory
(DM): one is to use the DM, the other is to configure/limit it. Usually
those actions are performed by different groups.

The first group (programmers) uses a special API [1] through libibverbs [2]
without any notion of cgroups or limitations. The second group (sysadmins)
is less interested in application specifics; for them "device memory" means
"memory", not "RDMA, NIC-specific, internal memory".

[1] ibv_alloc_dm()
    http://man7.org/linux/man-pages/man3/ibv_alloc_dm.3.html
    https://www.openfabrics.org/images/2018workshop/presentations/304_LLiss_OnDeviceMemory.pdf
[2] https://github.com/linux-rdma/rdma-core/blob/master/libibverbs/
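For readers unfamiliar with the API Leon points to, the sketch below allocates a small region of NIC-internal device memory through libibverbs and copies data in and out. It is a minimal illustration only: error handling is trimmed, it assumes the first available RDMA device supports device memory (e.g. an mlx5-class NIC), and the copy length is kept 4-byte aligned.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
	struct ibv_device **devs = ibv_get_device_list(NULL);
	if (!devs || !devs[0])
		return 1;

	struct ibv_context *ctx = ibv_open_device(devs[0]);
	if (!ctx)
		return 1;

	/* Allocate 4 KiB of on-device (NIC-internal) memory. */
	struct ibv_alloc_dm_attr attr = { .length = 4096, .log_align_req = 0 };
	struct ibv_dm *dm = ibv_alloc_dm(ctx, &attr);
	if (!dm) {
		fprintf(stderr, "device memory allocation failed (or unsupported)\n");
		return 1;
	}

	/* Copy a small buffer into device memory and read it back. */
	char msg[] = "hello device memory";	/* 20 bytes, 4-byte aligned */
	ibv_memcpy_to_dm(dm, 0, msg, sizeof(msg));

	char out[sizeof(msg)] = { 0 };
	ibv_memcpy_from_dm(out, dm, 0, sizeof(out));
	printf("read back: %s\n", out);

	ibv_free_dm(dm);
	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	return 0;
}
```

Note that the application never sees cgroups here; a limit of the kind being discussed would be enforced by the kernel at allocation time, which is exactly Leon's split between programmers and sysadmins.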
(Sent again; not sure why my previous email was just a reply instead of a
reply-all.)

On Sun, May 5, 2019 at 12:05 PM Leon Romanovsky <leon@kernel.org> wrote:
> We are talking about two different access patterns for this device memory
> (DM): one is to use the DM, the other is to configure/limit it. Usually
> those actions are performed by different groups.
>
> The first group (programmers) uses a special API [1] through libibverbs [2]
> without any notion of cgroups or limitations. The second group (sysadmins)
> is less interested in application specifics; for them "device memory" means
> "memory", not "RDMA, NIC-specific, internal memory".

Um... I am not sure that answered it, especially in the context of cgroups
(this is just for my curiosity, by the way; I don't know much about RDMA.)
You said sysadmins are less interested in application specifics, but then
how would they make the judgement call on how much "device memory" is
provisioned to one application/container over another (say you have five
cgroups sharing an RDMA device)? What are the consequences of
under-provisioning "device memory" to an application? And if it is all
just memory, can a sysadmin provision more system memory in place of
device memory (that is, are they interchangeable)? I guess I am confused
because if device memory is just memory (not RDMA- or NIC-specific) to
sysadmins, how would they know to set the right amount?

Regards,
Kenny
Leon Romanovsky writes:
> The first group (programmers) uses a special API [1] through libibverbs [2]
> without any notion of cgroups or limitations. The second group (sysadmins)
> is less interested in application specifics; for them "device memory" means
> "memory", not "RDMA, NIC-specific, internal memory".

I'd suggest otherwise, based on historic precedent -- sysadmins are
typically very opinionated about the operation of the memory subsystem
(hence the endless discussions about swap, caching behaviour, etc).
Especially in this case, these types of memory operate fundamentally
differently and have significantly different performance and availability
characteristics. That's not something that can be trivially abstracted
over without non-trivial drawbacks.
On Sun, May 05, 2019 at 12:34:16PM -0400, Kenny Ho wrote:
> Um... I am not sure that answered it, especially in the context of cgroups
> (this is just for my curiosity, by the way; I don't know much about RDMA.)
> You said sysadmins are less interested in application specifics, but then
> how would they make the judgement call on how much "device memory" is
> provisioned to one application/container over another (say you have five
> cgroups sharing an RDMA device)? What are the consequences of
> under-provisioning "device memory" to an application? And if it is all
> just memory, can a sysadmin provision more system memory in place of
> device memory (that is, are they interchangeable)? I guess I am confused
> because if device memory is just memory (not RDMA- or NIC-specific) to
> sysadmins, how would they know to set the right amount?

One of the immediate usages of this DM that comes to my mind is very fast
spinlocks for MPI applications. In that case, the amount of DM is a
property of the network topology of the given MPI cluster, and provisioning
a precise amount of memory ensures that all jobs continue to give maximal
performance despite any programmer's error in DM allocation. In the
under-provisioning scenario, if the application is written correctly, users
will see more latency and less performance due to PCI accesses. Slide 3 in
Liran's presentation gives a brief overview of the motivation.

Thanks
On Wed, May 01, 2019 at 10:04:33AM -0400, Brian Welty wrote:
> In containerized or virtualized environments, there is desire to have
> controls in place for resources that can be consumed by users of a GPU
> device. This RFC patch series proposes a framework for integrating
> use of existing cgroup controllers into device drivers.
> The i915 driver is updated in this series as our primary use case to
> leverage this framework and to serve as an example for discussion.
>
> The patch series enables device drivers to use cgroups to control the
> following resources within a GPU (or other accelerator device):
>   * control allocation of device memory (reuse of memcg)
> and with future work, we could extend to:
>   * track and control share of GPU time (reuse of cpu/cpuacct)
>   * apply mask of allowed execution engines (reuse of cpusets)

Please create a separate controller for your purposes. The memory
controller is for traditional RAM. I don't see it having much in common
with what you're trying to do, and it's barely reusing any of the memcg
code. You can use the page_counter API directly.

> Instead of introducing a new cgroup subsystem for GPU devices, a new
> framework is proposed to allow devices to register with existing cgroup
> controllers, which creates per-device cgroup_subsys_state within the
> cgroup. This gives device drivers their own private cgroup controls
> (such as memory limits or other parameters) to be applied to device
> resources instead of host system resources.
> Device drivers (GPU or other) are then able to reuse the existing cgroup
> controls, instead of inventing similar ones.
>
> Per-device controls would be exposed in the cgroup filesystem as:
>   mount/<cgroup_name>/<subsys_name>.devices/<dev_name>/<subsys_files>
> such as (for example):
>   mount/<cgroup_name>/memory.devices/<dev_name>/memory.max
>   mount/<cgroup_name>/memory.devices/<dev_name>/memory.current
>   mount/<cgroup_name>/cpu.devices/<dev_name>/cpu.stat
>   mount/<cgroup_name>/cpu.devices/<dev_name>/cpu.weight

Subdirectories for anything other than actual cgroups are a no-go. If you
need a hierarchy, use dotted filenames:

  gpu.memory.max
  gpu.cycles.max

etc., and look at the 'Format' and 'Conventions' sections of
Documentation/admin-guide/cgroup-v2.rst, as well as how the io controller
works, to see how multi-key / multi-device control files are implemented
in cgroup2.
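As a rough sketch of what Johannes suggests, the snippet below shows a hypothetical dedicated controller charging device memory with the in-kernel page_counter API directly, rather than reusing memcg. Only the page_counter_* calls are real kernel API; the struct gpucg and the surrounding function names are invented here for illustration.

```c
#include <linux/cgroup.h>
#include <linux/errno.h>
#include <linux/page_counter.h>

struct gpucg {				/* hypothetical per-cgroup state */
	struct cgroup_subsys_state css;
	struct page_counter devmem;	/* device memory usage/limit, in pages */
};

/* Called from css_alloc: link to the parent counter so charges propagate
 * up the hierarchy, the same way memcg and the io controller behave. */
static void gpucg_counter_init(struct gpucg *gc, struct gpucg *parent)
{
	page_counter_init(&gc->devmem, parent ? &parent->devmem : NULL);
}

/* Driver-side charge on allocation; fails cleanly when over the limit. */
static int gpucg_charge_devmem(struct gpucg *gc, unsigned long nr_pages)
{
	struct page_counter *fail;

	if (!page_counter_try_charge(&gc->devmem, nr_pages, &fail))
		return -ENOMEM;
	return 0;
}

static void gpucg_uncharge_devmem(struct gpucg *gc, unsigned long nr_pages)
{
	page_counter_uncharge(&gc->devmem, nr_pages);
}
```

Following the interface conventions mentioned above, the limit itself would then be exposed through a single control file keyed per device (for example a gpu.memory.max with one line per device, in the style of io.max), rather than through per-device subdirectories.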
Hello,

On Wed, May 01, 2019 at 10:04:33AM -0400, Brian Welty wrote:
> The patch series enables device drivers to use cgroups to control the
> following resources within a GPU (or other accelerator device):
>   * control allocation of device memory (reuse of memcg)
> and with future work, we could extend to:
>   * track and control share of GPU time (reuse of cpu/cpuacct)
>   * apply mask of allowed execution engines (reuse of cpusets)
>
> Instead of introducing a new cgroup subsystem for GPU devices, a new
> framework is proposed to allow devices to register with existing cgroup
> controllers, which creates per-device cgroup_subsys_state within the
> cgroup. This gives device drivers their own private cgroup controls
> (such as memory limits or other parameters) to be applied to device
> resources instead of host system resources.
> Device drivers (GPU or other) are then able to reuse the existing cgroup
> controls, instead of inventing similar ones.

I'm really skeptical about this approach. When creating resource
controllers, I think what's most important and challenging is establishing
the resource model: what the resources are and how they can be
distributed. This patchset is going the other way around, building out
core infrastructure for boilerplate at a significant risk of mixing up
resource models across different types of resources.

IO controllers already implement per-device controls. I'd suggest
following the same interface conventions and implementing a dedicated
controller for the subsystem.

Thanks.
On 5/6/2019 8:26 AM, Tejun Heo wrote:
> I'm really skeptical about this approach. When creating resource
> controllers, I think what's most important and challenging is establishing
> the resource model: what the resources are and how they can be
> distributed. This patchset is going the other way around, building out
> core infrastructure for boilerplate at a significant risk of mixing up
> resource models across different types of resources.
>
> IO controllers already implement per-device controls. I'd suggest
> following the same interface conventions and implementing a dedicated
> controller for the subsystem.

Okay, thanks for the feedback. The preference is clearly to see this done
as a dedicated cgroup controller.

Part of my proposal was an attempt to have devices with "mem like" and
"cpu like" attributes managed by a common controller. We can ignore this
idea for the cpu attributes, as those can just go in a GPU controller.

There might still be merit in having a 'device mem' cgroup controller;
the resource model is then at least no longer mixed up with host memory.
The RDMA community seemed to have some interest in a common controller,
at least for the device memory aspects. Thoughts on this? I believe we
could still reuse the 'struct mem_cgroup' data structure, and there
should be some opportunity to reuse the charging APIs and to integrate
nicely with HMM for charging to device memory, depending on the backing
store.

-Brian
Hello,

On Tue, May 07, 2019 at 12:50:50PM -0700, Welty, Brian wrote:
> There might still be merit in having a 'device mem' cgroup controller;
> the resource model is then at least no longer mixed up with host memory.
> The RDMA community seemed to have some interest in a common controller,
> at least for the device memory aspects. Thoughts on this? I believe we
> could still reuse the 'struct mem_cgroup' data structure, and there
> should be some opportunity to reuse the charging APIs and to integrate
> nicely with HMM for charging to device memory, depending on the backing
> store.

Library-ish sharing is fine, but in terms of interface I think it'd be
better to keep them separate, at least for now. Down the line these
resources may interact with each other in a more integrated way, but I
don't think it's a good idea to try to design and implement resource
models for that preemptively.

Thanks.