diff mbox

[1/2] shmem: Support for registration of driver/file owner specific ops

Message ID 1478271776-1194-1-git-send-email-akash.goel@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

akash.goel@intel.com Nov. 4, 2016, 3:02 p.m. UTC
From: Chris Wilson <chris@chris-wilson.co.uk>

This provides support for the drivers or shmem file owners to register
a set of callbacks, which can be invoked from the address space
operations methods implemented by shmem.  This allow the file owners to
hook into the shmem address space operations to do some extra/custom
operations in addition to the default ones.

The private_data field of address_space struct is used to store the
pointer to driver specific ops.  Currently only one ops field is defined,
which is migratepage, but can be extended on an as-needed basis.

The need for driver specific operations arises since some of the
operations (like migratepage) may not be handled completely within shmem,
so as to be effective, and would need some driver specific handling also.
Specifically, i915.ko would like to participate in migratepage().
i915.ko uses shmemfs to provide swappable backing storage for its user
objects, but when those objects are in use by the GPU it must pin the
entire object until the GPU is idle.  As a result, large chunks of memory
can be arbitrarily withdrawn from page migration, resulting in premature
out-of-memory due to fragmentation.  However, if i915.ko can receive the
migratepage() request, it can then flush the object from the GPU, remove
its pin and thus enable the migration.

Since gfx allocations are one of the major consumer of system memory, its
imperative to have such a mechanism to effectively deal with
fragmentation.  And therefore the need for such a provision for initiating
driver specific actions during address space operations.

v2:
- Drop dev_ prefix from the members of shmem_dev_info structure. (Joonas)
- Change the return type of shmem_set_device_op() to void and remove the
  check for pre-existing data. (Joonas)
- Rename shmem_set_device_op() to shmem_set_dev_info() to be consistent
  with shmem_dev_info structure. (Joonas)

Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.linux.org
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 include/linux/shmem_fs.h | 13 +++++++++++++
 mm/shmem.c               | 17 ++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

Comments

Hugh Dickins Nov. 10, 2016, 5:36 a.m. UTC | #1
On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> This provides support for the drivers or shmem file owners to register
> a set of callbacks, which can be invoked from the address space
> operations methods implemented by shmem.  This allow the file owners to
> hook into the shmem address space operations to do some extra/custom
> operations in addition to the default ones.
> 
> The private_data field of address_space struct is used to store the
> pointer to driver specific ops.  Currently only one ops field is defined,
> which is migratepage, but can be extended on an as-needed basis.
> 
> The need for driver specific operations arises since some of the
> operations (like migratepage) may not be handled completely within shmem,
> so as to be effective, and would need some driver specific handling also.
> Specifically, i915.ko would like to participate in migratepage().
> i915.ko uses shmemfs to provide swappable backing storage for its user
> objects, but when those objects are in use by the GPU it must pin the
> entire object until the GPU is idle.  As a result, large chunks of memory
> can be arbitrarily withdrawn from page migration, resulting in premature
> out-of-memory due to fragmentation.  However, if i915.ko can receive the
> migratepage() request, it can then flush the object from the GPU, remove
> its pin and thus enable the migration.
> 
> Since gfx allocations are one of the major consumer of system memory, its
> imperative to have such a mechanism to effectively deal with
> fragmentation.  And therefore the need for such a provision for initiating
> driver specific actions during address space operations.

Thank you for persisting with this, and sorry for all my delay.

> 
> v2:
> - Drop dev_ prefix from the members of shmem_dev_info structure. (Joonas)
> - Change the return type of shmem_set_device_op() to void and remove the
>   check for pre-existing data. (Joonas)
> - Rename shmem_set_device_op() to shmem_set_dev_info() to be consistent
>   with shmem_dev_info structure. (Joonas)
> 
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.linux.org
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

That doesn't seem quite right: the From line above implies that Chris
wrote it, and should be first Signer; but perhaps the From line is wrong.

> ---
>  include/linux/shmem_fs.h | 13 +++++++++++++
>  mm/shmem.c               | 17 ++++++++++++++++-
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index ff078e7..454c3ba 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -39,11 +39,24 @@ struct shmem_sb_info {
>  	unsigned long shrinklist_len; /* Length of shrinklist */
>  };
>  
> +struct shmem_dev_info {
> +	void *private_data;
> +	int (*migratepage)(struct address_space *mapping,
> +			   struct page *newpage, struct page *page,
> +			   enum migrate_mode mode, void *dev_priv_data);

Aren't the private_data field and dev_priv_data arg a little bit
confusing and redundant?  Can't the migratepage() deduce dev_priv
for itself from mapping->private_data (perhaps wrapped by a
shmem_get_dev_info()), by using container_of()?

> +};
> +
>  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
>  {
>  	return container_of(inode, struct shmem_inode_info, vfs_inode);
>  }
>  
> +static inline void shmem_set_dev_info(struct address_space *mapping,
> +				      struct shmem_dev_info *info)
> +{
> +	mapping->private_data = info;

Nit: if this stays as is, I'd prefer dev_info there and above,
since shmem.c uses info all over for its shmem_inode_info pointer.
But in second patch I suggest obj_info may be better than dev_info.

> +}
> +
>  /*
>   * Functions in mm/shmem.c called directly from elsewhere:
>   */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d..fce8de3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1290,6 +1290,21 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_MIGRATION
> +static int shmem_migratepage(struct address_space *mapping,
> +			     struct page *newpage, struct page *page,
> +			     enum migrate_mode mode)
> +{
> +	struct shmem_dev_info *dev_info = mapping->private_data;
> +
> +	if (dev_info && dev_info->migratepage)
> +		return dev_info->migratepage(mapping, newpage, page,
> +					     mode, dev_info->private_data);
> +
> +	return migrate_page(mapping, newpage, page, mode);
> +}
> +#endif
> +
>  #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
>  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
>  {
> @@ -3654,7 +3669,7 @@ static void shmem_destroy_inodecache(void)
>  	.write_end	= shmem_write_end,
>  #endif
>  #ifdef CONFIG_MIGRATION
> -	.migratepage	= migrate_page,
> +	.migratepage	= shmem_migratepage,
>  #endif
>  	.error_remove_page = generic_error_remove_page,
>  };
> -- 
> 1.9.2

I didn't like this very much; but every time I tried to "improve" it,
found good reasons why you chose the way you did (modularity of i915,
constness of a_ops, reluctance to copy and modify a_ops, reluctance
to export those shmem methods separately).

I think perhaps later we just add a gem_ops pointer to shmem_inode_info,
for i915 or other gems to fill in as they wish (and shmem divert off to
them if set, as you've done); but for now you're trying to avoid
enlarging the shmem inode, okay.

Hugh
akash.goel@intel.com Nov. 10, 2016, 4:22 p.m. UTC | #2
On 11/10/2016 11:06 AM, Hugh Dickins wrote:
> On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> This provides support for the drivers or shmem file owners to register
>> a set of callbacks, which can be invoked from the address space
>> operations methods implemented by shmem.  This allow the file owners to
>> hook into the shmem address space operations to do some extra/custom
>> operations in addition to the default ones.
>>
>> The private_data field of address_space struct is used to store the
>> pointer to driver specific ops.  Currently only one ops field is defined,
>> which is migratepage, but can be extended on an as-needed basis.
>>
>> The need for driver specific operations arises since some of the
>> operations (like migratepage) may not be handled completely within shmem,
>> so as to be effective, and would need some driver specific handling also.
>> Specifically, i915.ko would like to participate in migratepage().
>> i915.ko uses shmemfs to provide swappable backing storage for its user
>> objects, but when those objects are in use by the GPU it must pin the
>> entire object until the GPU is idle.  As a result, large chunks of memory
>> can be arbitrarily withdrawn from page migration, resulting in premature
>> out-of-memory due to fragmentation.  However, if i915.ko can receive the
>> migratepage() request, it can then flush the object from the GPU, remove
>> its pin and thus enable the migration.
>>
>> Since gfx allocations are one of the major consumer of system memory, its
>> imperative to have such a mechanism to effectively deal with
>> fragmentation.  And therefore the need for such a provision for initiating
>> driver specific actions during address space operations.
>
> Thank you for persisting with this, and sorry for all my delay.
>
>>
>> v2:
>> - Drop dev_ prefix from the members of shmem_dev_info structure. (Joonas)
>> - Change the return type of shmem_set_device_op() to void and remove the
>>   check for pre-existing data. (Joonas)
>> - Rename shmem_set_device_op() to shmem_set_dev_info() to be consistent
>>   with shmem_dev_info structure. (Joonas)
>>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.linux.org
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> That doesn't seem quite right: the From line above implies that Chris
> wrote it, and should be first Signer; but perhaps the From line is wrong.
>
Chris only wrote this patch initially, will do the required correction.

>> ---
>>  include/linux/shmem_fs.h | 13 +++++++++++++
>>  mm/shmem.c               | 17 ++++++++++++++++-
>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
>> index ff078e7..454c3ba 100644
>> --- a/include/linux/shmem_fs.h
>> +++ b/include/linux/shmem_fs.h
>> @@ -39,11 +39,24 @@ struct shmem_sb_info {
>>  	unsigned long shrinklist_len; /* Length of shrinklist */
>>  };
>>
>> +struct shmem_dev_info {
>> +	void *private_data;
>> +	int (*migratepage)(struct address_space *mapping,
>> +			   struct page *newpage, struct page *page,
>> +			   enum migrate_mode mode, void *dev_priv_data);
>
> Aren't the private_data field and dev_priv_data arg a little bit
> confusing and redundant?  Can't the migratepage() deduce dev_priv
> for itself from mapping->private_data (perhaps wrapped by a
> shmem_get_dev_info()), by using container_of()?
>
Yes looks like migratepage() can deduce dev_priv from mapping->private_data.
Can we keep the private_data as a placeholder ?. Will 
s/dev_priv_data/private_data/.

As per your suggestion, in the other patch, object pointer can be 
derived from mapping->private_data (container_of) and dev_priv in turn 
can be derived from object pointer.

>> +};
>> +
>>  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
>>  {
>>  	return container_of(inode, struct shmem_inode_info, vfs_inode);
>>  }
>>
>> +static inline void shmem_set_dev_info(struct address_space *mapping,
>> +				      struct shmem_dev_info *info)
>> +{
>> +	mapping->private_data = info;
>
> Nit: if this stays as is, I'd prefer dev_info there and above,
> since shmem.c uses info all over for its shmem_inode_info pointer.
> But in second patch I suggest obj_info may be better than dev_info.
>
Fine will s/info/dev_info.

Best regards
Akash

>> +}
>> +
>>  /*
>>   * Functions in mm/shmem.c called directly from elsewhere:
>>   */
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index ad7813d..fce8de3 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1290,6 +1290,21 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>>  	return 0;
>>  }
>>
>> +#ifdef CONFIG_MIGRATION
>> +static int shmem_migratepage(struct address_space *mapping,
>> +			     struct page *newpage, struct page *page,
>> +			     enum migrate_mode mode)
>> +{
>> +	struct shmem_dev_info *dev_info = mapping->private_data;
>> +
>> +	if (dev_info && dev_info->migratepage)
>> +		return dev_info->migratepage(mapping, newpage, page,
>> +					     mode, dev_info->private_data);
>> +
>> +	return migrate_page(mapping, newpage, page, mode);
>> +}
>> +#endif
>> +
>>  #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
>>  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
>>  {
>> @@ -3654,7 +3669,7 @@ static void shmem_destroy_inodecache(void)
>>  	.write_end	= shmem_write_end,
>>  #endif
>>  #ifdef CONFIG_MIGRATION
>> -	.migratepage	= migrate_page,
>> +	.migratepage	= shmem_migratepage,
>>  #endif
>>  	.error_remove_page = generic_error_remove_page,
>>  };
>> --
>> 1.9.2
>
> I didn't like this very much; but every time I tried to "improve" it,
> found good reasons why you chose the way you did (modularity of i915,
> constness of a_ops, reluctance to copy and modify a_ops, reluctance
> to export those shmem methods separately).
>
> I think perhaps later we just add a gem_ops pointer to shmem_inode_info,
> for i915 or other gems to fill in as they wish (and shmem divert off to
> them if set, as you've done); but for now you're trying to avoid
> enlarging the shmem inode, okay.
>
> Hugh
>
Chris Wilson Nov. 11, 2016, 1:50 p.m. UTC | #3
On Thu, Nov 10, 2016 at 09:52:34PM +0530, Goel, Akash wrote:
> 
> 
> On 11/10/2016 11:06 AM, Hugh Dickins wrote:
> >On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
> >>Cc: Hugh Dickins <hughd@google.com>
> >>Cc: linux-mm@kvack.org
> >>Cc: linux-kernel@vger.linux.org
> >>Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> >>Signed-off-by: Akash Goel <akash.goel@intel.com>
> >>Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> >That doesn't seem quite right: the From line above implies that Chris
> >wrote it, and should be first Signer; but perhaps the From line is wrong.
> >
> Chris only wrote this patch initially, will do the required correction.

Akash is being modest. I gave him an idea I had been toying with to help
reduce premature oom, he is the one who deserves credit for turning it
into a functional patch and putting it into production.
-Chris
diff mbox

Patch

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ff078e7..454c3ba 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -39,11 +39,24 @@  struct shmem_sb_info {
 	unsigned long shrinklist_len; /* Length of shrinklist */
 };
 
+struct shmem_dev_info {
+	void *private_data;
+	int (*migratepage)(struct address_space *mapping,
+			   struct page *newpage, struct page *page,
+			   enum migrate_mode mode, void *dev_priv_data);
+};
+
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
 {
 	return container_of(inode, struct shmem_inode_info, vfs_inode);
 }
 
+static inline void shmem_set_dev_info(struct address_space *mapping,
+				      struct shmem_dev_info *info)
+{
+	mapping->private_data = info;
+}
+
 /*
  * Functions in mm/shmem.c called directly from elsewhere:
  */
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d..fce8de3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1290,6 +1290,21 @@  static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	return 0;
 }
 
+#ifdef CONFIG_MIGRATION
+static int shmem_migratepage(struct address_space *mapping,
+			     struct page *newpage, struct page *page,
+			     enum migrate_mode mode)
+{
+	struct shmem_dev_info *dev_info = mapping->private_data;
+
+	if (dev_info && dev_info->migratepage)
+		return dev_info->migratepage(mapping, newpage, page,
+					     mode, dev_info->private_data);
+
+	return migrate_page(mapping, newpage, page, mode);
+}
+#endif
+
 #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
 {
@@ -3654,7 +3669,7 @@  static void shmem_destroy_inodecache(void)
 	.write_end	= shmem_write_end,
 #endif
 #ifdef CONFIG_MIGRATION
-	.migratepage	= migrate_page,
+	.migratepage	= shmem_migratepage,
 #endif
 	.error_remove_page = generic_error_remove_page,
 };