Message ID | 20190510155202.14737-4-pagupta@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | virtio pmem driver | expand |
On Fri, May 10, 2019 at 8:53 AM Pankaj Gupta <pagupta@redhat.com> wrote: > > This patch adds 'DAXDEV_SYNC' flag which is set > for nd_region doing synchronous flush. This later > is used to disable MAP_SYNC functionality for > ext4 & xfs filesystem for devices don't support > synchronous flush. > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com> > --- > drivers/dax/bus.c | 2 +- > drivers/dax/super.c | 13 ++++++++++++- > drivers/md/dm.c | 3 ++- > drivers/nvdimm/pmem.c | 5 ++++- > drivers/nvdimm/region_devs.c | 7 +++++++ > include/linux/dax.h | 8 ++++++-- > include/linux/libnvdimm.h | 1 + > 7 files changed, 33 insertions(+), 6 deletions(-) [..] > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index 043f0761e4a0..ee007b75d9fd 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor) > sprintf(md->disk->disk_name, "dm-%d", minor); > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) { > - dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops); > + dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops, > + DAXDEV_F_SYNC); Apologies for not realizing this until now, but this is broken. Imaging a device-mapper configuration composed of both 'async' virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified across all members. I would change this argument to '0' and then arrange for it to be set at dm_table_supports_dax() time after validating that all components support synchronous dax.
> > > > This patch adds 'DAXDEV_SYNC' flag which is set > > for nd_region doing synchronous flush. This later > > is used to disable MAP_SYNC functionality for > > ext4 & xfs filesystem for devices don't support > > synchronous flush. > > > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com> > > --- > > drivers/dax/bus.c | 2 +- > > drivers/dax/super.c | 13 ++++++++++++- > > drivers/md/dm.c | 3 ++- > > drivers/nvdimm/pmem.c | 5 ++++- > > drivers/nvdimm/region_devs.c | 7 +++++++ > > include/linux/dax.h | 8 ++++++-- > > include/linux/libnvdimm.h | 1 + > > 7 files changed, 33 insertions(+), 6 deletions(-) > [..] > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > > index 043f0761e4a0..ee007b75d9fd 100644 > > --- a/drivers/md/dm.c > > +++ b/drivers/md/dm.c > > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor) > > sprintf(md->disk->disk_name, "dm-%d", minor); > > > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) { > > - dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops); > > + dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops, > > + DAXDEV_F_SYNC); > > Apologies for not realizing this until now, but this is broken. > Imaging a device-mapper configuration composed of both 'async' > virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified > across all members. I would change this argument to '0' and then > arrange for it to be set at dm_table_supports_dax() time after > validating that all components support synchronous dax. o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target components support synchronous DAX. Just a question, If device mapper configuration have composed of both virtio-pmem or pmem devices, we want to configure device mapper for async flush? Thank you, Pankaj >
On Fri, May 10, 2019 at 5:45 PM Pankaj Gupta <pagupta@redhat.com> wrote: > > > > > > > > > This patch adds 'DAXDEV_SYNC' flag which is set > > > for nd_region doing synchronous flush. This later > > > is used to disable MAP_SYNC functionality for > > > ext4 & xfs filesystem for devices don't support > > > synchronous flush. > > > > > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com> > > > --- > > > drivers/dax/bus.c | 2 +- > > > drivers/dax/super.c | 13 ++++++++++++- > > > drivers/md/dm.c | 3 ++- > > > drivers/nvdimm/pmem.c | 5 ++++- > > > drivers/nvdimm/region_devs.c | 7 +++++++ > > > include/linux/dax.h | 8 ++++++-- > > > include/linux/libnvdimm.h | 1 + > > > 7 files changed, 33 insertions(+), 6 deletions(-) > > [..] > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > > > index 043f0761e4a0..ee007b75d9fd 100644 > > > --- a/drivers/md/dm.c > > > +++ b/drivers/md/dm.c > > > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor) > > > sprintf(md->disk->disk_name, "dm-%d", minor); > > > > > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) { > > > - dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops); > > > + dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops, > > > + DAXDEV_F_SYNC); > > > > Apologies for not realizing this until now, but this is broken. > > Imaging a device-mapper configuration composed of both 'async' > > virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified > > across all members. I would change this argument to '0' and then > > arrange for it to be set at dm_table_supports_dax() time after > > validating that all components support synchronous dax. > > o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target > components support synchronous DAX. > > Just a question, If device mapper configuration have composed of both > virtio-pmem or pmem devices, we want to configure device mapper for async flush? If it's composed of both then, yes, it needs to be async flush at the device-mapper level. Otherwise MAP_SYNC will succeed and fail to trigger fsync on the host file when necessary for the virtio-pmem backed portion of the device-mapper device.
> > > > > > > > This patch adds 'DAXDEV_SYNC' flag which is set > > > > for nd_region doing synchronous flush. This later > > > > is used to disable MAP_SYNC functionality for > > > > ext4 & xfs filesystem for devices don't support > > > > synchronous flush. > > > > > > > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com> > > > > --- > > > > drivers/dax/bus.c | 2 +- > > > > drivers/dax/super.c | 13 ++++++++++++- > > > > drivers/md/dm.c | 3 ++- > > > > drivers/nvdimm/pmem.c | 5 ++++- > > > > drivers/nvdimm/region_devs.c | 7 +++++++ > > > > include/linux/dax.h | 8 ++++++-- > > > > include/linux/libnvdimm.h | 1 + > > > > 7 files changed, 33 insertions(+), 6 deletions(-) > > > [..] > > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > > > > index 043f0761e4a0..ee007b75d9fd 100644 > > > > --- a/drivers/md/dm.c > > > > +++ b/drivers/md/dm.c > > > > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor) > > > > sprintf(md->disk->disk_name, "dm-%d", minor); > > > > > > > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) { > > > > - dax_dev = alloc_dax(md, md->disk->disk_name, > > > > &dm_dax_ops); > > > > + dax_dev = alloc_dax(md, md->disk->disk_name, > > > > &dm_dax_ops, > > > > + > > > > DAXDEV_F_SYNC); > > > > > > Apologies for not realizing this until now, but this is broken. > > > Imaging a device-mapper configuration composed of both 'async' > > > virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified > > > across all members. I would change this argument to '0' and then > > > arrange for it to be set at dm_table_supports_dax() time after > > > validating that all components support synchronous dax. > > > > o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target > > components support synchronous DAX. > > > > Just a question, If device mapper configuration have composed of both > > virtio-pmem or pmem devices, we want to configure device mapper for async > > flush? > > If it's composed of both then, yes, it needs to be async flush at the > device-mapper level. Otherwise MAP_SYNC will succeed and fail to > trigger fsync on the host file when necessary for the virtio-pmem > backed portion of the device-mapper device. o.k. Agree. Thanks you, Pankaj > >
Hi Dan, While testing device mapper with DAX, I faced a bug with the commit: commit ad428cdb525a97d15c0349fdc80f3d58befb50df Author: Dan Williams <dan.j.williams@intel.com> Date: Wed Feb 20 21:12:50 2019 -0800 When I reverted the condition to old code[1] it worked for me. I am thinking when we map two different devices (e.g with device mapper), will start & end pfn still point to same pgmap? Or there is something else which I am missing here. Note: I tested only EXT4. [1] - if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX) + end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL); + if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX + && pfn_t_to_page(pfn)->pgmap == pgmap + && pfn_t_to_page(end_pfn)->pgmap == pgmap + && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr)) + && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr))) dax_enabled = true; put_dev_pagemap(pgmap); Thanks, Pankaj
On Mon, May 13, 2019 at 10:32 AM Pankaj Gupta <pagupta@redhat.com> wrote: > > > Hi Dan, > > While testing device mapper with DAX, I faced a bug with the commit: > > commit ad428cdb525a97d15c0349fdc80f3d58befb50df > Author: Dan Williams <dan.j.williams@intel.com> > Date: Wed Feb 20 21:12:50 2019 -0800 > > When I reverted the condition to old code[1] it worked for me. I > am thinking when we map two different devices (e.g with device mapper), will > start & end pfn still point to same pgmap? Or there is something else which > I am missing here. > > Note: I tested only EXT4. > > [1] > > - if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX) > + end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL); > + if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX > + && pfn_t_to_page(pfn)->pgmap == pgmap > + && pfn_t_to_page(end_pfn)->pgmap == pgmap > + && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr)) > + && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr))) Ugh, yes, device-mapper continues to be an awkward fit for dax (or vice versa). We would either need a way to have a multi-level pfn to pagemap lookup for composite devices, or a way to discern that even though the pagemap is different that the result is still valid / not an indication that we have leaked into an unassociated address range. Perhaps a per-daxdev callback for ->dax_supported() so that device-mapper internals can be used for this validation. We need to get that fixed up, but I don't see it as a blocker / pre-requisite for virtio-pmem.
> > > > > > Hi Dan, > > > > While testing device mapper with DAX, I faced a bug with the commit: > > > > commit ad428cdb525a97d15c0349fdc80f3d58befb50df > > Author: Dan Williams <dan.j.williams@intel.com> > > Date: Wed Feb 20 21:12:50 2019 -0800 > > > > When I reverted the condition to old code[1] it worked for me. I > > am thinking when we map two different devices (e.g with device mapper), > > will > > start & end pfn still point to same pgmap? Or there is something else which > > I am missing here. > > > > Note: I tested only EXT4. > > > > [1] > > > > - if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX) > > + end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL); > > + if (pgmap && pgmap == end_pgmap && pgmap->type == > > MEMORY_DEVICE_FS_DAX > > + && pfn_t_to_page(pfn)->pgmap == pgmap > > + && pfn_t_to_page(end_pfn)->pgmap == pgmap > > + && pfn_t_to_pfn(pfn) == > > PHYS_PFN(__pa(kaddr)) > > + && pfn_t_to_pfn(end_pfn) == > > PHYS_PFN(__pa(end_kaddr))) > > Ugh, yes, device-mapper continues to be an awkward fit for dax (or > vice versa). We would either need a way to have a multi-level pfn to > pagemap lookup for composite devices, or a way to discern that even > though the pagemap is different that the result is still valid / not > an indication that we have leaked into an unassociated address range. > Perhaps a per-daxdev callback for ->dax_supported() so that > device-mapper internals can be used for this validation. Yes, Will look at it. > > We need to get that fixed up, but I don't see it as a blocker / > pre-requisite for virtio-pmem. Agree. Will send virtio-pmem patch series. Thank you, Pankaj > >
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 2109cfe80219..5f184e751c82 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id, * No 'host' or dax_operations since there is no access to this * device outside of mmap of the resulting character device. */ - dax_dev = alloc_dax(dev_dax, NULL, NULL); + dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC); if (!dax_dev) goto err; diff --git a/drivers/dax/super.c b/drivers/dax/super.c index bbd57ca0634a..b6c44b5062e9 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -186,6 +186,8 @@ enum dax_device_flags { DAXDEV_ALIVE, /* gate whether dax_flush() calls the low level flush routine */ DAXDEV_WRITE_CACHE, + /* flag to check if device supports synchronous flush */ + DAXDEV_SYNC, }; /** @@ -354,6 +356,12 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev) } EXPORT_SYMBOL_GPL(dax_write_cache_enabled); +bool dax_synchronous(struct dax_device *dax_dev) +{ + return test_bit(DAXDEV_SYNC, &dax_dev->flags); +} +EXPORT_SYMBOL_GPL(dax_synchronous); + bool dax_alive(struct dax_device *dax_dev) { lockdep_assert_held(&dax_srcu); @@ -508,7 +516,7 @@ static void dax_add_host(struct dax_device *dax_dev, const char *host) } struct dax_device *alloc_dax(void *private, const char *__host, - const struct dax_operations *ops) + const struct dax_operations *ops, unsigned long flags) { struct dax_device *dax_dev; const char *host; @@ -531,6 +539,9 @@ struct dax_device *alloc_dax(void *private, const char *__host, dax_add_host(dax_dev, host); dax_dev->ops = ops; dax_dev->private = private; + if (flags & DAXDEV_F_SYNC) + set_bit(DAXDEV_SYNC, &dax_dev->flags); + return dax_dev; err_dev: diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 043f0761e4a0..ee007b75d9fd 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor) sprintf(md->disk->disk_name, "dm-%d", minor); if (IS_ENABLED(CONFIG_DAX_DRIVER)) { - dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops); + dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops, + DAXDEV_F_SYNC); if (!dax_dev) goto bad; } diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 0279eb1da3ef..bdbd2b414d3d 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -365,6 +365,7 @@ static int pmem_attach_disk(struct device *dev, struct gendisk *disk; void *addr; int rc; + unsigned long flags = 0UL; pmem = devm_kzalloc(dev, sizeof(*pmem), GFP_KERNEL); if (!pmem) @@ -462,7 +463,9 @@ static int pmem_attach_disk(struct device *dev, nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res); disk->bb = &pmem->bb; - dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops); + if (is_nvdimm_sync(nd_region)) + flags = DAXDEV_F_SYNC; + dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops, flags); if (!dax_dev) { put_disk(disk); return -ENOMEM; diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b4ef7d9ff22e..f3ea5369d20a 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -1197,6 +1197,13 @@ int nvdimm_has_cache(struct nd_region *nd_region) } EXPORT_SYMBOL_GPL(nvdimm_has_cache); +bool is_nvdimm_sync(struct nd_region *nd_region) +{ + return is_nd_pmem(&nd_region->dev) && + !test_bit(ND_REGION_ASYNC, &nd_region->flags); +} +EXPORT_SYMBOL_GPL(is_nvdimm_sync); + struct conflict_context { struct nd_region *nd_region; resource_size_t start, size; diff --git a/include/linux/dax.h b/include/linux/dax.h index 0dd316a74a29..ed75b7d9d178 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -7,6 +7,9 @@ #include <linux/radix-tree.h> #include <asm/pgtable.h> +/* Flag for synchronous flush */ +#define DAXDEV_F_SYNC (1UL << 0) + typedef unsigned long dax_entry_t; struct iomap_ops; @@ -32,18 +35,19 @@ extern struct attribute_group dax_attribute_group; #if IS_ENABLED(CONFIG_DAX) struct dax_device *dax_get_by_host(const char *host); struct dax_device *alloc_dax(void *private, const char *host, - const struct dax_operations *ops); + const struct dax_operations *ops, unsigned long flags); void put_dax(struct dax_device *dax_dev); void kill_dax(struct dax_device *dax_dev); void dax_write_cache(struct dax_device *dax_dev, bool wc); bool dax_write_cache_enabled(struct dax_device *dax_dev); +bool dax_synchronous(struct dax_device *dax_dev); #else static inline struct dax_device *dax_get_by_host(const char *host) { return NULL; } static inline struct dax_device *alloc_dax(void *private, const char *host, - const struct dax_operations *ops) + const struct dax_operations *ops, unsigned long flags) { /* * Callers should check IS_ENABLED(CONFIG_DAX) to know if this diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h index feb342d026f2..3238a206e563 100644 --- a/include/linux/libnvdimm.h +++ b/include/linux/libnvdimm.h @@ -264,6 +264,7 @@ void nvdimm_flush(struct nd_region *nd_region); int nvdimm_has_flush(struct nd_region *nd_region); int nvdimm_has_cache(struct nd_region *nd_region); int nvdimm_in_overwrite(struct nvdimm *nvdimm); +bool is_nvdimm_sync(struct nd_region *nd_region); static inline int nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
This patch adds 'DAXDEV_SYNC' flag which is set for nd_region doing synchronous flush. This later is used to disable MAP_SYNC functionality for ext4 & xfs filesystem for devices don't support synchronous flush. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> --- drivers/dax/bus.c | 2 +- drivers/dax/super.c | 13 ++++++++++++- drivers/md/dm.c | 3 ++- drivers/nvdimm/pmem.c | 5 ++++- drivers/nvdimm/region_devs.c | 7 +++++++ include/linux/dax.h | 8 ++++++-- include/linux/libnvdimm.h | 1 + 7 files changed, 33 insertions(+), 6 deletions(-)