
[v3,2/4] libnvdimm: unconditionally deep flush on *sync

Message ID 20180606164515.25677-2-ross.zwisler@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Ross Zwisler June 6, 2018, 4:45 p.m. UTC
Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
write to each of the flush hints for a region) in response to an
msync/fsync/sync call if nvdimm_has_cache() returned true at the time
we were setting up the request queue.  This happens due to the write cache
value passed in to blk_queue_write_cache(), which then causes the block
layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set.  We do have a
"write_cache" sysfs entry for namespaces, e.g.:

  /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache

which can be used to control whether or not the kernel thinks a given
namespace has a write cache, but this didn't modify the deep flush behavior
that we set up when the driver was initialized.  Instead, it only modified
whether or not DAX would flush CPU caches via dax_flush() in response to
*sync calls.
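
As a rough userspace illustration of the sysfs entry described above, the
sketch below reads and writes a boolean attribute file.  The helper names
read_write_cache() and set_write_cache() are hypothetical, and the sketch
assumes the attribute is exposed as a single "0" or "1" line.  The path is
passed in by the caller because the device names in the example path above
(pfn0.1, pmem0) differ from system to system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read a boolean sysfs-style attribute (assumed format: "0\n" or "1\n").
 * Returns 0 or 1 on success, -1 if the file cannot be opened or parsed.
 */
int read_write_cache(const char *path)
{
	FILE *f;
	int val;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	else
		val = !!val;	/* normalize any non-zero value to 1 */
	fclose(f);
	return val;
}

/* Write 0 or 1 to the attribute.  Returns 0 on success, -1 on failure. */
int set_write_cache(const char *path, int enabled)
{
	FILE *f;
	int ret;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = fprintf(f, "%d\n", !!enabled) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;	/* a buffered write can still fail at close */
	return ret;
}
```

Writing a real attribute this way normally requires root, and per the
description above it only changes the DAX CPU cache flushing decision.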

Simplify this by making the *sync deep flush always happen, regardless of
the write cache setting of a namespace.  The DAX CPU cache flushing will
still be controlled by the write_cache setting of the namespace.
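
The before/after behavior can be summarized with a toy userspace model.
This is not kernel code: struct pmem_model and every function below are
hypothetical names invented for illustration.  They only mirror which value
feeds blk_queue_write_cache() and which feeds dax_write_cache() according
to the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the two settings established when the disk is attached. */
struct pmem_model {
	bool queue_wc;	/* write cache flag given to blk_queue_write_cache() */
	bool dax_wc;	/* flag given to dax_write_cache(); sysfs may change it */
};

/* Before the patch: both settings come from nvdimm_has_cache(). */
struct pmem_model attach_before(bool has_cache)
{
	struct pmem_model m = { .queue_wc = has_cache, .dax_wc = has_cache };
	return m;
}

/* After the patch: the request queue always advertises a write cache. */
struct pmem_model attach_after(bool has_cache)
{
	struct pmem_model m = { .queue_wc = true, .dax_wc = has_cache };
	return m;
}

/* Models a later write to the write_cache sysfs attribute. */
void sysfs_set_write_cache(struct pmem_model *m, bool enabled)
{
	assert(m != NULL);
	m->dax_wc = enabled;	/* queue_wc stays as set at attach time */
}

/* A *sync deep flush needs the block layer to send down flush BIOs. */
bool sync_does_deep_flush(const struct pmem_model *m)
{
	return m->queue_wc;
}

/* CPU cache flushing via dax_flush() follows the DAX write cache flag. */
bool sync_flushes_cpu_caches(const struct pmem_model *m)
{
	return m->dax_wc;
}
```

In this model, a namespace attached with has_cache == false never deep
flushes before the patch, even after sysfs_set_write_cache(&m, true), while
after the patch it always does.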

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pmem.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Dan Williams June 6, 2018, 5:57 p.m. UTC | #1
On Wed, Jun 6, 2018 at 9:45 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
> write to each of the flush hints for a region) in response to an
> msync/fsync/sync call if nvdimm_has_cache() returned true at the time
> we were setting up the request queue.  This happens due to the write cache
> value passed in to blk_queue_write_cache(), which then causes the block
> layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set.  We do have a
> "write_cache" sysfs entry for namespaces, e.g.:
>
>   /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache
>
> which can be used to control whether or not the kernel thinks a given
> namespace has a write cache, but this didn't modify the deep flush behavior
> that we set up when the driver was initialized.  Instead, it only modified
> whether or not DAX would flush CPU caches via dax_flush() in response to
> *sync calls.
>
> Simplify this by making the *sync deep flush always happen, regardless of
> the write cache setting of a namespace.  The DAX CPU cache flushing will
> still be controlled by the write_cache setting of the namespace.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>

Looks good. I believe we want this one and "[PATCH v3 4/4] libnvdimm:
don't flush power-fail protected CPU caches" marked for -stable and
tagged with:

Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")

...any concerns with that?

Ross Zwisler June 6, 2018, 6:16 p.m. UTC | #2
On Wed, Jun 06, 2018 at 10:57:59AM -0700, Dan Williams wrote:
> Looks good. I believe we want this one and "[PATCH v3 4/4] libnvdimm:
> don't flush power-fail protected CPU caches" marked for -stable and
> tagged with:
> 
> Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices
> via fsync()")
> 
> ...any concerns with that?

Nope, sounds good.  Can you fix that up when you apply, or would it be helpful
for me to send another revision with those tags?

Dan Williams June 6, 2018, 6:36 p.m. UTC | #3
On Wed, Jun 6, 2018 at 11:16 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> On Wed, Jun 06, 2018 at 10:57:59AM -0700, Dan Williams wrote:
>> Looks good. I believe we want this one and "[PATCH v3 4/4] libnvdimm:
>> don't flush power-fail protected CPU caches" marked for -stable and
>> tagged with:
>>
>> Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices
>> via fsync()")
>>
>> ...any concerns with that?
>
> Nope, sounds good.  Can you fix that up when you apply, or would it be helpful
> for me to send another revision with those tags?

I'll fix it up, thanks Ross.

Patch

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 252adfab1e47..97b4c39a9267 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -294,7 +294,7 @@  static int pmem_attach_disk(struct device *dev,
 {
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	struct nd_region *nd_region = to_nd_region(dev->parent);
-	int nid = dev_to_node(dev), fua, wbc;
+	int nid = dev_to_node(dev), fua;
 	struct resource *res = &nsio->res;
 	struct resource bb_res;
 	struct nd_pfn *nd_pfn = NULL;
@@ -330,7 +330,6 @@  static int pmem_attach_disk(struct device *dev,
 		dev_warn(dev, "unable to guarantee persistence of writes\n");
 		fua = 0;
 	}
-	wbc = nvdimm_has_cache(nd_region);
 
 	if (!devm_request_mem_region(dev, res->start, resource_size(res),
 				dev_name(&ndns->dev))) {
@@ -377,7 +376,7 @@  static int pmem_attach_disk(struct device *dev,
 		return PTR_ERR(addr);
 	pmem->virt_addr = addr;
 
-	blk_queue_write_cache(q, wbc, fua);
+	blk_queue_write_cache(q, true, fua);
 	blk_queue_make_request(q, pmem_make_request);
 	blk_queue_physical_block_size(q, PAGE_SIZE);
 	blk_queue_logical_block_size(q, pmem_sector_size(ndns));
@@ -408,7 +407,7 @@  static int pmem_attach_disk(struct device *dev,
 		put_disk(disk);
 		return -ENOMEM;
 	}
-	dax_write_cache(dax_dev, wbc);
+	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	pmem->dax_dev = dax_dev;
 
 	gendev = disk_to_dev(disk);