[2/2] pmem: Add simple and slow fsync/msync support

Message ID 1446070176-14568-3-git-send-email-ross.zwisler@linux.intel.com (mailing list archive)
State Superseded

Commit Message

Ross Zwisler Oct. 28, 2015, 10:09 p.m. UTC
Make blkdev_issue_flush() behave correctly according to its required
semantics - all volatile cached data is flushed to stable storage.

Eventually this needs to be replaced with something much more precise by
tracking dirty DAX entries via the radix tree in struct address_space, but
for now this gives us correctness even if the performance is quite bad.

Userspace applications looking to avoid the fsync/msync penalty should
consider more fine-grained flushing via the NVML library:

https://github.com/pmem/nvml
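
For reference, a minimal userspace sketch of that fine-grained flushing (not part of this patch): map a file on a DAX-capable filesystem and persist only the bytes actually written, instead of paying for a whole-device flush through fsync(). It assumes libpmem from the NVML repository above (pmem_is_pmem(), pmem_persist(), pmem_msync()); the /mnt/pmem/log path is made up and error handling is trimmed.

/* build with: cc -o pmem_log pmem_log.c -lpmem */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <libpmem.h>

int main(void)
{
	/* assumes /mnt/pmem/log already exists, is at least one page long,
	 * and lives on a DAX-mounted filesystem backed by pmem */
	int fd = open("/mnt/pmem/log", O_RDWR);
	char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);

	strcpy(buf, "hello, pmem");
	if (pmem_is_pmem(buf, 4096))
		pmem_persist(buf, strlen(buf) + 1);	/* CPU cache flush only */
	else
		pmem_msync(buf, strlen(buf) + 1);	/* falls back to msync() */

	munmap(buf, 4096);
	close(fd);
	return 0;
}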

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 drivers/nvdimm/pmem.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Dan Williams Oct. 28, 2015, 11:02 p.m. UTC | #1
On Thu, Oct 29, 2015 at 7:09 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> Make blkdev_issue_flush() behave correctly according to its required
> semantics - all volatile cached data is flushed to stable storage.
>
> Eventually this needs to be replaced with something much more precise by
> tracking dirty DAX entries via the radix tree in struct address_space, but
> for now this gives us correctness even if the performance is quite bad.
>
> Userspace applications looking to avoid the fsync/msync penalty should
> consider more fine-grained flushing via the NVML library:
>
> https://github.com/pmem/nvml
>
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
>  drivers/nvdimm/pmem.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 0ba6a97..eea7997 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -80,7 +80,14 @@ static void pmem_make_request(struct request_queue *q, struct bio *bio)
>         if (do_acct)
>                 nd_iostat_end(bio, start);
>
> -       if (bio_data_dir(bio))
> +       if (bio->bi_rw & REQ_FLUSH) {
> +               void __pmem *addr = pmem->virt_addr + pmem->data_offset;
> +               size_t size = pmem->size - pmem->data_offset;
> +
> +               wb_cache_pmem(addr, size);
> +       }
> +

So I think this will be too expensive to run synchronously in the
submission path for very large pmem ranges and should be farmed out to
an async thread. Then, as long as we're farming it out, might as well
farm it out to more than one cpu.  I'll take a stab at this on the
flight back from KS.
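
A rough sketch of what that might look like, purely illustrative and not Dan's implementation: split the writeback range into one chunk per online CPU and push the chunks onto the unbound workqueue, so a single REQ_FLUSH does not serialize on one core. The pmem_flush_chunk/pmem_flush_parallel names are invented; the code would sit in drivers/nvdimm/pmem.c next to the code in the patch and needs <linux/workqueue.h> and <linux/slab.h>.

struct pmem_flush_chunk {
	struct work_struct work;
	void __pmem *addr;
	size_t len;
};

static void pmem_flush_chunk_fn(struct work_struct *work)
{
	struct pmem_flush_chunk *c =
		container_of(work, struct pmem_flush_chunk, work);

	wb_cache_pmem(c->addr, c->len);
}

static void pmem_flush_parallel(void __pmem *addr, size_t size)
{
	unsigned int i, nr = num_online_cpus();
	size_t chunk = DIV_ROUND_UP(size, nr);
	struct pmem_flush_chunk *chunks;

	chunks = kcalloc(nr, sizeof(*chunks), GFP_NOIO);
	if (!chunks) {
		/* no memory: fall back to the single-threaded path */
		wb_cache_pmem(addr, size);
		return;
	}

	for (i = 0; i < nr && size; i++) {
		chunks[i].addr = addr;
		chunks[i].len = min(chunk, size);
		INIT_WORK(&chunks[i].work, pmem_flush_chunk_fn);
		queue_work(system_unbound_wq, &chunks[i].work);
		addr += chunks[i].len;
		size -= chunks[i].len;
	}

	/* wait for every chunk before the caller issues wmb_pmem() */
	while (i--)
		flush_work(&chunks[i].work);
	kfree(chunks);
}

pmem_make_request() would then call pmem_flush_parallel(addr, size) in place of the bare wb_cache_pmem() call in the hunk above, still followed by wmb_pmem().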

Another optimization is that we can make the flush a nop up until
pmem_direct_access() is first called, because we know there is nothing
to flush when all the i/o is coming through the driver.  That at least
helps the "pmem as a fast SSD" use case avoid the overhead.
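
Again as an illustration only, the "nop until DAX is used" idea could be a single flag in struct pmem_device; the dax_used field is invented and is not part of the posted patch.

struct pmem_device {
	/* ... existing fields ... */
	bool dax_used;	/* set once pmem_direct_access() hands out a mapping */
};

	/* in pmem_direct_access(), before returning the kernel address:
	 * from this point on, stores may bypass the driver entirely */
	WRITE_ONCE(pmem->dax_used, true);

	/* in pmem_make_request(), guarding the new REQ_FLUSH branch */
	if ((bio->bi_rw & REQ_FLUSH) && READ_ONCE(pmem->dax_used)) {
		void __pmem *addr = pmem->virt_addr + pmem->data_offset;

		wb_cache_pmem(addr, pmem->size - pmem->data_offset);
	}

Once set, the flag would stay set for the life of the device; clearing it safely would require knowing that no DAX mappings remain.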

Bikeshed alert... wb_cache_pmem() should probably become
mmio_wb_cache() and live next to mmio_flush_cache() since it is not
specific to persistent memory.

Patch

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0ba6a97..eea7997 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -80,7 +80,14 @@ static void pmem_make_request(struct request_queue *q, struct bio *bio)
 	if (do_acct)
 		nd_iostat_end(bio, start);
 
-	if (bio_data_dir(bio))
+	if (bio->bi_rw & REQ_FLUSH) {
+		void __pmem *addr = pmem->virt_addr + pmem->data_offset;
+		size_t size = pmem->size - pmem->data_offset;
+
+		wb_cache_pmem(addr, size);
+	}
+
+	if (bio_data_dir(bio) || (bio->bi_rw & REQ_FLUSH))
 		wmb_pmem();
 
 	bio_endio(bio);
@@ -189,6 +196,7 @@ static int pmem_attach_disk(struct device *dev,
 	blk_queue_physical_block_size(pmem->pmem_queue, PAGE_SIZE);
 	blk_queue_max_hw_sectors(pmem->pmem_queue, UINT_MAX);
 	blk_queue_bounce_limit(pmem->pmem_queue, BLK_BOUNCE_ANY);
+	blk_queue_flush(pmem->pmem_queue, REQ_FLUSH);
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);
 
 	disk = alloc_disk(0);