Message ID | 1435934443-17090-4-git-send-email-matthew.r.wilcox@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jul 03, 2015 at 10:40:40AM -0400, Matthew Wilcox wrote: > From: Matthew Wilcox <willy@linux.intel.com> > > DAX relies on the get_block function either zeroing newly allocated blocks > before they're findable by subsequent calls to get_block, or marking newly > allocated blocks as unwritten. ext4_get_block() cannot create unwritten > extents, but ext4_get_block_write() can. To be clear, this patch has no prerequistes or dependencies, right? That is, it would be fine if I take this through the ext4 git tree? Or is there a reason or a preference for carrying this patch somewhere else? Also, is there a way I can test the DAX functionality in ext4 using KVM? If so, can you give me a cheat sheet about how I can do that? Thanks, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 03, 2015 at 02:30:27PM -0400, Theodore Ts'o wrote: > On Fri, Jul 03, 2015 at 10:40:40AM -0400, Matthew Wilcox wrote: > > From: Matthew Wilcox <willy@linux.intel.com> > > > > DAX relies on the get_block function either zeroing newly allocated blocks > > before they're findable by subsequent calls to get_block, or marking newly > > allocated blocks as unwritten. ext4_get_block() cannot create unwritten > > extents, but ext4_get_block_write() can. > > To be clear, this patch has no prerequistes or dependencies, right? > That is, it would be fine if I take this through the ext4 git tree? > Or is there a reason or a preference for carrying this patch > somewhere else? Right, no dependencies or prerequisites, completely independent of all the other patches. > Also, is there a way I can test the DAX functionality in ext4 using > KVM? If so, can you give me a cheat sheet about how I can do that? I don't use KVM, but I can tell you what I do ... (additional explanation added, not for Ted's benefit, but because people less familiar with Linux than Ted is may happen upon this email for their own purposes). In /etc/default/grub, I have this line: GRUB_CMDLINE_LINUX="memmap=4G!4G" memmap=nn[KMG]!ss[KMG] [KNL,X86] Mark specific memory as protected. Region of memory to be used, from ss to ss+nn. The memory region may be marked as e820 type 12 (0xc) and is NVDIMM or ADR memory. In my kernel config, I have: CONFIG_X86_PMEM_LEGACY=y CONFIG_LIBNVDIMM=y CONFIG_BLK_DEV_PMEM=m At boot, I "modprobe pmem". On the desktop-class system I'm using as my development machine, the BIOS doesn't clear RAM between boots (only power cycles), so the partition table and ext4 filesystem stays good, and all I have to do is: mount -odax /dev/pmem0p1 /mnt/ram0/ Also my xfstests local.config: TEST_DEV=/dev/pmem0p1 TEST_DIR=/mnt/ram0 SCRATCH_DEV=/dev/pmem0p2 SCRATCH_MNT=/mnt/ram1 TEST_FS_MOUNT_OPTS="-o dax" EXT_MOUNT_OPTIONS="-o dax" MKFS_OPTIONS="-b4096" Hope I didn't forget anything. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 03, 2015 at 02:48:24PM -0400, Matthew Wilcox wrote: > > At boot, I "modprobe pmem". Is there a reason why it's important to build and load pmem as a module? If I use CONFIG_BLK_DEV_PMEM=y (which is more convenient given how I launch my KVM test appliance), should I expect any problems? I assume that this won't detect any bugs caused by missing CLFLUSH instructions, but I assume that when using NVM as a block device, this isn't much of an issue, as long as we don't care about torn writes? (How using NVM with metdata checksums, or any checksums for that matter, seems to be an interesting question --- how do we recover from a checksum failure after a power failure?) - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/03/2015 10:07 PM, Theodore Ts'o wrote: > On Fri, Jul 03, 2015 at 02:48:24PM -0400, Matthew Wilcox wrote: >> >> At boot, I "modprobe pmem". > > Is there a reason why it's important to build and load pmem as a > module? If I use CONFIG_BLK_DEV_PMEM=y (which is more convenient > given how I launch my KVM test appliance), should I expect any > problems? > This (=y) should work fine. We use it a lot. (with KVM even boot an image with -dax, with a trick) Note that DAX need not be tested with pmem only, you can always use brd at any given point without any reboot. One more trick for xfstest I use: memmap=2G!4G,2G!6G And have two pmem0/1 and don't need to bother with any fdisk. Do need to mkfs every boot though. BTW: with kvm a reboot with above memmap will give you back the exact same memory. halt and "virsh start" is a different story. > I assume that this won't detect any bugs caused by missing CLFLUSH > instructions, but I assume that when using NVM as a block device, this > isn't much of an issue, as long as we don't care about torn writes? > (How using NVM with metdata checksums, or any checksums for that > matter, seems to be an interesting question --- how do we recover from > a checksum failure after a power failure?) > Currently pmem maps a very-(very) slow ioremap_nocache. So any Kernel memory access should be pmem persistent. For a real world faster ioremap, there are few major pieces still missing in the stack to make it persistent. Note the even today with ioremap_nocache, any application mmap (like git) is not persistent. > - Ted Cheers Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index bc313ac..e5bdcb7 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -195,7 +195,7 @@ out: static void ext4_end_io_unwritten(struct buffer_head *bh, int uptodate) { struct inode *inode = bh->b_assoc_map->host; - /* XXX: breaks on 32-bit > 16GB. Is that even supported? */ + /* XXX: breaks on 32-bit > 16TB. Is that even supported? */ loff_t offset = (loff_t)(uintptr_t)bh->b_private << inode->i_blkbits; int err; if (!uptodate) @@ -206,13 +206,13 @@ static void ext4_end_io_unwritten(struct buffer_head *bh, int uptodate) static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf) { - return dax_fault(vma, vmf, ext4_get_block, ext4_end_io_unwritten); - /* Is this the right get_block? */ + return dax_fault(vma, vmf, ext4_get_block_write, ext4_end_io_unwritten); } static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf) { - return dax_mkwrite(vma, vmf, ext4_get_block, ext4_end_io_unwritten); + return dax_mkwrite(vma, vmf, ext4_get_block_write, + ext4_end_io_unwritten); } static const struct vm_operations_struct ext4_dax_vm_ops = {