Message ID | 1452103263-1592-2-git-send-email-ross.zwisler@linux.intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Wed, Jan 6, 2016 at 10:00 AM, Ross Zwisler <ross.zwisler@linux.intel.com> wrote: > __dax_dbg() currently assumes that bh->b_bdev is non-NULL, passing it into > bdevname() where is is dereferenced. This assumption isn't always true - > when called for reads of holes, ext4_dax_mmap_get_block() returns a buffer > head where bh->b_bdev is never set. I hit this BUG while testing the DAX > PMD fault path. > > Instead, verify that we have a valid bh->b_bdev, else just say "unknown" > for the block device. > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> > Cc: Dan Williams <dan.j.williams@intel.com> > --- > fs/dax.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/fs/dax.c b/fs/dax.c > index 7af8797..03cc4a3 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -563,7 +563,12 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address, > { > if (bh) { > char bname[BDEVNAME_SIZE]; > - bdevname(bh->b_bdev, bname); > + > + if (bh->b_bdev) > + bdevname(bh->b_bdev, bname); > + else > + snprintf(bname, BDEVNAME_SIZE, "unknown"); > + > pr_debug("%s: %s addr: %lx dev %s state %lx start %lld " > "length %zd fallback: %s\n", fn, current->comm, > address, bname, bh->b_state, (u64)bh->b_blocknr, I'm assuming there's no danger of a such a buffer_head ever being used for the bdev parameter to dax_map_atomic()? Shouldn't we also/instead go fix ext4 to not send partially filled buffer_heads?
On Wed 06-01-16 11:14:09, Dan Williams wrote: > On Wed, Jan 6, 2016 at 10:00 AM, Ross Zwisler > <ross.zwisler@linux.intel.com> wrote: > > __dax_dbg() currently assumes that bh->b_bdev is non-NULL, passing it into > > bdevname() where is is dereferenced. This assumption isn't always true - > > when called for reads of holes, ext4_dax_mmap_get_block() returns a buffer > > head where bh->b_bdev is never set. I hit this BUG while testing the DAX > > PMD fault path. > > > > Instead, verify that we have a valid bh->b_bdev, else just say "unknown" > > for the block device. > > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> > > Cc: Dan Williams <dan.j.williams@intel.com> > > --- > > fs/dax.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/fs/dax.c b/fs/dax.c > > index 7af8797..03cc4a3 100644 > > --- a/fs/dax.c > > +++ b/fs/dax.c > > @@ -563,7 +563,12 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address, > > { > > if (bh) { > > char bname[BDEVNAME_SIZE]; > > - bdevname(bh->b_bdev, bname); > > + > > + if (bh->b_bdev) > > + bdevname(bh->b_bdev, bname); > > + else > > + snprintf(bname, BDEVNAME_SIZE, "unknown"); > > + > > pr_debug("%s: %s addr: %lx dev %s state %lx start %lld " > > "length %zd fallback: %s\n", fn, current->comm, > > address, bname, bh->b_state, (u64)bh->b_blocknr, > > I'm assuming there's no danger of a such a buffer_head ever being used > for the bdev parameter to dax_map_atomic()? Shouldn't we also/instead > go fix ext4 to not send partially filled buffer_heads? No. The real problem is a long-standing abuse of struct buffer_head to be used for passing block mapping information (it's on my todo list to remove that at least from DAX code and use cleaner block mapping interface but first I want basic DAX functionality to settle down to avoid unnecessary conflicts). Filesystem is not supposed to touch bh->b_bdev. If you need that filled in, set it yourself in before passing bh to the block mapping function. Honza
On Thu, Jan 7, 2016 at 1:34 AM, Jan Kara <jack@suse.cz> wrote: > On Wed 06-01-16 11:14:09, Dan Williams wrote: >> On Wed, Jan 6, 2016 at 10:00 AM, Ross Zwisler >> <ross.zwisler@linux.intel.com> wrote: >> > __dax_dbg() currently assumes that bh->b_bdev is non-NULL, passing it into >> > bdevname() where is is dereferenced. This assumption isn't always true - >> > when called for reads of holes, ext4_dax_mmap_get_block() returns a buffer >> > head where bh->b_bdev is never set. I hit this BUG while testing the DAX >> > PMD fault path. >> > >> > Instead, verify that we have a valid bh->b_bdev, else just say "unknown" >> > for the block device. >> > >> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> >> > Cc: Dan Williams <dan.j.williams@intel.com> >> > --- >> > fs/dax.c | 7 ++++++- >> > 1 file changed, 6 insertions(+), 1 deletion(-) >> > >> > diff --git a/fs/dax.c b/fs/dax.c >> > index 7af8797..03cc4a3 100644 >> > --- a/fs/dax.c >> > +++ b/fs/dax.c >> > @@ -563,7 +563,12 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address, >> > { >> > if (bh) { >> > char bname[BDEVNAME_SIZE]; >> > - bdevname(bh->b_bdev, bname); >> > + >> > + if (bh->b_bdev) >> > + bdevname(bh->b_bdev, bname); >> > + else >> > + snprintf(bname, BDEVNAME_SIZE, "unknown"); >> > + >> > pr_debug("%s: %s addr: %lx dev %s state %lx start %lld " >> > "length %zd fallback: %s\n", fn, current->comm, >> > address, bname, bh->b_state, (u64)bh->b_blocknr, >> >> I'm assuming there's no danger of a such a buffer_head ever being used >> for the bdev parameter to dax_map_atomic()? Shouldn't we also/instead >> go fix ext4 to not send partially filled buffer_heads? > > No. The real problem is a long-standing abuse of struct buffer_head to be > used for passing block mapping information (it's on my todo list to remove > that at least from DAX code and use cleaner block mapping interface but > first I want basic DAX functionality to settle down to avoid unnecessary > conflicts). Filesystem is not supposed to touch bh->b_bdev. If you need > that filled in, set it yourself in before passing bh to the block mapping > function. > Ok, makes sense. Ross, can you fix this instead by unconditionally looking up the bdev rather that saying "unknown". The bdev should always be retrievable.
On Thu, Jan 07, 2016 at 07:17:22AM -0800, Dan Williams wrote: > On Thu, Jan 7, 2016 at 1:34 AM, Jan Kara <jack@suse.cz> wrote: > > On Wed 06-01-16 11:14:09, Dan Williams wrote: > >> On Wed, Jan 6, 2016 at 10:00 AM, Ross Zwisler > >> <ross.zwisler@linux.intel.com> wrote: > >> > __dax_dbg() currently assumes that bh->b_bdev is non-NULL, passing it into > >> > bdevname() where is is dereferenced. This assumption isn't always true - > >> > when called for reads of holes, ext4_dax_mmap_get_block() returns a buffer > >> > head where bh->b_bdev is never set. I hit this BUG while testing the DAX > >> > PMD fault path. > >> > > >> > Instead, verify that we have a valid bh->b_bdev, else just say "unknown" > >> > for the block device. > >> > > >> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> > >> > Cc: Dan Williams <dan.j.williams@intel.com> > >> > --- > >> > fs/dax.c | 7 ++++++- > >> > 1 file changed, 6 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/fs/dax.c b/fs/dax.c > >> > index 7af8797..03cc4a3 100644 > >> > --- a/fs/dax.c > >> > +++ b/fs/dax.c > >> > @@ -563,7 +563,12 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address, > >> > { > >> > if (bh) { > >> > char bname[BDEVNAME_SIZE]; > >> > - bdevname(bh->b_bdev, bname); > >> > + > >> > + if (bh->b_bdev) > >> > + bdevname(bh->b_bdev, bname); > >> > + else > >> > + snprintf(bname, BDEVNAME_SIZE, "unknown"); > >> > + > >> > pr_debug("%s: %s addr: %lx dev %s state %lx start %lld " > >> > "length %zd fallback: %s\n", fn, current->comm, > >> > address, bname, bh->b_state, (u64)bh->b_blocknr, > >> > >> I'm assuming there's no danger of a such a buffer_head ever being used > >> for the bdev parameter to dax_map_atomic()? Shouldn't we also/instead > >> go fix ext4 to not send partially filled buffer_heads? > > > > No. The real problem is a long-standing abuse of struct buffer_head to be > > used for passing block mapping information (it's on my todo list to remove > > that at least from DAX code and use cleaner block mapping interface but > > first I want basic DAX functionality to settle down to avoid unnecessary > > conflicts). Filesystem is not supposed to touch bh->b_bdev. If you need > > that filled in, set it yourself in before passing bh to the block mapping > > function. > > > > Ok, makes sense. > > Ross, can you fix this instead by unconditionally looking up the bdev > rather that saying "unknown". The bdev should always be retrievable. Sure, will do.
On Thu, Jan 07, 2016 at 10:34:02AM +0100, Jan Kara wrote: > On Wed 06-01-16 11:14:09, Dan Williams wrote: > > On Wed, Jan 6, 2016 at 10:00 AM, Ross Zwisler > > <ross.zwisler@linux.intel.com> wrote: > > > __dax_dbg() currently assumes that bh->b_bdev is non-NULL, passing it into > > > bdevname() where is is dereferenced. This assumption isn't always true - > > > when called for reads of holes, ext4_dax_mmap_get_block() returns a buffer > > > head where bh->b_bdev is never set. I hit this BUG while testing the DAX > > > PMD fault path. > > > > > > Instead, verify that we have a valid bh->b_bdev, else just say "unknown" > > > for the block device. > > > > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> > > > Cc: Dan Williams <dan.j.williams@intel.com> > > > --- > > > fs/dax.c | 7 ++++++- > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/dax.c b/fs/dax.c > > > index 7af8797..03cc4a3 100644 > > > --- a/fs/dax.c > > > +++ b/fs/dax.c > > > @@ -563,7 +563,12 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address, > > > { > > > if (bh) { > > > char bname[BDEVNAME_SIZE]; > > > - bdevname(bh->b_bdev, bname); > > > + > > > + if (bh->b_bdev) > > > + bdevname(bh->b_bdev, bname); > > > + else > > > + snprintf(bname, BDEVNAME_SIZE, "unknown"); > > > + > > > pr_debug("%s: %s addr: %lx dev %s state %lx start %lld " > > > "length %zd fallback: %s\n", fn, current->comm, > > > address, bname, bh->b_state, (u64)bh->b_blocknr, > > > > I'm assuming there's no danger of a such a buffer_head ever being used > > for the bdev parameter to dax_map_atomic()? Shouldn't we also/instead > > go fix ext4 to not send partially filled buffer_heads? > > No. The real problem is a long-standing abuse of struct buffer_head to be > used for passing block mapping information (it's on my todo list to remove > that at least from DAX code and use cleaner block mapping interface but > first I want basic DAX functionality to settle down to avoid unnecessary > conflicts). Filesystem is not supposed to touch bh->b_bdev. That has not been true for a long, long time. e.g. XFS always rewrites bh->b_bdev in get_blocks because the file may not reside on the primary block device of the filesystem. i.e.: /* * If this is a realtime file, data may be on a different device. * to that pointed to from the buffer_head b_bdev currently. */ bh_result->b_bdev = xfs_find_bdev_for_inode(inode); > If you need > that filled in, set it yourself in before passing bh to the block mapping > function. That may be true, but we cannot assume that the bdev coming back out of get_block is the same one that was passed in. Cheers, Dave.
On Fri, Jan 08, 2016 at 10:10:00AM +1100, Dave Chinner wrote: > On Thu, Jan 07, 2016 at 10:34:02AM +0100, Jan Kara wrote: > > On Wed 06-01-16 11:14:09, Dan Williams wrote: > > > On Wed, Jan 6, 2016 at 10:00 AM, Ross Zwisler > > > <ross.zwisler@linux.intel.com> wrote: > > > > __dax_dbg() currently assumes that bh->b_bdev is non-NULL, passing it into > > > > bdevname() where is is dereferenced. This assumption isn't always true - > > > > when called for reads of holes, ext4_dax_mmap_get_block() returns a buffer > > > > head where bh->b_bdev is never set. I hit this BUG while testing the DAX > > > > PMD fault path. > > > > > > > > Instead, verify that we have a valid bh->b_bdev, else just say "unknown" > > > > for the block device. > > > > > > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> > > > > Cc: Dan Williams <dan.j.williams@intel.com> > > > > --- > > > > fs/dax.c | 7 ++++++- > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/fs/dax.c b/fs/dax.c > > > > index 7af8797..03cc4a3 100644 > > > > --- a/fs/dax.c > > > > +++ b/fs/dax.c > > > > @@ -563,7 +563,12 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address, > > > > { > > > > if (bh) { > > > > char bname[BDEVNAME_SIZE]; > > > > - bdevname(bh->b_bdev, bname); > > > > + > > > > + if (bh->b_bdev) > > > > + bdevname(bh->b_bdev, bname); > > > > + else > > > > + snprintf(bname, BDEVNAME_SIZE, "unknown"); > > > > + > > > > pr_debug("%s: %s addr: %lx dev %s state %lx start %lld " > > > > "length %zd fallback: %s\n", fn, current->comm, > > > > address, bname, bh->b_state, (u64)bh->b_blocknr, > > > > > > I'm assuming there's no danger of a such a buffer_head ever being used > > > for the bdev parameter to dax_map_atomic()? Shouldn't we also/instead > > > go fix ext4 to not send partially filled buffer_heads? > > > > No. The real problem is a long-standing abuse of struct buffer_head to be > > used for passing block mapping information (it's on my todo list to remove > > that at least from DAX code and use cleaner block mapping interface but > > first I want basic DAX functionality to settle down to avoid unnecessary > > conflicts). Filesystem is not supposed to touch bh->b_bdev. > > That has not been true for a long, long time. e.g. XFS always > rewrites bh->b_bdev in get_blocks because the file may not reside on > the primary block device of the filesystem. i.e.: > > /* > * If this is a realtime file, data may be on a different device. > * to that pointed to from the buffer_head b_bdev currently. > */ > bh_result->b_bdev = xfs_find_bdev_for_inode(inode); > > > If you need > > that filled in, set it yourself in before passing bh to the block mapping > > function. > > That may be true, but we cannot assume that the bdev coming back > out of get_block is the same one that was passed in. For our use case I think this is fine - we just need the bdev to be filled in so that we can print reasonable error messages. If the filesystem updates bh->b_bdev during get_blocks(), we are fine with that.
diff --git a/fs/dax.c b/fs/dax.c index 7af8797..03cc4a3 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -563,7 +563,12 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address, { if (bh) { char bname[BDEVNAME_SIZE]; - bdevname(bh->b_bdev, bname); + + if (bh->b_bdev) + bdevname(bh->b_bdev, bname); + else + snprintf(bname, BDEVNAME_SIZE, "unknown"); + pr_debug("%s: %s addr: %lx dev %s state %lx start %lld " "length %zd fallback: %s\n", fn, current->comm, address, bname, bh->b_state, (u64)bh->b_blocknr,
__dax_dbg() currently assumes that bh->b_bdev is non-NULL, passing it into bdevname() where is is dereferenced. This assumption isn't always true - when called for reads of holes, ext4_dax_mmap_get_block() returns a buffer head where bh->b_bdev is never set. I hit this BUG while testing the DAX PMD fault path. Instead, verify that we have a valid bh->b_bdev, else just say "unknown" for the block device. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> --- fs/dax.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)