Message ID | 20240705081514.1901580-1-dongliang.cui@unisoc.com (mailing list archive) |
---|---|
State | New |
Series | exfat: check disk status during buffer write |
> We found that when writing a large file through buffer write,
> if the disk is inaccessible, exFAT does not return an error
> normally, which leads to the writing process not stopping properly.
>
> To easily reproduce this issue, you can follow the steps below:
>
> 1. format a device to exFAT and then mount (with a full disk erase)
> 2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192
> 3. eject the device
>
> You may find that the dd process does not stop immediately and may
> continue for a long time.
>
> We compared it with the FAT, where FAT would prompt an EIO error and
> immediately stop the dd operation.
>
> The root cause of this issue is that when the exfat_inode contains the
> ALLOC_NO_FAT_CHAIN flag, exFAT does not need to access the disk to
> look up directory entries or the FAT table (whereas FAT would do)
> every time data is written. Instead, exFAT simply marks the buffer as
> dirty and returns, delegating the writeback operation to the writeback
> process.
>
> If the disk cannot be accessed at this time, the error will only be
> returned to the writeback process, and the original process will not
> receive the error, so it cannot be returned to the user side.
>
> Therefore, we think that when writing files with ALLOC_NO_FAT_CHAIN,
> it is necessary to continuously check the status of the disk.
>
> When the disk cannot be accessed normally, an error should be returned
> to stop the writing process.
>
> Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com>
> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> ---
>  fs/exfat/exfat_fs.h | 5 +++++
>  fs/exfat/inode.c    | 5 +++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
> index ecc5db952deb..c5f5a7a8b672 100644
> --- a/fs/exfat/exfat_fs.h
> +++ b/fs/exfat/exfat_fs.h
> @@ -411,6 +411,11 @@ static inline unsigned int exfat_sector_to_cluster(struct exfat_sb_info *sbi,
>  		EXFAT_RESERVED_CLUSTERS;
>  }
>
> +static inline bool exfat_check_disk_error(struct block_device *bdev)
> +{
> +	return blk_queue_dying(bdev_get_queue(bdev));

Why don't you check it like ext4?

static int block_device_ejected(struct super_block *sb)
{
	struct inode *bd_inode = sb->s_bdev->bd_inode;
	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);

	return bdi->dev == NULL;
}

> +}
> +
>  static inline bool is_valid_cluster(struct exfat_sb_info *sbi,
>  		unsigned int clus)
>  {
> diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
> index dd894e558c91..efd02c1c83a6 100644
> --- a/fs/exfat/inode.c
> +++ b/fs/exfat/inode.c
> @@ -147,6 +147,11 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
>  	*clu = last_clu = ei->start_clu;
>
>  	if (ei->flags == ALLOC_NO_FAT_CHAIN) {
> +		if (exfat_check_disk_error(sb->s_bdev)) {
> +			exfat_fs_error(sb, "device inaccessiable!\n");
> +			return -EIO;

This patch looks useful when using removable storage devices.
BTW, in case of "ei->flags != ALLOC_NO_FAT_CHAIN", There could be the same
problem if it can be found from lru_cache. So, it would be nice to check
disk_error regardless ei->flags. Also, Calling exfat_fs_error() seems
unnecessary. Instead, let's return -ENODEV instead of -EIO.
I believe that these errors will be handled on exfat_get_block()

Thanks.

> +		}
> +
>  		if (clu_offset > 0 && *clu != EXFAT_EOF_CLUSTER) {
>  			last_clu += clu_offset - 1;
>
> --
> 2.25.1
> We found that when writing a large file through buffer write, if the
> disk is inaccessible, exFAT does not return an error normally, which
> leads to the writing process not stopping properly.
>
> To easily reproduce this issue, you can follow the steps below:
>
> 1. format a device to exFAT and then mount (with a full disk erase)
> 2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192
> 3. eject the device
>
> You may find that the dd process does not stop immediately and may
> continue for a long time.
>
> We compared it with the FAT, where FAT would prompt an EIO error and
> immediately stop the dd operation.
>
> The root cause of this issue is that when the exfat_inode contains the
> ALLOC_NO_FAT_CHAIN flag, exFAT does not need to access the disk to
> look up directory entries or the FAT table (whereas FAT would do)
> every time data is written. Instead, exFAT simply marks the buffer as
> dirty and returns, delegating the writeback operation to the writeback
> process.
>
> If the disk cannot be accessed at this time, the error will only be
> returned to the writeback process, and the original process will not
> receive the error, so it cannot be returned to the user side.
>
> Therefore, we think that when writing files with ALLOC_NO_FAT_CHAIN,
> it is necessary to continuously check the status of the disk.
>
> When the disk cannot be accessed normally, an error should be returned
> to stop the writing process.
>
> Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com>
> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> ---
>  fs/exfat/exfat_fs.h | 5 +++++
>  fs/exfat/inode.c    | 5 +++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
> index ecc5db952deb..c5f5a7a8b672 100644
> --- a/fs/exfat/exfat_fs.h
> +++ b/fs/exfat/exfat_fs.h
> @@ -411,6 +411,11 @@ static inline unsigned int exfat_sector_to_cluster(struct exfat_sb_info *sbi,
>  		EXFAT_RESERVED_CLUSTERS;
>  }
>
> +static inline bool exfat_check_disk_error(struct block_device *bdev)
> +{
> +	return blk_queue_dying(bdev_get_queue(bdev));

Why don't you check it like ext4?

static int block_device_ejected(struct super_block *sb)
{
	struct inode *bd_inode = sb->s_bdev->bd_inode;
	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);

	return bdi->dev == NULL;
}

The block_device->bd_inode has been removed in the latest code.
We might be able to use super_block->s_bdi->dev for the judgment,
or perhaps use blk_queue_dying?

> +}
> +
>  static inline bool is_valid_cluster(struct exfat_sb_info *sbi,
>  		unsigned int clus)
>  {
> diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
> index dd894e558c91..efd02c1c83a6 100644
> --- a/fs/exfat/inode.c
> +++ b/fs/exfat/inode.c
> @@ -147,6 +147,11 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
>  	*clu = last_clu = ei->start_clu;
>
>  	if (ei->flags == ALLOC_NO_FAT_CHAIN) {
> +		if (exfat_check_disk_error(sb->s_bdev)) {
> +			exfat_fs_error(sb, "device inaccessiable!\n");
> +			return -EIO;

This patch looks useful when using removable storage devices.
BTW, in case of "ei->flags != ALLOC_NO_FAT_CHAIN", There could be the same
problem if it can be found from lru_cache. So, it would be nice to check
disk_error regardless ei->flags. Also, Calling exfat_fs_error() seems
unnecessary. Instead, let's return -ENODEV instead of -EIO.
I believe that these errors will be handled on exfat_get_block()

Thanks.

> +		}
> +
>  		if (clu_offset > 0 && *clu != EXFAT_EOF_CLUSTER) {
>  			last_clu += clu_offset - 1;
>
> --
> 2.25.1
On Mon, Jul 15, 2024 at 4:51 PM 崔东亮 (Dongliang Cui) <Dongliang.Cui@unisoc.com> wrote:
>
> > We found that when writing a large file through buffer write, if the
> > disk is inaccessible, exFAT does not return an error normally, which
> > leads to the writing process not stopping properly.
> >
> > To easily reproduce this issue, you can follow the steps below:
> >
> > 1. format a device to exFAT and then mount (with a full disk erase)
> > 2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192
> > 3. eject the device
> >
> > You may find that the dd process does not stop immediately and may
> > continue for a long time.
> >
> > We compared it with the FAT, where FAT would prompt an EIO error and
> > immediately stop the dd operation.
> >
> > The root cause of this issue is that when the exfat_inode contains the
> > ALLOC_NO_FAT_CHAIN flag, exFAT does not need to access the disk to
> > look up directory entries or the FAT table (whereas FAT would do)
> > every time data is written. Instead, exFAT simply marks the buffer as
> > dirty and returns, delegating the writeback operation to the writeback
> > process.
> >
> > If the disk cannot be accessed at this time, the error will only be
> > returned to the writeback process, and the original process will not
> > receive the error, so it cannot be returned to the user side.
> >
> > Therefore, we think that when writing files with ALLOC_NO_FAT_CHAIN,
> > it is necessary to continuously check the status of the disk.
> >
> > When the disk cannot be accessed normally, an error should be returned
> > to stop the writing process.
> >
> > Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com>
> > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > ---
> >  fs/exfat/exfat_fs.h | 5 +++++
> >  fs/exfat/inode.c    | 5 +++++
> >  2 files changed, 10 insertions(+)
> >
> > diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
> > index ecc5db952deb..c5f5a7a8b672 100644
> > --- a/fs/exfat/exfat_fs.h
> > +++ b/fs/exfat/exfat_fs.h
> > @@ -411,6 +411,11 @@ static inline unsigned int exfat_sector_to_cluster(struct exfat_sb_info *sbi,
> >  		EXFAT_RESERVED_CLUSTERS;
> >  }
> >
> > +static inline bool exfat_check_disk_error(struct block_device *bdev)
> > +{
> > +	return blk_queue_dying(bdev_get_queue(bdev));
> Why don't you check it like ext4?
>
> static int block_device_ejected(struct super_block *sb)
> {
> 	struct inode *bd_inode = sb->s_bdev->bd_inode;
> 	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
>
> 	return bdi->dev == NULL;
> }
>
> The block_device->bd_inode has been removed in the latest code.
> We might be able to use super_block->s_bdi->dev for the judgment,
> or perhaps use blk_queue_dying?

Hi,

To provide more information, the commit that removed bd_inode is:
203c1ce0bb06 ("RIP ->bd_inode")

Since bd_inode has been removed, following the ext4 approach might not
be feasible. Can we consider using one of the following two methods instead?

For example,

static int exfat_block_device_ejected(struct super_block *sb)
{
	struct backing_dev_info *bdi = sb->s_bdi;

	return bdi->dev == NULL;
}

Or,

static inline bool exfat_check_disk_error(struct block_device *bdev)
{
	return blk_queue_dying(bdev_get_queue(bdev));
}

Are there any other suggestions?

> >  static inline bool is_valid_cluster(struct exfat_sb_info *sbi,
> >  		unsigned int clus)
> >  {
> > diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
> > index dd894e558c91..efd02c1c83a6 100644
> > --- a/fs/exfat/inode.c
> > +++ b/fs/exfat/inode.c
> > @@ -147,6 +147,11 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
> >  	*clu = last_clu = ei->start_clu;
> >
> >  	if (ei->flags == ALLOC_NO_FAT_CHAIN) {
> > +		if (exfat_check_disk_error(sb->s_bdev)) {
> > +			exfat_fs_error(sb, "device inaccessiable!\n");
> > +			return -EIO;
> This patch looks useful when using removable storage devices.
> BTW, in case of "ei->flags != ALLOC_NO_FAT_CHAIN", There could be the same
> problem if it can be found from lru_cache. So, it would be nice to check
> disk_error regardless ei->flags. Also, Calling exfat_fs_error() seems
> unnecessary. Instead, let's return -ENODEV instead of -EIO.
> I believe that these errors will be handled on exfat_get_block()
>
> Thanks.
> > +		}
> > +
> >  		if (clu_offset > 0 && *clu != EXFAT_EOF_CLUSTER) {
> >  			last_clu += clu_offset - 1;
> >
> > --
> > 2.25.1
> >
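The review comment quoted above asks for the device check to run regardless of ei->flags and to return -ENODEV without calling exfat_fs_error(). The following is only a user-space sketch of that control flow, not kernel code and not the patch that was eventually merged. Every name in it (mock_bdev, mock_inode, mock_map_cluster, the MOCK_* flags, the fat table) is a hypothetical stand-in invented for illustration; the dying field merely models whatever the real check (blk_queue_dying() or sb->s_bdi->dev) would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
enum mock_alloc { MOCK_FAT_CHAIN, MOCK_NO_FAT_CHAIN };

struct mock_bdev {
	bool dying;	/* models the "device is gone" state the patch tests for */
};

struct mock_inode {
	const struct mock_bdev *bdev;
	enum mock_alloc flags;
	unsigned int start_clu;
	const unsigned int *fat;	/* cached next-cluster table (chained files) */
	size_t fat_len;
};

/* Map a cluster offset within the file to a cluster number, or -errno. */
static int mock_map_cluster(const struct mock_inode *ino,
			    unsigned int clu_offset, unsigned int *clu)
{
	assert(ino != NULL && clu != NULL);

	/* Checked before looking at flags, so both allocation types fail fast. */
	if (ino->bdev->dying)
		return -ENODEV;

	*clu = ino->start_clu;
	if (ino->flags == MOCK_NO_FAT_CHAIN) {
		/* Contiguous file: pure arithmetic, no device access needed. */
		*clu += clu_offset;
		return 0;
	}

	/* Chained file: follow cached links, which also avoids device access. */
	while (clu_offset-- > 0) {
		if (*clu >= ino->fat_len)
			return -EIO;
		*clu = ino->fat[*clu];
	}
	return 0;
}
```

In this model both the contiguous path and the cached-chain path can succeed without touching the device, which is why the check sits in front of the flag test rather than inside the ALLOC_NO_FAT_CHAIN branch.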
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index ecc5db952deb..c5f5a7a8b672 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -411,6 +411,11 @@ static inline unsigned int exfat_sector_to_cluster(struct exfat_sb_info *sbi,
 		EXFAT_RESERVED_CLUSTERS;
 }
 
+static inline bool exfat_check_disk_error(struct block_device *bdev)
+{
+	return blk_queue_dying(bdev_get_queue(bdev));
+}
+
 static inline bool is_valid_cluster(struct exfat_sb_info *sbi,
 		unsigned int clus)
 {
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index dd894e558c91..efd02c1c83a6 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -147,6 +147,11 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
 	*clu = last_clu = ei->start_clu;
 
 	if (ei->flags == ALLOC_NO_FAT_CHAIN) {
+		if (exfat_check_disk_error(sb->s_bdev)) {
+			exfat_fs_error(sb, "device inaccessiable!\n");
+			return -EIO;
+		}
+
 		if (clu_offset > 0 && *clu != EXFAT_EOF_CLUSTER) {
 			last_clu += clu_offset - 1;
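
For context on the user-space side of the problem described in the commit message: a buffered write() can succeed as soon as the data is in the page cache, so a writer that wants to see writeback failures has to ask for them, typically with fsync(). The sketch below is an illustration only and is not part of the patch; write_all and write_and_sync are hypothetical helper names, and the behaviour on a removed exFAT device is not something this example demonstrates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer, retrying short writes; returns 0 or -errno. */
static int write_all(int fd, const char *buf, size_t len)
{
	assert(buf != NULL);

	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data was written */
			return -errno;
		}
		if (n == 0)
			return -EIO;	/* no progress; avoid spinning forever */
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Write, then force writeback so that deferred I/O errors reach the caller. */
static int write_and_sync(int fd, const char *buf, size_t len)
{
	int ret = write_all(fd, buf, len);

	if (ret)
		return ret;
	if (fsync(fd) < 0)
		return -errno;
	return 0;
}
```

A dd-style loop that only calls write() behaves like write_all here: it learns about a missing device only if the filesystem reports the failure at write time, which is the gap the patch aims to close.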