[v2,0/5] rewrite error handling during mounting stage

Message ID	20220413082957.28774-1-heming.zhao@suse.com (mailing list archive)
Headers	show Return-Path: <bounces+ocfs2-devel=archiver.kernel.org@phx1.rp.oracleemaildelivery.com> To: ocfs2-devel@oss.oracle.com, joseph.qi@linux.alibaba.com Date: Wed, 13 Apr 2022 16:29:52 +0800 Message-id: <20220413082957.28774-1-heming.zhao@suse.com> MIME-version: 1.0 Subject: [Ocfs2-devel] [PATCH v2 0/5] rewrite error handling during mounting stage Precedence: list From: Heming Zhao via Ocfs2-devel <ocfs2-devel@oss.oracle.com> Reply-to: Heming Zhao <heming.zhao@suse.com> Content-type: text/plain; charset="us-ascii" Content-transfer-encoding: 7bit Errors-to: ocfs2-devel-bounces@oss.oracle.com Reporting-Meta: AAEXpB8emGTs2biMMLq/QtX4InpYqftWrvNAKVAzJzlIUqSR0GTrJhelPHnNtR0K VvnahBUxJvfwS5iCtBMImCeVJk49YvtUGIta9sE8Aw7/bpuDbA/7ZmPViTWFEzas sffJoTrDFhY/qT4BLN7kJIUvQqbvHNG1Fpm8edcN/4yZS8MwbfdfzogeJ8hy/Mfb reVUCaIChrUbyCMOzlV5+X/PsNGCxELFltkJ754D1L8j8K3Vni5Td3hf53mxreBu KN28yL9lwow3r5jIXvq7+gMw04dVjlZ3cz6RtgZTZzswkqxrrzEkQreZf/3Omwk8 5uUq89NNM8Zz51cKxgocwxX9MhmdKEo3NsZiJUYP6XdPg/8/gxu+TYoNXSMdVWqT /6n1LN5NE2XZObMLxYDCKmy4cOECIziTMOmog0mwWjw14247FusBNvS3EQttzLaH 9iEYtbi1xXyCXJfaZsU2TW2Gusr++ic9G+WYbSRZKJUvqLVTmALGN/SjiLFQVVoH 0zPQBQ0firIQ+PQ0Ur4Ih07AeWdF8wLLdeHKyiWpEG1I
Series	rewrite error handling during mounting stage \| expand [v2,0/5] rewrite error handling during mounting stage [v2,1/5] ocfs2: fix mounting crash if journal is not alloced [v2,2/5] ocfs2: change return type of ocfs2_resmap_init [v2,3/5] ocfs2: ocfs2_initialize_super does cleanup job before return error [v2,4/5] ocfs2: ocfs2_mount_volume does cleanup job before return error [v2,5/5] ocfs2: rewrite error handling of ocfs2_fill_super

heming.zhao@suse.com April 13, 2022, 8:29 a.m. UTC

** v2 **

patch 1/5:
 - change patch subject.
 - add a new function ocfs2_journal_alloc
 - add new comment at front of ocfs2_journal_alloc
 - in ocfs2_initialize_super, change legacy comment before calling
   ocfs2_journal_alloc.

patch 2/5:
 - revise commit log according review comment.
 - remove meanless comment of ocfs2_resmap_init in header file.

patch 3/5:
 - set "sb->s_fs_info = NULL" (osb is NULL) when ocfs2_initialize_super
   fails.

patch 4/5:
 for ocfs2_mount_volume():
 - make it return 0 (before return "status") as successful.
 - remove goto label "out_journal_shutdown"
 - add more clean job in label "out_system_inodes"
   - add ocfs2_shutdown_local_alloc() to release local alloc resources.
   - add ocfs2_journal_shutdown() to clean up journal.

 for ocfs2_check_volume():
 - add new comment at the front of ocfs2_check_volume.
 - change comment style/format for "struct ocfs2_dinode *local_alloc"

patch 5/5:
 revise commit log.

 in ocfs2_fill_super():
 - change goto label "out_journal" to "out_dismount"
 - add "goto out" in label "out_dismount".

** draft -> v1 ** 

- split one patch into 5 patches.
- goto labal name change to out_xxx style.
- only test for mount/umount & 0001-xx.patch related issue.

Heming Zhao (5):
  ocfs2: fix mounting crash if journal is not alloced
  ocfs2: change return type of ocfs2_resmap_init
  ocfs2: ocfs2_initialize_super does cleanup job before return error
  ocfs2: ocfs2_mount_volume does cleanup job before return error
  ocfs2: rewrite error handling of ocfs2_fill_super

 fs/ocfs2/inode.c        |   4 +-
 fs/ocfs2/journal.c      |  32 ++++---
 fs/ocfs2/reservations.c |   4 +-
 fs/ocfs2/reservations.h |   9 +-
 fs/ocfs2/super.c        | 188 +++++++++++++++++++++++++---------------
 5 files changed, 143 insertions(+), 94 deletions(-)

Joseph Qi April 13, 2022, 10:54 a.m. UTC | #1

On 4/13/22 4:29 PM, Heming Zhao wrote:
> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
> journal init later than before, it makes NULL pointer access in free
> routine.
> 
> Crash flow:
> 
> ocfs2_fill_super
>  + ocfs2_mount_volume
>  |  + ocfs2_dlm_init //fail & return, osb->journal is NULL.
>  |  + ...
>  |  + ocfs2_check_volume //no chance to init osb->journal
>  |
>  + ...
>  + ocfs2_dismount_volume
>     ocfs2_release_system_inodes
>       ...
>        evict
>         ...
>          ocfs2_clear_inode
>           ocfs2_checkpoint_inode
>            ocfs2_ci_fully_checkpointed
>             time_after(journal->j_trans_id, ci->ci_last_trans)
>              + journal is empty, crash!
> 
> For fixing, there are three solutions:
> 
> 1> Partly revert commit da5e7c87827e8
> 
> For avoiding kernel crash, this make sense for us. We only concerned
> whether there has any non-system inode access before dlm init. The
> answer is NO. And all journal replay/recovery handling happen after
> dlm & journal init done. So this method is not graceful but workable.
> 
> 2> Add osb->journal check in free inode routine (eg ocfs2_clear_inode)
> 
> The fix code is special for mounting phase, but it will continue
> working after mounting stage. In another word, this method adds useless
> code in normal inode free flow.
> 
> 3> Do directly free inode in mounting phase
> 
> This method is brutal/complex and may introduce unsafe code, currently
> maintainer didn't like.
> 
> At last, we chose method <1> and did partly reverted job.
> We reverted journal init codes, and kept cleanup codes flow.
> 
> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/inode.c   |  4 ++--
>  fs/ocfs2/journal.c | 32 ++++++++++++++++++++++----------
>  fs/ocfs2/super.c   | 16 ++++++++++++++++
>  3 files changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 5739dc301569..bb116c39b581 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -125,6 +125,7 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>  	struct inode *inode = NULL;
>  	struct super_block *sb = osb->sb;
>  	struct ocfs2_find_inode_args args;
> +	journal_t *journal = osb->journal->j_journal;
>  
>  	trace_ocfs2_iget_begin((unsigned long long)blkno, flags,
>  			       sysfile_type);
> @@ -171,11 +172,10 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>  	 * part of the transaction - the inode could have been reclaimed and
>  	 * now it is reread from disk.
>  	 */
> -	if (osb->journal) {
> +	if (journal) {
>  		transaction_t *transaction;
>  		tid_t tid;
>  		struct ocfs2_inode_info *oi = OCFS2_I(inode);
> -		journal_t *journal = osb->journal->j_journal;
>  
>  		read_lock(&journal->j_state_lock);
>  		if (journal->j_running_transaction)
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index 1887a2708709..49255fddce45 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -810,22 +810,20 @@ void ocfs2_set_journal_params(struct ocfs2_super *osb)
>  	write_unlock(&journal->j_state_lock);
>  }
>  
> -int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
> +/*
> + * alloc & initialize skeleton for journal structure.
> + * ocfs2_journal_init() will make fs have journal ability.
> + */
> +int ocfs2_journal_alloc(struct ocfs2_super *osb)
>  {
> -	int status = -1;
> -	struct inode *inode = NULL; /* the journal inode */
> -	journal_t *j_journal = NULL;
> -	struct ocfs2_journal *journal = NULL;
> -	struct ocfs2_dinode *di = NULL;
> -	struct buffer_head *bh = NULL;
> -	int inode_lock = 0;
> +	int status = 0;
> +	struct ocfs2_journal *journal;
>  
> -	/* initialize our journal structure */
>  	journal = kzalloc(sizeof(struct ocfs2_journal), GFP_KERNEL);
>  	if (!journal) {
>  		mlog(ML_ERROR, "unable to alloc journal\n");
>  		status = -ENOMEM;
> -		goto done;
> +		goto bail;
>  	}
>  	osb->journal = journal;
>  	journal->j_osb = osb;
> @@ -839,6 +837,20 @@ int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
>  	INIT_WORK(&journal->j_recovery_work, ocfs2_complete_recovery);
>  	journal->j_state = OCFS2_JOURNAL_FREE;
>  
> +bail:
> +	return status;
> +}
> +
> +int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
> +{
> +	int status = -1;
> +	struct inode *inode = NULL; /* the journal inode */
> +	journal_t *j_journal = NULL;
> +	struct ocfs2_journal *journal = osb->journal;
> +	struct ocfs2_dinode *di = NULL;
> +	struct buffer_head *bh = NULL;
> +	int inode_lock = 0;
> +

Better to leave a BUG_ON(!journal) here like before.

>  	/* already have the inode for our journal */
>  	inode = ocfs2_get_system_file_inode(osb, JOURNAL_SYSTEM_INODE,
>  					    osb->slot_num);
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 477cdf94122e..babec2c9d638 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -2015,6 +2015,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	int i, cbits, bbits;
>  	struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data;
>  	struct inode *inode = NULL;
> +	struct ocfs2_journal *journal;
>  	struct ocfs2_super *osb;
>  	u64 total_blocks;
>  
> @@ -2195,6 +2196,15 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  
>  	get_random_bytes(&osb->s_next_generation, sizeof(u32));
>  
> +	/*
> +	 * FIXME
> +	 * This should be done in ocfs2_journal_init(), but any inode
> +	 * writes back operation will cause the filesystem to crash.
> +	 */
> +	status = ocfs2_journal_alloc(osb);
> +	if (status)

Explicitly mark 'status < 0'?

Thanks,
Joseph

> +		goto bail;
> +
>  	INIT_WORK(&osb->dquot_drop_work, ocfs2_drop_dquot_refs);
>  	init_llist_head(&osb->dquot_drop_list);
>  
> @@ -2483,6 +2493,12 @@ static void ocfs2_delete_osb(struct ocfs2_super *osb)
>  
>  	kfree(osb->osb_orphan_wipes);
>  	kfree(osb->slot_recovery_generations);
> +	/* FIXME
> +	 * This belongs in journal shutdown, but because we have to
> +	 * allocate osb->journal at the middle of ocfs2_initialize_super(),
> +	 * we free it here.
> +	 */
> +	kfree(osb->journal);
>  	kfree(osb->local_alloc_copy);
>  	kfree(osb->uuid_str);
>  	kfree(osb->vol_label);

Joseph Qi April 13, 2022, 11:02 a.m. UTC | #2

On 4/13/22 4:29 PM, Heming Zhao wrote:
> After this patch, when error, ocfs2_fill_super doesn't take care to
> release resources which are allocated in ocfs2_initialize_super.
> 
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>

Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> ---
>  fs/ocfs2/super.c | 59 +++++++++++++++++++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 7076125f5b46..4302c3e9598c 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -2023,7 +2023,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	if (!osb) {
>  		status = -ENOMEM;
>  		mlog_errno(status);
> -		goto bail;
> +		goto out;
>  	}
>  
>  	sb->s_fs_info = osb;
> @@ -2084,7 +2084,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  		mlog(ML_ERROR, "Invalid number of node slots (%u)\n",
>  		     osb->max_slots);
>  		status = -EINVAL;
> -		goto bail;
> +		goto out;
>  	}
>  
>  	ocfs2_orphan_scan_init(osb);
> @@ -2093,7 +2093,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	if (status) {
>  		mlog(ML_ERROR, "Unable to initialize recovery state\n");
>  		mlog_errno(status);
> -		goto bail;
> +		goto out;
>  	}
>  
>  	init_waitqueue_head(&osb->checkpoint_event);
> @@ -2117,7 +2117,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	if (!osb->vol_label) {
>  		mlog(ML_ERROR, "unable to alloc vol label\n");
>  		status = -ENOMEM;
> -		goto bail;
> +		goto out_recovery_map;
>  	}
>  
>  	osb->slot_recovery_generations =
> @@ -2126,7 +2126,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	if (!osb->slot_recovery_generations) {
>  		status = -ENOMEM;
>  		mlog_errno(status);
> -		goto bail;
> +		goto out_vol_label;
>  	}
>  
>  	init_waitqueue_head(&osb->osb_wipe_event);
> @@ -2136,7 +2136,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	if (!osb->osb_orphan_wipes) {
>  		status = -ENOMEM;
>  		mlog_errno(status);
> -		goto bail;
> +		goto out_slot_recovery_gen;
>  	}
>  
>  	osb->osb_rf_lock_tree = RB_ROOT;
> @@ -2152,13 +2152,13 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  		mlog(ML_ERROR, "couldn't mount because of unsupported "
>  		     "optional features (%x).\n", i);
>  		status = -EINVAL;
> -		goto bail;
> +		goto out_orphan_wipes;
>  	}
>  	if (!sb_rdonly(osb->sb) && (i = OCFS2_HAS_RO_COMPAT_FEATURE(osb->sb, ~OCFS2_FEATURE_RO_COMPAT_SUPP))) {
>  		mlog(ML_ERROR, "couldn't mount RDWR because of "
>  		     "unsupported optional features (%x).\n", i);
>  		status = -EINVAL;
> -		goto bail;
> +		goto out_orphan_wipes;
>  	}
>  
>  	if (ocfs2_clusterinfo_valid(osb)) {
> @@ -2179,7 +2179,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  			     "cluster stack label (%s) \n",
>  			     osb->osb_cluster_stack);
>  			status = -EINVAL;
> -			goto bail;
> +			goto out_orphan_wipes;
>  		}
>  		memcpy(osb->osb_cluster_name,
>  			OCFS2_RAW_SB(di)->s_cluster_info.ci_cluster,
> @@ -2199,7 +2199,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	 */
>  	status = ocfs2_journal_alloc(osb);
>  	if (status)
> -		goto bail;
> +		goto out_orphan_wipes;
>  
>  	INIT_WORK(&osb->dquot_drop_work, ocfs2_drop_dquot_refs);
>  	init_llist_head(&osb->dquot_drop_list);
> @@ -2214,7 +2214,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  		mlog(ML_ERROR, "Volume has invalid cluster size (%d)\n",
>  		     osb->s_clustersize);
>  		status = -EINVAL;
> -		goto bail;
> +		goto out_journal;
>  	}
>  
>  	total_blocks = ocfs2_clusters_to_blocks(osb->sb,
> @@ -2226,14 +2226,14 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  		mlog(ML_ERROR, "Volume too large "
>  		     "to mount safely on this system");
>  		status = -EFBIG;
> -		goto bail;
> +		goto out_journal;
>  	}
>  
>  	if (ocfs2_setup_osb_uuid(osb, di->id2.i_super.s_uuid,
>  				 sizeof(di->id2.i_super.s_uuid))) {
>  		mlog(ML_ERROR, "Out of memory trying to setup our uuid.\n");
>  		status = -ENOMEM;
> -		goto bail;
> +		goto out_journal;
>  	}
>  
>  	strlcpy(osb->vol_label, di->id2.i_super.s_label,
> @@ -2253,7 +2253,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	if (!osb->osb_dlm_debug) {
>  		status = -ENOMEM;
>  		mlog_errno(status);
> -		goto bail;
> +		goto out_uuid_str;
>  	}
>  
>  	atomic_set(&osb->vol_state, VOLUME_INIT);
> @@ -2262,7 +2262,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	status = ocfs2_init_global_system_inodes(osb);
>  	if (status < 0) {
>  		mlog_errno(status);
> -		goto bail;
> +		goto out_dlm_out;
>  	}
>  
>  	/*
> @@ -2273,7 +2273,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	if (!inode) {
>  		status = -EINVAL;
>  		mlog_errno(status);
> -		goto bail;
> +		goto out_system_inodes;
>  	}
>  
>  	osb->bitmap_blkno = OCFS2_I(inode)->ip_blkno;
> @@ -2286,16 +2286,39 @@ static int ocfs2_initialize_super(struct super_block *sb,
>  	status = ocfs2_init_slot_info(osb);
>  	if (status < 0) {
>  		mlog_errno(status);
> -		goto bail;
> +		goto out_system_inodes;
>  	}
>  
>  	osb->ocfs2_wq = alloc_ordered_workqueue("ocfs2_wq", WQ_MEM_RECLAIM);
>  	if (!osb->ocfs2_wq) {
>  		status = -ENOMEM;
>  		mlog_errno(status);
> +		goto out_slot_info;
>  	}
>  
> -bail:
> +	return status;
> +
> +out_slot_info:
> +	ocfs2_free_slot_info(osb);
> +out_system_inodes:
> +	ocfs2_release_system_inodes(osb);
> +out_dlm_out:
> +	ocfs2_put_dlm_debug(osb->osb_dlm_debug);
> +out_uuid_str:
> +	kfree(osb->uuid_str);
> +out_journal:
> +	kfree(osb->journal);
> +out_orphan_wipes:
> +	kfree(osb->osb_orphan_wipes);
> +out_slot_recovery_gen:
> +	kfree(osb->slot_recovery_generations);
> +out_vol_label:
> +	kfree(osb->vol_label);
> +out_recovery_map:
> +	kfree(osb->recovery_map);
> +out:
> +	kfree(osb);
> +	sb->s_fs_info = NULL;
>  	return status;
>  }
>

heming.zhao@suse.com April 13, 2022, 1:11 p.m. UTC | #3

On Wed, Apr 13, 2022 at 07:37:59PM +0800, Joseph Qi wrote:
> It mostly looks good to me.
> Have you done error injection test after this patchset applied?
> I'm afraid ocfs2-test can't cover this as of now.
> 

I plan to write a debugfs api for error injection test.
The code only for these patches test

My idea:
echo N > /sys/kernel/debug/ocfs2/xxx/error
N: 1,2,3,4,...

"echo N" will make ocfs2 trigger Nth error point become true:

```
 	status = ocfs2_initialize_super(sb, bh, sector_size, &stats);
 	brelse(bh);
 	bh = NULL;
-	if (status < 0)
+	if (error_inject(n)) //eg. "echo 1 > xx" will trigger there "if()" becomes true.
 		goto out;
```

What's your opinion about my idea?
Or do you have other method to test?

- Heming

> Thanks,
> Joseph
> 
> On 4/13/22 4:29 PM, Heming Zhao wrote:
> > Current ocfs2_fill_super() uses one goto label "read_super_error" to
> > handle all error cases. And with previous serial patches, the error
> > handling should fork more branches to handle different error cases.
> > This patch rewrite the error handling of ocfs2_fill_super.
> > 
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> >  fs/ocfs2/super.c | 67 +++++++++++++++++++++++-------------------------
> >  1 file changed, 32 insertions(+), 35 deletions(-)
> > 
> > diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> > index 5e860d7162d7..72673d40d29c 100644
> > --- a/fs/ocfs2/super.c
> > +++ b/fs/ocfs2/super.c
> > @@ -989,28 +989,27 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  
> >  	if (!ocfs2_parse_options(sb, data, &parsed_options, 0)) {
> >  		status = -EINVAL;
> > -		goto read_super_error;
> > +		goto out;
> >  	}
> >  
> >  	/* probe for superblock */
> >  	status = ocfs2_sb_probe(sb, &bh, &sector_size, &stats);
> >  	if (status < 0) {
> >  		mlog(ML_ERROR, "superblock probe failed!\n");
> > -		goto read_super_error;
> > +		goto out;
> >  	}
> >  
> >  	status = ocfs2_initialize_super(sb, bh, sector_size, &stats);
> > -	osb = OCFS2_SB(sb);
> > -	if (status < 0) {
> > -		mlog_errno(status);
> > -		goto read_super_error;
> > -	}
> >  	brelse(bh);
> >  	bh = NULL;
> > +	if (status < 0)
> > +		goto out;
> > +
> > +	osb = OCFS2_SB(sb);
> >  
> >  	if (!ocfs2_check_set_options(sb, &parsed_options)) {
> >  		status = -EINVAL;
> > -		goto read_super_error;
> > +		goto out_super;
> >  	}
> >  	osb->s_mount_opt = parsed_options.mount_opt;
> >  	osb->s_atime_quantum = parsed_options.atime_quantum;
> > @@ -1027,7 +1026,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  
> >  	status = ocfs2_verify_userspace_stack(osb, &parsed_options);
> >  	if (status)
> > -		goto read_super_error;
> > +		goto out_super;
> >  
> >  	sb->s_magic = OCFS2_SUPER_MAGIC;
> >  
> > @@ -1041,7 +1040,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  			status = -EACCES;
> >  			mlog(ML_ERROR, "Readonly device detected but readonly "
> >  			     "mount was not specified.\n");
> > -			goto read_super_error;
> > +			goto out_super;
> >  		}
> >  
> >  		/* You should not be able to start a local heartbeat
> > @@ -1050,7 +1049,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  			status = -EROFS;
> >  			mlog(ML_ERROR, "Local heartbeat specified on readonly "
> >  			     "device.\n");
> > -			goto read_super_error;
> > +			goto out_super;
> >  		}
> >  
> >  		status = ocfs2_check_journals_nolocks(osb);
> > @@ -1059,9 +1058,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  				mlog(ML_ERROR, "Recovery required on readonly "
> >  				     "file system, but write access is "
> >  				     "unavailable.\n");
> > -			else
> > -				mlog_errno(status);
> > -			goto read_super_error;
> > +			goto out_super;
> >  		}
> >  
> >  		ocfs2_set_ro_flag(osb, 1);
> > @@ -1077,10 +1074,8 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  	}
> >  
> >  	status = ocfs2_verify_heartbeat(osb);
> > -	if (status < 0) {
> > -		mlog_errno(status);
> > -		goto read_super_error;
> > -	}
> > +	if (status < 0)
> > +		goto out_super;
> >  
> >  	osb->osb_debug_root = debugfs_create_dir(osb->uuid_str,
> >  						 ocfs2_debugfs_root);
> > @@ -1094,15 +1089,14 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  
> >  	status = ocfs2_mount_volume(sb);
> >  	if (status < 0)
> > -		goto read_super_error;
> > +		goto out_debugfs;
> >  
> >  	if (osb->root_inode)
> >  		inode = igrab(osb->root_inode);
> >  
> >  	if (!inode) {
> >  		status = -EIO;
> > -		mlog_errno(status);
> > -		goto read_super_error;
> > +		goto out_dismount;
> >  	}
> >  
> >  	osb->osb_dev_kset = kset_create_and_add(sb->s_id, NULL,
> > @@ -1110,7 +1104,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  	if (!osb->osb_dev_kset) {
> >  		status = -ENOMEM;
> >  		mlog(ML_ERROR, "Unable to create device kset %s.\n", sb->s_id);
> > -		goto read_super_error;
> > +		goto out_dismount;
> >  	}
> >  
> >  	/* Create filecheck sysfs related directories/files at
> > @@ -1119,14 +1113,13 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  		status = -ENOMEM;
> >  		mlog(ML_ERROR, "Unable to create filecheck sysfs directory at "
> >  			"/sys/fs/ocfs2/%s/filecheck.\n", sb->s_id);
> > -		goto read_super_error;
> > +		goto out_dismount;
> >  	}
> >  
> >  	root = d_make_root(inode);
> >  	if (!root) {
> >  		status = -ENOMEM;
> > -		mlog_errno(status);
> > -		goto read_super_error;
> > +		goto out_dismount;
> >  	}
> >  
> >  	sb->s_root = root;
> > @@ -1178,17 +1171,21 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
> >  
> >  	return status;
> >  
> > -read_super_error:
> > -	brelse(bh);
> > -
> > -	if (status)
> > -		mlog_errno(status);
> > +out_dismount:
> > +	atomic_set(&osb->vol_state, VOLUME_DISABLED);
> > +	wake_up(&osb->osb_mount_event);
> > +	ocfs2_dismount_volume(sb, 1);
> > +	goto out;
> >  
> > -	if (osb) {
> > -		atomic_set(&osb->vol_state, VOLUME_DISABLED);
> > -		wake_up(&osb->osb_mount_event);
> > -		ocfs2_dismount_volume(sb, 1);
> > -	}
> > +out_debugfs:
> > +	debugfs_remove_recursive(osb->osb_debug_root);
> > +out_super:
> > +	ocfs2_release_system_inodes(osb);
> > +	kfree(osb->recovery_map);
> > +	ocfs2_delete_osb(osb);
> > +	kfree(osb);
> > +out:
> > +	mlog_errno(status);
> >  
> >  	return status;
> >  }
>

David Sterba via Ocfs2-devel April 13, 2022, 1:18 p.m. UTC | #4

On 4/13/22 18:54, Joseph Qi wrote:
> 
> 
> On 4/13/22 4:29 PM, Heming Zhao wrote:
>> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
>> journal init later than before, it makes NULL pointer access in free
>> routine.
>>
>> Crash flow:
>>
>> ocfs2_fill_super
>>   + ocfs2_mount_volume
>>   |  + ocfs2_dlm_init //fail & return, osb->journal is NULL.
>>   |  + ...
>>   |  + ocfs2_check_volume //no chance to init osb->journal
>>   |
>>   + ...
>>   + ocfs2_dismount_volume
>>      ocfs2_release_system_inodes
>>        ...
>>         evict
>>          ...
>>           ocfs2_clear_inode
>>            ocfs2_checkpoint_inode
>>             ocfs2_ci_fully_checkpointed
>>              time_after(journal->j_trans_id, ci->ci_last_trans)
>>               + journal is empty, crash!
>>
>> For fixing, there are three solutions:
>>
>> 1> Partly revert commit da5e7c87827e8
>>
>> For avoiding kernel crash, this make sense for us. We only concerned
>> whether there has any non-system inode access before dlm init. The
>> answer is NO. And all journal replay/recovery handling happen after
>> dlm & journal init done. So this method is not graceful but workable.
>>
>> 2> Add osb->journal check in free inode routine (eg ocfs2_clear_inode)
>>
>> The fix code is special for mounting phase, but it will continue
>> working after mounting stage. In another word, this method adds useless
>> code in normal inode free flow.
>>
>> 3> Do directly free inode in mounting phase
>>
>> This method is brutal/complex and may introduce unsafe code, currently
>> maintainer didn't like.
>>
>> At last, we chose method <1> and did partly reverted job.
>> We reverted journal init codes, and kept cleanup codes flow.
>>
>> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>> ---
>>   fs/ocfs2/inode.c   |  4 ++--
>>   fs/ocfs2/journal.c | 32 ++++++++++++++++++++++----------
>>   fs/ocfs2/super.c   | 16 ++++++++++++++++
>>   3 files changed, 40 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
>> index 5739dc301569..bb116c39b581 100644
>> --- a/fs/ocfs2/inode.c
>> +++ b/fs/ocfs2/inode.c
>> @@ -125,6 +125,7 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>>   	struct inode *inode = NULL;
>>   	struct super_block *sb = osb->sb;
>>   	struct ocfs2_find_inode_args args;
>> +	journal_t *journal = osb->journal->j_journal;
>>   
>>   	trace_ocfs2_iget_begin((unsigned long long)blkno, flags,
>>   			       sysfile_type);
>> @@ -171,11 +172,10 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>>   	 * part of the transaction - the inode could have been reclaimed and
>>   	 * now it is reread from disk.
>>   	 */
>> -	if (osb->journal) {
>> +	if (journal) {
>>   		transaction_t *transaction;
>>   		tid_t tid;
>>   		struct ocfs2_inode_info *oi = OCFS2_I(inode);
>> -		journal_t *journal = osb->journal->j_journal;
>>   
>>   		read_lock(&journal->j_state_lock);
>>   		if (journal->j_running_transaction)
>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>> index 1887a2708709..49255fddce45 100644
>> --- a/fs/ocfs2/journal.c
>> +++ b/fs/ocfs2/journal.c
>> @@ -810,22 +810,20 @@ void ocfs2_set_journal_params(struct ocfs2_super *osb)
>>   	write_unlock(&journal->j_state_lock);
>>   }
>>   
>> -int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
>> +/*
>> + * alloc & initialize skeleton for journal structure.
>> + * ocfs2_journal_init() will make fs have journal ability.
>> + */
>> +int ocfs2_journal_alloc(struct ocfs2_super *osb)
>>   {
>> -	int status = -1;
>> -	struct inode *inode = NULL; /* the journal inode */
>> -	journal_t *j_journal = NULL;
>> -	struct ocfs2_journal *journal = NULL;
>> -	struct ocfs2_dinode *di = NULL;
>> -	struct buffer_head *bh = NULL;
>> -	int inode_lock = 0;
>> +	int status = 0;
>> +	struct ocfs2_journal *journal;
>>   
>> -	/* initialize our journal structure */
>>   	journal = kzalloc(sizeof(struct ocfs2_journal), GFP_KERNEL);
>>   	if (!journal) {
>>   		mlog(ML_ERROR, "unable to alloc journal\n");
>>   		status = -ENOMEM;
>> -		goto done;
>> +		goto bail;
>>   	}
>>   	osb->journal = journal;
>>   	journal->j_osb = osb;
>> @@ -839,6 +837,20 @@ int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
>>   	INIT_WORK(&journal->j_recovery_work, ocfs2_complete_recovery);
>>   	journal->j_state = OCFS2_JOURNAL_FREE;
>>   
>> +bail:
>> +	return status;
>> +}
>> +
>> +int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
>> +{
>> +	int status = -1;
>> +	struct inode *inode = NULL; /* the journal inode */
>> +	journal_t *j_journal = NULL;
>> +	struct ocfs2_journal *journal = osb->journal;
>> +	struct ocfs2_dinode *di = NULL;
>> +	struct buffer_head *bh = NULL;
>> +	int inode_lock = 0;
>> +
> 
> Better to leave a BUG_ON(!journal) here like before.

OK. restore BUG_ON().
The reason I removed it: In theory it's absolutely impossible become NULL.
> 
>>   	/* already have the inode for our journal */
>>   	inode = ocfs2_get_system_file_inode(osb, JOURNAL_SYSTEM_INODE,
>>   					    osb->slot_num);
>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>> index 477cdf94122e..babec2c9d638 100644
>> --- a/fs/ocfs2/super.c
>> +++ b/fs/ocfs2/super.c
>> @@ -2015,6 +2015,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>>   	int i, cbits, bbits;
>>   	struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data;
>>   	struct inode *inode = NULL;
>> +	struct ocfs2_journal *journal;
>>   	struct ocfs2_super *osb;
>>   	u64 total_blocks;
>>   
>> @@ -2195,6 +2196,15 @@ static int ocfs2_initialize_super(struct super_block *sb,
>>   
>>   	get_random_bytes(&osb->s_next_generation, sizeof(u32));
>>   
>> +	/*
>> +	 * FIXME
>> +	 * This should be done in ocfs2_journal_init(), but any inode
>> +	 * writes back operation will cause the filesystem to crash.
>> +	 */
>> +	status = ocfs2_journal_alloc(osb);
>> +	if (status)
> 
> Explicitly mark 'status < 0'?
> 

Sorry, my stupid mistake. I didn't compile & test v2 patch, and plan to test before v3.
For test, please check my comment in patch 5/5

- heming

> 
>> +		goto bail;
>> +
>>   	INIT_WORK(&osb->dquot_drop_work, ocfs2_drop_dquot_refs);
>>   	init_llist_head(&osb->dquot_drop_list);
>>   
>> @@ -2483,6 +2493,12 @@ static void ocfs2_delete_osb(struct ocfs2_super *osb)
>>   
>>   	kfree(osb->osb_orphan_wipes);
>>   	kfree(osb->slot_recovery_generations);
>> +	/* FIXME
>> +	 * This belongs in journal shutdown, but because we have to
>> +	 * allocate osb->journal at the middle of ocfs2_initialize_super(),
>> +	 * we free it here.
>> +	 */
>> +	kfree(osb->journal);
>>   	kfree(osb->local_alloc_copy);
>>   	kfree(osb->uuid_str);
>>   	kfree(osb->vol_label);
>

Joseph Qi April 14, 2022, 1:47 a.m. UTC | #5

On 4/13/22 9:11 PM, Heming Zhao wrote:
> On Wed, Apr 13, 2022 at 07:37:59PM +0800, Joseph Qi wrote:
>> It mostly looks good to me.
>> Have you done error injection test after this patchset applied?
>> I'm afraid ocfs2-test can't cover this as of now.
>>
> 
> I plan to write a debugfs api for error injection test.
> The code only for these patches test
> 
> My idea:
> echo N > /sys/kernel/debug/ocfs2/xxx/error
> N: 1,2,3,4,...
> 
> "echo N" will make ocfs2 trigger Nth error point become true:
> 
> ```
>  	status = ocfs2_initialize_super(sb, bh, sector_size, &stats);
>  	brelse(bh);
>  	bh = NULL;
> -	if (status < 0)
> +	if (error_inject(n)) //eg. "echo 1 > xx" will trigger there "if()" becomes true.
>  		goto out;
> ```
> 
> What's your opinion about my idea?
> Or do you have other method to test?

Good idea, actually we did it in this way before since existing
fault-injection can only inject common failures such as slab allocation.

Thanks,
Joseph

[v2,0/5] rewrite error handling during mounting stage

Message

Comments