From patchwork Wed Dec 17 21:33:31 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 5509341
Date: Wed, 17 Dec 2014 13:33:31 -0800
From: Andrew Morton
To: ocfs2-devel@oss.oracle.com, vicky.yangwenfang@huawei.com, jlbec@evilplan.org, mfasheh@suse.com
Message-Id: <20141217133331.282ffc6f592fb329f3b7edad@linux-foundation.org>
In-Reply-To: <548f65d1.4P/QIizRlX/OfJ/p%akpm@linux-foundation.org>
References: <548f65d1.4P/QIizRlX/OfJ/p%akpm@linux-foundation.org>
Subject: Re: [Ocfs2-devel] [patch 03/15] ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()

So I now have a mess on my hands due to reordering
ocfs2-fix-journal-commit-deadlock.patch ahead of this patch.

It concerns the label "out:". Should it be placed before or after the
call to ocfs2_unlock_pages()?

My current copy of ocfs2_write_end_nolock() is below, followed by my
current version of
ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch

Thanks.

int ocfs2_write_end_nolock(struct address_space *mapping,
			   loff_t pos, unsigned len, unsigned copied,
			   struct page *page, void *fsdata)
{
	int i, ret;
	unsigned from, to, start = pos & (PAGE_CACHE_SIZE - 1);
	struct inode *inode = mapping->host;
	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
	struct ocfs2_write_ctxt *wc = fsdata;
	struct ocfs2_dinode *di = (struct ocfs2_dinode *)wc->w_di_bh->b_data;
	handle_t *handle = wc->w_handle;
	struct page *tmppage;

	if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) {
		ocfs2_write_end_inline(inode, pos, len, &copied, di, wc);
		goto out_write_size;
	}

	if (unlikely(copied < len)) {
		if (!PageUptodate(wc->w_target_page))
			copied = 0;

		ocfs2_zero_new_buffers(wc->w_target_page, start+copied,
				       start+len);
	}
	flush_dcache_page(wc->w_target_page);

	for(i = 0; i < wc->w_num_pages; i++) {
		tmppage = wc->w_pages[i];

		if (tmppage == wc->w_target_page) {
			from = wc->w_target_from;
			to = wc->w_target_to;

			BUG_ON(from > PAGE_CACHE_SIZE ||
			       to > PAGE_CACHE_SIZE ||
			       to < from);
		} else {
			/*
			 * Pages adjacent to the target (if any) imply
			 * a hole-filling write in which case we want
			 * to flush their entire range.
			 */
			from = 0;
			to = PAGE_CACHE_SIZE;
		}

		if (page_has_buffers(tmppage)) {
			if (ocfs2_should_order_data(inode))
				ocfs2_jbd2_file_inode(wc->w_handle, inode);
			block_commit_write(tmppage, from, to);
		}
	}

	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
				      OCFS2_JOURNAL_ACCESS_WRITE);
	if (ret) {
		copied = ret;
		mlog_errno(ret);
		goto out;
	}

out_write_size:
	pos += copied;
	if (pos > i_size_read(inode)) {
		i_size_write(inode, pos);
		mark_inode_dirty(inode);
	}
	inode->i_blocks = ocfs2_inode_sector_count(inode);
	di->i_size = cpu_to_le64((u64)i_size_read(inode));
	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
	di->i_mtime = di->i_ctime = cpu_to_le64(inode->i_mtime.tv_sec);
	di->i_mtime_nsec = di->i_ctime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
	ocfs2_update_inode_fsync_trans(handle, inode, 1);
	ocfs2_journal_dirty(handle, wc->w_di_bh);

	/* unlock pages before dealloc since it needs acquiring j_trans_barrier
	 * lock, or it will cause a deadlock since journal commit threads holds
	 * this lock and will ask for the page lock when flushing the data.
	 * put it here to preserve the unlock order.
	 */
	ocfs2_unlock_pages(wc);

out:
	ocfs2_commit_trans(osb, handle);

	ocfs2_run_deallocs(osb, &wc->w_dealloc);

	brelse(wc->w_di_bh);
	kfree(wc);

	return copied;
}


From: yangwenfang
Subject: ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()

After we call ocfs2_journal_access_di() in ocfs2_write_begin(),
jbd2_journal_restart() may also be called.  That function decrements
transaction A's t_updates and obtains a new transaction B.  If
jbd2_journal_commit_transaction() happens to be committing transaction A
at that point, then once t_updates == 0 it goes on to complete the commit
and unfile the buffer.

So by the time jbd2_journal_dirty_metadata() is called, the handle points
to the new transaction B and the buffer head's journal head has already
been freed (jh->b_transaction == NULL, jh->b_next_transaction == NULL).
It returns EINVAL, which triggers the BUG_ON(status).

thread 1:                                   jbd2:
ocfs2_write_begin                           jbd2_journal_commit_transaction
ocfs2_write_begin_nolock
  ocfs2_start_trans
    jbd2__journal_start(t_updates+1,
                        transaction A)
  ocfs2_journal_access_di
  ocfs2_write_cluster_by_desc
    ocfs2_mark_extent_written
      ocfs2_change_extent_flag
        ocfs2_split_extent
          ocfs2_extend_rotate_transaction
            jbd2_journal_restart
            (t_updates-1,transaction B)     t_updates==0
                                            __jbd2_journal_refile_buffer
ocfs2_write_end
ocfs2_write_end_nolock
  ocfs2_journal_dirty
    jbd2_journal_dirty_metadata(bug)
  ocfs2_commit_trans

In ext4, I found that jbd2_journal_get_write_access() is called from
ext4_write_end():

ext4_write_begin
  ext4_journal_start
    __ext4_journal_start_sb
      ext4_journal_check_start
      jbd2__journal_start

ext4_write_end
  ext4_mark_inode_dirty
    ext4_reserve_inode_write
      ext4_journal_get_write_access
        jbd2_journal_get_write_access
    ext4_mark_iloc_dirty
      ext4_do_update_inode
        ext4_handle_dirty_metadata
          jbd2_journal_dirty_metadata

So I think we should call ocfs2_journal_access_di() before
ocfs2_journal_dirty() in ocfs2_write_end().  It works well after my
modification.

Signed-off-by: vicky
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
---

 fs/ocfs2/aops.c |   21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff -puN fs/ocfs2/aops.c~ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c~ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock
+++ a/fs/ocfs2/aops.c
@@ -1822,16 +1822,6 @@ try_again:
 		if (ret)
 			goto out_commit;
 	}
-	/*
-	 * We don't want this to fail in ocfs2_write_end(), so do it
-	 * here.
-	 */
-	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
-				      OCFS2_JOURNAL_ACCESS_WRITE);
-	if (ret) {
-		mlog_errno(ret);
-		goto out_quota;
-	}
 
 	/*
 	 * Fill our page array first. That way we've grabbed enough so
@@ -1982,7 +1972,7 @@ int ocfs2_write_end_nolock(struct addres
 			   loff_t pos, unsigned len, unsigned copied,
 			   struct page *page, void *fsdata)
 {
-	int i;
+	int i, ret;
 	unsigned from, to, start = pos & (PAGE_CACHE_SIZE - 1);
 	struct inode *inode = mapping->host;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
@@ -2032,6 +2022,14 @@ int ocfs2_write_end_nolock(struct addres
 		}
 	}
 
+	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
+				      OCFS2_JOURNAL_ACCESS_WRITE);
+	if (ret) {
+		copied = ret;
+		mlog_errno(ret);
+		goto out;
+	}
+
 out_write_size:
 	pos += copied;
 	if (pos > i_size_read(inode)) {
@@ -2053,6 +2051,7 @@ out_write_size:
 	 */
 	ocfs2_unlock_pages(wc);
 
+out:
 	ocfs2_commit_trans(osb, handle);
 
 	ocfs2_run_deallocs(osb, &wc->w_dealloc);
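
For anyone trying to follow the ordering problem described in the
changelog, here is a rough user-space sketch of it.  It is a toy model
only: every toy_* name is hypothetical, none of this is jbd2 or ocfs2
code, and the real jbd2 state machine is far more involved.  It assumes
just what the changelog states: a restart moves the handle to a new
transaction, a commit of the old transaction unfiles the buffer, and
dirtying a buffer that is not filed on the handle's transaction fails
with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for jbd2 objects; hypothetical names, not kernel API. */
struct toy_txn { int t_updates; int committed; };
struct toy_buf { struct toy_txn *b_transaction; int dirty; };
struct toy_handle { struct toy_txn *h_transaction; };

/* Models starting a handle: join a transaction and count the updater. */
static void toy_start(struct toy_handle *h, struct toy_txn *t)
{
	h->h_transaction = t;
	t->t_updates++;
}

/* Models get_write_access: file the buffer on the handle's transaction. */
static int toy_access(struct toy_handle *h, struct toy_buf *b)
{
	b->b_transaction = h->h_transaction;
	return 0;
}

/* Models a restart: leave the old transaction and join a new one. */
static void toy_restart(struct toy_handle *h, struct toy_txn *next)
{
	assert(h->h_transaction != NULL);
	h->h_transaction->t_updates--;
	toy_start(h, next);
}

/* Models the commit thread: with no updaters left, commit and unfile. */
static int toy_commit(struct toy_txn *t, struct toy_buf *b)
{
	if (t->t_updates != 0)
		return -EBUSY;	/* assumption: the real code waits instead */
	t->committed = 1;
	if (b->b_transaction == t)
		b->b_transaction = NULL;
	return 0;
}

/* Models dirty_metadata: reject a buffer not filed on our transaction. */
static int toy_dirty(struct toy_handle *h, struct toy_buf *b)
{
	if (b->b_transaction != h->h_transaction)
		return -EINVAL;
	b->dirty = 1;
	return 0;
}

/*
 * Replays the changelog's sequence.  access_in_write_end == 0 takes
 * write access before the restart (old placement); 1 takes it just
 * before dirtying (placement proposed by the patch).
 */
static int toy_write(int access_in_write_end)
{
	struct toy_txn a = { 0, 0 }, b = { 0, 0 };
	struct toy_buf di = { NULL, 0 };
	struct toy_handle h = { NULL };

	toy_start(&h, &a);
	if (!access_in_write_end)
		toy_access(&h, &di);
	toy_restart(&h, &b);		/* handle now points at B */
	toy_commit(&a, &di);		/* A commits, buffer is unfiled */
	if (access_in_write_end)
		toy_access(&h, &di);
	return toy_dirty(&h, &di);
}
```

In this model toy_write(0) fails with -EINVAL while toy_write(1)
succeeds, which is the behaviour the changelog attributes to the old
and new placement of the access call.  It says nothing about where the
"out:" label belongs.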