From patchwork Wed Mar 25 13:38:37 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Boaz Harrosh
X-Patchwork-Id: 6090911
Message-ID: <5512BA5D.8070609@plexistor.com>
Date: Wed, 25 Mar 2015 15:38:37 +0200
From: Boaz Harrosh
To: Dave Chinner, Matthew Wilcox, Andrew Morton, "Kirill A. Shutemov",
 Jan Kara, Hugh Dickins, Mel Gorman, linux-mm@kvack.org, linux-nvdimm,
 linux-fsdevel, Eryu Guan
References: <5512B961.8070409@plexistor.com>
In-Reply-To: <5512B961.8070409@plexistor.com>
Subject: [Linux-nvdimm] [PATCH 1/3] mm: New pfn_mkwrite same as page_mkwrite
 for VM_PFNMAP
List-Id: "Linux-nvdimm developer list."

From: Yigal Korman

This will allow a FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs)
to get notified when an access is a write to a read-only PFN.

This can happen if we mmap() a file, then first mmap-read from it
to page-in a read-only PFN, then mmap-write to the same page.

We need this functionality to fix a DAX bug, where in the scenario
above we fail to set ctime/mtime though we modified the file.
An xfstest is attached to this patchset that shows the failure
and the fix. (A DAX patch will follow)

This functionality is extra important for us, because upon
dirtying of a pmem page we also want to RDMA the page to a
remote cluster node.

We define a new pfn_mkwrite and do not reuse page_mkwrite because
  1 - The name ;-)
  2 - But mainly because it would take a very long and tedious
      audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
      users, to make sure they do not now CRASH. For example current
      DAX code (which this is for) would crash.
      If we wanted to reuse page_mkwrite, we would first need to
      patch all users so that they do not crash on no-page, and
      only then enable this patch. But even if I did that I would
      not sleep so well at night. Adding a new vector is the safest
      thing to do, and is not that expensive: an extra pointer in a
      static function vector per driver.
      Also the new vector is better for performance, because otherwise
      we would call all current kernel vectors just so they can
      check-has-no-page-do-nothing and return.

No need to call it from do_shared_fault because do_wp_page is called to
change pte permissions anyway.

CC: Matthew Wilcox
CC: Kirill A. Shutemov
CC: Jan Kara
CC: Andrew Morton
CC: Hugh Dickins
CC: Mel Gorman
CC: linux-mm@kvack.org

Signed-off-by: Yigal Korman
Signed-off-by: Boaz Harrosh
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 28 +++++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a9392..1cd820c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -250,6 +250,8 @@ struct vm_operations_struct {
 	/* notification that a previously read-only page is about to become
 	 * writable, if an error is returned it will cause a SIGBUS */
 	int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
+	/* same as page_mkwrite when using VM_PFNMAP|VM_MIXEDMAP */
+	int (*pfn_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
 
 	/* called by access_process_vm when get_user_pages() fails, typically
 	 * for use by special VMAs that can switch between memory and hardware
diff --git a/mm/memory.c b/mm/memory.c
index 8068893..8d640d1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1982,6 +1982,23 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
 	return ret;
 }
 
+static int do_pfn_mkwrite(struct vm_area_struct *vma, unsigned long address)
+{
+	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
+		struct vm_fault vmf = {
+			.page = 0,
+			.pgoff = (((address & PAGE_MASK) - vma->vm_start)
+						>> PAGE_SHIFT) + vma->vm_pgoff,
+			.virtual_address = (void __user *)(address & PAGE_MASK),
+			.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
+		};
+
+		return vma->vm_ops->pfn_mkwrite(vma, &vmf);
+	}
+
+	return 0;
+}
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2025,8 +2042,17 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * accounting on raw pfn maps.
 		 */
 		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
-				     (VM_WRITE|VM_SHARED))
+				     (VM_WRITE|VM_SHARED)) {
+			pte_unmap_unlock(page_table, ptl);
+			ret = do_pfn_mkwrite(vma, address);
+			if (ret & VM_FAULT_ERROR)
+				return ret;
+			page_table = pte_offset_map_lock(mm, pmd, address,
+							 &ptl);
+			if (!pte_same(*page_table, orig_pte))
+				goto unlock;
 			goto reuse;
+		}
 		goto gotten;
 	}
 
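
Note on the vm_fault fields filled in by do_pfn_mkwrite() above: the
arithmetic for .pgoff and .virtual_address can be checked in isolation.
What follows is only a userspace sketch and not part of the patch. It
assumes 4 KiB pages (a shift of 12), and every demo_* and DEMO_* name is
hypothetical, standing in for the kernel's PAGE_SHIFT, PAGE_MASK and the
vma fields.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The kernel takes PAGE_SHIFT and PAGE_MASK
 * from its architecture headers; these DEMO_* macros only mirror them. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_MASK  (~((1UL << DEMO_PAGE_SHIFT) - 1))

/* Hypothetical stand-in for the .virtual_address initializer:
 * round the faulting address down to the start of its page. */
unsigned long demo_fault_page_addr(unsigned long address)
{
	return address & DEMO_PAGE_MASK;
}

/* Hypothetical stand-in for the .pgoff initializer: the page index
 * inside the mapping, plus the file offset (in pages) at which the
 * mapping starts. */
unsigned long demo_fault_pgoff(unsigned long address,
			       unsigned long vm_start,
			       unsigned long vm_pgoff)
{
	return ((demo_fault_page_addr(address) - vm_start)
			>> DEMO_PAGE_SHIFT) + vm_pgoff;
}

/* A mapping that starts at 0x40000000 and maps the file from page 10
 * on. A write fault at 0x40003abc lands in the mapping's fourth page
 * (index 3), which is file page 13. */
void demo_self_check(void)
{
	assert(demo_fault_page_addr(0x40003abcUL) == 0x40003000UL);
	assert(demo_fault_pgoff(0x40003abcUL, 0x40000000UL, 10) == 13);
}
```

The handler thus receives a file-relative page offset rather than a
struct page, which is the information a VM_PFNMAP/VM_MIXEDMAP user has
to work with.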