From patchwork Wed Apr 5 18:04:41 2017
X-Patchwork-Submitter: Yu Zhang
X-Patchwork-Id: 9665387
To: George Dunlap, George Dunlap
References: <1491135867-623-1-git-send-email-yu.c.zhang@linux.intel.com>
 <1491135867-623-6-git-send-email-yu.c.zhang@linux.intel.com>
 <58E519AB.7050004@linux.intel.com>
 <413a1f07-fa11-ab1c-af40-573ed0260e69@citrix.com>
 <58E51C2C.6050108@linux.intel.com>
 <546abc48-848a-e5c9-c9ec-e6757053ab5e@citrix.com>
 <58E526DD.4070309@linux.intel.com>
 <58E52922.1030500@linux.intel.com>
 <58E5311F.2080205@linux.intel.com>
From: Yu Zhang <yu.c.zhang@linux.intel.com>
Message-ID: <58E531B9.2040600@linux.intel.com>
Date: Thu, 6 Apr 2017 02:04:41 +0800
In-Reply-To: <58E5311F.2080205@linux.intel.com>
Cc: Kevin Tian, Jun Nakajima, Andrew Cooper, xen-devel@lists.xen.org,
 Paul Durrant, "Lv, Zhiyuan", Jan Beulich
Subject: Re: [Xen-devel] [PATCH v10 5/6] x86/ioreq server: Asynchronously
 reset outstanding p2m_ioreq_server entries.

On 4/6/2017 2:02 AM, Yu Zhang wrote:
> On 4/6/2017 1:28 AM, Yu Zhang wrote:
>> On 4/6/2017 1:18 AM, Yu Zhang wrote:
>>> On 4/6/2017 1:01 AM, George Dunlap wrote:
>>>> On 05/04/17 17:32, Yu Zhang wrote:
>>>>> On 4/6/2017 12:35 AM, George Dunlap wrote:
>>>>>> On 05/04/17 17:22, Yu Zhang wrote:
>>>>>>> On 4/5/2017 10:41 PM, George Dunlap wrote:
>>>>>>>> On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang wrote:
>>>>>>>>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>>>>>>>>> entries need to be reset back to p2m_ram_rw. This patch does this
>>>>>>>>> asynchronously with the current p2m_change_entry_type_global()
>>>>>>>>> interface.
>>>>>>>>>
>>>>>>>>> A new field, entry_count, is introduced in struct p2m_domain to
>>>>>>>>> record the number of p2m_ioreq_server p2m page table entries. One
>>>>>>>>> property of these entries is that they only point to 4K-sized page
>>>>>>>>> frames, because all p2m_ioreq_server entries originate from
>>>>>>>>> p2m_ram_rw ones in p2m_change_type_one(). We do not need to worry
>>>>>>>>> about the counting for 2M/1G sized pages.
>>>>>>>> Assuming that all p2m_ioreq_server entries are *created* by
>>>>>>>> p2m_change_type_one() may be valid, but can you assume that they are
>>>>>>>> only ever *removed* by p2m_change_type_one() (or recalculation)?
>>>>>>>>
>>>>>>>> What happens, for instance, if a guest balloons out one of the ram
>>>>>>>> pages? I don't immediately see anything preventing a p2m_ioreq_server
>>>>>>>> page from being ballooned out, nor anything on the
>>>>>>>> decrease_reservation() path decreasing p2m->ioreq.entry_count. Or did
>>>>>>>> I miss something?
>>>>>>>>
>>>>>>>> Other than that, only one minor comment...
>>>>>>> Thanks for your thorough consideration, George.
>>>>>>> But I do not think we need to worry about this:
>>>>>>>
>>>>>>> If the emulation is in progress, the balloon driver cannot get a
>>>>>>> p2m_ioreq_server page - because it is already allocated.
>>>>>> In theory, yes, the guest *shouldn't* do this. But what if the guest OS
>>>>>> makes a mistake? Or, what if the ioreq server makes a mistake and
>>>>>> places a watch on a page that *isn't* allocated by the device driver,
>>>>>> or forgets to change a page type back to ram when the device driver
>>>>>> frees it back to the guest kernel?
>>>>> Then the lazy p2m change code will be triggered, and this page is reset
>>>>> to p2m_ram_rw before being set to p2m_invalid, just like the normal
>>>>> path. Will this be a problem?
>>>> No, I'm talking about before the ioreq server detaches.
>>> Sorry, I do not get it. Take scenario 1 for example:
>>>> Scenario 1: Bug in driver
>>>> 1. Guest driver allocates page A
>>>> 2. dm marks A as p2m_ioreq_server
>>> Here in step 2, the ioreq.entry_count increases;
>>>> 3. Guest driver accidentally frees A to the kernel
>>>> 4. guest kernel balloons out page A; ioreq.entry_count is wrong
>>>
>>> Here in step 4, the ioreq.entry_count decreases.
>>
>> Oh, I figured it out. This entry is not invalidated yet if the ioreq
>> server is not unmapped. Sorry.
>>
>>> Isn't this what we are expecting?
>>>
>>> Yu
>>>>
>>>> Scenario 2: Bug in the kernel
>>>> 1. Guest driver allocates page A
>>>> 2. dm marks A as p2m_ioreq_server
>>>> 3. Guest kernel tries to balloon out page B, but makes a calculation
>>>> mistake and balloons out A instead; now ioreq.entry_count is wrong
>>>>
>>>> Scenario 3: Off-by-one bug in devicemodel
>>>> 1. Guest driver allocates pages A-D
>>>> 2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one
>>>> extra page)
>>>> 3. Guest kernel balloons out page E; now ioreq.entry_count is wrong
>>>>
>>>> Scenario 4: "Leak" in devicemodel
>>>> 1. Guest driver allocates page A
>>>> 2. dm marks A as p2m_ioreq_server
>>>> 3. Guest driver is done with page A, but DM forgets to reset it to
>>>> p2m_ram_rw
>>>> 4. Guest driver frees A to guest kernel
>>>> 5. Guest kernel balloons out page A; now ioreq.entry_count is wrong
>>>>
>>>> I could keep going on; there are *lots* of bugs in the driver, the
>>>> kernel, or the devicemodel which could cause pages marked
>>>> p2m_ioreq_server to end up being ballooned out, which under the current
>>>> code would make ioreq.entry_count wrong.
>>>>
>>>> It's the hypervisor's job to do the right thing even when other
>>>> components have bugs in them. This is why I initially suggested keeping
>>>> count in atomic_write_ept_entry() -- no matter how the entry is changed,
>>>> we always know exactly how many entries of type p2m_ioreq_server we
>>>> have.
>>
>> Well, counting in atomic_write_ept_entry() only works for EPT. Besides,
>> it requires interface changes - we would need to pass the p2m.
>> Another thought is - now in XenGT, PoD is disabled to make sure gfn->mfn
>> does not change. So how about we disable ballooning if ioreq.entry_count
>> is not 0?
> Or maybe just change the p2m_ioreq_server to p2m_ram_rw before it is
> set to p2m_invalid?
> Like below code:
>
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 7dbddda..40e5f63 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned
> long gmfn)
> put_gfn(d, gmfn);
> return 1;
> }
> +if ( unlikely(p2mt == p2m_ioreq_server) )
> +p2m_change_type_one(d, gmfn,
> +p2m_ioreq_server, p2m_ram_rw);
> +
> #else
> mfn = gfn_to_mfn(d, _gfn(gmfn));
> #endif

Sorry for the format. Please ignore the code above and see the corrected diff below.

Yu

> Yu
>
>> Yu
>>>> -George

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40e5f63 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+        p2m_change_type_one(d, gmfn,
+                            p2m_ioreq_server, p2m_ram_rw);
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
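
For readers following the counting argument above, the following is a minimal, self-contained C sketch of the idea George refers to: adjust the counter at the single point where a p2m entry is written, so it stays correct no matter which path (type change, ballooning, recalculation) modified the entry. It is only a model - the p2m_type_t values, the write_entry() helper, and the global counter below are simplified stand-ins, not Xen's actual atomic_write_ept_entry() interface or p2m structures.

/*
 * Illustrative model only -- NOT the real Xen atomic_write_ept_entry().
 * It shows why hooking the single write point keeps the count correct
 * regardless of which caller changed the entry.
 */
#include <assert.h>
#include <stdio.h>

typedef enum {
    p2m_invalid,
    p2m_ram_rw,
    p2m_ioreq_server,
} p2m_type_t;

struct p2m_entry {
    p2m_type_t type;
};

static unsigned long ioreq_entry_count;

/* Single choke point for entry updates, mirroring the suggestion to
 * keep the count in atomic_write_ept_entry(). */
static void write_entry(struct p2m_entry *e, p2m_type_t new_type)
{
    if ( e->type == p2m_ioreq_server && new_type != p2m_ioreq_server )
        ioreq_entry_count--;
    else if ( e->type != p2m_ioreq_server && new_type == p2m_ioreq_server )
        ioreq_entry_count++;

    e->type = new_type;
}

int main(void)
{
    struct p2m_entry a = { p2m_ram_rw };

    write_entry(&a, p2m_ioreq_server);   /* dm maps the page      */
    write_entry(&a, p2m_invalid);        /* guest balloons it out */

    /* The count returns to zero even though the entry was never
     * reset via p2m_change_type_one(). */
    assert(ioreq_entry_count == 0);
    printf("entry_count = %lu\n", ioreq_entry_count);
    return 0;
}

The assertion passes because the decrement happens on the balloon-out write itself, which is what makes the counter robust against the buggy-driver and buggy-kernel scenarios listed above.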