From patchwork Tue Jul 18 08:28:57 2023
X-Patchwork-Submitter: Vivek Kasireddy
X-Patchwork-Id: 13316890
From: Vivek Kasireddy <vivek.kasireddy@intel.com>
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Mike Kravetz, Hugh Dickins,
    Peter Xu, Jason Gunthorpe, Gerd Hoffmann, Dongwon Kim, Junxiao Chang
Subject: [RFC v1 2/3] udmabuf: Replace pages when there is FALLOC_FL_PUNCH_HOLE in memfd
Date: Tue, 18 Jul 2023 01:28:57 -0700
Message-Id: <20230718082858.1570809-3-vivek.kasireddy@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230718082858.1570809-1-vivek.kasireddy@intel.com>
References: <20230718082858.1570809-1-vivek.kasireddy@intel.com>

When a hole is punched in the memfd, or when a page is replaced for any
reason, the udmabuf driver needs to be notified so that it can update its
list of pages with the new page. To accomplish this, we first identify the
vma ranges where the pages associated with a given udmabuf are mapped, and
then register a handler for the update_mapping mmu notifier to receive
mapping updates.

Once we are notified about a new page faulted in at a given offset in the
mapping (backed by shmem or hugetlbfs), the list of pages is updated, and
we also zap the relevant PTEs of the vmas that have mmap'd the udmabuf fd.
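To illustrate the scenario this patch targets, the following is a minimal
userspace sketch (not part of this patch; the buffer size, names, and lack
of error handling are illustrative assumptions): a udmabuf is created over
a sealed memfd, and a later hole punch causes the backing pages to be
replaced, which is the event the new notifier path reacts to.

	/* Minimal sketch (illustrative only, not part of this patch):
	 * create a udmabuf over a memfd, then punch a hole so the backing
	 * pages get replaced on the next fault.
	 */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <unistd.h>
	#include <linux/udmabuf.h>

	#define NUM_PAGES 8

	int main(void)
	{
		long psize = sysconf(_SC_PAGESIZE);
		struct udmabuf_create create;
		int memfd, devfd, buffd;

		memfd = memfd_create("guest-ram", MFD_ALLOW_SEALING);
		ftruncate(memfd, NUM_PAGES * psize);
		/* udmabuf wants F_SEAL_SHRINK and rejects F_SEAL_WRITE */
		fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

		devfd = open("/dev/udmabuf", O_RDWR);
		memset(&create, 0, sizeof(create));
		create.memfd  = memfd;
		create.flags  = UDMABUF_FLAGS_CLOEXEC;
		create.offset = 0;
		create.size   = NUM_PAGES * psize;
		buffd = ioctl(devfd, UDMABUF_CREATE, &create);

		/* Replace the first two pages: without this patch, the
		 * udmabuf would keep pointing at the old, stale pages.
		 */
		fallocate(memfd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			  0, 2 * psize);

		close(buffd);
		close(devfd);
		close(memfd);
		return 0;
	}

With this patch applied, the hole punch above leads to an update_mapping
notification once new pages are faulted in, so the udmabuf's pages array
tracks the new backing pages instead of the freed ones.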
Cc: David Hildenbrand
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Signed-off-by: Vivek Kasireddy
---
 drivers/dma-buf/udmabuf.c | 172 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 172 insertions(+)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 10c47bf77fb5..189a36c41906 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -4,6 +4,8 @@
 #include <linux/dma-buf.h>
 #include <linux/highmem.h>
 #include <linux/init.h>
+#include <linux/mmu_notifier.h>
+#include <linux/rmap.h>
 #include <linux/kernel.h>
 #include <linux/memfd.h>
 #include <linux/miscdevice.h>
@@ -30,6 +32,23 @@ struct udmabuf {
 	struct sg_table *sg;
 	struct miscdevice *device;
 	pgoff_t *offsets;
+	struct udmabuf_vma_range *ranges;
+	unsigned int num_ranges;
+	struct mmu_notifier notifier;
+	struct mutex mn_lock;
+	struct list_head mmap_vmas;
+};
+
+struct udmabuf_vma_range {
+	struct file *memfd;
+	pgoff_t ubufindex;
+	unsigned long start;
+	unsigned long end;
+};
+
+struct udmabuf_mmap_vma {
+	struct list_head vma_link;
+	struct vm_area_struct *vma;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -42,28 +61,54 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 	if (pgoff >= ubuf->pagecount)
 		return VM_FAULT_SIGBUS;
 
+	mutex_lock(&ubuf->mn_lock);
 	pfn = page_to_pfn(ubuf->pages[pgoff]);
 	if (ubuf->offsets) {
 		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
 	}
+	mutex_unlock(&ubuf->mn_lock);
 
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
+static void udmabuf_vm_close(struct vm_area_struct *vma)
+{
+	struct udmabuf *ubuf = vma->vm_private_data;
+	struct udmabuf_mmap_vma *mmap_vma;
+
+	list_for_each_entry(mmap_vma, &ubuf->mmap_vmas, vma_link) {
+		if (mmap_vma->vma == vma) {
+			list_del(&mmap_vma->vma_link);
+			kfree(mmap_vma);
+			break;
+		}
+	}
+}
+
 static const struct vm_operations_struct udmabuf_vm_ops = {
 	.fault = udmabuf_vm_fault,
+	.close = udmabuf_vm_close,
 };
 
 static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
 {
 	struct udmabuf *ubuf = buf->priv;
+	struct udmabuf_mmap_vma *mmap_vma;
 
 	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
 		return -EINVAL;
 
+	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
+	if (!mmap_vma)
+		return -ENOMEM;
+
 	vma->vm_ops = &udmabuf_vm_ops;
 	vma->vm_private_data = ubuf;
 	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+
+	mmap_vma->vma = vma;
+	list_add(&mmap_vma->vma_link, &ubuf->mmap_vmas);
+
 	return 0;
 }
 
@@ -109,6 +154,7 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 	if (ret < 0)
 		goto err_alloc;
 
+	mutex_lock(&ubuf->mn_lock);
 	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) {
 		offset = ubuf->offsets ? ubuf->offsets[i] : 0;
 		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, offset);
@@ -116,9 +162,12 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
 		goto err_map;
+
+	mutex_unlock(&ubuf->mn_lock);
 	return sg;
 
 err_map:
+	mutex_unlock(&ubuf->mn_lock);
 	sg_free_table(sg);
 err_alloc:
 	kfree(sg);
@@ -157,6 +206,9 @@ static void release_udmabuf(struct dma_buf *buf)
 
 	for (pg = 0; pg < ubuf->pagecount; pg++)
 		put_page(ubuf->pages[pg]);
+
+	mmu_notifier_unregister(&ubuf->notifier, ubuf->notifier.mm);
+	kfree(ubuf->ranges);
 	kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
@@ -208,6 +260,93 @@ static const struct dma_buf_ops udmabuf_ops = {
 	.end_cpu_access = end_cpu_udmabuf,
 };
 
+static void invalidate_mmap_vmas(struct udmabuf *ubuf,
+				 struct udmabuf_vma_range *range,
+				 unsigned long address, unsigned long size)
+{
+	struct udmabuf_mmap_vma *vma;
+	unsigned long start = range->ubufindex << PAGE_SHIFT;
+
+	start += address - range->start;
+	list_for_each_entry(vma, &ubuf->mmap_vmas, vma_link) {
+		zap_vma_ptes(vma->vma, vma->vma->vm_start + start, size);
+	}
+}
+
+static struct udmabuf_vma_range *find_udmabuf_range(struct udmabuf *ubuf,
+						    unsigned long address)
+{
+	struct udmabuf_vma_range *range;
+	int i;
+
+	for (i = 0; i < ubuf->num_ranges; i++) {
+		range = &ubuf->ranges[i];
+		if (address >= range->start && address < range->end)
+			return range;
+	}
+
+	return NULL;
+}
+
+static void update_udmabuf(struct mmu_notifier *mn, struct mm_struct *mm,
+			   unsigned long address, unsigned long pfn)
+{
+	struct udmabuf *ubuf = container_of(mn, struct udmabuf, notifier);
+	struct udmabuf_vma_range *range = find_udmabuf_range(ubuf, address);
+	struct page *old_page, *new_page;
+	pgoff_t pgoff, pgshift = PAGE_SHIFT;
+	unsigned long size = 0;
+
+	if (!range || !pfn_valid(pfn))
+		return;
+
+	if (is_file_hugepages(range->memfd))
+		pgshift = huge_page_shift(hstate_file(range->memfd));
+
+	mutex_lock(&ubuf->mn_lock);
+	pgoff = range->ubufindex + ((address - range->start) >> pgshift);
+	old_page = ubuf->pages[pgoff];
+	new_page = pfn_to_page(pfn);
+
+	do {
+		ubuf->pages[pgoff] = new_page;
+		get_page(new_page);
+		put_page(old_page);
+		size += PAGE_SIZE;
+	} while (ubuf->pages[++pgoff] == old_page);
+
+	mutex_unlock(&ubuf->mn_lock);
+	invalidate_mmap_vmas(ubuf, range, address, size);
+}
+
+static const struct mmu_notifier_ops udmabuf_update_ops = {
+	.update_mapping = update_udmabuf,
+};
+
+static struct vm_area_struct *find_guest_ram_vma(struct udmabuf *ubuf,
+						 struct mm_struct *vmm_mm)
+{
+	struct vm_area_struct *vma = NULL;
+	MA_STATE(mas, &vmm_mm->mm_mt, 0, 0);
+	unsigned long addr;
+	pgoff_t pg;
+
+	mas_set(&mas, 0);
+	mmap_read_lock(vmm_mm);
+	mas_for_each(&mas, vma, ULONG_MAX) {
+		for (pg = 0; pg < ubuf->pagecount; pg++) {
+			addr = page_address_in_vma(ubuf->pages[pg], vma);
+			if (addr == -EFAULT)
+				break;
+		}
+		if (addr != -EFAULT)
+			break;
+	}
+	mmap_read_unlock(vmm_mm);
+
+	return vma;
+}
+
 #define SEALS_WANTED (F_SEAL_SHRINK)
 #define SEALS_DENIED (F_SEAL_WRITE)
 
@@ -218,6 +357,7 @@ static long udmabuf_create(struct miscdevice *device,
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 	struct file *memfd = NULL;
 	struct address_space *mapping = NULL;
+	struct vm_area_struct *guest_ram;
 	struct udmabuf *ubuf;
 	struct dma_buf *buf;
 	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
@@ -252,6 +392,13 @@ static long udmabuf_create(struct miscdevice *device,
 		goto err;
 	}
 
+	ubuf->ranges = kmalloc_array(head->count, sizeof(*ubuf->ranges),
+				     GFP_KERNEL);
+	if (!ubuf->ranges) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
 	pgbuf = 0;
 	for (i = 0; i < head->count; i++) {
 		ret = -EBADFD;
@@ -270,6 +417,8 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 		pgoff = list[i].offset >> PAGE_SHIFT;
 		pgcnt = list[i].size >> PAGE_SHIFT;
+		ubuf->ranges[i].ubufindex = pgbuf;
+		ubuf->ranges[i].memfd = memfd;
 		if (is_file_hugepages(memfd)) {
 			if (!ubuf->offsets) {
 				ubuf->offsets = kmalloc_array(ubuf->pagecount,
@@ -299,6 +448,7 @@ static long udmabuf_create(struct miscdevice *device,
 				get_page(hpage);
 				ubuf->pages[pgbuf] = hpage;
 				ubuf->offsets[pgbuf++] = chunkoff << PAGE_SHIFT;
+
 				if (++chunkoff == maxchunks) {
 					put_page(hpage);
 					hpage = NULL;
@@ -334,6 +484,25 @@ static long udmabuf_create(struct miscdevice *device,
 		goto err;
 	}
 
+	guest_ram = find_guest_ram_vma(ubuf, current->mm);
+	if (!guest_ram)
+		goto err;
+
+	ubuf->notifier.ops = &udmabuf_update_ops;
+	ret = mmu_notifier_register(&ubuf->notifier, current->mm);
+	if (ret)
+		goto err;
+
+	ubuf->num_ranges = head->count;
+	for (i = 0; i < ubuf->num_ranges; i++) {
+		page = ubuf->pages[ubuf->ranges[i].ubufindex];
+		ubuf->ranges[i].start = page_address_in_vma(page, guest_ram);
+		ubuf->ranges[i].end = ubuf->ranges[i].start + list[i].size;
+	}
+
+	INIT_LIST_HEAD(&ubuf->mmap_vmas);
+	mutex_init(&ubuf->mn_lock);
+
 	flags = 0;
 	if (head->flags & UDMABUF_FLAGS_CLOEXEC)
 		flags |= O_CLOEXEC;
@@ -344,6 +513,9 @@ static long udmabuf_create(struct miscdevice *device,
 		put_page(ubuf->pages[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	if (ubuf->notifier.mm)
+		mmu_notifier_unregister(&ubuf->notifier, ubuf->notifier.mm);
+	kfree(ubuf->ranges);
 	kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
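As a usage note, the vm_fault/vm_close paths touched above are exercised
whenever the exported buffer fd is mmap'd. The helper below is a
hypothetical sketch of that (the function name and error handling are
illustrative, not part of this patch):

	/* Hypothetical helper (not part of this patch): touching a
	 * MAP_SHARED mapping of the udmabuf fd goes through
	 * udmabuf_vm_fault() above, and munmap() invokes the new
	 * udmabuf_vm_close(), which unlinks the vma from ubuf->mmap_vmas.
	 */
	#include <stdio.h>
	#include <sys/mman.h>

	static int touch_udmabuf(int buffd, size_t size)
	{
		/* MAP_SHARED is mandatory: mmap_udmabuf() rejects
		 * mappings without VM_SHARED/VM_MAYSHARE.
		 */
		unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
					MAP_SHARED, buffd, 0);
		if (p == MAP_FAILED)
			return -1;

		printf("first byte: %u\n", p[0]); /* faults in the PFN mapping */
		munmap(p, size);                  /* triggers the .close hook  */
		return 0;
	}

After a hole punch replaces pages, update_udmabuf() zaps the PTEs of such
mappings, so the next access faults again and vmf_insert_pfn() installs
the new page's pfn rather than the stale one.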