From patchwork Wed Mar 23 15:24:09 2022
X-Patchwork-Submitter: Charan Teja Kalla
X-Patchwork-Id: 12789833
From: Charan Teja Kalla
Subject: [PATCH 1/2] Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
Date: Wed, 23 Mar 2022 20:54:09 +0530

This reverts commit 08095d6310a7 ("mm: madvise: skip unmapped vma holes
passed to process_madvise") because process_madvise() fails to return
the exact number of processed bytes in other cases too.
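The claim that process_madvise() under-reports processed bytes in cases other than ENOMEM can be illustrated with a minimal userspace sketch of the syscall loop as it stood with commit 08095d6310a7 applied. This is not kernel code. `struct model_iov` and `model_loop_skip_enomem()` are hypothetical names, and each element's do_madvise() result is supplied up front instead of being computed from real VMAs. The final expression mirrors `ret = (total_len - iov_iter_count(&iter)) ? : ret;` from the diff context.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of one iovec element (not a kernel type): its length
 * and the value do_madvise() is assumed to return for it. */
struct model_iov {
	size_t len;
	int do_madvise_ret;	/* 0, -ENOMEM, -EINVAL, ... */
};

/*
 * Sketch of the process_madvise() loop after commit 08095d6310a7:
 * -ENOMEM is treated as success and the walk continues; any other
 * error stops it. Bytes of the failing element that were advised
 * before the error are not counted.
 */
static ssize_t model_loop_skip_enomem(const struct model_iov *v, size_t n)
{
	size_t done = 0;	/* total_len - iov_iter_count(&iter) */
	int ret = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		ret = v[i].do_madvise_ret;
		if (ret < 0 && ret != -ENOMEM)
			break;
		done += v[i].len;	/* iov_iter_advance() */
	}
	/* ret = (total_len - iov_iter_count(&iter)) ? : ret; */
	return done ? (ssize_t)done : ret;
}
```

With elements of 4096, 8192 and 4096 bytes where the second one hits mlocked pages (-EINVAL), the sketch reports 4096 regardless of how much of the second element was advised; if that second element returns -ENOMEM instead, it reports all 16384 bytes.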
For example, if process_madvise() hits mlocked pages after processing
some initial bytes in [start, end), it just returns EINVAL even though
some bytes were processed. Making an exception only for ENOMEM therefore
only partially fixes the problem of returning the correct number of
advised bytes.

Revert this patch; the following patch returns the exact number of
bytes advised, if any, for all error types.

Signed-off-by: Charan Teja Kalla
Acked-by: Michal Hocko
---
 mm/madvise.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 39b712f..0d8fd17 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1433,16 +1433,9 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 
 	while (iov_iter_count(&iter)) {
 		iovec = iov_iter_iovec(&iter);
-		/*
-		 * do_madvise returns ENOMEM if unmapped holes are present
-		 * in the passed VMA. process_madvise() is expected to skip
-		 * unmapped holes passed to it in the 'struct iovec' list
-		 * and not fail because of them. Thus treat -ENOMEM return
-		 * from do_madvise as valid and continue processing.
-		 */
 		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
 					iovec.iov_len, behavior);
-		if (ret < 0 && ret != -ENOMEM)
+		if (ret < 0)
 			break;
 		iov_iter_advance(&iter, iovec.iov_len);
 	}

From patchwork Wed Mar 23 15:24:10 2022
X-Patchwork-Submitter: Charan Teja Kalla
X-Patchwork-Id: 12789834
From: Charan Teja Kalla
Subject: [PATCH 2/2] mm: madvise: return exact bytes advised with process_madvise under error
Date: Wed, 23 Mar 2022 20:54:10 +0530
Message-ID: <0fa1bdb5009e898189f339610b90ecca16f243f4.1648046642.git.quic_charante@quicinc.com>

From: Charan Teja Reddy

Commit 5bd009c7c9a9 ("mm: madvise: return correct bytes advised with
process_madvise") fixed process_madvise() to return the number of bytes
successfully advised before an error is hit while processing the iovec
elements. However, when the user passes unmapped ranges in an iovec,
the syscall ignores these holes, continues processing, and returns
ENOMEM at the end, which matches the madvise() semantics.

This is a problem for vector processing, where the user may want to
know exactly how many bytes of an iovec element were processed in order
to make better decisions in user space. In the ENOMEM case, all bytes of
the iovec element were processed, yet an error is still returned, which
leaves the user unsure whether the advice failed or succeeded.

As an example, consider the following ranges passed by the user in
struct iovec: iovec1 (ranges: vma1), iovec2 (ranges: vma2 -- vma3 --
hole) and iovec3 (ranges: vma4). The current implementation fully
advises iovec1 and iovec2 but returns only the length of iovec1 as the
number of processed bytes. The user may then repeat the processing of
iovec2, which was already processed, and get ENOMEM. The user may then
skip iovec2 and start processing from iovec3. Because the wrong number
of processed bytes was returned, iovec2 is processed twice.

Commit 08095d6310a7 ("mm: madvise: skip unmapped vma holes passed to
process_madvise") solved this problem: the syscall now reports iovec1
and iovec2 as processed, and the user can restart from iovec3. That
patch has some problems:
1) The user may want to be notified, via ENOMEM, that unmapped address
   ranges were passed[1].
2) It did not consider the case where bytes are partially advised
   before other error types, e.g. EINVAL, are hit. Fixing only ENOMEM
   therefore solves the problem only partially[2].
3) Even if no VMA is found in the passed iovec range, the range is still
   considered processed instead of returning ENOMEM.
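The iovec1/iovec2/iovec3 scenario above can be reproduced with a small userspace model. It is a sketch under stated assumptions, not kernel code: `struct model_vma`, `struct model_iovec`, `model_do_madvise()` and `model_process_madvise()` are hypothetical, VMAs are plain sorted `[start, end)` ranges, and "advising" a range does nothing. The per-element walk follows the madvise() semantics described above (skip holes, return ENOMEM at the end). Passing `ignore_enomem == false` models the loop after commit 5bd009c7c9a9; `true` models the ENOMEM exception from commit 08095d6310a7.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VMAs and iovec elements (not kernel types). */
struct model_vma {
	unsigned long start, end;	/* [start, end), sorted, non-overlapping */
};

struct model_iovec {
	unsigned long base, len;
};

/*
 * madvise()-style walk over [start, start + len): advise every mapped
 * piece, skip holes, and return -ENOMEM at the end if any hole was seen.
 */
static int model_do_madvise(const struct model_vma *vmas, size_t nvma,
			    unsigned long start, unsigned long len)
{
	unsigned long end = start + len, pos = start;
	bool hole = false;

	assert(end >= start);		/* no address wraparound */
	for (size_t i = 0; i < nvma && pos < end; i++) {
		if (vmas[i].end <= pos)
			continue;		/* VMA lies below pos */
		if (vmas[i].start >= end)
			break;			/* VMA lies above the range */
		if (vmas[i].start > pos)
			hole = true;		/* gap before this VMA */
		/* "advise" [max(pos, vma start), min(vma end, end)) */
		pos = vmas[i].end < end ? vmas[i].end : end;
	}
	if (pos < end)
		hole = true;			/* trailing gap */
	return hole ? -ENOMEM : 0;
}

/*
 * Syscall loop: stop on error and report
 * (total_len - iov_iter_count(&iter)) ? : ret.
 */
static long model_process_madvise(const struct model_vma *vmas, size_t nvma,
				  const struct model_iovec *iov, size_t n,
				  bool ignore_enomem)
{
	unsigned long done = 0;
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		ret = model_do_madvise(vmas, nvma, iov[i].base, iov[i].len);
		if (ret < 0 && !(ignore_enomem && ret == -ENOMEM))
			break;
		done += iov[i].len;
	}
	return done ? (long)done : ret;
}
```

With vma1 = [0x1000, 0x2000), vma2 = [0x3000, 0x4000), vma3 = [0x4000, 0x5000), a hole at [0x5000, 0x6000) and vma4 = [0x6000, 0x7000), and iovecs {0x1000, 0x1000}, {0x3000, 0x3000}, {0x6000, 0x1000}, the first variant returns 0x1000 (only iovec1) even though the mapped part of iovec2 was advised. The second variant returns 0x5000, but it also counts an iovec lying entirely inside the hole as processed, which is problem 3.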
These problems can be fixed by giving process_madvise() its own
semantics[3], different from madvise(): it uses its own iterator and
returns the exact number of bytes it advised. process_madvise() now
stops iterating when it encounters a hole or an invalid VMA and returns
the bytes processed so far in that iovec element. In the example above,
it first returns the processed bytes as the ranges of iovec1 (vma1) and
iovec2 (vma2, vma3), so the user knows exactly that a hole or invalid
VMA exists after vma3 in the passed iovec elements. The user can then
skip the hole/invalid VMA in the next retry and start processing from
iovec3.

[1] https://lore.kernel.org/linux-mm/YjmLmBUmROr+hshO@dhcp22.suse.cz/
[2] https://lore.kernel.org/linux-mm/YjFAzuLKWw5eadtf@google.com/
[3] https://lore.kernel.org/linux-mm/YjNgoeg1yOocsjWC@google.com/

Signed-off-by: Charan Teja Reddy
---
 mm/madvise.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 87 insertions(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 0d8fd17..9169b16 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1381,6 +1381,89 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 	return do_madvise(current->mm, start, len_in, behavior);
 }
 
+/*
+ * TODO: Add documentation for process_madvise()
+ */
+static int do_process_madvise(struct mm_struct *mm, unsigned long start, size_t len_in,
+			int behavior, size_t *partial_bytes_advised)
+{
+	unsigned long end, tmp;
+	struct vm_area_struct *vma, *prev;
+	int error = -EINVAL;
+	size_t len;
+	size_t tmp_bytes_advised = 0;
+	struct blk_plug plug;
+
+	*partial_bytes_advised = 0;
+	/*
+	 * TODO: Move these checks to a common function to be used by both
+	 * madvise() and process_madvise().
+	 */
+	start = untagged_addr(start);
+	if (!PAGE_ALIGNED(start))
+		return error;
+	len = PAGE_ALIGN(len_in);
+
+	/* Check to see whether len was rounded up from small -ve to zero */
+	if (len_in && !len)
+		return error;
+
+	end = start + len;
+	if (end < start)
+		return error;
+
+	error = 0;
+	if (end == start)
+		return error;
+
+	mmap_read_lock(mm);
+
+	vma = find_vma_prev(mm, start, &prev);
+	if (vma && start > vma->vm_start)
+		prev = vma;
+
+	blk_start_plug(&plug);
+	for (;;) {
+		/*
+		 * If it hits an unmapped address range in [start, end),
+		 * stop processing and return ENOMEM.
+		 */
+		if (!vma || start < vma->vm_start) {
+			error = -ENOMEM;
+			goto out;
+		}
+
+		tmp = vma->vm_end;
+		if (end < tmp)
+			tmp = end;
+
+		error = madvise_vma_behavior(vma, &prev, start, tmp, behavior);
+		if (error)
+			goto out;
+		tmp_bytes_advised += tmp - start;
+		start = tmp;
+		if (prev && start < prev->vm_end)
+			start = prev->vm_end;
+		if (start >= end)
+			goto out;
+		if (prev)
+			vma = prev->vm_next;
+		else
+			vma = find_vma(mm, start);
+	}
+out:
+	/*
+	 * partial_bytes_advised may contain non-zero bytes indicating
+	 * the number of bytes advised before failure. Holds zero in case
+	 * of success.
+	 */
+	*partial_bytes_advised = error ? tmp_bytes_advised : 0;
+	blk_finish_plug(&plug);
+	mmap_read_unlock(mm);
+
+	return error;
+}
+
 SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		size_t, vlen, int, behavior, unsigned int, flags)
 {
@@ -1391,6 +1474,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 	struct task_struct *task;
 	struct mm_struct *mm;
 	size_t total_len;
+	size_t partial_bytes_advised;
 	unsigned int f_flags;
 
 	if (flags != 0) {
@@ -1433,14 +1517,14 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 
 	while (iov_iter_count(&iter)) {
 		iovec = iov_iter_iovec(&iter);
-		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
-					iovec.iov_len, behavior);
+		ret = do_process_madvise(mm, (unsigned long)iovec.iov_base,
+					iovec.iov_len, behavior, &partial_bytes_advised);
 		if (ret < 0)
 			break;
 		iov_iter_advance(&iter, iovec.iov_len);
 	}
 
-	ret = (total_len - iov_iter_count(&iter)) ? : ret;
+	ret = (total_len - iov_iter_count(&iter) + partial_bytes_advised) ? : ret;
 
 release_mm:
 	mmput(mm);
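The semantics described in the commit message can be modeled in userspace: do_process_madvise() stops at the first hole and reports the bytes advised before it through partial_bytes_advised, and the syscall adds those bytes to the fully processed elements. This is a sketch, not the kernel implementation. All `model_*` names are hypothetical, VMAs are sorted `[start, end)` ranges, and only holes are modeled as errors (not EINVAL from madvise_vma_behavior()). `model_locate()` is a hypothetical caller-side helper that maps the returned byte count back to the iovec element and offset where advising stopped. `partial` starts at zero so that a call with no iovec elements never reads an unset value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for VMAs and iovec elements (not kernel types). */
struct model_vma {
	unsigned long start, end;	/* [start, end), sorted, non-overlapping */
};

struct model_iovec {
	unsigned long base, len;
};

/*
 * Sketch of do_process_madvise(): walk [start, start + len), stop at the
 * first unmapped byte, and report how many bytes were advised before it.
 */
static int model_do_process_madvise(const struct model_vma *vmas, size_t nvma,
				    unsigned long start, unsigned long len,
				    unsigned long *partial)
{
	unsigned long end = start + len, pos = start;

	assert(end >= start);		/* no address wraparound */
	*partial = 0;
	for (size_t i = 0; i < nvma && pos < end; i++) {
		if (vmas[i].end <= pos)
			continue;		/* VMA lies below pos */
		if (vmas[i].start > pos)
			break;			/* hole at pos */
		/* "advise" [pos, min(vma end, end)) */
		pos = vmas[i].end < end ? vmas[i].end : end;
	}
	if (pos < end) {
		*partial = pos - start;
		return -ENOMEM;
	}
	return 0;
}

/* Sketch of the patched syscall loop and its return value. */
static long model_process_madvise(const struct model_vma *vmas, size_t nvma,
				  const struct model_iovec *iov, size_t n)
{
	unsigned long done = 0, partial = 0;
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		ret = model_do_process_madvise(vmas, nvma, iov[i].base,
					       iov[i].len, &partial);
		if (ret < 0)
			break;
		done += iov[i].len;
	}
	/* (total_len - iov_iter_count(&iter) + partial_bytes_advised) ? : ret */
	return (done + partial) ? (long)(done + partial) : ret;
}

/* Hypothetical caller-side helper: map a returned byte count to the index
 * of the element where advising stopped and the offset inside it. */
static size_t model_locate(const struct model_iovec *iov, size_t n,
			   unsigned long advised, unsigned long *offset)
{
	size_t i = 0;

	while (i < n && advised >= iov[i].len) {
		advised -= iov[i].len;
		i++;
	}
	*offset = advised;
	return i;
}
```

For vma1 = [0x1000, 0x2000), vma2 = [0x3000, 0x4000), vma3 = [0x4000, 0x5000), vma4 = [0x6000, 0x7000) and iovecs {0x1000, 0x1000}, {0x3000, 0x3000}, {0x6000, 0x1000}, the sketch returns 0x3000 (iovec1 plus vma2 and vma3 from iovec2). model_locate() maps that to element 1 at offset 0x2000, i.e. address 0x5000 where the hole begins, so the caller can retry from iovec3, which returns 0x1000. An iovec lying entirely inside the hole now yields -ENOMEM.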