From patchwork Wed Jan 13 12:15:44 2021
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 12016737
Date: Wed, 13 Jan 2021 09:15:44 -0300
From: Marcelo Tosatti
To: Alex Belits
Cc: "tglx@linutronix.de", "cl@linux.com", "pauld@redhat.com",
 "linux-mm@kvack.org", "frederic@kernel.org", "willy@infradead.org",
 "peterz@infradead.org", "akpm@linux-foundation.org", Juri Lelli,
 Daniel Bristot de Oliveira
Subject: [RFC] tentative prctl task isolation interface
Message-ID: <20210113121544.GA16380@fuller.cnet>
References: <20201117162805.GA274911@fuller.cnet>
 <20201117180356.GT29991@casper.infradead.org>
 <20201117202317.GA282679@fuller.cnet>
 <20201127154845.GA9100@fuller.cnet>
 <87h7p4dwus.fsf@nanos.tec.linutronix.de>
 <12ddb629555590cfd41db5b10854d95c1f154e24.camel@marvell.com>
In-Reply-To: <12ddb629555590cfd41db5b10854d95c1f154e24.camel@marvell.com>

Hi,

So as discussed, this is one possible prctl interface for
task isolation.

Is this something that is desired? If not, what would be the
proper form for the interface?

(The addition of a new capability, CAP_TASK_ISOLATION, for
permission checks is still missing; it should be done in the
next versions.)

Thanks.

add prctl interface for task isolation

Add a new extensible interface for task isolation,
and allow userspace to quiesce the CPU.

This means putting the system into a quiet state by
completing all workqueue items, idling all subsystems
that need it, and putting the CPU into NOHZ mode.

Suggested-by: Christopher Lameter
Signed-off-by: Marcelo Tosatti

Index: linux-2.6-vmstat2/include/uapi/linux/prctl.h
===================================================================
--- linux-2.6-vmstat2.orig/include/uapi/linux/prctl.h
+++ linux-2.6-vmstat2/include/uapi/linux/prctl.h
@@ -247,4 +247,10 @@ struct prctl_mm_map {
 #define PR_SET_IO_FLUSHER		57
 #define PR_GET_IO_FLUSHER		58
 
+/* Task isolation control */
+#define PR_TASK_ISOLATION_FEATURES	59
+#define PR_TASK_ISOLATION_GET		60
+#define PR_TASK_ISOLATION_SET		61
+#define PR_TASK_ISOLATION_REQUEST	62
+
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6-vmstat2/kernel/sys.c
===================================================================
--- linux-2.6-vmstat2.orig/kernel/sys.c
+++ linux-2.6-vmstat2/kernel/sys.c
@@ -58,6 +58,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2530,6 +2531,25 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 
 		error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
 		break;
+	case PR_TASK_ISOLATION_FEATURES: {
+		struct isolation_features ifeat;
+
+		memset(&ifeat, 0, sizeof(ifeat));
+
+		prctl_task_isolation_features(&ifeat);
+		if (copy_to_user((char __user *)arg2, &ifeat, sizeof(ifeat)))
+			return -EFAULT;
+		break;
+	}
+	case PR_TASK_ISOLATION_SET:
+		error = prctl_task_isolation_set(arg2, arg3, arg4, arg5);
+		break;
+	case PR_TASK_ISOLATION_GET:
+		error = prctl_task_isolation_get(arg2, arg3, arg4, arg5);
+		break;
+	case PR_TASK_ISOLATION_REQUEST:
+		error = prctl_task_isolation_request(arg2, arg3, arg4, arg5);
+		break;
 	default:
 		error = -EINVAL;
 		break;
Index: linux-2.6-vmstat2/Documentation/userspace-api/task_isolation.rst
===================================================================
--- /dev/null
+++ linux-2.6-vmstat2/Documentation/userspace-api/task_isolation.rst
@@ -0,0 +1,99 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Task isolation CPU interface
+============================
+
+The kernel might perform a number of activities in the background,
+on a given CPU, in the form of workqueues or interrupts.
+
+This interface allows userspace to indicate to the kernel when
+it is running latency-critical code (and what the behaviour should
+be for activities that would interrupt the CPU).
+
+This allows the system to take preventive measures to avoid
+deferred actions and create an OS-noise-free environment for
+the application.
+
+The task isolation mode is a bitmap specifying which individual
+features the application desires to be enabled.
+
+Each individual feature can be configured via::
+
+        prctl(PR_TASK_ISOLATION_SET, ISOL_F_featurename, params...)
+
+Enablement of the set of features is requested via::
+
+        prctl(PR_TASK_ISOLATION_REQUEST, featuremask, 0, 0, 0)
+
+Feature ISOL_F_featurename (both GET/SET) is supported if the flags
+field of struct isolation_features has the ISOL_F_featurename bit
+set (see the "Example" section below).
+
+In summary, the usual flow is::
+
+        # Determine the supported features
+        prctl(PR_TASK_ISOLATION_FEATURES, &ifeat, 0, 0, 0);
+
+        # Configure the desired features, based on ifeat
+        if ((ifeat.flags & ISOL_F_feature1) == ISOL_F_feature1) {
+                prctl(PR_TASK_ISOLATION_SET, ISOL_F_feature1, params...)
+                featuremask |= ISOL_F_feature1
+        }
+
+        if ((ifeat.flags & ISOL_F_feature2) == ISOL_F_feature2) {
+                prctl(PR_TASK_ISOLATION_SET, ISOL_F_feature2, params...)
+                featuremask |= ISOL_F_feature2
+        }
+
+        ...
+
+        # Enable isolation (feature set in bitmask), with each
+        # feature configured as above
+        prctl(PR_TASK_ISOLATION_REQUEST, featuremask, 0, 0, 0)
+
+Usage
+=====
+``PR_TASK_ISOLATION_FEATURES``:
+        Returns the supported features. Features are defined
+        in include/uapi/linux/isolation.h.
+
+        Usage::
+
+                prctl(PR_TASK_ISOLATION_FEATURES, ifeat, 0, 0, 0);
+
+        The 'ifeat' argument is a pointer to a struct isolation_features::
+
+                struct isolation_features {
+                        __u32   flags;
+                        __u32   pad[3];
+                };
+
+        where flags has a bit set for each feature the kernel supports.
+
+``PR_TASK_ISOLATION_SET``:
+        Configures task isolation features. Each individual feature is
+        configured separately via::
+
+                prctl(PR_TASK_ISOLATION_SET, ISOL_F_featurename, params...)
+
+``PR_TASK_ISOLATION_GET``:
+        Retrieves the currently configured task isolation mode parameters
+        for feature ISOL_F_featurename (arg2)::
+
+                prctl(PR_TASK_ISOLATION_GET, ISOL_F_featurename, params...)
+
+``PR_TASK_ISOLATION_REQUEST``:
+        Enters task isolation with the features in featuremask enabled.
+        This quiesces any pending activity on the CPU and enables
+        mode-specific configurations.
+
+Feature list
+============
+
+Example
+=======
+
+The ``samples/task_isolation/`` directory contains a sample
+application.
+
Index: linux-2.6-vmstat2/include/uapi/linux/isolation.h
===================================================================
--- /dev/null
+++ linux-2.6-vmstat2/include/uapi/linux/isolation.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _UAPI_LINUX_ISOL_H
+#define _UAPI_LINUX_ISOL_H
+
+/* For PR_TASK_ISOLATION_FEATURES */
+struct isolation_features {
+	__u32	flags;
+	__u32	pad[3];
+};
+
+/* Isolation features */
+#define ISOL_F_QUIESCE		0x1
+
+#endif /* _UAPI_LINUX_ISOL_H */
+
Index: linux-2.6-vmstat2/kernel/isolation.c
===================================================================
--- /dev/null
+++ linux-2.6-vmstat2/kernel/isolation.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Implementation of task isolation.
+ *
+ * Authors:
+ *   Chris Metcalf
+ *   Alex Belits
+ *   Yuri Norov
+ */
+
+#include
+#include
+#include
+
+void prctl_task_isolation_features(struct isolation_features *ifeat)
+{
+	ifeat->flags = ISOL_F_QUIESCE;
+}
+
+int prctl_task_isolation_get(unsigned long arg2, unsigned long arg3,
+			     unsigned long arg4, unsigned long arg5)
+{
+	return 0;
+}
+
+int prctl_task_isolation_set(unsigned long arg2, unsigned long arg3,
+			     unsigned long arg4, unsigned long arg5)
+{
+	return 0;
+}
+
+int prctl_task_isolation_request(unsigned long arg2, unsigned long arg3,
+				 unsigned long arg4, unsigned long arg5)
+{
+	int ret;
+	int cpu = raw_smp_processor_id();
+
+	ret = user_quiet_vmstat(cpu);
+
+	return ret;
+}
Index: linux-2.6-vmstat2/samples/task_isolation/task_isolation.c
===================================================================
--- /dev/null
+++ linux-2.6-vmstat2/samples/task_isolation/task_isolation.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+int main(void)
+{
+	int ret;
+	void *buf = malloc(4096);
+
+	struct isolation_features ifeat;
+	unsigned long fmask = 0;
+
+	memset(&ifeat, 0, sizeof(struct isolation_features));
+
+	memset(buf, 1, 4096);
+	ret = mlock(buf, 4096);
+	if (ret) {
+		perror("mlock");
+		exit(1);
+	}
+
+	ret = prctl(PR_TASK_ISOLATION_FEATURES, &ifeat, 0, 0, 0);
+	if (ret == -1) {
+		perror("prctl");
+		exit(1);
+	}
+
+#ifdef ISOL_F_QUIESCE
+	/* enable ISOL_F_QUIESCE */
+	if (!(ifeat.flags & ISOL_F_QUIESCE)) {
+		printf("ISOL_F_QUIESCE not set!\n");
+		exit(1);
+	}
+	fmask = fmask | ISOL_F_QUIESCE;
+#endif
+
+	/* busy loop */
+	while (1)
+		memset(buf, 0, 10);
+
+}
+
Index: linux-2.6-vmstat2/include/linux/isolation.h
===================================================================
--- /dev/null
+++ linux-2.6-vmstat2/include/linux/isolation.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __LINUX_ISOL_H
+#define __LINUX_ISOL_H
+
+#include
+
+void prctl_task_isolation_features(struct isolation_features *ifeat);
+
+int prctl_task_isolation_get(unsigned long arg2, unsigned long arg3,
+			     unsigned long arg4, unsigned long arg5);
+
+int prctl_task_isolation_set(unsigned long arg2, unsigned long arg3,
+			     unsigned long arg4, unsigned long arg5);
+
+int prctl_task_isolation_request(unsigned long arg2, unsigned long arg3,
+				 unsigned long arg4, unsigned long arg5);
+#endif /* __LINUX_ISOL_H */
Index: linux-2.6-vmstat2/kernel/Makefile
===================================================================
--- linux-2.6-vmstat2.orig/kernel/Makefile
+++ linux-2.6-vmstat2/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y     = fork.o exec_domain.o panic.o
 	    extable.o params.o \
 	    kthread.o sys_ni.o nsproxy.o \
 	    notifier.o ksysfs.o cred.o reboot.o \
-	    async.o range.o smpboot.o ucount.o regset.o
+	    async.o range.o smpboot.o ucount.o regset.o isolation.o
 
 obj-$(CONFIG_USERMODE_DRIVER) += usermode_driver.o
 obj-$(CONFIG_MODULES) += kmod.o
Index: linux-2.6-vmstat2/include/linux/vmstat.h
===================================================================
--- linux-2.6-vmstat2.orig/include/linux/vmstat.h
+++ linux-2.6-vmstat2/include/linux/vmstat.h
@@ -290,6 +290,7 @@ void refresh_zone_stat_thresholds(void);
 struct ctl_table;
 int vmstat_refresh(struct ctl_table *, int write, void *buffer, size_t *lenp,
 		loff_t *ppos);
+int user_quiet_vmstat(int cpu);
 
 void drain_zonestat(struct zone *zone, struct per_cpu_pageset *);
 
Index: linux-2.6-vmstat2/mm/vmstat.c
===================================================================
--- linux-2.6-vmstat2.orig/mm/vmstat.c
+++ linux-2.6-vmstat2/mm/vmstat.c
@@ -1936,6 +1936,16 @@ void quiet_vmstat(void)
 	refresh_cpu_vm_stats(false);
 }
 
+int user_quiet_vmstat(int cpu)
+{
+	if (need_update(cpu) == true)
+		refresh_cpu_vm_stats(false);
+
+	flush_delayed_work(per_cpu_ptr(&vmstat_work, cpu));
+
+	return 0;
+}
+
 /*
  * Shepherd worker thread that checks the
  * differentials of processors that have their worker
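
The feature negotiation described in task_isolation.rst above (query the supported features, then keep only the ones the application wants) can be sketched in plain userspace C without calling into the kernel. This is only a rough illustration under stated assumptions: the struct layout and the ISOL_F_QUIESCE value are copied from this patch, isol_select_features() is a hypothetical helper that is not part of the proposed interface, and the prctl() calls are left out because the PR_TASK_ISOLATION_* option numbers are tentative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout copied from the proposed include/uapi/linux/isolation.h. */
struct isolation_features {
	uint32_t flags;
	uint32_t pad[3];
};

/* Feature bit as defined in this patch. */
#define ISOL_F_QUIESCE 0x1

/*
 * Hypothetical helper (not part of the patch): given the feature word
 * that PR_TASK_ISOLATION_FEATURES would fill in, return the subset of
 * the wanted features that the kernel reports as supported. The result
 * is what would be passed as featuremask to PR_TASK_ISOLATION_REQUEST.
 */
static uint32_t isol_select_features(const struct isolation_features *ifeat,
				     uint32_t wanted)
{
	assert(ifeat != NULL);
	return ifeat->flags & wanted;
}
```

A caller would presumably fill ifeat through prctl(PR_TASK_ISOLATION_FEATURES, &ifeat, 0, 0, 0), configure each selected feature with PR_TASK_ISOLATION_SET, and then hand the returned mask to PR_TASK_ISOLATION_REQUEST.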