From patchwork Mon Mar 28 18:00:29 2016
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 8680151
Date: Mon, 28 Mar 2016 21:00:29 +0300
From: "Kirill A. Shutemov"
To: Hugh Dickins
Cc: "Kirill A. Shutemov", Andrea Arcangeli, Andrew Morton, Dave Hansen,
 Vlastimil Babka, Christoph Lameter, Naoya Horiguchi, Jerome Marchand,
 Yang Shi, Sasha Levin, Ning Qu, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCHv4 00/25] THP-enabled tmpfs/shmem
Message-ID: <20160328180029.GB25200@node.shutemov.name>
References: <1457737157-38573-1-git-send-email-kirill.shutemov@linux.intel.com>
 <20160324091727.GA26796@node.shutemov.name>
 <20160325150417.GA1851@node.shutemov.name>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, Mar 25, 2016 at 05:00:50PM -0700, Hugh Dickins wrote:
> On Fri, 25 Mar 2016, Kirill A. Shutemov wrote:
> > On Thu, Mar 24, 2016 at 12:08:55PM -0700, Hugh Dickins wrote:
> > > On Thu, 24 Mar 2016, Kirill A. Shutemov wrote:
> > > > On Wed, Mar 23, 2016 at 01:09:05PM -0700, Hugh Dickins wrote:
> > > > > The small files thing formed my first impression. My second
> > > > > impression was similar, when I tried mmap(NULL, size_of_RAM,
> > > > > PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_SHARED, -1, 0) and
> > > > > cycled around the arena touching all the pages (which of
> > > > > course has to push a little into swap): that soon OOMed.
> > > > >
> > > > > But there I think you probably just have some minor bug to be fixed:
> > > > > I spent a little while trying to debug it, but then decided I'd
> > > > > better get back to writing to you.
> > > > > I didn't really understand what
> > > > > I was seeing, but when I hacked some stats into shrink_page_list(),
> > > > > converting !is_page_cache_freeable(page) to page_cache_references(page)
> > > > > to return the difference instead of the bool, a large proportion of
> > > > > huge tmpfs pages seemed to have count 1 too high to be freeable at
> > > > > that point (and one huge tmpfs page had a count of 3477).
> > > >
> > > > I'll reply to your other points later, but first I wanted to address this
> > > > obvious bug.
> > >
> > > Thanks. That works better, but is not yet right: memory isn't freed
> > > as it should be, so when I exit then try to run a second time, the
> > > mmap() just gets ENOMEM (with /proc/sys/vm/overcommit_memory 0):
> > > MemFree is low. No rush to fix, I've other stuff to do.
> > >
> > > I don't get as far as that on the laptop, since the first run is OOM
> > > killed while swapping; but I can't vouch for the OOM-kill-correctness
> > > of the base tree I'm using, and this laptop has a history of OOMing
> > > rather too easily if all's not right.
> >
> > Hm. I don't see the issue.
> >
> > I tried to reproduce it in my VM with following script:
> >
> > #!/bin/sh -efu
> >
> > swapon -a
> >
> > ram="$(grep MemTotal /proc/meminfo | sed 's,[^0-9\]\+,,; s, kB,k,')"
> >
> > usemem -w -f /dev/zero "$ram"
> >
> > swapoff -a
> > swapon -a
> >
> > usemem -w -f /dev/zero "$ram"
> >
> > cat /proc/meminfo
> > grep thp /proc/vmstat
> >
> > -----
> >
> > usemem is a tool from this archive:
> >
> > http://www.spinics.net/lists/linux-mm/attachments/gtarazbJaHPaAT.gtar
> >
> > It works fine even if would double size of mapping.
> >
> > Do you have a reproducer?
>
> Yes, my reproducer is simpler (just cycling twice around the arena,
> touching each page in order); and I too did not see it running your
> script using usemem above.
> It looks as if that invocation isn't doing
> enough work with swap: if I add a "-r 2" to those usemem lines, then
> I get "usemem: mmap failed: Cannot allocate memory" on the second.
>
> I also added a "sleep 2" before the second call to usemem: I'm not sure
> of the current state of vmstat, but historically it's slow to gather
> back from each cpu to global, and I think it used to leave some cpu
> counts stranded indefinitely once upon a time. In my own testing,
> I have a /proc/sys/vm/stat_refresh to touch before checking meminfo
> or vmstat - and I think the vm_enough_memory() check in mmap() may
> need that same care, since it refers to NR_FREE_PAGES etc.
>
> 8GB is my ramsize, if that matters.

I think I found it. I have refcounting screwed up in faultaround.

This should fix the problem:

diff --git a/mm/filemap.c b/mm/filemap.c
index 94c097ec08e7..1325bb4568d1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2292,19 +2292,18 @@ repeat:
 		if (fe->pte)
 			fe->pte += iter.index - last_pgoff;
 		last_pgoff = iter.index;
-		alloc_set_pte(fe, NULL, page);
+		if (alloc_set_pte(fe, NULL, page))
+			goto unlock;
 		unlock_page(page);
-		/* Huge page is mapped? No need to proceed. */
-		if (pmd_trans_huge(*fe->pmd))
-			break;
-		/* Failed to setup page table? */
-		VM_BUG_ON(!fe->pte);
 		goto next;
 unlock:
 		unlock_page(page);
 skip:
 		page_cache_release(page);
 next:
+		/* Huge page is mapped? No need to proceed. */
+		if (pmd_trans_huge(*fe->pmd))
+			break;
 		if (iter.index == end_pgoff)
 			break;
 	}
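
For anyone who wants to retest with the simpler reproducer described in the quoted text (a shared anonymous mapping, cycled around twice touching each page in order), a minimal userspace sketch might look like the following. It is not the program actually used in this thread: touch_arena() and its parameters are hypothetical names, and the caller has to choose the size (the thread used roughly the size of RAM, which will push into swap).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the thread: map `size` bytes of
 * shared anonymous memory (which the kernel backs with shmem) and write
 * one byte to every page, `passes` times over, in address order.
 *
 * Returns the number of page touches performed, or -1 if mmap() fails.
 */
static long touch_arena(size_t size, int passes)
{
	long page = sysconf(_SC_PAGESIZE);
	long touched = 0;
	char *arena;

	assert(page > 0);

	/* Same mmap() arguments as quoted in the thread. */
	arena = mmap(NULL, size, PROT_READ | PROT_WRITE,
		     MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	if (arena == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	for (int pass = 0; pass < passes; pass++) {
		for (size_t off = 0; off < size; off += (size_t)page) {
			arena[off] = (char)pass;	/* dirty the page */
			touched++;
		}
	}

	munmap(arena, size);
	return touched;
}
```

Calling touch_arena(size, 2) with size close to MemTotal, then running it a second time after exit, would approximate the two runs discussed above; with a small size it only exercises the mapping and will not reach swap.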