
[03/15] coroutine-ucontext: reduce stack size to 64kB

Message ID 1467104499-27517-4-git-send-email-pl@kamp.de (mailing list archive)
State New, archived

Commit Message

Peter Lieven June 28, 2016, 9:01 a.m. UTC
Evaluation with the recently introduced maximum stack size monitoring revealed
that the actually used stack size was never above 4kB, so allocating a 1MB stack
for each coroutine wastes a lot of memory. Reduce the stack size to
64kB, which should still give enough headroom.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 util/coroutine-ucontext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
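
The stack size monitoring referred to above is introduced elsewhere in this
series. Purely as an illustration of the general technique (a sketch, not the
code from that series), peak stack usage can be estimated by pre-filling the
stack with a known byte pattern and later counting how much of the pattern at
the low end was never overwritten:

/* Illustrative sketch only; not the monitoring code from this series. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define STACK_FILL_BYTE 0xfe

/* Fill a freshly allocated stack with a marker pattern. */
static void stack_prefill(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* The stack grows downwards, so bytes that still hold the marker at the
 * low end of the allocation were never used; everything above them was. */
static size_t stack_peak_usage(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;

    while (untouched < size && stack[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return size - untouched;
}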

Comments

Paolo Bonzini June 28, 2016, 10:54 a.m. UTC | #1
On 28/06/2016 11:01, Peter Lieven wrote:
> evaluation with the recently introduced maximum stack size monitoring revealed
> that the actual used stack size was never above 4kB so allocating 1MB stack
> for each coroutine is a lot of wasted memory. So reduce the stack size to
> 64kB which should still give enough head room.

If we make the stack this much smaller, there is a non-zero chance of
smashing it.  You must add a guard page if you do this (actually more
than one because QEMU will happily have stack frames as big as 16 KB).
The stack counts for RSS but it's not actually allocated memory, so why
does it matter?

Paolo
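
A minimal sketch of the guard page idea, assuming the coroutine stack is an
anonymous mmap'd region; the helper name and the single-page guard are
illustrative only (Paolo's point is that more than one guard page may be
needed for large frames):

#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>

/* Illustrative sketch: allocate a stack with an inaccessible guard page
 * at the low end, so that overflowing the stack faults instead of
 * silently corrupting adjacent memory. */
static void *alloc_stack_with_guard(size_t stack_size)
{
    size_t page = sysconf(_SC_PAGESIZE);
    void *base = mmap(NULL, stack_size + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED) {
        abort();
    }
    /* Make the lowest page inaccessible; the stack grows down towards it. */
    if (mprotect(base, page, PROT_NONE) != 0) {
        abort();
    }
    return (char *)base + page;   /* usable stack begins above the guard */
}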
Dr. David Alan Gilbert June 28, 2016, 10:57 a.m. UTC | #2
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 28/06/2016 11:01, Peter Lieven wrote:
> > evaluation with the recently introduced maximum stack size monitoring revealed
> > that the actual used stack size was never above 4kB so allocating 1MB stack
> > for each coroutine is a lot of wasted memory. So reduce the stack size to
> > 64kB which should still give enough head room.
> 
> If we make the stack this much smaller, there is a non-zero chance of
> smashing it.  You must add a guard page if you do this (actually more
> than one because QEMU will happily have stack frames as big as 16 KB).
> The stack counts for RSS but it's not actually allocated memory, so why
> does it matter?

I think I'd be interested in seeing the /proc/.../smaps before and after this
change to see if anything is visible and if we can see the difference
in rss etc.

Dave

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Lieven June 28, 2016, 11:13 a.m. UTC | #3
Am 28.06.2016 um 12:54 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> evaluation with the recently introduced maximum stack size monitoring revealed
>> that the actual used stack size was never above 4kB so allocating 1MB stack
>> for each coroutine is a lot of wasted memory. So reduce the stack size to
>> 64kB which should still give enough head room.
> If we make the stack this much smaller, there is a non-zero chance of
> smashing it.  You must add a guard page if you do this (actually more
> than one because QEMU will happily have stack frames as big as 16 KB).
> The stack counts for RSS but it's not actually allocated memory, so why
> does it matter?

Is there an easy way to determine how much of the RSS is actually allocated?
I erroneously thought it was all allocated...

So as for the stack, is MAP_GROWSDOWN really important? Will the kernel
otherwise allocate all pages of the stack if the last page is written?

I am asking because I don't know if MAP_GROWSDOWN is a good idea; as Peter
mentioned, there were discussions about deprecating it.

Peter
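
One possible way to answer the first question on Linux (a sketch, not anything
from this series): mincore() reports which pages of a mapping are currently
resident, so the part of a stack that really occupies memory can be counted
directly.

#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

/* Illustrative sketch: count how many bytes of a page-aligned mapping
 * (such as an mmap'd coroutine stack) are actually resident in memory. */
static size_t resident_bytes(void *addr, size_t length)
{
    size_t page = sysconf(_SC_PAGESIZE);
    size_t pages = (length + page - 1) / page;
    unsigned char vec[pages];
    size_t i, resident = 0;

    if (mincore(addr, length, vec) != 0) {
        return 0;   /* e.g. addr not page-aligned or not mapped */
    }
    for (i = 0; i < pages; i++) {
        resident += vec[i] & 1;   /* low bit set means the page is resident */
    }
    return resident * page;
}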
Peter Lieven June 28, 2016, 11:17 a.m. UTC | #4
Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>
>> On 28/06/2016 11:01, Peter Lieven wrote:
>>> evaluation with the recently introduced maximum stack size monitoring revealed
>>> that the actual used stack size was never above 4kB so allocating 1MB stack
>>> for each coroutine is a lot of wasted memory. So reduce the stack size to
>>> 64kB which should still give enough head room.
>> If we make the stack this much smaller, there is a non-zero chance of
>> smashing it.  You must add a guard page if you do this (actually more
>> than one because QEMU will happily have stack frames as big as 16 KB).
>> The stack counts for RSS but it's not actually allocated memory, so why
>> does it matter?
> I think I'd be interested in seeing the /proc/.../smaps before and after this
> change to see if anything is visible and if we can see the difference
> in rss etc.

Can you advise what in smaps should especially be looked at?

As for RSS, I can report that the long-term usage is significantly lower.
I had the strange observation that when the VM has been running for some minutes
the RSS suddenly increases to the whole stack size.

Peter
Paolo Bonzini June 28, 2016, 11:26 a.m. UTC | #5
----- Original Message -----
> From: "Peter Lieven" <pl@kamp.de>
> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
> Cc: kwolf@redhat.com, "peter maydell" <peter.maydell@linaro.org>, mst@redhat.com, dgilbert@redhat.com,
> mreitz@redhat.com, kraxel@redhat.com
> Sent: Tuesday, June 28, 2016 1:13:26 PM
> Subject: Re: [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
> 
> Am 28.06.2016 um 12:54 schrieb Paolo Bonzini:
> >
> > On 28/06/2016 11:01, Peter Lieven wrote:
> >> evaluation with the recently introduced maximum stack size monitoring
> >> revealed
> >> that the actual used stack size was never above 4kB so allocating 1MB
> >> stack
> >> for each coroutine is a lot of wasted memory. So reduce the stack size to
> >> 64kB which should still give enough head room.
> > If we make the stack this much smaller, there is a non-zero chance of
> > smashing it.  You must add a guard page if you do this (actually more
> > than one because QEMU will happily have stack frames as big as 16 KB).
> > The stack counts for RSS but it's not actually allocated memory, so why
> > does it matter?
> 
> Is there an easy way to determinate how much of the RSS is actually
> allocated? I erroneously it was all allocated....
> 
> So as for the stack, the MAP_GROWSDOWN is it really important? Will the
> kernel
> allocate all pages of the stack otherwise if the last page is written?
> 
> I am asking because I don't know if MAP_GROWSDOWN is a good idea as Peter
> mentioned there were discussions to deprecate it.

I don't know, I found those discussions too.  However, I've also seen
an interesting patch to ensure a guard page is kept at the bottom of the
VMA.

But thinking more about it, if you use MAP_GROWSDOWN you don't know anymore
where the bottom of the stack is and you cannot free it correctly, can you?
Or am I completely misunderstanding the purpose of the flag?

I guess it's better to steer clear of it unless we're ready to look at
kernel code for a while...

Paolo
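
To make the freeing concern concrete, here is a sketch of the problem as
described above, assuming MAP_GROWSDOWN lets the kernel extend the mapping
downwards when the stack faults below it (illustrative only):

#include <sys/mman.h>
#include <stddef.h>

/* Illustrative sketch of the concern: with MAP_GROWSDOWN the kernel may
 * extend the VMA below the address mmap() originally returned, so the
 * (addr, size) pair recorded at allocation time no longer describes the
 * whole stack when it is time to free it. */
static void growsdown_free_problem(void)
{
    size_t size = 64 * 1024;
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);

    if (addr == MAP_FAILED) {
        return;
    }

    /* ... if the coroutine faults just below 'addr', the VMA grows down ... */

    /* Only unmaps [addr, addr + size); pages the kernel added below 'addr'
     * while the stack grew are not covered by this call. */
    munmap(addr, size);
}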
Dr. David Alan Gilbert June 28, 2016, 11:35 a.m. UTC | #6
* Peter Lieven (pl@kamp.de) wrote:
> Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
> > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> > > 
> > > On 28/06/2016 11:01, Peter Lieven wrote:
> > > > evaluation with the recently introduced maximum stack size monitoring revealed
> > > > that the actual used stack size was never above 4kB so allocating 1MB stack
> > > > for each coroutine is a lot of wasted memory. So reduce the stack size to
> > > > 64kB which should still give enough head room.
> > > If we make the stack this much smaller, there is a non-zero chance of
> > > smashing it.  You must add a guard page if you do this (actually more
> > > than one because QEMU will happily have stack frames as big as 16 KB).
> > > The stack counts for RSS but it's not actually allocated memory, so why
> > > does it matter?
> > I think I'd be interested in seeing the /proc/.../smaps before and after this
> > change to see if anything is visible and if we can see the difference
> > in rss etc.
> 
> Can you advise what in smaps should be especially looked at.
> 
> As for RSS I can report hat the long term usage is significantly lower.
> I had the strange observation that when the VM is running for some minutes
> the RSS suddenly increases to the whole stack size.

You can see the Rss of each mapping; if you knew where your stacks were
it would be easy to see whether it is the stacks that account for the Rss and
whether there is anything else odd about them.
If you set the mapping as growsdown then you can see the area that has 'gd'
in its VmFlags.

Dave

> 
> Peter
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Lieven June 28, 2016, 12:09 p.m. UTC | #7
Am 28.06.2016 um 13:35 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
>>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>> evaluation with the recently introduced maximum stack size monitoring revealed
>>>>> that the actual used stack size was never above 4kB so allocating 1MB stack
>>>>> for each coroutine is a lot of wasted memory. So reduce the stack size to
>>>>> 64kB which should still give enough head room.
>>>> If we make the stack this much smaller, there is a non-zero chance of
>>>> smashing it.  You must add a guard page if you do this (actually more
>>>> than one because QEMU will happily have stack frames as big as 16 KB).
>>>> The stack counts for RSS but it's not actually allocated memory, so why
>>>> does it matter?
>>> I think I'd be interested in seeing the /proc/.../smaps before and after this
>>> change to see if anything is visible and if we can see the difference
>>> in rss etc.
>> Can you advise what in smaps should be especially looked at.
>>
>> As for RSS I can report hat the long term usage is significantly lower.
>> I had the strange observation that when the VM is running for some minutes
>> the RSS suddenly increases to the whole stack size.
> You can see the Rss of each mapping; if you knew where your stacks were
> it would be easy to see if it was the stacks that were Rss and if
> there was anything else odd about them.
> If you set hte mapping as growsdown then you can see the area that has a 'gd'
> in it's VmFlags.

Would you expect to see each 1MB allocation in smaps, or is it possible that
the kernel merges some mappings into bigger ones?

And more importantly, if the regions are merged, Paolo's comment about us not
needing a guard page would not be true, because a coroutine stack could
grow into another coroutine's stack. Looking at the commit from Linus, it
would also be good for that guard page not to have the gd flag.

Some of the regions above 1024kB have an RSS of exactly 4kB * (Size / 1024kB),
which leads to the assumption that these are coroutine stacks where exactly one
page each has been allocated.

I am asking because this is, for example, what I see for a QEMU VM for mappings with the "gd" flag:

cat /proc/5031/smaps | grep -B18 gd
7f808aee7000-7f808b9e6000 rw-p 00000000 00:00 0
Size:              11264 kB
Rss:                  44 kB
Pss:                  44 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        44 kB
Referenced:           44 kB
Anonymous:            44 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f808bb01000-7f8090000000 rw-p 00000000 00:00 0
Size:              70656 kB
Rss:                 276 kB
Pss:                 276 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       276 kB
Referenced:          276 kB
Anonymous:           276 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f80940ff000-7f80943fe000 rw-p 00000000 00:00 0
Size:               3072 kB
Rss:                  12 kB
Pss:                  12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8095700000-7f80957ff000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8097301000-7f8097400000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f80974df000-7f80975de000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
7f809760c000-7f809770b000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8097901000-7f8097a00000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8097b01000-7f8097c00000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8097d01000-7f8097e00000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8197f01000-7f8198000000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f81b4001000-7f81b4200000 rw-p 00000000 00:00 0
Size:               2048 kB
Rss:                  20 kB
Pss:                  20 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        20 kB
Referenced:           20 kB
Anonymous:            20 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7ffd337e2000-7ffd33805000 rw-p 00000000 00:00 0                          [stack]
Size:                144 kB
Rss:                  64 kB
Pss:                  64 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        64 kB
Referenced:           64 kB
Anonymous:            64 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac

Peter
Dr. David Alan Gilbert June 28, 2016, 2:20 p.m. UTC | #8
* Peter Lieven (pl@kamp.de) wrote:
> Am 28.06.2016 um 13:35 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (pl@kamp.de) wrote:
> > > Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
> > > > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> > > > > On 28/06/2016 11:01, Peter Lieven wrote:
> > > > > > evaluation with the recently introduced maximum stack size monitoring revealed
> > > > > > that the actual used stack size was never above 4kB so allocating 1MB stack
> > > > > > for each coroutine is a lot of wasted memory. So reduce the stack size to
> > > > > > 64kB which should still give enough head room.
> > > > > If we make the stack this much smaller, there is a non-zero chance of
> > > > > smashing it.  You must add a guard page if you do this (actually more
> > > > > than one because QEMU will happily have stack frames as big as 16 KB).
> > > > > The stack counts for RSS but it's not actually allocated memory, so why
> > > > > does it matter?
> > > > I think I'd be interested in seeing the /proc/.../smaps before and after this
> > > > change to see if anything is visible and if we can see the difference
> > > > in rss etc.
> > > Can you advise what in smaps should be especially looked at.
> > > 
> > > As for RSS I can report hat the long term usage is significantly lower.
> > > I had the strange observation that when the VM is running for some minutes
> > > the RSS suddenly increases to the whole stack size.
> > You can see the Rss of each mapping; if you knew where your stacks were
> > it would be easy to see if it was the stacks that were Rss and if
> > there was anything else odd about them.
> > If you set hte mapping as growsdown then you can see the area that has a 'gd'
> > in it's VmFlags.
> 
> Would you expect to see each 1MB allocation in smaps or is it possible that
> the kernel merges some mappings to bigger ones?
> 
> And more importantly if the regions are merged Paolos comment about we
> do not need a guard page would not be true because a coroutine stack could
> grow into annother coroutines stack. Looking at the commit from Linus it
> would also be good to have that guard page not having the gd flag.

Hmm I'm not sure; one for Paolo.

> Some of the regions above 1024kB have an RSS of exactly 4kB * (Size / 1024kB)
> which leads to the assumption that it is a corouine stack where exactly one page
> has been allocated.
> 
> I am asking because this is what I e.g. see for a Qemu VM with flags "gd":

However, what that does show is that if you add up all the Rss, it's still
near-enough nothing worth worrying about.

Maybe it looks different in the old world before you mmap'd it; you could
try going back to the g_malloc'd version but printf'ing the
address you get, then comparing that with smaps to see what the malloc'd
world ended up with mapped.

Dave

> [smaps output snipped; identical to the listing quoted in full in the previous message]
> 
> Peter
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Lieven June 30, 2016, 6:34 a.m. UTC | #9
Am 28.06.2016 um 16:20 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Am 28.06.2016 um 13:35 schrieb Dr. David Alan Gilbert:
>>> * Peter Lieven (pl@kamp.de) wrote:
>>>> Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
>>>>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>>>> evaluation with the recently introduced maximum stack size monitoring revealed
>>>>>>> that the actual used stack size was never above 4kB so allocating 1MB stack
>>>>>>> for each coroutine is a lot of wasted memory. So reduce the stack size to
>>>>>>> 64kB which should still give enough head room.
>>>>>> If we make the stack this much smaller, there is a non-zero chance of
>>>>>> smashing it.  You must add a guard page if you do this (actually more
>>>>>> than one because QEMU will happily have stack frames as big as 16 KB).
>>>>>> The stack counts for RSS but it's not actually allocated memory, so why
>>>>>> does it matter?
>>>>> I think I'd be interested in seeing the /proc/.../smaps before and after this
>>>>> change to see if anything is visible and if we can see the difference
>>>>> in rss etc.
>>>> Can you advise what in smaps should be especially looked at.
>>>>
>>>> As for RSS I can report hat the long term usage is significantly lower.
>>>> I had the strange observation that when the VM is running for some minutes
>>>> the RSS suddenly increases to the whole stack size.
>>> You can see the Rss of each mapping; if you knew where your stacks were
>>> it would be easy to see if it was the stacks that were Rss and if
>>> there was anything else odd about them.
>>> If you set hte mapping as growsdown then you can see the area that has a 'gd'
>>> in it's VmFlags.
>> Would you expect to see each 1MB allocation in smaps or is it possible that
>> the kernel merges some mappings to bigger ones?
>>
>> And more importantly if the regions are merged Paolos comment about we
>> do not need a guard page would not be true because a coroutine stack could
>> grow into annother coroutines stack. Looking at the commit from Linus it
>> would also be good to have that guard page not having the gd flag.
> Hmm I'm not sure; one for Paolo.

My fault. The second mmap call with the pointer to the stack must carry
the MAP_FIXED flag.

Peter
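
A sketch of the kind of two-step mapping this seems to refer to: first reserve
the whole guard-plus-stack range, then place the usable stack at a fixed
address inside it. Without MAP_FIXED the second mmap() is free to land
anywhere, which would defeat the layout. The helper name and the single-page
guard are illustrative only, not the actual patch:

#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

/* Illustrative sketch only: reserve guard + stack as one inaccessible
 * region, then map the usable stack on top of it at a fixed address. */
static void *alloc_guarded_stack(size_t stack_size)
{
    size_t guard = sysconf(_SC_PAGESIZE);
    void *base = mmap(NULL, guard + stack_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    void *stack;

    if (base == MAP_FAILED) {
        return NULL;
    }
    stack = mmap((char *)base + guard, stack_size,
                 PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (stack == MAP_FAILED) {
        munmap(base, guard + stack_size);
        return NULL;
    }
    return stack;   /* the PROT_NONE page below acts as the guard */
}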

Patch

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 27c61f3..7f1d541 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -88,7 +88,7 @@  static void coroutine_trampoline(int i0, int i1)
     }
 }
 
-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE (1 << 16)
 
 Coroutine *qemu_coroutine_new(void)
 {