Message ID | 1467104499-27517-8-git-send-email-pl@kamp.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
* Peter Lieven (pl@kamp.de) wrote: > this struct is approx 75kB I wonder why it's so large. The stack size in QmpInputVisitor; it's got a 1024 element stack (QIV_STACK_SIZE) and I bet we never use anywhere near that. But even then that's 1024 * a 3 pointer stack object, 24 bytes - I don't see where the rest of that 75kB comes from. I'm a little wary about turning all these malloc's into mmap's because we do seem to use things like input visitors for small things; don't the cost of doing the mmap's add up in time instead of space? Dave > Signed-off-by: Peter Lieven <pl@kamp.de> > --- > qapi/qmp-input-visitor.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c > index aea90a1..b6f5dfd 100644 > --- a/qapi/qmp-input-visitor.c > +++ b/qapi/qmp-input-visitor.c > @@ -17,6 +17,7 @@ > #include "qapi/qmp-input-visitor.h" > #include "qapi/visitor-impl.h" > #include "qemu/queue.h" > +#include "qemu/mmap-alloc.h" > #include "qemu-common.h" > #include "qapi/qmp/types.h" > #include "qapi/qmp/qerror.h" > @@ -378,14 +379,14 @@ Visitor *qmp_input_get_visitor(QmpInputVisitor *v) > void qmp_input_visitor_cleanup(QmpInputVisitor *v) > { > qobject_decref(v->root); > - g_free(v); > + qemu_anon_ram_munmap(v, sizeof(*v)); > } > > QmpInputVisitor *qmp_input_visitor_new(QObject *obj, bool strict) > { > QmpInputVisitor *v; > > - v = g_malloc0(sizeof(*v)); > + v = qemu_anon_ram_mmap(sizeof(*v)); > > v->visitor.type = VISITOR_INPUT; > v->visitor.start_struct = qmp_input_start_struct; > -- > 1.9.1 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Am 28.06.2016 um 11:29 schrieb Dr. David Alan Gilbert: > * Peter Lieven (pl@kamp.de) wrote: >> this struct is approx 75kB > I wonder why it's so large. > > The stack size in QmpInputVisitor; it's got a 1024 element stack > (QIV_STACK_SIZE) and I bet we never use anywhere near that. > > But even then that's 1024 * a 3 pointer stack object, 24 bytes - > I don't see where the rest of that 75kB comes from. Sorry, I had a wrong size in mind. Its 24736 bytes. But thats still larger than expecetd, right? > I'm a little wary about turning all these malloc's into mmap's > because we do seem to use things like input visitors for small > things; don't the cost of doing the mmap's add up in time > instead of space? Sure, we should discuss this. The series should act as a base for discussion. None of the things I changed into mmap are continously called during VM runtime. So it most likely only happens at setup of the vServer. It seems that the worst impact hat the PhysPageMap in exec.c Peter
On Tue, Jun 28, 2016 at 11:39:03AM +0200, Peter Lieven wrote: > Am 28.06.2016 um 11:29 schrieb Dr. David Alan Gilbert: > > * Peter Lieven (pl@kamp.de) wrote: > > > this struct is approx 75kB > > I wonder why it's so large. > > > > The stack size in QmpInputVisitor; it's got a 1024 element stack > > (QIV_STACK_SIZE) and I bet we never use anywhere near that. > > > > But even then that's 1024 * a 3 pointer stack object, 24 bytes - > > I don't see where the rest of that 75kB comes from. > > Sorry, I had a wrong size in mind. Its 24736 bytes. But thats > still larger than expecetd, right? > > > I'm a little wary about turning all these malloc's into mmap's > > because we do seem to use things like input visitors for small > > things; don't the cost of doing the mmap's add up in time > > instead of space? > > Sure, we should discuss this. The series should act as a base for > discussion. > > None of the things I changed into mmap are continously called > during VM runtime. So it most likely only happens at setup of the > vServer. QmpInputVisitor is used to parse all QMP monitor commands, so will be used continuously throughout life of QEMU, often very frequently. eg When migration is running many monitor commands per second are expected Regards, Daniel
* Daniel P. Berrange (berrange@redhat.com) wrote: > On Tue, Jun 28, 2016 at 11:39:03AM +0200, Peter Lieven wrote: > > Am 28.06.2016 um 11:29 schrieb Dr. David Alan Gilbert: > > > * Peter Lieven (pl@kamp.de) wrote: > > > > this struct is approx 75kB > > > I wonder why it's so large. > > > > > > The stack size in QmpInputVisitor; it's got a 1024 element stack > > > (QIV_STACK_SIZE) and I bet we never use anywhere near that. > > > > > > But even then that's 1024 * a 3 pointer stack object, 24 bytes - > > > I don't see where the rest of that 75kB comes from. > > > > Sorry, I had a wrong size in mind. Its 24736 bytes. But thats > > still larger than expecetd, right? > > > > > I'm a little wary about turning all these malloc's into mmap's > > > because we do seem to use things like input visitors for small > > > things; don't the cost of doing the mmap's add up in time > > > instead of space? > > > > Sure, we should discuss this. The series should act as a base for > > discussion. > > > > None of the things I changed into mmap are continously called > > during VM runtime. So it most likely only happens at setup of the > > vServer. > > QmpInputVisitor is used to parse all QMP monitor commands, so will > be used continuously throughout life of QEMU, often very frequently. > eg When migration is running many monitor commands per second are > expected Does the same input visitor get reused by each command? Dave > > Regards, > Daniel > -- > |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| > |: http://libvirt.org -o- http://virt-manager.org :| > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| > |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Jun 28, 2016 at 11:17:59AM +0100, Dr. David Alan Gilbert wrote: > * Daniel P. Berrange (berrange@redhat.com) wrote: > > On Tue, Jun 28, 2016 at 11:39:03AM +0200, Peter Lieven wrote: > > > Am 28.06.2016 um 11:29 schrieb Dr. David Alan Gilbert: > > > > * Peter Lieven (pl@kamp.de) wrote: > > > > > this struct is approx 75kB > > > > I wonder why it's so large. > > > > > > > > The stack size in QmpInputVisitor; it's got a 1024 element stack > > > > (QIV_STACK_SIZE) and I bet we never use anywhere near that. > > > > > > > > But even then that's 1024 * a 3 pointer stack object, 24 bytes - > > > > I don't see where the rest of that 75kB comes from. > > > > > > Sorry, I had a wrong size in mind. Its 24736 bytes. But thats > > > still larger than expecetd, right? > > > > > > > I'm a little wary about turning all these malloc's into mmap's > > > > because we do seem to use things like input visitors for small > > > > things; don't the cost of doing the mmap's add up in time > > > > instead of space? > > > > > > Sure, we should discuss this. The series should act as a base for > > > discussion. > > > > > > None of the things I changed into mmap are continously called > > > during VM runtime. So it most likely only happens at setup of the > > > vServer. > > > > QmpInputVisitor is used to parse all QMP monitor commands, so will > > be used continuously throughout life of QEMU, often very frequently. > > eg When migration is running many monitor commands per second are > > expected > > Does the same input visitor get reused by each command? Opps, actually no i'm wrong, its only used by the 'object_add' command but that does allocate a new one for each invokation. Regards, Daniel
On 28/06/2016 11:01, Peter Lieven wrote: > this struct is approx 75kB > > Signed-off-by: Peter Lieven <pl@kamp.de> > --- > qapi/qmp-input-visitor.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) Can you change the stack to a QSLIST instead? That's where most of the waste comes from. Thanks, Paolo > diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c > index aea90a1..b6f5dfd 100644 > --- a/qapi/qmp-input-visitor.c > +++ b/qapi/qmp-input-visitor.c > @@ -17,6 +17,7 @@ > #include "qapi/qmp-input-visitor.h" > #include "qapi/visitor-impl.h" > #include "qemu/queue.h" > +#include "qemu/mmap-alloc.h" > #include "qemu-common.h" > #include "qapi/qmp/types.h" > #include "qapi/qmp/qerror.h" > @@ -378,14 +379,14 @@ Visitor *qmp_input_get_visitor(QmpInputVisitor *v) > void qmp_input_visitor_cleanup(QmpInputVisitor *v) > { > qobject_decref(v->root); > - g_free(v); > + qemu_anon_ram_munmap(v, sizeof(*v)); > } > > QmpInputVisitor *qmp_input_visitor_new(QObject *obj, bool strict) > { > QmpInputVisitor *v; > > - v = g_malloc0(sizeof(*v)); > + v = qemu_anon_ram_mmap(sizeof(*v)); > > v->visitor.type = VISITOR_INPUT; > v->visitor.start_struct = qmp_input_start_struct; >
On 06/28/2016 04:17 AM, Dr. David Alan Gilbert wrote: >> QmpInputVisitor is used to parse all QMP monitor commands, so will >> be used continuously throughout life of QEMU, often very frequently. >> eg When migration is running many monitor commands per second are >> expected > > Does the same input visitor get reused by each command? No; in fact commit f2ff429 changed things to intentionally prevent reuse of a visitor (on the argument that it was easier to do that than to think about corner cases of reset after a partial visit encountered errors). But we can revisit that decision if reusing a static QmpInputVisitor would be wiser than allocating a fresh one for every visit.
On 06/28/2016 05:36 AM, Paolo Bonzini wrote: > > > On 28/06/2016 11:01, Peter Lieven wrote: >> this struct is approx 75kB >> >> Signed-off-by: Peter Lieven <pl@kamp.de> >> --- >> qapi/qmp-input-visitor.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) > > Can you change the stack to a QSLIST instead? That's where most of the > waste comes from. QmpInputVisitor has: struct QmpInputVisitor { Visitor visitor; /* Root of visit at visitor creation. */ QObject *root; /* Stack of objects being visited (all entries will be either * QDict or QList). */ StackObject stack[QIV_STACK_SIZE]; ... while QmpOutputVisitor has: struct QmpOutputVisitor { Visitor visitor; QStack stack; /* Stack of containers that haven't yet been finished */ QObject *root; /* Root of the output visit */ QObject **result; /* User's storage location for result */ }; The extra layer of indirection to a QStack vs. a direct array has tradeoffs, but both Markus and I have commented in the past that both files' stacks are rather wasteful, and we just have not had a reason to improve them. This thread may be the reason :)
Peter Lieven <pl@kamp.de> writes:
> this struct is approx 75kB
32KiB on x86_64, almost entirely eaten by member stack.
As Dan observed, we use many short-lived QmpInputVisitors. However, we
should not be using many simultaneously. That would be necessary for
32KiB objects to have an impact on memory use. Are you sure these have
an impact?
Implementing a stack as "big enough" array can be wasteful.
Implementing it as dynamically allocated list is differently wasteful.
Saving several mallocs and frees can be worth "wasting" a few pages of
memory for a short time.
On 30/06/2016 16:12, Markus Armbruster wrote: > Implementing a stack as "big enough" array can be wasteful. > Implementing it as dynamically allocated list is differently wasteful. > Saving several mallocs and frees can be worth "wasting" a few pages of > memory for a short time. Most usage of QmpInputVisitor at startup comes from object_property_set_qobject, which only sets small scalar objects. The stack is entirely unused in this case. Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > On 30/06/2016 16:12, Markus Armbruster wrote: >> Implementing a stack as "big enough" array can be wasteful. >> Implementing it as dynamically allocated list is differently wasteful. >> Saving several mallocs and frees can be worth "wasting" a few pages of >> memory for a short time. > > Most usage of QmpInputVisitor at startup comes from > object_property_set_qobject, which only sets small scalar objects. The > stack is entirely unused in this case. A quick test run shows ~300 qmp_input_visitor_new() calls during startup, with at most two alive at the same time. Why would it matter whether these are in the order of 150 bytes or 25000 bytes each? How could this materially impact RSS? There's one type of waste here that I understand: we zero the whole QmpInputVisitor on allocation. I'm not opposed to changing how the stack is implemented, I just want to first understand why the current implmementation behaves badly (assuming it does).
Am 04.07.2016 um 13:18 schrieb Markus Armbruster: > Paolo Bonzini <pbonzini@redhat.com> writes: > >> On 30/06/2016 16:12, Markus Armbruster wrote: >>> Implementing a stack as "big enough" array can be wasteful. >>> Implementing it as dynamically allocated list is differently wasteful. >>> Saving several mallocs and frees can be worth "wasting" a few pages of >>> memory for a short time. >> Most usage of QmpInputVisitor at startup comes from >> object_property_set_qobject, which only sets small scalar objects. The >> stack is entirely unused in this case. > A quick test run shows ~300 qmp_input_visitor_new() calls during > startup, with at most two alive at the same time. > > Why would it matter whether these are in the order of 150 bytes or 25000 > bytes each? How could this materially impact RSS? > > There's one type of waste here that I understand: we zero the whole > QmpInputVisitor on allocation. > > I'm not opposed to changing how the stack is implemented, I just want to > first understand why the current implmementation behaves badly (assuming > it does). The history behind this is that I observed that the RSS usage of Qemu has dramatically increased between Qemu 2.2.0 and 2.5.0. I observed that really clearly since we use hugetblfs everywhere and so I can clearly distinct Qemu memory from VM memory. After having bisected one increase in RSS usage to the introduction of RCU the theory came up that the memory gets fragmented because alloc and dealloc patterns have changed. So I started to trace all malloc calls above 4kB and started to use mmap everywhere where it was possible. To give you an idea of the diffence I observed I'd like to give an example. I have a blade with 22 vServers running on it. Including OS the allocated memory with current master is approx. at 6.5GB. With current master and the following environment set: MALLOC_MMAP_THRESHOLD_=32768 the allocated memory stays at approx. 2GB. Peter
On 04/07/2016 13:18, Markus Armbruster wrote: > Paolo Bonzini <pbonzini@redhat.com> writes: > >> On 30/06/2016 16:12, Markus Armbruster wrote: >>> Implementing a stack as "big enough" array can be wasteful. >>> Implementing it as dynamically allocated list is differently wasteful. >>> Saving several mallocs and frees can be worth "wasting" a few pages of >>> memory for a short time. >> >> Most usage of QmpInputVisitor at startup comes from >> object_property_set_qobject, which only sets small scalar objects. The >> stack is entirely unused in this case. > > A quick test run shows ~300 qmp_input_visitor_new() calls during > startup, with at most two alive at the same time. > > Why would it matter whether these are in the order of 150 bytes or 25000 > bytes each? How could this materially impact RSS? I think we agree that it doesn't. The question in the subthread is whether we can improve QmpInputVisitor in general. Paolo
diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c index aea90a1..b6f5dfd 100644 --- a/qapi/qmp-input-visitor.c +++ b/qapi/qmp-input-visitor.c @@ -17,6 +17,7 @@ #include "qapi/qmp-input-visitor.h" #include "qapi/visitor-impl.h" #include "qemu/queue.h" +#include "qemu/mmap-alloc.h" #include "qemu-common.h" #include "qapi/qmp/types.h" #include "qapi/qmp/qerror.h" @@ -378,14 +379,14 @@ Visitor *qmp_input_get_visitor(QmpInputVisitor *v) void qmp_input_visitor_cleanup(QmpInputVisitor *v) { qobject_decref(v->root); - g_free(v); + qemu_anon_ram_munmap(v, sizeof(*v)); } QmpInputVisitor *qmp_input_visitor_new(QObject *obj, bool strict) { QmpInputVisitor *v; - v = g_malloc0(sizeof(*v)); + v = qemu_anon_ram_mmap(sizeof(*v)); v->visitor.type = VISITOR_INPUT; v->visitor.start_struct = qmp_input_start_struct;
this struct is approx 75kB Signed-off-by: Peter Lieven <pl@kamp.de> --- qapi/qmp-input-visitor.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)