[for-4.17,v1,0/2] xenctrl.ml: improve scalability of domain_getinfolist

Message ID	cover.1667324874.git.edvin.torok@citrix.com (mailing list archive)
Headers	show Return-Path: <xen-devel-bounces@lists.xenproject.org> Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> IronPort-Data: A9a23:N6RAPaBvIxI/PxVW/yvjw5YqxClBgxIJ4kV8jS/XYbTApDN0hjIFn 2EeXmCCPqncZ2v2eN5xPY3gphgH68fUzNUxQQY4rX1jcSlH+JHPbTi7wuUcHAvJd5GeExg3h yk6QoOdRCzhZiaE/n9BCpC48T8nk/nNHuCnYAL9EngZbRd+Tys8gg5Ulec8g4p56fC0GArIs t7pyyHlEAbNNwVcbyRFtspvlDs15K6o4WpA5wRkDRx2lAS2e0c9Xcp3yZ6ZdxMUcqEMdsamS uDKyq2O/2+x13/B3fv8z94X2mVTKlLjFVDmZkh+AsBOsTAbzsAG6Y4pNeJ0VKtio27hc+ada jl6ncfYpQ8BZsUgkQmGOvVSO3kW0aZuoNcrLZUj2CA6IoKvn3bEmp1T4E8K0YIw0OlRHmdf8 KwjEG4EUkyJtsyxn7f8Rbw57igjBJGD0II3v3hhyXfSDOo8QICFSKLPjTNa9G5u3IYUR6+YP pdHL2o0BPjDS0Qn1lM/BZ4gneHumn7ldD5wo1OJv6snpWPUyWSd1ZC9a4uJIYXQFK25mG6+n WSW4VTdGSolLePDigeA6lWpq9LQyHaTtIU6S+Tjq68CbEeo7nweDlgaWEW2pdG9i1WiQJRPJ koM4C0soKMuskuxQbHVUxSmrVaesxUbWt4WFPc1gCmSzoLE7gDfAXILJgOtc/R/6pVwH2Zzk AbUwZW5XlSDrYF5V1q3+ZGLqyK3aRMtc2Y+OSU1Ywxa5p7s9dRbYg30cv5vF6u8j9vQED72w iyXoCVWu4j/nfLnxI3gowmZ3mvESozhC1dsu16JBj7NAhZRPtbNWmC+1bTMAR+sxq69R0LJg nULktP2AAsmXcDUz3zlrAng8diUCxe53N/02wAH83oJrW7FF5ufkWd4uWgWGauRGpxYEQIFm WeK0e+r2LddPWGxcYh8aJ+rBsIhwMDITIq7CaqIN4EUOcksKWdrGR2CgmbJgQjQfLUEy/lja f93j+7xZZrlNUiX5GXvHLpMuVPa7is/2XnSVfjGI+ePiNKjiIquYe5caDOzghURtvzsTPP9r 4kCbKNnCnx3DIXDX8Ug2dVIdg1WciNrXM6eRg4+XrfrHzeK0VoJU5f5qY7NsaQ890iJvo8kJ k2AZ3I= IronPort-HdrOrdr: A9a23:ThjKAKl0eFrXu0jG9iG89bSea3rpDfIg3DAbv31ZSRFFG/Fxl6 iV8sjzsiWE7wr5OUtQ/exoV5PhfZqxz/JICOoqTNKftWvdyQiVxehZhOOIqVDd8kXFh4hgPM xbEpSWZueeMbEDt7eZ3DWF From: =?utf-8?b?RWR3aW4gVMO2csO2aw==?= <edvin.torok@citrix.com> To: <xen-devel@lists.xenproject.org> CC: <pau.safont@citrix.com>, =?utf-8?b?RWR3aW4gVMO2csO2aw==?= <edvin.torok@citrix.com>, Christian Lindig <christian.lindig@citrix.com>, David Scott <dave@recoil.org>, Wei Liu <wl@xen.org>, Anthony PERARD <anthony.perard@citrix.com> Subject: [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of domain_getinfolist Date: Tue, 1 Nov 2022 17:59:15 +0000 Message-ID: <cover.1667324874.git.edvin.torok@citrix.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit
Series	xenctrl.ml: improve scalability of domain_getinfolist \| expand [for-4.17,v1,0/2] xenctrl.ml: improve scalability of domain_getinfolist [for-4.17,v1,1/2] xenctrl.ml: make domain_getinfolist tail recursive [for-4.17,v1,2/2] xenctrl: use larger chunksize in domain_getinfolist

Message ID

cover.1667324874.git.edvin.torok@citrix.com (mailing list archive)

Headers

Errors-To: xen-devel-bounces@lists.xenproject.org
Precedence: list
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
IronPort-Data: A9a23:N6RAPaBvIxI/PxVW/yvjw5YqxClBgxIJ4kV8jS/XYbTApDN0hjIFn
 2EeXmCCPqncZ2v2eN5xPY3gphgH68fUzNUxQQY4rX1jcSlH+JHPbTi7wuUcHAvJd5GeExg3h
 yk6QoOdRCzhZiaE/n9BCpC48T8nk/nNHuCnYAL9EngZbRd+Tys8gg5Ulec8g4p56fC0GArIs
 t7pyyHlEAbNNwVcbyRFtspvlDs15K6o4WpA5wRkDRx2lAS2e0c9Xcp3yZ6ZdxMUcqEMdsamS
 uDKyq2O/2+x13/B3fv8z94X2mVTKlLjFVDmZkh+AsBOsTAbzsAG6Y4pNeJ0VKtio27hc+ada
 jl6ncfYpQ8BZsUgkQmGOvVSO3kW0aZuoNcrLZUj2CA6IoKvn3bEmp1T4E8K0YIw0OlRHmdf8
 KwjEG4EUkyJtsyxn7f8Rbw57igjBJGD0II3v3hhyXfSDOo8QICFSKLPjTNa9G5u3IYUR6+YP
 pdHL2o0BPjDS0Qn1lM/BZ4gneHumn7ldD5wo1OJv6snpWPUyWSd1ZC9a4uJIYXQFK25mG6+n
 WSW4VTdGSolLePDigeA6lWpq9LQyHaTtIU6S+Tjq68CbEeo7nweDlgaWEW2pdG9i1WiQJRPJ
 koM4C0soKMuskuxQbHVUxSmrVaesxUbWt4WFPc1gCmSzoLE7gDfAXILJgOtc/R/6pVwH2Zzk
 AbUwZW5XlSDrYF5V1q3+ZGLqyK3aRMtc2Y+OSU1Ywxa5p7s9dRbYg30cv5vF6u8j9vQED72w
 iyXoCVWu4j/nfLnxI3gowmZ3mvESozhC1dsu16JBj7NAhZRPtbNWmC+1bTMAR+sxq69R0LJg
 nULktP2AAsmXcDUz3zlrAng8diUCxe53N/02wAH83oJrW7FF5ufkWd4uWgWGauRGpxYEQIFm
 WeK0e+r2LddPWGxcYh8aJ+rBsIhwMDITIq7CaqIN4EUOcksKWdrGR2CgmbJgQjQfLUEy/lja
 f93j+7xZZrlNUiX5GXvHLpMuVPa7is/2XnSVfjGI+ePiNKjiIquYe5caDOzghURtvzsTPP9r
 4kCbKNnCnx3DIXDX8Ug2dVIdg1WciNrXM6eRg4+XrfrHzeK0VoJU5f5qY7NsaQ890iJvo8kJ
 k2AZ3I=
IronPort-HdrOrdr: A9a23:ThjKAKl0eFrXu0jG9iG89bSea3rpDfIg3DAbv31ZSRFFG/Fxl6
 iV8sjzsiWE7wr5OUtQ/exoV5PhfZqxz/JICOoqTNKftWvdyQiVxehZhOOIqVDd8kXFh4hgPM
 xbEpSWZueeMbEDt7eZ3DWF
From: =?utf-8?b?RWR3aW4gVMO2csO2aw==?= <edvin.torok@citrix.com>
To: <xen-devel@lists.xenproject.org>
CC: <pau.safont@citrix.com>,
 =?utf-8?b?RWR3aW4gVMO2csO2aw==?= <edvin.torok@citrix.com>,
 Christian Lindig <christian.lindig@citrix.com>,
 David Scott <dave@recoil.org>, Wei Liu <wl@xen.org>,
 Anthony PERARD <anthony.perard@citrix.com>
Subject: [PATCH for-4.17 v1 0/2] xenctrl.ml: improve scalability of
 domain_getinfolist
Date: Tue, 1 Nov 2022 17:59:15 +0000
Message-ID: <cover.1667324874.git.edvin.torok@citrix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Series

xenctrl.ml: improve scalability of domain_getinfolist | expand

Message

Edwin Török Nov. 1, 2022, 5:59 p.m. UTC

Pau has performed some performance tests by booting 1000 mirage
unikernels test VMs and shutting them down.
We've noticed on the flamegraphs that a lot of time is spent in Xenctrl
`domain_getinfolist`, 17.7% of overall time
(This needs to be multiplied by 16 because Dom0 100% usage = 16 vCPUs)
In particular time is spent in camlXenctrl___getlist_339
as can be seen from this flamegraph, and it also creates a very deep
call stack:
https://cdn.jsdelivr.net/gh/edwintorok/xen@xenctrl-coverletter/docs/tmp/perf-merge-boot.svg?x=948.9&y=2213

After some algorithmic improvements to the code now the function barely
shows up at all on a flamegraph, taking only 0.02%.
The function is called camlXenctrl___getlist_343, but that is just due
to the changed arguments, still the same function:
https://cdn.jsdelivr.net/gh/edwintorok/xen@xenctrl-coverletter/docs/tmp/perf-xen-boot-1150.svg?x=1188.0&y=1941&s=infolist

It was calling the Xen hypercall ~500*1000 times for 1000 VMs, and
instead it is now calling it "only" 1000 times.

I would suggest to try to take this in 4.17 given the massive
improvement in scalability (number of VMs on a Xen host).

There are further improvements possible here, but they'll be in xenopsd
(part of XAPI) to avoid calling domain_getinfolist and just use
domain_getinfo: the only reason it needs use infolist is that it does
the lookup by VM UUID and not by domid, but it could have a small cache
of UUID->domid mappings and then call just domain_getinfo (or get the
mapping from xenstore if not in the cache), but it looks like that
improvement is not even needed if this function barely registers on a
flamegraph now.

P.S.: the mirage test VM is a very old PV version, at some point we'll
repeat the test with a Solo5 based PVH one.

Edwin Török (2):
  xenctrl.ml: make domain_getinfolist tail recursive
  xenctrl: use larger chunksize in domain_getinfolist

 tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

Comments

Christian Lindig Nov. 2, 2022, 9:11 a.m. UTC | #1

> On 1 Nov 2022, at 17:59, Edwin Török <edvin.torok@citrix.com> wrote:
> 
> 
> Edwin Török (2):
>  xenctrl.ml: make domain_getinfolist tail recursive
>  xenctrl: use larger chunksize in domain_getinfolist
> 
> tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)

Acked-by: Christian Lindig <christian.lindig@citrix.com>


> It was calling the Xen domainfolist hypercall N/2 times.
> Optimize this such that it is called at most 2 times during normal use.
> 
> Implement a tail recursive `rev_concat` equivalent to `concat |> rev`,
> and use it instead of calling `@` multiple times.

Are there any assurances about the order in elements returned by domain_getinfolist? I understand that the change maintains the current behaviour but are we even required to maintain that order? Because otherwise we could return the reverse list and save more work.

— C

Edwin Török Nov. 2, 2022, 9:31 a.m. UTC | #2

> On 2 Nov 2022, at 09:11, Christian Lindig <christian.lindig@citrix.com> wrote:
> 
> 
> 
>> On 1 Nov 2022, at 17:59, Edwin Török <edvin.torok@citrix.com> wrote:
>> 
>> 
>> Edwin Török (2):
>> xenctrl.ml: make domain_getinfolist tail recursive
>> xenctrl: use larger chunksize in domain_getinfolist
>> 
>> tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
>> 1 file changed, 18 insertions(+), 7 deletions(-)
> 
> Acked-by: Christian Lindig <christian.lindig@citrix.com>
> 
> 
>> It was calling the Xen domainfolist hypercall N/2 times.
>> Optimize this such that it is called at most 2 times during normal use.
>> 
>> Implement a tail recursive `rev_concat` equivalent to `concat |> rev`,
>> and use it instead of calling `@` multiple times.
> 
> Are there any assurances about the order in elements returned by domain_getinfolist? I understand that the change maintains the current behaviour but are we even required to maintain that order? Because otherwise we could return the reverse list and save more work.

Maintaining the current (reversed) order in the C stubs is useful to be able to extract the largest domid so we know where to continue.
However we don't necessarily have to maintain the order on the higher level function available in the .mli, it just happens that `rev_concat` gives us the results in the order that we want.

Although perhaps using an array rather a list here might be better, and we could move resizing/reallocing that array into the C stub, and have just a nicer (and perhaps faster) interface in the C/OCaml API,
by producing less garbage than lists (a single Array.t allocation rather than N list elements).
(Although I don't usually like pushing complexity like that into C stubs, for now I'm happy with the performance of the code as is).

There are more issues in these C stubs in xenctrl, I'll be pouring over the code and try to send a few more patches.

Best regards,
--Edwin

Edwin Török Nov. 2, 2022, 9:37 a.m. UTC | #3

> On 2 Nov 2022, at 09:11, Christian Lindig <christian.lindig@citrix.com> wrote:
> 
> 
> 
>> On 1 Nov 2022, at 17:59, Edwin Török <edvin.torok@citrix.com> wrote:
>> 
>> 
>> Edwin Török (2):
>> xenctrl.ml: make domain_getinfolist tail recursive
>> xenctrl: use larger chunksize in domain_getinfolist
>> 
>> tools/ocaml/libs/xc/xenctrl.ml | 25 ++++++++++++++++++-------
>> 1 file changed, 18 insertions(+), 7 deletions(-)
> 
> Acked-by: Christian Lindig <christian.lindig@citrix.com>
> 
> 
>> It was calling the Xen domainfolist hypercall N/2 times.
>> Optimize this such that it is called at most 2 times during normal use.
>> 
>> Implement a tail recursive `rev_concat` equivalent to `concat |> rev`,
>> and use it instead of calling `@` multiple times.
> 
> Are there any assurances about the order in elements returned by domain_getinfolist? I understand that the change maintains the current behaviour but are we even required to maintain that order? Because otherwise we could return the reverse list and save more work.

After some discussion with Andrew Cooper apparently the xenctrl API is broken and cannot be used safely as is, so I'll be rewriting this patch.
Domids can be assigned randomly, or toolstacks can set a custom domid, so it is not guaranteed that new domids always show up as higher numbers,
and the only safe way to use infolist is to either request 1 domid, or request them all (domids are 15-bit, so 32768).
Anything else is prone to race conditions and might give you an inconsistent snapshot.

This is probably better fixed by changing the prototype of the C function in xenctrl to not take a count to force updating all callers
(the callers in the 'xl' toolstack are just as buggy as the one in this OCaml binding, and probably better to fix the API in xenctrl than working around the race condition in each caller).

Stay tuned for more patches...

Best regards,
--Edwin

Andrew Cooper Nov. 2, 2022, 5:26 p.m. UTC | #4

Hi,

To be clear, what is presented here are clear improvements in the status
quo, and qualify for inclusion on their own merits.  And definitely
should be considered.


That said, this is a swamp with future problems, and one rather
fundamental one in Xen which I is not going to be easy to fix.

1) (simple), there are a bunch of stubs, including
stub_xc_domain_getinfo() which don't use
caml_{enter,leave}_blocking_section() when they should.

2) stub_xc_domain_getinfo() reuses xc_domain_getinfolist() meaning that
it uses XEN_SYSCTL_getdomaininfolist rather than
XEN_DOMCTL_getdomaininfo, which is a problem because...

3) While both of these hypercalls have crazy APIs leading to loads of
broken callers, at least the DOMCTL has a fastpath for when you specify
a valid domid.  The SYSCTL (and DOMCTL slowpath) is an O(N) search of
all domains starting from d0 to find the first domain with an id >= the
input request.

The DOMCTL slowpath is useless.  Not a single caller (I've ever
encountered) wants that behaviour, whereas I've needed to bugfix caller
which didn't have an "&& info.domid == domid" to have one, to get the
semantics they wanted.  Cleaning this up will be a good improvement.

4) The (adjusted) algorithm in patch 1 still loops until it gets a
result with no entries.  Meaning that it's still going to make one
hypercall too many, searching the entire domlist, to return no data.  In
principle you could optimise to stop at any hypercall which returns
fewer than the requested number of domains, except...

5) ... if you ever use more than a single hypercall, Xen has dropped and
re-acquired the domlist read lock, meaning that you can't use the result
anyway.  Causality couldn't be broken when domains were allocated
sequentially, but we have a random allocation mode now so you can
observe things out of order.

The only safe action is to ask for all 32k domains in a single go, and
use a single hypercall.

~Andrew