diff mbox

[v4,2/3] libxl: print message how to recover from xl cpupool-cpu-remove errors

Message ID 1457587634-22819-3-git-send-email-jgross@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jürgen Groß March 10, 2016, 5:27 a.m. UTC
An error occurring when calling "xl cpupool-cpu-remove" might leave
the system in a state where a cpu is neither completely free nor in
a cpupool. This can easily be repaired by adding the cpu via
"xl cpupool-cpu-add" to the cpupool where it was removed from before.
Print a message telling this the user in case of an error.

Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/xl_cmdimpl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Ian Jackson April 14, 2016, 4:06 p.m. UTC | #1
Juergen Gross writes ("[PATCH v4 2/3] libxl: print message how to recover from xl cpupool-cpu-remove errors"):
> An error occurring when calling "xl cpupool-cpu-remove" might leave
> the system in a state where a cpu is neither completely free nor in
> a cpupool.

Surely this is a bug.  Can it not be avoided ?

> This can easily be repaired by adding the cpu via
> "xl cpupool-cpu-add" to the cpupool where it was removed from before.
> Print a message telling this the user in case of an error.
...
> -    if (libxl_cpupool_cpuremove_cpumap(ctx, poolid, &cpumap))
> -        fprintf(stderr, "some cpus may not have been removed from %s\n", pool);
> +    if (libxl_cpupool_cpuremove_cpumap(ctx, poolid, &cpumap)) {
> +        fprintf(stderr, "Some cpus may have not or only partially been removed from '%s'.\n", pool);
> +        fprintf(stderr, "If a cpu can't be added to another cpupool, add it to '%s' again and retry.\n", pool);
> +    }

If it can't be avoided then I guess this will have to do but I remain
to be convinced.

In any case, while you're editing this code, can you please wrap your
long lines.

Thanks,
Ian.
Dario Faggioli April 14, 2016, 5:10 p.m. UTC | #2
On Thu, 2016-04-14 at 17:06 +0100, Ian Jackson wrote:
> Juergen Gross writes ("[PATCH v4 2/3] libxl: print message how to
> recover from xl cpupool-cpu-remove errors"):
> > 
> > An error occurring when calling "xl cpupool-cpu-remove" might leave
> > the system in a state where a cpu is neither completely free nor in
> > a cpupool.
> Surely this is a bug.  Can it not be avoided ?
> 
Not easily (and in general not with any patch that I'd consider
appropriate for this phase of the release process), as it depends on
transient situations in the hypervisor, such as lock contention on
scheduling data structures.

> > This can easily be repaired by adding the cpu via
> > "xl cpupool-cpu-add" to the cpupool where it was removed from
> > before.
> > Print a message telling this the user in case of an error.
> ...
> > 
> > -    if (libxl_cpupool_cpuremove_cpumap(ctx, poolid, &cpumap))
> > -        fprintf(stderr, "some cpus may not have been removed from
> > %s\n", pool);
> > +    if (libxl_cpupool_cpuremove_cpumap(ctx, poolid, &cpumap)) {
> > +        fprintf(stderr, "Some cpus may have not or only partially
> > been removed from '%s'.\n", pool);
> > +        fprintf(stderr, "If a cpu can't be added to another
> > cpupool, add it to '%s' again and retry.\n", pool);
> > +    }
> If it can't be avoided then I guess this will have to do but I remain
> to be convinced.
> 
And in fact, it's not something that is introduced by this series,
which is, with this patch, just taking the chance to document things
better (although, this series introduces one more way for the issue to
occur).

Doing some retries at levels lower than this would minimize the chance
of the user actually getting to deal with the problem. For eaxmple,
what's done in libxc... but as you pointed out, that introduces other
problems, so I'm not sure. :-/

Dario
Ian Jackson April 14, 2016, 5:22 p.m. UTC | #3
Dario Faggioli writes ("Re: [Xen-devel] [PATCH v4 2/3] libxl: print message how to recover from xl cpupool-cpu-remove errors"):
> Not easily (and in general not with any patch that I'd consider
> appropriate for this phase of the release process), as it depends on
> transient situations in the hypervisor, such as lock contention on
> scheduling data structures.

I think it would be best to carry on this conversation in response to
my fixup mini-series, which contains a docs patch full of `xxx' :-).

Thanks,
Ian.
Dario Faggioli April 14, 2016, 5:27 p.m. UTC | #4
On Thu, 2016-04-14 at 18:22 +0100, Ian Jackson wrote:
> Dario Faggioli writes ("Re: [Xen-devel] [PATCH v4 2/3] libxl: print
> message how to recover from xl cpupool-cpu-remove errors"):
> > 
> > Not easily (and in general not with any patch that I'd consider
> > appropriate for this phase of the release process), as it depends
> > on
> > transient situations in the hypervisor, such as lock contention on
> > scheduling data structures.
> I think it would be best to carry on this conversation in response to
> my fixup mini-series, which contains a docs patch full of `xxx' :-).
> 
Yep, sorry, I've just saw it. I'll reply better tomorrow.

Thanks and Regards,
Dario
diff mbox

Patch

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 990d3c9..411473d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7716,8 +7716,10 @@  int main_cpupoolcpuremove(int argc, char **argv)
         goto out;
     }
 
-    if (libxl_cpupool_cpuremove_cpumap(ctx, poolid, &cpumap))
-        fprintf(stderr, "some cpus may not have been removed from %s\n", pool);
+    if (libxl_cpupool_cpuremove_cpumap(ctx, poolid, &cpumap)) {
+        fprintf(stderr, "Some cpus may have not or only partially been removed from '%s'.\n", pool);
+        fprintf(stderr, "If a cpu can't be added to another cpupool, add it to '%s' again and retry.\n", pool);
+    }
 
     rc = EXIT_SUCCESS;