Message ID | 20220729040116.175015-1-tz.stoyanov@gmail.com |
---|---|
State | Accepted |
Commit | 39ec10aad3e13c579851e41a8dfa504395e6a5f1 |
Series | trace-cmd: Do not use instance from trace context |
On Fri, 29 Jul 2022 07:01:16 +0300
"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:

> When trace-cmd initiates a connection to a trace agent over the network,
> the logic in connect_to_agent() function incorrectly uses the last
> instance saved in the trace context, instead of the actual instance
> which is passed as input argument. This works if the remote agent is
> set last on the command line, but causes a problem if there is more than
> one agent or if there is a local buffer after the agent on the command
> line.
>
> Reported-by: Alexander Aring <aahringo@redhat.com>
> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
> ---
>  tracecmd/trace-record.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
> index 2406489a..50039dad 100644
> --- a/tracecmd/trace-record.c
> +++ b/tracecmd/trace-record.c
> @@ -3934,15 +3934,15 @@ static void connect_to_agent(struct common_record_context *ctx,
>  		use_fifos = nr_fifos > 0;
>  	}
>
> -	if (ctx->instance->result) {
> +	if (instance->result) {

Bah, I kept getting confused by when to use instance vs ctx->instance,
and I guess I messed this one up.

Thanks Tzvetomir for fixing it.

-- Steve

>  		role = TRACECMD_TIME_SYNC_ROLE_CLIENT;
> -		sd = connect_addr(ctx->instance->result);
> +		sd = connect_addr(instance->result);
>  		if (sd < 0)
>  			die("Failed to connect to host %s:%u",
>  			    instance->name, instance->port);
>  	} else {
>  		/* If connecting to a proxy, then this is the guest */
> -		if (is_proxy(ctx->instance))
> +		if (is_proxy(instance))
>  			role = TRACECMD_TIME_SYNC_ROLE_GUEST;
>  		else
>  			role = TRACECMD_TIME_SYNC_ROLE_HOST;
Hi,

On Fri, Jul 29, 2022 at 4:51 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 29 Jul 2022 07:01:16 +0300
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
>
> > When trace-cmd initiates a connection to a trace agent over the network,
> > the logic in connect_to_agent() function incorrectly uses the last
> > instance saved in the trace context, instead of the actual instance
> > which is passed as input argument. This works if the remote agent is
> > set last on the command line, but causes a problem if there is more than
> > one agent or if there is a local buffer after the agent on the command
> > line.
> >
> > Reported-by: Alexander Aring <aahringo@redhat.com>
> > Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
> > ---
> >  tracecmd/trace-record.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
> > index 2406489a..50039dad 100644
> > --- a/tracecmd/trace-record.c
> > +++ b/tracecmd/trace-record.c
> > @@ -3934,15 +3934,15 @@ static void connect_to_agent(struct common_record_context *ctx,
> >  		use_fifos = nr_fifos > 0;
> >  	}
> >
> > -	if (ctx->instance->result) {
> > +	if (instance->result) {
>
> Bah, I kept getting confused by when to use instance vs ctx->instance,
> and I guess I messed this one up.

I tested it and it seems to fix the problem..., so if it's not too late:

Tested-by: Alexander Aring <aahringo@redhat.com>

I am not sure what I should expect from the PTP time synchronization
over IP capable interfaces (it never worked for me) but I need to say
it is significantly slower than kvm time synchronization with vsock
and I am using only virtual interfaces. On the agents I get a couple
of:

CPU 1: 787 events lost
CPU 5: 3059 events lost
...

The result looks to me like garbage too; my lock states do not make
any sense... (maybe related to the events lost?)

However I think we should move this discussion to bugzilla?

- Alex
On Fri, 29 Jul 2022 17:48:08 -0400
Alexander Aring <aahringo@redhat.com> wrote:

> > Bah, I kept getting confused by when to use instance vs ctx->instance,
> > and I guess I messed this one up.
>
> I tested it and it seems to fix the problem..., so if it's not too late:
>
> Tested-by: Alexander Aring <aahringo@redhat.com>

Not too late. I haven't downloaded the patch from patchwork yet (nor my
other patches).

> I am not sure what I should expect from the PTP time synchronization
> over IP capable interfaces (it never worked for me) but I need to say
> it is significantly slower than kvm time synchronization with vsock
> and I am using only virtual interfaces. On the agents I get a couple
> of:

Well, kvm time synchronization doesn't do much between the host and
guest. And I'm looking at making it do even less. That's because the
kvm synchronization is just "read the offset and shift of the guest
from the host and do the calculations via the reader (trace-cmd report
or kernelshark)". But the PTP sync is sending packets back and forth
between the host and guest and trying to figure out the round trip
latency.

> CPU 1: 787 events lost
> CPU 5: 3059 events lost
> ...
>
> The result looks to me like garbage too; my lock states do not make
> any sense... (maybe related to the events lost?)

This should be investigated.

> However I think we should move this discussion to bugzilla?

Sure.

-- Steve
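To make the contrast concrete, the kvm-style correction described above amounts to scaling a guest timestamp by a multiplier/shift pair and adding an offset, with the arithmetic done later by the reader. Below is a minimal sketch of that general scheme; the helper name and the values are invented for illustration, and this is not the actual trace-cmd code:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * General offset/shift correction applied by a trace reader after the
 * fact: host_ts = ((guest_ts * mult) >> shift) + offset. The mult,
 * shift and offset values below are invented; the real ones would be
 * read from the host. Real implementations also guard against the
 * 64-bit overflow that guest_ts * mult can produce.
 */
static uint64_t guest_ts_to_host(uint64_t guest_ts, uint64_t mult,
				 unsigned int shift, int64_t offset)
{
	return ((guest_ts * mult) >> shift) + offset;
}

int main(void)
{
	uint64_t mult = 1ULL << 16;	/* ratio of 1.0, expressed as mult / 2^shift */
	unsigned int shift = 16;
	int64_t offset = 123456;	/* host clock minus scaled guest clock */

	printf("host ts = %llu\n",
	       (unsigned long long)guest_ts_to_host(1000000, mult, shift, offset));
	return 0;
}
```

Because the parameters are simply read out on the host, no traffic to the guest is needed, which is why this method is essentially free compared to a packet-exchange protocol.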
Hi,

On Fri, Jul 29, 2022 at 8:55 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 29 Jul 2022 17:48:08 -0400
> Alexander Aring <aahringo@redhat.com> wrote:
>
> > > Bah, I kept getting confused by when to use instance vs ctx->instance,
> > > and I guess I messed this one up.
> >
> > I tested it and it seems to fix the problem..., so if it's not too late:
> >
> > Tested-by: Alexander Aring <aahringo@redhat.com>
>
> Not too late. I haven't downloaded the patch from patchwork yet (nor my
> other patches).
>
> > I am not sure what I should expect from the PTP time synchronization
> > over IP capable interfaces (it never worked for me) but I need to say
> > it is significantly slower than kvm time synchronization with vsock
> > and I am using only virtual interfaces. On the agents I get a couple
> > of:
>
> Well, kvm time synchronization doesn't do much between the host and
> guest. And I'm looking at making it do even less. That's because the
> kvm synchronization is just "read the offset and shift of the guest
> from the host and do the calculations via the reader (trace-cmd report
> or kernelshark)". But the PTP sync is sending packets back and forth
> between the host and guest and trying to figure out the round trip
> latency.

okay. To be more specific it took about ~20 minutes until I saw the
"Press Ctrl-C to stop recording..." message for 3 agents. Is that
normal?

- Alex
On Sat, Jul 30, 2022 at 4:00 AM Alexander Aring <aahringo@redhat.com> wrote:
>
> Hi,
>
> On Fri, Jul 29, 2022 at 8:55 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 29 Jul 2022 17:48:08 -0400
> > Alexander Aring <aahringo@redhat.com> wrote:
> >
> > > > Bah, I kept getting confused by when to use instance vs ctx->instance,
> > > > and I guess I messed this one up.
> > >
> > > I tested it and it seems to fix the problem..., so if it's not too late:
> > >
> > > Tested-by: Alexander Aring <aahringo@redhat.com>
> >
> > Not too late. I haven't downloaded the patch from patchwork yet (nor my
> > other patches).
> >
> > > I am not sure what I should expect from the PTP time synchronization
> > > over IP capable interfaces (it never worked for me) but I need to say
> > > it is significantly slower than kvm time synchronization with vsock
> > > and I am using only virtual interfaces. On the agents I get a couple
> > > of:
> >
> > Well, kvm time synchronization doesn't do much between the host and
> > guest. And I'm looking at making it do even less. That's because the
> > kvm synchronization is just "read the offset and shift of the guest
> > from the host and do the calculations via the reader (trace-cmd report
> > or kernelshark)". But the PTP sync is sending packets back and forth
> > between the host and guest and trying to figure out the round trip
> > latency.
>
> okay. To be more specific it took about ~20 minutes until I saw the
> "Press Ctrl-C to stop recording..." message for 3 agents. Is that
> normal?

It doesn't seem normal; it usually takes less than a minute when I tested
with 2 agents (VMs running on the same host). PTP works by exchanging a
lot of packets over the network between the host and the agent, so it
depends on the network and on the guest. Currently the PTP logic works
sequentially, one agent at a time. That could be optimized: the agents
could be processed in parallel. But definitely 20 minutes for 3 agents
(~7 min per agent) is not normal.

> - Alex
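For reference, the round-trip exchange described above is the classic two-way time transfer used by PTP/NTP-style protocols: each exchange yields four timestamps from which a clock offset and a path delay can be estimated. A sketch of that textbook calculation with hypothetical values (not the internal trace-cmd implementation):

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Classic two-way time transfer: t1 = client send, t2 = server receive,
 * t3 = server send, t4 = client receive, each taken on that side's own
 * clock. The numbers in main() are invented for illustration.
 */
static void two_way_estimate(int64_t t1, int64_t t2, int64_t t3, int64_t t4,
			     int64_t *offset, int64_t *delay)
{
	*offset = ((t2 - t1) - (t4 - t3)) / 2;	/* server clock minus client clock */
	*delay  = ((t2 - t1) + (t4 - t3)) / 2;	/* estimated one-way network delay */
}

int main(void)
{
	int64_t offset, delay;

	/* Hypothetical nanosecond timestamps for a single exchange. */
	two_way_estimate(1000, 6000, 6200, 1600, &offset, &delay);
	printf("offset = %lld ns, delay = %lld ns\n",
	       (long long)offset, (long long)delay);
	return 0;
}
```

In practice many such exchanges are made per agent and the samples are filtered, so with the current one-agent-at-a-time logic the total cost grows with the number of agents.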
diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
index 2406489a..50039dad 100644
--- a/tracecmd/trace-record.c
+++ b/tracecmd/trace-record.c
@@ -3934,15 +3934,15 @@ static void connect_to_agent(struct common_record_context *ctx,
 		use_fifos = nr_fifos > 0;
 	}
 
-	if (ctx->instance->result) {
+	if (instance->result) {
 		role = TRACECMD_TIME_SYNC_ROLE_CLIENT;
-		sd = connect_addr(ctx->instance->result);
+		sd = connect_addr(instance->result);
 		if (sd < 0)
 			die("Failed to connect to host %s:%u",
 			    instance->name, instance->port);
 	} else {
 		/* If connecting to a proxy, then this is the guest */
-		if (is_proxy(ctx->instance))
+		if (is_proxy(instance))
 			role = TRACECMD_TIME_SYNC_ROLE_GUEST;
 		else
 			role = TRACECMD_TIME_SYNC_ROLE_HOST;
When trace-cmd initiates a connection to a trace agent over the network,
the logic in connect_to_agent() function incorrectly uses the last
instance saved in the trace context, instead of the actual instance
which is passed as input argument. This works if the remote agent is
set last on the command line, but causes a problem if there is more than
one agent or if there is a local buffer after the agent on the command
line.

Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
---
 tracecmd/trace-record.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
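As a self-contained illustration of the pattern the patch fixes, with hypothetical structure and function names rather than the real trace-cmd types: the context's instance pointer refers to whichever instance was parsed last on the command line, so using it instead of the function's instance argument sends every connection to that same, last agent.

```c
#include <stdio.h>

/* Simplified, made-up types standing in for the trace-cmd structures. */
struct instance {
	const char *name;
};

struct context {
	struct instance *instance;	/* points at the last instance parsed */
};

/* Buggy variant: ignores its argument and uses the context. */
static void connect_buggy(struct context *ctx, struct instance *inst)
{
	(void)inst;
	printf("connecting to %s\n", ctx->instance->name);
}

/* Fixed variant: uses the instance it was asked to connect. */
static void connect_fixed(struct context *ctx, struct instance *inst)
{
	(void)ctx;
	printf("connecting to %s\n", inst->name);
}

int main(void)
{
	struct instance agents[] = { { "agent-A" }, { "agent-B" } };
	struct context ctx = { .instance = &agents[1] };	/* last on the "command line" */

	/* Buggy: both connection attempts end up targeting agent-B. */
	connect_buggy(&ctx, &agents[0]);
	connect_buggy(&ctx, &agents[1]);

	/* Fixed: each connection targets the agent it was called for. */
	connect_fixed(&ctx, &agents[0]);
	connect_fixed(&ctx, &agents[1]);

	return 0;
}
```

The fix in the diff above is the second variant: use the instance that was passed in, not ctx->instance.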