Message ID | 20220729040116.175015-1-tz.stoyanov@gmail.com |
---|---|
State | Accepted |
Commit | 39ec10aad3e13c579851e41a8dfa504395e6a5f1 |
Series | trace-cmd: Do not use instance from trace context |
On Fri, 29 Jul 2022 07:01:16 +0300
"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:

> When trace-cmd initiates a connection to a trace agent over the network,
> the logic in connect_to_agent() function incorrectly uses the last
> instance saved in the trace context, instead of the actual instance
> which is passed as input argument. This works if the remote agent is
> set last on the command line, but causes a problem if there is more than
> one agent or if there is a local buffer after the agent on the command
> line.
>
> Reported-by: Alexander Aring <aahringo@redhat.com>
> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
> ---
>  tracecmd/trace-record.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
> index 2406489a..50039dad 100644
> --- a/tracecmd/trace-record.c
> +++ b/tracecmd/trace-record.c
> @@ -3934,15 +3934,15 @@ static void connect_to_agent(struct common_record_context *ctx,
>  		use_fifos = nr_fifos > 0;
>  	}
>
> -	if (ctx->instance->result) {
> +	if (instance->result) {

Bah, I kept getting confused by when to use instance vs ctx->instance,
and I guess I messed this one up.

Thanks Tzvetomir for fixing it.

-- Steve

>  		role = TRACECMD_TIME_SYNC_ROLE_CLIENT;
> -		sd = connect_addr(ctx->instance->result);
> +		sd = connect_addr(instance->result);
>  		if (sd < 0)
>  			die("Failed to connect to host %s:%u",
>  			    instance->name, instance->port);
>  	} else {
>  		/* If connecting to a proxy, then this is the guest */
> -		if (is_proxy(ctx->instance))
> +		if (is_proxy(instance))
>  			role = TRACECMD_TIME_SYNC_ROLE_GUEST;
>  		else
>  			role = TRACECMD_TIME_SYNC_ROLE_HOST;
Hi,

On Fri, Jul 29, 2022 at 4:51 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 29 Jul 2022 07:01:16 +0300
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
>
> > When trace-cmd initiates a connection to a trace agent over the network,
> > the logic in connect_to_agent() function incorrectly uses the last
> > instance saved in the trace context, instead of the actual instance
> > which is passed as input argument. This works if the remote agent is
> > set last on the command line, but causes a problem if there is more than
> > one agent or if there is a local buffer after the agent on the command
> > line.
> >
> > Reported-by: Alexander Aring <aahringo@redhat.com>
> > Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
> > ---
> >  tracecmd/trace-record.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
> > index 2406489a..50039dad 100644
> > --- a/tracecmd/trace-record.c
> > +++ b/tracecmd/trace-record.c
> > @@ -3934,15 +3934,15 @@ static void connect_to_agent(struct common_record_context *ctx,
> >  		use_fifos = nr_fifos > 0;
> >  	}
> >
> > -	if (ctx->instance->result) {
> > +	if (instance->result) {
>
> Bah, I kept getting confused by when to use instance vs ctx->instance,
> and I guess I messed this one up.

I tested it and it seems to fix the problem..., so if it's not too late:

Tested-by: Alexander Aring <aahringo@redhat.com>

I am not sure what I should expect from the PTP time synchronization
over IP capable interfaces (it never worked for me) but I need to say
it is significantly slower than kvm time synchronization with vsock
and I am using only virtual interfaces. On the agents I get a couple
of:

CPU 1: 787 events lost
CPU 5: 3059 events lost
...

The result looks to me like garbage too; my lock states do not make
any sense... (maybe related to the events lost?)

However I think we should move this discussion to bugzilla?

- Alex
On Fri, 29 Jul 2022 17:48:08 -0400
Alexander Aring <aahringo@redhat.com> wrote:

> > Bah, I kept getting confused by when to use instance vs ctx->instance,
> > and I guess I messed this one up.
>
> I tested it and it seems to fix the problem..., so if it's not too late:
>
> Tested-by: Alexander Aring <aahringo@redhat.com>

Not too late. I haven't downloaded the patch from patchwork yet (nor my
other patches).

> I am not sure what I should expect from the PTP time synchronization
> over IP capable interfaces (it never worked for me) but I need to say
> it is significantly slower than kvm time synchronization with vsock
> and I am using only virtual interfaces. On the agents I get a couple
> of:

Well, kvm time synchronization doesn't do much between the host and
guest. And I'm looking at making it do even less. That's because the
kvm synchronization is just "read the offset and shift of the guest
from the host and do the calculations via the reader (trace-cmd report
or kernelshark)". But the PTP sync is sending packets back and forth
between the host and guest and trying to figure out the round trip
latency.

> CPU 1: 787 events lost
> CPU 5: 3059 events lost
> ...
>
> The result looks to me like garbage too; my lock states do not make
> any sense... (maybe related to the events lost?)

This should be investigated.

> However I think we should move this discussion to bugzilla?

Sure.

-- Steve
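To make the contrast concrete, the kvm-style correction described above amounts to scaling a guest timestamp by a multiplier/shift pair and adding an offset, with the arithmetic done later by the reader. Below is a minimal sketch of that general scheme; the helper name and the values are invented for illustration, and this is not the actual trace-cmd code:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * General offset/shift correction applied by a trace reader after the
 * fact: host_ts = ((guest_ts * mult) >> shift) + offset. The mult,
 * shift and offset values below are invented; the real ones would be
 * read from the host. Real implementations also guard against the
 * 64-bit overflow that guest_ts * mult can produce.
 */
static uint64_t guest_ts_to_host(uint64_t guest_ts, uint64_t mult,
				 unsigned int shift, int64_t offset)
{
	return ((guest_ts * mult) >> shift) + offset;
}

int main(void)
{
	uint64_t mult = 1ULL << 16;	/* ratio of 1.0, expressed as mult / 2^shift */
	unsigned int shift = 16;
	int64_t offset = 123456;	/* host clock minus scaled guest clock */

	printf("host ts = %llu\n",
	       (unsigned long long)guest_ts_to_host(1000000, mult, shift, offset));
	return 0;
}
```

Because the parameters are simply read out on the host, no traffic to the guest is needed, which is why this method is essentially free compared to a packet-exchange protocol.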
Hi,

On Fri, Jul 29, 2022 at 8:55 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 29 Jul 2022 17:48:08 -0400
> Alexander Aring <aahringo@redhat.com> wrote:
>
> > > Bah, I kept getting confused by when to use instance vs ctx->instance,
> > > and I guess I messed this one up.
> >
> > I tested it and it seems to fix the problem..., so if it's not too late:
> >
> > Tested-by: Alexander Aring <aahringo@redhat.com>
>
> Not too late. I haven't downloaded the patch from patchwork yet (nor my
> other patches).
>
> > I am not sure what I should expect from the PTP time synchronization
> > over IP capable interfaces (it never worked for me) but I need to say
> > it is significantly slower than kvm time synchronization with vsock
> > and I am using only virtual interfaces. On the agents I get a couple
> > of:
>
> Well, kvm time synchronization doesn't do much between the host and
> guest. And I'm looking at making it do even less. That's because the
> kvm synchronization is just "read the offset and shift of the guest
> from the host and do the calculations via the reader (trace-cmd report
> or kernelshark)". But the PTP sync is sending packets back and forth
> between the host and guest and trying to figure out the round trip
> latency.

okay. To be more specific it took about ~20 minutes until I saw the
"Press Ctrl-C to stop recording..." message for 3 agents. Is that
normal?

- Alex
On Sat, Jul 30, 2022 at 4:00 AM Alexander Aring <aahringo@redhat.com> wrote:
>
> Hi,
>
> On Fri, Jul 29, 2022 at 8:55 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 29 Jul 2022 17:48:08 -0400
> > Alexander Aring <aahringo@redhat.com> wrote:
> >
> > > > Bah, I kept getting confused by when to use instance vs ctx->instance,
> > > > and I guess I messed this one up.
> > >
> > > I tested it and it seems to fix the problem..., so if it's not too late:
> > >
> > > Tested-by: Alexander Aring <aahringo@redhat.com>
> >
> > Not too late. I haven't downloaded the patch from patchwork yet (nor my
> > other patches).
> >
> > > I am not sure what I should expect from the PTP time synchronization
> > > over IP capable interfaces (it never worked for me) but I need to say
> > > it is significantly slower than kvm time synchronization with vsock
> > > and I am using only virtual interfaces. On the agents I get a couple
> > > of:
> >
> > Well, kvm time synchronization doesn't do much between the host and
> > guest. And I'm looking at making it do even less. That's because the
> > kvm synchronization is just "read the offset and shift of the guest
> > from the host and do the calculations via the reader (trace-cmd report
> > or kernelshark)". But the PTP sync is sending packets back and forth
> > between the host and guest and trying to figure out the round trip
> > latency.
>
> okay. To be more specific it took about ~20 minutes until I saw the
> "Press Ctrl-C to stop recording..." message for 3 agents. Is that
> normal?

It doesn't seem normal; it usually takes less than a minute when I tested
with 2 agents (VMs running on the same host). PTP works by exchanging a
lot of packets over the network between the host and the agent, so it
depends on the network and on the guest. Currently the PTP logic works
sequentially, one agent at a time. That could be optimized: the agents
could be processed in parallel. But definitely 20 minutes for 3 agents
(~7 min per agent) is not normal.

> - Alex
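For reference, the round-trip exchange described above is the classic two-way time transfer used by PTP/NTP-style protocols: each exchange yields four timestamps from which a clock offset and a path delay can be estimated. A sketch of that textbook calculation with hypothetical values (not the internal trace-cmd implementation):

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Classic two-way time transfer: t1 = client send, t2 = server receive,
 * t3 = server send, t4 = client receive, each taken on that side's own
 * clock. The numbers in main() are invented for illustration.
 */
static void two_way_estimate(int64_t t1, int64_t t2, int64_t t3, int64_t t4,
			     int64_t *offset, int64_t *delay)
{
	*offset = ((t2 - t1) - (t4 - t3)) / 2;	/* server clock minus client clock */
	*delay  = ((t2 - t1) + (t4 - t3)) / 2;	/* estimated one-way network delay */
}

int main(void)
{
	int64_t offset, delay;

	/* Hypothetical nanosecond timestamps for a single exchange. */
	two_way_estimate(1000, 6000, 6200, 1600, &offset, &delay);
	printf("offset = %lld ns, delay = %lld ns\n",
	       (long long)offset, (long long)delay);
	return 0;
}
```

In practice many such exchanges are made per agent and the samples are filtered, so with the current one-agent-at-a-time logic the total cost grows with the number of agents.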
diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
index 2406489a..50039dad 100644
--- a/tracecmd/trace-record.c
+++ b/tracecmd/trace-record.c
@@ -3934,15 +3934,15 @@ static void connect_to_agent(struct common_record_context *ctx,
 		use_fifos = nr_fifos > 0;
 	}
 
-	if (ctx->instance->result) {
+	if (instance->result) {
 		role = TRACECMD_TIME_SYNC_ROLE_CLIENT;
-		sd = connect_addr(ctx->instance->result);
+		sd = connect_addr(instance->result);
 		if (sd < 0)
 			die("Failed to connect to host %s:%u",
 			    instance->name, instance->port);
 	} else {
 		/* If connecting to a proxy, then this is the guest */
-		if (is_proxy(ctx->instance))
+		if (is_proxy(instance))
 			role = TRACECMD_TIME_SYNC_ROLE_GUEST;
 		else
 			role = TRACECMD_TIME_SYNC_ROLE_HOST;
When trace-cmd initiates a connection to a trace agent over the network,
the logic in connect_to_agent() function incorrectly uses the last
instance saved in the trace context, instead of the actual instance
which is passed as input argument. This works if the remote agent is
set last on the command line, but causes a problem if there is more than
one agent or if there is a local buffer after the agent on the command
line.

Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
---
 tracecmd/trace-record.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
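As a self-contained illustration of the pattern the patch fixes, with hypothetical structure and function names rather than the real trace-cmd types: the context's instance pointer refers to whichever instance was parsed last on the command line, so using it instead of the function's instance argument sends every connection to that same, last agent.

```c
#include <stdio.h>

/* Simplified, made-up types standing in for the trace-cmd structures. */
struct instance {
	const char *name;
};

struct context {
	struct instance *instance;	/* points at the last instance parsed */
};

/* Buggy variant: ignores its argument and uses the context. */
static void connect_buggy(struct context *ctx, struct instance *inst)
{
	(void)inst;
	printf("connecting to %s\n", ctx->instance->name);
}

/* Fixed variant: uses the instance it was asked to connect. */
static void connect_fixed(struct context *ctx, struct instance *inst)
{
	(void)ctx;
	printf("connecting to %s\n", inst->name);
}

int main(void)
{
	struct instance agents[] = { { "agent-A" }, { "agent-B" } };
	struct context ctx = { .instance = &agents[1] };	/* last on the "command line" */

	/* Buggy: both connection attempts end up targeting agent-B. */
	connect_buggy(&ctx, &agents[0]);
	connect_buggy(&ctx, &agents[1]);

	/* Fixed: each connection targets the agent it was called for. */
	connect_fixed(&ctx, &agents[0]);
	connect_fixed(&ctx, &agents[1]);

	return 0;
}
```

The fix in the diff above is the second variant: use the instance that was passed in, not ctx->instance.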