diff mbox series

[v4,1/5] net-timestamp: COMPLETION timestamp on packet tx completion

Message ID b278a4f39101282e2d920fed482b914d23ffaac3.1739988644.git.pav@iki.fi (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series net: Bluetooth: add TX timestamping for ISO/L2CAP/SCO | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 18 this patch: 18
netdev/build_tools success Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers warning 8 maintainers not CCed: linux-doc@vger.kernel.org edumazet@google.com horms@kernel.org corbet@lwn.net kory.maincent@bootlin.com andrew@lunn.ch kuniyu@amazon.com pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 56 this patch: 56
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3307 this patch: 3307
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 56 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 12 this patch: 12
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-20--12-00 (tests: 893)

Commit Message

Pauli Virtanen Feb. 19, 2025, 6:13 p.m. UTC
Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
when hardware reports a packet completed.

Completion tstamp is useful for Bluetooth, as hardware timestamps do not
exist in the HCI specification except for ISO packets, and the hardware
has a queue where packets may wait.  In this case the software SND
timestamp only reflects the kernel-side part of the total latency
(usually small) and queue length (usually 0 unless HW buffers
congested), whereas the completion report time is more informative of
the true latency.

It may also be useful in other cases where HW TX timestamps cannot be
obtained and user wants to estimate an upper bound to when the TX
probably happened.

Signed-off-by: Pauli Virtanen <pav@iki.fi>
---

Notes:
    v4: changed SOF_TIMESTAMPING_TX_COMPLETION to only emit COMPLETION
        together with SND, to save a bit in skb_shared_info.tx_flags
    
        As it then cannot be set per-skb, reject setting it via CMSG.

 Documentation/networking/timestamping.rst | 9 +++++++++
 include/uapi/linux/errqueue.h             | 1 +
 include/uapi/linux/net_tstamp.h           | 6 ++++--
 net/core/sock.c                           | 2 ++
 net/ethtool/common.c                      | 1 +
 5 files changed, 17 insertions(+), 2 deletions(-)

Comments

Jason Xing Feb. 20, 2025, 1:09 a.m. UTC | #1
On Thu, Feb 20, 2025 at 2:15 AM Pauli Virtanen <pav@iki.fi> wrote:
>
> Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
> when hardware reports a packet completed.
>
> Completion tstamp is useful for Bluetooth, as hardware timestamps do not
> exist in the HCI specification except for ISO packets, and the hardware
> has a queue where packets may wait.  In this case the software SND
> timestamp only reflects the kernel-side part of the total latency
> (usually small) and queue length (usually 0 unless HW buffers
> congested), whereas the completion report time is more informative of
> the true latency.
>
> It may also be useful in other cases where HW TX timestamps cannot be
> obtained and user wants to estimate an upper bound to when the TX
> probably happened.
>
> Signed-off-by: Pauli Virtanen <pav@iki.fi>
> ---
>
> Notes:
>     v4: changed SOF_TIMESTAMPING_TX_COMPLETION to only emit COMPLETION
>         together with SND, to save a bit in skb_shared_info.tx_flags
>
>         As it then cannot be set per-skb, reject setting it via CMSG.
>
>  Documentation/networking/timestamping.rst | 9 +++++++++
>  include/uapi/linux/errqueue.h             | 1 +
>  include/uapi/linux/net_tstamp.h           | 6 ++++--
>  net/core/sock.c                           | 2 ++
>  net/ethtool/common.c                      | 1 +
>  5 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> index 61ef9da10e28..5034dfe326c0 100644
> --- a/Documentation/networking/timestamping.rst
> +++ b/Documentation/networking/timestamping.rst
> @@ -140,6 +140,15 @@ SOF_TIMESTAMPING_TX_ACK:
>    cumulative acknowledgment. The mechanism ignores SACK and FACK.
>    This flag can be enabled via both socket options and control messages.
>
> +SOF_TIMESTAMPING_TX_COMPLETION:
> +  Request tx timestamps on packet tx completion, for the packets that
> +  also have SOF_TIMESTAMPING_TX_SOFTWARE enabled.  The completion

Is it mandatory for other drivers that will try to use
SOF_TIMESTAMPING_TX_COMPLETION in the future? I can see you coupled
both of them in hci_conn_tx_queue in patch [2/5]. If so, it would be
better if you add the limitation in sock_set_timestamping() so that
the same rule can be applied to other drivers.

But may I ask why you tried to couple them so tight in the version?
Could you say more about this? It's optional, right? IIUC, you
expected the driver to have both timestamps and then calculate the
delta easily?

Thanks,
Jason


> +  timestamp is generated by the kernel when it receives a packet
> +  completion report from the hardware. Hardware may report multiple
> +  packets at once, and completion timestamps reflect the timing of the
> +  report and not actual tx time. This flag can be enabled *only*
> +  via a socket option.
> +
>
>  1.3.2 Timestamp Reporting
>  ^^^^^^^^^^^^^^^^^^^^^^^^^
> diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
> index 3c70e8ac14b8..1ea47309d772 100644
> --- a/include/uapi/linux/errqueue.h
> +++ b/include/uapi/linux/errqueue.h
> @@ -73,6 +73,7 @@ enum {
>         SCM_TSTAMP_SND,         /* driver passed skb to NIC, or HW */
>         SCM_TSTAMP_SCHED,       /* data entered the packet scheduler */
>         SCM_TSTAMP_ACK,         /* data acknowledged by peer */
> +       SCM_TSTAMP_COMPLETION,  /* packet tx completion */
>  };
>
>  #endif /* _UAPI_LINUX_ERRQUEUE_H */
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index 55b0ab51096c..383213de612a 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -44,8 +44,9 @@ enum {
>         SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
>         SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
>         SOF_TIMESTAMPING_OPT_RX_FILTER = (1 << 17),
> +       SOF_TIMESTAMPING_TX_COMPLETION = (1 << 18),
>
> -       SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_RX_FILTER,
> +       SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_COMPLETION,
>         SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>                                  SOF_TIMESTAMPING_LAST
>  };
> @@ -58,7 +59,8 @@ enum {
>  #define SOF_TIMESTAMPING_TX_RECORD_MASK        (SOF_TIMESTAMPING_TX_HARDWARE | \
>                                          SOF_TIMESTAMPING_TX_SOFTWARE | \
>                                          SOF_TIMESTAMPING_TX_SCHED | \
> -                                        SOF_TIMESTAMPING_TX_ACK)
> +                                        SOF_TIMESTAMPING_TX_ACK | \
> +                                        SOF_TIMESTAMPING_TX_COMPLETION)
>
>  /**
>   * struct so_timestamping - SO_TIMESTAMPING parameter
> diff --git a/net/core/sock.c b/net/core/sock.c
> index a197f0a0b878..76a5d5cb1e56 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2933,6 +2933,8 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>                 tsflags = *(u32 *)CMSG_DATA(cmsg);
>                 if (tsflags & ~SOF_TIMESTAMPING_TX_RECORD_MASK)
>                         return -EINVAL;
> +               if (tsflags & SOF_TIMESTAMPING_TX_COMPLETION)
> +                       return -EINVAL;
>
>                 sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
>                 sockc->tsflags |= tsflags;
> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> index 5489d0c9d13f..ed4d6a9f4d7e 100644
> --- a/net/ethtool/common.c
> +++ b/net/ethtool/common.c
> @@ -473,6 +473,7 @@ const char sof_timestamping_names[][ETH_GSTRING_LEN] = {
>         [const_ilog2(SOF_TIMESTAMPING_BIND_PHC)]     = "bind-phc",
>         [const_ilog2(SOF_TIMESTAMPING_OPT_ID_TCP)]   = "option-id-tcp",
>         [const_ilog2(SOF_TIMESTAMPING_OPT_RX_FILTER)] = "option-rx-filter",
> +       [const_ilog2(SOF_TIMESTAMPING_TX_COMPLETION)] = "tx-completion",
>  };
>  static_assert(ARRAY_SIZE(sof_timestamping_names) == __SOF_TIMESTAMPING_CNT);
>
> --
> 2.48.1
>
>
Willem de Bruijn Feb. 20, 2025, 2:35 a.m. UTC | #2
Jason Xing wrote:
> On Thu, Feb 20, 2025 at 2:15 AM Pauli Virtanen <pav@iki.fi> wrote:
> >
> > Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
> > when hardware reports a packet completed.
> >
> > Completion tstamp is useful for Bluetooth, as hardware timestamps do not
> > exist in the HCI specification except for ISO packets, and the hardware
> > has a queue where packets may wait.  In this case the software SND
> > timestamp only reflects the kernel-side part of the total latency
> > (usually small) and queue length (usually 0 unless HW buffers
> > congested), whereas the completion report time is more informative of
> > the true latency.
> >
> > It may also be useful in other cases where HW TX timestamps cannot be
> > obtained and user wants to estimate an upper bound to when the TX
> > probably happened.
> >
> > Signed-off-by: Pauli Virtanen <pav@iki.fi>
> > ---
> >
> > Notes:
> >     v4: changed SOF_TIMESTAMPING_TX_COMPLETION to only emit COMPLETION
> >         together with SND, to save a bit in skb_shared_info.tx_flags
> >
> >         As it then cannot be set per-skb, reject setting it via CMSG.
> >
> >  Documentation/networking/timestamping.rst | 9 +++++++++
> >  include/uapi/linux/errqueue.h             | 1 +
> >  include/uapi/linux/net_tstamp.h           | 6 ++++--
> >  net/core/sock.c                           | 2 ++
> >  net/ethtool/common.c                      | 1 +
> >  5 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> > index 61ef9da10e28..5034dfe326c0 100644
> > --- a/Documentation/networking/timestamping.rst
> > +++ b/Documentation/networking/timestamping.rst
> > @@ -140,6 +140,15 @@ SOF_TIMESTAMPING_TX_ACK:
> >    cumulative acknowledgment. The mechanism ignores SACK and FACK.
> >    This flag can be enabled via both socket options and control messages.
> >
> > +SOF_TIMESTAMPING_TX_COMPLETION:
> > +  Request tx timestamps on packet tx completion, for the packets that
> > +  also have SOF_TIMESTAMPING_TX_SOFTWARE enabled.  The completion
> 
> Is it mandatory for other drivers that will try to use
> SOF_TIMESTAMPING_TX_COMPLETION in the future? I can see you coupled
> both of them in hci_conn_tx_queue in patch [2/5]. If so, it would be
> better if you add the limitation in sock_set_timestamping() so that
> the same rule can be applied to other drivers.
> 
> But may I ask why you tried to couple them so tight in the version?
> Could you say more about this? It's optional, right? IIUC, you
> expected the driver to have both timestamps and then calculate the
> delta easily?

This is a workaround around the limited number of bits available in
skb_shared_info.tx_flags.

Pauli could claim last available bit 7.. but then you would need to
find another bit for SKBTX_BPF ;)

FWIW I think we could probably free up 1 or 2 bits if we look closely,
e.g., of SKBTX_HW_TSTAMP_USE_CYCLES or SKBTX_WIFI_STATUS.



But given that two uses for those bits are in review at the same time,
I doubt that this is the last such feature that is added.

This workaround is clever. Only, we're leaking implementation
limitations into the API, making it API non-uniform and thus more
complex.

On the one hand, the feature should work just like all the existing
ones, and thus be configurable independently, and maintaining a
consistent API. But, this does require a tx_flags bit. And the
same for each subsequent such feature that gets proposed.

On the other, if we can anticipate more such additional requests,
then perhaps it does make sense to use only one bit in the skb to
encode whether the skb is sampled (in this case SKBTX_SW_TSTAMP),
and use per-socket sk_tsflags to encode which types of timestamps
to generate for such sampled packets.

This is more or less how sampling in the new BPF mode works too.

It is just not how SO_TIMESTAMPING works for existing bits.


There's something to be said for either approach IMHO.
Jason Xing Feb. 20, 2025, 4:30 a.m. UTC | #3
On Thu, Feb 20, 2025 at 10:35 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Thu, Feb 20, 2025 at 2:15 AM Pauli Virtanen <pav@iki.fi> wrote:
> > >
> > > Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
> > > when hardware reports a packet completed.
> > >
> > > Completion tstamp is useful for Bluetooth, as hardware timestamps do not
> > > exist in the HCI specification except for ISO packets, and the hardware
> > > has a queue where packets may wait.  In this case the software SND
> > > timestamp only reflects the kernel-side part of the total latency
> > > (usually small) and queue length (usually 0 unless HW buffers
> > > congested), whereas the completion report time is more informative of
> > > the true latency.
> > >
> > > It may also be useful in other cases where HW TX timestamps cannot be
> > > obtained and user wants to estimate an upper bound to when the TX
> > > probably happened.
> > >
> > > Signed-off-by: Pauli Virtanen <pav@iki.fi>
> > > ---
> > >
> > > Notes:
> > >     v4: changed SOF_TIMESTAMPING_TX_COMPLETION to only emit COMPLETION
> > >         together with SND, to save a bit in skb_shared_info.tx_flags
> > >
> > >         As it then cannot be set per-skb, reject setting it via CMSG.
> > >
> > >  Documentation/networking/timestamping.rst | 9 +++++++++
> > >  include/uapi/linux/errqueue.h             | 1 +
> > >  include/uapi/linux/net_tstamp.h           | 6 ++++--
> > >  net/core/sock.c                           | 2 ++
> > >  net/ethtool/common.c                      | 1 +
> > >  5 files changed, 17 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> > > index 61ef9da10e28..5034dfe326c0 100644
> > > --- a/Documentation/networking/timestamping.rst
> > > +++ b/Documentation/networking/timestamping.rst
> > > @@ -140,6 +140,15 @@ SOF_TIMESTAMPING_TX_ACK:
> > >    cumulative acknowledgment. The mechanism ignores SACK and FACK.
> > >    This flag can be enabled via both socket options and control messages.
> > >
> > > +SOF_TIMESTAMPING_TX_COMPLETION:
> > > +  Request tx timestamps on packet tx completion, for the packets that
> > > +  also have SOF_TIMESTAMPING_TX_SOFTWARE enabled.  The completion
> >
> > Is it mandatory for other drivers that will try to use
> > SOF_TIMESTAMPING_TX_COMPLETION in the future? I can see you coupled
> > both of them in hci_conn_tx_queue in patch [2/5]. If so, it would be
> > better if you add the limitation in sock_set_timestamping() so that
> > the same rule can be applied to other drivers.
> >
> > But may I ask why you tried to couple them so tight in the version?
> > Could you say more about this? It's optional, right? IIUC, you
> > expected the driver to have both timestamps and then calculate the
> > delta easily?
>
> This is a workaround around the limited number of bits available in
> skb_shared_info.tx_flags.

Oh, I'm surprised I missed the point even though I revisited the
previous discussion.

Pauli, please add the limitation when users setsockopt in
sock_set_timestamping() :)

>
> Pauli could claim last available bit 7.. but then you would need to
> find another bit for SKBTX_BPF ;)

Right :D

>
> FWIW I think we could probably free up 1 or 2 bits if we look closely,
> e.g., of SKBTX_HW_TSTAMP_USE_CYCLES or SKBTX_WIFI_STATUS.

Good. Will you submit a patch series to do that, or...?

>
> But given that two uses for those bits are in review at the same time,
> I doubt that this is the last such feature that is added.
>
> This workaround is clever. Only, we're leaking implementation
> limitations into the API, making it API non-uniform and thus more
> complex.

Probably not a big deal because previously OPT_ID_TCP is also bound with OPT_ID?

>
> On the one hand, the feature should work just like all the existing
> ones, and thus be configurable independently, and maintaining a
> consistent API. But, this does require a tx_flags bit. And the
> same for each subsequent such feature that gets proposed.
>
> On the other, if we can anticipate more such additional requests,
> then perhaps it does make sense to use only one bit in the skb to
> encode whether the skb is sampled (in this case SKBTX_SW_TSTAMP),
> and use per-socket sk_tsflags to encode which types of timestamps
> to generate for such sampled packets.

Very good idea, I think.

>
> This is more or less how sampling in the new BPF mode works too.
>
> It is just not how SO_TIMESTAMPING works for existing bits.

If needed, especially when a new bit is added, I think we can refactor
this part in the future? Or can we adjust it in advance? Speaking of
the design itself, it's really good :)

Thanks,
Jason

>
>
> There's something to be said for either approach IMHO.
Willem de Bruijn Feb. 20, 2025, 3:37 p.m. UTC | #4
Jason Xing wrote:
> On Thu, Feb 20, 2025 at 10:35 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > On Thu, Feb 20, 2025 at 2:15 AM Pauli Virtanen <pav@iki.fi> wrote:
> > > >
> > > > Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
> > > > when hardware reports a packet completed.
> > > >
> > > > Completion tstamp is useful for Bluetooth, as hardware timestamps do not
> > > > exist in the HCI specification except for ISO packets, and the hardware
> > > > has a queue where packets may wait.  In this case the software SND
> > > > timestamp only reflects the kernel-side part of the total latency
> > > > (usually small) and queue length (usually 0 unless HW buffers
> > > > congested), whereas the completion report time is more informative of
> > > > the true latency.
> > > >
> > > > It may also be useful in other cases where HW TX timestamps cannot be
> > > > obtained and user wants to estimate an upper bound to when the TX
> > > > probably happened.
> > > >
> > > > Signed-off-by: Pauli Virtanen <pav@iki.fi>
> > > > ---
> > > >
> > > > Notes:
> > > >     v4: changed SOF_TIMESTAMPING_TX_COMPLETION to only emit COMPLETION
> > > >         together with SND, to save a bit in skb_shared_info.tx_flags
> > > >
> > > >         As it then cannot be set per-skb, reject setting it via CMSG.
> > > >
> > > >  Documentation/networking/timestamping.rst | 9 +++++++++
> > > >  include/uapi/linux/errqueue.h             | 1 +
> > > >  include/uapi/linux/net_tstamp.h           | 6 ++++--
> > > >  net/core/sock.c                           | 2 ++
> > > >  net/ethtool/common.c                      | 1 +
> > > >  5 files changed, 17 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> > > > index 61ef9da10e28..5034dfe326c0 100644
> > > > --- a/Documentation/networking/timestamping.rst
> > > > +++ b/Documentation/networking/timestamping.rst
> > > > @@ -140,6 +140,15 @@ SOF_TIMESTAMPING_TX_ACK:
> > > >    cumulative acknowledgment. The mechanism ignores SACK and FACK.
> > > >    This flag can be enabled via both socket options and control messages.
> > > >
> > > > +SOF_TIMESTAMPING_TX_COMPLETION:
> > > > +  Request tx timestamps on packet tx completion, for the packets that
> > > > +  also have SOF_TIMESTAMPING_TX_SOFTWARE enabled.  The completion
> > >
> > > Is it mandatory for other drivers that will try to use
> > > SOF_TIMESTAMPING_TX_COMPLETION in the future? I can see you coupled
> > > both of them in hci_conn_tx_queue in patch [2/5]. If so, it would be
> > > better if you add the limitation in sock_set_timestamping() so that
> > > the same rule can be applied to other drivers.
> > >
> > > But may I ask why you tried to couple them so tight in the version?
> > > Could you say more about this? It's optional, right? IIUC, you
> > > expected the driver to have both timestamps and then calculate the
> > > delta easily?
> >
> > This is a workaround around the limited number of bits available in
> > skb_shared_info.tx_flags.
> 
> Oh, I'm surprised I missed the point even though I revisited the
> previous discussion.
> 
> Pauli, please add the limitation when users setsockopt in
> sock_set_timestamping() :)
> 
> >
> > Pauli could claim last available bit 7.. but then you would need to
> > find another bit for SKBTX_BPF ;)
> 
> Right :D
> 
> >
> > FWIW I think we could probably free up 1 or 2 bits if we look closely,
> > e.g., of SKBTX_HW_TSTAMP_USE_CYCLES or SKBTX_WIFI_STATUS.
> 
> Good. Will you submit a patch series to do that, or...?

Reclaiming space is really up to whoever needs it.

I'll take a quick look, just to see if there is an obvious path and
we can postpone this whole conversation to next time we need a bit.

> 
> >
> > But given that two uses for those bits are in review at the same time,
> > I doubt that this is the last such feature that is added.
> >
> > This workaround is clever. Only, we're leaking implementation
> > limitations into the API, making it API non-uniform and thus more
> > complex.
> 
> Probably not a big deal because previously OPT_ID_TCP is also bound with OPT_ID?

That is also unfortunate. Ideally we had never needed OPT_ID_TCP.
 
> >
> > On the one hand, the feature should work just like all the existing
> > ones, and thus be configurable independently, and maintaining a
> > consistent API. But, this does require a tx_flags bit. And the
> > same for each subsequent such feature that gets proposed.
> >
> > On the other, if we can anticipate more such additional requests,
> > then perhaps it does make sense to use only one bit in the skb to
> > encode whether the skb is sampled (in this case SKBTX_SW_TSTAMP),
> > and use per-socket sk_tsflags to encode which types of timestamps
> > to generate for such sampled packets.
> 
> Very good idea, I think.

After giving it some thought, I prefer the first option to maintain
the same API. Let's see if we can harvest a bit and use that for this
new completion point.

> 
> >
> > This is more or less how sampling in the new BPF mode works too.
> >
> > It is just not how SO_TIMESTAMPING works for existing bits.
> 
> If needed, especially when a new bit is added, I think we can refactor
> this part in the future? Or can we adjust it in advance? Speaking of
> the design itself, it's really good :)

We cannot change existing behavior.

> Thanks,
> Jason
> 
> >
> >
> > There's something to be said for either approach IMHO.
Willem de Bruijn Feb. 20, 2025, 11:19 p.m. UTC | #5
Willem de Bruijn wrote:
> Jason Xing wrote:
> > On Thu, Feb 20, 2025 at 10:35 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Thu, Feb 20, 2025 at 2:15 AM Pauli Virtanen <pav@iki.fi> wrote:
> > > > >
> > > > > Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
> > > > > when hardware reports a packet completed.
> > > > >
> > > > > Completion tstamp is useful for Bluetooth, as hardware timestamps do not
> > > > > exist in the HCI specification except for ISO packets, and the hardware
> > > > > has a queue where packets may wait.  In this case the software SND
> > > > > timestamp only reflects the kernel-side part of the total latency
> > > > > (usually small) and queue length (usually 0 unless HW buffers
> > > > > congested), whereas the completion report time is more informative of
> > > > > the true latency.
> > > > >
> > > > > It may also be useful in other cases where HW TX timestamps cannot be
> > > > > obtained and user wants to estimate an upper bound to when the TX
> > > > > probably happened.
> > > > >
> > > > > Signed-off-by: Pauli Virtanen <pav@iki.fi>
> > > > > ---
> > > > >
> > > > > Notes:
> > > > >     v4: changed SOF_TIMESTAMPING_TX_COMPLETION to only emit COMPLETION
> > > > >         together with SND, to save a bit in skb_shared_info.tx_flags
> > > > >
> > > > >         As it then cannot be set per-skb, reject setting it via CMSG.
> > > > >
> > > > >  Documentation/networking/timestamping.rst | 9 +++++++++
> > > > >  include/uapi/linux/errqueue.h             | 1 +
> > > > >  include/uapi/linux/net_tstamp.h           | 6 ++++--
> > > > >  net/core/sock.c                           | 2 ++
> > > > >  net/ethtool/common.c                      | 1 +
> > > > >  5 files changed, 17 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> > > > > index 61ef9da10e28..5034dfe326c0 100644
> > > > > --- a/Documentation/networking/timestamping.rst
> > > > > +++ b/Documentation/networking/timestamping.rst
> > > > > @@ -140,6 +140,15 @@ SOF_TIMESTAMPING_TX_ACK:
> > > > >    cumulative acknowledgment. The mechanism ignores SACK and FACK.
> > > > >    This flag can be enabled via both socket options and control messages.
> > > > >
> > > > > +SOF_TIMESTAMPING_TX_COMPLETION:
> > > > > +  Request tx timestamps on packet tx completion, for the packets that
> > > > > +  also have SOF_TIMESTAMPING_TX_SOFTWARE enabled.  The completion
> > > >
> > > > Is it mandatory for other drivers that will try to use
> > > > SOF_TIMESTAMPING_TX_COMPLETION in the future? I can see you coupled
> > > > both of them in hci_conn_tx_queue in patch [2/5]. If so, it would be
> > > > better if you add the limitation in sock_set_timestamping() so that
> > > > the same rule can be applied to other drivers.
> > > >
> > > > But may I ask why you tried to couple them so tight in the version?
> > > > Could you say more about this? It's optional, right? IIUC, you
> > > > expected the driver to have both timestamps and then calculate the
> > > > delta easily?
> > >
> > > This is a workaround around the limited number of bits available in
> > > skb_shared_info.tx_flags.
> > 
> > Oh, I'm surprised I missed the point even though I revisited the
> > previous discussion.
> > 
> > Pauli, please add the limitation when users setsockopt in
> > sock_set_timestamping() :)
> > 
> > >
> > > Pauli could claim last available bit 7.. but then you would need to
> > > find another bit for SKBTX_BPF ;)
> > 
> > Right :D
> > 
> > >
> > > FWIW I think we could probably free up 1 or 2 bits if we look closely,
> > > e.g., of SKBTX_HW_TSTAMP_USE_CYCLES or SKBTX_WIFI_STATUS.
> > 
> > Good. Will you submit a patch series to do that, or...?
> 
> Reclaiming space is really up to whoever needs it.
> 
> I'll take a quick look, just to see if there is an obvious path and
> we can postpone this whole conversation to next time we need a bit.

SKBTX_HW_TSTAMP_USE_CYCLES is only true if SOF_TIMESTAMPING_BIND_PHC.
It cannot be set per cmsg (is not in SOF_TIMESTAMPING_TX_RECORD_MASK),
so no need to record it per skb.

It only has two drivers using it, which can easily be updated:

	-                if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_USE_CYCLES)
	+                if (skb->sk &&
	+                    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_BIND_PHC)
					tx_flags |= IGC_TX_FLAGS_TSTAMP_TIMER_1;

They later call skb_tstamp_tx, which does nothing if !skb->sk.
Only cost is a higher cost of accessing the sk cacheline.

SKBTX_WIFI_STATUS essentially follows the same argument. It can only
be set in the sockopt. It has a handful more callsites that would need
to be updated. sock_flag(sk, SOCK_WIFI_STATUS) will be tested without
the socket lock held. But this is already the case in the UDP lockless
fast path through ip_make_skb.

SKBTX_HW_TSTAMP_NETDEV is only used on Rx. Could shadow another bit
that is used only on Tx.

SKBTX_IN_PROGRESS is only used by the driver to suppress the software
tx timestamp from skb_tx_timestamp if a later hardware timestamp will
be generated. Predates SOF_TIMESTAMPING_OPT_TX_SWHW.

In short plenty of bits we can reclaim if we try.

SKBTX_BPF was just merged, so we will have to reclaim one. The first
one seems most straightforward.
Jason Xing Feb. 21, 2025, 12:26 a.m. UTC | #6
On Fri, Feb 21, 2025 at 7:19 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Willem de Bruijn wrote:
> > Jason Xing wrote:
> > > On Thu, Feb 20, 2025 at 10:35 AM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > Jason Xing wrote:
> > > > > On Thu, Feb 20, 2025 at 2:15 AM Pauli Virtanen <pav@iki.fi> wrote:
> > > > > >
> > > > > > Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp
> > > > > > when hardware reports a packet completed.
> > > > > >
> > > > > > Completion tstamp is useful for Bluetooth, as hardware timestamps do not
> > > > > > exist in the HCI specification except for ISO packets, and the hardware
> > > > > > has a queue where packets may wait.  In this case the software SND
> > > > > > timestamp only reflects the kernel-side part of the total latency
> > > > > > (usually small) and queue length (usually 0 unless HW buffers
> > > > > > congested), whereas the completion report time is more informative of
> > > > > > the true latency.
> > > > > >
> > > > > > It may also be useful in other cases where HW TX timestamps cannot be
> > > > > > obtained and user wants to estimate an upper bound to when the TX
> > > > > > probably happened.
> > > > > >
> > > > > > Signed-off-by: Pauli Virtanen <pav@iki.fi>
> > > > > > ---
> > > > > >
> > > > > > Notes:
> > > > > >     v4: changed SOF_TIMESTAMPING_TX_COMPLETION to only emit COMPLETION
> > > > > >         together with SND, to save a bit in skb_shared_info.tx_flags
> > > > > >
> > > > > >         As it then cannot be set per-skb, reject setting it via CMSG.
> > > > > >
> > > > > >  Documentation/networking/timestamping.rst | 9 +++++++++
> > > > > >  include/uapi/linux/errqueue.h             | 1 +
> > > > > >  include/uapi/linux/net_tstamp.h           | 6 ++++--
> > > > > >  net/core/sock.c                           | 2 ++
> > > > > >  net/ethtool/common.c                      | 1 +
> > > > > >  5 files changed, 17 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> > > > > > index 61ef9da10e28..5034dfe326c0 100644
> > > > > > --- a/Documentation/networking/timestamping.rst
> > > > > > +++ b/Documentation/networking/timestamping.rst
> > > > > > @@ -140,6 +140,15 @@ SOF_TIMESTAMPING_TX_ACK:
> > > > > >    cumulative acknowledgment. The mechanism ignores SACK and FACK.
> > > > > >    This flag can be enabled via both socket options and control messages.
> > > > > >
> > > > > > +SOF_TIMESTAMPING_TX_COMPLETION:
> > > > > > +  Request tx timestamps on packet tx completion, for the packets that
> > > > > > +  also have SOF_TIMESTAMPING_TX_SOFTWARE enabled.  The completion
> > > > >
> > > > > Is it mandatory for other drivers that will try to use
> > > > > SOF_TIMESTAMPING_TX_COMPLETION in the future? I can see you coupled
> > > > > both of them in hci_conn_tx_queue in patch [2/5]. If so, it would be
> > > > > better if you add the limitation in sock_set_timestamping() so that
> > > > > the same rule can be applied to other drivers.
> > > > >
> > > > > But may I ask why you tried to couple them so tight in the version?
> > > > > Could you say more about this? It's optional, right? IIUC, you
> > > > > expected the driver to have both timestamps and then calculate the
> > > > > delta easily?
> > > >
> > > > This is a workaround around the limited number of bits available in
> > > > skb_shared_info.tx_flags.
> > >
> > > Oh, I'm surprised I missed the point even though I revisited the
> > > previous discussion.
> > >
> > > Pauli, please add the limitation when users setsockopt in
> > > sock_set_timestamping() :)
> > >
> > > >
> > > > Pauli could claim last available bit 7.. but then you would need to
> > > > find another bit for SKBTX_BPF ;)
> > >
> > > Right :D
> > >
> > > >
> > > > FWIW I think we could probably free up 1 or 2 bits if we look closely,
> > > > e.g., of SKBTX_HW_TSTAMP_USE_CYCLES or SKBTX_WIFI_STATUS.
> > >
> > > Good. Will you submit a patch series to do that, or...?
> >
> > Reclaiming space is really up to whoever needs it.
> >
> > I'll take a quick look, just to see if there is an obvious path and
> > we can postpone this whole conversation to next time we need a bit.
>
> SKBTX_HW_TSTAMP_USE_CYCLES is only true if SOF_TIMESTAMPING_BIND_PHC.
> It cannot be set per cmsg (is not in SOF_TIMESTAMPING_TX_RECORD_MASK),
> so no need to record it per skb.

Those flags look like sub-features to me, not like the completion one.
Occupying one bit in the skb is luxury for them.

>
> It only has two drivers using it, which can easily be updated:
>
>         -                if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_USE_CYCLES)
>         +                if (skb->sk &&
>         +                    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_BIND_PHC)
>                                         tx_flags |= IGC_TX_FLAGS_TSTAMP_TIMER_1;
>
> They later call skb_tstamp_tx, which does nothing if !skb->sk.
> Only cost is a higher cost of accessing the sk cacheline.
>
> SKBTX_WIFI_STATUS essentially follows the same argument. It can only
> be set in the sockopt. It has a handful more callsites that would need
> to be updated. sock_flag(sk, SOCK_WIFI_STATUS) will be tested without
> the socket lock held. But this is already the case in the UDP lockless
> fast path through ip_make_skb.
>
> SKBTX_HW_TSTAMP_NETDEV is only used on Rx. Could shadow another bit
> that is used only on Tx.
>
> SKBTX_IN_PROGRESS is only used by the driver to suppress the software
> tx timestamp from skb_tx_timestamp if a later hardware timestamp will
> be generated. Predates SOF_TIMESTAMPING_OPT_TX_SWHW.

Thanks for the detailed analysis. I just checked them out and agreed.

> In short plenty of bits we can reclaim if we try.
>
> SKBTX_BPF was just merged, so we will have to reclaim one.

It's worth knowing that we probably will work on top of the bpf-next
net branch if so.

Do you want to reclaim every possible bit in one go? One series can
complete the work. // If there is anything, feel free to ask me to
implement/co-work :)

> The first one seems most straightforward.

If there are more flags than the tx_flags can have in the future, I
think we can turn to the second method you mentioned.

Could we harvest one or more to have a better uniformed design before
working on this completion feature, I'm wondering?

Thanks,
Jason
diff mbox series

Patch

diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
index 61ef9da10e28..5034dfe326c0 100644
--- a/Documentation/networking/timestamping.rst
+++ b/Documentation/networking/timestamping.rst
@@ -140,6 +140,15 @@  SOF_TIMESTAMPING_TX_ACK:
   cumulative acknowledgment. The mechanism ignores SACK and FACK.
   This flag can be enabled via both socket options and control messages.
 
+SOF_TIMESTAMPING_TX_COMPLETION:
+  Request tx timestamps on packet tx completion, for the packets that
+  also have SOF_TIMESTAMPING_TX_SOFTWARE enabled.  The completion
+  timestamp is generated by the kernel when it receives a packet
+  completion report from the hardware. Hardware may report multiple
+  packets at once, and completion timestamps reflect the timing of the
+  report and not actual tx time. This flag can be enabled *only*
+  via a socket option.
+
 
 1.3.2 Timestamp Reporting
 ^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index 3c70e8ac14b8..1ea47309d772 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -73,6 +73,7 @@  enum {
 	SCM_TSTAMP_SND,		/* driver passed skb to NIC, or HW */
 	SCM_TSTAMP_SCHED,	/* data entered the packet scheduler */
 	SCM_TSTAMP_ACK,		/* data acknowledged by peer */
+	SCM_TSTAMP_COMPLETION,	/* packet tx completion */
 };
 
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 55b0ab51096c..383213de612a 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -44,8 +44,9 @@  enum {
 	SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
 	SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
 	SOF_TIMESTAMPING_OPT_RX_FILTER = (1 << 17),
+	SOF_TIMESTAMPING_TX_COMPLETION = (1 << 18),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_RX_FILTER,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_COMPLETION,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
@@ -58,7 +59,8 @@  enum {
 #define SOF_TIMESTAMPING_TX_RECORD_MASK	(SOF_TIMESTAMPING_TX_HARDWARE | \
 					 SOF_TIMESTAMPING_TX_SOFTWARE | \
 					 SOF_TIMESTAMPING_TX_SCHED | \
-					 SOF_TIMESTAMPING_TX_ACK)
+					 SOF_TIMESTAMPING_TX_ACK | \
+					 SOF_TIMESTAMPING_TX_COMPLETION)
 
 /**
  * struct so_timestamping - SO_TIMESTAMPING parameter
diff --git a/net/core/sock.c b/net/core/sock.c
index a197f0a0b878..76a5d5cb1e56 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2933,6 +2933,8 @@  int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
 		tsflags = *(u32 *)CMSG_DATA(cmsg);
 		if (tsflags & ~SOF_TIMESTAMPING_TX_RECORD_MASK)
 			return -EINVAL;
+		if (tsflags & SOF_TIMESTAMPING_TX_COMPLETION)
+			return -EINVAL;
 
 		sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
 		sockc->tsflags |= tsflags;
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 5489d0c9d13f..ed4d6a9f4d7e 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -473,6 +473,7 @@  const char sof_timestamping_names[][ETH_GSTRING_LEN] = {
 	[const_ilog2(SOF_TIMESTAMPING_BIND_PHC)]     = "bind-phc",
 	[const_ilog2(SOF_TIMESTAMPING_OPT_ID_TCP)]   = "option-id-tcp",
 	[const_ilog2(SOF_TIMESTAMPING_OPT_RX_FILTER)] = "option-rx-filter",
+	[const_ilog2(SOF_TIMESTAMPING_TX_COMPLETION)] = "tx-completion",
 };
 static_assert(ARRAY_SIZE(sof_timestamping_names) == __SOF_TIMESTAMPING_CNT);