
[v3,net-next,3/3] net: ena: Add PHC documentation

Message ID 20241103113140.275-4-darinzon@amazon.com
State Superseded
Delegated to: Netdev Maintainers
Series PHC support in ENA driver

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 3 maintainers not CCed: horms@kernel.org corbet@lwn.net linux-doc@vger.kernel.org
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 90 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-11-03--21-00 (tests: 781)

Commit Message

Arinzon, David Nov. 3, 2024, 11:31 a.m. UTC
Provide the relevant information and guidelines
about PHC support in the ENA driver.

Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
---
 .../device_drivers/ethernet/amazon/ena.rst    | 78 +++++++++++++++++++
 1 file changed, 78 insertions(+)

Comments

Jakub Kicinski Nov. 5, 2024, 2:17 a.m. UTC | #1
On Sun, 3 Nov 2024 13:31:39 +0200 David Arinzon wrote:
> +=================   ======================================================
> +**phc_cnt**         Number of successful retrieved timestamps (below expire timeout).
> +**phc_exp**         Number of expired retrieved timestamps (above expire timeout).
> +**phc_skp**         Number of skipped get time attempts (during block period).
> +**phc_err**         Number of failed get time attempts (entering into block state).
> +=================   ======================================================

I seem to recall we had an unpleasant conversation about using standard
stats recently. Please tell me where you looked to check if Linux has
standard stats for packet timestamping. We need to add the right info
there.
Arinzon, David Nov. 5, 2024, 10:52 a.m. UTC | #2
> > +=================   ======================================================
> > +**phc_cnt**         Number of successful retrieved timestamps (below expire timeout).
> > +**phc_exp**         Number of expired retrieved timestamps (above expire timeout).
> > +**phc_skp**         Number of skipped get time attempts (during block period).
> > +**phc_err**         Number of failed get time attempts (entering into block state).
> > +=================   ======================================================
> 
> I seem to recall we had an unpleasant conversation about using standard
> stats recently. Please tell me where you looked to check if Linux has standard
> stats for packet timestamping. We need to add the right info there.
> --
> pw-bot: cr

Hi Jakub,

Just wanted to clarify that this feature and the associated documentation are specifically intended for reading a HW timestamp,
not for TX/RX packet timestamping.
We reviewed similar drivers that support HW timestamping via `gettime64` and `gettimex64` APIs,
and we couldn't identify any that capture or report statistics related to reading a HW timestamp.
Let us know if further details would be helpful.
Gal Pressman Nov. 5, 2024, 4:16 p.m. UTC | #3
On 05/11/2024 12:52, Arinzon, David wrote:
>>> +=================   ======================================================
>>> +**phc_cnt**         Number of successful retrieved timestamps (below expire timeout).
>>> +**phc_exp**         Number of expired retrieved timestamps (above expire timeout).
>>> +**phc_skp**         Number of skipped get time attempts (during block period).
>>> +**phc_err**         Number of failed get time attempts (entering into block state).
>>> +=================   ======================================================
>>
>> I seem to recall we had an unpleasant conversation about using standard
>> stats recently. Please tell me where you looked to check if Linux has standard
>> stats for packet timestamping. We need to add the right info there.
>> --
>> pw-bot: cr
> 
> Hi Jakub,
> 
> Just wanted to clarify that this feature and the associated documentation are specifically intended for reading a HW timestamp,
> not for TX/RX packet timestamping.
> We reviewed similar drivers that support HW timestamping via `gettime64` and `gettimex64` APIs,
> and we couldn't identify any that capture or report statistics related to reading a HW timestamp.
> Let us know if further details would be helpful.

David, did you consider Rahul's recent timestamping stats API?
0e9c127729be ("ethtool: add interface to read Tx hardware timestamping
statistics")
Arinzon, David Nov. 5, 2024, 4:52 p.m. UTC | #4
> >>> +=================   ======================================================
> >>> +**phc_cnt**         Number of successful retrieved timestamps (below expire timeout).
> >>> +**phc_exp**         Number of expired retrieved timestamps (above expire timeout).
> >>> +**phc_skp**         Number of skipped get time attempts (during block period).
> >>> +**phc_err**         Number of failed get time attempts (entering into block state).
> >>> +=================   ======================================================
> >>
> >> I seem to recall we had an unpleasant conversation about using
> >> standard stats recently. Please tell me where you looked to check if
> >> Linux has standard stats for packet timestamping. We need to add the
> >> right info there.
> >> --
> >> pw-bot: cr
> >
> > Hi Jakub,
> >
> > Just wanted to clarify that this feature and the associated
> > documentation are specifically intended for reading a HW timestamp,
> > not for TX/RX packet timestamping.
> > We reviewed similar drivers that support HW timestamping via
> > `gettime64` and `gettimex64` APIs, and we couldn't identify any that
> > capture or report statistics related to reading a HW timestamp.
> > Let us know if further details would be helpful.
> 
> David, did you consider Rahul's recent timestamping stats API?
> 0e9c127729be ("ethtool: add interface to read Tx hardware timestamping
> statistics")

Hi Gal,

We've looked into the `get_ts_stats` ethtool hook; it refers to TX HW packet timestamping,
not to the HW timestamp that is retrieved through `gettime64` and `gettimex64`.
Jakub Kicinski Nov. 6, 2024, 1:28 a.m. UTC | #5
On Tue, 5 Nov 2024 10:52:12 +0000 Arinzon, David wrote:
> Just wanted to clarify that this feature and the associated
> documentation are specifically intended for reading a HW timestamp,
> not for TX/RX packet timestamping.

Oh, so you're saying you can only read the clock from the device?
The word timestamp means time associated with an event.

In the doc you talk about:

> +PHC support and capabilities can be verified using ethtool:
> +
> +.. code-block:: shell
> +
> +  ethtool -T <interface>

which is for packet timestamping

also:

> ENA Linux driver supports PTP hardware clock providing timestamp
> reference to achieve nanosecond accuracy.

You probably want to double check the definitions of accuracy and
resolution.

We recently merged an Amazon PTP clock driver from David Woodhouse, 
see commit 20503272422693. If you're not timestamping packets why
not use that driver?
Rahul Rameshbabu Nov. 6, 2024, 2:02 a.m. UTC | #6
On Tue, 05 Nov, 2024 16:52:11 +0000 "Arinzon, David" <darinzon@amazon.com> wrote:
>> >>> +=================   ======================================================
>> >>> +**phc_cnt**         Number of successful retrieved timestamps (below expire timeout).
>> >>> +**phc_exp**         Number of expired retrieved timestamps (above expire timeout).
>> >>> +**phc_skp**         Number of skipped get time attempts (during block period).
>> >>> +**phc_err**         Number of failed get time attempts (entering into block state).
>> >>> +=================   ======================================================
>> >>
>> >> I seem to recall we had an unpleasant conversation about using
>> >> standard stats recently. Please tell me where you looked to check if
>> >> Linux has standard stats for packet timestamping. We need to add the
>> >> right info there.
>> >> --
>> >> pw-bot: cr
>> >
>> > Hi Jakub,
>> >
>> > Just wanted to clarify that this feature and the associated
>> > documentation are specifically intended for reading a HW timestamp,
>> > not for TX/RX packet timestamping.
>> > We reviewed similar drivers that support HW timestamping via
>> > `gettime64` and `gettimex64` APIs, and we couldn't identify any that
>> > capture or report statistics related to reading a HW timestamp.
>> > Let us know if further details would be helpful.
>> 
>> David, did you consider Rahul's recent timestamping stats API?
>> 0e9c127729be ("ethtool: add interface to read Tx hardware timestamping
>> statistics")
>
> Hi Gal,
>
> We've looked into the `get_ts_stats` ethtool hook, and it refers to TX HW packet timestamping
> and not HW timestamp which is retrieved through `gettime64` and `gettimex64`.

Hi folks,

I think everyone might be on the same page now, but I wanted to provide
some clarifications just in case.

The use case that the TX HW timestamping statistics cover:


    ┌─────────────────────────────────────┐
    │                                     │
    │                                     │
    │                              ┌──────┤
    │           NIC hw             │      │             Packets out
    │                              │  PF  ├────────────────────────────────────────────────►
    │                              │      │
    │                              └───┬──┤
    │                                  │  │
    │                                  │  │
    │                                  │  │
    │                             ┌────┘  │
    │                             │       │
    └─────────────────────────────┼───────┘
                                  │Hw timestamp information per packet
    ┌─────────────────────────────┼───────┐
    │                             │       │
    │                       ┌─────▼────┐  │
    │                       │  cmsg    │  │
    │                       │          │  │
    │                       │  queue   │  │
    │                       │          │  │
    │                       └──────────┘  │
    │                                     │
    │                                     │
    │         Linux Kernel Stack          │
    │                                     │
    │                                     │
    │                                     │
    └─────────────────────────────────────┘

We are collecting statistics on every packet being sent out the wire.
The use case being described here:


    ┌──────────────────────────────────┐
    │                                  │
    │                                  │
    │                          ┌───────┤
    │            NIC hw        │       │
    │                          │   PF  │
    │    ┌───────────────────┐ │       │
    │    │                   │ └───────┤
    │    │        PHC        │         │
    │    │                   │         │
    │    │                   │         │
    │    │ (just a clock dev)│         │
    │    └─────────┬─────────┘         │
    │              │                   │
    └──────────────┼───────────────────┘
                   │ Query device's clock for current time
    ┌──────────────┼───────────────────┐      ┌─────────────────────────────────┐
    │              │                   │      │                                 │
    │              │                   │      │                                 │
    │   ┌──────────┼─────────────┐     │      │                                 │
    │   │          │             │     │      │    ┌───────────────────────┐    │
    │   │          │             │     │      │    │                       │    │
    │   │          ▼          ───┼─────┼──────┼────┼──────────►            │    │
    │   │                        │     │      │    │                       │    │
    │   │ .gettimex64 callback   │     │      │    │  clock_gettime syscall│    │
    │   └────────────────────────┘     │      │    └───────────────────────┘    │
    │                                  │      │                                 │
    │        Linux Kernel Stack        │      │            Userspace            │
    │                                  │      │                                 │
    └──────────────────────────────────┘      └─────────────────────────────────┘

The model above is about getting the time from the NIC hardware in the
userspace application (which has no involvement with TX/RX traffic).

I do think the phc-to-host related statistics are on the niche side of
things. The following drivers are error free in their gettimex64 paths.

* AMD pensando/ionic
* Broadcom Tigon3
* Intel ixgbe
* Intel igc
* NVIDIA mlxsw
* NVIDIA mlx5_core

The above drivers would definitely not benefit from having "phc
(nic)"-to-host related statistics being presented here. I am more in
favor of making these statistics specific to amazon's ENA driver since I
think most drivers do not have a complex . Also, what value is there in
the count of phc-to-host successful/failed operations versus just
keeping track of the errors in userspace for whoever is calling
clock_gettime? I am somewhat ok with these counters, but I honestly
cannot imagine any practical use to this especially since they are not
related to anything over-the-wire. So the errors in userspace would be
enough of an indicator of whether there is excessive utilization of the
requests and the counters seem redundant to that (at least to me). Feel
free to share how you feel these counters would be helpful beyond
handling the return codes through clock_gettime. I might just be missing
something.

Hope this helps.
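The second model in the diagrams above (userspace reading the NIC clock through `clock_gettime`, with no packet involvement) can be sketched in C as follows. This is a minimal illustration, assuming the PHC is exposed as a character device such as `/dev/ptp0`; the helper names `phc_fd_to_clockid` and `read_phc_time` are hypothetical, and the clock-ID encoding follows the `FD_TO_CLOCKID` convention used by `testptp.c`.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: turn an open /dev/ptpN descriptor into a dynamic
 * POSIX clock ID, following FD_TO_CLOCKID() from testptp.c (CLOCKFD == 3).
 * Unsigned arithmetic avoids left-shifting a negative value. */
static clockid_t phc_fd_to_clockid(int fd)
{
	return (clockid_t)((~(unsigned int)fd << 3) | 3u);
}

/* Hypothetical helper: read the current PHC time.
 * Returns 0 on success, -1 on error with errno set. */
static int read_phc_time(const char *dev_path, struct timespec *ts)
{
	int fd = open(dev_path, O_RDONLY);

	if (fd < 0)
		return -1;

	/* The PTP core services this through the driver's .gettimex64
	 * (or .gettime64) callback. */
	int ret = clock_gettime(phc_fd_to_clockid(fd), ts);

	close(fd);
	return ret;
}
```

For example, `read_phc_time("/dev/ptp0", &ts)` returns 0 and fills `ts` when the PHC is readable, or -1 with `errno` set when the device node is missing or the driver rejects the request; a caller can count these return codes itself, which is the userspace alternative to driver counters discussed above.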
Jakub Kicinski Nov. 6, 2024, 2:15 a.m. UTC | #7
On Tue, 05 Nov 2024 18:02:57 -0800 Rahul Rameshbabu wrote:
> I do think the phc-to-host related statistics are on the niche side of
> things. The following drivers are error free in their gettimex64 paths.
> 
> * AMD pensando/ionic
> * Broadcom Tigon3
> * Intel ixgbe
> * Intel igc
> * NVIDIA mlxsw
> * NVIDIA mlx5_core
> 
> The above drivers would definitely not benefit from having "phc
> (nic)"-to-host related statistics being presented here. I am more in
> favor of making these statistics specific to amazon's ENA driver since I
> think most drivers do not have a complex . Also, what value is there in
> the count of phc-to-host successful/failed operations versus just
> keeping track of the errors in userspace for whoever is calling
> clock_gettime. I am somewhat ok with these counters, but I honestly
> cannot imagine any practical use to this especially since they are not
> related to anything over-the-wire. So the errors in userspace would be
> enough of an indicator of whether there is excessive utilization of the
> requests and the counters seem redundant to that (at least to me). Feel
> free to share how you feel these counters would be helpful beyond
> handling the return codes through clock_gettime. I might just be missing
> something.

Agreed, thanks a lot for the analysis. I misread the code.
I looked at ena_com_phc_get() last night and incorrectly 
assumed it's way too complex to be called from gettimex64 :S
Arinzon, David Nov. 12, 2024, 5:53 p.m. UTC | #8
Thank you, Rahul, for the detailed explanations.

Hi Jakub,

> > Just wanted to clarify that this feature and the associated
> > documentation are specifically intended for reading a HW timestamp,
> > not for TX/RX packet timestamping.
> 
> Oh, so you're saying you can only read the clock from the device?
> The word timestamp means time associated with an event.
> 

Based on the documentation of the gettimex64 API, the ts parameter
holds the PHC timestamp. We are using the same terminology:
https://elixir.bootlin.com/linux/v6.11.6/source/include/linux/ptp_clock_kernel.h#L97

 * @gettimex64:  Reads the current time from the hardware clock and optionally
 *               also the system clock.
 *               parameter ts: Holds the PHC timestamp.
 *               parameter sts: If not NULL, it holds a pair of timestamps from
 *               the system clock. The first reading is made right before
 *               reading the lowest bits of the PHC timestamp and the second
 *               reading immediately follows that.
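The `sts` pair described above is what userspace receives from the `PTP_SYS_OFFSET_EXTENDED` ioctl, which the PTP core serves through the driver's `.gettimex64` callback. A minimal sketch, assuming an already opened `/dev/ptpN` descriptor and the uapi definitions from `<linux/ptp_clock.h>`; `phc_offset_ns` and `read_phc_offset` are hypothetical helpers, and taking the midpoint of the two system readings is one common estimate rather than the only option.

```c
#include <linux/ptp_clock.h>
#include <string.h>
#include <sys/ioctl.h>

static long long ptp_time_to_ns(const struct ptp_clock_time *t)
{
	return (long long)t->sec * 1000000000LL + t->nsec;
}

/* Hypothetical helper: PHC offset relative to the system clock for one
 * sample, where s[0] = system time before, s[1] = PHC time and
 * s[2] = system time after. The system reading is approximated by the
 * midpoint of s[0] and s[2]. */
static long long phc_offset_ns(const struct ptp_clock_time s[3])
{
	long long before = ptp_time_to_ns(&s[0]);
	long long after = ptp_time_to_ns(&s[2]);

	return ptp_time_to_ns(&s[1]) - (before + (after - before) / 2);
}

/* Hypothetical helper: request a single extended sample from an open
 * /dev/ptpN descriptor. Returns 0 on success, -1 on error. */
static int read_phc_offset(int fd, long long *offset_ns)
{
	struct ptp_sys_offset_extended req;

	memset(&req, 0, sizeof(req));
	req.n_samples = 1;
	if (ioctl(fd, PTP_SYS_OFFSET_EXTENDED, &req) < 0)
		return -1;
	*offset_ns = phc_offset_ns(req.ts[0]);
	return 0;
}
```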

> In the doc you talk about:
> 
> > +PHC support and capabilities can be verified using ethtool:
> > +
> > +.. code-block:: shell
> > +
> > +  ethtool -T <interface>
> 
> which is for packet timestamping
> 

ethtool -T shows all timestamping capabilities, which include packet
timestamping but also the PTP Hardware Clock (PHC) index.
If the value is `none`, there is no PHC support.
This is reported by implementing the `get_ts_info` hook, which is
part of this patchset.

https://elixir.bootlin.com/linux/v6.11.6/source/include/linux/ethtool.h#L720

> also:
> 
> > ENA Linux driver supports PTP hardware clock providing timestamp
> > reference to achieve nanosecond accuracy.
> 
> You probably want to double check the definitions of accuracy and
> resolution.
> 

Thank you, will be changed in the next patchset

> We recently merged an Amazon PTP clock driver from David Woodhouse,
> see commit 20503272422693. If you're not timestamping packets why not use
> that driver?

The AMZNC10C vmclock device driver is intended to be used in systems where there's a hypervisor.
The PHC driver in this patchset is intended for virtual and non-virtual (metal) instances in AWS.
The AMZNC10C might not be available in the future on the same instances where PHC is available.

Patch

diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
index 4561e8ab..12665ea8 100644
--- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
+++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
@@ -56,6 +56,7 @@  ena_netdev.[ch]     Main Linux kernel driver.
 ena_ethtool.c       ethtool callbacks.
 ena_xdp.[ch]        XDP files
 ena_pci_id_tbl.h    Supported device IDs.
+ena_phc.[ch]        PTP hardware clock infrastructure (see `PHC`_ for more info)
 =================   ======================================================
 
 Management Interface:
@@ -221,6 +222,83 @@  descriptor it was received on would be recycled. When a packet smaller
 than RX copybreak bytes is received, it is copied into a new memory
 buffer and the RX descriptor is returned to HW.
 
+.. _`PHC`:
+
+PTP Hardware Clock (PHC)
+========================
+.. _`ptp-userspace-api`: https://docs.kernel.org/driver-api/ptp.html#ptp-hardware-clock-user-space-api
+.. _`testptp`: https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/ptp/testptp.c
+
+The ENA Linux driver supports a PTP hardware clock (PHC) that provides a timestamp reference with nanosecond resolution.
+
+**PHC support**
+
+The PHC depends on the PTP module, which must be either built into the kernel or loaded as a module.
+
+Verify that PTP support is enabled in the running kernel's configuration:
+
+.. code-block:: shell
+
+  grep -w '^CONFIG_PTP_1588_CLOCK=[ym]' /boot/config-$(uname -r)
+
+- If the command prints nothing, the ENA driver cannot be loaded with PHC support.
+
+- ``CONFIG_PTP_1588_CLOCK=y``: the PTP module is built into the kernel image.
+
+- ``CONFIG_PTP_1588_CLOCK=m``: the PTP module must be loaded before the ENA driver.
+
+Load the PTP module:
+
+.. code-block:: shell
+
+  sudo modprobe ptp
+
+All available PTP clock devices are listed under ``/sys/class/ptp``:
+
+.. code-block:: shell
+
+  ls /sys/class/ptp
+
+The PHC index of an interface (``none`` if the interface has no PHC) is reported by ethtool:
+
+.. code-block:: shell
+
+  ethtool -T <interface>
+
+**PHC timestamp**
+
+To retrieve a PHC timestamp, use the `ptp-userspace-api`_. For example, using `testptp`_:
+
+.. code-block:: shell
+
+  testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware Clock:/ {print $NF}') -k 1
+
+Keep the rate of PHC get time requests within reasonable bounds, since
+excessive polling degrades performance and efficiency.
+The ENA device limits PHC get time requests to a maximum of
+125 requests per second. If this limit is exceeded, the get time request
+fails and the ``phc_err`` counter is incremented.
+
+**PHC statistics**
+
+PHC activity can be monitored using the following :code:`ethtool -S` counters:
+
+=================   ======================================================
+**phc_cnt**         Number of successfully retrieved timestamps (below the expire timeout).
+**phc_exp**         Number of expired timestamp retrievals (above the expire timeout).
+**phc_skp**         Number of get time attempts skipped during the block period.
+**phc_err**         Number of failed get time attempts (each one enters the block state).
+=================   ======================================================
+
+PHC timeouts:
+
+=================   ======================================================
+**expire**          Maximum time allowed for a valid timestamp retrieval. Exceeding this threshold
+                    fails the get time request and blocks new requests until the block timeout.
+**block**           Blocking period that starts once a get time request expires or fails; all get
+                    time requests during the block period are skipped.
+=================   ======================================================
+
 Statistics
 ==========
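The documentation patch above caps PHC get time requests on the ENA device at 125 per second, that is, at most one request every 8 ms. A minimal client-side pacing sketch, assuming a single caller measuring intervals with `CLOCK_MONOTONIC`; `phc_pace` and `phc_wait_ns` are hypothetical helpers, and other processes reading the same PHC are not accounted for.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* 125 requests per second means at least 8 ms between requests. */
#define PHC_MAX_REQ_PER_SEC 125
#define PHC_MIN_INTERVAL_NS (1000000000LL / PHC_MAX_REQ_PER_SEC)

/* Hypothetical helper: nanoseconds still to wait before the next request. */
static int64_t phc_wait_ns(int64_t last_req_ns, int64_t now_ns)
{
	int64_t elapsed = now_ns - last_req_ns;

	return elapsed >= PHC_MIN_INTERVAL_NS ? 0 : PHC_MIN_INTERVAL_NS - elapsed;
}

static int64_t monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: block until another request is allowed, then record
 * the issue time in *last_req_ns. Initialize *last_req_ns to
 * -PHC_MIN_INTERVAL_NS so the first call does not wait. The loop re-checks
 * after nanosleep() in case a signal ends the sleep early. */
static void phc_pace(int64_t *last_req_ns)
{
	int64_t wait;

	while ((wait = phc_wait_ns(*last_req_ns, monotonic_ns())) > 0) {
		struct timespec d = {
			.tv_sec = wait / 1000000000LL,
			.tv_nsec = wait % 1000000000LL,
		};
		nanosleep(&d, NULL);
	}
	*last_req_ns = monotonic_ns();
}
```

Calling `phc_pace(&last)` immediately before each PHC `clock_gettime()` keeps this caller at or below the documented rate, so its requests should not be rejected for exceeding it.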