Message ID | 20241103113140.275-4-darinzon@amazon.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | PHC support in ENA driver |
On Sun, 3 Nov 2024 13:31:39 +0200 David Arinzon wrote:
> +================= ======================================================
> +**phc_cnt**       Number of successful retrieved timestamps (below expire timeout).
> +**phc_exp**       Number of expired retrieved timestamps (above expire timeout).
> +**phc_skp**       Number of skipped get time attempts (during block period).
> +**phc_err**       Number of failed get time attempts (entering into block state).
> +================= ======================================================

I seem to recall we had an unpleasant conversation about using standard
stats recently. Please tell me where you looked to check if Linux has standard
stats for packet timestamping. We need to add the right info there.
> > +================= ======================================================
> > +**phc_cnt**       Number of successful retrieved timestamps (below expire timeout).
> > +**phc_exp**       Number of expired retrieved timestamps (above expire timeout).
> > +**phc_skp**       Number of skipped get time attempts (during block period).
> > +**phc_err**       Number of failed get time attempts (entering into block state).
> > +================= ======================================================
>
> I seem to recall we had an unpleasant conversation about using standard
> stats recently. Please tell me where you looked to check if Linux has standard
> stats for packet timestamping. We need to add the right info there.
> --
> pw-bot: cr

Hi Jakub,

Just wanted to clarify that this feature and the associated documentation are
specifically intended for reading a HW timestamp, not for TX/RX packet timestamping.
We reviewed similar drivers that support HW timestamping via `gettime64` and
`gettimex64` APIs, and we couldn't identify any that capture or report
statistics related to reading a HW timestamp.
Let us know if further details would be helpful.
On 05/11/2024 12:52, Arinzon, David wrote:
>>> +================= ======================================================
>>> +**phc_cnt**       Number of successful retrieved timestamps (below expire timeout).
>>> +**phc_exp**       Number of expired retrieved timestamps (above expire timeout).
>>> +**phc_skp**       Number of skipped get time attempts (during block period).
>>> +**phc_err**       Number of failed get time attempts (entering into block state).
>>> +================= ======================================================
>>
>> I seem to recall we had an unpleasant conversation about using standard
>> stats recently. Please tell me where you looked to check if Linux has standard
>> stats for packet timestamping. We need to add the right info there.
>> --
>> pw-bot: cr
>
> Hi Jakub,
>
> Just wanted to clarify that this feature and the associated documentation are
> specifically intended for reading a HW timestamp, not for TX/RX packet timestamping.
> We reviewed similar drivers that support HW timestamping via `gettime64` and
> `gettimex64` APIs, and we couldn't identify any that capture or report
> statistics related to reading a HW timestamp.
> Let us know if further details would be helpful.

David, did you consider Rahul's recent timestamping stats API?
0e9c127729be ("ethtool: add interface to read Tx hardware timestamping statistics")
> >>> +================= ======================================================
> >>> +**phc_cnt**       Number of successful retrieved timestamps (below expire timeout).
> >>> +**phc_exp**       Number of expired retrieved timestamps (above expire timeout).
> >>> +**phc_skp**       Number of skipped get time attempts (during block period).
> >>> +**phc_err**       Number of failed get time attempts (entering into block state).
> >>> +================= ======================================================
> >>
> >> I seem to recall we had an unpleasant conversation about using
> >> standard stats recently. Please tell me where you looked to check if
> >> Linux has standard stats for packet timestamping. We need to add the
> >> right info there.
> >> --
> >> pw-bot: cr
> >
> > Hi Jakub,
> >
> > Just wanted to clarify that this feature and the associated
> > documentation are specifically intended for reading a HW timestamp, not
> > for TX/RX packet timestamping.
> > We reviewed similar drivers that support HW timestamping via
> > `gettime64` and `gettimex64` APIs, and we couldn't identify any that
> > capture or report statistics related to reading a HW timestamp.
> > Let us know if further details would be helpful.
>
> David, did you consider Rahul's recent timestamping stats API?
> 0e9c127729be ("ethtool: add interface to read Tx hardware timestamping
> statistics")

Hi Gal,

We've looked into the `get_ts_stats` ethtool hook, and it refers to TX HW packet timestamping
and not HW timestamp which is retrieved through `gettime64` and `gettimex64`.
On Tue, 5 Nov 2024 10:52:12 +0000 Arinzon, David wrote:
> Just wanted to clarify that this feature and the associated
> documentation are specifically intended for reading a HW timestamp,
> not for TX/RX packet timestamping.

Oh, so you're saying you can only read the clock from the device?
The word timestamp means time associated with an event.

In the doc you talk about:

> +PHC support and capabilities can be verified using ethtool:
> +
> +.. code-block:: shell
> +
> +  ethtool -T <interface>

which is for packet timestamping

also:

> ENA Linux driver supports PTP hardware clock providing timestamp
> reference to achieve nanosecond accuracy.

You probably want to double check the definitions of accuracy and
resolution.

We recently merged an Amazon PTP clock driver from David Woodhouse,
see commit 20503272422693. If you're not timestamping packets why not use
that driver?
On Tue, 05 Nov, 2024 16:52:11 +0000 "Arinzon, David" <darinzon@amazon.com> wrote:
>> >>> +================= ======================================================
>> >>> +**phc_cnt**       Number of successful retrieved timestamps (below expire timeout).
>> >>> +**phc_exp**       Number of expired retrieved timestamps (above expire timeout).
>> >>> +**phc_skp**       Number of skipped get time attempts (during block period).
>> >>> +**phc_err**       Number of failed get time attempts (entering into block state).
>> >>> +================= ======================================================
>> >>
>> >> I seem to recall we had an unpleasant conversation about using
>> >> standard stats recently. Please tell me where you looked to check if
>> >> Linux has standard stats for packet timestamping. We need to add the
>> >> right info there.
>> >> --
>> >> pw-bot: cr
>> >
>> > Hi Jakub,
>> >
>> > Just wanted to clarify that this feature and the associated
>> > documentation are specifically intended for reading a HW timestamp, not
>> > for TX/RX packet timestamping.
>> > We reviewed similar drivers that support HW timestamping via
>> > `gettime64` and `gettimex64` APIs, and we couldn't identify any that
>> > capture or report statistics related to reading a HW timestamp.
>> > Let us know if further details would be helpful.
>>
>> David, did you consider Rahul's recent timestamping stats API?
>> 0e9c127729be ("ethtool: add interface to read Tx hardware timestamping
>> statistics")
>
> Hi Gal,
>
> We've looked into the `get_ts_stats` ethtool hook, and it refers to TX HW packet timestamping
> and not HW timestamp which is retrieved through `gettime64` and `gettimex64`.

Hi folks,

I think everyone might be on the same page now, but I wanted to provide
some clarifications just in case.
The use case that the TX HW timestamping statistics covers

[Diagram: the NIC hw (PF) sends packets out on the wire; hw timestamp
information per packet flows from the NIC to the cmsg queue in the Linux
kernel stack.]

We are collecting statistics on every packet being sent out the wire.

The use case being described here

[Diagram: the NIC hw (PF) contains a PHC (just a clock dev); the .gettimex64
callback in the Linux kernel stack queries the device's clock for the current
time on behalf of the clock_gettime syscall issued from userspace.]

The model above is about getting the time from the NIC hardware in the
userspace application (which has no involvement with TX/RX traffic).

I do think the phc-to-host related statistics are on the niche side of
things. The following drivers are error free in their gettimex64 paths.

* AMD pensando/ionic
* Broadcom Tigon3
* Intel ixgbe
* Intel igc
* NVIDIA mlxsw
* NVIDIA mlx5_core

The above drivers would definitely not benefit from having "phc
(nic)"-to-host related statistics being presented here.
I am more in favor of making these statistics specific to amazon's ENA driver since I think most drivers do not have a complex . Also, what value is there in the count of phc-to-host successful/failed operations versus just keeping track of the errors in userspace for whoever is calling clock_gettime. I am somewhat ok with these counters, but I honestly cannot imagine any practical use to this especially since they are not related to anything over-the-wire. So the errors in userspace would be enough of an indicator of whether there is excessive utilization of the requests and the counters seem redundant to that (at least to me). Feel free to share how you feel these counters would be helpful beyond handling the return codes through clock_gettime. I might just be missing something. Hope this helps.
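As a rough illustration of the alternative raised above (tracking failures in userspace rather than through driver counters), the sketch below runs a reader command repeatedly and tallies the outcomes by exit status. `tally_reads` is a hypothetical helper, not part of any existing tool, and it assumes the reader command exits non-zero when the underlying clock_gettime call fails; that should be checked for whichever reader is actually used.

```shell
#!/bin/sh
# Hypothetical helper: run a clock-reading command repeatedly and count,
# in userspace, how many attempts succeeded and how many failed.
# usage: tally_reads <attempts> <command> [args...]
# Assumption: the command signals a failed read through a non-zero exit status.
tally_reads() {
    attempts=$1
    shift
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok failed=$failed"
}
```

A caller would pass its own reader, for example `tally_reads 100 ./my_phc_reader /dev/ptp0`, where `./my_phc_reader` is a placeholder name for whatever program issues the clock_gettime call.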
On Tue, 05 Nov 2024 18:02:57 -0800 Rahul Rameshbabu wrote:
> I do think the phc-to-host related statistics are on the niche side of
> things. The following drivers are error free in their gettimex64 paths.
>
> * AMD pensando/ionic
> * Broadcom Tigon3
> * Intel ixgbe
> * Intel igc
> * NVIDIA mlxsw
> * NVIDIA mlx5_core
>
> The above drivers would definitely not benefit from having "phc
> (nic)"-to-host related statistics being presented here. I am more in
> favor of making these statistics specific to amazon's ENA driver since I
> think most drivers do not have a complex . Also, what value is there in
> the count of phc-to-host successful/failed operations versus just
> keeping track of the errors in userspace for whoever is calling
> clock_gettime. I am somewhat ok with these counters, but I honestly
> cannot imagine any practical use to this especially since they are not
> related to anything over-the-wire. So the errors in userspace would be
> enough of an indicator of whether there is excessive utilization of the
> requests and the counters seem redundant to that (at least to me). Feel
> free to share how you feel these counters would be helpful beyond
> handling the return codes through clock_gettime. I might just be missing
> something.

Agreed, thanks a lot for the analysis. I misread the code.
I looked at ena_com_phc_get() last night and incorrectly assumed
it's way too complex to be called from gettimex64 :S
Thank you Rahul for the detailed explanations

Hi Jakub,

> > Just wanted to clarify that this feature and the associated
> > documentation are specifically intended for reading a HW timestamp,
> > not for TX/RX packet timestamping.
>
> Oh, so you're saying you can only read the clock from the device?
> The word timestamp means time associated with an event.
>

Based on the documentation of gettimex64 API
The ts parameter holds the PHC timestamp. We are using the same terminology
https://elixir.bootlin.com/linux/v6.11.6/source/include/linux/ptp_clock_kernel.h#L97

 * @gettimex64: Reads the current time from the hardware clock and optionally
 *              also the system clock.
 *              parameter ts: Holds the PHC timestamp.
 *              parameter sts: If not NULL, it holds a pair of timestamps from
 *              the system clock. The first reading is made right before
 *              reading the lowest bits of the PHC timestamp and the second
 *              reading immediately follows that.

> In the doc you talk about:
>
> > +PHC support and capabilities can be verified using ethtool:
> > +
> > +.. code-block:: shell
> > +
> > +  ethtool -T <interface>
>
> which is for packet timestamping
>

ethtool -T shows all timestamping capabilities, which indeed include packet
timestamping but also the PTP Hardware Clock (PHC) index
If the value is `none`, it means that there's no PHC support
This is done by implementing the `get_ts_info` hook, which is part of this patchset.
https://elixir.bootlin.com/linux/v6.11.6/source/include/linux/ethtool.h#L720

> also:
>
> > ENA Linux driver supports PTP hardware clock providing timestamp
> > reference to achieve nanosecond accuracy.
>
> You probably want to double check the definitions of accuracy and
> resolution.
>

Thank you, will be changed in the next patchset

> We recently merged an Amazon PTP clock driver from David Woodhouse,
> see commit 20503272422693. If you're not timestamping packets why not use
> that driver?
The AMZNC10C vmclock device driver is intended to be used in systems where there's a hypervisor.
The PHC driver in this patchset is intended for virtual and non-virtual (metal) instances in AWS.
The AMZNC10C might not be available in the future on the same instances where PHC is available.
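The `ethtool -T` check described in this reply (a PHC index when supported, `none` otherwise) could be scripted roughly as follows. This is only a sketch: `phc_index` is a hypothetical helper name, and it assumes the output contains a `PTP Hardware Clock:` line, as the awk expression in the patch's testptp example does; other ethtool versions may label the field differently.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -T` output on stdin and print the PHC index.
# Returns non-zero when the "PTP Hardware Clock:" line is missing or reports
# "none", i.e. when no PHC is exposed for the interface.
# Assumption: the field label matches the one used in the patch's awk example.
phc_index() {
    awk '/PTP Hardware Clock:/ { idx = $NF }
         END { if (idx == "" || idx == "none") exit 1; print idx }'
}
```

Typical use would be `ethtool -T <interface> | phc_index`, after which a successful result names the `/dev/ptpN` device to open.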
diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
index 4561e8ab..12665ea8 100644
--- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
+++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
@@ -56,6 +56,7 @@ ena_netdev.[ch] Main Linux kernel driver.
 ena_ethtool.c     ethtool callbacks.
 ena_xdp.[ch]      XDP files
 ena_pci_id_tbl.h  Supported device IDs.
+ena_phc.[ch]      PTP hardware clock infrastructure (see `PHC`_ for more info)
 ================= ======================================================
 
 Management Interface:
@@ -221,6 +222,83 @@ descriptor it was received on would be recycled.
 When a packet smaller than RX copybreak bytes is received, it is copied
 into a new memory buffer and the RX descriptor is returned to HW.
 
+.. _`PHC`:
+
+PTP Hardware Clock (PHC)
+========================
+.. _`ptp-userspace-api`: https://docs.kernel.org/driver-api/ptp.html#ptp-hardware-clock-user-space-api
+.. _`testptp`: https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/ptp/testptp.c
+
+ENA Linux driver supports PTP hardware clock providing timestamp reference to achieve nanosecond accuracy.
+
+**PHC support**
+
+PHC depends on the PTP module, which needs to be either loaded as a module or compiled into the kernel.
+
+Verify if the PTP module is present:
+
+.. code-block:: shell
+
+  grep -w '^CONFIG_PTP_1588_CLOCK=[ym]' /boot/config-`uname -r`
+
+- If no output is provided, the ENA driver cannot be loaded with PHC support.
+
+- ``CONFIG_PTP_1588_CLOCK=y``: the PTP module is already compiled and loaded inside the kernel binary file.
+
+- ``CONFIG_PTP_1588_CLOCK=m``: the PTP module needs to be loaded prior to loading the ENA driver:
+
+Load PTP module:
+
+.. code-block:: shell
+
+  sudo modprobe ptp
+
+All available PTP clock sources can be tracked here:
+
+.. code-block:: shell
+
+  ls /sys/class/ptp
+
+PHC support and capabilities can be verified using ethtool:
+
+.. code-block:: shell
+
+  ethtool -T <interface>
+
+**PHC timestamp**
+
+To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example using `testptp`_:
+
+.. code-block:: shell
+
+  testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware Clock:/ {print $NF}') -k 1
+
+PHC get time requests should be within reasonable bounds,
+avoid excessive utilization to ensure optimal performance and efficiency.
+The ENA device restricts the frequency of PHC get time requests to a maximum
+of 125 requests per second. If this limit is surpassed, the get time request
+will fail, leading to an increment in the phc_err statistic.
+
+**PHC statistics**
+
+PHC can be monitored using :code:`ethtool -S` counters:
+
+================= ======================================================
+**phc_cnt**       Number of successful retrieved timestamps (below expire timeout).
+**phc_exp**       Number of expired retrieved timestamps (above expire timeout).
+**phc_skp**       Number of skipped get time attempts (during block period).
+**phc_err**       Number of failed get time attempts (entering into block state).
+================= ======================================================
+
+PHC timeouts:
+
+================= ======================================================
+**expire**        Max time for a valid timestamp retrieval, passing this threshold will fail
+                  the get time request and block new requests until block timeout.
+**block**         Blocking period starts once get time request expires or fails, all get time
+                  requests during block period will be skipped.
+================= ======================================================
+
 Statistics
 ==========
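The 125 requests per second limit stated in the patch works out to a minimum spacing of 1000 / 125 = 8 ms between get time requests. The sketch below shows one way a caller might pace itself to stay under that cap. Both function names are hypothetical, the reader command is supplied by the caller, and the fractional `sleep` argument assumes a `sleep` implementation that accepts sub-second values (GNU coreutils does; strict POSIX `sleep` does not).

```shell
#!/bin/sh
# Minimum spacing, in milliseconds, between requests for a per-second cap.
# Rounds up so the resulting request rate never exceeds the cap.
min_interval_ms() {
    rate=$1
    echo $(( (1000 + rate - 1) / rate ))
}

# Hypothetical pacing loop: run a reader command <count> times while leaving
# at least the minimum interval between consecutive invocations.
# usage: paced_reads <count> <max_rate_per_sec> <command> [args...]
# Assumption: `sleep` accepts fractional seconds.
paced_reads() {
    count=$1
    rate=$2
    shift 2
    interval_s=$(awk -v ms="$(min_interval_ms "$rate")" 'BEGIN { printf "%.3f", ms / 1000 }')
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        sleep "$interval_s"
        i=$((i + 1))
    done
}
```

Sleeping after every call is deliberately conservative: the time spent in the reader itself only adds to the spacing, so the effective rate stays below the cap.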