Message ID | 20201019172435.4416-8-ceggers@arri.de (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: dsa: microchip: PTP support for KSZ956x | expand |
On Mon, Oct 19, 2020 at 07:24:33PM +0200, Christian Eggers wrote: > Add routines required for TX hardware time stamping. > > The KSZ9563 only supports one step time stamping > (HWTSTAMP_TX_ONESTEP_P2P), which requires linuxptp-2.0 or later. PTP > mode is permanently enabled (changes tail tag; depends on > CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP).TX time stamps are reported via an > interrupt / device registers whilst RX time stamps are reported via an > additional tail tag. > > One step TX time stamping of PDelay_Resp requires the RX time stamp from > the associated PDelay_Req message. linuxptp assumes that the RX time > stamp has already been subtracted from the PDelay_Req correction field > (as done by the TI PHYTER). linuxptp will echo back the value of the > correction field in the PDelay_Resp message. > > In order to be compatible to this already established interface, the > KSZ9563 code emulates this behavior. When processing the PDelay_Resp > message, the time stamp is moved back from the correction field to the > tail tag, as the hardware doesn't support negative values on this field. > Of course, the UDP checksums (if any) have to be corrected after this > (for both directions). > > The PTP hardware performs internal detection of PTP frames (likely > similar as ptp_classify_raw() and ptp_parse_header()). As these filters > cannot be disabled, the current delay mode (E2E/P2P) and the clock mode > (master/slave) must be configured via sysfs attributes. Time stamping > will only be performed on PTP packets matching the current mode > settings. > > Everything has been tested on a Microchip KSZ9563 switch. I looked a little bit at the KSZ9563 datasheet and I'm more confused than I was before opening it. -----------------------------[cut here]----------------------------- The device supports V2 (2008) of the IEEE 1588 PTP specification and can be programmed as either an end-to-end (E2E) or peer-to-peer (P2P) transparent clock (TC) between ports. In addition, the host port can be programmed as either a slave or master ordinary clock (OC) port. Ingress timestamp capture, egress timestamp recording, correction field update with residence time and link delay, delay turn-around time insertion, egress timestamp insertion, and checksum update are supported. -----------------------------[cut here]----------------------------- So it's a 1-step transparent clock, fair enough. That works autonomously without any sort of involvement from the operating system, you know that, right? This is stateless functionality. BUT, if that is the case, what do you need PTP support in the kernel for? What profiles are you using with linuxptp? What benefit does it bring you if you report timestamps to the operating system, for terminated 1588 traffic? Why would you even terminate 1588 traffic on the host CPU? I fail to understand many of the use cases that this switch is tailored for. Also, I know that Microchip support does a pretty bad job at giving useful answers, and the datasheet isn't quite clear either (looks like there's info that has been copied from other switches, like for 2-step timestamping, then removed, and too much was removed because now nothing is clear) so you'll have to give your best shot at explaining some things. Global PTP Message Config 1 Register ------------------------------------ Bit 2: Selection of P2P or E2E 1 = Peer-to-peer (P2P) transparent clock mode 0 = End-to-end (E2E) transparent clock mode What does this bit do exactly? Does it change the switch's behavior as an autonomous 1-step transparent clock? Or does it have anything to do with how/which timestamps are delivered to the CPU? The point is, why do you care to configure this? Sysfs is not going to fly without a solid explanation, which you did not provide here. My understanding of E2E vs P2P TC is that an E2E TC will correct the timestamps of Pdelay messages, while a P2P TC won't. The P2P TC must speak proper PDelay and not forward those packets sheepishly. Which starts to answer my question, I believe... So my comment above, that the 1-step TC functionality doesn't require any involvement from the CPU, is only correct for E2E TC, am I right? For P2P TC, you would need the host CPU to speak peer delay. But you wouldn't need it for anything else (the SYNC messages would have no reason to go to the CPU, would they?). So, again, what profile are you using with linuxptp for this one? If my understanding is right, maybe you want to just leave the switch operate in E2E TC mode by default, and put it into P2P TC as soon as your .port_hwtstamp_set() method is called? Ok, on to my next question.... Bit 1: Selection of Master or Slave 1 = Host port is PTP master ordinary clock 0 = Host port is PTP slave ordinary clock What does this _actually_ do? Here I really have no idea. I can only imagine that this has again to do with the 1-step TC operation, and that it's treating the host port as a switched endpoint, and this has to do with the port states of the P2P TC. I'm so confused by this one that I don't even know what to ask... Ok, let's put it differently. You bothered to add a sysfs for it, so you must be using it for something. What are you using it for?
On Tuesday, 20 October 2020, 02:10:40 CEST, Vladimir Oltean wrote: > I looked a little bit at the KSZ9563 datasheet and I'm more confused > than I was before opening it. > > -----------------------------[cut here]----------------------------- > The device supports V2 (2008) of the IEEE 1588 PTP specification and can > be programmed as either an end-to-end (E2E) or peer-to-peer (P2P) > transparent clock (TC) between ports. In addition, the host port can be > programmed as either a slave or master ordinary clock (OC) port. > Ingress timestamp capture, egress timestamp recording, correction field > update with residence time and link delay, delay turn-around time > insertion, egress timestamp insertion, and checksum update are > supported. > -----------------------------[cut here]----------------------------- > > So it's a 1-step transparent clock, fair enough. That works autonomously > without any sort of involvement from the operating system, you know > that, right? This is stateless functionality. The device performs "one-step time stamping". This means that the correction field in Sync and PDelay_Resp messages is updated automatically. Not more. For operation as a switch, this would be probably enough. But I need also to set the switch' local PTP clock in order to generate a synchronized output signal for local use. > BUT, if that is the case, what do you need PTP support in the kernel > for? The device cannot autonomously set it's PTP clock. This requires calculating the link delay between the own port and the master/peer. It looks like the KSZ9563 cannot perform this calculation itself. > What profiles are you using with linuxptp? My config is at the bottom. > What benefit does it > bring you if you report timestamps to the operating system, for > terminated 1588 traffic? Why would you even terminate 1588 traffic on > the host CPU? I fail to understand many of the use cases that this > switch is tailored for. In my opinion, the switch doesn't terminate 1588 traffic. It only generates hardware time stamps and performs update of the correction field on selected messages. > Also, I know that Microchip support does a pretty bad job at giving > useful answers, and the datasheet isn't quite clear either (looks like > there's info that has been copied from other switches, like for 2-step > timestamping, then removed, and too much was removed because now nothing > is clear) so you'll have to give your best shot at explaining some > things. It looks like it was planned to support also 2-step time stamping. You can see an errata in revision B of the errata sheet [1]. Revision C dropped this again. > Global PTP Message Config 1 Register > ------------------------------------ > > Bit 2: Selection of P2P or E2E > 1 = Peer-to-peer (P2P) transparent clock mode > 0 = End-to-end (E2E) transparent clock mode > > What does this bit do exactly? > Does it change the switch's behavior as an autonomous 1-step transparent > clock? Or does it have anything to do with how/which timestamps are > delivered to the CPU? The point is, why do you care to configure this? > Sysfs is not going to fly without a solid explanation, which you did not > provide here. In short: Clock synchronization (using linuxptp) doesn't work if this is not configured in sync with ptp4l.conf. Either no time stamping is performed or PTP messages are filtered out. If required, I can investigate this further. > My understanding of E2E vs P2P TC is that an E2E TC will correct the > timestamps of Pdelay messages, while a P2P TC won't Yes, I think so. But I am no expert for PTP. > The P2P TC must speak proper PDelay and not forward those packets > sheepishly. Which starts to answer my question, I believe... So my comment > above, that the 1-step TC functionality doesn't require any involvement from > the CPU, is only correct for E2E TC, am I right? For P2P TC, you would need > the host CPU to speak peer delay. The host CPU is not required for updating forwarded (switched) PTP messages. But it is required RX/TX time stamping of own PTP messages. All local PTP operations can be done on the switch, there's no work left for the MAC connected to the host port. > But you wouldn't need it for anything else (the > SYNC messages would have no reason to go to the CPU, would they?). The (time stamped) SYNC messages are required in order to calculate the offset from the master clock. > So, again, what profile are you using with linuxptp for this one? only a few lines left... > If my understanding is right, maybe you want to just leave the switch > operate in E2E TC mode by default, and put it into P2P TC as soon as > your .port_hwtstamp_set() method is called? Currently the TC mode has to be set manually. So the automatic mode of ptp4l cannot be used. Probably this could be automated depending on received PTP messages. But probably we cannot see these messages because these are filtered out. > Ok, on to my next question.... > > Bit 1: Selection of Master or Slave > 1 = Host port is PTP master ordinary clock > 0 = Host port is PTP slave ordinary clock > > What does this _actually_ do? Here I really have no idea. I can only > imagine that this has again to do with the 1-step TC operation, and that > it's treating the host port as a switched endpoint, and this has to do > with the port states of the P2P TC. I'm so confused by this one that I > don't even know what to ask... As for the TC mode questions, also this one could be best answered by Microchip. If not set correctly, clock synchronization will not work. I need to investigate what happens here concretely. Again I suppose that time stamping of the associated messages will not be performed or some messages may be filtered out. > Ok, let's put it differently. You bothered to add a sysfs for it, so you > must be using it for something. What are you using it for? If not set correctly, nothing will work. This has the major drawback, that ptp4l cannot determine the clock mode nor the delay measurement mode automatically. I will try find some answers for this. Best regards Christian [global] twoStepFlag 0 time_stamping p2p1step #slaveOnly 1 delay_mechanism P2P #delay_mechanism E2E delay_filter_length 100 tx_timestamp_timeout 100 #network_transport UDPv6 #network_transport L2 [1] http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ9563R-Errata-80000786B.pdf
On Mon, Oct 19, 2020 at 07:24:33PM +0200, Christian Eggers wrote: > Add routines required for TX hardware time stamping. > > The KSZ9563 only supports one step time stamping > (HWTSTAMP_TX_ONESTEP_P2P), which requires linuxptp-2.0 or later. PTP > mode is permanently enabled (changes tail tag; depends on > CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP).TX time stamps are reported via an > interrupt / device registers whilst RX time stamps are reported via an > additional tail tag. > > One step TX time stamping of PDelay_Resp requires the RX time stamp from > the associated PDelay_Req message. linuxptp assumes that the RX time > stamp has already been subtracted from the PDelay_Req correction field > (as done by the TI PHYTER). linuxptp will echo back the value of the > correction field in the PDelay_Resp message. > > In order to be compatible to this already established interface, the > KSZ9563 code emulates this behavior. When processing the PDelay_Resp > message, the time stamp is moved back from the correction field to the > tail tag, as the hardware doesn't support negative values on this field. > Of course, the UDP checksums (if any) have to be corrected after this > (for both directions). > > The PTP hardware performs internal detection of PTP frames (likely > similar as ptp_classify_raw() and ptp_parse_header()). As these filters > cannot be disabled, the current delay mode (E2E/P2P) and the clock mode > (master/slave) must be configured via sysfs attributes. Time stamping > will only be performed on PTP packets matching the current mode > settings. > > Everything has been tested on a Microchip KSZ9563 switch. > > Signed-off-by: Christian Eggers <ceggers@arri.de> > --- Hi Richard! Looks like we've been discussing without you here. I had an off-list discussion with Christian and we just realized that. The topic of this email is that we've got new P2P one-step hardware coming in, and it's not quite API-compatible with what's there already (you might not actually be surprised by that). Since you weren't copied to Christian's patches, here's a patchwork link for you: https://patchwork.ozlabs.org/project/netdev/cover/20201019172435.4416-1-ceggers@arri.de/ Short introduction by quoting from the IEEE 1588 standard, just to get everybody on the same page. ----------------------------------[cut here]----------------------------------- 11.4.3 Peer delay mechanism operational specifications ------------------------------------------------------ The actual value of the meanPathDelay shall be measured and computed as follows for each instance of a peer delay request-response measurement: (a) The delay requestor, Node-A: (1) Prepares a Pdelay_Req message. The correctionField (see 13.3.2.7) shall be set to 0. (2) If asymmetry corrections are required, shall modify the correctionField per 11.6.4. (3) Shall set the originTimestamp to 0 or an estimate no worse than +/-1s of the egress timestamp, t1, of the Pdelay_Req message. (4) Shall send the Pdelay_Req message and generate and save timestamp t1. (b) If the delay responder, Node-B, is a one-step clock, it shall: (1) Generate timestamp t2 upon receipt of the Pdelay_Req message (2) Prepare a Pdelay_Resp message (3) Copy the sequenceId field from the Pdelay_Req message to the sequenceId field of the Pdelay_Resp message (4) Copy the sourcePortIdentity field from the Pdelay_Req message to the requestingPortIdentity field of the Pdelay_Resp message (5) Copy the domainNumber field from the Pdelay_Req message to the domainNumber field of the Pdelay_Resp message (6) Copy the correctionField from the Pdelay_Req message to the correctionField of the Pdelay_Resp message (7) Then: (i) Set to 0 the requestReceiptTimestamp field of the Pdelay_Resp message (ii) Issue the Pdelay_Resp message and generate timestamp t3 upon sending (iii) After t3 is generated but while the Pdelay_Resp message is leaving the responder, shall add the turnaround time t3 - t2 to the correctionField of the Pdelay_Resp message and make any needed corrections to checksums or other content-dependent fields of the Pdelay_Resp message (c) If the delay responder is a two-step clock, it shall: [skipping because in our case, the delay responder is a one-step clock] (d) The delay requestor, Node-A, shall: (1) Generate timestamp t 4 upon receipt of the Pdelay_Resp message (2) If asymmetry corrections are required, modify the correctionField of the Pdelay_Resp message per 11.6.5 (3) If the twoStepFlag of the received Pdelay_Resp message is FALSE, indicating that no Pdelay_Resp_Follow_Up message will be received, compute the <meanPathDelay> as: <meanPathDelay> = [(t4 - t1) - correctionField of Pdelay_Resp] / 2 ----------------------------------[cut here]----------------------------------- Simply put: | | t1 |------\ Pdelay_Req | | ------\ | | ----->| t2 Clock A | | Clock B | Pdelay_Resp /------| t3 | ------ | t4 |<-----/ | | | (t4 - t1) - (t3 - t2) meanPathDelay = --------------------- 2 where t3 - t2 (the "turnaround" time) is contained inside the correctionField of the Pdelay_Resp. To reiterate, t3 is the RX timestamp of the Pdelay_Req, and t4 is the TX timestamp of the Pdelay_Resp, both taken at the delay responder (Clock B). That was about it for the standard. The hardware that Christian is configuring (consider operation as Clock B) works this way: (a) the ingress port takes the t2 RX timestamp of the Pdelay_Req normally (b) on transmission of the Pdelay_Resp, the kernel must provide t2 as metadata together with the Pdelay_Resp frame itself (it is put in the "TX timestamp" field of the DSA tag) (c) the egress port takes the t3 TX timestamp and rewrites the correctionField of the PTP header with the (t3 - t2) value Straightforward, one might say? Except for the fact that this is not how the API for peer-to-peer one-step timestamping currently works. We were expecting that for this switch to have a streamlined implementation for P2P 1-step, the kernel would have an API to inject a Pdelay_Resp packet into the stack _along_ with t2. But it doesn't. So how _does_ that work for TI PHYTER? As far as we understand, the PHYTER appears to autonomously mangle PTP packets in the following way: - subtracting t2 on RX from the correctionField of the Pdelay_Req - adding t3 on TX to the correctionField of the Pdelay_Resp All that linuxptp has to do is connect the wires. That's why in process_pdelay_req() from port.c, there is: if (p->timestamping == TS_P2P1STEP) { rsp->header.correction = m->header.correction; Otherwise said, the t2 is not provided to the kernel explicitly, but simply by copying the correctionField from the Pdelay_Req into the Pdelay_Resp. This correctionField is the play ground of the PHYTER, so by having linuxptp copy that field, it simply has a path to finish off what it started. We believe that by the time the Pdelay_Req message arrives at the host, the correctionField will be negative, and it will only become back positive after transmitting the Pdelay_Resp. This is also something that might be problematic for other devices. Nevertheless, I'm not here to say that it's wrong to do what we are currently doing for PHYTER. I mean, paragraph 11.4.3.(b).(6) clearly states that the correctionField should be transferred anyway from the Pdelay_Req to the Pdelay_Resp. I am not entirely sure of the reasons why it stipulates this, if it's just to allow hardware like PHYTER to operate within spec, or if it's for the delay requestor to be aware of the correction that was applied to his frame. Regardless of which way it is, we expect that any 1588 stack will have to do that, so this is not implementation-defined behavior. What I'm here to say is that this is not how Christian's hardware works, and he is simply emulating the kernel interface used by the PHYTER. He needs to subtract t2 on RX from the correctionField manually in the tagger code, then have extra code on TX to copy t2 from the correctionField into the DSA tag's TX timestamp field. This complicates the code by quite a significant margin, and my concern is that nobody is going to follow what's going on while maintaining it, and why it is done in this backwards way. We would like to know your opinion on whether we couldn't, in fact, have a more straightforward UAPI for hardware like his. For example, expand the sendmsg() API with a new cmsg that contains t2 specifically for P2P one-step timestamping. Or anything else. But "straightforward" should be the key. Thanks, -Vladimir
I'm just catching up with this. Really. Truly. Please -- Include the maintainer on CC for such patches! In case you don't know who that is, you can always consult the MAINTAINERS file. There you will find the following entry. PTP HARDWARE CLOCK SUPPORT M: Richard Cochran <richardcochran@gmail.com> L: netdev@vger.kernel.org S: Maintained W: http://linuxptp.sourceforge.net/ On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote: > On Mon, Oct 19, 2020 at 07:24:33PM +0200, Christian Eggers wrote: > > The PTP hardware performs internal detection of PTP frames (likely > > similar as ptp_classify_raw() and ptp_parse_header()). As these filters > > cannot be disabled, the current delay mode (E2E/P2P) and the clock mode > > (master/slave) must be configured via sysfs attributes. This is a complete no-go. NAK. Richard
On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote: > So how _does_ that work for TI PHYTER? > > As far as we understand, the PHYTER appears to autonomously mangle PTP packets > in the following way: > - subtracting t2 on RX from the correctionField of the Pdelay_Req > - adding t3 on TX to the correctionField of the Pdelay_Resp The Phyter does not support peer-to-peer one step. The only driver that implements it is ptp_ines.c. And *that* driver/HW implements it correctly. Thanks, Richard
Hi Richard, On Thursday, 22 October 2020, 04:42:01 CEST, Richard Cochran wrote: > I'm just catching up with this. > > Really. Truly. Please -- Include the maintainer on CC for such patches! sorry for missing you on the recipients list. I blindly trusted the output of get_maintainer.pl. I recently sent two other patches which may also be of interest for you. They related to handling of SO_TIMESTAMPING on 32 bit platforms with newer C libraries: https://patchwork.ozlabs.org/project/netdev/patch/20201012093542.15504-1-ceggers@arri.de/ https://patchwork.ozlabs.org/project/netdev/patch/20201012093542.15504-2-ceggers@arri.de/ > On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote: > > On Mon, Oct 19, 2020 at 07:24:33PM +0200, Christian Eggers wrote: > > > The PTP hardware performs internal detection of PTP frames (likely > > > similar as ptp_classify_raw() and ptp_parse_header()). As these filters > > > cannot be disabled, the current delay mode (E2E/P2P) and the clock mode > > > (master/slave) must be configured via sysfs attributes. > > This is a complete no-go. NAK. I didn't design the hardware nor do I have access to adequate documentation. I will try to figure out what functionality is concretely affected by these two settings. If I am correct, the KSZ hardware consists of two main building blocks: 1. A TC on the switch path. 2. An OC on the DSA host port. From the data sheet, page 109, chapter 5.1.6.11 ("Global PTP Message Config 1 Register"), bit 2: > *Selection of P2P or E2E* > 1 = Peer-to-peer (P2P) transparent clock mode > 0 = End-to-end (E2E) transparent clock mode So this "tcmode" sysfs entry controls the behavior of the switch' transparent clock. Is this related in any way to the PTP API? For the master/slave setting, the data sheet writes the following: *Selection of Master or Slave* 1 = Host port is PTP master ordinary clock 0 = Host port is PTP slave ordinary clock So this is really related to the OC and so to the PTP API. Setting this manually would interfere with the BMCA. I'll check whether delay measurement and clock synchronization can also work for all conditions (E2E/P2P, master/slave) without altering this value. Otherwise we might consider the KSZ as a "Slave Only Clock (SO)". Christian
On Wed, Oct 21, 2020 at 08:02:17PM -0700, Richard Cochran wrote: > On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote: > > So how _does_ that work for TI PHYTER? > > > > As far as we understand, the PHYTER appears to autonomously mangle PTP packets > > in the following way: > > - subtracting t2 on RX from the correctionField of the Pdelay_Req > > - adding t3 on TX to the correctionField of the Pdelay_Resp > > The Phyter does not support peer-to-peer one step. Ok, that's my mistake for not double-checking, sorry. > The only driver that implements it is ptp_ines.c. > > And *that* driver/HW implements it correctly. Is there documentation available for this timestamping block? I might be missing some data, as the kernel driver is fairly pass-through for the TX timestamping of the Pdelay_Resp, so the hardware might just 'do the right thing'. I believe the answer lies within the timestamper's per-port RX FIFO. Just guessing here, but I suspect that the RX FIFO doesn't get cleared immediately after the host queries it, and the hardware caches the last 100 events from the pool and uses the RX timestamps of Pdelay_Req as t2 in the process of updating the correctionField of Pdelay_Resp on TX. Would that be correct?
On Thursday, 22 October 2020, 09:30:57 CEST, Christian Eggers wrote: > On Thursday, 22 October 2020, 04:42:01 CEST, Richard Cochran wrote: > > On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote: > > > On Mon, Oct 19, 2020 at 07:24:33PM +0200, Christian Eggers wrote: > > > > The PTP hardware performs internal detection of PTP frames (likely > > > > similar as ptp_classify_raw() and ptp_parse_header()). As these > > > > filters > > > > cannot be disabled, the current delay mode (E2E/P2P) and the clock > > > > mode > > > > (master/slave) must be configured via sysfs attributes. > > > > This is a complete no-go. NAK. > > I didn't design the hardware nor do I have access to adequate documentation. > I will try to figure out what functionality is concretely affected by these > two settings. I tried to study the effect of setting the ocmode bit on the KSZ either to master or to slave. The main visible change is, that some PTP message types are be filtered out on RX: - in "master" mode, "Sync" messages from other nodes will not be received (but everything else like "Announce" seem to work) - in "slave" mode, "Delay_Req" messages from other nodes will not be received I am not an expert for PTP, so the following is only the idea of a beginner how this could probably be handled: As PTP announce messages are received all the time, the BMCA should always be able to work. The KSZ hardware needs to be set to "master" when a node is becoming master (in order to be able to receive (and answer) Delay_Req messages). The setting "slave" is equired when the BCMA decides not being master anymore (in order to receive Sync messages). Handling the transition to "master" mode could probably be done easily in the driver (when a Sync message is seen in TX direction by the time stamping code). But transition to slave seems to be difficult, because the tagging driver cannot see when the node stops being master. For user space (ptp4l), the decision for master/slave mode could probably be done easier. If Richard (or somebody else) decides that "mode switching" of the KSZ device would not be appropriate, I suspect the functionality of the KSZ has to be limited to "Slave Only Clock". regards Christian
On Thu, Oct 22, 2020 at 12:01:26PM +0300, Vladimir Oltean wrote: > On Wed, Oct 21, 2020 at 08:02:17PM -0700, Richard Cochran wrote: > > On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote: > > > So how _does_ that work for TI PHYTER? > > > > > > As far as we understand, the PHYTER appears to autonomously mangle PTP packets > > > in the following way: > > > - subtracting t2 on RX from the correctionField of the Pdelay_Req > > > - adding t3 on TX to the correctionField of the Pdelay_Resp > > > > The Phyter does not support peer-to-peer one step. > > Ok, that's my mistake for not double-checking, sorry. > > > The only driver that implements it is ptp_ines.c. > > > > And *that* driver/HW implements it correctly. > > Is there documentation available for this timestamping block? I might be > missing some data, as the kernel driver is fairly pass-through for the > TX timestamping of the Pdelay_Resp, so the hardware might just 'do the > right thing'. I believe the answer lies within the timestamper's > per-port RX FIFO. Just guessing here, but I suspect that the RX FIFO > doesn't get cleared immediately after the host queries it, and the > hardware caches the last 100 events from the pool and uses the RX > timestamps of Pdelay_Req as t2 in the process of updating the > correctionField of Pdelay_Resp on TX. Would that be correct? Ah, no, I think I'm wrong. As per your own description here: https://lwn.net/Articles/807970/ If the hardware supports p2p one-step, it subtracts the ingress time stamp value from the Pdelay_Request correction field. The user space software stack then simply copies the correction field into the Pdelay_Response, and on transmission the hardware adds the egress time stamp into the correction field. So we were correct about the behavior, just not about the target hardware. So, that's just not how this hardware works. What do you recommend? Keeping a FIFO of Pdelay_Req RX timestamps, and matching them to Pdelay_Resp messages on TX, all of that contained within tag_ksz.c?
Hi Vladimir, On Thursday, 22 October 2020, 12:50:14 CEST, Vladimir Oltean wrote: > If the hardware supports p2p one-step, it subtracts the ingress time > stamp value from the Pdelay_Request correction field. The user space > software stack then simply copies the correction field into the > Pdelay_Response, and on transmission the hardware adds the egress time > stamp into the correction field. > > So we were correct about the behavior, just not about the target > hardware. > > So, that's just not how this hardware works. What do you recommend? > Keeping a FIFO of Pdelay_Req RX timestamps, and matching them to > Pdelay_Resp messages on TX, all of that contained within tag_ksz.c? after applying the RX timestamp correctly to the correction field (shifting the nanoseconds by 16), it seems that "moving" the timestamp back to the tail tag on TX is not required anymore. Keeping the RX timestamp simply in the correction field (negative value), works fine now. So this halves the effort in the tag_ksz driver. Best regards Christian
On Thu, Oct 22, 2020 at 01:11:40PM +0200, Christian Eggers wrote: > Hi Vladimir, > > On Thursday, 22 October 2020, 12:50:14 CEST, Vladimir Oltean wrote: > after applying the RX timestamp correctly to the correction field (shifting > the nanoseconds by 16), That modification should have been done anyway, since the unit of measurement for correctionField is scaled ppb (48 bits nanoseconds, 16 bits scaled nanoseconds), and not nanoseconds. > it seems that "moving" the timestamp back to the tail tag on TX is not > required anymore. Keeping the RX timestamp simply in the correction > field (negative value), works fine now. So this halves the effort in > the tag_ksz driver. Ok, this makes sense. Depending on what Richard responds, it now looks like the cleanest approach would be to move your implementation that is currently in ksz9477_update_ptp_correction_field() into a generic function called static inline void ptp_onestep_p2p_move_t2_to_correction(struct sk_buff *skb, unsigned int ptp_type, struct ptp_header *ptp_header, ktime_t t2) You should then clearly document that this function is needed for hardware capable of one-step P2P that does not already modify the correctionField of Pdelay_Req event messages on ingress.
On Thu, Oct 22, 2020 at 02:32:43PM +0300, Vladimir Oltean wrote: > On Thu, Oct 22, 2020 at 01:11:40PM +0200, Christian Eggers wrote: > > > it seems that "moving" the timestamp back to the tail tag on TX is not > > required anymore. Keeping the RX timestamp simply in the correction > > field (negative value), works fine now. So this halves the effort in > > the tag_ksz driver. > > Ok, this makes sense. > Depending on what Richard responds, it now looks like the cleanest > approach would be to move your implementation that is currently in > ksz9477_update_ptp_correction_field() into a generic function called +1
On Thu, Oct 22, 2020 at 12:17:48PM +0200, Christian Eggers wrote: > I tried to study the effect of setting the ocmode bit on the KSZ either to > master or to slave. The main visible change is, that some PTP message types > are be filtered out on RX: > - in "master" mode, "Sync" messages from other nodes will not be received > (but everything else like "Announce" seem to work) > - in "slave" mode, "Delay_Req" messages from other nodes will not be received Could you dump the contents of your REG_PTP_MSG_CONF2 register?
Hi Vladimir, On Friday, 30 October 2020, 19:24:47 CET, Vladimir Oltean wrote: > On Thu, Oct 22, 2020 at 12:17:48PM +0200, Christian Eggers wrote: > > I tried to study the effect of setting the ocmode bit on the KSZ either to > > master or to slave. The main visible change is, that some PTP message > > types > > are be filtered out on RX: > > - in "master" mode, "Sync" messages from other nodes will not be received > > (but everything else like "Announce" seem to work) > > - in "slave" mode, "Delay_Req" messages from other nodes will not be > > received > Could you dump the contents of your REG_PTP_MSG_CONF2 register? runtime register value is 0x1004 (matches default value from the data sheet). The Linux driver doesn't touch this register. Below is a dump of all PTP related (global) registers. regards Christian KSZ9563 (Ethernet switch) Global IEEE 1588 PTP CLKCTRL 0002 SWFA enabled CLKSADJ NOP PTPSD subtract CLKREAD NOP CLKWRITE NOP CLKCADJ disabled EN enabled RESET normal RTCCP 0002 PHASE 16ns RTCNS 17D72FF0 NANOSECONDS 399978480 RTCS 00000023 SECONDS 35 RTCSUBNS 00000000 RATEDIR subtract TEMPADJ permanent SUBNS 0 RTCTMPADJ 00000000 CYCLES 0 MSGCFG1 007D MODEEN enabled IEEE802.3 enabled UDPv4 enabled UDPv6 enabled TCMODE P2P OCMODE slave MSGCFG2 1004 UNICASTEN both ALTMASTER disabled PRIOTX event only CHKSYNFU disabled CHKDLY disabled CHKPDLY disabled DROP disabled CHKDOM disabled IPv4CHKSUM calc DOMVER 0200 VERSION 2 DOMAIN 0 UNITIDX 00000000 GPIO_IDX GPIO_1 TS_IDX Unit 0 TRIG_IDX Unit 0
On Sun, Nov 01, 2020 at 10:35:01AM +0100, Christian Eggers wrote: > Hi Vladimir, > > On Friday, 30 October 2020, 19:24:47 CET, Vladimir Oltean wrote: > > On Thu, Oct 22, 2020 at 12:17:48PM +0200, Christian Eggers wrote: > > > I tried to study the effect of setting the ocmode bit on the KSZ either to > > > master or to slave. The main visible change is, that some PTP message > > > types > > > are be filtered out on RX: > > > - in "master" mode, "Sync" messages from other nodes will not be received > > > (but everything else like "Announce" seem to work) > > > - in "slave" mode, "Delay_Req" messages from other nodes will not be > > > received > > Could you dump the contents of your REG_PTP_MSG_CONF2 register? > runtime register value is 0x1004 (matches default value from the data sheet). > The Linux driver doesn't touch this register. Below is a dump of all PTP > related (global) registers. So the bit 5 ("Enable Dropping of Sync/Follow_Up and Delay_Req PTP Messages") is not set. When the PTP messages are dropped, do you know which error counter in ethtool -S is increasing?
On Sunday, 1 November 2020, 12:10:08 CET, Vladimir Oltean wrote: > On Sun, Nov 01, 2020 at 10:35:01AM +0100, Christian Eggers wrote: > > Hi Vladimir, > > > > On Friday, 30 October 2020, 19:24:47 CET, Vladimir Oltean wrote: > > > On Thu, Oct 22, 2020 at 12:17:48PM +0200, Christian Eggers wrote: > > > > I tried to study the effect of setting the ocmode bit on the KSZ > > > > either to > > > > master or to slave. The main visible change is, that some PTP message > > > > types > > > > are be filtered out on RX: > > > > - in "master" mode, "Sync" messages from other nodes will not be > > > > received > > > > (but everything else like "Announce" seem to work) > > > > - in "slave" mode, "Delay_Req" messages from other nodes will not be > > > > received > > > > > > Could you dump the contents of your REG_PTP_MSG_CONF2 register? > > > > runtime register value is 0x1004 (matches default value from the data > > sheet). The Linux driver doesn't touch this register. Below is a dump of > > all PTP related (global) registers. > > So the bit 5 ("Enable Dropping of Sync/Follow_Up and Delay_Req PTP > Messages") is not set. When the PTP messages are dropped, do you know > which error counter in ethtool -S is increasing? I am not sure whether I understand the question. My assumption is that the KSZ9563 simply doesn't forward specific PTP packages from the slave ports to the CPU port. In my imagination this happens in hardware and is not visible in software. I have run "ethtool -S" two times on the PTP master clock (E2E mode): - When "ocmode" is set to master (DelayReq messages can be received) - When "ocmode" is set to slave (DelayReq messages cannot be received) Here is the diff output: --- /home/root/ethtool1 +++ /home/root/ethtool2 @@ -1,8 +1,8 @@ NIC statistics: - tx_packets: 1421 - tx_bytes: 133641 - rx_packets: 488 - rx_bytes: 35904 + tx_packets: 1455 + tx_bytes: 136783 + rx_packets: 496 + rx_bytes: 36459 rx_hi: 0 rx_undersize: 0 rx_fragments: 0 @@ -14,10 +14,10 @@ rx_mac_ctrl: 0 rx_pause: 0 rx_bcast: 4 - rx_mcast: 667 + rx_mcast: 683 rx_ucast: 0 rx_64_or_less: 0 - rx_65_127: 659 + rx_65_127: 675 rx_128_255: 11 rx_256_511: 1 rx_512_1023: 0 @@ -27,15 +27,15 @@ tx_hi: 0 tx_late_col: 0 tx_pause: 0 - tx_bcast: 713 - tx_mcast: 1852 - tx_ucast: 324 + tx_bcast: 733 + tx_mcast: 1897 + tx_ucast: 333 tx_deferred: 0 tx_total_col: 0 tx_exc_col: 0 tx_single_col: 0 tx_mult_col: 0 - rx_total: 61158 - tx_total: 307225 + rx_total: 62577 + tx_total: 313773 rx_discards: 20 tx_discards: 0 regards Christian
On Sun, Nov 01, 2020 at 11:14:24PM +0100, Christian Eggers wrote: > I am not sure whether I understand the question. When the port is in standalone mode, there is no other destination for packets except for the CPU. So if the switch filters out Sync or Delay_Req packets depending on the operating mode, then naturally those packets would get dropped somewhere, and I wanted to see where. > I have run "ethtool -S" two times on the PTP master clock (E2E mode): > - When "ocmode" is set to master (DelayReq messages can be received) > - When "ocmode" is set to slave (DelayReq messages cannot be received) > > Here is the diff output: > --- /home/root/ethtool1 > +++ /home/root/ethtool2 > @@ -1,8 +1,8 @@ > NIC statistics: > - tx_packets: 1421 > - tx_bytes: 133641 > - rx_packets: 488 > - rx_bytes: 35904 > + tx_packets: 1455 > + tx_bytes: 136783 > + rx_packets: 496 > + rx_bytes: 36459 > rx_hi: 0 > rx_undersize: 0 > rx_fragments: 0 > @@ -14,10 +14,10 @@ > rx_mac_ctrl: 0 > rx_pause: 0 > rx_bcast: 4 > - rx_mcast: 667 > + rx_mcast: 683 > rx_ucast: 0 > rx_64_or_less: 0 > - rx_65_127: 659 > + rx_65_127: 675 > rx_128_255: 11 > rx_256_511: 1 > rx_512_1023: 0 > @@ -27,15 +27,15 @@ > tx_hi: 0 > tx_late_col: 0 > tx_pause: 0 > - tx_bcast: 713 > - tx_mcast: 1852 > - tx_ucast: 324 > + tx_bcast: 733 > + tx_mcast: 1897 > + tx_ucast: 333 > tx_deferred: 0 > tx_total_col: 0 > tx_exc_col: 0 > tx_single_col: 0 > tx_mult_col: 0 > - rx_total: 61158 > - tx_total: 307225 > + rx_total: 62577 > + tx_total: 313773 > rx_discards: 20 > tx_discards: 0 These counters seem to be on the front-panel port, and the discard seems to not be recorded there. But nonetheless, even if you do see the drops in the ethtool statistics for the CPU port, it's probably going to be at 'tx_discards' which is pretty unspecific and not useful. I was thinking that the drop reason would be more elaborate, which it looks like it isn't. > My assumption is that the KSZ9563 simply doesn't forward specific PTP > packages from the slave ports to the CPU port. In my imagination this > happens in hardware and is not visible in software. You talked about tracking the BMCA by snooping Announce messages. I don't think that is going to be the path forward either. Think about the general case, where there might not even be a BMCA (like in the automotive profile). It almost seems to me as if the hardware is trying to be helpful by dropping the PTP messages that the application layer would drop anyway. Too bad that nobody seems to have told them to make this helpful mechanism optional, because as it is, it's standing in the way more than helping. You know what the real problem is, with DSA you don't have the concept of the host port being an Ordinary Clock. DSA does not treat the host port as a switched endpoint (aka a plain net device attached to a dumb switch, and unaware of that switch), but instead is the conduit interface for each front-panel switch interface, which is an individually addressable network interface in its own right. You are not supposed to use a DSA master interface for networking directly, not for regular networking and not for PTP. In fact, DSA-enabled ports, only the PTP clock of the switch is usable. If you attempt to run ptp4l on the master interface an error will be thrown back at you. Why am I mentioning this? Because the setting that's causing trouble for us is 'port state of the host port OC', which in the context of what I said above is nonsense. There _is_ no host port OC. There are 2 switch ports which can act as individual OCs, or as a BC, or as a TC. But consider the case when the switch is a BC, with one of the front-panel ports being a master and the other a slave. What mode are you supposed to put the host port in, so that it receives both the Sync packets from the slave port, and the Delay_Req packets from the master port?! It just makes no sense to me. In principle I don't see any reason why this switch would not be able to operate as a one-step peer delay BC. Unless somebody from Microchip could shed some light on the design decisions of this switch (and there are enough Microchip people copied already), here's what I would do. I would cut my losses and just support peer delay, no E2E delay request-response mechanism (this means you'll need to deploy peer delay to all devices within your network, but the gains might be worth it). Because peer delay is symmetrical (both link partners are both requestors as well as responders), there's no help in the world that this switch could volunteer to give you in dropping packets on your behalf. So I expect that if you hardcode: - the port state for the host port OC as slave, then you'd get the Sync messages from all ports, and the Delay_Req messages would be dropped but you wouldn't care about those anyway, and - the selection of TC mode to P2P TC. Then I would negotiate with Richard whether it's ok to add these extra values to enum hwtstamp_rx_filters: HWTSTAMP_FILTER_PTP_V2_PDELAY HWTSTAMP_FILTER_PTP_V2_L4_PDELAY Given the fact that you're only limiting the timestamping to Pdelay because we don't fully understand the hardware, I don't really know whether introducing UAPI for this one situation is justifiable. If not, then your driver will not have a chance to reject ptp4l -E, and it will Simply Not Work.
On Mon, Nov 02, 2020 at 01:41:49AM +0200, Vladimir Oltean wrote: > In principle I don't see any reason why this switch would not be able > to operate as a one-step peer delay BC. What I meant to say was "one-step E2E BC", since I was talking about having to receive both Sync and Delay_Req at the same time, of course.
On Monday, 2 November 2020, 00:41:49 CET, Vladimir Oltean wrote: > On Sun, Nov 01, 2020 at 11:14:24PM +0100, Christian Eggers wrote: > > My assumption is that the KSZ9563 simply doesn't forward specific PTP > > packages from the slave ports to the CPU port. In my imagination this > > happens in hardware and is not visible in software. > > You talked about tracking the BMCA by snooping Announce messages. I > don't think that is going to be the path forward either. Think about the > general case, where there might not even be a BMCA (like in the > automotive profile). Maybe my mail from October, 22 was ambiguous. I meant that despite of the presence of filtering, a BCMA algorithm should be about to work (as no Announce messages are filtered out). Additionally I said, that switching between "master" and "slave" mode could not be done automatically by the driver, as the driver could at most detect the presence of Sync messages (indication for master mode), but would do hard to detect a transition to slave mode. I see a chance that user space (ptp4l) could configure the appropriate "hardware filter setup" for master/slave mode. > It almost seems to me as if the hardware is trying to be helpful by > dropping the PTP messages that the application layer would drop anyway. > Too bad that nobody seems to have told them to make this helpful > mechanism optional, because as it is, it's standing in the way more than > helping. I think the same. Maybe there is some undocumented "filter disable" bit, but this information must come from Microchip. > You know what the real problem is, with DSA you don't have the concept > of the host port being an Ordinary Clock. DSA does not treat the host > port as a switched endpoint (aka a plain net device attached to a dumb > switch, and unaware of that switch), but instead is the conduit interface > for each front-panel switch interface, which is an individually > addressable network interface in its own right. You are not supposed to > use a DSA master interface for networking directly, not for regular > networking and not for PTP. In fact, DSA-enabled ports, only the PTP > clock of the switch is usable. If you attempt to run ptp4l on the master > interface an error will be thrown back at you. > > Why am I mentioning this? Because the setting that's causing trouble for > us is 'port state of the host port OC', which in the context of what I > said above is nonsense. There _is_ no host port OC. There are 2 switch > ports which can act as individual OCs, or as a BC, or as a TC. But the switch has only one clock at all. I assume that the switch cannot be a boundary clock, only TC seems possible. > But > consider the case when the switch is a BC, with one of the front-panel > ports being a master and the other a slave. What mode are you supposed > to put the host port in, so that it receives both the Sync packets from > the slave port, and the Delay_Req packets from the master port?! It just > makes no sense to me. In principle I don't see any reason why this > switch would not be able to operate as a one-step peer delay BC. > > Unless somebody from Microchip could shed some light on the design > decisions of this switch (and there are enough Microchip people copied > already), here's what I would do. I would cut my losses and just support > peer delay, no E2E delay request-response mechanism (this means you'll > need to deploy peer delay to all devices within your network, but the > gains might be worth it). Because peer delay is symmetrical (both link > partners are both requestors as well as responders), there's no help in > the world that this switch could volunteer to give you in dropping > packets on your behalf. So I expect that if you hardcode: > - the port state for the host port OC as slave, then you'd get the Sync > messages from all ports, and the Delay_Req messages would be dropped > but you wouldn't care about those anyway, and > - the selection of TC mode to P2P TC. When using only P2P, setting the OCMODE bit to "slave" should work. > Then I would negotiate with Richard whether it's ok to add these extra > values to enum hwtstamp_rx_filters: > HWTSTAMP_FILTER_PTP_V2_PDELAY > HWTSTAMP_FILTER_PTP_V2_L4_PDELAY > As said above, having "filter setups" for E2E/P2P and for MASTER/SLAVE would probably fit well for this kind of hardware. > Given the fact that you're only limiting the timestamping to Pdelay > because we don't fully understand the hardware, I don't really know > whether introducing UAPI for this one situation is justifiable. If not, > then your driver will not have a chance to reject ptp4l -E, and it will > Simply Not Work.
On Mon, Nov 02, 2020 at 11:35:00AM +0100, Christian Eggers wrote: > Maybe my mail from October, 22 was ambiguous. I meant that despite of the > presence of filtering, a BCMA algorithm should be about to work (as no > Announce messages are filtered out). > > Additionally I said, that switching between "master" and "slave" mode could > not be done automatically by the driver, as the driver could at most detect > the presence of Sync messages (indication for master mode), but would do hard > to detect a transition to slave mode. > > I see a chance that user space (ptp4l) could configure the appropriate > "hardware filter setup" for master/slave mode. The concept that you want from user space is hard to define. You want ptp4l to make the driver aware of the port state, which is something that happens at runtime and is not part of the "hardware filter setup" as you say. Then, even if ptp4l notifies the driver of port state, E2E BC setups would still be broken because that's how the hardware works and no user space assistance can help with that. Also, what abstraction do you plan using for programming the PTP port state into the kernel. Maybe you should optimize for what you plan to use. If you need to use an E2E profile, maybe it would be worth the effort to find an appropriate abstraction for this port state thing. If you only use it for testing, then maybe it would be a good idea to keep the sysfs in your tree. > > Why am I mentioning this? Because the setting that's causing trouble for > > us is 'port state of the host port OC', which in the context of what I > > said above is nonsense. There _is_ no host port OC. There are 2 switch > > ports which can act as individual OCs, or as a BC, or as a TC. > But the switch has only one clock at all. I assume that the switch cannot be a > boundary clock, only TC seems possible. Why would a P2P BC not work? It does not require more than one clock. > As said above, having "filter setups" for E2E/P2P and for MASTER/SLAVE would > probably fit well for this kind of hardware. For E2E/P2P is one thing, for master/slave is a completely different thing.
Hi Vladimir, On Thursday, 22 October 2020, 13:32:43 CET, Vladimir Oltean wrote: > On Thu, Oct 22, 2020 at 01:11:40PM +0200, Christian Eggers wrote: > > On Thursday, 22 October 2020, 12:50:14 CEST, Vladimir Oltean wrote: > > after applying the RX timestamp correctly to the correction field > > (shifting > > the nanoseconds by 16), > > That modification should have been done anyway, since the unit of > measurement for correctionField is scaled ppb (48 bits nanoseconds, 16 > bits scaled nanoseconds), and not nanoseconds. > > > it seems that "moving" the timestamp back to the tail tag on TX is not > > required anymore. Keeping the RX timestamp simply in the correction > > field (negative value), works fine now. So this halves the effort in > > the tag_ksz driver. unfortunately I made a mistake when testing. Actually the timestamp *must* be moved from the correction field (negative) to the egress tail tag. > Ok, this makes sense. > Depending on what Richard responds, it now looks like the cleanest > approach would be to move your implementation that is currently in > ksz9477_update_ptp_correction_field() into a generic function called > > static inline void ptp_onestep_p2p_move_t2_to_correction(struct sk_buff > *skb, unsigned int ptp_type, > struct ptp_header *ptp_header, > ktime_t t2) I have implemented this in ptp_classify.h. Passing t2 instead of the correction field itself is fine for rx, but as this function is now still required for transmit, it looks a little bit misused there (see below). Shall I keep it as below, or revert it to passing value of the correction field itself? regards Christian static void ksz9477_xmit_timestamp(struct sk_buff *skb) { struct sk_buff *clone = DSA_SKB_CB(skb)->clone; struct ptp_header *ptp_hdr; u32 tstamp_raw = 0; u64 correction; if (!clone) goto out_put_tag; /* Use cached PTP header and type from ksz9477_ptp_should_tstamp(). Note * that KSZ9477_SKB_CB(clone)->ptp_header != NULL implies that this is a * Pdelay_resp message. */ ptp_hdr = KSZ9477_SKB_CB(clone)->ptp_header; if (!ptp_hdr) goto out_put_tag; correction = get_unaligned_be64(&ptp_hdr->correction); /* For PDelay_Resp messages we will likely have a negative value in the * correction field (see ksz9477_rcv()). The switch hardware cannot * correctly update such values, so it must be moved to the time stamp * field in the tail tag. */ if ((s64)correction < 0) { unsigned int ptp_type = KSZ9477_SKB_CB(clone)->ptp_type; struct timespec64 ts; u64 ns; /* Move ingress time stamp from PTP header's correction field to * tail tag. Format of the correction filed is 48 bit ns + 16 * bit fractional ns. Avoid shifting negative numbers. */ ns = -((s64)correction) >> 16; ts = ns_to_timespec64(ns); tstamp_raw = ((ts.tv_sec & 3) << 30) | ts.tv_nsec; >>> /* Set correction field to 0 (by subtracting the negative value) >>> * and update UDP checksum. >>> */ >>> ptp_onestep_p2p_move_t2_to_correction(skb, ptp_type, ptp_hdr, ns_to_ktime(-ns)); } out_put_tag: put_unaligned_be32(tstamp_raw, skb_put(skb, KSZ9477_PTP_TAG_LEN)); } Addtionally ptp_onestep_p2p_move_t2_to_correction() must be able to handle negative values: static inline void ptp_onestep_p2p_move_t2_to_correction(struct sk_buff *skb, unsigned int type, struct ptp_header *hdr, ktime_t t2) { u8 *ptr = skb_mac_header(skb); struct udphdr *uhdr = NULL; s64 ns = ktime_to_ns(t2); __be64 correction_old; s64 correction; /* previous correction value is required for checksum update. */ memcpy(&correction_old, &hdr->correction, sizeof(correction_old)); correction = (s64)be64_to_cpu(correction_old); /* PTP correction field consists of 32 bit nanoseconds and 16 bit * fractional nanoseconds. Avoid shifting negative numbers. */ >>> if (ns >= 0) >>> correction -= ns << 16; >>> else >>> correction += -ns << 16; /* write new correction value */ put_unaligned_be64((u64)correction, &hdr->correction); ... }
Sorry for getting back late to you. It did not compute when I read your email the first time around, then I let it sit for a while. On Thu, Nov 05, 2020 at 09:18:04PM +0100, Christian Eggers wrote: > unfortunately I made a mistake when testing. Actually the timestamp *must* be > moved from the correction field (negative) to the egress tail tag. That doesn't sound very good at all. I have a simple question. Can your driver, when operating as a PTP master clock, distribute a time in 2020 into your network? (you can check with "phc_ctl /dev/ptp0 get")
On Tuesday, 10 November 2020, 02:42:34 CET, Vladimir Oltean wrote: > Sorry for getting back late to you. It did not compute when I read your > email the first time around, then I let it sit for a while. > > On Thu, Nov 05, 2020 at 09:18:04PM +0100, Christian Eggers wrote: > > unfortunately I made a mistake when testing. Actually the timestamp *must* > > be moved from the correction field (negative) to the egress tail tag. > That doesn't sound very good at all. I think that is no drawback. It is implemented and works. > I have a simple question. Can your driver, when operating as a PTP > master clock, distribute a time in 2020 into your network? (you can > check with "phc_ctl /dev/ptp0 get") # phc_ctl /dev/ptp2 get phc_ctl[2141.093]: clock time is 1605018671.788481458 or Tue Nov 10 14:31:11 2020 The KSZ PTP clock has 32 bit seconds (and 30 bit nanoseconds). I hope I can send the new series tonight. Had today no 5 minutes without the telephone ringing...
On Tue, Nov 10, 2020 at 03:36:02PM +0100, Christian Eggers wrote: > On Tuesday, 10 November 2020, 02:42:34 CET, Vladimir Oltean wrote: > > Sorry for getting back late to you. It did not compute when I read your > > email the first time around, then I let it sit for a while. > > > > On Thu, Nov 05, 2020 at 09:18:04PM +0100, Christian Eggers wrote: > > > unfortunately I made a mistake when testing. Actually the timestamp *must* > > > be moved from the correction field (negative) to the egress tail tag. > > That doesn't sound very good at all. > I think that is no drawback. It is implemented and works. It works, but how? The reason why I'm asking you is this. I can guess that this hardware plays by a very simple song when performing one-step TX timestamping. correctionField_New = correctionField_Old + (t_TX - t_Tail_Tag) It does this because, as far as the hardware is concerned, it needs to make no difference between event messages, be they Sync, Pdelay_Req or Pdelay_Resp messages. First, let's see if we agree what happens for a Sync. | | t1 |------\ Sync | | ------\ | | ----->| t2 Master | | Slave Clock | | Clock | | When receiving the one-step Sync message, the Slave Clock must add the correctionField to the originTimestamp in order to figure out what is the t1 it should use. Usually, the correctionField would only be updated by transparent clocks. But when using one-step TX timestamping, it is not mandatory for master clocks to send the correctionField as zero (something which is mandatory for two-step). So they don't. Here's what the IEEE 1588 standard says: -----------------------------[cut here]----------------------------- 9.5.9.3 One-step clocks ----------------------- The originTimestamp field of the Sync message shall be an estimate no worse than ±1 s of the <syncEventEgressTimestamp> excluding any fractional nanoseconds. -----------------------------[cut here]----------------------------- In practice, the Master Clock will probably set the originTimestamp to something more or less arbitrary by reading the current PTP time of the NIC, then it will let the MAC rewrite the correctionField with the precise t_TX (hardware TX timestamp) minus the value from the originTimestamp field*. *Actually parsing the packet at that rate, while also rewriting and timestamping it, might be too tricky and too late, so the MAC, instead of reading the originTimestamp from the actual Sync message, will actually require you, the driver writer, to pass them the value of the originTimestamp twice, this time also through the tail tag (or through whatever other buffer metadata that hardware might use). So this formula would do that job: correctionField_New = correctionField_Old + (t_TX - t_Tail_Tag) where t_TX is the MAC TX timestamp and t_Tail_Tag should be the approximate originTimestamp of the Sync message. I am fairly confident that this is how your hardware works, because that's also how peer delay wants to be timestamped, it seems. It's just that in the case of Pdelay_Resp, t_Tail_Tag must be set to t2 (the precise timestamp of the Pdelay_Req), in order for the correctionField for that Pdelay_Resp message to contain (t3 - t2): | | t1 |------\ Pdelay_Req | | ------\ | | ----->| t2 Clock A | | Clock B | Pdelay_Resp /------| t3 | ------ | t4 |<-----/ | | | You don't do any of that here. So what must be happening in your case is that the originTimestamp for Sync messages is always zero, and the correctionField provides the rest of t1, in its entirety. That's because in your tagger, you also leave the t_Tail_Tag as zero for Sync messages (you only touch it for peer delay). Is that ok? Well, on paper yes, because 0 + t1 = t1 + 0 = t1 (either the correctionField or the originTimestamp could be zero, and they would add up to the same number), but in practice, the bit width of the originTimestamp is 64 bits, whereas the correctionField only has 48 bits for nanoseconds, the lower 16 bits being for fixed point. So if you're going to keep your entire t1 into the correctionField, then in practice your master clock can only advertise a time that holds 48 bits worth of nanoseconds, i.e. 3 1/4 days worth of TAI starting with Jan 1st 1970. Then it would wrap around. So how come this is not what happens in your case? I don't see you update the originTimestamp nor the t_Tail_Tag for Sync messages. The arithmetic that is done on the correctionField should be simple 48-bit two's complement. It should not matter if the correctionField is positive or negative. The only case I believe it would matter that the correctionField is negative is if the arithmetic is not actually on 48 bit two's complement, but on the partial timestamps directly (i.e. 32 bit two's complement). What I'm saying is that this formula is probably calculated in hardware in two steps: correctionField_New = correctionField_Old + (t_TX - t_Tail_Tag) Step 1: t_tmp = t_TX (32 bits) - t_Tail_Tag (32 bits), using 32-bit two's complement arithmetic Step 2: correctionField_New (48 bits) = correctionField_Old (48 bits) + t_tmp (32 bits), using 48-bit two's complement arithmetic, and where t_tmp is probably treated as an unsigned integer that is zero-padded up to 48 bits I am getting the feeling that there are still some things we're missing in this series. I would not hurry to send a v2 until they're clear. I hope that you're not testing PTP just between yourself and yourself. It's really easy for bugs to cancel out.
On Tue, Nov 10, 2020 at 06:40:45PM +0200, Vladimir Oltean wrote: > I am fairly confident that this is how your hardware works, because > that's also how peer delay wants to be timestamped, it seems. So I was confident and also wrong, it appears. My KSZ9477 datasheet says: In the host-to-switch direction, the 4-byte timestamp field is always present when PTP is enabled, as shown in Figure 4-6. This is true for all packets sent by the host, including IBA packets. The host uses this field to insert the receive timestamp from PTP Pdelay_Req messages into the Pdelay_Resp messages. For all other traffic and PTP message types, the host should populate the timestamp field with zeros. Hm. Does that mean that the switch updates the originTimestamp field of the Sync frames by itself? Ok... Very interesting that they decided to introduce a field in the tail tag for a single type of PTP message. But something is still wrong if you need to special-case the negative correctionField, it looks like the arithmetic is not done on the correct number of bits, either by the driver or by the hardware. And zeroing out the correctionField of the Pdelay_Resp on transmission, to put that value into t_Tail_Tag? How can you squeeze a 48-bit value into a 32-bit value without truncation?
Hi Vladimir, On Tuesday, 10 November 2020, 20:32:45 CET, Vladimir Oltean wrote: > On Tue, Nov 10, 2020 at 06:40:45PM +0200, Vladimir Oltean wrote: > > I am fairly confident that this is how your hardware works, because > > that's also how peer delay wants to be timestamped, it seems. > > So I was confident and also wrong, it appears. My KSZ9477 datasheet > says: > > In the host-to-switch direction, the 4-byte timestamp field is always > present when PTP is enabled, as shown in Figure 4-6. This is true for > all packets sent by the host, including IBA packets. The host uses this > field to insert the receive timestamp from PTP Pdelay_Req messages into > the Pdelay_Resp messages. For all other traffic and PTP message types, > the host should populate the timestamp field with zeros. > > Hm. Does that mean that the switch updates the originTimestamp field of > the Sync frames by itself? IMHO this is the best solution. User space / driver do not know the exact time (would require slow I2C transfer). So inserting the time in hardware seems to be the better solution. Maybe this is what the data sheet meant with "egress timestamp insertion". > Ok... Very interesting that they decided to > introduce a field in the tail tag for a single type of PTP message. > But something is still wrong if you need to special-case the negative > correctionField, it looks like the arithmetic is not done on the correct > number of bits, either by the driver or by the hardware. > Maybe I found the formula which is (should) applied to the correction field for PDelayResp: correction = correction_old + now + residental_delay + tx_latency - tail_tag <correction_old>: current value from the PTP header <now>: Time of the PTP clock when entering the switch <residental_delay>: Switching delay <Tx latency>: Delay between time stamping unit and wire (configurable via register) <tail_tag>: Time stamp in the tail tag The new correction value has been captured with Wireshark. For the measurement I simply halted the internal PTP clock and set it to zero (so it's value is exactly known). In the port's TX time stamp unit I got a non-zero time stamp anyway (between, 10 to 40 ns), so this must be the residential delay. I tested with different values for the correction field and the tail_tag. Negative values are no problem, but the calculation seems no to consider all bits. Measurements: - PTP clock: 0 (frozen) - Residential delay: variable - TX delay: 45 ns (default) - correction = 0xffff ffff ffff.ffff (-1.xxx ns) correction = correction + now + residental_delay + tx_latency - tail_tag = -1 + 0 40 + 45 0 = 84 (0000 0000 0054) wireshark: (0000 0000 0054) --> correct correction = correction + now + residental_delay + tx_latency - tail_tag = -1 + 0 32 + 45 1.000 = -924 (FFFF FFFF FC64) wireshark: (0000 FFFF FC64) --> wrong correction = correction + now + residental_delay + tx_latency - tail_tag = -1 + 0 24 + 45 2.000.000.000 = -1.926.258.108 (FFFF 8D2F A244) wireshark: (0000 7B9A CA44) --> wrong - correction = 0xffff ffff 0000.0000 (-65536.0 ns) correction = correction + now + residental_delay + tx_latency - tail_tag = -65536 0 24 + 45 0 = -65467 (FFFF FFFF 0045) wireshark: (FFFF FFFF 0045) --> correct correction = correction + now + residental_delay + tx_latency - tail_tag = -65536 + 0 32 + 45 1.000 = -66459 (FFFF FFFE FC65) wireshark: (0000 FFFE FC65) --> wrong Please note that the tail tag consist of 2 bits for seconds and 30 bit nanoseconds. So the value of 2.000.000.000 means 1s + 926.258.176 ns. As you are better in 2's complement as me, you can give me some more combinations for testing if you need. But in the end it looks like I should keep T2 in the tail tag. > And zeroing out the correctionField of the Pdelay_Resp on transmission, > to put that value into t_Tail_Tag? How can you squeeze a 48-bit value > into a 32-bit value without truncation? Only the lower bits are used. As long as PDelayResp doesn't take more than 4 seconds, this should be enough. regards Christian
Hi Vladimir, On Tuesday, 10 November 2020, 20:32:45 CET, Vladimir Oltean wrote: > On Tue, Nov 10, 2020 at 06:40:45PM +0200, Vladimir Oltean wrote: > > I am fairly confident that this is how your hardware works, because > > that's also how peer delay wants to be timestamped, it seems. > > So I was confident and also wrong, it appears. My KSZ9477 datasheet > says: > > In the host-to-switch direction, the 4-byte timestamp field is always > present when PTP is enabled, as shown in Figure 4-6. This is true for > all packets sent by the host, including IBA packets. The host uses this > field to insert the receive timestamp from PTP Pdelay_Req messages into > the Pdelay_Resp messages. For all other traffic and PTP message types, > the host should populate the timestamp field with zeros. > > Hm. Does that mean that the switch updates the originTimestamp field of > the Sync frames by itself? IMHO this is the best solution. User space / driver do not know the exact time (would require slow I2C transfer). So inserting the time in hardware seems to be the better solution. Maybe this is what the data sheet meant with "egress timestamp insertion". > Ok... Very interesting that they decided to > introduce a field in the tail tag for a single type of PTP message. > But something is still wrong if you need to special-case the negative > correctionField, it looks like the arithmetic is not done on the correct > number of bits, either by the driver or by the hardware. > Maybe I found the formula which is (should) applied to the correction field for PDelayResp: correction = correction_old + now + residental_delay + tx_latency - tail_tag <correction_old>: current value from the PTP header <now>: Time of the PTP clock when entering the switch <residental_delay>: Switching delay <Tx latency>: Delay between time stamping unit and wire (configurable via register) <tail_tag>: Time stamp in the tail tag The new correction value has been captured with Wireshark. For the measurement I simply halted the internal PTP clock and set it to zero (so it's value is exactly known). In the port's TX time stamp unit I got a non-zero time stamp anyway (between, 10 to 40 ns), so this must be the residential delay. I tested with different values for the correction field and the tail_tag. Negative values are no problem, but the calculation seems no to consider all bits. Measurements: - PTP clock: 0 (frozen) - Residential delay: variable - TX delay: 45 ns (default) - correction = 0xffff ffff ffff.ffff (-1.xxx ns) correction = correction + now + residental_delay + tx_latency - tail_tag = -1 + 0 40 + 45 0 = 84 (0000 0000 0054) wireshark: (0000 0000 0054) --> correct correction = correction + now + residental_delay + tx_latency - tail_tag = -1 + 0 32 + 45 1.000 = -924 (FFFF FFFF FC64) wireshark: (0000 FFFF FC64) --> wrong correction = correction + now + residental_delay + tx_latency - tail_tag = -1 + 0 24 + 45 2.000.000.000 = -1.926.258.108 (FFFF 8D2F A244) wireshark: (0000 7B9A CA44) --> wrong - correction = 0xffff ffff 0000.0000 (-65536.0 ns) correction = correction + now + residental_delay + tx_latency - tail_tag = -65536 0 24 + 45 0 = -65467 (FFFF FFFF 0045) wireshark: (FFFF FFFF 0045) --> correct correction = correction + now + residental_delay + tx_latency - tail_tag = -65536 + 0 32 + 45 1.000 = -66459 (FFFF FFFE FC65) wireshark: (0000 FFFE FC65) --> wrong Please note that the tail tag consist of 2 bits for seconds and 30 bit nanoseconds. So the value of 2.000.000.000 means 1s + 926.258.176 ns. As you are better in 2's complement as me, you can give me some more combinations for testing if you need. But in the end it looks like I should keep T2 in the tail tag. > And zeroing out the correctionField of the Pdelay_Resp on transmission, > to put that value into t_Tail_Tag? How can you squeeze a 48-bit value > into a 32-bit value without truncation? Only the lower bits are used. As long as PDelayResp doesn't take more than 4 seconds, this should be enough. regards Christian
Hi Vladimir, On Tuesday, 10 November 2020, 20:32:45 CET, Vladimir Oltean wrote: > But something is still wrong if you need to special-case the negative > correctionField, it looks like the arithmetic is not done on the correct > number of bits, either by the driver or by the hardware. I got it! The problem was caused because I subtracted the "full" timestamp (reconstructed 64 bit seconds) from the correction field. When the KSZ updates the correction field of the PDelay_Resp message on tx, only the "partial" (2 bit seconds) egress time stamp will be added. So the invalid values in the correction field were caused because much more seconds were subtracted on rx than added on tx. > And zeroing out the correctionField of the Pdelay_Resp on transmission, > to put that value into t_Tail_Tag? How can you squeeze a 48-bit value > into a 32-bit value without truncation? The answer is simple: Only the lower 32 bit (tail tag) must be subtracted as only 32 bit will be added on egress. v2 is on the way... regards Christian
On Thu, Nov 12, 2020 at 04:28:44PM +0100, Christian Eggers wrote: > the invalid values in the correction field were caused because much > more seconds were subtracted on rx than added on tx. [...] > v2 is on the way... Alright :)
On Thursday, 12 November 2020, 16:28:44 CET, Christian Eggers wrote: > Hi Vladimir, > > On Tuesday, 10 November 2020, 20:32:45 CET, Vladimir Oltean wrote: > > But something is still wrong if you need to special-case the negative > > correctionField, it looks like the arithmetic is not done on the correct > > number of bits, either by the driver or by the hardware. > > I got it! I got it not! While keeping the (negative) correction field works perfect when using PTP over L2 (what I did the last weeks), this causes an unwanted side effect when using UDP: ... > User Datagram Protocol, Src Port: 319, Dst Port: 319 > > Source Port: 319 > Destination Port: 319 > Length: 62 > > Checksum: 0x2285 incorrect, should be 0x2286 (maybe caused by "UDP checksum offload"?) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > [Checksum Status: Bad] > [Stream index: 0] > [Timestamps] > > Precision Time Protocol (IEEE1588) > > 0000 .... = transportSpecific: 0x0 > .... 0011 = messageId: Peer_Delay_Resp Message (0x3) > 0000 .... = Reserved: 0 > .... 0010 = versionPTP: 2 > messageLength: 54 > subdomainNumber: 0 > Reserved: 0 > flags: 0x0000 > correction: 5579788,000000 nanoseconds > Reserved: 0 > ClockIdentity: 0x849000fffe0980f7 > SourcePortID: 1 > sequenceId: 785 > control: Other Message (5) > logMessagePeriod: 127 > requestreceiptTimestamp (seconds): 0 > requestreceiptTimestamp (nanoseconds): 0 > requestingSourcePortIdentity: 0x849000fffe0980f6 > requestingSourcePortId: 2 While correction field is ok (residential delay ~5ms, using one printk...), the UDP checksum is off by one in all PDelay_Resp messages. The KSZ device offers on option to set the UDP checksum to zero, but this also didn't help and additionally wouldn't work for IPv6. It seems that I should return to "moving T2 from the correction field to the tail tag" on tx. regards Christian
diff --git a/drivers/net/dsa/microchip/ksz9477_main.c b/drivers/net/dsa/microchip/ksz9477_main.c index 7d623400139f..42cd17c8c25d 100644 --- a/drivers/net/dsa/microchip/ksz9477_main.c +++ b/drivers/net/dsa/microchip/ksz9477_main.c @@ -1388,6 +1388,7 @@ static const struct dsa_switch_ops ksz9477_switch_ops = { .phy_read = ksz9477_phy_read16, .phy_write = ksz9477_phy_write16, .phylink_mac_link_down = ksz_mac_link_down, + .get_ts_info = ksz9477_ptp_get_ts_info, .port_enable = ksz_enable_port, .get_strings = ksz9477_get_strings, .get_ethtool_stats = ksz_get_ethtool_stats, @@ -1408,6 +1409,11 @@ static const struct dsa_switch_ops ksz9477_switch_ops = { .port_mdb_del = ksz9477_port_mdb_del, .port_mirror_add = ksz9477_port_mirror_add, .port_mirror_del = ksz9477_port_mirror_del, + .port_hwtstamp_get = ksz9477_ptp_port_hwtstamp_get, + .port_hwtstamp_set = ksz9477_ptp_port_hwtstamp_set, + .port_txtstamp = ksz9477_ptp_port_txtstamp, + /* never defer rx delivery, tstamping is done via tail tagging */ + .port_rxtstamp = NULL, }; static u32 ksz9477_get_port_addr(int port, int offset) @@ -1554,7 +1560,8 @@ static irqreturn_t ksz9477_switch_irq_thread(int irq, void *dev_id) if (ret) return result; - /* ToDo: Add specific handling of port interrupts */ + if (data8 & PORT_PTP_INT) + result |= ksz9477_ptp_port_interrupt(dev, port); } } diff --git a/drivers/net/dsa/microchip/ksz9477_ptp.c b/drivers/net/dsa/microchip/ksz9477_ptp.c index 44d7bbdea518..d4bcfff0577e 100644 --- a/drivers/net/dsa/microchip/ksz9477_ptp.c +++ b/drivers/net/dsa/microchip/ksz9477_ptp.c @@ -6,7 +6,10 @@ * Copyright (c) 2020 ARRI Lighting */ +#include <linux/net_tstamp.h> +#include <linux/ptp_classify.h> #include <linux/ptp_clock_kernel.h> +#include <linux/sysfs.h> #include "ksz_common.h" #include "ksz9477_reg.h" @@ -71,8 +74,10 @@ static int ksz9477_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm) static int ksz9477_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta) { struct ksz_device *dev = container_of(ptp, struct ksz_device, ptp_caps); + struct timespec64 delta64 = ns_to_timespec64(delta); s32 sec, nsec; u16 data16; + unsigned long flags; int ret; mutex_lock(&dev->ptp_mutex); @@ -103,6 +108,10 @@ static int ksz9477_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta) if (ret) goto error_return; + spin_lock_irqsave(&dev->ptp_clock_lock, flags); + dev->ptp_clock_time = timespec64_add(dev->ptp_clock_time, delta64); + spin_unlock_irqrestore(&dev->ptp_clock_lock, flags); + error_return: mutex_unlock(&dev->ptp_mutex); return ret; @@ -160,6 +169,7 @@ static int ksz9477_ptp_settime(struct ptp_clock_info *ptp, struct timespec64 con { struct ksz_device *dev = container_of(ptp, struct ksz_device, ptp_caps); u16 data16; + unsigned long flags; int ret; mutex_lock(&dev->ptp_mutex); @@ -198,6 +208,10 @@ static int ksz9477_ptp_settime(struct ptp_clock_info *ptp, struct timespec64 con if (ret) goto error_return; + spin_lock_irqsave(&dev->ptp_clock_lock, flags); + dev->ptp_clock_time = *ts; + spin_unlock_irqrestore(&dev->ptp_clock_lock, flags); + error_return: mutex_unlock(&dev->ptp_mutex); return ret; @@ -208,9 +222,27 @@ static int ksz9477_ptp_enable(struct ptp_clock_info *ptp, struct ptp_clock_reque return -ENOTTY; } +static long ksz9477_ptp_do_aux_work(struct ptp_clock_info *ptp) +{ + struct ksz_device *dev = container_of(ptp, struct ksz_device, ptp_caps); + struct timespec64 ts; + unsigned long flags; + + mutex_lock(&dev->ptp_mutex); + _ksz9477_ptp_gettime(dev, &ts); + mutex_unlock(&dev->ptp_mutex); + + spin_lock_irqsave(&dev->ptp_clock_lock, flags); + dev->ptp_clock_time = ts; + spin_unlock_irqrestore(&dev->ptp_clock_lock, flags); + + return HZ; /* reschedule in 1 second */ +} + static int ksz9477_ptp_start_clock(struct ksz_device *dev) { u16 data; + unsigned long flags; int ret; ret = ksz_read16(dev, REG_PTP_CLK_CTRL, &data); @@ -230,6 +262,11 @@ static int ksz9477_ptp_start_clock(struct ksz_device *dev) if (ret) return ret; + spin_lock_irqsave(&dev->ptp_clock_lock, flags); + dev->ptp_clock_time.tv_sec = 0; + dev->ptp_clock_time.tv_nsec = 0; + spin_unlock_irqrestore(&dev->ptp_clock_lock, flags); + return 0; } @@ -251,11 +288,349 @@ static int ksz9477_ptp_stop_clock(struct ksz_device *dev) return 0; } +/* Time stamping support */ + +static int ksz9477_ptp_enable_mode(struct ksz_device *dev) +{ + u16 data; + int ret; + + ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data); + if (ret) + return ret; + + /* Enable PTP mode */ + data |= PTP_ENABLE; + ret = ksz_write16(dev, REG_PTP_MSG_CONF1, data); + if (ret) + return ret; + + return 0; +} + +static int ksz9477_ptp_disable_mode(struct ksz_device *dev) +{ + u16 data; + int ret; + + ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data); + if (ret) + return ret; + + /* Disable PTP mode */ + data &= ~PTP_ENABLE; + ret = ksz_write16(dev, REG_PTP_MSG_CONF1, data); + if (ret) + return ret; + + return 0; +} + +static int ksz9477_ptp_enable_port_ptp_interrupts(struct ksz_device *dev, int port) +{ + u32 addr = PORT_CTRL_ADDR(port, REG_PORT_INT_MASK); + u8 data; + int ret; + + ret = ksz_read8(dev, addr, &data); + if (ret) + return ret; + + /* Enable port PTP interrupt (0 means enabled) */ + data &= ~PORT_PTP_INT; + ret = ksz_write8(dev, addr, data); + if (ret) + return ret; + + return 0; +} + +static int ksz9477_ptp_disable_port_ptp_interrupts(struct ksz_device *dev, int port) +{ + u32 addr = PORT_CTRL_ADDR(port, REG_PORT_INT_MASK); + u8 data; + int ret; + + ret = ksz_read8(dev, addr, &data); + if (ret) + return ret; + + /* Enable port PTP interrupt (1 means disabled) */ + data |= PORT_PTP_INT; + ret = ksz_write8(dev, addr, data); + if (ret) + return ret; + + return 0; +} + +static int ksz9477_ptp_enable_port_egress_interrupts(struct ksz_device *dev, int port) +{ + u32 addr = PORT_CTRL_ADDR(port, REG_PTP_PORT_TX_INT_ENABLE__2); + u16 data; + int ret; + + ret = ksz_read16(dev, addr, &data); + if (ret) + return ret; + + /* Enable port xdelay egress timestamp interrupt (1 means enabled) */ + data |= PTP_PORT_XDELAY_REQ_INT; + ret = ksz_write16(dev, addr, data); + if (ret) + return ret; + + return 0; +} + +static int ksz9477_ptp_disable_port_egress_interrupts(struct ksz_device *dev, int port) +{ + u32 addr = PORT_CTRL_ADDR(port, REG_PTP_PORT_TX_INT_ENABLE__2); + u16 data; + int ret; + + ret = ksz_read16(dev, addr, &data); + if (ret) + return ret; + + /* Disable port xdelay egress timestamp interrupts (0 means disabled) */ + data &= PTP_PORT_XDELAY_REQ_INT; + ret = ksz_write16(dev, addr, data); + if (ret) + return ret; + + return 0; +} + +static int ksz9477_ptp_port_init(struct ksz_device *dev, int port) +{ + struct ksz_port *prt = &dev->ports[port]; + int ret; + + /* Read rx and tx delay from port registers */ + ret = ksz_read16(dev, PORT_CTRL_ADDR(port, REG_PTP_PORT_RX_DELAY__2), + &prt->tstamp_rx_latency_ns); + if (ret) + return ret; + + ret = ksz_read16(dev, PORT_CTRL_ADDR(port, REG_PTP_PORT_TX_DELAY__2), + &prt->tstamp_tx_latency_ns); + if (ret) + return ret; + + if (port != dev->cpu_port) { + ret = ksz9477_ptp_enable_port_ptp_interrupts(dev, port); + if (ret) + return ret; + + ret = ksz9477_ptp_enable_port_egress_interrupts(dev, port); + if (ret) + goto error_disable_port_ptp_interrupts; + } + + return 0; + +error_disable_port_ptp_interrupts: + if (port != dev->cpu_port) + ksz9477_ptp_disable_port_ptp_interrupts(dev, port); + return ret; +} + +static void ksz9477_ptp_port_deinit(struct ksz_device *dev, int port) +{ + if (port != dev->cpu_port) { + ksz9477_ptp_disable_port_egress_interrupts(dev, port); + ksz9477_ptp_disable_port_ptp_interrupts(dev, port); + } +} + +static int ksz9477_ptp_ports_init(struct ksz_device *dev) +{ + int port; + int ret; + + for (port = 0; port < dev->port_cnt; port++) { + ret = ksz9477_ptp_port_init(dev, port); + if (ret) + goto error_deinit; + } + + return 0; + +error_deinit: + for (--port; port >= 0; --port) + ksz9477_ptp_port_deinit(dev, port); + return ret; +} + +static void ksz9477_ptp_ports_deinit(struct ksz_device *dev) +{ + int port; + + for (port = dev->port_cnt - 1; port >= 0; --port) + ksz9477_ptp_port_deinit(dev, port); +} + +/* device attributes */ + +enum ksz9477_ptp_tcmode { + KSZ9477_PTP_TCMODE_E2E, + KSZ9477_PTP_TCMODE_P2P, +}; + +static int ksz9477_ptp_tcmode_get(struct ksz_device *dev, enum ksz9477_ptp_tcmode *tcmode) +{ + u16 data; + int ret; + + ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data); + if (ret) + return ret; + + *tcmode = (data & PTP_TC_P2P) ? KSZ9477_PTP_TCMODE_P2P : KSZ9477_PTP_TCMODE_E2E; + + return 0; +} + +static int ksz9477_ptp_tcmode_set(struct ksz_device *dev, enum ksz9477_ptp_tcmode tcmode) +{ + u16 data; + int ret; + + ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data); + if (ret) + return ret; + + if (tcmode == KSZ9477_PTP_TCMODE_P2P) + data |= PTP_TC_P2P; + else + data &= ~PTP_TC_P2P; + + ret = ksz_write16(dev, REG_PTP_MSG_CONF1, data); + if (ret) + return ret; + + return 0; +} + +static ssize_t tcmode_show(struct device *dev, struct device_attribute *attr __always_unused, + char *buf) +{ + struct ksz_device *ksz = dev_get_drvdata(dev); + enum ksz9477_ptp_tcmode tcmode; + int ret = ksz9477_ptp_tcmode_get(ksz, &tcmode); + + if (ret) + return ret; + + return sprintf(buf, "%s\n", tcmode == KSZ9477_PTP_TCMODE_P2P ? "P2P" : "E2E"); +} + +static ssize_t tcmode_store(struct device *dev, struct device_attribute *attr __always_unused, + char const *buf, size_t count) +{ + struct ksz_device *ksz = dev_get_drvdata(dev); + int ret; + + if (strcasecmp(buf, "E2E") == 0) + ret = ksz9477_ptp_tcmode_set(ksz, KSZ9477_PTP_TCMODE_E2E); + else if (strcasecmp(buf, "P2P") == 0) + ret = ksz9477_ptp_tcmode_set(ksz, KSZ9477_PTP_TCMODE_P2P); + else + return -EINVAL; + + return ret ? ret : (ssize_t)count; +} + +static DEVICE_ATTR_RW(tcmode); + +enum ksz9477_ptp_ocmode { + KSZ9477_PTP_OCMODE_SLAVE, + KSZ9477_PTP_OCMODE_MASTER, +}; + +static int ksz9477_ptp_ocmode_get(struct ksz_device *dev, enum ksz9477_ptp_ocmode *ocmode) +{ + u16 data; + int ret; + + ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data); + if (ret) + return ret; + + *ocmode = (data & PTP_MASTER) ? KSZ9477_PTP_OCMODE_MASTER : KSZ9477_PTP_OCMODE_SLAVE; + + return 0; +} + +static int ksz9477_ptp_ocmode_set(struct ksz_device *dev, enum ksz9477_ptp_ocmode ocmode) +{ + u16 data; + int ret; + + ret = ksz_read16(dev, REG_PTP_MSG_CONF1, &data); + if (ret) + return ret; + + if (ocmode == KSZ9477_PTP_OCMODE_MASTER) + data |= PTP_MASTER; + else + data &= ~PTP_MASTER; + + ret = ksz_write16(dev, REG_PTP_MSG_CONF1, data); + if (ret) + return ret; + + return 0; +} + +static ssize_t ocmode_show(struct device *dev, struct device_attribute *attr __always_unused, + char *buf) +{ + struct ksz_device *ksz = dev_get_drvdata(dev); + enum ksz9477_ptp_ocmode ocmode; + int ret = ksz9477_ptp_ocmode_get(ksz, &ocmode); + + if (ret) + return ret; + + return sprintf(buf, "%s\n", ocmode == KSZ9477_PTP_OCMODE_MASTER ? "master" : "slave"); +} + +static ssize_t ocmode_store(struct device *dev, struct device_attribute *attr __always_unused, + char const *buf, size_t count) +{ + struct ksz_device *ksz = dev_get_drvdata(dev); + int ret; + + if (strcasecmp(buf, "master") == 0) + ret = ksz9477_ptp_ocmode_set(ksz, KSZ9477_PTP_OCMODE_MASTER); + else if (strcasecmp(buf, "slave") == 0) + ret = ksz9477_ptp_ocmode_set(ksz, KSZ9477_PTP_OCMODE_SLAVE); + else + return -EINVAL; + + return ret ? ret : (ssize_t)count; +} + +static DEVICE_ATTR_RW(ocmode); + +static struct attribute *ksz9477_ptp_attrs[] = { + &dev_attr_tcmode.attr, + &dev_attr_ocmode.attr, + NULL, +}; + +static struct attribute_group ksz9477_ptp_attrgrp = { + .attrs = ksz9477_ptp_attrs, +}; + int ksz9477_ptp_init(struct ksz_device *dev) { int ret; mutex_init(&dev->ptp_mutex); + spin_lock_init(&dev->ptp_clock_lock); /* PTP clock properties */ @@ -275,6 +650,7 @@ int ksz9477_ptp_init(struct ksz_device *dev) dev->ptp_caps.gettime64 = ksz9477_ptp_gettime; dev->ptp_caps.settime64 = ksz9477_ptp_settime; dev->ptp_caps.enable = ksz9477_ptp_enable; + dev->ptp_caps.do_aux_work = ksz9477_ptp_do_aux_work; /* Start hardware counter (will overflow after 136 years) */ ret = ksz9477_ptp_start_clock(dev); @@ -287,8 +663,36 @@ int ksz9477_ptp_init(struct ksz_device *dev) goto error_stop_clock; } + /* Enable PTP mode (will affect tail tagging format) */ + ret = ksz9477_ptp_enable_mode(dev); + if (ret) + goto error_unregister_clock; + + /* Init switch ports */ + ret = ksz9477_ptp_ports_init(dev); + if (ret) + goto error_disable_mode; + + /* Init attributes */ + ret = sysfs_create_group(&dev->dev->kobj, &ksz9477_ptp_attrgrp); + if (ret) + goto error_ports_deinit; + + /* Schedule cyclic call of ksz_ptp_do_aux_work() */ + ret = ptp_schedule_worker(dev->ptp_clock, 0); + if (ret) + goto error_device_remove_attrgrp; + return 0; +error_device_remove_attrgrp: + sysfs_remove_group(&dev->dev->kobj, &ksz9477_ptp_attrgrp); +error_ports_deinit: + ksz9477_ptp_ports_deinit(dev); +error_disable_mode: + ksz9477_ptp_disable_mode(dev); +error_unregister_clock: + ptp_clock_unregister(dev->ptp_clock); error_stop_clock: ksz9477_ptp_stop_clock(dev); return ret; @@ -296,6 +700,231 @@ int ksz9477_ptp_init(struct ksz_device *dev) void ksz9477_ptp_deinit(struct ksz_device *dev) { + sysfs_remove_group(&dev->dev->kobj, &ksz9477_ptp_attrgrp); + ksz9477_ptp_ports_deinit(dev); + ksz9477_ptp_disable_mode(dev); ptp_clock_unregister(dev->ptp_clock); ksz9477_ptp_stop_clock(dev); } + +irqreturn_t ksz9477_ptp_port_interrupt(struct ksz_device *dev, int port) +{ + u32 addr = PORT_CTRL_ADDR(port, REG_PTP_PORT_TX_INT_STATUS__2); + struct ksz_port *prt = &dev->ports[port]; + irqreturn_t result = IRQ_NONE; + u16 data; + int ret; + + ret = ksz_read16(dev, addr, &data); + if (ret) + return IRQ_NONE; + + if ((data & PTP_PORT_XDELAY_REQ_INT) && prt->tstamp_tx_xdelay_skb) { + /* Timestamp for Pdelay_Req / Delay_Req */ + u32 tstamp_raw; + struct skb_shared_hwtstamps shhwtstamps; + struct sk_buff *tmp_skb; + + /* In contrast to the KSZ9563R data sheet, the format of the + * port time stamp registers is 2 bit seconds + 30 bit nano + * seconds (same as in the tail tags). + */ + ret = ksz_read32(dev, PORT_CTRL_ADDR(port, REG_PTP_PORT_XDELAY_TS), &tstamp_raw); + if (ret) + return result; + + memset(&shhwtstamps, 0, sizeof(shhwtstamps)); + shhwtstamps.hwtstamp = ksz9477_tstamp_to_clock(dev, tstamp_raw, + prt->tstamp_tx_latency_ns); + + /* skb_complete_tx_timestamp() will free up the client to make + * another timestamp-able transmit. We have to be ready for it + * -- by clearing the ps->tx_skb "flag" -- beforehand. + */ + + tmp_skb = prt->tstamp_tx_xdelay_skb; + prt->tstamp_tx_xdelay_skb = NULL; + clear_bit_unlock(KSZ_HWTSTAMP_TX_XDELAY_IN_PROGRESS, &prt->tstamp_state); + skb_complete_tx_timestamp(tmp_skb, &shhwtstamps); + } + + /* Clear interrupt(s) (W1C) */ + ret = ksz_write16(dev, addr, data); + if (ret) + return IRQ_NONE; + + return IRQ_HANDLED; +} + +/* DSA PTP operations */ + +int ksz9477_ptp_get_ts_info(struct dsa_switch *ds, int port __always_unused, + struct ethtool_ts_info *ts) +{ + struct ksz_device *dev = ds->priv; + + ts->so_timestamping = + SOF_TIMESTAMPING_TX_HARDWARE | + SOF_TIMESTAMPING_RX_HARDWARE | + SOF_TIMESTAMPING_RAW_HARDWARE; + + ts->phc_index = ptp_clock_index(dev->ptp_clock); + + ts->tx_types = + BIT(HWTSTAMP_TX_OFF) | + BIT(HWTSTAMP_TX_ONESTEP_P2P); + + ts->rx_filters = + BIT(HWTSTAMP_FILTER_NONE) | + BIT(HWTSTAMP_FILTER_PTP_V2_L4_EVENT) | + BIT(HWTSTAMP_FILTER_PTP_V2_L2_EVENT) | + BIT(HWTSTAMP_FILTER_PTP_V2_EVENT); + + return 0; +} + +static int ksz9477_set_hwtstamp_config(struct ksz_device *dev, int port, + struct hwtstamp_config *config) +{ + struct ksz_port *prt = &dev->ports[port]; + bool tstamp_enable = false; + + /* Prevent the TX/RX paths from trying to interact with the + * timestamp hardware while we reconfigure it. + */ + clear_bit_unlock(KSZ_HWTSTAMP_ENABLED, &prt->tstamp_state); + + /* reserved for future extensions */ + if (config->flags) + return -EINVAL; + + switch (config->tx_type) { + case HWTSTAMP_TX_OFF: + tstamp_enable = false; + break; + case HWTSTAMP_TX_ONESTEP_P2P: + tstamp_enable = true; + break; + default: + return -ERANGE; + } + + switch (config->rx_filter) { + case HWTSTAMP_FILTER_NONE: + tstamp_enable = false; + break; + case HWTSTAMP_FILTER_PTP_V2_L4_EVENT: + case HWTSTAMP_FILTER_PTP_V2_L4_SYNC: + case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ: + config->rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT; /* UDPv4/UDPv6 */ + break; + case HWTSTAMP_FILTER_PTP_V2_L2_EVENT: + case HWTSTAMP_FILTER_PTP_V2_L2_SYNC: + case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ: + config->rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT; /* 802.3 ether */ + break; + case HWTSTAMP_FILTER_PTP_V2_EVENT: + case HWTSTAMP_FILTER_PTP_V2_SYNC: + case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: + config->rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT; /* UDP / 802.3 */ + break; + case HWTSTAMP_FILTER_ALL: + default: + config->rx_filter = HWTSTAMP_FILTER_NONE; + return -ERANGE; + } + + /* Once hardware has been configured, enable timestamp checks + * in the RX/TX paths. + */ + if (tstamp_enable) + set_bit(KSZ_HWTSTAMP_ENABLED, &prt->tstamp_state); + + return 0; +} + +int ksz9477_ptp_port_hwtstamp_get(struct dsa_switch *ds, int port, struct ifreq *ifr) +{ + struct ksz_device *dev = ds->priv; + struct hwtstamp_config *port_tstamp_config = &dev->ports[port].tstamp_config; + + return copy_to_user(ifr->ifr_data, + port_tstamp_config, sizeof(*port_tstamp_config)) ? -EFAULT : 0; +} + +int ksz9477_ptp_port_hwtstamp_set(struct dsa_switch *ds, int port, struct ifreq *ifr) +{ + struct ksz_device *dev = ds->priv; + struct hwtstamp_config *port_tstamp_config = &dev->ports[port].tstamp_config; + struct hwtstamp_config config; + int err; + + if (copy_from_user(&config, ifr->ifr_data, sizeof(config))) + return -EFAULT; + + err = ksz9477_set_hwtstamp_config(dev, port, &config); + if (err) + return err; + + /* Save the chosen configuration to be returned later. */ + memcpy(port_tstamp_config, &config, sizeof(config)); + + return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ? -EFAULT : 0; +} + +/* Returns a pointer to the PTP header if the caller should time stamp, + * or NULL if the caller should not. + */ +static struct ptp_header *ksz9477_ptp_should_tstamp(struct ksz_port *port, struct sk_buff *skb, + unsigned int type) +{ + if (!test_bit(KSZ_HWTSTAMP_ENABLED, &port->tstamp_state)) + return NULL; + + return ptp_parse_header(skb, type); +} + +bool ksz9477_ptp_port_txtstamp(struct dsa_switch *ds, int port, struct sk_buff *clone, + unsigned int type) +{ + struct ksz_device *dev = ds->priv; + struct ksz_port *prt = &dev->ports[port]; + enum ksz9477_ptp_event_messages msg_type; + struct ptp_header *hdr; + + /* KSZ9563 supports PTPv2 only */ + if (!(type & PTP_CLASS_V2)) + return false; + + /* Should already been tested in dsa_skb_tx_timestamp()? */ + if (!(skb_shinfo(clone)->tx_flags & SKBTX_HW_TSTAMP)) + return false; + + hdr = ksz9477_ptp_should_tstamp(prt, clone, type); + if (!hdr) + return false; + + msg_type = ptp_get_msgtype(hdr, type); + + switch (msg_type) { + /* As the KSZ9563 always performs one step time stamping, only the time + * stamp for Delay_Req and Pdelay_Req are reported to the application + * via socket error queue. Time stamps for Sync and Pdelay_resp will be + * applied directly to the outgoing message (e.g. correction field), but + * will NOT be reported to the socket. + */ + case PTP_Event_Message_Delay_Req: + case PTP_Event_Message_Pdelay_Req: + if (test_and_set_bit_lock(KSZ_HWTSTAMP_TX_XDELAY_IN_PROGRESS, + &prt->tstamp_state)) + return false; /* free cloned skb */ + + prt->tstamp_tx_xdelay_skb = clone; + break; + + default: + return false; /* free cloned skb */ + } + + return true; /* keep cloned skb */ +} diff --git a/drivers/net/dsa/microchip/ksz9477_ptp.h b/drivers/net/dsa/microchip/ksz9477_ptp.h index 0076538419fa..e8d50a086311 100644 --- a/drivers/net/dsa/microchip/ksz9477_ptp.h +++ b/drivers/net/dsa/microchip/ksz9477_ptp.h @@ -10,6 +10,9 @@ #ifndef DRIVERS_NET_DSA_MICROCHIP_KSZ9477_PTP_H_ #define DRIVERS_NET_DSA_MICROCHIP_KSZ9477_PTP_H_ +#include <linux/irqreturn.h> +#include <linux/types.h> + #include "ksz_common.h" #if IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP) @@ -17,11 +20,35 @@ int ksz9477_ptp_init(struct ksz_device *dev); void ksz9477_ptp_deinit(struct ksz_device *dev); +irqreturn_t ksz9477_ptp_port_interrupt(struct ksz_device *dev, int port); + +int ksz9477_ptp_get_ts_info(struct dsa_switch *ds, int port, struct ethtool_ts_info *ts); +int ksz9477_ptp_port_hwtstamp_get(struct dsa_switch *ds, int port, struct ifreq *ifr); +int ksz9477_ptp_port_hwtstamp_set(struct dsa_switch *ds, int port, struct ifreq *ifr); +bool ksz9477_ptp_port_txtstamp(struct dsa_switch *ds, int port, struct sk_buff *clone, + unsigned int type); + #else static inline int ksz9477_ptp_init(struct ksz_device *dev) { return 0; } static inline void ksz9477_ptp_deinit(struct ksz_device *dev) {} +static inline irqreturn_t ksz9477_ptp_port_interrupt(struct ksz_device *dev, int port) + { return IRQ_NONE; } + +static inline int ksz9477_ptp_get_ts_info(struct dsa_switch *ds, int port, + struct ethtool_ts_info *ts) { return -EOPNOTSUPP; } + +static inline int ksz9477_ptp_port_hwtstamp_get(struct dsa_switch *ds, int port, + struct ifreq *ifr) { return -EOPNOTSUPP; } + +static inline int ksz9477_ptp_port_hwtstamp_set(struct dsa_switch *ds, int port, struct ifreq *ifr) + { return -EOPNOTSUPP; } + +static inline bool ksz9477_ptp_port_txtstamp(struct dsa_switch *ds, int port, + struct sk_buff *clone, unsigned int type) + { return false; } + #endif #endif /* DRIVERS_NET_DSA_MICROCHIP_KSZ9477_PTP_H_ */ diff --git a/include/linux/dsa/ksz_common.h b/include/linux/dsa/ksz_common.h index 4d5b6cc9429a..10b42154ba5a 100644 --- a/include/linux/dsa/ksz_common.h +++ b/include/linux/dsa/ksz_common.h @@ -8,8 +8,10 @@ #include <linux/gpio/consumer.h> #include <linux/kernel.h> #include <linux/mutex.h> +#include <linux/net_tstamp.h> #include <linux/phy.h> #include <linux/ptp_clock_kernel.h> +#include <linux/spinlock.h> #include <linux/regmap.h> #include <linux/timer.h> #include <linux/workqueue.h> @@ -25,6 +27,12 @@ struct ksz_port_mib { u64 *counters; }; +/* state flags for ksz_port::tstamp_state */ +enum { + KSZ_HWTSTAMP_ENABLED, + KSZ_HWTSTAMP_TX_XDELAY_IN_PROGRESS, +}; + struct ksz_port { u16 member; u16 vid_member; @@ -41,6 +49,13 @@ struct ksz_port { struct ksz_port_mib mib; phy_interface_t interface; +#if IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP) + u16 tstamp_rx_latency_ns; /* rx delay from wire to tstamp unit */ + u16 tstamp_tx_latency_ns; /* tx delay from tstamp unit to wire */ + struct hwtstamp_config tstamp_config; + struct sk_buff *tstamp_tx_xdelay_skb; + unsigned long tstamp_state; +#endif }; struct ksz_device { @@ -98,7 +113,22 @@ struct ksz_device { struct ptp_clock *ptp_clock; struct ptp_clock_info ptp_caps; struct mutex ptp_mutex; + spinlock_t ptp_clock_lock; /* for ptp_clock_time */ + /* approximated current time, read once per second from hardware */ + struct timespec64 ptp_clock_time; #endif }; +/* KSZ9563 will only timestamp event messages */ +enum ksz9477_ptp_event_messages { + PTP_Event_Message_Sync = 0x0, + PTP_Event_Message_Delay_Req = 0x1, + PTP_Event_Message_Pdelay_Req = 0x2, + PTP_Event_Message_Pdelay_Resp = 0x3, +}; + +/* net/dsa/tag_ksz.c */ +ktime_t ksz9477_tstamp_to_clock(struct ksz_device *ksz, u32 tstamp, + int offset_ns); + #endif /* _NET_DSA_KSZ_COMMON_H_ */ diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 4820dbcedfa2..9dbe2fde3db0 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -4,10 +4,14 @@ * Copyright (c) 2017 Microchip Technology */ +#include <asm/unaligned.h> +#include <linux/dsa/ksz_common.h> #include <linux/etherdevice.h> #include <linux/list.h> +#include <linux/ptp_classify.h> #include <linux/slab.h> #include <net/dsa.h> +#include <net/checksum.h> #include "dsa_priv.h" /* Typically only one byte is used for tail tag. */ @@ -87,26 +91,210 @@ MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ8795); /* * For Ingress (Host -> KSZ9477), 2 bytes are added before FCS. * --------------------------------------------------------------------------- - * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes) + * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|ts(4bytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes) * --------------------------------------------------------------------------- + * ts : time stamp (only present if PTP is enabled on the hardware). * tag0 : Prioritization (not used now) * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5) * - * For Egress (KSZ9477 -> Host), 1 byte is added before FCS. + * For Egress (KSZ9477 -> Host), 1/4 bytes are added before FCS. * --------------------------------------------------------------------------- - * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|FCS(4bytes) + * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|ts(4bytes)|tag0(1byte)|FCS(4bytes) * --------------------------------------------------------------------------- + * ts : time stamp (only present of bit 7 in tag0 is set * tag0 : zero-based value represents port * (eg, 0x00=port1, 0x02=port3, 0x06=port7) */ #define KSZ9477_INGRESS_TAG_LEN 2 #define KSZ9477_PTP_TAG_LEN 4 -#define KSZ9477_PTP_TAG_INDICATION 0x80 +#define KSZ9477_PTP_TAG_INDICATION BIT(7) #define KSZ9477_TAIL_TAG_OVERRIDE BIT(9) #define KSZ9477_TAIL_TAG_LOOKUP BIT(10) +#if IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP) +static inline __wsum ksz9477_check_diff8(__be64 old, __be64 new, __wsum oldsum) +{ + __be64 diff[2] = { ~old, new }; + + return csum_partial(diff, sizeof(diff), oldsum); +} + +/* Replace content of PTP header's correction field and update UDP checksum */ +static void ksz9477_update_ptp_correction_field(struct sk_buff *skb, unsigned int type, + struct ptp_header *ptp_header, u64 value) +{ + u8 *ptr = skb_mac_header(skb); + struct udphdr *uhdr = NULL; + __be64 correction_old; + + /* previous correction value is required for checksum update. */ + memcpy(&correction_old, &ptp_header->correction, sizeof(correction_old)); + + /* write new correction value */ + put_unaligned_be64(value, &ptp_header->correction); + + /* locate udp header */ + if (type & PTP_CLASS_VLAN) + ptr += VLAN_HLEN; + + ptr += ETH_HLEN; + + switch (type & PTP_CLASS_PMASK) { + case PTP_CLASS_IPV4: + ptr += ((struct iphdr *)ptr)->ihl << 2; + uhdr = (struct udphdr *)ptr; + break; + case PTP_CLASS_IPV6: + ptr += IP6_HLEN; + uhdr = (struct udphdr *)ptr; + break; + } + + if (uhdr) { + /* update checksum */ + uhdr->check = csum_fold(ksz9477_check_diff8(correction_old, ptp_header->correction, + ~csum_unfold(uhdr->check))); + if (!uhdr->check) + uhdr->check = CSUM_MANGLED_0; + } +} + +/* Time stamp tag is only inserted if PTP is enabled in hardware. */ +static void ksz9477_xmit_timestamp(struct sk_buff *skb) +{ + enum ksz9477_ptp_event_messages msg_type; + struct ptp_header *ptp_hdr; + unsigned int ptp_type; + u32 tstamp_raw = 0; + u64 correction; + + /* For PDelay_Resp messages we will likely have a negative value in the + * correction field (see ksz9477_rcv()). The switch hardware cannot + * correctly update such values, so it must be moved to the time stamp + * field in the tail tag. + */ + + /* Check PTP message type */ + ptp_type = ptp_classify_raw(skb); + + if (ptp_type == PTP_CLASS_NONE) + goto out_put_tag; + + ptp_hdr = ptp_parse_header(skb, ptp_type); + if (!ptp_hdr) + goto out_put_tag; + + msg_type = ptp_get_msgtype(ptp_hdr, ptp_type); + if (msg_type != PTP_Event_Message_Pdelay_Resp) + goto out_put_tag; + + correction = get_unaligned_be64(&ptp_hdr->correction); + if ((s64)correction < 0) { + struct timespec64 ts; + + /* Move ingress time stamp from PTP header's correction field to tail tag. */ + ts = ns_to_timespec64(-((s64)correction)); + tstamp_raw = ((ts.tv_sec & 3) << 30) | ts.tv_nsec; + correction = 0; + ksz9477_update_ptp_correction_field(skb, ptp_type, ptp_hdr, correction); + } + +out_put_tag: + put_unaligned_be32(tstamp_raw, skb_put(skb, KSZ9477_PTP_TAG_LEN)); +} + +ktime_t ksz9477_tstamp_to_clock(struct ksz_device *ksz, u32 tstamp, int offset_ns) +{ + struct timespec64 ptp_clock_time; + /* Split up time stamp, 2 bit seconds, 30 bit nano seconds */ + struct timespec64 ts = { + .tv_sec = tstamp >> 30, + .tv_nsec = tstamp & (BIT_MASK(30) - 1), + }; + struct timespec64 diff; + unsigned long flags; + s64 ns; + + spin_lock_irqsave(&ksz->ptp_clock_lock, flags); + ptp_clock_time = ksz->ptp_clock_time; + spin_unlock_irqrestore(&ksz->ptp_clock_lock, flags); + + /* calculate full time from partial time stamp */ + ts.tv_sec = (ptp_clock_time.tv_sec & ~3) | ts.tv_sec; + + /* find nearest possible point in time */ + diff = timespec64_sub(ts, ptp_clock_time); + if (diff.tv_sec > 2) + ts.tv_sec -= 4; + else if (diff.tv_sec < -2) + ts.tv_sec += 4; + + /* Add/remove excess delay between wire and time stamp unit */ + ns = timespec64_to_ns(&ts) + offset_ns; + + return ns_to_ktime(ns); +} +EXPORT_SYMBOL(ksz9477_tstamp_to_clock); + +static void ksz9477_rcv_timestamp(struct sk_buff *skb, u8 *tag, + struct net_device *dev, unsigned int port) +{ + struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); + enum ksz9477_ptp_event_messages msg_type; + struct dsa_switch *ds = dev->dsa_ptr->ds; + struct ksz_device *ksz = ds->priv; + struct ksz_port *prt = &ksz->ports[port]; + u8 *tstamp = tag - KSZ9477_PTP_TAG_LEN; + struct ptp_header *ptp_hdr; + unsigned int ptp_type; + u64 correction; + + /* convert time stamp and write to skb */ + memset(hwtstamps, 0, sizeof(*hwtstamps)); + hwtstamps->hwtstamp = ksz9477_tstamp_to_clock(ksz, get_unaligned_be32(tstamp), + -prt->tstamp_rx_latency_ns); + + /* For PDelay_Req messages, user space (ptp4l) expects that the hardware + * subtracts the egress time stamp from the correction field. The + * separate hw time stamp from the sk_buff struct will not be used in + * this case. + */ + if (skb_headroom(skb) < ETH_HLEN) + return; + + __skb_push(skb, ETH_HLEN); + ptp_type = ptp_classify_raw(skb); + __skb_pull(skb, ETH_HLEN); + + if (ptp_type == PTP_CLASS_NONE) + return; + + ptp_hdr = ptp_parse_header(skb, ptp_type); + if (!ptp_hdr) + return; + + msg_type = ptp_get_msgtype(ptp_hdr, ptp_type); + if (msg_type != PTP_Event_Message_Pdelay_Req) + return; + + correction = get_unaligned_be64(&ptp_hdr->correction); + correction -= hwtstamps->hwtstamp; + ksz9477_update_ptp_correction_field(skb, ptp_type, ptp_hdr, correction); +} +#else /* IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP) */ +static void ksz9477_xmit_timestamp(struct sk_buff *skb __maybe_unused) +{ +} + +static void ksz9477_rcv_timestamp(struct sk_buff *skb __maybe_unused, u8 *tag __maybe_unused, + struct net_device *dev __maybe_unused, + unsigned int port __maybe_unused) +{ +} +#endif /* IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP) */ + static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, struct net_device *dev) { @@ -116,6 +304,7 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, u16 val; /* Tag encoding */ + ksz9477_xmit_timestamp(skb); tag = skb_put(skb, KSZ9477_INGRESS_TAG_LEN); addr = skb_mac_header(skb); @@ -138,8 +327,10 @@ static struct sk_buff *ksz9477_rcv(struct sk_buff *skb, struct net_device *dev, unsigned int len = KSZ_EGRESS_TAG_LEN; /* Extra 4-bytes PTP timestamp */ - if (tag[0] & KSZ9477_PTP_TAG_INDICATION) + if (tag[0] & KSZ9477_PTP_TAG_INDICATION) { + ksz9477_rcv_timestamp(skb, tag, dev, port); len += KSZ9477_PTP_TAG_LEN; + } return ksz_common_rcv(skb, dev, port, len); } @@ -149,7 +340,7 @@ static const struct dsa_device_ops ksz9477_netdev_ops = { .proto = DSA_TAG_PROTO_KSZ9477, .xmit = ksz9477_xmit, .rcv = ksz9477_rcv, - .overhead = KSZ9477_INGRESS_TAG_LEN, + .overhead = KSZ9477_INGRESS_TAG_LEN + KSZ9477_PTP_TAG_LEN, .tail_tag = true, }; @@ -167,6 +358,7 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, u8 *tag; /* Tag encoding */ + ksz9477_xmit_timestamp(skb); tag = skb_put(skb, KSZ_INGRESS_TAG_LEN); addr = skb_mac_header(skb); @@ -183,7 +375,7 @@ static const struct dsa_device_ops ksz9893_netdev_ops = { .proto = DSA_TAG_PROTO_KSZ9893, .xmit = ksz9893_xmit, .rcv = ksz9477_rcv, - .overhead = KSZ_INGRESS_TAG_LEN, + .overhead = KSZ_INGRESS_TAG_LEN + KSZ9477_PTP_TAG_LEN, .tail_tag = true, };
Add routines required for TX hardware time stamping. The KSZ9563 only supports one step time stamping (HWTSTAMP_TX_ONESTEP_P2P), which requires linuxptp-2.0 or later. PTP mode is permanently enabled (changes tail tag; depends on CONFIG_NET_DSA_MICROCHIP_KSZ9477_PTP).TX time stamps are reported via an interrupt / device registers whilst RX time stamps are reported via an additional tail tag. One step TX time stamping of PDelay_Resp requires the RX time stamp from the associated PDelay_Req message. linuxptp assumes that the RX time stamp has already been subtracted from the PDelay_Req correction field (as done by the TI PHYTER). linuxptp will echo back the value of the correction field in the PDelay_Resp message. In order to be compatible to this already established interface, the KSZ9563 code emulates this behavior. When processing the PDelay_Resp message, the time stamp is moved back from the correction field to the tail tag, as the hardware doesn't support negative values on this field. Of course, the UDP checksums (if any) have to be corrected after this (for both directions). The PTP hardware performs internal detection of PTP frames (likely similar as ptp_classify_raw() and ptp_parse_header()). As these filters cannot be disabled, the current delay mode (E2E/P2P) and the clock mode (master/slave) must be configured via sysfs attributes. Time stamping will only be performed on PTP packets matching the current mode settings. Everything has been tested on a Microchip KSZ9563 switch. Signed-off-by: Christian Eggers <ceggers@arri.de> --- drivers/net/dsa/microchip/ksz9477_main.c | 9 +- drivers/net/dsa/microchip/ksz9477_ptp.c | 629 +++++++++++++++++++++++ drivers/net/dsa/microchip/ksz9477_ptp.h | 27 + include/linux/dsa/ksz_common.h | 30 ++ net/dsa/tag_ksz.c | 206 +++++++- 5 files changed, 893 insertions(+), 8 deletions(-)