Message ID | 20201230154755.14746-2-pali@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: sfp: add support for GPON RTL8672/RTL9601C and Ubiquiti U-Fiber | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | success | CCed 6 of 6 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 115 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Wed, Dec 30, 2020 at 04:47:52PM +0100, Pali Rohár wrote: > Workaround for GPON SFP modules based on VSOL V2801F brand was added in > commit 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 > v2.0 workaround"). But it works only for ids explicitly added to the list. > As there are more rebraded VSOL V2801F modules and OEM vendors are putting > into vendor name random strings we cannot build workaround based on ids. > > Moreover issue which that commit tried to workaround is generic not only to > VSOL based modules, but rather to all GPON modules based on Realtek RTL8672 > and RTL9601C chips. > > They can be found for example in following GPON modules: > * V-SOL V2801F > * C-Data FD511GX-RM0 > * OPTON GP801R > * BAUDCOM BD-1234-SFM > * CPGOS03-0490 v2.0 > * Ubiquiti U-Fiber Instant > * EXOT EGS1 > > Those Realtek chips have broken EEPROM emulator which for N-byte read > operation returns just one byte of EEPROM data followed by N-1 zeros. > > So introduce a new function sfp_id_needs_byte_io() which detects SFP > modules with these Realtek chips which have broken EEPROM emulator based on > N-1 zeros and switch to 1 byte EEPROM reading operation which workaround > this issue. > > This patch fixes reading EEPROM content from SFP modules based on Realtek > RTL8672 and RTL9601C chips. > > Fixes: 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround") > Co-developed-by: Russell King <rmk+kernel@armlinux.org.uk> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > Signed-off-by: Pali Rohár <pali@kernel.org> > --- > drivers/net/phy/sfp.c | 78 ++++++++++++++++++++++++------------------- > 1 file changed, 44 insertions(+), 34 deletions(-) > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > index 91d74c1a920a..490e78a72dd6 100644 > --- a/drivers/net/phy/sfp.c > +++ b/drivers/net/phy/sfp.c > @@ -336,19 +336,11 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf, > size_t len) > { > struct i2c_msg msgs[2]; > - size_t block_size; > + u8 bus_addr = a2 ? 0x51 : 0x50; > + size_t block_size = sfp->i2c_block_size; > size_t this_len; > - u8 bus_addr; > int ret; > > - if (a2) { > - block_size = 16; > - bus_addr = 0x51; > - } else { > - block_size = sfp->i2c_block_size; > - bus_addr = 0x50; > - } > - NAK. You are undoing something that is definitely needed. The diagnostics must be read with sequential reads to be able to properly read the 16-bit values. The rest of the patch is fine; it's a shame the entire thing has been spoilt by this extra addition that was not in the patch we had been discussing off-list.
On Wednesday 30 December 2020 16:10:37 Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 04:47:52PM +0100, Pali Rohár wrote: > > Workaround for GPON SFP modules based on VSOL V2801F brand was added in > > commit 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 > > v2.0 workaround"). But it works only for ids explicitly added to the list. > > As there are more rebraded VSOL V2801F modules and OEM vendors are putting > > into vendor name random strings we cannot build workaround based on ids. > > > > Moreover issue which that commit tried to workaround is generic not only to > > VSOL based modules, but rather to all GPON modules based on Realtek RTL8672 > > and RTL9601C chips. > > > > They can be found for example in following GPON modules: > > * V-SOL V2801F > > * C-Data FD511GX-RM0 > > * OPTON GP801R > > * BAUDCOM BD-1234-SFM > > * CPGOS03-0490 v2.0 > > * Ubiquiti U-Fiber Instant > > * EXOT EGS1 > > > > Those Realtek chips have broken EEPROM emulator which for N-byte read > > operation returns just one byte of EEPROM data followed by N-1 zeros. > > > > So introduce a new function sfp_id_needs_byte_io() which detects SFP > > modules with these Realtek chips which have broken EEPROM emulator based on > > N-1 zeros and switch to 1 byte EEPROM reading operation which workaround > > this issue. > > > > This patch fixes reading EEPROM content from SFP modules based on Realtek > > RTL8672 and RTL9601C chips. > > > > Fixes: 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround") > > Co-developed-by: Russell King <rmk+kernel@armlinux.org.uk> > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > > Signed-off-by: Pali Rohár <pali@kernel.org> > > --- > > drivers/net/phy/sfp.c | 78 ++++++++++++++++++++++++------------------- > > 1 file changed, 44 insertions(+), 34 deletions(-) > > > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > > index 91d74c1a920a..490e78a72dd6 100644 > > --- a/drivers/net/phy/sfp.c > > +++ b/drivers/net/phy/sfp.c > > @@ -336,19 +336,11 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf, > > size_t len) > > { > > struct i2c_msg msgs[2]; > > - size_t block_size; > > + u8 bus_addr = a2 ? 0x51 : 0x50; > > + size_t block_size = sfp->i2c_block_size; > > size_t this_len; > > - u8 bus_addr; > > int ret; > > > > - if (a2) { > > - block_size = 16; > > - bus_addr = 0x51; > > - } else { > > - block_size = sfp->i2c_block_size; > > - bus_addr = 0x50; > > - } > > - > > NAK. You are undoing something that is definitely needed. The > diagnostics must be read with sequential reads to be able to properly > read the 16-bit values. This change is really required for those Realtek chips. I thought that it is obvious that from *both* addresses 0x50 and 0x51 can be read only one byte at the same time. Reading 2 bytes (for be16 value) cannot be really done by one i2 transfer, it must be done in two. Therefore I had to "undo" this change as it is breaking support for all those Realtek SFPs, including VSOL. That is why I also put Fixes tag into commit message as it is really fixing reading for VSOL SFP from address 0x51. I understand that you want to read __be16 val in sfp_hwmon_read_sensor() function via one i2c transfer to have "consistent" value, but it is impossible on these Realtek chips. I have already tested that sfp_hwmon_read_sensor() function see just zero on second byte when is trying to use 2-byte read via one i2c transfer. When it use two i2c transfers, one for low 8bits and one for high 8bits then reading is working. > The rest of the patch is fine; it's a shame the entire thing has > been spoilt by this extra addition that was not in the patch we had > been discussing off-list. Sorry for that. I really thought that it is obvious that this change needs to be undone and read must always use byte transfer if we detect these broken Realtek chips.
On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > This change is really required for those Realtek chips. I thought that > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > really done by one i2 transfer, it must be done in two. Then these modules are even more broken than first throught, and quite simply it is pointless supporting the diagnostics on them because we can never read the values in an atomic way. It's also a violation of the SFF-8472 that _requires_ multi-byte reads to read these 16 byte values atomically. Reading them with individual byte reads results in a non-atomic read, and the 16-bit value can not be trusted to be correct. That is really not optional, no matter what any manufacturer says - if they claim the SFP MSAs allows it, they're quite simply talking out of a donkey's backside and you should dispose of the module in biohazard packaging. :) So no, I hadn't understood this from your emails, and as I say above, if this is the case, then we quite simply disable diagnostics on these modules since they are _highly_ noncompliant.
On Wed, Dec 30, 2020 at 05:05:46PM +0000, Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > > This change is really required for those Realtek chips. I thought that > > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > > really done by one i2 transfer, it must be done in two. > > Then these modules are even more broken than first throught, and > quite simply it is pointless supporting the diagnostics on them > because we can never read the values in an atomic way. > > It's also a violation of the SFF-8472 that _requires_ multi-byte reads > to read these 16 byte values atomically. Reading them with individual > byte reads results in a non-atomic read, and the 16-bit value can not > be trusted to be correct. Hi Pali I have to agree with Russell here. I would rather have no diagnostics than untrustable diagnostics. The only way this is going to be accepted is if the manufacture says that reading the first byte of a word snapshots the second byte as well in an atomic way and returns that snapshot on the second read. But i highly doubt that happens, given how bad these SFPs are. Andrew
On Wednesday 30 December 2020 17:05:46 Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > > This change is really required for those Realtek chips. I thought that > > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > > really done by one i2 transfer, it must be done in two. > > Then these modules are even more broken than first throught, and > quite simply it is pointless supporting the diagnostics on them > because we can never read the values in an atomic way. They are broken in a way that neither holy water help them... But from diagnostic 0x51 address we can read at least 8bit registers in atomic way :-) > It's also a violation of the SFF-8472 that _requires_ multi-byte reads > to read these 16 byte values atomically. Reading them with individual > byte reads results in a non-atomic read, and the 16-bit value can not > be trusted to be correct. > > That is really not optional, no matter what any manufacturer says - if > they claim the SFP MSAs allows it, they're quite simply talking out of > a donkey's backside and you should dispose of the module in biohazard > packaging. :) > > So no, I hadn't understood this from your emails, and as I say above, > if this is the case, then we quite simply disable diagnostics on these > modules since they are _highly_ noncompliant. We have just two options: Disable 2 (and more) bytes reads from 0x51 address and therefore disable sfp_hwmon_read_sensor() function. Or allow 2 bytes non-atomic reads and allow at least semi-correct values for hwmon. I guess that upper 8bits would not change between two single byte i2c transfers too much (when they are done immediately one by one). But in any case we really need to set read operation for this eeprom to one byte.
On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > On Wed, Dec 30, 2020 at 05:05:46PM +0000, Russell King - ARM Linux admin wrote: > > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > > > This change is really required for those Realtek chips. I thought that > > > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > > > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > > > really done by one i2 transfer, it must be done in two. > > > > Then these modules are even more broken than first throught, and > > quite simply it is pointless supporting the diagnostics on them > > because we can never read the values in an atomic way. > > > > It's also a violation of the SFF-8472 that _requires_ multi-byte reads > > to read these 16 byte values atomically. Reading them with individual > > byte reads results in a non-atomic read, and the 16-bit value can not > > be trusted to be correct. > > Hi Pali > > I have to agree with Russell here. I would rather have no diagnostics > than untrustable diagnostics. Ok! So should we completely skip hwmon_device_register_with_info() call if (i2c_block_size < 2) ? > The only way this is going to be accepted is if the manufacture says > that reading the first byte of a word snapshots the second byte as > well in an atomic way and returns that snapshot on the second > read. But i highly doubt that happens, given how bad these SFPs are. I do not think that manufacture says something. I think that they even do not know that their Realtek chips are completely broken. I can imagine that vendor just says: it is working in our branded boxes with SFP cages and if it does not work in your kernel then problem is with your custom kernel and we do not care about 3rd parties.
On Wed, Dec 30, 2020 at 06:43:07PM +0100, Pali Rohár wrote: > On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > > Hi Pali > > > > I have to agree with Russell here. I would rather have no diagnostics > > than untrustable diagnostics. > > Ok! > > So should we completely skip hwmon_device_register_with_info() call > if (i2c_block_size < 2) ? I don't think that alone is sufficient - there's also the matter of ethtool -m which will dump that information as well, and we don't want to offer it to userspace in an unreliable form. For reference, here is what SFF-8472 which defines the diagnostics, says about this: To guarantee coherency of the diagnostic monitoring data, the host is required to retrieve any multi-byte fields from the diagnostic monitoring data structure (IE: Rx Power MSB - byte 104 in A2h, Rx Power LSB - byte 105 in A2h) by the use of a single two-byte read sequence across the two-wire interface interface. The transceiver is required to ensure that any multi-byte fields which are updated with diagnostic monitoring data (e.g. Rx Power MSB - byte 104 in A2h, Rx Power LSB - byte 105 in A2h) must have this update done in a fashion which guarantees coherency and consistency of the data. In other words, the update of a multi-byte field by the transceiver must not occur such that a partially updated multi-byte field can be transferred to the host. Also, the transceiver shall not update a multi-byte field within the structure during the transfer of that multi-byte field to the host, such that partially updated data would be transferred to the host. The first paragraph is extremely definitive in how these fields shall be read atomically - by a _single_ two-byte read sequence. From what you are telling us, these modules do not support that. Therefore, by definition, they do *not* support proper and reliable reporting of diagnostic data, and are non-conformant with the SFP MSAs. So, they are basically broken, and the diagnostics can't be used to retrieve data that can be said to be useful. > I do not think that manufacture says something. I think that they even > do not know that their Realtek chips are completely broken. Oh, they do know. I had a response from CarlitoxxPro passed to me, which was: That is a behavior related on how your router/switch try to read the EEPROM, as described in the datasheet of the GPON ONU SFP, the EEPROM can be read in Sequential Single-Byte mode, not in Multi-byte mode as you router do, basically, your router is trying to read the full a0h table in a single call, and retrieve a null response. that is normal and not affect the operations of the GPON ONU SFP, because these values are informational only. so the Software for your router should be able to read in Single-Byte mode to read the content of the EEPROM in concordance to SFF-8431 which totally misses the point that it is /not/ up to the module to choose whether multi-byte reads are supported or not. If they bothered to gain a proper understanding of the MSAs, they would have noticed that the device on 0xA0 is required to behave as an Atmel AT24Cxx EEPROM. The following from INF-8074i, which is the very first definition of the SFP form factor modules: The SFP serial ID provides access to sophisticated identification information that describes the transceiver's capabilities, standard interfaces, manufacturer, and other information. The serial interface uses the 2-wire serial CMOS E2PROM protocol defined for the ATMEL AT24C01A/02/04 family of components. As they took less than one working day to provide the above response, I suspect they know full well that there's a problem - and it likely affects other "routers" as well. They're also confused about their SFF specifications. SFF-8431 is: "SFP+ 10 Gb/s and Low Speed Electrical Interface" which is not the correct specification for a 1Gbps module. > I can imagine that vendor just says: it is working in our branded boxes > with SFP cages and if it does not work in your kernel then problem is > with your custom kernel and we do not care about 3rd parties. Which shows why it's pointless producing an EEPROM validation tool that runs under Linux (as has been your suggestion). They won't use it, since their testing only goes as far as "does it work in our product?"
On Wed, Dec 30, 2020 at 06:31:52PM +0100, Pali Rohár wrote: > On Wednesday 30 December 2020 17:05:46 Russell King - ARM Linux admin wrote: > > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > > > This change is really required for those Realtek chips. I thought that > > > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > > > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > > > really done by one i2 transfer, it must be done in two. > > > > Then these modules are even more broken than first throught, and > > quite simply it is pointless supporting the diagnostics on them > > because we can never read the values in an atomic way. > > They are broken in a way that neither holy water help them... > > But from diagnostic 0x51 address we can read at least 8bit registers in > atomic way :-) ... which doesn't fit the requirements. > > It's also a violation of the SFF-8472 that _requires_ multi-byte reads > > to read these 16 byte values atomically. Reading them with individual > > byte reads results in a non-atomic read, and the 16-bit value can not > > be trusted to be correct. > > > > That is really not optional, no matter what any manufacturer says - if > > they claim the SFP MSAs allows it, they're quite simply talking out of > > a donkey's backside and you should dispose of the module in biohazard > > packaging. :) > > > > So no, I hadn't understood this from your emails, and as I say above, > > if this is the case, then we quite simply disable diagnostics on these > > modules since they are _highly_ noncompliant. > > We have just two options: > > Disable 2 (and more) bytes reads from 0x51 address and therefore disable > sfp_hwmon_read_sensor() function. > > Or allow 2 bytes non-atomic reads and allow at least semi-correct values > for hwmon. I guess that upper 8bits would not change between two single > byte i2c transfers too much (when they are done immediately one by one). So when you read the temperature, and the MSB reads as the next higher value than the LSB, causing an error of 256, or vice versa causing an error of -256, which when scaled according to the factors causes a big error, that's acceptable. No, it isn't. If the data can't be read reliably, the data is useless. Consider a system that implements userspace monitoring for modules and checks the current values against pre-set thresholds - it suddenly gets a value that is outside of its alarm threshold due to this. It raises a false alarm. This is not good.
> Ok! > > So should we completely skip hwmon_device_register_with_info() call > if (i2c_block_size < 2) ? Yep. That would be a nice simple test. But does ethtool -m still export the second page? That is also unreliable. Andrew
On Wed, Dec 30, 2020 at 08:30:52PM +0100, Andrew Lunn wrote: > > Ok! > > > > So should we completely skip hwmon_device_register_with_info() call > > if (i2c_block_size < 2) ? > > Yep. That would be a nice simple test. > > But does ethtool -m still export the second page? That is also > unreliable. It will do, but we need to check how ethtool decides to request/dump that information - we probably need to make sfp_module_info() report that it's a ETH_MODULE_SFF_8079 style EEPROM, not the ETH_MODULE_SFF_8472 style.
> Which shows why it's pointless producing an EEPROM validation tool that > runs under Linux (as has been your suggestion). They won't use it, > since their testing only goes as far as "does it work in our product?" Yes, i would need SNIA to push for conformance testing, hold plug-testing events, and marketing that only devices which pass the conformance test are allowed to use the SNIA logo, etc. They do seem to have conformance testing for some of their storage standards, but nothing for SFPs. Andrew
On Wednesday 30 December 2020 19:09:58 Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 06:43:07PM +0100, Pali Rohár wrote: > > On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > > > Hi Pali > > > > > > I have to agree with Russell here. I would rather have no diagnostics > > > than untrustable diagnostics. > > > > Ok! > > > > So should we completely skip hwmon_device_register_with_info() call > > if (i2c_block_size < 2) ? > > I don't think that alone is sufficient - there's also the matter of > ethtool -m which will dump that information as well, and we don't want > to offer it to userspace in an unreliable form. Any idea/preference how to disable access to these registers? > For reference, here is what SFF-8472 which defines the diagnostics, says > about this: > > To guarantee coherency of the diagnostic monitoring data, the host is > required to retrieve any multi-byte fields from the diagnostic > monitoring data structure (IE: Rx Power MSB - byte 104 in A2h, Rx Power > LSB - byte 105 in A2h) by the use of a single two-byte read sequence > across the two-wire interface interface. > > The transceiver is required to ensure that any multi-byte fields which > are updated with diagnostic monitoring data (e.g. Rx Power MSB - byte > 104 in A2h, Rx Power LSB - byte 105 in A2h) must have this update done > in a fashion which guarantees coherency and consistency of the data. In > other words, the update of a multi-byte field by the transceiver must > not occur such that a partially updated multi-byte field can be > transferred to the host. Also, the transceiver shall not update a > multi-byte field within the structure during the transfer of that > multi-byte field to the host, such that partially updated data would be > transferred to the host. > > The first paragraph is extremely definitive in how these fields shall > be read atomically - by a _single_ two-byte read sequence. From what > you are telling us, these modules do not support that. Therefore, by > definition, they do *not* support proper and reliable reporting of > diagnostic data, and are non-conformant with the SFP MSAs. > > So, they are basically broken, and the diagnostics can't be used to > retrieve data that can be said to be useful. I agree they are broken. We really should disable access to those 16bit registers. Anyway here is "datasheet" to some CarlitoxxPro SFP: https://www.docdroid.net/hRsJ560/cpgos03-0490v2-datasheet-10-pdf And on page 10 is written: The I2C system can support the mode of random address / single byteread which conform to SFF-8431. Which seems to be wrong. > > I do not think that manufacture says something. I think that they even > > do not know that their Realtek chips are completely broken. > > Oh, they do know. I had a response from CarlitoxxPro passed to me, which > was: Could you give me contact address? Anyway, we would rather should contact Realtek chip division, real manufacture. Not CarlitoxxPro rebrander which can just "buy product" from Realtek reseller and put its logo on stick (and maybe configuration like sn, mac address, password and enable/disable bit for web/telnet). > That is a behavior related on how your router/switch try to read the > EEPROM, as described in the datasheet of the GPON ONU SFP, the EEPROM > can be read in Sequential Single-Byte mode, not in Multi-byte mode as > you router do, basically, your router is trying to read the full a0h > table in a single call, and retrieve a null response. that is normal > and not affect the operations of the GPON ONU SFP, because these > values are informational only. so the Software for your router should > be able to read in Single-Byte mode to read the content of the EEPROM > in concordance to SFF-8431 > > which totally misses the point that it is /not/ up to the module to > choose whether multi-byte reads are supported or not. If they bothered > to gain a proper understanding of the MSAs, they would have noticed that > the device on 0xA0 is required to behave as an Atmel AT24Cxx EEPROM. > The following from INF-8074i, which is the very first definition of the > SFP form factor modules: > > The SFP serial ID provides access to sophisticated identification > information that describes the transceiver's capabilities, standard > interfaces, manufacturer, and other information. The serial interface > uses the 2-wire serial CMOS E2PROM protocol defined for the ATMEL > AT24C01A/02/04 family of components. > > As they took less than one working day to provide the above response, I > suspect they know full well that there's a problem - and it likely > affects other "routers" as well. As this issue is with all those Realtek chips I really think this issue is in underlying Realtek chip and not in CarlitoxxPro rebranding. Seems that they know about this issue and because it affects all GPON Realtek SFPs with that chip that cannot do anything with it, so just wrote it into "datasheet" and trying to find arguments "why behavior is correct" even it is not truth. > They're also confused about their SFF specifications. SFF-8431 is: "SFP+ > 10 Gb/s and Low Speed Electrical Interface" which is not the correct > specification for a 1Gbps module. Looks like "trying to find arguments why it is correct". > > I can imagine that vendor just says: it is working in our branded boxes > > with SFP cages and if it does not work in your kernel then problem is > > with your custom kernel and we do not care about 3rd parties. > > Which shows why it's pointless producing an EEPROM validation tool that > runs under Linux (as has been your suggestion). They won't use it, > since their testing only goes as far as "does it work in our product?" At least users could do their own validation and for rewritable EEPROMs they could be able to fix its content. Anyway, if there is such tool then users could start creating database of working and non-working SFPs where can be put result of this tool. It could help people to decide which SFP they should buy and which not. Sending page/database to manufactures with showing "do not buy this your SFP, does not work and is broken" maybe change their state and start doing validation if people stop buying their products or manufacture name would be on unsupported black list.
On Thu, Dec 31, 2020 at 01:14:10PM +0100, Pali Rohár wrote: > On Wednesday 30 December 2020 19:09:58 Russell King - ARM Linux admin wrote: > > On Wed, Dec 30, 2020 at 06:43:07PM +0100, Pali Rohár wrote: > > > On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > > > > Hi Pali > > > > > > > > I have to agree with Russell here. I would rather have no diagnostics > > > > than untrustable diagnostics. > > > > > > Ok! > > > > > > So should we completely skip hwmon_device_register_with_info() call > > > if (i2c_block_size < 2) ? > > > > I don't think that alone is sufficient - there's also the matter of > > ethtool -m which will dump that information as well, and we don't want > > to offer it to userspace in an unreliable form. > > Any idea/preference how to disable access to these registers? > > > For reference, here is what SFF-8472 which defines the diagnostics, says > > about this: > > > > To guarantee coherency of the diagnostic monitoring data, the host is > > required to retrieve any multi-byte fields from the diagnostic > > monitoring data structure (IE: Rx Power MSB - byte 104 in A2h, Rx Power > > LSB - byte 105 in A2h) by the use of a single two-byte read sequence > > across the two-wire interface interface. > > > > The transceiver is required to ensure that any multi-byte fields which > > are updated with diagnostic monitoring data (e.g. Rx Power MSB - byte > > 104 in A2h, Rx Power LSB - byte 105 in A2h) must have this update done > > in a fashion which guarantees coherency and consistency of the data. In > > other words, the update of a multi-byte field by the transceiver must > > not occur such that a partially updated multi-byte field can be > > transferred to the host. Also, the transceiver shall not update a > > multi-byte field within the structure during the transfer of that > > multi-byte field to the host, such that partially updated data would be > > transferred to the host. > > > > The first paragraph is extremely definitive in how these fields shall > > be read atomically - by a _single_ two-byte read sequence. From what > > you are telling us, these modules do not support that. Therefore, by > > definition, they do *not* support proper and reliable reporting of > > diagnostic data, and are non-conformant with the SFP MSAs. > > > > So, they are basically broken, and the diagnostics can't be used to > > retrieve data that can be said to be useful. > > I agree they are broken. We really should disable access to those 16bit > registers. > > Anyway here is "datasheet" to some CarlitoxxPro SFP: > https://www.docdroid.net/hRsJ560/cpgos03-0490v2-datasheet-10-pdf > > And on page 10 is written: > > The I2C system can support the mode of random address / single > byteread which conform to SFF-8431. > > Which seems to be wrong. Searching around, i found: http://read.pudn.com/downloads776/doc/3073304/RTL9601B-CG_Datasheet.pdf It has two i2c busses, a master and a slave. The master bus can do multi-byte transfers. The slave bus description says nothing in words about multi-byte transfers, but the diagram shows only single byte transfers. So any SFP based around this device is broken. The silly thing is, it is reading/writing from a shadow SRAM. The CPU is not directly involved in an I2C transaction. So it could easily read multiple bytes from the SRAM and return them. But it would still need a synchronisation method to handle writes from the CPU to the SRAM, in order to make these word values safe. Andrew
On Thu, Dec 31, 2020 at 01:14:10PM +0100, Pali Rohár wrote: > On Wednesday 30 December 2020 19:09:58 Russell King - ARM Linux admin wrote: > > On Wed, Dec 30, 2020 at 06:43:07PM +0100, Pali Rohár wrote: > > > On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > > > > Hi Pali > > > > > > > > I have to agree with Russell here. I would rather have no diagnostics > > > > than untrustable diagnostics. > > > > > > Ok! > > > > > > So should we completely skip hwmon_device_register_with_info() call > > > if (i2c_block_size < 2) ? > > > > I don't think that alone is sufficient - there's also the matter of > > ethtool -m which will dump that information as well, and we don't want > > to offer it to userspace in an unreliable form. > > Any idea/preference how to disable access to these registers? Page A0, byte 92: "Diagnostic Monitoring Type" is a 1 byte field with 8 single bit indicators describing how diagnostic monitoring is implemented in the particular transceiver. Note that if bit 6, address 92 is set indicating that digital diagnostic monitoring has been implemented, received power monitoring, transmitted power monitoring, bias current monitoring, supply voltage monitoring and temperature monitoring must all be implemented. Additionally, alarm and warning thresholds must be written as specified in this document at locations 00 to 55 on two-wire serial address 1010001X (A2h) (see Table 8-5). Unfortunately, we cannot simply set sfp->id.ext.diagmon to false, because it can also be used to indicate power, software reading of TX_DISABLE, LOS, etc. These are all single bytes, so could be returned correctly, assuming they have been implemented according to the spec. Looking at sfp_module_info(), adding a check for i2c_block_size < 2 when determining what length to return. ethtool should do the right thing, know that the second page has not been returned to user space. Andrew
On Thursday 31 December 2020 16:09:25 Andrew Lunn wrote: > On Thu, Dec 31, 2020 at 01:14:10PM +0100, Pali Rohár wrote: > > On Wednesday 30 December 2020 19:09:58 Russell King - ARM Linux admin wrote: > > > On Wed, Dec 30, 2020 at 06:43:07PM +0100, Pali Rohár wrote: > > > > On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > > > > > Hi Pali > > > > > > > > > > I have to agree with Russell here. I would rather have no diagnostics > > > > > than untrustable diagnostics. > > > > > > > > Ok! > > > > > > > > So should we completely skip hwmon_device_register_with_info() call > > > > if (i2c_block_size < 2) ? > > > > > > I don't think that alone is sufficient - there's also the matter of > > > ethtool -m which will dump that information as well, and we don't want > > > to offer it to userspace in an unreliable form. > > > > Any idea/preference how to disable access to these registers? > > > > > For reference, here is what SFF-8472 which defines the diagnostics, says > > > about this: > > > > > > To guarantee coherency of the diagnostic monitoring data, the host is > > > required to retrieve any multi-byte fields from the diagnostic > > > monitoring data structure (IE: Rx Power MSB - byte 104 in A2h, Rx Power > > > LSB - byte 105 in A2h) by the use of a single two-byte read sequence > > > across the two-wire interface interface. > > > > > > The transceiver is required to ensure that any multi-byte fields which > > > are updated with diagnostic monitoring data (e.g. Rx Power MSB - byte > > > 104 in A2h, Rx Power LSB - byte 105 in A2h) must have this update done > > > in a fashion which guarantees coherency and consistency of the data. In > > > other words, the update of a multi-byte field by the transceiver must > > > not occur such that a partially updated multi-byte field can be > > > transferred to the host. Also, the transceiver shall not update a > > > multi-byte field within the structure during the transfer of that > > > multi-byte field to the host, such that partially updated data would be > > > transferred to the host. > > > > > > The first paragraph is extremely definitive in how these fields shall > > > be read atomically - by a _single_ two-byte read sequence. From what > > > you are telling us, these modules do not support that. Therefore, by > > > definition, they do *not* support proper and reliable reporting of > > > diagnostic data, and are non-conformant with the SFP MSAs. > > > > > > So, they are basically broken, and the diagnostics can't be used to > > > retrieve data that can be said to be useful. > > > > I agree they are broken. We really should disable access to those 16bit > > registers. > > > > Anyway here is "datasheet" to some CarlitoxxPro SFP: > > https://www.docdroid.net/hRsJ560/cpgos03-0490v2-datasheet-10-pdf > > > > And on page 10 is written: > > > > The I2C system can support the mode of random address / single > > byteread which conform to SFF-8431. > > > > Which seems to be wrong. > > Searching around, i found: > > http://read.pudn.com/downloads776/doc/3073304/RTL9601B-CG_Datasheet.pdf > > It has two i2c busses, a master and a slave. The master bus can do > multi-byte transfers. The slave bus description says nothing in words > about multi-byte transfers, but the diagram shows only single byte > transfers. Yes. Only i2c slave is used for communication with external entity and diagrams clearly shows that only single byte i2c transfers are supported. > So any SFP based around this device is broken. Exactly. That is why I send this patch in a way that try to detect these RTL chips and not particular vendors who create product on top of this chip. All SFP modules with this RTL9601B chip are broken and cannot be fixed. Re-branders and OEM vendors like CarlitoxxPro or UBNT should stop saying that they are compliant to SFP/SFF standards, because based on above descriptions it is not truth. > The silly thing is, it is reading/writing from a shadow SRAM. The CPU > is not directly involved in an I2C transaction. So it could easily > read multiple bytes from the SRAM and return them. But it would still > need a synchronisation method to handle writes from the CPU to the > SRAM, in order to make these word values safe. But there is a still issue how to read these values from SRAM outside of SFP module via i2c. And with current one-byte transfers of that i2c slave on RTL9601B it is impossible via current SFP diagnostic API specification.
On Thursday 31 December 2020 16:30:33 Andrew Lunn wrote: > On Thu, Dec 31, 2020 at 01:14:10PM +0100, Pali Rohár wrote: > > On Wednesday 30 December 2020 19:09:58 Russell King - ARM Linux admin wrote: > > > On Wed, Dec 30, 2020 at 06:43:07PM +0100, Pali Rohár wrote: > > > > On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > > > > > Hi Pali > > > > > > > > > > I have to agree with Russell here. I would rather have no diagnostics > > > > > than untrustable diagnostics. > > > > > > > > Ok! > > > > > > > > So should we completely skip hwmon_device_register_with_info() call > > > > if (i2c_block_size < 2) ? > > > > > > I don't think that alone is sufficient - there's also the matter of > > > ethtool -m which will dump that information as well, and we don't want > > > to offer it to userspace in an unreliable form. > > > > Any idea/preference how to disable access to these registers? > > Page A0, byte 92: > > "Diagnostic Monitoring Type" is a 1 byte field with 8 single bit > indicators describing how diagnostic monitoring is implemented in the > particular transceiver. > > Note that if bit 6, address 92 is set indicating that digital > diagnostic monitoring has been implemented, received power > monitoring, transmitted power monitoring, bias current monitoring, > supply voltage monitoring and temperature monitoring must all be > implemented. Additionally, alarm and warning thresholds must be > written as specified in this document at locations 00 to 55 on > two-wire serial address 1010001X (A2h) (see Table 8-5). > > Unfortunately, we cannot simply set sfp->id.ext.diagmon to false, > because it can also be used to indicate power, software reading of > TX_DISABLE, LOS, etc. These are all single bytes, so could be returned > correctly, assuming they have been implemented according to the spec. > > Looking at sfp_module_info(), adding a check for i2c_block_size < 2 > when determining what length to return. ethtool should do the right > thing, know that the second page has not been returned to user space. But if we limit length of eeprom then userspace would not be able to access those TX_DISABLE, LOS and other bits from byte 110 at address A2. Problematic two-byte values are in byte range 96-109 at address A2. Therefore before byte 110.
> > Looking at sfp_module_info(), adding a check for i2c_block_size < 2 > > when determining what length to return. ethtool should do the right > > thing, know that the second page has not been returned to user space. > > But if we limit length of eeprom then userspace would not be able to > access those TX_DISABLE, LOS and other bits from byte 110 at address A2. Have you tested these bits to see if they actually work? If they don't work... Andrew
On Thursday 31 December 2020 18:13:38 Andrew Lunn wrote: > > > Looking at sfp_module_info(), adding a check for i2c_block_size < 2 > > > when determining what length to return. ethtool should do the right > > > thing, know that the second page has not been returned to user space. > > > > But if we limit length of eeprom then userspace would not be able to > > access those TX_DISABLE, LOS and other bits from byte 110 at address A2. > > Have you tested these bits to see if they actually work? If they don't > work... On Ubiquiti module that LOS bit does not work. I think that on CarlitoxxPro module LOS bit worked. But I cannot test it right now as I do not have access to testing OLT unit. Adding Thomas to loop. Can you check if CarlitoxxPro GPON ONT module supports LOS or other bits at byte offset 110 at address A2?
Hi Pali, I have a CarlitoxxPro module and I reported an issue about RX_LOS pin to the manufacturer. It looks to me that the module asserts "inverted LOS" through EEPROM but does not implement it. Consequently, the SFP state machine of my host router stays in check los state and link is not set up for the host interface. Below is a dump of the module's EEPROM: [root@clearfog-gt-8k ~]# ethtool -m eth0 Identifier : 0x03 (SFP) Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID) Connector : 0x01 (SC) Transceiver codes : 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 Encoding : 0x03 (NRZ) BR, Nominal : 1200MBd Rate identifier : 0x00 (unspecified) Length (SMF,km) : 20km Length (SMF) : 20000m Length (50um) : 0m Length (62.5um) : 0m Length (Copper) : 0m Length (OM3) : 0m Laser wavelength : 1310nm Vendor name : VSOL Vendor OUI : 00:00:00 Vendor PN : V2801F Vendor rev : 1.0 Option values : 0x00 0x1c Option : RX_LOS implemented, inverted Option : TX_FAULT implemented Option : TX_DISABLE implemented BR margin, max : 0% BR margin, min : 0% Vendor SN : CP202003180377 Date code : 200408 Optical diagnostics support : Yes Laser bias current : 0.000 mA Laser output power : 0.0000 mW / -inf dBm Receiver signal average optical power : 0.0000 mW / -inf dBm Module temperature : 31.00 degrees C / 87.80 degrees F Module voltage : 0.0000 V Alarm/warning flags implemented : Yes Laser bias current high alarm : Off Laser bias current low alarm : On Laser bias current high warning : Off Laser bias current low warning : Off Laser output power high alarm : Off Laser output power low alarm : On Laser output power high warning : Off Laser output power low warning : Off Module temperature high alarm : Off Module temperature low alarm : Off Module temperature high warning : Off Module temperature low warning : Off Module voltage high alarm : Off Module voltage low alarm : Off Module voltage high warning : Off Module voltage low warning : Off Laser rx power high alarm : Off Laser rx power low alarm : Off Laser rx power high warning : Off Laser rx power low warning : Off Laser bias current high alarm threshold : 74.752 mA Laser bias current low alarm threshold : 0.000 mA Laser bias current high warning threshold : 0.000 mA Laser bias current low warning threshold : 0.000 mA Laser output power high alarm threshold : 0.0000 mW / -inf dBm Laser output power low alarm threshold : 0.0000 mW / -inf dBm Laser output power high warning threshold : 0.0000 mW / -inf dBm Laser output power low warning threshold : 0.0000 mW / -inf dBm Module temperature high alarm threshold : 90.00 degrees C / 194.00 degrees F Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F Module voltage high alarm threshold : 0.0000 V Module voltage low alarm threshold : 0.0000 V Module voltage high warning threshold : 0.0000 V Module voltage low warning threshold : 0.0000 V Laser rx power high alarm threshold : 0.1536 mW / -8.14 dBm Laser rx power low alarm threshold : 0.0000 mW / -inf dBm Laser rx power high warning threshold : 0.0000 mW / -inf dBm Laser rx power low warning threshold : 0.0000 mW / -inf dBm Le sam. 2 janv. 2021 à 02:49, Pali Rohár <pali@kernel.org> a écrit : > > On Thursday 31 December 2020 18:13:38 Andrew Lunn wrote: > > > > Looking at sfp_module_info(), adding a check for i2c_block_size < 2 > > > > when determining what length to return. ethtool should do the right > > > > thing, know that the second page has not been returned to user space. > > > > > > But if we limit length of eeprom then userspace would not be able to > > > access those TX_DISABLE, LOS and other bits from byte 110 at address A2. > > > > Have you tested these bits to see if they actually work? If they don't > > work... > > On Ubiquiti module that LOS bit does not work. > > I think that on CarlitoxxPro module LOS bit worked. But I cannot test it > right now as I do not have access to testing OLT unit. > > Adding Thomas to loop. Can you check if CarlitoxxPro GPON ONT module > supports LOS or other bits at byte offset 110 at address A2?
Hello! On Sunday 03 January 2021 03:25:23 Thomas Schreiber wrote: > Hi Pali, > I have a CarlitoxxPro module and I reported an issue about RX_LOS pin > to the manufacturer. > It looks to me that the module asserts "inverted LOS" through EEPROM > but does not implement it. So, it is broken :-( But I'm not surprised. Anyway, I think you could be interested in this discussion about my patch series, but I forgot to CC you on the first patch/cover letter. You can read whole discussion on public archive available at: https://lore.kernel.org/netdev/20201230154755.14746-1-pali@kernel.org/t/#u If you have any comments, let me know so I can fix it for V2. Those RTL8672/RTL9601C SFP are extremely broken and I do not think that "rebrander" CarlitoxxPro would do anything with it. > Consequently, the SFP state machine of my > host router stays in check los state and link is not set up for the > host interface. So link does not work at all? > Below is a dump of the module's EEPROM: > > [root@clearfog-gt-8k ~]# ethtool -m eth0 > Identifier : 0x03 (SFP) > Extended identifier : 0x04 (GBIC/SFP defined by > 2-wire interface ID) > Connector : 0x01 (SC) > Transceiver codes : 0x00 0x00 0x00 0x00 0x00 > 0x00 0x00 0x00 0x00 > Encoding : 0x03 (NRZ) > BR, Nominal : 1200MBd > Rate identifier : 0x00 (unspecified) > Length (SMF,km) : 20km > Length (SMF) : 20000m > Length (50um) : 0m > Length (62.5um) : 0m > Length (Copper) : 0m > Length (OM3) : 0m > Laser wavelength : 1310nm > Vendor name : VSOL > Vendor OUI : 00:00:00 > Vendor PN : V2801F > Vendor rev : 1.0 > Option values : 0x00 0x1c > Option : RX_LOS implemented, inverted > Option : TX_FAULT implemented > Option : TX_DISABLE implemented > BR margin, max : 0% > BR margin, min : 0% > Vendor SN : CP202003180377 > Date code : 200408 > Optical diagnostics support : Yes > Laser bias current : 0.000 mA > Laser output power : 0.0000 mW / -inf dBm > Receiver signal average optical power : 0.0000 mW / -inf dBm > Module temperature : 31.00 degrees C / 87.80 degrees F > Module voltage : 0.0000 V > Alarm/warning flags implemented : Yes > Laser bias current high alarm : Off > Laser bias current low alarm : On > Laser bias current high warning : Off > Laser bias current low warning : Off > Laser output power high alarm : Off > Laser output power low alarm : On > Laser output power high warning : Off > Laser output power low warning : Off > Module temperature high alarm : Off > Module temperature low alarm : Off > Module temperature high warning : Off > Module temperature low warning : Off > Module voltage high alarm : Off > Module voltage low alarm : Off > Module voltage high warning : Off > Module voltage low warning : Off > Laser rx power high alarm : Off > Laser rx power low alarm : Off > Laser rx power high warning : Off > Laser rx power low warning : Off > Laser bias current high alarm threshold : 74.752 mA > Laser bias current low alarm threshold : 0.000 mA > Laser bias current high warning threshold : 0.000 mA > Laser bias current low warning threshold : 0.000 mA > Laser output power high alarm threshold : 0.0000 mW / -inf dBm > Laser output power low alarm threshold : 0.0000 mW / -inf dBm > Laser output power high warning threshold : 0.0000 mW / -inf dBm > Laser output power low warning threshold : 0.0000 mW / -inf dBm > Module temperature high alarm threshold : 90.00 degrees C / 194.00 degrees F > Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F > Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F > Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F > Module voltage high alarm threshold : 0.0000 V > Module voltage low alarm threshold : 0.0000 V > Module voltage high warning threshold : 0.0000 V > Module voltage low warning threshold : 0.0000 V > Laser rx power high alarm threshold : 0.1536 mW / -8.14 dBm > Laser rx power low alarm threshold : 0.0000 mW / -inf dBm > Laser rx power high warning threshold : 0.0000 mW / -inf dBm > Laser rx power low warning threshold : 0.0000 mW / -inf dBm > > > Le sam. 2 janv. 2021 à 02:49, Pali Rohár <pali@kernel.org> a écrit : > > > > On Thursday 31 December 2020 18:13:38 Andrew Lunn wrote: > > > > > Looking at sfp_module_info(), adding a check for i2c_block_size < 2 > > > > > when determining what length to return. ethtool should do the right > > > > > thing, know that the second page has not been returned to user space. > > > > > > > > But if we limit length of eeprom then userspace would not be able to > > > > access those TX_DISABLE, LOS and other bits from byte 110 at address A2. > > > > > > Have you tested these bits to see if they actually work? If they don't > > > work... > > > > On Ubiquiti module that LOS bit does not work. > > > > I think that on CarlitoxxPro module LOS bit worked. But I cannot test it > > right now as I do not have access to testing OLT unit. > > > > Adding Thomas to loop. Can you check if CarlitoxxPro GPON ONT module > > supports LOS or other bits at byte offset 110 at address A2?
On Sunday 03 January 2021 03:41:32 Pali Rohár wrote: > Hello! > > On Sunday 03 January 2021 03:25:23 Thomas Schreiber wrote: > > Hi Pali, > > I have a CarlitoxxPro module and I reported an issue about RX_LOS pin > > to the manufacturer. > > It looks to me that the module asserts "inverted LOS" through EEPROM > > but does not implement it. > > So, it is broken :-( But I'm not surprised. > > Anyway, I think you could be interested in this discussion about my > patch series, but I forgot to CC you on the first patch/cover letter. > You can read whole discussion on public archive available at: > > https://lore.kernel.org/netdev/20201230154755.14746-1-pali@kernel.org/t/#u > > If you have any comments, let me know so I can fix it for V2. > > Those RTL8672/RTL9601C SFP are extremely broken and I do not think that > "rebrander" CarlitoxxPro would do anything with it. > > > Consequently, the SFP state machine of my > > host router stays in check los state and link is not set up for the > > host interface. > > So link does not work at all? > > > Below is a dump of the module's EEPROM: > > > > [root@clearfog-gt-8k ~]# ethtool -m eth0 > > Identifier : 0x03 (SFP) > > Extended identifier : 0x04 (GBIC/SFP defined by > > 2-wire interface ID) > > Connector : 0x01 (SC) > > Transceiver codes : 0x00 0x00 0x00 0x00 0x00 > > 0x00 0x00 0x00 0x00 > > Encoding : 0x03 (NRZ) > > BR, Nominal : 1200MBd > > Rate identifier : 0x00 (unspecified) > > Length (SMF,km) : 20km > > Length (SMF) : 20000m > > Length (50um) : 0m > > Length (62.5um) : 0m > > Length (Copper) : 0m > > Length (OM3) : 0m > > Laser wavelength : 1310nm > > Vendor name : VSOL > > Vendor OUI : 00:00:00 > > Vendor PN : V2801F > > Vendor rev : 1.0 > > Option values : 0x00 0x1c > > Option : RX_LOS implemented, inverted > > Option : TX_FAULT implemented > > Option : TX_DISABLE implemented > > BR margin, max : 0% > > BR margin, min : 0% > > Vendor SN : CP202003180377 > > Date code : 200408 > > Optical diagnostics support : Yes > > Laser bias current : 0.000 mA > > Laser output power : 0.0000 mW / -inf dBm > > Receiver signal average optical power : 0.0000 mW / -inf dBm > > Module temperature : 31.00 degrees C / 87.80 degrees F > > Module voltage : 0.0000 V > > Alarm/warning flags implemented : Yes > > Laser bias current high alarm : Off > > Laser bias current low alarm : On > > Laser bias current high warning : Off > > Laser bias current low warning : Off > > Laser output power high alarm : Off > > Laser output power low alarm : On > > Laser output power high warning : Off > > Laser output power low warning : Off > > Module temperature high alarm : Off > > Module temperature low alarm : Off > > Module temperature high warning : Off > > Module temperature low warning : Off > > Module voltage high alarm : Off > > Module voltage low alarm : Off > > Module voltage high warning : Off > > Module voltage low warning : Off > > Laser rx power high alarm : Off > > Laser rx power low alarm : Off > > Laser rx power high warning : Off > > Laser rx power low warning : Off > > Laser bias current high alarm threshold : 74.752 mA > > Laser bias current low alarm threshold : 0.000 mA > > Laser bias current high warning threshold : 0.000 mA > > Laser bias current low warning threshold : 0.000 mA > > Laser output power high alarm threshold : 0.0000 mW / -inf dBm > > Laser output power low alarm threshold : 0.0000 mW / -inf dBm > > Laser output power high warning threshold : 0.0000 mW / -inf dBm > > Laser output power low warning threshold : 0.0000 mW / -inf dBm > > Module temperature high alarm threshold : 90.00 degrees C / 194.00 degrees F > > Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F > > Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F > > Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F > > Module voltage high alarm threshold : 0.0000 V > > Module voltage low alarm threshold : 0.0000 V > > Module voltage high warning threshold : 0.0000 V > > Module voltage low warning threshold : 0.0000 V > > Laser rx power high alarm threshold : 0.1536 mW / -8.14 dBm > > Laser rx power low alarm threshold : 0.0000 mW / -inf dBm > > Laser rx power high warning threshold : 0.0000 mW / -inf dBm > > Laser rx power low warning threshold : 0.0000 mW / -inf dBm > > > > > > Le sam. 2 janv. 2021 à 02:49, Pali Rohár <pali@kernel.org> a écrit : > > > > > > On Thursday 31 December 2020 18:13:38 Andrew Lunn wrote: > > > > > > Looking at sfp_module_info(), adding a check for i2c_block_size < 2 > > > > > > when determining what length to return. ethtool should do the right > > > > > > thing, know that the second page has not been returned to user space. > > > > > > > > > > But if we limit length of eeprom then userspace would not be able to > > > > > access those TX_DISABLE, LOS and other bits from byte 110 at address A2. > > > > > > > > Have you tested these bits to see if they actually work? If they don't > > > > work... > > > > > > On Ubiquiti module that LOS bit does not work. > > > > > > I think that on CarlitoxxPro module LOS bit worked. But I cannot test it > > > right now as I do not have access to testing OLT unit. On my tested CarlitoxxPro module is: Option values : 0x00 0x1c Option : RX_LOS implemented, inverted Option : TX_FAULT implemented Option : TX_DISABLE implemented When cable is disconnected then in EEPROM at position 0x16e is value 0x82. If I call 'ip link set eth1 up' then value changes to 0x02, module itself has a link and I can connect to its internal telnet/webserver to configure it. When cable is connected but connection is not established by OLT then this value is 0x80. If I call 'ip link set eth1 up' then value changes to 0x00 and kernel does not see a link (no carrier). So it seems that RX_LOS (bit 1 of 0x16e EEPROM) and also TX_DISABLE (bit 7 of 0x16e EEPROM) is implemented and working properly. And therefore we should allow access to these bits. I also tested UBNT module and result is: Option values : 0x00 0x06 Option : RX_LOS implemented Option : RX_LOS implemented, inverted Which means that those bits are not implemented. Anyway I check position 0x16e and value on its value is randomly either 0x79 or 0xff independently of the state of the GPON module. So it is really not implemented on UBNT. > > > Adding Thomas to loop. Can you check if CarlitoxxPro GPON ONT module > > > supports LOS or other bits at byte offset 110 at address A2?
On Wed, Jan 06, 2021 at 03:55:32PM +0100, Pali Rohár wrote: > On my tested CarlitoxxPro module is: > > Option values : 0x00 0x1c > Option : RX_LOS implemented, inverted > Option : TX_FAULT implemented > Option : TX_DISABLE implemented > > When cable is disconnected then in EEPROM at position 0x16e is value > 0x82. If I call 'ip link set eth1 up' then value changes to 0x02, module > itself has a link and I can connect to its internal telnet/webserver to > configure it. Bit 7 reflects the TX_DISABLE pin state. Bit 1 reflects the RX_LOS pin state. It isn't specified whether the inverted/non-inverted state is reflected in bit 1 or not - the definition just says that bit 1 is "Digital state of the RX_LOS Output Pin." > I also tested UBNT module and result is: > > Option values : 0x00 0x06 > Option : RX_LOS implemented > Option : RX_LOS implemented, inverted > > Which means that those bits are not implemented. > > Anyway I check position 0x16e and value on its value is randomly either > 0x79 or 0xff independently of the state of the GPON module. > > So it is really not implemented on UBNT. There are enhanced options at offset 93 which tell you which of the offset 110 signals are implemented. We already have support for these, but only when the corresponding GPIOs on the host side are not implemented.
On Wed, Jan 06, 2021 at 03:21:38PM +0000, Russell King - ARM Linux admin wrote: > On Wed, Jan 06, 2021 at 03:55:32PM +0100, Pali Rohár wrote: > > On my tested CarlitoxxPro module is: > > > > Option values : 0x00 0x1c > > Option : RX_LOS implemented, inverted > > Option : TX_FAULT implemented > > Option : TX_DISABLE implemented > > > > When cable is disconnected then in EEPROM at position 0x16e is value > > 0x82. If I call 'ip link set eth1 up' then value changes to 0x02, module > > itself has a link and I can connect to its internal telnet/webserver to > > configure it. > > Bit 7 reflects the TX_DISABLE pin state. Bit 1 reflects the RX_LOS pin > state. It isn't specified whether the inverted/non-inverted state is > reflected in bit 1 or not - the definition just says that bit 1 is > "Digital state of the RX_LOS Output Pin." > > > I also tested UBNT module and result is: > > > > Option values : 0x00 0x06 > > Option : RX_LOS implemented > > Option : RX_LOS implemented, inverted > > > > Which means that those bits are not implemented. > > > > Anyway I check position 0x16e and value on its value is randomly either > > 0x79 or 0xff independently of the state of the GPON module. > > > > So it is really not implemented on UBNT. > > There are enhanced options at offset 93 which tell you which of the > offset 110 signals are implemented. That's the ID EEPROM (A0) offset 93 for the Diagnostic address (A2) offset 110.
On Wed, Jan 06, 2021 at 03:23:38PM +0000, Russell King - ARM Linux admin wrote: > On Wed, Jan 06, 2021 at 03:21:38PM +0000, Russell King - ARM Linux admin wrote: > > On Wed, Jan 06, 2021 at 03:55:32PM +0100, Pali Rohár wrote: > > > On my tested CarlitoxxPro module is: > > > > > > Option values : 0x00 0x1c > > > Option : RX_LOS implemented, inverted > > > Option : TX_FAULT implemented > > > Option : TX_DISABLE implemented > > > > > > When cable is disconnected then in EEPROM at position 0x16e is value > > > 0x82. If I call 'ip link set eth1 up' then value changes to 0x02, module > > > itself has a link and I can connect to its internal telnet/webserver to > > > configure it. > > > > Bit 7 reflects the TX_DISABLE pin state. Bit 1 reflects the RX_LOS pin > > state. It isn't specified whether the inverted/non-inverted state is > > reflected in bit 1 or not - the definition just says that bit 1 is > > "Digital state of the RX_LOS Output Pin." > > > > > I also tested UBNT module and result is: > > > > > > Option values : 0x00 0x06 > > > Option : RX_LOS implemented > > > Option : RX_LOS implemented, inverted > > > > > > Which means that those bits are not implemented. > > > > > > Anyway I check position 0x16e and value on its value is randomly either > > > 0x79 or 0xff independently of the state of the GPON module. > > > > > > So it is really not implemented on UBNT. > > > > There are enhanced options at offset 93 which tell you which of the > > offset 110 signals are implemented. > > That's the ID EEPROM (A0) offset 93 for the Diagnostic address (A2) > offset 110. Looking at the EEPROM dumps you've sent me... the VSOL V2801F has 0xe0 at offset 93, meaning TX_DISABLE and TX_FAULT soft signals (which is basically offset 110) are implemented, RX_LOS is not. No soft signals are implemented on the Ubiquiti module.
On Wednesday 06 January 2021 15:27:07 Russell King - ARM Linux admin wrote: > On Wed, Jan 06, 2021 at 03:23:38PM +0000, Russell King - ARM Linux admin wrote: > > On Wed, Jan 06, 2021 at 03:21:38PM +0000, Russell King - ARM Linux admin wrote: > > > On Wed, Jan 06, 2021 at 03:55:32PM +0100, Pali Rohár wrote: > > > > On my tested CarlitoxxPro module is: > > > > > > > > Option values : 0x00 0x1c > > > > Option : RX_LOS implemented, inverted > > > > Option : TX_FAULT implemented > > > > Option : TX_DISABLE implemented > > > > > > > > When cable is disconnected then in EEPROM at position 0x16e is value > > > > 0x82. If I call 'ip link set eth1 up' then value changes to 0x02, module > > > > itself has a link and I can connect to its internal telnet/webserver to > > > > configure it. > > > > > > Bit 7 reflects the TX_DISABLE pin state. Bit 1 reflects the RX_LOS pin > > > state. It isn't specified whether the inverted/non-inverted state is > > > reflected in bit 1 or not - the definition just says that bit 1 is > > > "Digital state of the RX_LOS Output Pin." > > > > > > > I also tested UBNT module and result is: > > > > > > > > Option values : 0x00 0x06 > > > > Option : RX_LOS implemented > > > > Option : RX_LOS implemented, inverted > > > > > > > > Which means that those bits are not implemented. > > > > > > > > Anyway I check position 0x16e and value on its value is randomly either > > > > 0x79 or 0xff independently of the state of the GPON module. > > > > > > > > So it is really not implemented on UBNT. > > > > > > There are enhanced options at offset 93 which tell you which of the > > > offset 110 signals are implemented. > > > > That's the ID EEPROM (A0) offset 93 for the Diagnostic address (A2) > > offset 110. > > Looking at the EEPROM dumps you've sent me... the VSOL V2801F has > 0xe0 at offset 93, meaning TX_DISABLE and TX_FAULT soft signals > (which is basically offset 110) are implemented, RX_LOS is not. No > soft signals are implemented on the Ubiquiti module. UBNT has at EEPROM offset 0x5E value 0x80. CarlitoxxPRO has at this offset value 0xE0. So both SFPs claims that support alarm/warning flags and CarlitoxxPRO claims that support TX_DISABLE and TX_FAULT.
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 91d74c1a920a..490e78a72dd6 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -336,19 +336,11 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf, size_t len) { struct i2c_msg msgs[2]; - size_t block_size; + u8 bus_addr = a2 ? 0x51 : 0x50; + size_t block_size = sfp->i2c_block_size; size_t this_len; - u8 bus_addr; int ret; - if (a2) { - block_size = 16; - bus_addr = 0x51; - } else { - block_size = sfp->i2c_block_size; - bus_addr = 0x50; - } - msgs[0].addr = bus_addr; msgs[0].flags = 0; msgs[0].len = 1; @@ -1642,26 +1634,30 @@ static int sfp_sm_mod_hpower(struct sfp *sfp, bool enable) return 0; } -/* Some modules (Nokia 3FE46541AA) lock up if byte 0x51 is read as a - * single read. Switch back to reading 16 byte blocks unless we have - * a CarlitoxxPro module (rebranded VSOL V2801F). Even more annoyingly, - * some VSOL V2801F have the vendor name changed to OEM. +/* GPON modules based on Realtek RTL8672 and RTL9601C chips (e.g. V-SOL + * V2801F, CarlitoxxPro CPGOS03-0490, Ubiquiti U-Fiber Instant, ...) do + * not support multibyte reads from the EEPROM. Each multi-byte read + * operation returns just one byte of EEPROM followed by zeros. There is + * no way to identify which modules are using Realtek RTL8672 and RTL9601C + * chips. Moreover every OEM of V-SOL V2801F module puts its own vendor + * name and vendor id into EEPROM, so there is even no way to detect if + * module is V-SOL V2801F. Therefore check for those zeros in the read + * data and then based on check switch to reading EEPROM to one byte + * at a time. */ -static int sfp_quirk_i2c_block_size(const struct sfp_eeprom_base *base) +static bool sfp_id_needs_byte_io(struct sfp *sfp, void *buf, size_t len) { - if (!memcmp(base->vendor_name, "VSOL ", 16)) - return 1; - if (!memcmp(base->vendor_name, "OEM ", 16) && - !memcmp(base->vendor_pn, "V2801F ", 16)) - return 1; + size_t i, block_size = sfp->i2c_block_size; - /* Some modules can't cope with long reads */ - return 16; -} + /* Already using byte IO */ + if (block_size == 1) + return false; -static void sfp_quirks_base(struct sfp *sfp, const struct sfp_eeprom_base *base) -{ - sfp->i2c_block_size = sfp_quirk_i2c_block_size(base); + for (i = 1; i < len; i += block_size) { + if (memchr_inv(buf + i, '\0', block_size - 1)) + return false; + } + return true; } static int sfp_cotsworks_fixup_check(struct sfp *sfp, struct sfp_eeprom_id *id) @@ -1705,11 +1701,7 @@ static int sfp_sm_mod_probe(struct sfp *sfp, bool report) u8 check; int ret; - /* Some modules (CarlitoxxPro CPGOS03-0490) do not support multibyte - * reads from the EEPROM, so start by reading the base identifying - * information one byte at a time. - */ - sfp->i2c_block_size = 1; + sfp->i2c_block_size = 16; ret = sfp_read(sfp, false, 0, &id.base, sizeof(id.base)); if (ret < 0) { @@ -1723,6 +1715,27 @@ static int sfp_sm_mod_probe(struct sfp *sfp, bool report) return -EAGAIN; } + if (sfp_id_needs_byte_io(sfp, &id.base, sizeof(id.base))) { + dev_info(sfp->dev, + "Detected broken RTL8672/RTL9601C emulated EEPROM\n"); + dev_info(sfp->dev, + "Switching to reading EEPROM to one byte at a time\n"); + sfp->i2c_block_size = 1; + + ret = sfp_read(sfp, false, 0, &id.base, sizeof(id.base)); + if (ret < 0) { + if (report) + dev_err(sfp->dev, "failed to read EEPROM: %d\n", + ret); + return -EAGAIN; + } + + if (ret != sizeof(id.base)) { + dev_err(sfp->dev, "EEPROM short read: %d\n", ret); + return -EAGAIN; + } + } + /* Cotsworks do not seem to update the checksums when they * do the final programming with the final module part number, * serial number and date code. @@ -1757,9 +1770,6 @@ static int sfp_sm_mod_probe(struct sfp *sfp, bool report) } } - /* Apply any early module-specific quirks */ - sfp_quirks_base(sfp, &id.base); - ret = sfp_read(sfp, false, SFP_CC_BASE + 1, &id.ext, sizeof(id.ext)); if (ret < 0) { if (report)