Message ID | 20200306042831.17827-1-elder@linaro.org (mailing list archive) |
---|---|
Headers | show |
Series | net: introduce Qualcomm IPA driver (UPDATED) | expand |
From: Alex Elder <elder@linaro.org> Date: Thu, 5 Mar 2020 22:28:14 -0600 > This series presents the driver for the Qualcomm IP Accelerator (IPA). Series applied, thank you.
I am happy to see this driver upstream. >Arnd's concern was that the rmnet_data0 network device does not >have the benefit of information about the state of the underlying >IPA hardware in order to be effective in controlling TX flow. >The feared result is over-buffering of TX packets (bufferbloat). >I began working on some simple experiments to see whether (or how >much) his concern was warranted. But it turned out that completing >these experiments was much more work than had been hoped. Members of the bufferbloat project *care*, and have tools and testbeds for exploring these issues. It would be good to establish a relationship with the vendor, obtain hardware, and other (technical and financial) support, if possible. Is there any specific hardware now available (generally or in beta) that can be obtained by us to take a harder look? A contact at linaro or QCA willing discuss options?
On 3/9/20 11:54 AM, Dave Taht wrote: > I am happy to see this driver upstream. > >> Arnd's concern was that the rmnet_data0 network device does not >> have the benefit of information about the state of the underlying >> IPA hardware in order to be effective in controlling TX flow. >> The feared result is over-buffering of TX packets (bufferbloat). >> I began working on some simple experiments to see whether (or how >> much) his concern was warranted. But it turned out that completing >> these experiments was much more work than had been hoped. > > Members of the bufferbloat project *care*, and have tools and testbeds for > exploring these issues. It would be good to establish a relationship with > the vendor, obtain hardware, and other (technical and financial) support, if > possible. > > Is there any specific hardware now available (generally or in beta) that > can be obtained by us to take a harder look? A contact at linaro or QCA > willing discuss options? There exists some hardware that could be used, but at the moment I have not ported this code to operate on it. It is a current effort however, and I will be glad to keep you in the loop on progress. There are a couple of target environments we'd like to support but until last week the primary goal was inclusion in the upstream tree. I will follow up with you after the dust settles a little bit with this patch series, maybe in a week or so. In the mean time I'll also find out whether there are any other resources (people and/or hardware) available. -Alex
On Thu, Mar 5, 2020 at 8:28 PM Alex Elder <elder@linaro.org> wrote: > > This series presents the driver for the Qualcomm IP Accelerator (IPA). > > This is version 2 of this updated series. It includes the following > small changes since the previous version: > - Now based on net-next instead of v5.6-rc > - Config option now named CONFIG_QCOM_IPA > - Some minor cleanup in the GSI code > - Small change to replenish logic > - No longer depends on remoteproc bug fixes > What follows is the basically same explanation as was posted previously. > > -Alex > > I have posted earlier versions of this code previously, but it has > undergone quite a bit of development since the last time, so rather > than calling it "version 3" I'm just treating it as a new series > (indicating it's been updated in this message). The fast/data path > is the same as before. But the driver now (nearly) supports a > second platform, its transaction handling has been generalized > and improved, and modem activities are now handled in a more > unified way. > > This series is available (based on net-next in branch "ipa_updated-v2" > in this git repository: > https://git.linaro.org/people/alex.elder/linux.git > > The branch depends on other one other small patch that I sent out > for review earlier. > https://lore.kernel.org/lkml/20200306042302.17602-1-elder@linaro.org/ > I realize this is all already in (yay!), but it took me a long time to get around to fully reading this driver. I'll paste my notes here for posterity or possible future patches. Overall the driver seemed well documented and thoughtfully written. As someone who has seen the old downstream IPA driver (though I didn't look long as my brain started hurting), I greatly appreciate the work required by Alex to polish this all up. So firstly, thanks Alex! Onto the notes. There are a couple themes I noticed. The driver seems occasionally to be unnecessarily layer-caked. I noticed "could be inlined" as a common refrain in my feedback. There are also a couple places with hand-rolled refcounting, atomic exchanges, and odd mutexes. I haven't fully digested those to be able to know how to get rid of them, but I'll point them out as something that "doesn't smell quite right". Acronyms (for my own benefit): ee - execution environment ep - endpoint er - endpoint or route ID rt - resource type dcd - Dynamic clock division (request to GCC to turn you off) bcr - Backwards compatibility register comp - Core master port holb - ??? ipa_main.c: What is IPA_VALIDATION. Can this just be on always or removed? otherwise it will likely bit rot. I'd like to see this suspend_ref go away. ipa_reg.c can be inlined ipa_mem_init can be inlined. IPA_NOTIFY: Shouldn't CONFIG_IPA depend on IPA_NOTIFY? ipa_data.h Why are ipa_resource_src and ipa_resource_dst separate structures? maybe the extern globals at the bottom should just be moved into ipa_main.c ipa_endpoint.h Add a note for enum ipa_endpoint_name indicating who is TXing and RXing ipa_data-sc7180.c Where is IPA_ENDPOINT_MODEM_LAN_TX definition? ipa_clock.c IPA_CORE_CLOCK_RATE - Should probably be specified in DT as a fixed frequency rather than here in code. Interconnect bandwidths - Are these a function of the core clock rate? This may be fine for the initial version, but is there any way to derive the bandwidth requirement? ipa_interconnect_init_one - Probably best to just inline this ipa_clock_get_additional - Seems sketchy, would like to remove this Overall don't like the homebrew reference counting here. Would runtime PM help you do this? ipa_interrupt.h I'd like to get rid of ipa_interrupt_add and ipa_interrupt_remove. Seems like there's no need for these to be dynamically added, it's all one driver. ipa_interrupt.c Why does ipa_interrupt_setup() need to dynamically allocate the structure, can't we just embed it in struct ipa? Without the kzalloc, ipa_interrupt_setup() and ipa_interrupt_teardown() are simple enough they can probably be inlined (at least teardown for sure). Interrupt processing seems a little odd. What I would have expected is: Hard ISR reads pending bits, and immediately writes all pending bits to quiesce them. Save bitmask of pending bits, and send to the threaded handler. Threaded handler then reads and clears pending bits out, and acts on any. Fixes interrupt storm in ipa_isr() if an unexpected interrupt comes in but an expected interrupt is also pending. Avoids multiple register writes (one for each bit) in ipa_interrupt_process() Saves all the register reads in ipa_interrupt_process_all(). That additional read in the loop seems like it shouldn't be there either way. ipa_mem.h Is IPA_SHARED_MEM_SIZE supposed to be defined? It's mentioned in the comment. Comment says the number of canaries is the same for all IPA versions, but ipa_data-sdm845.c and ipa_data-sc7180.c seem to have different canary counts for IPA_MEM_UC_INFO? Should the number of canaries really be part of the chipset-specific config info if it's never going to change? Do the canary values eat into the previous region? Can we add a warning to ensure we don't write canary values off the beginning of the memory region? ipa_mem.c Maybe remove ipa_mem_teardown() if we're not planning to add anything to it soon, or inline it in the header for now. Does ipa_mem_zero_modem() erase canary values previously set up? gsi.h Why make gsi_evt_ring_state 0xf? Remove assignments and let enum do its thing. enum gsi_ee_id - Probably worth commenting that this defines the layout of the per-EE register regions, so rearranging this would horribly break our access to hardware. gsi_reg.h What is gsi v2.0? Is that the same as IPA 4.0? Why do the channel macros have things like CH_C and EE_N in them? Why not just CH and EE? Oh, I also see CH_E, what's that? gsi.c: enum gsi_err_code: Where's 0x7? gsi_channel_deprogram(): delete gsi_channel_update(): I'm worried about this refcount thing, how does it work? gsi_event_bitmap_init() can be inlined gsi_evt_ring_setup() and gsi_evt_ring_teardown() can be removed gsi_teardown(): inline gsi_evt_ring_exit(): remove ipa_gsi.h: Comment for ipa_gsi_channel_tx_completed has wrong function name copypasta. ipa_gsi.c: This is an interesting mezzanine interface, it looks like it was designed to keep GSI code from calling IPA code directly. Why is that? Could these at least be inlined into the ipa_gsi.h? gsi_trans.h: Why is it important that struct gsi_trans be < 128 bytes? gsi_trans.c: gsi_tre_type - Should this be in a header? TRE_FLAGS_ - Should these be in a header? Also, replace GENMASK(x,x) with BIT(x). TRE_FLAGS_IEOB_FMASK is never used (which is fine, but should it be?) gsi_trans_tre_reserve() - Why atomic_try_cmpxchg? What's the difference between that and atomic_cmpxchg? gsi_tre_len_opcode() - If len is truncated to 16 bits, why is u32 passed in? Is len sometimes used as 32 bits? gsi_trans_tre_fill() - If it doesn't do a 16-byte atomic write, is this a problem? Could the controller see a half-baked TRE? ipa_endpoint.c: What is HOLB timer? ipa_table.c: ipa_table_valid() - This just runs all 3-bit possibilities. Could use flags and a loop instead. ipa_table_teardown() - Remove? ipa_cmd.c: ipa_cmd_tag_process_add() - What happened here? Is this just functionality we're not using right now? ipa_modem.c ipa_start_xmit() - Could returning BUSY result in an infinite loop if something goes wrong in the lower layers? ipa_modem_start() - Shouldn't we print some errors if the state variable has an unexpected value (ie not RUNNING)? In those cases we are likely not in a good place. ipa_qmi.c: ipa_qmi_indication() could be inlined init_modem_driver_req() use of static means this can never run concurrently with itself, right? Also if the request gets stuck in qmi_txn_wait() you're hosed. ipa_qmi_msg.c You could macro-ize the initialization of these elements, which would make things way shorter, and probably easier to read. I'm imagining for instance the first element in the file could be reduced to IPA_QMI_ELEM(QMI_OPT_FLAG, 1, struct ipa_indication_register_req, master_driver_init_complete_valid, 0x10) ipa_smp2p.c: s/Motex/Mutex/ Actually I don't get why the mutex is needed at all. It's certainly not needed in ipa_smp2p_disable() (stores are already atomic), and threaded irqs already have mutual exclusion. Or are you trying to make sure ipa_smp2p_disable() doesn't return until ipa_smp2p_modem_setup_ready_isr() has fully completed? If that's really why, you should explain that's what it's doing and why it's necessary. Thinking more about it, why can't you just actually disable the irq? That calls synchronize_irq, which will flush out any instances of the irq running. Then no mutex necessary! ipa_smp2p_irq_init(), and _exit() can be inlined. I'd love to see clock_on and the weird reference counting go away. Is that really necessary?