[00/19] sg: v4 interface, rq sharing + multiple rqs

Message ID	20190524184809.25121-1-dgilbert@interlog.com (mailing list archive)
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Douglas Gilbert <dgilbert@interlog.com> To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 00/19] sg: v4 interface, rq sharing + multiple rqs Date: Fri, 24 May 2019 14:47:50 -0400 Message-Id: <20190524184809.25121-1-dgilbert@interlog.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk
Series	sg: v4 interface, rq sharing + multiple rqs \| expand [00/19] sg: v4 interface, rq sharing + multiple rqs [01/19] sg: move functions around [02/19] sg: remove typedefs, type+formatting cleanup [03/19] sg: sg_log and is_enabled [04/19] sg: move header to uapi section [05/19] sg: replace rq array with lists [06/19] sg: sense buffer cleanup [07/19] sg: add sg v4 interface support [08/19] sg: add 8 byte SCSI LUN to sg_scsi_id [09/19] sg: expand sg_comm_wr_t [10/19] sg: add sg_ioabort ioctl [11/19] sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls [12/19] sg: add sg_set_get_extended ioctl [13/19] sg: sgat_elem_sz and sum_fd_dlens [14/19] sg: tag and more_async [15/19] sg: add fd sharing , change, unshare [16/19] sg: add shared requests [17/19] sg: add multiple request support [18/19] sg: add slave wait capability [19/19] sg: table of error numbers with meanings

Douglas Gilbert May 24, 2019, 6:47 p.m. UTC

This patchset extends the SCSI generic (sg) driver found in lk 5.2 .
The sg driver has a version number which is visible via an ioctl()
and is bumped from 3.5.36 to 4.0.12 by this patchset.
The additions and changes are described in some detail on this
long webpage:
    http://sg.danny.cz/sg/sg_v40.html
and references are made in various patches to relevant sections in
that document.

This patchset restores some functionality that was in the v2 driver
version and has been broken by some external patch in the interim
period (20 years). That functionality (NO_DXFER) has found new
uses in request sharing. Also functionality that has been dropped
from the bsg driver over the last year is added to this driver in
this patchset (e.g. async/non-blocking v4 interface usage, and
multiple request support).

This patchset is big and can be regarded as a driver rewrite. The
number of lines increases from around 3000 to over 6000. The size
of the module doubles although that can be reduced by turning off
SCSI logging.

TODO: during development, this driver had its own version of
      tracing which has been removed from the final patchset.
      Curiously that tracing code was never used in anger for
      its intended purpose, but the revamped SG_LOG macro was
      used a lot.
      The intention is to add tracing via the 'standard' kernel
      code. The addition is somewhat invasive, both to the sg
      driver (e.g. some otherwise private structures need to
      be made public) and to the tracing subsystem.

Testing:
The sg3_utils package has several extensions in sg3_utils-1.45
beta (revision 824) to support and test the version 4 sg
driver presented in this patchset.
The new and revised testing utilities are outlined on the
same webpage as above in the second half of the section
titled: "15 Downloads and testing".

This patchset is against Martin Petersen's 5.3/scsi-queue
branch. It will also apply to lk 5.1 and probably lk 5.0 .

Douglas Gilbert (19):
  sg: move functions around
  sg: remove typedefs, type+formatting cleanup
  sg: sg_log and is_enabled
  sg: move header to uapi section
  sg: replace rq array with lists
  sg: sense buffer cleanup
  sg: add sg v4 interface support
  sg: add 8 byte SCSI LUN to sg_scsi_id
  sg: expand sg_comm_wr_t
  sg: add sg_ioabort ioctl
  sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls
  sg: add sg_set_get_extended ioctl
  sg: sgat_elem_sz and sum_fd_dlens
  sg: tag and more_async
  sg: add fd sharing , change, unshare
  sg: add shared requests
  sg: add multiple request support
  sg: add slave wait capability
  sg: table of error numbers with meanings

 drivers/scsi/sg.c      | 6790 ++++++++++++++++++++++++++++++----------
 include/scsi/sg.h      |  268 +-
 include/uapi/scsi/sg.h |  475 +++
 3 files changed, 5682 insertions(+), 1851 deletions(-)
 create mode 100644 include/uapi/scsi/sg.h

Bart Van Assche May 26, 2019, 3:49 p.m. UTC | #1

On 5/24/19 11:47 AM, Douglas Gilbert wrote:
> This patchset restores some functionality that was in the v2 driver
> version and has been broken by some external patch in the interim
> period (20 years).

What is an "external patch"?

Thanks,

Bart.

Douglas Gilbert June 2, 2019, 8:23 p.m. UTC | #2

On 2019-05-26 11:49 a.m., Bart Van Assche wrote:
> On 5/24/19 11:47 AM, Douglas Gilbert wrote:
>> This patchset restores some functionality that was in the v2 driver
>> version and has been broken by some external patch in the interim
>> period (20 years).
> 
> What is an "external patch"?

It's a patch made by someone with a first name like Christoph, Jens or
James (picking some names at random) that is applied whether or not
the maintainer of said driver approves ("ack's) or not.

IMO The maintainer should be able to restore features removed in this
fashion (I'm talking specifically about the neutering of
SG_FLAG_NO_DXFER) without review as it is in the documented interface.
Plus I know of at least one power user who was peeved to find out that
it was quietly broken.

Doug Gilbert

P.S the use case is mirrored disks: reading one disk for the actual data,
and the other disk to check if there is an IO error. So the second disk
doesn't need its data transferred to the user space (thus saving time).

Bart Van Assche June 3, 2019, 4:24 a.m. UTC | #3

On 6/2/19 1:23 PM, Douglas Gilbert wrote:
> On 2019-05-26 11:49 a.m., Bart Van Assche wrote:
>> On 5/24/19 11:47 AM, Douglas Gilbert wrote:
>>> This patchset restores some functionality that was in the v2 driver
>>> version and has been broken by some external patch in the interim
>>> period (20 years).
>>
>> What is an "external patch"?
> 
> It's a patch made by someone with a first name like Christoph, Jens or
> James (picking some names at random) that is applied whether or not
> the maintainer of said driver approves ("ack's) or not.
> 
> IMO The maintainer should be able to restore features removed in this
> fashion (I'm talking specifically about the neutering of
> SG_FLAG_NO_DXFER) without review as it is in the documented interface.
> Plus I know of at least one power user who was peeved to find out that
> it was quietly broken.

That's unfortunate that that regression happened. As you probably know 
the traditional approach to address a regression is to post a fix that 
has the "Fixes:" and "Cc: stable" tags.

Bart.

Bart Van Assche June 3, 2019, 4:19 p.m. UTC | #4

On 5/24/19 11:47 AM, Douglas Gilbert wrote:
> This patchset is big and can be regarded as a driver rewrite. The
> number of lines increases from around 3000 to over 6000.

Every SCSI reviewer I know is too busy to review a patch series that 
involves so many changes. You may have to split this series into a 
number of smaller series if you want to encourage reviewers to have a look.

Bart.

Douglas Gilbert June 4, 2019, 9:31 p.m. UTC | #5

On 2019-06-03 12:19 p.m., Bart Van Assche wrote:
> On 5/24/19 11:47 AM, Douglas Gilbert wrote:
>> This patchset is big and can be regarded as a driver rewrite. The
>> number of lines increases from around 3000 to over 6000.
> 
> Every SCSI reviewer I know is too busy to review a patch series that involves so 
> many changes. You may have to split this series into a number of smaller series 
> if you want to encourage reviewers to have a look.

Cutting a patchset that touches around 1500 lines of a 3000 line
driver, then adds new functionality amounting to an extra 3000
lines of code (and comments), according to the "one change per
patch" rule would result in a patchset with hundreds of patches.

Further if they are to be bisectable then they must not only
compile and build, but run properly. Of course that is impossible
for new functionality as there is little to test the new
functionality against. Then there is the case of a driver maintainer
who wants to clean up code that has "rusted" over time, including
being weakened by hundreds of well-meaning but silly janitor type
patches. Should a maintainer be discouraged from doing that?

Looking around at current patchsets, LWN keeps a helpful list at
the bottom of this page (from last Thursday):
     https://lwn.net/Articles/789234/
[non-subscribers need to wait until next Thursday to view that].
I looked at the "Device Drivers" section. Most patchsets there
had between 10 and 20 patches, one had 33. One, the SIW Infiniband
driver, is over 10,000 lines long, and contains 'only' 12 patches.
Reviewers are obviously a scarce resource, but making their live's
easier shouldn't be a goal in itself.

Further, the utility of bottom up reviews by humans reduces with
time as the automatic tools to do that get better. Of the responses
to my "version 1" patchset 10 days ago, I would regard only one of
your responses "useful", and all but one of the kbuild robot's
responses useful. By "useful" I mean that they lead to better code.

If new functionality is being proposed, surely it is better to
check that it is documented and that test code exists. Then the
design and high level details of the implementation should be
assessed. And if the code doesn't involve special hardware,
shouldn't the test code be run as part of the review process?
Code reviews have their place, but over the last 10 years, bugs in
the sg and scsi_debug drivers have been found by _using_ those
drivers. Also with those drivers I and others have found bugs in
the block layer and the SCSI mid-level. To date I have had no
feedback about design document describing this patchset:
     http://sg.danny.cz/sg/sg_v40.html
even though I refer to sections of it in about half of the patches
in this patchset. Plus I have written extensive test code and made
it available; again no feedback on that.

In the article: "A ring buffer for epoll":
    https://lwn.net/Articles/789603/    [May 30, 2019]
Jonathan Corbet discusses these issues in the final section
titled: "Some closing grumbles".

Doug Gilbert

Bart Van Assche June 4, 2019, 9:58 p.m. UTC | #6

On 6/4/19 2:31 PM, Douglas Gilbert wrote:
> Then there is the case of a driver maintainer
> who wants to clean up code that has "rusted" over time, including
> being weakened by hundreds of well-meaning but silly janitor type
> patches. Should a maintainer be discouraged from doing that?

I think in this case the answer to that question is "yes, definitely".
Multiple companies, including my employer, use the SG I/O interface as 
the primary interface for controlling SMR drives. Any regression in the 
SG I/O code will have severe consequences for users of SMR drives. As 
you know a rewrite can introduce regressions. Since the current 
implementation works fine I think it is up to you to find a way to 
motivate existing SG I/O users to adopt your new implementation. The 
cloud companies I know employ kernel developers and test new kernel 
versions before deploying these internally. How are you going to 
motivate these companies to adopt your rewrite instead of combining e.g. 
the Linux kernel v5.1 SG I/O implementation with the latest version of 
the Linux kernel?

> To date I have had no feedback about design document describing this
> patchset:
>      http://sg.danny.cz/sg/sg_v40.html
> even though I refer to sections of it in about half of the patches
> in this patchset. Plus I have written extensive test code and made
> it available; again no feedback on that.

It is great that you took the time to write a design document. I can't 
speak for other kernel developers. But if I see a URL myself in a cover 
letter of a patch series or in a commit message I ignore it. If you want 
feedback on that design document please convert it into a documentation 
patch and include it in this patch series.

Thanks,

Bart.

Martin K. Petersen June 6, 2019, 2:34 p.m. UTC | #7

Doug,

> Cutting a patchset that touches around 1500 lines of a 3000 line
> driver, then adds new functionality amounting to an extra 3000 lines
> of code (and comments), according to the "one change per patch" rule
> would result in a patchset with hundreds of patches.

The problem here is that you think of it as a single patch set
transitioning the driver from major version X to version Y.

Linux kernel development has moved away from that model. The kernel
release cadence is more or less fixed at 10 weeks. The release process
is not controlled by features or component versions or anything of that
nature. It is set by passing of time.

The notion that a driver or kernel component may have a version number
applied to it is orthogonal to that process. A version is something you
as a driver maintainer may decide to use to describe a certain set of
commits. But it has no meaning wrt. the Linux development process.

If you want to transition sg from what you call v3 to v4, then the
process is that you submit a handful or two of small, easily digestible
patches at a time.

It may take a few kernel releases to get to where you want to be. But
that is the process that everybody else is following to get their
changes merged.

Nobody says you can only make one submission per submission window. It's
perfectly fine to submit patches 11-20 as soon as patches 1-10 have been
reviewed and merged. As an example of this, the HBA vendors usually send
several driver updates each release cycle.

> Further if they are to be bisectable then they must not only
> compile and build, but run properly.

Absolutely. That's a requirement.

> Of course that is impossible for new functionality as there is little
> to test the new functionality against.

The burden is on you to submit patches in an order in which they make
logical sense and in which they can be reviewed, bisected, and tested.

> I looked at the "Device Drivers" section. Most patchsets there
> had between 10 and 20 patches, one had 33.

The "number of patches in a driver submission" as a measure is a red
herring. sg is not a new driver, it has users.

> One, the SIW Infiniband driver, is over 10,000 lines long, and
> contains 'only' 12 patches.

But it was presumably developed out of tree in a separate repo. And the
thousands of commits that resulted in this driver have been collapsed
into 12 patches for initial submission.

The number of lines is also a red herring. If a patch changes 10,000
lines to make one logical change that's perfectly fine. What's not fine
is a patch changing 10,000 lines and also making an entirely different
logical change.

> Reviewers are obviously a scarce resource, but making their live's
> easier shouldn't be a goal in itself.

Couldn't disagree more. If you want your code merged, you will have to
present it in a way that caters to the reviewers. Because without
reviews, your code won't get merged.

> If new functionality is being proposed, surely it is better to check
> that it is documented and that test code exists. Then the design and
> high level details of the implementation should be assessed.

Testing and documentation are absolutely important. But so is
documenting what compelled a code change.

Reviews aid in verifying that the thought process outlined in the patch
description matches the code changes performed. This is where the whole
"one logical change" comes from. And when things subsequently break, it
is then easy to identify which assumptions the patch author made that
turned out not to be valid.

> To date I have had no feedback about design document describing this
> patchset: http://sg.danny.cz/sg/sg_v40.html

This suffers the same problem as your patch series in that it
encapsulates 26 years of thought in a single blob.

Presumably that document developed over time and didn't go from nothing
to 22K words in an instant. You need to document that process. Also,
having a design document that describes a wealth of changes after the
fact is not terribly helpful.

I understand appreciate that you are focused on the end product. But to
get there you need to slowly and iteratively submit patches against v3
that can be independently reviewed and merged.

Please pick one feature at a time. Carve that into a few patches that
each logically only do one thing. And then submit that as a patch set
with an intro mail that describes the design of the feature, which
assumptions are made, what the benefits are, who needs this capability,
etc.

Douglas Gilbert June 6, 2019, 10:42 p.m. UTC | #8

On 2019-06-06 10:34 a.m., Martin K. Petersen wrote:
> 
> Doug,
> 
>> Cutting a patchset that touches around 1500 lines of a 3000 line
>> driver, then adds new functionality amounting to an extra 3000 lines
>> of code (and comments), according to the "one change per patch" rule
>> would result in a patchset with hundreds of patches.
> 
> The problem here is that you think of it as a single patch set
> transitioning the driver from major version X to version Y.
> 
> Linux kernel development has moved away from that model. The kernel
> release cadence is more or less fixed at 10 weeks. The release process
> is not controlled by features or component versions or anything of that
> nature. It is set by passing of time.
> 
> The notion that a driver or kernel component may have a version number
> applied to it is orthogonal to that process. A version is something you
> as a driver maintainer may decide to use to describe a certain set of
> commits. But it has no meaning wrt. the Linux development process.
> 
> If you want to transition sg from what you call v3 to v4, then the
> process is that you submit a handful or two of small, easily digestible
> patches at a time.
> 
> It may take a few kernel releases to get to where you want to be. But
> that is the process that everybody else is following to get their
> changes merged.
> 
> Nobody says you can only make one submission per submission window. It's
> perfectly fine to submit patches 11-20 as soon as patches 1-10 have been
> reviewed and merged. As an example of this, the HBA vendors usually send
> several driver updates each release cycle.
> 
>> Further if they are to be bisectable then they must not only
>> compile and build, but run properly.
> 
> Absolutely. That's a requirement.
> 
>> Of course that is impossible for new functionality as there is little
>> to test the new functionality against.
> 
> The burden is on you to submit patches in an order in which they make
> logical sense and in which they can be reviewed, bisected, and tested.
> 
>> I looked at the "Device Drivers" section. Most patchsets there
>> had between 10 and 20 patches, one had 33.
> 
> The "number of patches in a driver submission" as a measure is a red
> herring. sg is not a new driver, it has users.
> 
>> One, the SIW Infiniband driver, is over 10,000 lines long, and
>> contains 'only' 12 patches.
> 
> But it was presumably developed out of tree in a separate repo. And the
> thousands of commits that resulted in this driver have been collapsed
> into 12 patches for initial submission.
> 
> The number of lines is also a red herring. If a patch changes 10,000
> lines to make one logical change that's perfectly fine. What's not fine
> is a patch changing 10,000 lines and also making an entirely different
> logical change.
> 
>> Reviewers are obviously a scarce resource, but making their live's
>> easier shouldn't be a goal in itself.
> 
> Couldn't disagree more. If you want your code merged, you will have to
> present it in a way that caters to the reviewers. Because without
> reviews, your code won't get merged.
> 
>> If new functionality is being proposed, surely it is better to check
>> that it is documented and that test code exists. Then the design and
>> high level details of the implementation should be assessed.
> 
> Testing and documentation are absolutely important. But so is
> documenting what compelled a code change.
> 
> Reviews aid in verifying that the thought process outlined in the patch
> description matches the code changes performed. This is where the whole
> "one logical change" comes from. And when things subsequently break, it
> is then easy to identify which assumptions the patch author made that
> turned out not to be valid.
> 
>> To date I have had no feedback about design document describing this
>> patchset: http://sg.danny.cz/sg/sg_v40.html
> 
> This suffers the same problem as your patch series in that it
> encapsulates 26 years of thought in a single blob.
> 
> Presumably that document developed over time and didn't go from nothing
> to 22K words in an instant. You need to document that process. Also,
> having a design document that describes a wealth of changes after the
> fact is not terribly helpful.

So tl;dr ?

And if I hid the code, would my "blob" then qualify as a design document?

Also if my sg v2/v3 documentation targetted the kernel submission
processes between 1998 and 2002, would it be much use today? I
think not.

The way C++ standards are being developed these days is interesting.
A proposer comes with an idea, code, test cases, rationale; then clang
and gcc folks implement it and then the committee looks at it seriously.
Some proposals get shunted through TS groups for a longer look.

I'm old enough to have been through the Ada debacle. So I know about
design documentation that went from the floor to the ceiling, several
times. Ada wasn't (isn't) a bad language but it was killed by its
associated documentation workflow. The Ada prophets claimed it was
a self-evident truth that almost all documentation (and test cases)
must be written before any code was cut. Ada is still used in air
traffic control, railway signalling, etc; not sure if that is
re-assuring.

BTW Not everything in that "blob" is new. Only a small part of the
large ioctl table is new; plus the v1, v2, v3 and v4 interfaces
have all been in place for over 10 years. The object tree of the
driver remains the same.

> I understand appreciate that you are focused on the end product. But to
> get there you need to slowly and iteratively submit patches against v3
> that can be independently reviewed and merged.
> 
> Please pick one feature at a time. Carve that into a few patches that
> each logically only do one thing. And then submit that as a patch set
> with an intro mail that describes the design of the feature, which
> assumptions are made, what the benefits are, who needs this capability,
> etc.

Thank you for repeating the party line. I expected none other. As a
bonus, you took the "scarce resource" bait.

And I'm reminded of the well reasoned process that the cabal in which
Christoph, you and (I suspect) James conspired, to remove bidirectional
SCSI command support from the kernel. You dismissed it masterfully as
"old cruft" while Christoph continued to misquote Boaz after he had been
told by same to stop or properly qualify the quote in question. Cache
lines also got an honourable mention as a justification. But where was
the design document (aka a "blob") justifying the change/removal,
complete with a fallback strategy if the removal "blew up"? And where
are the design documents for the sd driver and its ongoing
evolutionary changes? Ever seen anything written about the sr or
ses driver?
... in short: don't do as I do, do as I say ...

I will probably produce a first half patchset, when I have properly
adjusted my "cadence".

Doug Gilbert

P.S. For historical accuracy it is 21 years not 26. I have never spoken
to, or received any correspondence (or design documents) from, Lawrence
Foard. In 1998 he had not responded to emails for several years (I was
told) and Alan Cox was looking for someone, anyone to take over
maintenance of the sg driver. IMO Alan considered that ongoing maintenance
and evolutionary change of the existing driver was more important than a
replacement driver that was on offer. So now I'm looking at a bit more
evolution.

Martin K. Petersen June 7, 2019, 4:58 a.m. UTC | #9

Doug,

> So tl;dr ?

I actually tried.

I also spent some time reading the patches two weeks ago. The new series
is an improvement over the initial sg v4 posting from last winter. But
it still needs work.

> Also if my sg v2/v3 documentation targetted the kernel submission
> processes between 1998 and 2002, would it be much use today? I think
> not.

No, but a diff, change bars, or something similar would have been very
helpful to clearly identify what's changed between v3 and v4. Just like
it's done in standards documents.

You have your section 3 bullet list. But the rest of the document is
huge and it is not immediately obvious what the v4-specific bits are.

> Thank you for repeating the party line. I expected none other.

Don't shoot the messenger. This is all documented in detail in:

      Documentation/process/submitting-patches.rst

Please adjust your patch submissions accordingly. Especially the first
few paragraphs of section 2 in that document are worth paying attention
to. Your patch descriptions typically don't have a problem statement or
justification. Why are you making all these changes? Why do we need a v4
in the first place? Why don't the existing ioctls suffice? Why aren't 16
outstanding requests per fd enough? Etc.

> And where are the design documents for the sd driver and its ongoing
> evolutionary changes? Ever seen anything written about the sr or ses
> driver?

I think you are missing my point. We don't have gnarly design documents
because every change is either a self-contained commit or small series
where each patch describes why a change was made. The git log *is* the
design document.

Each of your bullet features in section 3 ("Changes to sg driver between
version 3.5.36 and 4.0") would ideally be submitted as a separate patch
series whose individual merits could be discussed on the list.

This is in sharp contrast to the "This patchset is big and can be
regarded as a driver rewrite" approach that signals a preconceived
commitment to a particular set of features and design choices. Why
comment on something you can't influence? Who has time to read 22K words
on a website and try to find out how those words relate to the posted
patches?

The only person on linux-scsi that *has* to look at your patches is
me. Everybody else is a volunteer you need to entice to invest time and
effort in reviewing your work. Quite possibly in their spare time.

That's why I suggested that the burden is on you. You need to present
your work in a way that engages people, is manageable in terms of time
commitment, and makes them feel their investment is worthwhile and
appreciated. Please take that into consideration.

[00/19] sg: v4 interface, rq sharing + multiple rqs

Message

Comments