Message ID | 20241028213541.1529-2-ouster@cs.stanford.edu (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | Begin upstreaming Homa transport protocol |
> +/**
> + * struct homa_recvmsg_args - Provides information needed by Homa's
> + * recvmsg; passed to recvmsg using the msg_control field.
> + */
> +struct homa_recvmsg_args {
> +	/**
> +	 * @id: (in/out) Initially specifies the id of the desired RPC, or 0
> +	 * if any RPC is OK; returns the actual id received.
> +	 */
> +	uint64_t id;
> +
> +	/**
> +	 * @completion_cookie: (out) If the incoming message is a response,
> +	 * this will return the completion cookie specified when the
> +	 * request was sent. For requests this will always be zero.
> +	 */
> +	uint64_t completion_cookie;
> +
> +	/**
> +	 * @flags: (in) OR-ed combination of bits that control the operation.
> +	 * See below for values.
> +	 */
> +	int flags;

Maybe give this a fixed size, otherwise it gets interesting when you
have a 32 bit userspace running on top of a 64 bit kernel.

> +
> +	/**
> +	 * @error_addr: the address of the peer is stored here when available.
> +	 * This field is different from the msg_name field in struct msghdr
> +	 * in that the msg_name field isn't set after errors. This field will
> +	 * always be set when peer information is available, which includes
> +	 * some error cases.
> +	 */
> +	union sockaddr_in_union peer_addr;
> +
> +	/**
> +	 * @num_bpages: (in/out) Number of valid entries in @bpage_offsets.
> +	 * Passes in bpages from previous messages that can now be
> +	 * recycled; returns bpages from the new message.
> +	 */
> +	uint32_t num_bpages;
> +
> +	uint32_t _pad[1];

If you ever want to be able to use this sometime in the future, it
would be good to document that it should be filled with zero, and test
is it zero. And if the kernel ever passes this structure back to
userspace it should also fill it with zero.

> +#if !defined(__cplusplus)
> +_Static_assert(sizeof(struct homa_recvmsg_args) >= 120,
> +	       "homa_recvmsg_args shrunk");
> +_Static_assert(sizeof(struct homa_recvmsg_args) <= 120,
> +	       "homa_recvmsg_args grew");

Did you build for 32 bit systems?

Andrew
On Tue, Oct 29, 2024 at 2:59 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > +	int flags;
>
> Maybe give this a fixed size, otherwise it gets interesting when you
> have a 32 bit userspace running on top of a 64 bit kernel.

Good point; will do.

> > +	uint32_t _pad[1];
>
> If you ever want to be able to use this sometime in the future, it
> would be good to document that it should be filled with zero, and test
> is it zero. And if the kernel ever passes this structure back to
> userspace it should also fill it with zero.

It does have to be filled with zero, and it is checked. I'll document that.

> > +#if !defined(__cplusplus)
> > +_Static_assert(sizeof(struct homa_recvmsg_args) >= 120,
> > +	       "homa_recvmsg_args shrunk");
> > +_Static_assert(sizeof(struct homa_recvmsg_args) <= 120,
> > +	       "homa_recvmsg_args grew");
>
> Did you build for 32 bit systems?

Sadly no: my development system doesn't currently have any
cross-compiling versions of gcc :-(

I think the best thing is to remove these assertions from the kernel
version of Homa. They are there to make sure that I don't accidentally
change the size of the structure; I will keep them in the GitHub repo
for Homa, which should serve that purpose.

Thanks for the quick comments.

-John-
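The two points agreed on above (explicit field widths, and a reserved field that must be zero and is checked) can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: `struct demo_args`, `DEMO_VALID_FLAGS`, and `demo_check_args()` are hypothetical names, not part of the Homa patch, and the validation shown is not Homa's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical argument block (not Homa's layout). Every member has an
 * explicit width, so the size does not depend on the build target.
 */
struct demo_args {
	uint64_t id;
	uint32_t flags;		/* fixed width instead of plain int */
	uint32_t reserved;	/* must be zero on input */
};

/* A single equality check replaces the ">=" and "<=" pair. */
_Static_assert(sizeof(struct demo_args) == 16, "demo_args changed size");

#define DEMO_VALID_FLAGS 0x07u

/* Returns 0 if the block is acceptable, -EINVAL otherwise. */
static int demo_check_args(const struct demo_args *args)
{
	if (args->flags & ~DEMO_VALID_FLAGS)
		return -EINVAL;
	/* Rejecting nonzero padding keeps the field usable later. */
	if (args->reserved != 0)
		return -EINVAL;
	return 0;
}
```

Rejecting nonzero reserved bits from the start means a later kernel can assign them a meaning without misreading garbage from older callers.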
> > Did you build for 32 bit systems?
>
> Sadly no: my development system doesn't currently have any
> cross-compiling versions of gcc :-(

I'm not sure in this case it is actually a cross compile. Your default
amd64 tool chain should also be able to compile for i386.

export ARCH=i386
unset CROSS_COMPILE
make defconfig
make

Andrew
On Wed, Oct 30, 2024 at 5:41 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > > Did you build for 32 bit systems?
> >
> > Sadly no: my development system doesn't currently have any
> > cross-compiling versions of gcc :-(
>
> I'm not sure in this case it is actually a cross compile. Your default
> amd64 tool chain should also be able to compile for i386.
>
> export ARCH=i386
> unset CROSS_COMPILE
> make defconfig
> make

Thanks for this additional information. I have now compiled Homa
(along with the rest of the kernel) for ARCH=i386; in the process I
learned about uintptr_t and do_div.

Question: is the distinction between the types u64 and __u64
significant? If so, is there someplace where it is explained when I
should use each? So far I have been using __u64 (almost) everywhere.

-John-
On Fri, Nov 01, 2024 at 10:47:20AM -0700, John Ousterhout wrote:
> On Wed, Oct 30, 2024 at 5:41 AM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > > Did you build for 32 bit systems?
> > >
> > > Sadly no: my development system doesn't currently have any
> > > cross-compiling versions of gcc :-(
> >
> > I'm not sure in this case it is actually a cross compile. Your default
> > amd64 tool chain should also be able to compile for i386.
> >
> > export ARCH=i386
> > unset CROSS_COMPILE
> > make defconfig
> > make
>
> Thanks for this additional information. I have now compiled Homa
> (along with the rest of the kernel) for ARCH=i386; in the process I
> learned about uintptr_t and do_div.
>
> Question: is the distinction between the types u64 and __u64
> significant? If so, is there someplace where it is explained when I
> should use each? So far I have been using __u64 (almost) everywhere.

/include/uapi/asm-generic/int-ll64.h says:

/*
 * __xx is ok: it doesn't pollute the POSIX namespace. Use these in the
 * header files exported to user space
 */

So for files you export to userspace, anything in include/uapi, you
should be using __u64. In the kernel, i think it does not matter, and
i did find:

typedef __u64 u64;

so they probably end up identical. u64 seems more popular in net/ than
__u64, probably because it is shorter.

Andrew
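As a rough illustration of that advice, the sketch below declares a uAPI-style structure with the `__u64`/`__u32` types that `<linux/types.h>` exports to user space. It assumes a Linux system with the kernel headers installed; `struct demo_uapi_args`, `demo_u64`, and `demo_cookie()` are hypothetical names used only here, and the typedef merely mimics the `typedef __u64 u64;` quoted above.

```c
#include <assert.h>
#include <linux/types.h>	/* exports __u32 and __u64 to user space */

/* Hypothetical uAPI-style argument block (not taken from the patch).
 * The double-underscore types do not pollute the POSIX namespace.
 */
struct demo_uapi_args {
	__u64 id;
	__u64 completion_cookie;
	__u32 flags;
	__u32 reserved;
};

_Static_assert(sizeof(struct demo_uapi_args) == 24,
	       "demo_uapi_args changed size");

/* Kernel-internal code would normally write u64; this typedef only
 * imitates that alias so the example builds in user space.
 */
typedef __u64 demo_u64;

static inline demo_u64 demo_cookie(const struct demo_uapi_args *args)
{
	return args->completion_cookie;
}
```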
On 28/10/2024 21:35, John Ousterhout wrote:
> Note: for man pages, see the Homa Wiki at:
> https://homa-transport.atlassian.net/wiki/spaces/HOMA/overview
>
> Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
...
> +/**
> + * Holds either an IPv4 or IPv6 address (smaller and easier to use than
> + * sockaddr_storage).
> + */
> +union sockaddr_in_union {
> +	struct sockaddr sa;
> +	struct sockaddr_in in4;
> +	struct sockaddr_in6 in6;
> +};

Are there fundamental reasons why Homa can only run over IP and not
other L3 networks? Or performance measurements showing that the
cost of using sockaddr_storage is excessive?
Otherwise, baking this into the uAPI seems unwise.

> +	/**
> +	 * @error_addr: the address of the peer is stored here when available.
> +	 * This field is different from the msg_name field in struct msghdr
> +	 * in that the msg_name field isn't set after errors. This field will
> +	 * always be set when peer information is available, which includes
> +	 * some error cases.
> +	 */
> +	union sockaddr_in_union peer_addr;

Member name (peer_addr) doesn't match the kerneldoc (@error_addr).

> +int homa_send(int sockfd, const void *message_buf,
> +	      size_t length, const union sockaddr_in_union *dest_addr,
> +	      uint64_t *id, uint64_t completion_cookie);
> +int homa_sendv(int sockfd, const struct iovec *iov,
> +	       int iovcnt, const union sockaddr_in_union *dest_addr,
> +	       uint64_t *id, uint64_t completion_cookie);
> +ssize_t homa_reply(int sockfd, const void *message_buf,
> +		   size_t length, const union sockaddr_in_union *dest_addr,
> +		   uint64_t id);
> +ssize_t homa_replyv(int sockfd, const struct iovec *iov,
> +		    int iovcnt, const union sockaddr_in_union *dest_addr,
> +		    uint64_t id);

I don't think these belong in here. They seem to be userland
library functions which wrap the sendmsg syscall, and as far as
I can tell the definitions corresponding to these prototypes do
not appear in the patch series.
On Thu, Nov 7, 2024 at 1:58 PM Edward Cree <ecree.xilinx@gmail.com> wrote:
>
> On 28/10/2024 21:35, John Ousterhout wrote:
> > Note: for man pages, see the Homa Wiki at:
> > https://homa-transport.atlassian.net/wiki/spaces/HOMA/overview
> >
> > Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
> ...
> > +/**
> > + * Holds either an IPv4 or IPv6 address (smaller and easier to use than
> > + * sockaddr_storage).
> > + */
> > +union sockaddr_in_union {
> > +	struct sockaddr sa;
> > +	struct sockaddr_in in4;
> > +	struct sockaddr_in6 in6;
> > +};
>
> Are there fundamental reasons why Homa can only run over IP and not
> other L3 networks? Or performance measurements showing that the
> cost of using sockaddr_storage is excessive?
> Otherwise, baking this into the uAPI seems unwise.

This structure made it easier to write code that runs over both IPv4
and IPv6. But, I see your point about the limitations it creates
(there is no fundamental reason Homa couldn't run over other datagram
protocols). In looking over the code, I don't think this structure is
used anymore in the kernel code or the kernel-user interface (it
appears in one structure, but I believe that field is now obsolete and
can be eliminated); its remaining uses are in user-level code. I will
remove sockaddr_in_union from this file.

> > +	/**
> > +	 * @error_addr: the address of the peer is stored here when available.
> > +	 * This field is different from the msg_name field in struct msghdr
> > +	 * in that the msg_name field isn't set after errors. This field will
> > +	 * always be set when peer information is available, which includes
> > +	 * some error cases.
> > +	 */
> > +	union sockaddr_in_union peer_addr;
>
> Member name (peer_addr) doesn't match the kerneldoc (@error_addr).

I will fix.
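For comparison, here is a minimal user-space sketch of the sockaddr_storage alternative raised in the review: one generic container that is filled in as IPv4 or IPv6 depending on the address text. The helper name `demo_fill_addr()` is hypothetical and does not appear in the patch or in any Homa library; only standard sockets calls are used.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a textual address into a sockaddr_storage.
 * Returns the length of the populated sockaddr, or 0 if the text is
 * neither a valid IPv4 nor a valid IPv6 address.
 */
static socklen_t demo_fill_addr(struct sockaddr_storage *ss,
				const char *text, unsigned short port)
{
	struct sockaddr_in *in4 = (struct sockaddr_in *)ss;
	struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)ss;

	memset(ss, 0, sizeof(*ss));
	if (inet_pton(AF_INET, text, &in4->sin_addr) == 1) {
		in4->sin_family = AF_INET;
		in4->sin_port = htons(port);
		return sizeof(*in4);
	}

	/* Start from a clean slate before trying the IPv6 parse. */
	memset(ss, 0, sizeof(*ss));
	if (inet_pton(AF_INET6, text, &in6->sin6_addr) == 1) {
		in6->sin6_family = AF_INET6;
		in6->sin6_port = htons(port);
		return sizeof(*in6);
	}
	return 0;
}
```

The caller inspects `ss_family` to learn which variant is present, so the interface itself does not hard-code the set of address families.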
> > +int homa_send(int sockfd, const void *message_buf,
> > +	      size_t length, const union sockaddr_in_union *dest_addr,
> > +	      uint64_t *id, uint64_t completion_cookie);
> > +int homa_sendv(int sockfd, const struct iovec *iov,
> > +	       int iovcnt, const union sockaddr_in_union *dest_addr,
> > +	       uint64_t *id, uint64_t completion_cookie);
> > +ssize_t homa_reply(int sockfd, const void *message_buf,
> > +		   size_t length, const union sockaddr_in_union *dest_addr,
> > +		   uint64_t id);
> > +ssize_t homa_replyv(int sockfd, const struct iovec *iov,
> > +		    int iovcnt, const union sockaddr_in_union *dest_addr,
> > +		    uint64_t id);
>
> I don't think these belong in here. They seem to be userland
> library functions which wrap the sendmsg syscall, and as far as
> I can tell the definitions corresponding to these prototypes do
> not appear in the patch series.

I'll remove for now. This leaves open the question of where these
declarations should go once the userland library is upstreamed. Those
library methods are low-level wrappers that make it easier to use the
sendmsg kernel call for Homa; users will probably think of them as if
they were system calls. It feels awkward to require people to #include
2 different header files in order to use Homa kernel calls; is it
considered bad form to mix declarations for very low-level methods
like these ("not much more than kernel calls") with those for "real"
kernel calls? Do you know of other low-level kernel-call wrappers in
Linux that are analogous to these? If so, how are they handled?

Thanks for your comments.

-John-
On 08/11/2024 17:55, John Ousterhout wrote:
> This leaves open the question of where these
> declarations should go once the userland library is upstreamed. Those
> library methods are low-level wrappers that make it easier to use the
> sendmsg kernel call for Homa; users will probably think of them as if
> they were system calls. It feels awkward to require people to #include
> 2 different header files in order to use Homa kernel calls; is it
> considered bad form to mix declarations for very low-level methods
> like these ("not much more than kernel calls") with those for "real"
> kernel calls?

include/uapi/ does sometimes contain 'static inline' wrappers. But
declarations for actual functions that need linkage are avoided AFAICT.
The expectation normally is that userland application code will
#include a library header, which takes care of #including any necessary
kernel uAPI headers, ideally packaged separately from the kernel rather
than just taking the include/uapi/ directory of whatever kernel is
currently running. (Back in the day there were some classic Linus
rants[1] warning against the latter.)
Then both the helper functions and their declarations live in the
library, where they can be linked into the application, and not mixed
in with the kernel headers.

> Do you know of other low-level kernel-call wrappers in
> Linux that are analogous to these? If so, how are they handled?

The closest analogy that comes to mind is the bpf system call and libbpf.
libbpf lives in the tools/lib/bpf/ directory of the kernel tree, but is
often packaged and distributed independently[2] of the kernel package.
If there is a reason to tie the maintenance of your wrappers to the
kernel project/git repo then this can be suitable.

But I'm not an expert on this, so I hope someone with more experience
around uAPI stuff will chime in. Might be worth CCing linux-api[3] on
the next version of this patch.
HTH,
-ed

[1]: https://yarchive.net/comp/linux/kernel_headers.html#23
[2]: https://github.com/libbpf/libbpf
[3]: https://www.kernel.org/doc/man-pages/linux-api-ml.html
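The split described above, with wrappers living in a userland library rather than in include/uapi/, might look roughly like the sketch below. It is an illustration only: `demo_send()` and `struct demo_sendmsg_args` are hypothetical stand-ins (the struct mirrors the two fields of `homa_sendmsg_args` from the patch so the example is self-contained, whereas a real library would include the uAPI header instead), and the value used for `msg_controllen` is an assumption because the patch excerpt does not show what the kernel expects.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Local stand-in for struct homa_sendmsg_args from the patch. */
struct demo_sendmsg_args {
	uint64_t id;
	uint64_t completion_cookie;
};

/* Hypothetical library wrapper in the spirit of homa_send(): sends a
 * request and reports the id assigned to the new RPC. Returns 0 on
 * success, or -1 with errno set by sendmsg().
 */
static int demo_send(int sockfd, const void *buf, size_t length,
		     const struct sockaddr *dest, socklen_t dest_len,
		     uint64_t *id, uint64_t completion_cookie)
{
	struct demo_sendmsg_args args = {
		.id = 0,	/* 0 means "new request" per the patch */
		.completion_cookie = completion_cookie,
	};
	struct iovec vec = { .iov_base = (void *)buf, .iov_len = length };
	struct msghdr hdr;

	memset(&hdr, 0, sizeof(hdr));
	hdr.msg_name = (void *)dest;
	hdr.msg_namelen = dest_len;
	hdr.msg_iov = &vec;
	hdr.msg_iovlen = 1;
	hdr.msg_control = &args;
	hdr.msg_controllen = sizeof(args);	/* assumption, see lead-in */

	if (sendmsg(sockfd, &hdr, 0) < 0)
		return -1;
	if (id)
		*id = args.id;	/* the patch documents id as in/out */
	return 0;
}
```

An application would include only the library's header and link against the library; the kernel header stays limited to types and constants.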
On Fri, 8 Nov 2024 22:02:27 +0000 Edward Cree <ecree.xilinx@gmail.com> wrote:
> > Do you know of other low-level kernel-call wrappers in
> > Linux that are analogous to these? If so, how are they handled?
>
> The closest analogy that comes to mind is the bpf system call and libbpf.
> libbpf lives in the tools/lib/bpf/ directory of the kernel tree, but is
> often packaged and distributed independently[2] of the kernel package.
> If there is a reason to tie the maintenance of your wrappers to the
> kernel project/git repo then this can be suitable.

liburing for ioring calls is a better example.
There are lots of versioning issues in any API. It took several years
for BPF to get to run anywhere status. Hopefully, you can learn from
those problems.
diff --git a/include/uapi/linux/homa.h b/include/uapi/linux/homa.h
new file mode 100644
index 000000000000..306d272e4b63
--- /dev/null
+++ b/include/uapi/linux/homa.h
@@ -0,0 +1,199 @@
+/* SPDX-License-Identifier: BSD-2-Clause */
+
+/* This file defines the kernel call interface for the Homa
+ * transport protocol.
+ */
+
+#ifndef _UAPI_LINUX_HOMA_H
+#define _UAPI_LINUX_HOMA_H
+
+#include <linux/types.h>
+#ifndef __KERNEL__
+#include <netinet/in.h>
+#include <sys/socket.h>
+#endif
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+/* IANA-assigned Internet Protocol number for Homa. */
+#define IPPROTO_HOMA 146
+
+/**
+ * define HOMA_MAX_MESSAGE_LENGTH - Maximum bytes of payload in a Homa
+ * request or response message.
+ */
+#define HOMA_MAX_MESSAGE_LENGTH 1000000
+
+/**
+ * define HOMA_BPAGE_SIZE - Number of bytes in pages used for receive
+ * buffers. Must be power of two.
+ */
+#define HOMA_BPAGE_SHIFT 16
+#define HOMA_BPAGE_SIZE (1 << HOMA_BPAGE_SHIFT)
+
+/**
+ * define HOMA_MAX_BPAGES: The largest number of bpages that will be required
+ * to store an incoming message.
+ */
+#define HOMA_MAX_BPAGES ((HOMA_MAX_MESSAGE_LENGTH + HOMA_BPAGE_SIZE - 1) \
+		>> HOMA_BPAGE_SHIFT)
+
+/**
+ * define HOMA_MIN_DEFAULT_PORT - The 16-bit port space is divided into
+ * two nonoverlapping regions. Ports 1-32767 are reserved exclusively
+ * for well-defined server ports. The remaining ports are used for client
+ * ports; these are allocated automatically by Homa. Port 0 is reserved.
+ */
+#define HOMA_MIN_DEFAULT_PORT 0x8000
+
+/**
+ * Holds either an IPv4 or IPv6 address (smaller and easier to use than
+ * sockaddr_storage).
+ */
+union sockaddr_in_union {
+	struct sockaddr sa;
+	struct sockaddr_in in4;
+	struct sockaddr_in6 in6;
+};
+
+/**
+ * struct homa_sendmsg_args - Provides information needed by Homa's
+ * sendmsg; passed to sendmsg using the msg_control field.
+ */
+struct homa_sendmsg_args {
+	/**
+	 * @id: (in/out) An initial value of 0 means a new request is
+	 * being sent; nonzero means the message is a reply to the given
+	 * id. If the message is a request, then the value is modified to
+	 * hold the id of the new RPC.
+	 */
+	uint64_t id;
+
+	/**
+	 * @completion_cookie: (in) Used only for request messages; will be
+	 * returned by recvmsg when the RPC completes. Typically used to
+	 * locate app-specific info about the RPC.
+	 */
+	uint64_t completion_cookie;
+};
+
+#if !defined(__cplusplus)
+_Static_assert(sizeof(struct homa_sendmsg_args) >= 16,
+	       "homa_sendmsg_args shrunk");
+_Static_assert(sizeof(struct homa_sendmsg_args) <= 16,
+	       "homa_sendmsg_args grew");
+#endif
+
+/**
+ * struct homa_recvmsg_args - Provides information needed by Homa's
+ * recvmsg; passed to recvmsg using the msg_control field.
+ */
+struct homa_recvmsg_args {
+	/**
+	 * @id: (in/out) Initially specifies the id of the desired RPC, or 0
+	 * if any RPC is OK; returns the actual id received.
+	 */
+	uint64_t id;
+
+	/**
+	 * @completion_cookie: (out) If the incoming message is a response,
+	 * this will return the completion cookie specified when the
+	 * request was sent. For requests this will always be zero.
+	 */
+	uint64_t completion_cookie;
+
+	/**
+	 * @flags: (in) OR-ed combination of bits that control the operation.
+	 * See below for values.
+	 */
+	int flags;
+
+	/**
+	 * @error_addr: the address of the peer is stored here when available.
+	 * This field is different from the msg_name field in struct msghdr
+	 * in that the msg_name field isn't set after errors. This field will
+	 * always be set when peer information is available, which includes
+	 * some error cases.
+	 */
+	union sockaddr_in_union peer_addr;
+
+	/**
+	 * @num_bpages: (in/out) Number of valid entries in @bpage_offsets.
+	 * Passes in bpages from previous messages that can now be
+	 * recycled; returns bpages from the new message.
+	 */
+	uint32_t num_bpages;
+
+	uint32_t _pad[1];
+
+	/**
+	 * @bpage_offsets: (in/out) Each entry is an offset into the buffer
+	 * region for the socket pool. When returned from recvmsg, the
+	 * offsets indicate where fragments of the new message are stored. All
+	 * entries but the last refer to full buffer pages (HOMA_BPAGE_SIZE bytes)
+	 * and are bpage-aligned. The last entry may refer to a bpage fragment and
+	 * is not necessarily aligned. The application now owns these bpages and
+	 * must eventually return them to Homa, using bpage_offsets in a future
+	 * recvmsg invocation.
+	 */
+	uint32_t bpage_offsets[HOMA_MAX_BPAGES];
+};
+
+#if !defined(__cplusplus)
+_Static_assert(sizeof(struct homa_recvmsg_args) >= 120,
+	       "homa_recvmsg_args shrunk");
+_Static_assert(sizeof(struct homa_recvmsg_args) <= 120,
+	       "homa_recvmsg_args grew");
+#endif
+
+/* Flag bits for homa_recvmsg_args.flags (see man page for documentation):
+ */
+#define HOMA_RECVMSG_REQUEST 0x01
+#define HOMA_RECVMSG_RESPONSE 0x02
+#define HOMA_RECVMSG_NONBLOCKING 0x04
+#define HOMA_RECVMSG_VALID_FLAGS 0x07
+
+/** define SO_HOMA_SET_BUF: setsockopt option for specifying buffer region. */
+#define SO_HOMA_SET_BUF 10
+
+/** struct homa_set_buf - setsockopt argument for SO_HOMA_SET_BUF. */
+struct homa_set_buf_args {
+	/** @start: First byte of buffer region. */
+	void *start;
+
+	/** @length: Total number of bytes available at @start. */
+	size_t length;
+};
+
+/**
+ * Meanings of the bits in Homa's flag word, which can be set using
+ * "sysctl /net/homa/flags".
+ */
+
+/**
+ * Disable the output throttling mechanism: always send all packets
+ * immediately.
+ */
+#define HOMA_FLAG_DONT_THROTTLE 2
+
+int homa_send(int sockfd, const void *message_buf,
+	      size_t length, const union sockaddr_in_union *dest_addr,
+	      uint64_t *id, uint64_t completion_cookie);
+int homa_sendv(int sockfd, const struct iovec *iov,
+	       int iovcnt, const union sockaddr_in_union *dest_addr,
+	       uint64_t *id, uint64_t completion_cookie);
+ssize_t homa_reply(int sockfd, const void *message_buf,
+		   size_t length, const union sockaddr_in_union *dest_addr,
+		   uint64_t id);
+ssize_t homa_replyv(int sockfd, const struct iovec *iov,
+		    int iovcnt, const union sockaddr_in_union *dest_addr,
+		    uint64_t id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _UAPI_LINUX_HOMA_H */
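The bpage constants in the header reduce to a rounding-up division, which the following user-space sketch works through. The `DEMO_` macros copy the values from the patch under different names, and `demo_bpages_needed()` is a hypothetical helper added only for illustration. With these values, (1000000 + 65536 - 1) >> 16 evaluates to 16, so `bpage_offsets` has 16 entries of 4 bytes each.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the HOMA_* macros in the patch, renamed for the demo. */
#define DEMO_MAX_MESSAGE_LENGTH 1000000
#define DEMO_BPAGE_SHIFT 16
#define DEMO_BPAGE_SIZE (1 << DEMO_BPAGE_SHIFT)
#define DEMO_MAX_BPAGES ((DEMO_MAX_MESSAGE_LENGTH + DEMO_BPAGE_SIZE - 1) \
		>> DEMO_BPAGE_SHIFT)

/* 1065535 >> 16 == 16: a maximum-length message needs 16 bpages. */
_Static_assert(DEMO_MAX_BPAGES == 16, "unexpected bpage count");

/* Hypothetical helper: number of bpages needed to hold a message of
 * the given length, rounding any partial page up to a whole one.
 */
static uint32_t demo_bpages_needed(uint32_t length)
{
	return (length + DEMO_BPAGE_SIZE - 1) >> DEMO_BPAGE_SHIFT;
}
```

Adding `DEMO_BPAGE_SIZE - 1` before the shift is what turns truncating division into a ceiling, so a 65537-byte message is counted as two bpages rather than one.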
Note: for man pages, see the Homa Wiki at:
https://homa-transport.atlassian.net/wiki/spaces/HOMA/overview

Signed-off-by: John Ousterhout <ouster@cs.stanford.edu>
---
 include/uapi/linux/homa.h | 199 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 199 insertions(+)
 create mode 100644 include/uapi/linux/homa.h