[dlm/next,00/12] dlm: net-namespace functionality

Message ID 20240819183742.2263895-1-aahringo@redhat.com

Alexander Aring Aug. 19, 2024, 6:37 p.m. UTC
Hi,

this patch series is huge, but it brings a lot of basic "fun" net-namespace
functionality to DLM. Currently you need a couple of Linux kernel
instances, running e.g. in virtual machines. With this patch series I
want to break out of this virtual machine world, where you deal with
multiple kernels and need to boot them all individually. Now you can use
DLM in only one Linux kernel instance, and each "node" (previously
represented by a virtual machine) is separated by a net-namespace. Why
net-namespaces? They fit the DLM design for now, and you need them
anyway because the internal DLM socket handling works on a per-node
basis. In addition, the DLM lockspaces (the lockspaces that are being
registered) are separated by net-namespace, as a net-namespace represents
a "network entity" (node). There might be reasons to introduce a
completely new kind of namespace (a locking namespace?), but I don't want
to take this step now and, as I said, net-namespaces are required anyway
for the DLM sockets.

You need some new user space tooling, because a new netlink-based,
net-namespace-aware UAPI is introduced (it can coexist with configfs,
which operates on init_net only). See [0] for more steps. There is a copr
repo for the new tooling, which can be enabled and installed with:

$ dnf copr enable aring/nldlm
$ dnf install nldlm

or compile it yourself.

Then there is currently a very simple script [1] that shows a 3-node
cluster using gfs2 on multiple loop block devices on top of one shared
loop block device image (sounds weird, but I do something like that).
There are currently some user space synchronization issues that I work
around with simple sleeps, but they are only user space problems.
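
To give a rough idea of what "one node per net-namespace" can look like on
the network side, here is a minimal sketch that only prints the iproute2
commands for wiring N net-namespaces to one bridge. It is not taken from
the script in [1]. The namespace names (node1..nodeN), the veth/eth0
interface names and the 192.168.100.0/24 subnet are my assumptions for
illustration; only the bridge name "dlmsw" is the one mentioned under
Limitations.

```shell
#!/bin/sh
# Hypothetical sketch: print (do not execute) the iproute2 commands that
# would connect N net-namespaces to a single bridge.
# Assumed names: node<i> namespaces, veth<i>/eth0 pairs, 192.168.100.0/24.
BRIDGE=dlmsw

emit_topology() {
    nodes=$1
    # One bridge in the current net-namespace acts as the "switch".
    echo "ip link add $BRIDGE type bridge"
    echo "ip link set $BRIDGE up"
    i=1
    while [ "$i" -le "$nodes" ]; do
        ns="node$i"
        # One net-namespace per DLM "node".
        echo "ip netns add $ns"
        # veth pair: one end stays here, the peer (eth0) lives in the namespace.
        echo "ip link add veth$i type veth peer name eth0 netns $ns"
        echo "ip link set veth$i master $BRIDGE up"
        echo "ip -n $ns addr add 192.168.100.$i/24 dev eth0"
        echo "ip -n $ns link set eth0 up"
        i=$((i + 1))
    done
}

emit_topology 3
```

Because the sketch only prints commands, it can be inspected safely first.
Applying them needs root, e.g. by piping the output into "sh". The DLM and
gfs2 setup itself (nldlm configuration, mounts) is left to the tooling in
[0] and the script in [1].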

To test it, I recommend using a virtual machine (but only one) and running
the script [1]. Afterwards, the net-namespace in which you executed it has
the 3 mountpoints /cluster/node1, /cluster/node2 and /cluster/node3. Any
vfs operation on one of those mountpoints acts as an operation of the
corresponding node.

We can use it for testing, development and also scale testing with a
large number of nodes joining a lockspace (which seems to be a problem
right now). Instead of running 1000 VMs, we can run 1000 net-namespaces
in a more resource-limited environment. To me it seems gfs2 can handle
several mounts and still separate the resources held in its global
variables. Its data structures, e.g. the glock hash, seem to have a
separation for that in their key (fsid?). However, this is still an
experimental feature, and we might run into issues that require more
separation related to net-namespaces. Basic testing, though, seems to run
just fine.

Limitations

I disable any functionality of the DLM character device that allows
plock handling or DLM locking from user space. Just don't use any plock
locking in gfs2 for now; basic vfs operations should work. You can even
sniff DLM traffic on the created "dlmsw" virtual bridge.
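
As a small sketch of the sniffing part: the helper below only builds a
tcpdump command line for the bridge. The helper name sniff_cmd and the
output file name are made up for illustration. The port 21064 is the
usual DLM default and is an assumption here; adjust DLM_PORT if your
cluster is configured differently.

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump command for capturing DLM traffic.
# Assumption: DLM listens on its usual default port 21064.
DLM_PORT=${DLM_PORT:-21064}

sniff_cmd() {
    iface=${1:-dlmsw}      # bridge created by the example script
    outfile=${2:-dlm.pcap} # capture file, the name is arbitrary
    echo "tcpdump -i $iface -n -w $outfile port $DLM_PORT"
}

sniff_cmd
```

Run the printed command as root in the net-namespace that owns the bridge,
and open the resulting capture file in your usual packet analyzer.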

- Alex

[0] https://gitlab.com/netcoder/nldlm
[1] https://gitlab.com/netcoder/gfs2ns-examples/-/blob/main/three_nodes

changes since PATCH:
 - add comments for lib/kobject.c
 - add Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
   for kobject patches
 - add more people, netdev ml in cc

Alexander Aring (12):
  dlm: introduce dlm_find_lockspace_name()
  dlm: disallow different configs nodeid storages
  dlm: add struct net to dlm_new_lockspace()
  dlm: handle port as __be16 network byte order
  dlm: use dlm_config as only cluster configuration
  dlm: dlm_config_info config fields to unsigned int
  dlm: rename config to configfs
  kobject: add kset_type_create_and_add() helper
  kobject: export generic helper ops
  dlm: separate dlm lockspaces per net-namespace
  dlm: add nldlm net-namespace aware UAPI
  gfs2: separate mount context by net-namespaces

 drivers/md/md-cluster.c |    3 +-
 fs/dlm/Makefile         |    2 +
 fs/dlm/config.c         | 1291 +++++++++++++++----------------------
 fs/dlm/config.h         |  215 +++++--
 fs/dlm/configfs.c       |  882 ++++++++++++++++++++++++++
 fs/dlm/configfs.h       |   19 +
 fs/dlm/debug_fs.c       |   24 +-
 fs/dlm/dir.c            |    4 +-
 fs/dlm/dlm_internal.h   |   24 +-
 fs/dlm/lock.c           |   64 +-
 fs/dlm/lock.h           |    3 +-
 fs/dlm/lockspace.c      |  220 ++++---
 fs/dlm/lockspace.h      |   12 +-
 fs/dlm/lowcomms.c       |  525 ++++++++--------
 fs/dlm/lowcomms.h       |   29 +-
 fs/dlm/main.c           |    5 -
 fs/dlm/member.c         |   36 +-
 fs/dlm/midcomms.c       |  287 ++++-----
 fs/dlm/midcomms.h       |   31 +-
 fs/dlm/nldlm.c          | 1330 +++++++++++++++++++++++++++++++++++++++
 fs/dlm/nldlm.h          |  176 ++++++
 fs/dlm/plock.c          |    2 +-
 fs/dlm/rcom.c           |   16 +-
 fs/dlm/rcom.h           |    3 +-
 fs/dlm/recover.c        |   17 +-
 fs/dlm/user.c           |   63 +-
 fs/dlm/user.h           |    2 +-
 fs/gfs2/glock.c         |    8 +
 fs/gfs2/incore.h        |    2 +
 fs/gfs2/lock_dlm.c      |    6 +-
 fs/gfs2/ops_fstype.c    |    5 +
 fs/gfs2/sys.c           |   27 +-
 fs/ocfs2/stack_user.c   |    2 +-
 include/linux/dlm.h     |    9 +-
 include/linux/kobject.h |   10 +-
 lib/kobject.c           |   65 +-
 36 files changed, 3955 insertions(+), 1464 deletions(-)
 create mode 100644 fs/dlm/configfs.c
 create mode 100644 fs/dlm/configfs.h
 create mode 100644 fs/dlm/nldlm.c
 create mode 100644 fs/dlm/nldlm.h