| Message ID | 20250308-p9_conn_err_benign_data_race-v1-1-729e57d5832b@iencinas.com |
|---|---|
| State | New |
| Series | 9p/trans_fd: mark concurrent read and writes to p9_conn->err |
Ignacio Encinas wrote on Sat, Mar 08, 2025 at 06:47:38PM +0100:
> Writes for the error value of a connection are spinlock-protected inside
> p9_conn_cancel, but lockless reads are present elsewhere to avoid
> performing unnecessary work after an error has been met.
>
> Mark the write and lockless reads to make KCSAN happy. Mark the write as
> exclusive following the recommendation in "Lock-Protected Writes with
> Lockless Reads" in tools/memory-model/Documentation/access-marking.txt
> while we are at it.

Thanks for looking into it, I wasn't aware this could be enough to please
the KCSAN gods.

Unfortunately neither of us has a repro, so it will be hard to test, but I
guess it can't hurt, so I will pick this up after a bit.

> Reported-by: syzbot+d69a7cc8c683c2cb7506@syzkaller.appspotmail.com
> Reported-by: syzbot+483d6c9b9231ea7e1851@syzkaller.appspotmail.com
> Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
> ---
> Hello! I noticed these syzbot reports that seem to repeat periodically
> and figured I should send a patch.
>
> The read-paths look very similar to the one changed here [1]. Perhaps it
> would make sense to make them the same?

I've just gone over the read/write work and I think overall the logic
doesn't look too bad, as the checks for m->err are just optimizations that
could be skipped entirely.

For example, even if read work misses the check and recvs some data,
p9_tag_lookup is what actually protects the "req": either cancel didn't
cancel yet and it'll get two status updates (but it's valid memory and the
refcounting is also correct), or the cancel was finished and read won't
find the request.
(I guess one could argue that two status updates could be a problem in the
p9_client_rpc path, but the data actually has been received and the mount
is busted anyway, so I don't think any bad bug would happen... Famous last
words, yes.)

Write will likewise just find itself with nothing to do, as the list had
been emptied (and p9_fd_request does check m->err under lock, so it can't
add new items).

So, sure, they could recheck, but I don't see the point; if syzbot is
happy with this patch I think that's good enough.

> [1] https://lore.kernel.org/all/ZTZtHdqifXlWG8nN@codewreck.org/
> ---
>  net/9p/trans_fd.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 196060dc6138af10e99ad04a76ee36a11f770c65..5458e6530084cabeb01d13e9b9a4b1b8f338e494 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -194,9 +194,10 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
>  	if (m->err) {

This is under the spin lock and I don't see the compiler reordering this
read and write, but should this also get READ_ONCE?

>  		spin_unlock(&m->req_lock);
>  		return;
>  	}
>
> -	m->err = err;
> +	WRITE_ONCE(m->err, err);
> +	ASSERT_EXCLUSIVE_WRITER(m->err);

Thanks,
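For readers unfamiliar with the pattern referenced above, a minimal kernel-style sketch of "Lock-Protected Writes with Lockless Reads" might look like the following. The struct and function names are hypothetical illustrations, not the actual net/9p/trans_fd.c code:

```c
#include <linux/spinlock.h>
#include <linux/kcsan-checks.h>

/* Hypothetical example; 'lock' must be set up with spin_lock_init(). */
struct demo_conn {
	spinlock_t lock;
	int err;
};

/* Writer: serialized by the lock, so it is safe to tell KCSAN that no
 * other writer may race with this store. */
static void demo_conn_cancel(struct demo_conn *c, int err)
{
	spin_lock(&c->lock);
	if (c->err) {			/* plain read is fine under the lock */
		spin_unlock(&c->lock);
		return;
	}
	WRITE_ONCE(c->err, err);	/* marked: paired with lockless readers */
	ASSERT_EXCLUSIVE_WRITER(c->err);
	spin_unlock(&c->lock);
}

/* Lockless reader: READ_ONCE() keeps the compiler from tearing,
 * refetching, or caching the load. */
static bool demo_conn_has_err(struct demo_conn *c)
{
	return READ_ONCE(c->err) < 0;
}
```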
On 8/3/25 23:08, Dominique Martinet wrote:
> Thanks for looking into it, I wasn't aware this could be enough to please
> the KCSAN gods.

Thank you for reviewing it!

> I've just gone over the read/write work and I think overall the logic
> doesn't look too bad, as the checks for m->err are just optimizations that
> could be skipped entirely.

That was my impression too. Thanks for confirming! As far as I know, this
is as non-problematic as it gets.

> So, sure, they could recheck, but I don't see the point; if syzbot is
> happy with this patch I think that's good enough.

I think KCSAN shouldn't complain anymore. However, let me send a v2:

>> [1] https://lore.kernel.org/all/ZTZtHdqifXlWG8nN@codewreck.org/

I last-minute edited this snippet because it looks like it should be like
the rest of them (just a READ_ONCE, no spinlock):

```diff
@@ -673,7 +674,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 
 	spin_lock(&m->req_lock);
 
-	if (m->err < 0) {
+	if (READ_ONCE(m->err) < 0) {
 		spin_unlock(&m->req_lock);
 		return m->err;
 	}
```

but as I left it, it doesn't make any sense. It's either a racy read plus
READ_ONCE to make KCSAN happy, or a protected read which shouldn't be a
problem. I'll just drop this hunk and leave it as it was.

>> ---
>>  net/9p/trans_fd.c | 11 ++++++-----
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
>> index 196060dc6138af10e99ad04a76ee36a11f770c65..5458e6530084cabeb01d13e9b9a4b1b8f338e494 100644
>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -194,9 +194,10 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
>>  	if (m->err) {
>
> This is under the spin lock and I don't see the compiler reordering this
> read and write, but should this also get READ_ONCE?

It wouldn't hurt, but I don't think it would do anything. spin_lock and
spin_unlock both emit compiler barriers, so code can't be moved out of
critical sections (apart from doing the actual locking, release-acquire
ordering, ...).

I guess the only function of a READ_ONCE here would be to ensure atomicity
of the read, but:

1) There are no concurrent writes when this read is happening, due to the
   spinlock being held.
2) Getting the load torn is almost impossible(?) as it is an aligned
   4-byte read. Even if the load returned garbage, we would just return
   without reading the actual value.

I'll wait a couple of days to send the v2 in case there is any more
feedback.

Thanks again!
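To make the distinction above concrete, here is a short sketch of the lock-protected read site, reusing the hypothetical demo_conn struct from the earlier example (again an illustration, not trans_fd.c code):

```c
/* Lock-protected read: spin_lock()/spin_unlock() imply compiler
 * barriers, so this load cannot be moved out of the critical section,
 * and the lock excludes the only writer. READ_ONCE() would add nothing
 * except (unneeded) protection against tearing of an aligned int load. */
static int demo_conn_err_locked(struct demo_conn *c)
{
	int err;

	spin_lock(&c->lock);
	err = c->err;		/* plain read, serialized with the writer */
	spin_unlock(&c->lock);
	return err;
}
```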
```diff
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138af10e99ad04a76ee36a11f770c65..5458e6530084cabeb01d13e9b9a4b1b8f338e494 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -196,7 +196,8 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 		return;
 	}
 
-	m->err = err;
+	WRITE_ONCE(m->err, err);
+	ASSERT_EXCLUSIVE_WRITER(m->err);
 
 	list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
 		list_move(&req->req_list, &cancel_list);
@@ -283,7 +284,7 @@ static void p9_read_work(struct work_struct *work)
 
 	m = container_of(work, struct p9_conn, rq);
 
-	if (m->err < 0)
+	if (READ_ONCE(m->err) < 0)
 		return;
 
 	p9_debug(P9_DEBUG_TRANS, "start mux %p pos %zd\n", m, m->rc.offset);
@@ -450,7 +451,7 @@ static void p9_write_work(struct work_struct *work)
 
 	m = container_of(work, struct p9_conn, wq);
 
-	if (m->err < 0) {
+	if (READ_ONCE(m->err) < 0) {
 		clear_bit(Wworksched, &m->wsched);
 		return;
 	}
@@ -622,7 +623,7 @@ static void p9_poll_mux(struct p9_conn *m)
 	__poll_t n;
 	int err = -ECONNRESET;
 
-	if (m->err < 0)
+	if (READ_ONCE(m->err) < 0)
 		return;
 
 	n = p9_fd_poll(m->client, NULL, &err);
@@ -673,7 +674,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 
 	spin_lock(&m->req_lock);
 
-	if (m->err < 0) {
+	if (READ_ONCE(m->err) < 0) {
 		spin_unlock(&m->req_lock);
 		return m->err;
 	}
```
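Given Ignacio's stated plan to drop the p9_fd_request() hunk above, the v2 presumably leaves that call site as it was. A sketch of the expected result (a reconstruction from the thread, not the actual v2 diff):

```c
	/* Expected v2 state of this site in p9_fd_request(): the read
	 * happens with m->req_lock held, so it is serialized against the
	 * writer in p9_conn_cancel() and stays unmarked. */
	spin_lock(&m->req_lock);

	if (m->err < 0) {
		spin_unlock(&m->req_lock);
		return m->err;
	}
```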
Writes for the error value of a connection are spinlock-protected inside
p9_conn_cancel, but lockless reads are present elsewhere to avoid
performing unnecessary work after an error has been met.

Mark the write and lockless reads to make KCSAN happy. Mark the write as
exclusive following the recommendation in "Lock-Protected Writes with
Lockless Reads" in tools/memory-model/Documentation/access-marking.txt
while we are at it.

Reported-by: syzbot+d69a7cc8c683c2cb7506@syzkaller.appspotmail.com
Reported-by: syzbot+483d6c9b9231ea7e1851@syzkaller.appspotmail.com
Signed-off-by: Ignacio Encinas <ignacio@iencinas.com>
---
Hello! I noticed these syzbot reports that seem to repeat periodically
and figured I should send a patch.

The read-paths look very similar to the one changed here [1]. Perhaps it
would make sense to make them the same?

[1] https://lore.kernel.org/all/ZTZtHdqifXlWG8nN@codewreck.org/
---
 net/9p/trans_fd.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
---
base-commit: 2a520073e74fbb956b5564818fc5529dcc7e9f0e
change-id: 20250308-p9_conn_err_benign_data_race-2758fe8bbed0

Best regards,