Message ID | 20220824094029.1634519-50-bmeng.cn@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tests/qtest: Enable running qtest on Windows | expand |
Hi On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng.cn@gmail.com> wrote: > From: Bin Meng <bin.meng@windriver.com> > > Random failure was observed when running qtests on Windows due to > "Broken pipe" detected by qmp_fd_receive(). What happened is that > the qtest executable sends testing data over a socket to the QEMU > under test but no response is received. The errno of the recv() > call from the qtest executable indicates ETIMEOUT, due to the qmp > chardev's tcp_chr_read() is never called to receive testing data > hence no response is sent to the other side. > > tcp_chr_read() is registered as the callback of the socket watch > GSource. The reason of the callback not being called by glib, is > that the source check fails to indicate the source is ready. There > are two socket watch sources created to monitor the same socket > event object from the char-socket backend in update_ioc_handlers(). > During the source check phase, qio_channel_socket_source_check() > calls WSAEnumNetworkEvents() to discovers occurrences of network > events for the indicated socket, clear internal network event records, > and reset the event object. Testing shows that if we don't reset the > event object by not passing the event handle to WSAEnumNetworkEvents() > the symptom goes away and qtest runs very stably. > > It looks we don't need to call WSAEnumNetworkEvents() at all, as we > don't parse the result of WSANETWORKEVENTS returned from this API. > We use select() to poll the socket status. Fix this instability by > dropping the WSAEnumNetworkEvents() call. > > Signed-off-by: Bin Meng <bin.meng@windriver.com> > What clears the event then? > --- > During the testing, I removed the following codes in update_ioc_handlers(): > > remove_hup_source(s); > s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); > g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, > chr, NULL); > g_source_attach(s->hup_source, chr->gcontext); > > and such change also makes the symptom go away. > > And if I moved the above codes to the beginning, before the call to > io_add_watch_poll(), the symptom also goes away. > > It seems two sources watching on the same socket event object is > the key that leads to the instability. The order of adding a source > watch seems to also play a role but I can't explain why. > Hopefully a Windows and glib expert could explain this behavior. > > Feel free to leave that comment in the commit message. This is strange, as both sources should have different events, clearing one shouldn't affect the other. I guess it's WSAEnumNetworkEvents clearing of the internal network event records that is problematic. Can you check if you replace the call with ResetEvent() everything works? > io/channel-watch.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/io/channel-watch.c b/io/channel-watch.c > index 89f3c8a88a..e34d86e810 100644 > --- a/io/channel-watch.c > +++ b/io/channel-watch.c > @@ -115,17 +115,13 @@ static gboolean > qio_channel_socket_source_check(GSource *source) > { > static struct timeval tv0; > - > QIOChannelSocketSource *ssource = (QIOChannelSocketSource *)source; > - WSANETWORKEVENTS ev; > fd_set rfds, wfds, xfds; > > if (!ssource->condition) { > return 0; > } > > - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > - > FD_ZERO(&rfds); > FD_ZERO(&wfds); > FD_ZERO(&xfds); > Unrelated, after this chunk, there is FD_SET((SOCKET)ssource->socket, &rfds); it seems we can drop the cast. Feel free to send another patch.
On Thu, Sep 1, 2022 at 8:58 PM Marc-André Lureau <marcandre.lureau@gmail.com> wrote: > > Hi > > On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng.cn@gmail.com> wrote: >> >> From: Bin Meng <bin.meng@windriver.com> >> >> Random failure was observed when running qtests on Windows due to >> "Broken pipe" detected by qmp_fd_receive(). What happened is that >> the qtest executable sends testing data over a socket to the QEMU >> under test but no response is received. The errno of the recv() >> call from the qtest executable indicates ETIMEOUT, due to the qmp >> chardev's tcp_chr_read() is never called to receive testing data >> hence no response is sent to the other side. >> >> tcp_chr_read() is registered as the callback of the socket watch >> GSource. The reason of the callback not being called by glib, is >> that the source check fails to indicate the source is ready. There >> are two socket watch sources created to monitor the same socket >> event object from the char-socket backend in update_ioc_handlers(). >> >> During the source check phase, qio_channel_socket_source_check() >> calls WSAEnumNetworkEvents() to discovers occurrences of network >> events for the indicated socket, clear internal network event records, >> and reset the event object. Testing shows that if we don't reset the >> event object by not passing the event handle to WSAEnumNetworkEvents() >> the symptom goes away and qtest runs very stably. >> >> It looks we don't need to call WSAEnumNetworkEvents() at all, as we >> don't parse the result of WSANETWORKEVENTS returned from this API. >> We use select() to poll the socket status. Fix this instability by >> dropping the WSAEnumNetworkEvents() call. >> >> Signed-off-by: Bin Meng <bin.meng@windriver.com> > > > What clears the event then? > It seems we don't need to clear the event as everything still works as expected. >> >> --- >> During the testing, I removed the following codes in update_ioc_handlers(): >> >> remove_hup_source(s); >> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); >> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, >> chr, NULL); >> g_source_attach(s->hup_source, chr->gcontext); >> >> and such change also makes the symptom go away. >> >> And if I moved the above codes to the beginning, before the call to >> io_add_watch_poll(), the symptom also goes away. >> >> It seems two sources watching on the same socket event object is >> the key that leads to the instability. The order of adding a source >> watch seems to also play a role but I can't explain why. >> Hopefully a Windows and glib expert could explain this behavior. >> > > Feel free to leave that comment in the commit message. Sure, will add the above message into the commit in v2. > > This is strange, as both sources should have different events, clearing one shouldn't affect the other. Both sources have the same event, as one QIO channel only has one socket, and one socket can only be bound to one event. > > I guess it's WSAEnumNetworkEvents clearing of the internal network event records that is problematic. > > Can you check if you replace the call with ResetEvent() everything works? No, ResetEvent() does not work, and is not recommended by MSDN [1] too, which says: The proper way to reset the state of an event object used with the WSAEventSelect function is to pass the handle of the event object to the WSAEnumNetworkEvents function in the hEventObject parameter. This will reset the event object and adjust the status of active FD events on the socket in an atomic fashion. [1] https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect > > >> >> io/channel-watch.c | 4 ---- >> 1 file changed, 4 deletions(-) >> >> diff --git a/io/channel-watch.c b/io/channel-watch.c >> index 89f3c8a88a..e34d86e810 100644 >> --- a/io/channel-watch.c >> +++ b/io/channel-watch.c >> @@ -115,17 +115,13 @@ static gboolean >> qio_channel_socket_source_check(GSource *source) >> { >> static struct timeval tv0; >> - >> QIOChannelSocketSource *ssource = (QIOChannelSocketSource *)source; >> - WSANETWORKEVENTS ev; >> fd_set rfds, wfds, xfds; >> >> if (!ssource->condition) { >> return 0; >> } >> >> - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); >> - >> FD_ZERO(&rfds); >> FD_ZERO(&wfds); >> FD_ZERO(&xfds); > > > Unrelated, after this chunk, there is > FD_SET((SOCKET)ssource->socket, &rfds); > > it seems we can drop the cast. Feel free to send another patch. > Yeah, this cast is unnecessary. Will spin a new patch in v2. Thanks! Regards, Bin
Hi On Sun, Sep 4, 2022 at 10:24 AM Bin Meng <bmeng.cn@gmail.com> wrote: > On Thu, Sep 1, 2022 at 8:58 PM Marc-André Lureau > <marcandre.lureau@gmail.com> wrote: > > > > Hi > > > > On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng.cn@gmail.com> wrote: > >> > >> From: Bin Meng <bin.meng@windriver.com> > >> > >> Random failure was observed when running qtests on Windows due to > >> "Broken pipe" detected by qmp_fd_receive(). What happened is that > >> the qtest executable sends testing data over a socket to the QEMU > >> under test but no response is received. The errno of the recv() > >> call from the qtest executable indicates ETIMEOUT, due to the qmp > >> chardev's tcp_chr_read() is never called to receive testing data > >> hence no response is sent to the other side. > >> > >> tcp_chr_read() is registered as the callback of the socket watch > >> GSource. The reason of the callback not being called by glib, is > >> that the source check fails to indicate the source is ready. There > >> are two socket watch sources created to monitor the same socket > >> event object from the char-socket backend in update_ioc_handlers(). > >> > >> During the source check phase, qio_channel_socket_source_check() > >> calls WSAEnumNetworkEvents() to discovers occurrences of network > >> events for the indicated socket, clear internal network event records, > >> and reset the event object. Testing shows that if we don't reset the > >> event object by not passing the event handle to WSAEnumNetworkEvents() > >> the symptom goes away and qtest runs very stably. > >> > >> It looks we don't need to call WSAEnumNetworkEvents() at all, as we > >> don't parse the result of WSANETWORKEVENTS returned from this API. > >> We use select() to poll the socket status. Fix this instability by > >> dropping the WSAEnumNetworkEvents() call. > >> > >> Signed-off-by: Bin Meng <bin.meng@windriver.com> > > > > > > What clears the event then? > > > > It seems we don't need to clear the event as everything still works as > expected. > Well, it can "work" but are you sure it doesn't have a busy loop? >> > >> --- > >> During the testing, I removed the following codes in > update_ioc_handlers(): > >> > >> remove_hup_source(s); > >> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); > >> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, > >> chr, NULL); > >> g_source_attach(s->hup_source, chr->gcontext); > >> > >> and such change also makes the symptom go away. > >> > >> And if I moved the above codes to the beginning, before the call to > >> io_add_watch_poll(), the symptom also goes away. > >> > >> It seems two sources watching on the same socket event object is > >> the key that leads to the instability. The order of adding a source > >> watch seems to also play a role but I can't explain why. > >> Hopefully a Windows and glib expert could explain this behavior. > >> > > > > Feel free to leave that comment in the commit message. > > Sure, will add the above message into the commit in v2. > > > > > This is strange, as both sources should have different events, clearing > one shouldn't affect the other. > > Both sources have the same event, as one QIO channel only has one > socket, and one socket can only be bound to one event. > "one socket can only be bound to one event", is that a WSAEventSelect limitation? > > > > I guess it's WSAEnumNetworkEvents clearing of the internal network event > records that is problematic. > > > > Can you check if you replace the call with ResetEvent() everything works? > > No, ResetEvent() does not work, and is not recommended by MSDN [1] > too, which says: > It probably works to some extent (I see glib is using it https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/giowin32.c#L903), What you mean is that it doesn't solve the issue, I guess. > > The proper way to reset the state of an event object used with the > WSAEventSelect function is to pass the handle of the event object to > the WSAEnumNetworkEvents function in the hEventObject parameter. This > will reset the event object and adjust the status of active FD events > on the socket in an atomic fashion. > > This is not what you want though if you have multiple event objects for the same socket. > [1] > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect > > > > > > >> > >> io/channel-watch.c | 4 ---- > >> 1 file changed, 4 deletions(-) > >> > >> diff --git a/io/channel-watch.c b/io/channel-watch.c > >> index 89f3c8a88a..e34d86e810 100644 > >> --- a/io/channel-watch.c > >> +++ b/io/channel-watch.c > >> @@ -115,17 +115,13 @@ static gboolean > >> qio_channel_socket_source_check(GSource *source) > >> { > >> static struct timeval tv0; > >> - > >> QIOChannelSocketSource *ssource = (QIOChannelSocketSource *)source; > >> - WSANETWORKEVENTS ev; > >> fd_set rfds, wfds, xfds; > >> > >> if (!ssource->condition) { > >> return 0; > >> } > >> > >> - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > >> - > >> FD_ZERO(&rfds); > >> FD_ZERO(&wfds); > >> FD_ZERO(&xfds); > > > > > > Unrelated, after this chunk, there is > > FD_SET((SOCKET)ssource->socket, &rfds); > > > > it seems we can drop the cast. Feel free to send another patch. > > > > Yeah, this cast is unnecessary. Will spin a new patch in v2. Thanks! > > Regards, > Bin >
On Mon, Sep 5, 2022 at 2:04 PM Marc-André Lureau <marcandre.lureau@gmail.com> wrote: > > Hi > > On Sun, Sep 4, 2022 at 10:24 AM Bin Meng <bmeng.cn@gmail.com> wrote: >> >> On Thu, Sep 1, 2022 at 8:58 PM Marc-André Lureau >> <marcandre.lureau@gmail.com> wrote: >> > >> > Hi >> > >> > On Wed, Aug 24, 2022 at 2:49 PM Bin Meng <bmeng.cn@gmail.com> wrote: >> >> >> >> From: Bin Meng <bin.meng@windriver.com> >> >> >> >> Random failure was observed when running qtests on Windows due to >> >> "Broken pipe" detected by qmp_fd_receive(). What happened is that >> >> the qtest executable sends testing data over a socket to the QEMU >> >> under test but no response is received. The errno of the recv() >> >> call from the qtest executable indicates ETIMEOUT, due to the qmp >> >> chardev's tcp_chr_read() is never called to receive testing data >> >> hence no response is sent to the other side. >> >> >> >> tcp_chr_read() is registered as the callback of the socket watch >> >> GSource. The reason of the callback not being called by glib, is >> >> that the source check fails to indicate the source is ready. There >> >> are two socket watch sources created to monitor the same socket >> >> event object from the char-socket backend in update_ioc_handlers(). >> >> >> >> During the source check phase, qio_channel_socket_source_check() >> >> calls WSAEnumNetworkEvents() to discovers occurrences of network >> >> events for the indicated socket, clear internal network event records, >> >> and reset the event object. Testing shows that if we don't reset the >> >> event object by not passing the event handle to WSAEnumNetworkEvents() >> >> the symptom goes away and qtest runs very stably. >> >> >> >> It looks we don't need to call WSAEnumNetworkEvents() at all, as we >> >> don't parse the result of WSANETWORKEVENTS returned from this API. >> >> We use select() to poll the socket status. Fix this instability by >> >> dropping the WSAEnumNetworkEvents() call. >> >> >> >> Signed-off-by: Bin Meng <bin.meng@windriver.com> >> > >> > >> > What clears the event then? >> > >> >> It seems we don't need to clear the event as everything still works as expected. > > > Well, it can "work" but are you sure it doesn't have a busy loop? You mean busy loop in g_poll()? >> >> >> >> --- >> >> During the testing, I removed the following codes in update_ioc_handlers(): >> >> >> >> remove_hup_source(s); >> >> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); >> >> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, >> >> chr, NULL); >> >> g_source_attach(s->hup_source, chr->gcontext); >> >> >> >> and such change also makes the symptom go away. >> >> >> >> And if I moved the above codes to the beginning, before the call to >> >> io_add_watch_poll(), the symptom also goes away. >> >> >> >> It seems two sources watching on the same socket event object is >> >> the key that leads to the instability. The order of adding a source >> >> watch seems to also play a role but I can't explain why. >> >> Hopefully a Windows and glib expert could explain this behavior. >> >> >> > >> > Feel free to leave that comment in the commit message. >> >> Sure, will add the above message into the commit in v2. >> >> > >> > This is strange, as both sources should have different events, clearing one shouldn't affect the other. >> >> Both sources have the same event, as one QIO channel only has one >> socket, and one socket can only be bound to one event. > > > "one socket can only be bound to one event", is that a WSAEventSelect limitation? > Yes, please see the MSDN: It is not possible to specify different event objects for different network events. The following code will not work; the second call will cancel the effects of the first, and only the FD_WRITE network event will be associated with hEventObject2: rc = WSAEventSelect(s, hEventObject1, FD_READ); rc = WSAEventSelect(s, hEventObject2, FD_WRITE); //bad >> >> > >> > I guess it's WSAEnumNetworkEvents clearing of the internal network event records that is problematic. >> > >> > Can you check if you replace the call with ResetEvent() everything works? >> >> No, ResetEvent() does not work, and is not recommended by MSDN [1] >> too, which says: > > > It probably works to some extent (I see glib is using it https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/giowin32.c#L903), What you mean is that it doesn't solve the issue, I guess. Correct, it does not solve the issue. >> >> >> The proper way to reset the state of an event object used with the >> WSAEventSelect function is to pass the handle of the event object to >> the WSAEnumNetworkEvents function in the hEventObject parameter. This >> will reset the event object and adjust the status of active FD events >> on the socket in an atomic fashion. >> > > This is not what you want though if you have multiple event objects for the same socket. > >> >> [1] https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect Regards, Bin
Hi all, I did reach the same issue while trying to connect a gdb to qemu on Windows hosts. Some inputs send by gdb aren't getting correctly pulled, they will be retrieved only once g_poll times out. As you explained earlier, the issue, AFAICT, is that WSAEnumNetworkEvents will reset the internal events before select can detect them. Sadly, I didn't find any way to adjust the current code using select to avoid that. As select and any cleaner (ever WSAEnumNetworkEvents or WSEResetEvent) cannot be called in an atomic way, it seems that events can always be received between the reset and the select. At least, all my attempts failed. The only solution I've found is to move completely to the Windows API and stop calling select. This, however, needs some tricks. In Particular, we need to "remove" the support of GSources having only G_IO_HUP. This is already kind of done as we currently don't detect them in qio_channel_socket_source_check anyway. This is mandatory because of the two GSources created on the same socket. IIRC, the first one (I'll call it the master GSource) is created during the initialization of the channel-socket by update_ioc_handlers and will have only G_IO_HUP to catch up. The second one is created during the "prepare" phase of the first one, in io_watch_poll_prepare. This one will have all the events we want to pull (G_IO_IN here). As they are referring to the same socket, the Windows API cannot know on which GSources we want to catch which events. The solution is then to avoid WSAEnumNetworkEvents for the master GSource which only has G_IO_HUP (or for any GSource having only that). As I said above, the current code doesn't do anything with it anyway. So, IMO, it's safe to do so. I'll send you my patch attached. I was planning to send it in the following weeks anyway. I was just waiting to be sure everything looks fine on our CI. Feel free to test and modify it if needed. PS: I don't know if it will correctly extend if I simply attach it to my mail. If not, tell me I'll simply copy-paste it, even if it might mess up the space/tab stuff. > >> >> > >> >> --- > >> >> During the testing, I removed the following codes in update_ioc_handlers(): > >> >> > >> >> remove_hup_source(s); > >> >> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); > >> >> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, > >> >> chr, NULL); > >> >> g_source_attach(s->hup_source, chr->gcontext); > >> >> > >> >> and such change also makes the symptom go away. > >> >> > >> >> And if I moved the above codes to the beginning, before the call to > >> >> io_add_watch_poll(), the symptom also goes away. > >> >> > >> >> It seems two sources watching on the same socket event object is > >> >> the key that leads to the instability. The order of adding a source > >> >> watch seems to also play a role but I can't explain why. > >> >> Hopefully a Windows and glib expert could explain this behavior. > >> >> > >> > > >> > Feel free to leave that comment in the commit message. > >> > >> Sure, will add the above message into the commit in v2. > >> > >> > > >> > This is strange, as both sources should have different events, clearing one shouldn't affect the other. > >> > >> Both sources have the same event, as one QIO channel only has one > >> socket, and one socket can only be bound to one event. > > > > > > "one socket can only be bound to one event", is that a WSAEventSelect limitation? > > > > Yes, please see the MSDN: > > It is not possible to specify different event objects for different > network events. The following code will not work; the second call will > cancel the effects of the first, and only the FD_WRITE network event > will be associated with hEventObject2: > > rc = WSAEventSelect(s, hEventObject1, FD_READ); > rc = WSAEventSelect(s, hEventObject2, FD_WRITE); //bad Yes, the Windows API is handled at socket levels. That's why having two GSources on the same sockets is problematic. Note that maybe there is a mix to be done between your patch with the update_ioc_handlers part to be removed and my patch which improves the polling of channel-watch. Thanks, Clément
Hi Clément, On Mon, Sep 5, 2022 at 4:10 PM Clément Chigot <chigot@adacore.com> wrote: > > Hi all, > > I did reach the same issue while trying to connect a gdb to qemu on > Windows hosts. Some inputs send by gdb aren't getting correctly pulled, > they will be retrieved only once g_poll times out. > > As you explained earlier, the issue, AFAICT, is that WSAEnumNetworkEvents > will reset the internal events before select can detect them. No, I don't think WSAEnumNetworkEvents and select cannot be used together. MSDN says: https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select "The select function has no effect on the persistence of socket events registered with WSAAsyncSelect or WSAEventSelect." That sounds to me like current usage in QEMU codes does not have a problem. > Sadly, I didn't find any way to adjust the current code using select to > avoid that. As select and any cleaner (ever WSAEnumNetworkEvents or > WSEResetEvent) cannot be called in an atomic way, it seems that events > can always be received between the reset and the select. At least, all > my attempts failed. According to MSDN: https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect "Having successfully recorded the occurrence of the network event (by setting the corresponding bit in the internal network event record) and signaled the associated event object, no further actions are taken for that network event until the application makes the function call that implicitly reenables the setting of that network event and signaling of the associated event object." So events will be kept unsignaled after they are signaled, until the reenable routine is called. For example, recv() for the FD_READ event. > > The only solution I've found is to move completely to the Windows API > and stop calling select. This, however, needs some tricks. In Particular, we > need to "remove" the support of GSources having only G_IO_HUP. > This is already kind of done as we currently don't detect them in > qio_channel_socket_source_check anyway. > This is mandatory because of the two GSources created on the same socket. > IIRC, the first one (I'll call it the master GSource) is created during > the initialization of the channel-socket by update_ioc_handlers and will > have only G_IO_HUP to catch up. > The second one is created during the "prepare" phase of the first one, > in io_watch_poll_prepare. This one will have all the events we want > to pull (G_IO_IN here). > As they are referring to the same socket, the Windows API cannot know > on which GSources we want to catch which events. The solution is then I think Windows knows which events to catch. As WSAEventSelect() in channel-watch.c::qio_channel_create_socket_watch() tells Windows which event it should monitor. Both the "master" and "child" GSources are watching on the same socket with G_IO_IN condition (see qio_channel_create_socket_watch), and GLib is smart enough to merge these two resources into one in the event handle array which is passed to WaitForMultipleObjectsEx() in g_poll(). I checked your patch, what you did seems to be something one would naturally write, but what is currently in the QEMU sources seems to be written intentionally. +Paolo Bonzini , you are the one who implemented the socket watch on Windows. Could you please help analyze this issue? > to avoid WSAEnumNetworkEvents for the master GSource which only has > G_IO_HUP (or for any GSource having only that). > As I said above, the current code doesn't do anything with it anyway. > So, IMO, it's safe to do so. > > I'll send you my patch attached. I was planning to send it in the following > weeks anyway. I was just waiting to be sure everything looks fine on our > CI. Feel free to test and modify it if needed. I tested your patch. Unfortunately there is still one test case (migration-test.exe) throwing up the "Broken pipe" message. Can you test my patch instead to see if your gdb issue can be fixed? > > PS: I don't know if it will correctly extend if I simply attach it to > my mail. If not, tell me I'll simply copy-paste it, even if it might > mess up the space/tab stuff. > > > >> >> > > >> >> --- > > >> >> During the testing, I removed the following codes in update_ioc_handlers(): > > >> >> > > >> >> remove_hup_source(s); > > >> >> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); > > >> >> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, > > >> >> chr, NULL); > > >> >> g_source_attach(s->hup_source, chr->gcontext); > > >> >> > > >> >> and such change also makes the symptom go away. > > >> >> > > >> >> And if I moved the above codes to the beginning, before the call to > > >> >> io_add_watch_poll(), the symptom also goes away. > > >> >> > > >> >> It seems two sources watching on the same socket event object is > > >> >> the key that leads to the instability. The order of adding a source > > >> >> watch seems to also play a role but I can't explain why. > > >> >> Hopefully a Windows and glib expert could explain this behavior. > > >> >> > > >> > > > >> > Feel free to leave that comment in the commit message. > > >> > > >> Sure, will add the above message into the commit in v2. > > >> > > >> > > > >> > This is strange, as both sources should have different events, clearing one shouldn't affect the other. > > >> > > >> Both sources have the same event, as one QIO channel only has one > > >> socket, and one socket can only be bound to one event. > > > > > > > > > "one socket can only be bound to one event", is that a WSAEventSelect limitation? > > > > > > > Yes, please see the MSDN: > > > > It is not possible to specify different event objects for different > > network events. The following code will not work; the second call will > > cancel the effects of the first, and only the FD_WRITE network event > > will be associated with hEventObject2: > > > > rc = WSAEventSelect(s, hEventObject1, FD_READ); > > rc = WSAEventSelect(s, hEventObject2, FD_WRITE); //bad > > Yes, the Windows API is handled at socket levels. That's why having > two GSources on the same sockets is problematic. > Note that maybe there is a mix to be done between your patch with > the update_ioc_handlers part to be removed and my patch which improves > the polling of channel-watch. Regards, Bin
Hi Bin, > On Mon, Sep 5, 2022 at 4:10 PM Clément Chigot <chigot@adacore.com> wrote: > > > > Hi all, > > > > I did reach the same issue while trying to connect a gdb to qemu on > > Windows hosts. Some inputs send by gdb aren't getting correctly pulled, > > they will be retrieved only once g_poll times out. > > > > As you explained earlier, the issue, AFAICT, is that WSAEnumNetworkEvents > > will reset the internal events before select can detect them. > > No, I don't think WSAEnumNetworkEvents and select cannot be used together. > > MSDN says: > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select > > "The select function has no effect on the persistence of socket events > registered with WSAAsyncSelect or WSAEventSelect." > > That sounds to me like current usage in QEMU codes does not have a problem. Yes, but WSAEnumNetworkEvents has effects. Thus, it might reset the events before select detects them no ? Moreover, I'm not sure what they mean by "persistence of WSAEventSelect". Does that mean that select doesn't change the events wanted to be notified as set by WSAEventSelect or that select isn't resetting the events once retrieved, as you imply ? Moreover, the current code is creating the event object with CreateEvent(NULL, FALSE, FALSE, NULL). The first FALSE implies that we want an auto-reset. The MSDN doc says: "Boolean that specifies whether a manual-reset or auto-reset event object is created. If TRUE, then you must use the ResetEvent function to manually reset the state to nonsignaled. If FALSE, the system automatically resets the state to nonsignaled after a single waiting thread has been released." However, I'm not sure if I understand correctly what "after a single waiting thread has been released." signified. Is the reset occuring after recv() or after select or WSAEnumNetworkEvents calls ? AFAIR, I've tried to move to manual-reset with the current qemu but it doesn't change anything. > > Sadly, I didn't find any way to adjust the current code using select to > > avoid that. As select and any cleaner (ever WSAEnumNetworkEvents or > > WSEResetEvent) cannot be called in an atomic way, it seems that events > > can always be received between the reset and the select. At least, all > > my attempts failed. > > According to MSDN: > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect > > "Having successfully recorded the occurrence of the network event (by > setting the corresponding bit in the internal network event record) > and signaled the associated event object, no further actions are taken > for that network event until the application makes the function call > that implicitly reenables the setting of that network event and > signaling of the associated event object." > > So events will be kept unsignaled after they are signaled, until the > reenable routine is called. For example, recv() for the FD_READ event. In my understanding, WSAEnumNetworkEvents states the opposite: https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaenumnetworkevents "The Windows Sockets provider guarantees that the operations of copying the network event record, clearing it and resetting any associated event object are atomic, such that the next occurrence of a nominated network event will cause the event object to become set." > > The only solution I've found is to move completely to the Windows API > > and stop calling select. This, however, needs some tricks. In Particular, we > > need to "remove" the support of GSources having only G_IO_HUP. > > This is already kind of done as we currently don't detect them in > > qio_channel_socket_source_check anyway. > > This is mandatory because of the two GSources created on the same socket. > > IIRC, the first one (I'll call it the master GSource) is created during > > the initialization of the channel-socket by update_ioc_handlers and will > > have only G_IO_HUP to catch up. > > The second one is created during the "prepare" phase of the first one, > > in io_watch_poll_prepare. This one will have all the events we want > > to pull (G_IO_IN here). > > As they are referring to the same socket, the Windows API cannot know > > on which GSources we want to catch which events. The solution is then > > I think Windows knows which events to catch. As WSAEventSelect() in > channel-watch.c::qio_channel_create_socket_watch() tells Windows which > event it should monitor. > > Both the "master" and "child" GSources are watching on the same socket > with G_IO_IN condition (see qio_channel_create_socket_watch), and GLib > is smart enough to merge these two resources into one in the event > handle array which is passed to WaitForMultipleObjectsEx() in > g_poll(). I've forgotten to mention it. But the current code only fails with newer glib versions. I wasn't able to test all of them but it's working with 2.54 and starts failing with versions after 2.60. The qemu master isn't supporting 2.54 anymore. Thus I've tested that with our internal qemu-6 which runs with 2.54. When upgrading glib, it starts behaving like our issue. Sadly, I wasn't able to test with glib-2.56/2.58 nor to find anything relevant in glib code which would explain our issue. (But I wasn't aware of the two GSources at that time so maybe it's worth investigating again). > I checked your patch, what you did seems to be something one would > naturally write, but what is currently in the QEMU sources seems to be > written intentionally. > > +Paolo Bonzini , you are the one who implemented the socket watch on > Windows. Could you please help analyze this issue? > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > G_IO_HUP (or for any GSource having only that). > > As I said above, the current code doesn't do anything with it anyway. > > So, IMO, it's safe to do so. > > > > I'll send you my patch attached. I was planning to send it in the following > > weeks anyway. I was just waiting to be sure everything looks fine on our > > CI. Feel free to test and modify it if needed. > > I tested your patch. Unfortunately there is still one test case > (migration-test.exe) throwing up the "Broken pipe" message. I must say I didn't fully test it against qemu testsuite yet. Maybe there are some refinements to be done. "Broken pipe" might be linked to the missing G_IO_HUP support. > Can you test my patch instead to see if your gdb issue can be fixed? Yeah sure. I'll try to do it this afternoon.
Hi Clément, On Tue, Sep 6, 2022 at 3:41 PM Clément Chigot <chigot@adacore.com> wrote: > > Hi Bin, > > > On Mon, Sep 5, 2022 at 4:10 PM Clément Chigot <chigot@adacore.com> wrote: > > > > > > Hi all, > > > > > > I did reach the same issue while trying to connect a gdb to qemu on > > > Windows hosts. Some inputs send by gdb aren't getting correctly pulled, > > > they will be retrieved only once g_poll times out. > > > > > > As you explained earlier, the issue, AFAICT, is that WSAEnumNetworkEvents > > > will reset the internal events before select can detect them. > > > > No, I don't think WSAEnumNetworkEvents and select cannot be used together. > > > > MSDN says: > > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select > > > > "The select function has no effect on the persistence of socket events > > registered with WSAAsyncSelect or WSAEventSelect." > > > > That sounds to me like current usage in QEMU codes does not have a problem. > > Yes, but WSAEnumNetworkEvents has effects. Thus, it might reset the > events before > select detects them no ? Even if the event *handle* is reset, select can still detect the network event has happened. I think the above MSDN statement is trying to explain this. > Moreover, I'm not sure what they mean by "persistence of WSAEventSelect". > Does that mean that select doesn't change the events wanted to be > notified as set > by WSAEventSelect or that select isn't resetting the events once > retrieved, as you > imply ? My understanding is that select just determines the socket status neither from the event handle associated with the socket, nor does it change anything on the event handle. > Moreover, the current code is creating the event object with > CreateEvent(NULL, FALSE, FALSE, NULL). The first FALSE implies that we want > an auto-reset. The MSDN doc says: > > "Boolean that specifies whether a manual-reset or auto-reset event > object is created. > If TRUE, then you must use the ResetEvent function to manually reset > the state to > nonsignaled. If FALSE, the system automatically resets the state to nonsignaled > after a single waiting thread has been released." Yes, I think this is a bug however when I changed to CreateEvent(NULL, TRUE, FALSE, NULL) or WSACreateEvent() there are still "Broken pipe" errors ... > However, I'm not sure if I understand correctly what "after a single > waiting thread > has been released." signified. Is the reset occuring after recv() or > after select or > WSAEnumNetworkEvents calls ? AFAIR, I've tried to move to manual-reset with > the current qemu but it doesn't change anything. The same here :( > > > > Sadly, I didn't find any way to adjust the current code using select to > > > avoid that. As select and any cleaner (ever WSAEnumNetworkEvents or > > > WSEResetEvent) cannot be called in an atomic way, it seems that events > > > can always be received between the reset and the select. At least, all > > > my attempts failed. > > > > According to MSDN: > > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaeventselect > > > > "Having successfully recorded the occurrence of the network event (by > > setting the corresponding bit in the internal network event record) > > and signaled the associated event object, no further actions are taken > > for that network event until the application makes the function call > > that implicitly reenables the setting of that network event and > > signaling of the associated event object." > > > > So events will be kept unsignaled after they are signaled, until the > > reenable routine is called. For example, recv() for the FD_READ event. > > In my understanding, WSAEnumNetworkEvents states the opposite: > https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsaenumnetworkevents > > "The Windows Sockets provider guarantees that the operations of copying > the network event record, clearing it and resetting any associated event > object are atomic, such that the next occurrence of a nominated network > event will cause the event object to become set." My interpretation of this does not conflict with the WSAEventSelect(). I think they are just two things. > > > > The only solution I've found is to move completely to the Windows API > > > and stop calling select. This, however, needs some tricks. In Particular, we > > > need to "remove" the support of GSources having only G_IO_HUP. > > > This is already kind of done as we currently don't detect them in > > > qio_channel_socket_source_check anyway. > > > This is mandatory because of the two GSources created on the same socket. > > > IIRC, the first one (I'll call it the master GSource) is created during > > > the initialization of the channel-socket by update_ioc_handlers and will > > > have only G_IO_HUP to catch up. > > > The second one is created during the "prepare" phase of the first one, > > > in io_watch_poll_prepare. This one will have all the events we want > > > to pull (G_IO_IN here). > > > As they are referring to the same socket, the Windows API cannot know > > > on which GSources we want to catch which events. The solution is then > > > > I think Windows knows which events to catch. As WSAEventSelect() in > > channel-watch.c::qio_channel_create_socket_watch() tells Windows which > > event it should monitor. > > > > Both the "master" and "child" GSources are watching on the same socket > > with G_IO_IN condition (see qio_channel_create_socket_watch), and GLib > > is smart enough to merge these two resources into one in the event > > handle array which is passed to WaitForMultipleObjectsEx() in > > g_poll(). > > I've forgotten to mention it. But the current code only fails with newer glib > versions. I wasn't able to test all of them but it's working with 2.54 and > starts failing with versions after 2.60. > The qemu master isn't supporting 2.54 anymore. Thus I've tested that with > our internal qemu-6 which runs with 2.54. When upgrading glib, it starts > behaving like our issue. > Sadly, I wasn't able to test with glib-2.56/2.58 nor to find anything relevant > in glib code which would explain our issue. (But I wasn't aware of the > two GSources at that time so maybe it's worth investigating again). > Ah, good to know glib version matters. > > I checked your patch, what you did seems to be something one would > > naturally write, but what is currently in the QEMU sources seems to be > > written intentionally. > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > Windows. Could you please help analyze this issue? > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > G_IO_HUP (or for any GSource having only that). > > > As I said above, the current code doesn't do anything with it anyway. > > > So, IMO, it's safe to do so. > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > CI. Feel free to test and modify it if needed. > > > > I tested your patch. Unfortunately there is still one test case > > (migration-test.exe) throwing up the "Broken pipe" message. > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > some refinements to be done. "Broken pipe" might be linked to the missing > G_IO_HUP support. > > > Can you test my patch instead to see if your gdb issue can be fixed? > > Yeah sure. I'll try to do it this afternoon. Thanks! Regards, Bin
> > > I checked your patch, what you did seems to be something one would > > > naturally write, but what is currently in the QEMU sources seems to be > > > written intentionally. > > > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > > Windows. Could you please help analyze this issue? > > > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > > G_IO_HUP (or for any GSource having only that). > > > > As I said above, the current code doesn't do anything with it anyway. > > > > So, IMO, it's safe to do so. > > > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > > CI. Feel free to test and modify it if needed. > > > > > > I tested your patch. Unfortunately there is still one test case > > > (migration-test.exe) throwing up the "Broken pipe" message. > > > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > > some refinements to be done. "Broken pipe" might be linked to the missing > > G_IO_HUP support. > > > > > Can you test my patch instead to see if your gdb issue can be fixed? > > > > Yeah sure. I'll try to do it this afternoon. I can't explain how mad at me I am... I'm pretty sure your patch was the first thing I've tried when I encountered this issue. But it wasn't working or IIRC the issue went away but that was because the polling was actually disabled (looping indefinitely)...I'm suspecting that I already had changed the CreateEvent for WSACreateEvent which forces you to handle the reset. Finally, I end up struggling reworking the whole check function... But yeah, your patch does work fine on my gdb issues too. And I guess the events are reset when recv() is being called because of the auto-reset feature set up by CreateEvent(). IIUC, what Marc-André means by busy loop is the polling being looping indefinitely as I encountered. I can ensure that this patch doesn't do that. It can be easily checked by setting the env variable G_MAIN_POLL_DEBUG. It'll show what g_poll is doing and it's normally always available on Windows. Anyway, we'll wait for Paolo to see if he remembers why he had to call WSAEnumNetworkEvents. Otherwize, let's go for your patch. Mine might be a good start to improve the whole polling on Windows but if it doesn't work in your case, it then needs some refinements. Thanks, Clément
Hi Clément, On Tue, Sep 6, 2022 at 8:06 PM Clément Chigot <chigot@adacore.com> wrote: > > > > > I checked your patch, what you did seems to be something one would > > > > naturally write, but what is currently in the QEMU sources seems to be > > > > written intentionally. > > > > > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > > > Windows. Could you please help analyze this issue? > > > > > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > > > G_IO_HUP (or for any GSource having only that). > > > > > As I said above, the current code doesn't do anything with it anyway. > > > > > So, IMO, it's safe to do so. > > > > > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > > > CI. Feel free to test and modify it if needed. > > > > > > > > I tested your patch. Unfortunately there is still one test case > > > > (migration-test.exe) throwing up the "Broken pipe" message. > > > > > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > > > some refinements to be done. "Broken pipe" might be linked to the missing > > > G_IO_HUP support. > > > > > > > Can you test my patch instead to see if your gdb issue can be fixed? > > > > > > Yeah sure. I'll try to do it this afternoon. > > I can't explain how mad at me I am... I'm pretty sure your patch was the first > thing I've tried when I encountered this issue. But it wasn't working > or IIRC the > issue went away but that was because the polling was actually disabled (looping > indefinitely)...I'm suspecting that I already had changed the CreateEvent for > WSACreateEvent which forces you to handle the reset. > Finally, I end up struggling reworking the whole check function... > But yeah, your patch does work fine on my gdb issues too. Good to know this patch works for you too. > And I guess the events are reset when recv() is being called because of the > auto-reset feature set up by CreateEvent(). > IIUC, what Marc-André means by busy loop is the polling being looping > indefinitely as I encountered. I can ensure that this patch doesn't do that. > It can be easily checked by setting the env variable G_MAIN_POLL_DEBUG. > It'll show what g_poll is doing and it's normally always available on > Windows. > > Anyway, we'll wait for Paolo to see if he remembers why he had to call > WSAEnumNetworkEvents. Otherwize, let's go for your patch. Mine might > be a good start to improve the whole polling on Windows but if it doesn't > work in your case, it then needs some refinements. > Yeah, this issue bugged me quite a lot. If we want to reset the event in qio_channel_socket_source_check(), we will have to do the following to make sure qtests are happy. diff --git a/io/channel-watch.c b/io/channel-watch.c index 43d38494f7..f1e1650b81 100644 --- a/io/channel-watch.c +++ b/io/channel-watch.c @@ -124,8 +124,6 @@ qio_channel_socket_source_check(GSource *source) return 0; } - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); - FD_ZERO(&rfds); FD_ZERO(&wfds); FD_ZERO(&xfds); @@ -153,6 +151,10 @@ qio_channel_socket_source_check(GSource *source) ssource->revents |= G_IO_PRI; } + if (ssource->revents) { + WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); + } + return ssource->revents; } Removing "if (ssource->revents)" won't work. It seems to me that resetting the event twice (one time with the master Gsource, and the other time with the child GSource) causes some bizarre behavior. But MSDN [1] says "Resetting an event that is already reset has no effect." [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent Regards, Bin
On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > Hi Clément, > > On Tue, Sep 6, 2022 at 8:06 PM Clément Chigot <chigot@adacore.com> wrote: > > > > > > > I checked your patch, what you did seems to be something one would > > > > > naturally write, but what is currently in the QEMU sources seems to be > > > > > written intentionally. > > > > > > > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > > > > Windows. Could you please help analyze this issue? > > > > > > > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > > > > G_IO_HUP (or for any GSource having only that). > > > > > > As I said above, the current code doesn't do anything with it anyway. > > > > > > So, IMO, it's safe to do so. > > > > > > > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > > > > CI. Feel free to test and modify it if needed. > > > > > > > > > > I tested your patch. Unfortunately there is still one test case > > > > > (migration-test.exe) throwing up the "Broken pipe" message. > > > > > > > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > > > > some refinements to be done. "Broken pipe" might be linked to the missing > > > > G_IO_HUP support. > > > > > > > > > Can you test my patch instead to see if your gdb issue can be fixed? > > > > > > > > Yeah sure. I'll try to do it this afternoon. > > > > I can't explain how mad at me I am... I'm pretty sure your patch was the first > > thing I've tried when I encountered this issue. But it wasn't working > > or IIRC the > > issue went away but that was because the polling was actually disabled (looping > > indefinitely)...I'm suspecting that I already had changed the CreateEvent for > > WSACreateEvent which forces you to handle the reset. > > Finally, I end up struggling reworking the whole check function... > > But yeah, your patch does work fine on my gdb issues too. > > Good to know this patch works for you too. > > > And I guess the events are reset when recv() is being called because of the > > auto-reset feature set up by CreateEvent(). > > IIUC, what Marc-André means by busy loop is the polling being looping > > indefinitely as I encountered. I can ensure that this patch doesn't do that. > > It can be easily checked by setting the env variable G_MAIN_POLL_DEBUG. > > It'll show what g_poll is doing and it's normally always available on > > Windows. > > > > Anyway, we'll wait for Paolo to see if he remembers why he had to call > > WSAEnumNetworkEvents. Otherwize, let's go for your patch. Mine might > > be a good start to improve the whole polling on Windows but if it doesn't > > work in your case, it then needs some refinements. > > > > Yeah, this issue bugged me quite a lot. If we want to reset the event > in qio_channel_socket_source_check(), we will have to do the following > to make sure qtests are happy. > > diff --git a/io/channel-watch.c b/io/channel-watch.c > index 43d38494f7..f1e1650b81 100644 > --- a/io/channel-watch.c > +++ b/io/channel-watch.c > @@ -124,8 +124,6 @@ qio_channel_socket_source_check(GSource *source) > return 0; > } > - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > - > FD_ZERO(&rfds); > FD_ZERO(&wfds); > FD_ZERO(&xfds); > @@ -153,6 +151,10 @@ qio_channel_socket_source_check(GSource *source) > ssource->revents |= G_IO_PRI; > } > + if (ssource->revents) { > + WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > + } > + > return ssource->revents; > } > > Removing "if (ssource->revents)" won't work. > > It seems to me that resetting the event twice (one time with the > master Gsource, and the other time with the child GSource) causes some > bizarre behavior. But MSDN [1] says > > "Resetting an event that is already reset has no effect." > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > Paolo, any comments about this issue? Regards, Bin
On Wed, Sep 14, 2022 at 4:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > Hi Clément, > > > > On Tue, Sep 6, 2022 at 8:06 PM Clément Chigot <chigot@adacore.com> wrote: > > > > > > > > > I checked your patch, what you did seems to be something one would > > > > > > naturally write, but what is currently in the QEMU sources seems to be > > > > > > written intentionally. > > > > > > > > > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > > > > > Windows. Could you please help analyze this issue? > > > > > > > > > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > > > > > G_IO_HUP (or for any GSource having only that). > > > > > > > As I said above, the current code doesn't do anything with it anyway. > > > > > > > So, IMO, it's safe to do so. > > > > > > > > > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > > > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > > > > > CI. Feel free to test and modify it if needed. > > > > > > > > > > > > I tested your patch. Unfortunately there is still one test case > > > > > > (migration-test.exe) throwing up the "Broken pipe" message. > > > > > > > > > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > > > > > some refinements to be done. "Broken pipe" might be linked to the missing > > > > > G_IO_HUP support. > > > > > > > > > > > Can you test my patch instead to see if your gdb issue can be fixed? > > > > > > > > > > Yeah sure. I'll try to do it this afternoon. > > > > > > I can't explain how mad at me I am... I'm pretty sure your patch was the first > > > thing I've tried when I encountered this issue. But it wasn't working > > > or IIRC the > > > issue went away but that was because the polling was actually disabled (looping > > > indefinitely)...I'm suspecting that I already had changed the CreateEvent for > > > WSACreateEvent which forces you to handle the reset. > > > Finally, I end up struggling reworking the whole check function... > > > But yeah, your patch does work fine on my gdb issues too. > > > > Good to know this patch works for you too. > > > > > And I guess the events are reset when recv() is being called because of the > > > auto-reset feature set up by CreateEvent(). > > > IIUC, what Marc-André means by busy loop is the polling being looping > > > indefinitely as I encountered. I can ensure that this patch doesn't do that. > > > It can be easily checked by setting the env variable G_MAIN_POLL_DEBUG. > > > It'll show what g_poll is doing and it's normally always available on > > > Windows. > > > > > > Anyway, we'll wait for Paolo to see if he remembers why he had to call > > > WSAEnumNetworkEvents. Otherwize, let's go for your patch. Mine might > > > be a good start to improve the whole polling on Windows but if it doesn't > > > work in your case, it then needs some refinements. > > > > > > > Yeah, this issue bugged me quite a lot. If we want to reset the event > > in qio_channel_socket_source_check(), we will have to do the following > > to make sure qtests are happy. > > > > diff --git a/io/channel-watch.c b/io/channel-watch.c > > index 43d38494f7..f1e1650b81 100644 > > --- a/io/channel-watch.c > > +++ b/io/channel-watch.c > > @@ -124,8 +124,6 @@ qio_channel_socket_source_check(GSource *source) > > return 0; > > } > > - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > - > > FD_ZERO(&rfds); > > FD_ZERO(&wfds); > > FD_ZERO(&xfds); > > @@ -153,6 +151,10 @@ qio_channel_socket_source_check(GSource *source) > > ssource->revents |= G_IO_PRI; > > } > > + if (ssource->revents) { > > + WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > + } > > + > > return ssource->revents; > > } > > > > Removing "if (ssource->revents)" won't work. > > > > It seems to me that resetting the event twice (one time with the > > master Gsource, and the other time with the child GSource) causes some > > bizarre behavior. But MSDN [1] says > > > > "Resetting an event that is already reset has no effect." > > > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > > > > Paolo, any comments about this issue? > v2 series has been sent out, and this patch remains unchanged. Paolo, still would appreciate your comments. Regards, Bin
Hi Paolo, On Wed, Sep 21, 2022 at 9:02 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > On Wed, Sep 14, 2022 at 4:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > Hi Clément, > > > > > > On Tue, Sep 6, 2022 at 8:06 PM Clément Chigot <chigot@adacore.com> wrote: > > > > > > > > > > > I checked your patch, what you did seems to be something one would > > > > > > > naturally write, but what is currently in the QEMU sources seems to be > > > > > > > written intentionally. > > > > > > > > > > > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > > > > > > Windows. Could you please help analyze this issue? > > > > > > > > > > > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > > > > > > G_IO_HUP (or for any GSource having only that). > > > > > > > > As I said above, the current code doesn't do anything with it anyway. > > > > > > > > So, IMO, it's safe to do so. > > > > > > > > > > > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > > > > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > > > > > > CI. Feel free to test and modify it if needed. > > > > > > > > > > > > > > I tested your patch. Unfortunately there is still one test case > > > > > > > (migration-test.exe) throwing up the "Broken pipe" message. > > > > > > > > > > > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > > > > > > some refinements to be done. "Broken pipe" might be linked to the missing > > > > > > G_IO_HUP support. > > > > > > > > > > > > > Can you test my patch instead to see if your gdb issue can be fixed? > > > > > > > > > > > > Yeah sure. I'll try to do it this afternoon. > > > > > > > > I can't explain how mad at me I am... I'm pretty sure your patch was the first > > > > thing I've tried when I encountered this issue. But it wasn't working > > > > or IIRC the > > > > issue went away but that was because the polling was actually disabled (looping > > > > indefinitely)...I'm suspecting that I already had changed the CreateEvent for > > > > WSACreateEvent which forces you to handle the reset. > > > > Finally, I end up struggling reworking the whole check function... > > > > But yeah, your patch does work fine on my gdb issues too. > > > > > > Good to know this patch works for you too. > > > > > > > And I guess the events are reset when recv() is being called because of the > > > > auto-reset feature set up by CreateEvent(). > > > > IIUC, what Marc-André means by busy loop is the polling being looping > > > > indefinitely as I encountered. I can ensure that this patch doesn't do that. > > > > It can be easily checked by setting the env variable G_MAIN_POLL_DEBUG. > > > > It'll show what g_poll is doing and it's normally always available on > > > > Windows. > > > > > > > > Anyway, we'll wait for Paolo to see if he remembers why he had to call > > > > WSAEnumNetworkEvents. Otherwize, let's go for your patch. Mine might > > > > be a good start to improve the whole polling on Windows but if it doesn't > > > > work in your case, it then needs some refinements. > > > > > > > > > > Yeah, this issue bugged me quite a lot. If we want to reset the event > > > in qio_channel_socket_source_check(), we will have to do the following > > > to make sure qtests are happy. > > > > > > diff --git a/io/channel-watch.c b/io/channel-watch.c > > > index 43d38494f7..f1e1650b81 100644 > > > --- a/io/channel-watch.c > > > +++ b/io/channel-watch.c > > > @@ -124,8 +124,6 @@ qio_channel_socket_source_check(GSource *source) > > > return 0; > > > } > > > - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > > - > > > FD_ZERO(&rfds); > > > FD_ZERO(&wfds); > > > FD_ZERO(&xfds); > > > @@ -153,6 +151,10 @@ qio_channel_socket_source_check(GSource *source) > > > ssource->revents |= G_IO_PRI; > > > } > > > + if (ssource->revents) { > > > + WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > > + } > > > + > > > return ssource->revents; > > > } > > > > > > Removing "if (ssource->revents)" won't work. > > > > > > It seems to me that resetting the event twice (one time with the > > > master Gsource, and the other time with the child GSource) causes some > > > bizarre behavior. But MSDN [1] says > > > > > > "Resetting an event that is already reset has no effect." > > > > > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > > > > > > > Paolo, any comments about this issue? > > > > v2 series has been sent out, and this patch remains unchanged. > > Paolo, still would appreciate your comments. > Ping? Regards, Bin
Hi Paolo, On Wed, Sep 28, 2022 at 2:10 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > Hi Paolo, > > On Wed, Sep 21, 2022 at 9:02 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > On Wed, Sep 14, 2022 at 4:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > Hi Clément, > > > > > > > > On Tue, Sep 6, 2022 at 8:06 PM Clément Chigot <chigot@adacore.com> wrote: > > > > > > > > > > > > > I checked your patch, what you did seems to be something one would > > > > > > > > naturally write, but what is currently in the QEMU sources seems to be > > > > > > > > written intentionally. > > > > > > > > > > > > > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > > > > > > > Windows. Could you please help analyze this issue? > > > > > > > > > > > > > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > > > > > > > G_IO_HUP (or for any GSource having only that). > > > > > > > > > As I said above, the current code doesn't do anything with it anyway. > > > > > > > > > So, IMO, it's safe to do so. > > > > > > > > > > > > > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > > > > > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > > > > > > > CI. Feel free to test and modify it if needed. > > > > > > > > > > > > > > > > I tested your patch. Unfortunately there is still one test case > > > > > > > > (migration-test.exe) throwing up the "Broken pipe" message. > > > > > > > > > > > > > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > > > > > > > some refinements to be done. "Broken pipe" might be linked to the missing > > > > > > > G_IO_HUP support. > > > > > > > > > > > > > > > Can you test my patch instead to see if your gdb issue can be fixed? > > > > > > > > > > > > > > Yeah sure. I'll try to do it this afternoon. > > > > > > > > > > I can't explain how mad at me I am... I'm pretty sure your patch was the first > > > > > thing I've tried when I encountered this issue. But it wasn't working > > > > > or IIRC the > > > > > issue went away but that was because the polling was actually disabled (looping > > > > > indefinitely)...I'm suspecting that I already had changed the CreateEvent for > > > > > WSACreateEvent which forces you to handle the reset. > > > > > Finally, I end up struggling reworking the whole check function... > > > > > But yeah, your patch does work fine on my gdb issues too. > > > > > > > > Good to know this patch works for you too. > > > > > > > > > And I guess the events are reset when recv() is being called because of the > > > > > auto-reset feature set up by CreateEvent(). > > > > > IIUC, what Marc-André means by busy loop is the polling being looping > > > > > indefinitely as I encountered. I can ensure that this patch doesn't do that. > > > > > It can be easily checked by setting the env variable G_MAIN_POLL_DEBUG. > > > > > It'll show what g_poll is doing and it's normally always available on > > > > > Windows. > > > > > > > > > > Anyway, we'll wait for Paolo to see if he remembers why he had to call > > > > > WSAEnumNetworkEvents. Otherwize, let's go for your patch. Mine might > > > > > be a good start to improve the whole polling on Windows but if it doesn't > > > > > work in your case, it then needs some refinements. > > > > > > > > > > > > > Yeah, this issue bugged me quite a lot. If we want to reset the event > > > > in qio_channel_socket_source_check(), we will have to do the following > > > > to make sure qtests are happy. > > > > > > > > diff --git a/io/channel-watch.c b/io/channel-watch.c > > > > index 43d38494f7..f1e1650b81 100644 > > > > --- a/io/channel-watch.c > > > > +++ b/io/channel-watch.c > > > > @@ -124,8 +124,6 @@ qio_channel_socket_source_check(GSource *source) > > > > return 0; > > > > } > > > > - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > > > - > > > > FD_ZERO(&rfds); > > > > FD_ZERO(&wfds); > > > > FD_ZERO(&xfds); > > > > @@ -153,6 +151,10 @@ qio_channel_socket_source_check(GSource *source) > > > > ssource->revents |= G_IO_PRI; > > > > } > > > > + if (ssource->revents) { > > > > + WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > > > + } > > > > + > > > > return ssource->revents; > > > > } > > > > > > > > Removing "if (ssource->revents)" won't work. > > > > > > > > It seems to me that resetting the event twice (one time with the > > > > master Gsource, and the other time with the child GSource) causes some > > > > bizarre behavior. But MSDN [1] says > > > > > > > > "Resetting an event that is already reset has no effect." > > > > > > > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > > > > > > > > > > Paolo, any comments about this issue? > > > > > > > v2 series has been sent out, and this patch remains unchanged. > > > > Paolo, still would appreciate your comments. > > > > Ping? > Ping? Can you please comment?? Regards, Bin
Hi Paolo, On Thu, Oct 6, 2022 at 11:03 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > Hi Paolo, > > On Wed, Sep 28, 2022 at 2:10 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > Hi Paolo, > > > > On Wed, Sep 21, 2022 at 9:02 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > On Wed, Sep 14, 2022 at 4:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > Hi Clément, > > > > > > > > > > On Tue, Sep 6, 2022 at 8:06 PM Clément Chigot <chigot@adacore.com> wrote: > > > > > > > > > > > > > > > I checked your patch, what you did seems to be something one would > > > > > > > > > naturally write, but what is currently in the QEMU sources seems to be > > > > > > > > > written intentionally. > > > > > > > > > > > > > > > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > > > > > > > > Windows. Could you please help analyze this issue? > > > > > > > > > > > > > > > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > > > > > > > > G_IO_HUP (or for any GSource having only that). > > > > > > > > > > As I said above, the current code doesn't do anything with it anyway. > > > > > > > > > > So, IMO, it's safe to do so. > > > > > > > > > > > > > > > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > > > > > > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > > > > > > > > CI. Feel free to test and modify it if needed. > > > > > > > > > > > > > > > > > > I tested your patch. Unfortunately there is still one test case > > > > > > > > > (migration-test.exe) throwing up the "Broken pipe" message. > > > > > > > > > > > > > > > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > > > > > > > > some refinements to be done. "Broken pipe" might be linked to the missing > > > > > > > > G_IO_HUP support. > > > > > > > > > > > > > > > > > Can you test my patch instead to see if your gdb issue can be fixed? > > > > > > > > > > > > > > > > Yeah sure. I'll try to do it this afternoon. > > > > > > > > > > > > I can't explain how mad at me I am... I'm pretty sure your patch was the first > > > > > > thing I've tried when I encountered this issue. But it wasn't working > > > > > > or IIRC the > > > > > > issue went away but that was because the polling was actually disabled (looping > > > > > > indefinitely)...I'm suspecting that I already had changed the CreateEvent for > > > > > > WSACreateEvent which forces you to handle the reset. > > > > > > Finally, I end up struggling reworking the whole check function... > > > > > > But yeah, your patch does work fine on my gdb issues too. > > > > > > > > > > Good to know this patch works for you too. > > > > > > > > > > > And I guess the events are reset when recv() is being called because of the > > > > > > auto-reset feature set up by CreateEvent(). > > > > > > IIUC, what Marc-André means by busy loop is the polling being looping > > > > > > indefinitely as I encountered. I can ensure that this patch doesn't do that. > > > > > > It can be easily checked by setting the env variable G_MAIN_POLL_DEBUG. > > > > > > It'll show what g_poll is doing and it's normally always available on > > > > > > Windows. > > > > > > > > > > > > Anyway, we'll wait for Paolo to see if he remembers why he had to call > > > > > > WSAEnumNetworkEvents. Otherwize, let's go for your patch. Mine might > > > > > > be a good start to improve the whole polling on Windows but if it doesn't > > > > > > work in your case, it then needs some refinements. > > > > > > > > > > > > > > > > Yeah, this issue bugged me quite a lot. If we want to reset the event > > > > > in qio_channel_socket_source_check(), we will have to do the following > > > > > to make sure qtests are happy. > > > > > > > > > > diff --git a/io/channel-watch.c b/io/channel-watch.c > > > > > index 43d38494f7..f1e1650b81 100644 > > > > > --- a/io/channel-watch.c > > > > > +++ b/io/channel-watch.c > > > > > @@ -124,8 +124,6 @@ qio_channel_socket_source_check(GSource *source) > > > > > return 0; > > > > > } > > > > > - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > > > > - > > > > > FD_ZERO(&rfds); > > > > > FD_ZERO(&wfds); > > > > > FD_ZERO(&xfds); > > > > > @@ -153,6 +151,10 @@ qio_channel_socket_source_check(GSource *source) > > > > > ssource->revents |= G_IO_PRI; > > > > > } > > > > > + if (ssource->revents) { > > > > > + WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > > > > + } > > > > > + > > > > > return ssource->revents; > > > > > } > > > > > > > > > > Removing "if (ssource->revents)" won't work. > > > > > > > > > > It seems to me that resetting the event twice (one time with the > > > > > master Gsource, and the other time with the child GSource) causes some > > > > > bizarre behavior. But MSDN [1] says > > > > > > > > > > "Resetting an event that is already reset has no effect." > > > > > > > > > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > > > > > > > > > > > > > Paolo, any comments about this issue? > > > > > > > > > > v2 series has been sent out, and this patch remains unchanged. > > > > > > Paolo, still would appreciate your comments. > > > > > > > Ping? > > > > Ping? Can you please comment?? > Ping? Regards, Bin
+more people On Tue, Oct 11, 2022 at 6:42 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > Hi Paolo, > > On Thu, Oct 6, 2022 at 11:03 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > Hi Paolo, > > > > On Wed, Sep 28, 2022 at 2:10 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > Hi Paolo, > > > > > > On Wed, Sep 21, 2022 at 9:02 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > On Wed, Sep 14, 2022 at 4:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > > > Hi Clément, > > > > > > > > > > > > On Tue, Sep 6, 2022 at 8:06 PM Clément Chigot <chigot@adacore.com> wrote: > > > > > > > > > > > > > > > > > I checked your patch, what you did seems to be something one would > > > > > > > > > > naturally write, but what is currently in the QEMU sources seems to be > > > > > > > > > > written intentionally. > > > > > > > > > > > > > > > > > > > > +Paolo Bonzini , you are the one who implemented the socket watch on > > > > > > > > > > Windows. Could you please help analyze this issue? > > > > > > > > > > > > > > > > > > > > > to avoid WSAEnumNetworkEvents for the master GSource which only has > > > > > > > > > > > G_IO_HUP (or for any GSource having only that). > > > > > > > > > > > As I said above, the current code doesn't do anything with it anyway. > > > > > > > > > > > So, IMO, it's safe to do so. > > > > > > > > > > > > > > > > > > > > > > I'll send you my patch attached. I was planning to send it in the following > > > > > > > > > > > weeks anyway. I was just waiting to be sure everything looks fine on our > > > > > > > > > > > CI. Feel free to test and modify it if needed. > > > > > > > > > > > > > > > > > > > > I tested your patch. Unfortunately there is still one test case > > > > > > > > > > (migration-test.exe) throwing up the "Broken pipe" message. > > > > > > > > > > > > > > > > > > I must say I didn't fully test it against qemu testsuite yet. Maybe there are > > > > > > > > > some refinements to be done. "Broken pipe" might be linked to the missing > > > > > > > > > G_IO_HUP support. > > > > > > > > > > > > > > > > > > > Can you test my patch instead to see if your gdb issue can be fixed? > > > > > > > > > > > > > > > > > > Yeah sure. I'll try to do it this afternoon. > > > > > > > > > > > > > > I can't explain how mad at me I am... I'm pretty sure your patch was the first > > > > > > > thing I've tried when I encountered this issue. But it wasn't working > > > > > > > or IIRC the > > > > > > > issue went away but that was because the polling was actually disabled (looping > > > > > > > indefinitely)...I'm suspecting that I already had changed the CreateEvent for > > > > > > > WSACreateEvent which forces you to handle the reset. > > > > > > > Finally, I end up struggling reworking the whole check function... > > > > > > > But yeah, your patch does work fine on my gdb issues too. > > > > > > > > > > > > Good to know this patch works for you too. > > > > > > > > > > > > > And I guess the events are reset when recv() is being called because of the > > > > > > > auto-reset feature set up by CreateEvent(). > > > > > > > IIUC, what Marc-André means by busy loop is the polling being looping > > > > > > > indefinitely as I encountered. I can ensure that this patch doesn't do that. > > > > > > > It can be easily checked by setting the env variable G_MAIN_POLL_DEBUG. > > > > > > > It'll show what g_poll is doing and it's normally always available on > > > > > > > Windows. > > > > > > > > > > > > > > Anyway, we'll wait for Paolo to see if he remembers why he had to call > > > > > > > WSAEnumNetworkEvents. Otherwize, let's go for your patch. Mine might > > > > > > > be a good start to improve the whole polling on Windows but if it doesn't > > > > > > > work in your case, it then needs some refinements. > > > > > > > > > > > > > > > > > > > Yeah, this issue bugged me quite a lot. If we want to reset the event > > > > > > in qio_channel_socket_source_check(), we will have to do the following > > > > > > to make sure qtests are happy. > > > > > > > > > > > > diff --git a/io/channel-watch.c b/io/channel-watch.c > > > > > > index 43d38494f7..f1e1650b81 100644 > > > > > > --- a/io/channel-watch.c > > > > > > +++ b/io/channel-watch.c > > > > > > @@ -124,8 +124,6 @@ qio_channel_socket_source_check(GSource *source) > > > > > > return 0; > > > > > > } > > > > > > - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > > > > > - > > > > > > FD_ZERO(&rfds); > > > > > > FD_ZERO(&wfds); > > > > > > FD_ZERO(&xfds); > > > > > > @@ -153,6 +151,10 @@ qio_channel_socket_source_check(GSource *source) > > > > > > ssource->revents |= G_IO_PRI; > > > > > > } > > > > > > + if (ssource->revents) { > > > > > > + WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); > > > > > > + } > > > > > > + > > > > > > return ssource->revents; > > > > > > } > > > > > > > > > > > > Removing "if (ssource->revents)" won't work. > > > > > > > > > > > > It seems to me that resetting the event twice (one time with the > > > > > > master Gsource, and the other time with the child GSource) causes some > > > > > > bizarre behavior. But MSDN [1] says > > > > > > > > > > > > "Resetting an event that is already reset has no effect." > > > > > > > > > > > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > > > > > > > > > > > > > > > > Paolo, any comments about this issue? > > > > > > > > > > > > > v2 series has been sent out, and this patch remains unchanged. > > > > > > > > Paolo, still would appreciate your comments. > > > > > > > > > > Ping? > > > > > > > Ping? Can you please comment?? > > > > Ping? > Hi, Paolo remains silent. Please let me know who else could approve this change. Thanks. Regards, Bin
On Mon, Oct 17, 2022 at 08:21:37PM +0800, Bin Meng wrote: > +more people > > On Tue, Oct 11, 2022 at 6:42 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > Hi Paolo, > > > > On Thu, Oct 6, 2022 at 11:03 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > Hi Paolo, > > > > > > On Wed, Sep 28, 2022 at 2:10 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > Hi Paolo, > > > > > > > > On Wed, Sep 21, 2022 at 9:02 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > On Wed, Sep 14, 2022 at 4:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > > > On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > It seems to me that resetting the event twice (one time with the > > > > > > > master Gsource, and the other time with the child GSource) causes some > > > > > > > bizarre behavior. But MSDN [1] says > > > > > > > > > > > > > > "Resetting an event that is already reset has no effect." > > > > > > > > > > > > > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > > > > > > > > > > > > > > > > > > > Paolo, any comments about this issue? > > > > > > > > > > v2 series has been sent out, and this patch remains unchanged. > > > > > > > > > > Paolo, still would appreciate your comments. > > > > > > > > Ping? > > > > > > Ping? Can you please comment?? > > > > Ping? > > Paolo remains silent. Please let me know who else could approve this > change. Thanks. Given there has been plenty of time for objecting, I'll queue this patch on the basis that you've tested it on a real Windows host and found it better than what we have today. With regards, Daniel
On Wed, Aug 24, 2022 at 05:40:27PM +0800, Bin Meng wrote: > From: Bin Meng <bin.meng@windriver.com> > > Random failure was observed when running qtests on Windows due to > "Broken pipe" detected by qmp_fd_receive(). What happened is that > the qtest executable sends testing data over a socket to the QEMU > under test but no response is received. The errno of the recv() > call from the qtest executable indicates ETIMEOUT, due to the qmp > chardev's tcp_chr_read() is never called to receive testing data > hence no response is sent to the other side. > > tcp_chr_read() is registered as the callback of the socket watch > GSource. The reason of the callback not being called by glib, is > that the source check fails to indicate the source is ready. There > are two socket watch sources created to monitor the same socket > event object from the char-socket backend in update_ioc_handlers(). > During the source check phase, qio_channel_socket_source_check() > calls WSAEnumNetworkEvents() to discovers occurrences of network > events for the indicated socket, clear internal network event records, > and reset the event object. Testing shows that if we don't reset the > event object by not passing the event handle to WSAEnumNetworkEvents() > the symptom goes away and qtest runs very stably. > > It looks we don't need to call WSAEnumNetworkEvents() at all, as we > don't parse the result of WSANETWORKEVENTS returned from this API. > We use select() to poll the socket status. Fix this instability by > dropping the WSAEnumNetworkEvents() call. > > Signed-off-by: Bin Meng <bin.meng@windriver.com> > --- > During the testing, I removed the following codes in update_ioc_handlers(): > > remove_hup_source(s); > s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP); > g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup, > chr, NULL); > g_source_attach(s->hup_source, chr->gcontext); > > and such change also makes the symptom go away. > > And if I moved the above codes to the beginning, before the call to > io_add_watch_poll(), the symptom also goes away. > > It seems two sources watching on the same socket event object is > the key that leads to the instability. The order of adding a source > watch seems to also play a role but I can't explain why. > Hopefully a Windows and glib expert could explain this behavior. > > io/channel-watch.c | 4 ---- > 1 file changed, 4 deletions(-) Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> and queued With regards, Daniel
On Mon, Oct 17, 2022 at 8:30 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Mon, Oct 17, 2022 at 08:21:37PM +0800, Bin Meng wrote: > > +more people > > > > On Tue, Oct 11, 2022 at 6:42 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > Hi Paolo, > > > > > > On Thu, Oct 6, 2022 at 11:03 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > Hi Paolo, > > > > > > > > On Wed, Sep 28, 2022 at 2:10 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > Hi Paolo, > > > > > > > > > > On Wed, Sep 21, 2022 at 9:02 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > > > On Wed, Sep 14, 2022 at 4:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > > > > > On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > It seems to me that resetting the event twice (one time with the > > > > > > > > master Gsource, and the other time with the child GSource) causes some > > > > > > > > bizarre behavior. But MSDN [1] says > > > > > > > > > > > > > > > > "Resetting an event that is already reset has no effect." > > > > > > > > > > > > > > > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > > > > > > > > > > > > > > > > > > > > > > Paolo, any comments about this issue? > > > > > > > > > > > > v2 series has been sent out, and this patch remains unchanged. > > > > > > > > > > > > Paolo, still would appreciate your comments. > > > > > > > > > > Ping? > > > > > > > > Ping? Can you please comment?? > > > > > > Ping? > > > > Paolo remains silent. Please let me know who else could approve this > > change. Thanks. > > Given there has been plenty of time for objecting, I'll queue this > patch on the basis that you've tested it on a real Windows host > and found it better than what we have today. > Thank you Daniel! Please queue the following patches from v5 instead. Message-ids: 20221006151927.2079583-15-bmeng.cn@gmail.com 20221006151927.2079583-16-bmeng.cn@gmail.com 20221006151927.2079583-17-bmeng.cn@gmail.com Regards, Bin
On Mon, Oct 17, 2022 at 09:03:15PM +0800, Bin Meng wrote: > On Mon, Oct 17, 2022 at 8:30 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > On Mon, Oct 17, 2022 at 08:21:37PM +0800, Bin Meng wrote: > > > +more people > > > > > > On Tue, Oct 11, 2022 at 6:42 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > Hi Paolo, > > > > > > > > On Thu, Oct 6, 2022 at 11:03 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > Hi Paolo, > > > > > > > > > > On Wed, Sep 28, 2022 at 2:10 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > > > Hi Paolo, > > > > > > > > > > > > On Wed, Sep 21, 2022 at 9:02 AM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > > > > > On Wed, Sep 14, 2022 at 4:08 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > > > > > > > > On Wed, Sep 7, 2022 at 1:07 PM Bin Meng <bmeng.cn@gmail.com> wrote: > > > > > > > > > It seems to me that resetting the event twice (one time with the > > > > > > > > > master Gsource, and the other time with the child GSource) causes some > > > > > > > > > bizarre behavior. But MSDN [1] says > > > > > > > > > > > > > > > > > > "Resetting an event that is already reset has no effect." > > > > > > > > > > > > > > > > > > [1] https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-resetevent > > > > > > > > > > > > > > > > > > > > > > > > > Paolo, any comments about this issue? > > > > > > > > > > > > > > v2 series has been sent out, and this patch remains unchanged. > > > > > > > > > > > > > > Paolo, still would appreciate your comments. > > > > > > > > > > > > Ping? > > > > > > > > > > Ping? Can you please comment?? > > > > > > > > Ping? > > > > > > Paolo remains silent. Please let me know who else could approve this > > > change. Thanks. > > > > Given there has been plenty of time for objecting, I'll queue this > > patch on the basis that you've tested it on a real Windows host > > and found it better than what we have today. > > > > Thank you Daniel! > > Please queue the following patches from v5 instead. > > Message-ids: > > 20221006151927.2079583-15-bmeng.cn@gmail.com > 20221006151927.2079583-16-bmeng.cn@gmail.com > 20221006151927.2079583-17-bmeng.cn@gmail.com Ok, have done so. With regards, Daniel
diff --git a/io/channel-watch.c b/io/channel-watch.c index 89f3c8a88a..e34d86e810 100644 --- a/io/channel-watch.c +++ b/io/channel-watch.c @@ -115,17 +115,13 @@ static gboolean qio_channel_socket_source_check(GSource *source) { static struct timeval tv0; - QIOChannelSocketSource *ssource = (QIOChannelSocketSource *)source; - WSANETWORKEVENTS ev; fd_set rfds, wfds, xfds; if (!ssource->condition) { return 0; } - WSAEnumNetworkEvents(ssource->socket, ssource->ioc->event, &ev); - FD_ZERO(&rfds); FD_ZERO(&wfds); FD_ZERO(&xfds);