Message ID | 20240109040621.1248647-1-prestwoj@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | dhcp: fix overflow causing retries to stop | expand |
Context | Check | Description |
---|---|---|
tedd_an/pre-ci_am | success | Success |
prestwoj/iwd-ci-makedistcheck | success | Make Distcheck |
prestwoj/iwd-ci-build | success | Build - Configure |
prestwoj/iwd-ci-makecheckvalgrind | success | Make Check w/Valgrind |
prestwoj/iwd-ci-clang | success | clang PASS |
prestwoj/iwd-ci-makecheck | success | Make Check |
prestwoj/iwd-ci-testrunner | success | test-runner PASS |
Hi James, On 1/8/24 22:06, James Prestwood wrote: > If DHCP is in a SELECTING/REQUESTING state and the number of attempts > reached a value where 2 << attempts overflowed an unsigned int the > next timeout would become zero, causing DHCP to never retry. > > Since 5 attempts results in a value of 64 we can instead just limit > the attempts to 5, and set next_timeout to 64 after that as the spec > requires. Overflowing next_timeout implies that discovery is taking ~25 minutes? I think I'd rather we gave up and sent a NO_LEASE event after retries hits a certain max number? Regards, -Denis
Hi Denis, On 1/8/24 8:27 PM, Denis Kenzior wrote: > Hi James, > > On 1/8/24 22:06, James Prestwood wrote: >> If DHCP is in a SELECTING/REQUESTING state and the number of attempts >> reached a value where 2 << attempts overflowed an unsigned int the >> next timeout would become zero, causing DHCP to never retry. >> >> Since 5 attempts results in a value of 64 we can instead just limit >> the attempts to 5, and set next_timeout to 64 after that as the spec >> requires. > > Overflowing next_timeout implies that discovery is taking ~25 > minutes? I think I'd rather we gave up and sent a NO_LEASE event > after retries hits a certain max number? Yep, this is whats happening, the DHCP server is down. Obviously we could change the logic, but NO_LEASE appears to just restart the DHCP client, i.e. no real difference between trying discovery forever. Should we add another event, L_DHCP_CLIENT_EVENT_TIMEOUT? Thanks, james > > Regards, > -Denis
Hi James, On 1/9/24 06:18, James Prestwood wrote: > Hi Denis, > > On 1/8/24 8:27 PM, Denis Kenzior wrote: >> Hi James, >> >> On 1/8/24 22:06, James Prestwood wrote: >>> If DHCP is in a SELECTING/REQUESTING state and the number of attempts >>> reached a value where 2 << attempts overflowed an unsigned int the >>> next timeout would become zero, causing DHCP to never retry. >>> >>> Since 5 attempts results in a value of 64 we can instead just limit >>> the attempts to 5, and set next_timeout to 64 after that as the spec >>> requires. >> >> Overflowing next_timeout implies that discovery is taking ~25 minutes? I >> think I'd rather we gave up and sent a NO_LEASE event after retries hits a >> certain max number? > > Yep, this is whats happening, the DHCP server is down. Obviously we could change > the logic, but NO_LEASE appears to just restart the DHCP client, i.e. no real I'm not sure why netconfig does that actually. From the comment it seems like it expects NO_LEASE event when the old address could not be obtained. But, if I recall correctly, ell's dhcp client implementation only includes the old address when using RENEW or REBIND, and never on a fresh dhcp_client_start()-up due to privacy policy. In the end, it is up to netconfig to implement a policy it wants. Re-starting discovery is a valid policy, but I think in iwd's case, we should have netconfig treat 'NO_LEASE' as a DHCP server going down / server not present. Having a max number of retries can facilitate that. For example, if we're connecting and reach max_attempts of 6 or so, then give up and have iwd fail the connection. > difference between trying discovery forever. Should we add another event, > L_DHCP_CLIENT_EVENT_TIMEOUT? l_dhcp_client sends NO_LEASE when it gets a NAK and there's no point in continuing without further intervention (or configuration change). Server not responding seems to fall into the same category, but we can also add an event explicitly for this. Maybe MAX_DISCOVER_ATTEMPTS_REACHED? Regards, -Denis
diff --git a/ell/dhcp.c b/ell/dhcp.c index ece3e23..4f8b369 100644 --- a/ell/dhcp.c +++ b/ell/dhcp.c @@ -557,9 +557,14 @@ static void dhcp_client_timeout_resend(struct l_timeout *timeout, * RFC 2131 Section 4.1: * "The retransmission delay SHOULD be doubled with subsequent * retransmissions up to a maximum of 64 seconds. + * + * More than 5 attempts will trigger the maximum, don't increase + * any more to avoid overflowing next_timeout. */ - client->attempt += 1; - next_timeout = minsize(2 << client->attempt, 64); + if (client->attempt > 5) + next_timeout = 64; + else + next_timeout = 2 << client->attempt++; break; case DHCP_STATE_INIT: case DHCP_STATE_INIT_REBOOT: