
[net] net/tls: missing received data after fast remote close

Message ID 1605440628-1283-1-git-send-email-vfedorenko@novek.ru (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Vadim Fedorenko Nov. 15, 2020, 11:43 a.m. UTC
In the case where the TCP socket receives a FIN after some data, and
the parser hasn't started before the data is read, the caller will
receive an empty buffer. This behavior differs from that of a plain
TCP socket and forces special-casing in user space. This patch
unpauses the parser directly if there is unparsed data in the TCP
receive queue.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
 net/tls/tls_sw.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jakub Kicinski Nov. 17, 2020, 10:38 p.m. UTC | #1
On Sun, 15 Nov 2020 14:43:48 +0300 Vadim Fedorenko wrote:
> In the case where the TCP socket receives a FIN after some data, and
> the parser hasn't started before the data is read, the caller will
> receive an empty buffer.

This is pretty terse, too terse for me to understand..

> This behavior differs from that of a plain TCP socket and forces
> special-casing in user space. This patch unpauses the parser directly
> if there is unparsed data in the TCP receive queue.

Sure, but why is the parser paused? Does it pause itself on FIN?

> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 2fe9e2c..4db6943 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1289,6 +1289,9 @@ static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock,
>  	struct sk_buff *skb;
>  	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
> +	if (!ctx->recv_pkt && skb_queue_empty(&sk->sk_receive_queue))
> +		__strp_unpause(&ctx->strp);
> +
>  	while (!(skb = ctx->recv_pkt) && sk_psock_queue_empty(psock)) {
>  		if (sk->sk_err) {
>  			*err = sock_error(sk);
Vadim Fedorenko Nov. 18, 2020, 12:50 a.m. UTC | #2
On 17.11.2020 22:38, Jakub Kicinski wrote:
> On Sun, 15 Nov 2020 14:43:48 +0300 Vadim Fedorenko wrote:
>> In the case where the TCP socket receives a FIN after some data, and
>> the parser hasn't started before the data is read, the caller will
>> receive an empty buffer.
> This is pretty terse, too terse for me to understand..
The flow is simple. The server sends a small amount of data right after
the connection is configured and then closes the connection. In this case
the receiver sees the TLS handshake data and configures the TLS socket
right after the Change Cipher Spec record. While the configuration is in
progress, the TCP socket receives a small Application Data record, an
Encrypted Alert record and a FIN packet. So the TCP socket sets
RCV_SHUTDOWN in sk_shutdown and the SOCK_DONE bit in sk_flags.
>> This behavior differs from that of a plain TCP socket and forces
>> special-casing in user space. This patch unpauses the parser directly
>> if there is unparsed data in the TCP receive queue.
> Sure, but why is the parser paused? Does it pause itself on FIN?
No, it doesn't start even once. The trace looks like:

tcp_recvmsg is called
tcp_recvmsg returns 1 (Change Cipher Spec record data)
tls_setsockopt is called
tls_setsockopt returns
tls_recvmsg is called
tls_recvmsg returns 0
__strp_recv is called
stack
         __strp_recv+1
         tcp_read_sock+169
         strp_read_sock+104
         strp_work+68
         process_one_work+436
         worker_thread+80
         kthread+276
         ret_from_fork+34
tls_read_size called

So it looks like strp_work was scheduled after tls_recvmsg and
nothing triggered the parser because all the data was received before
tls_setsockopt ended the configuration process.
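
For reference, this is the asymmetry at play, paraphrased from
net/strparser/strparser.c of that era (details may differ slightly):
strp_check_rcv() only schedules the work, while __strp_unpause()
parses synchronously in the caller's context.

void strp_check_rcv(struct strparser *strp)
{
        /* async: only queues the work, parsing happens later */
        queue_work(strp_wq, &strp->work);
}

void __strp_unpause(struct strparser *strp)
{
        strp->paused = 0;

        /* don't bother if a partial record is known to be incomplete */
        if (strp->need_bytes) {
                if (strp_peek_len(strp) < strp->need_bytes)
                        return;
        }
        /* sync: parses right here, in the caller's context */
        strp_read_sock(strp);
}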
>
>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>> index 2fe9e2c..4db6943 100644
>> --- a/net/tls/tls_sw.c
>> +++ b/net/tls/tls_sw.c
>> @@ -1289,6 +1289,9 @@ static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock,
>>   	struct sk_buff *skb;
>>   	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>>   
>> +	if (!ctx->recv_pkt && skb_queue_empty(&sk->sk_receive_queue))
>> +		__strp_unpause(&ctx->strp);
>> +
>>   	while (!(skb = ctx->recv_pkt) && sk_psock_queue_empty(psock)) {
>>   		if (sk->sk_err) {
>>   			*err = sock_error(sk);
Jakub Kicinski Nov. 18, 2020, 1:53 a.m. UTC | #3
On Wed, 18 Nov 2020 00:50:48 +0000 Vadim Fedorenko wrote:
> On 17.11.2020 22:38, Jakub Kicinski wrote:
> > On Sun, 15 Nov 2020 14:43:48 +0300 Vadim Fedorenko wrote:  
> >> In the case where the TCP socket receives a FIN after some data, and
> >> the parser hasn't started before the data is read, the caller will
> >> receive an empty buffer.
> > This is pretty terse, too terse for me to understand..  
> The flow is simple. The server sends a small amount of data right after
> the connection is configured and then closes the connection. In this case
> the receiver sees the TLS handshake data and configures the TLS socket
> right after the Change Cipher Spec record. While the configuration is in
> progress, the TCP socket receives a small Application Data record, an
> Encrypted Alert record and a FIN packet. So the TCP socket sets
> RCV_SHUTDOWN in sk_shutdown and the SOCK_DONE bit in sk_flags.

Thanks! That's clear. This is a race, right, you can't trigger 
it reliably?

BTW please feel free to add your cases to the tls selftest in
tools/testing/selftests.

> >> This behavior differs from that of a plain TCP socket and forces
> >> special-casing in user space. This patch unpauses the parser directly
> >> if there is unparsed data in the TCP receive queue.
> > Sure, but why is the parser paused? Does it pause itself on FIN?  
> No, it doesn't start even once. The trace looks like:
> 
> tcp_recvmsg is called
> tcp_recvmsg returns 1 (Change Cipher Spec record data)
> tls_setsockopt is called
> tls_setsockopt returns
> tls_recvmsg is called
> tls_recvmsg returns 0
> __strp_recv is called
> stack
>          __strp_recv+1
>          tcp_read_sock+169
>          strp_read_sock+104
>          strp_work+68
>          process_one_work+436
>          worker_thread+80
>          kthread+276
>          ret_from_fork+34
> tls_read_size called
> 
> So it looks like strp_work was scheduled after tls_recvmsg and
> nothing triggered the parser because all the data was received before
> tls_setsockopt ended the configuration process.

Um. That makes me think we need to flush_work() on the strparser after
we configure rx tls, no? Or __unpause at the right time instead of
dealing with the async nature of strp_check_rcv()?
Vadim Fedorenko Nov. 18, 2020, 2:47 a.m. UTC | #4
On 18.11.2020 01:53, Jakub Kicinski wrote:
> On Wed, 18 Nov 2020 00:50:48 +0000 Vadim Fedorenko wrote:
>> On 17.11.2020 22:38, Jakub Kicinski wrote:
>>> On Sun, 15 Nov 2020 14:43:48 +0300 Vadim Fedorenko wrote:
>>>> In the case where the TCP socket receives a FIN after some data, and
>>>> the parser hasn't started before the data is read, the caller will
>>>> receive an empty buffer.
>>> This is pretty terse, too terse for me to understand..
>> The flow is simple. The server sends a small amount of data right after
>> the connection is configured and then closes the connection. In this case
>> the receiver sees the TLS handshake data and configures the TLS socket
>> right after the Change Cipher Spec record. While the configuration is in
>> progress, the TCP socket receives a small Application Data record, an
>> Encrypted Alert record and a FIN packet. So the TCP socket sets
>> RCV_SHUTDOWN in sk_shutdown and the SOCK_DONE bit in sk_flags.
> Thanks! That's clear. This is a race, right, you can't trigger
> it reliably?
It is usually triggered in the test cases, but not always. To trigger
the problem reliably I just add some delay before the setsockopt() call.
The delay leaves some time for the kernel to receive all the data from
the sender in the test case.
> BTW please feel free to add your cases to the tls selftest in
> tools/testing/selftests.
Sure, will add this case in selftests.
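
A sketch of such a test, in the style of tools/testing/selftests/net/tls.c
(the test name, timing and setup below are illustrative, not necessarily
the test that was eventually merged; it assumes the kselftest harness,
<linux/tls.h>, and the usual socket headers):

#ifndef SOL_TLS
#define SOL_TLS 282
#endif

TEST(tls_rx_fast_remote_close)
{
        struct tls12_crypto_info_aes_gcm_128 ci;
        struct sockaddr_in addr;
        socklen_t len = sizeof(addr);
        char buf[16];
        int lfd, sfd, cfd;

        /* Matching all-zero key material on both ends, selftest-style. */
        memset(&ci, 0, sizeof(ci));
        ci.info.version = TLS_1_2_VERSION;
        ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        lfd = socket(AF_INET, SOCK_STREAM, 0);
        ASSERT_GE(lfd, 0);
        ASSERT_EQ(bind(lfd, (struct sockaddr *)&addr, sizeof(addr)), 0);
        ASSERT_EQ(listen(lfd, 1), 0);
        ASSERT_EQ(getsockname(lfd, (struct sockaddr *)&addr, &len), 0);

        cfd = socket(AF_INET, SOCK_STREAM, 0);
        ASSERT_GE(cfd, 0);
        ASSERT_EQ(connect(cfd, (struct sockaddr *)&addr, sizeof(addr)), 0);
        sfd = accept(lfd, NULL, NULL);
        ASSERT_GE(sfd, 0);

        /* Sender: attach TLS, push one small record, then FIN. */
        ASSERT_EQ(setsockopt(sfd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")), 0);
        ASSERT_EQ(setsockopt(sfd, SOL_TLS, TLS_TX, &ci, sizeof(ci)), 0);
        ASSERT_EQ(send(sfd, "fastclose", 9, 0), 9);
        close(sfd);

        /* Receiver: the delay lets data + FIN land in the TCP receive
         * queue before TLS_RX is installed, reproducing the race. */
        usleep(100 * 1000);
        ASSERT_EQ(setsockopt(cfd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")), 0);
        ASSERT_EQ(setsockopt(cfd, SOL_TLS, TLS_RX, &ci, sizeof(ci)), 0);

        /* Before the fix, this first recv() could return 0 even though
         * a full record was sitting unparsed in the TCP receive queue. */
        EXPECT_EQ(recv(cfd, buf, sizeof(buf), 0), 9);
        EXPECT_EQ(memcmp(buf, "fastclose", 9), 0);
        EXPECT_EQ(recv(cfd, buf, sizeof(buf), 0), 0);

        close(cfd);
        close(lfd);
}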
>
>>>> This behavior differs from that of a plain TCP socket and forces
>>>> special-casing in user space. This patch unpauses the parser directly
>>>> if there is unparsed data in the TCP receive queue.
>>> Sure, but why is the parser paused? Does it pause itself on FIN?
>> No, it doesn't start even once. The trace looks like:
>>
>> tcp_recvmsg is called
>> tcp_recvmsg returns 1 (Change Cipher Spec record data)
>> tls_setsockopt is called
>> tls_setsockopt returns
>> tls_recvmsg is called
>> tls_recvmsg returns 0
>> __strp_recv is called
>> stack
>>           __strp_recv+1
>>           tcp_read_sock+169
>>           strp_read_sock+104
>>           strp_work+68
>>           process_one_work+436
>>           worker_thread+80
>>           kthread+276
>>           ret_from_fork+34
>> tls_read_size called
>>
>> So it looks like strp_work was scheduled after tls_recvmsg and
>> nothing triggered the parser because all the data was received before
>> tls_setsockopt ended the configuration process.
> Um. That makes me think we need to flush_work() on the strparser after
> we configure rx tls, no? Or __unpause at the right time instead of
> dealing with the async nature of strp_check_rcv()?
I'm not sure that flush_work() will do the right thing in this case, because:
1. The work is idle after tls_sw_strparser_arm()
2. The strparser needs the socket lock to do its work - that could
deadlock, because setsockopt_conf already holds the socket lock. I'm not
sure it's a good idea to unlock the socket just to wait for the strparser.

The async nature of the parser is OK for the classic HTTPS server/client
case, because it's good to have the record parsed before the actual call
to recvmsg or splice_read is made. The code inside the loop in
tls_wait_data is the slow path - maybe just move the check and the
__unpause into this slow path?
Jakub Kicinski Nov. 18, 2020, 4:23 p.m. UTC | #5
On Wed, 18 Nov 2020 02:47:24 +0000 Vadim Fedorenko wrote:
> >>>> This behavior differs from that of a plain TCP socket and forces
> >>>> special-casing in user space. This patch unpauses the parser directly
> >>>> if there is unparsed data in the TCP receive queue.
> >>> Sure, but why is the parser paused? Does it pause itself on FIN?  
> >> No, it doesn't start even once. The trace looks like:
> >>
> >> tcp_recvmsg is called
> >> tcp_recvmsg returns 1 (Change Cipher Spec record data)
> >> tls_setsockopt is called
> >> tls_setsockopt returns
> >> tls_recvmsg is called
> >> tls_recvmsg returns 0
> >> __strp_recv is called
> >> stack
> >>           __strp_recv+1
> >>           tcp_read_sock+169
> >>           strp_read_sock+104
> >>           strp_work+68
> >>           process_one_work+436
> >>           worker_thread+80
> >>           kthread+276
> >>           ret_from_fork+34
> >> tls_read_size called
> >>
> >> So it looks like strp_work was scheduled after tls_recvmsg and
> >> nothing triggered the parser because all the data was received before
> >> tls_setsockopt ended the configuration process.  
> > Um. That makes me think we need to flush_work() on the strparser after
> > we configure rx tls, no? Or __unpause at the right time instead of
> > dealing with the async nature of strp_check_rcv()?  
> I'm not sure that flush_work() will do the right thing in this case, because:
> 1. The work is idle after tls_sw_strparser_arm()

Not sure what you mean by idle, it queues the work via strp_check_rcv().

> 2. The strparser needs the socket lock to do its work - that could
> deadlock, because setsockopt_conf already holds the socket lock. I'm not
> sure it's a good idea to unlock the socket just to wait for the strparser.

Ack, I meant in do_tls_setsockopt() after the lock is released.
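
A minimal sketch of that placement, paraphrasing do_tls_setsockopt()
from that era (tls_sw_flush_rx_strp() is a hypothetical wrapper around
flush_work() on the strparser's work item; this is only the idea under
discussion, not a merged fix):

static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
                             unsigned int optlen)
{
        int rc = 0;

        switch (optname) {
        case TLS_TX:
        case TLS_RX:
                lock_sock(sk);
                rc = do_tls_setsockopt_conf(sk, optval, optlen,
                                            optname == TLS_TX);
                release_sock(sk);
                /* The socket lock is dropped here, so waiting for the
                 * strparser work cannot deadlock on lock_sock(). */
                if (!rc && optname == TLS_RX)
                        tls_sw_flush_rx_strp(sk);
                break;
        default:
                rc = -ENOPROTOOPT;
                break;
        }
        return rc;
}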

> The async nature of the parser is OK for the classic HTTPS server/client
> case, because it's good to have the record parsed before the actual call
> to recvmsg or splice_read is made. The code inside the loop in
> tls_wait_data is the slow path - maybe just move the check and the
> __unpause into this slow path?

Yeah, looking closer this problem can arise after we start as well :/

How about we move the __unpause code into the loop tho? Seems like this
could help with latency. Right now AFAICT we get a TCP socket ready
notification, which wakes the waiter for no good reason and makes
strparser queue its work; the work will then call tls_queue() ->
data_ready, waking the waiting thread a second time, this time with the
record actually parsed.

Did I get that right? Should the waiter not cancel the work on the
first wake-up and just kick off the parsing itself?
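
For context, the second wake-up described here comes from the strparser
work handing the parsed record to TLS; paraphrased from net/tls/tls_sw.c
of that era (comments added):

static void tls_queue(struct strparser *strp, struct sk_buff *skb)
{
        struct tls_context *tls_ctx = tls_get_ctx(strp->sk);
        struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);

        ctx->decrypted = 0;

        /* make the record visible to tls_wait_data() */
        ctx->recv_pkt = skb;
        /* the parser pauses itself after every parsed record */
        strp_pause(strp);

        /* second wake-up of the waiting thread */
        ctx->saved_data_ready(strp->sk);
}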
Vadim Fedorenko Nov. 18, 2020, 8:51 p.m. UTC | #6
On 18.11.2020 16:23, Jakub Kicinski wrote:
> On Wed, 18 Nov 2020 02:47:24 +0000 Vadim Fedorenko wrote:
>>>>>> This behavior differs from that of a plain TCP socket and forces
>>>>>> special-casing in user space. This patch unpauses the parser directly
>>>>>> if there is unparsed data in the TCP receive queue.
>>>>> Sure, but why is the parser paused? Does it pause itself on FIN?
>>>> No, it doesn't start even once. The trace looks like:
>>>>
>>>> tcp_recvmsg is called
>>>> tcp_recvmsg returns 1 (Change Cipher Spec record data)
>>>> tls_setsockopt is called
>>>> tls_setsockopt returns
>>>> tls_recvmsg is called
>>>> tls_recvmsg returns 0
>>>> __strp_recv is called
>>>> stack
>>>>            __strp_recv+1
>>>>            tcp_read_sock+169
>>>>            strp_read_sock+104
>>>>            strp_work+68
>>>>            process_one_work+436
>>>>            worker_thread+80
>>>>            kthread+276
>>>>            ret_from_fork+34
>>>> tls_read_size called
>>>>
>>>> So it looks like strp_work was scheduled after tls_recvmsg and
>>>> nothing triggered the parser because all the data was received before
>>>> tls_setsockopt ended the configuration process.
>>> Um. That makes me think we need to flush_work() on the strparser after
>>> we configure rx tls, no? Or __unpause at the right time instead of
>>> dealing with the async nature of strp_check_rcv()?
>> I'm not sure that flush_work() will do the right thing in this case, because:
>> 1. The work is idle after tls_sw_strparser_arm()
> Not sure what you mean by idle, it queues the work via strp_check_rcv().
>
>> 2. The strparser needs the socket lock to do its work - that could
>> deadlock, because setsockopt_conf already holds the socket lock. I'm not
>> sure it's a good idea to unlock the socket just to wait for the strparser.
> Ack, I meant in do_tls_setsockopt() after the lock is released.
>
>> The async nature of the parser is OK for the classic HTTPS server/client
>> case, because it's good to have the record parsed before the actual call
>> to recvmsg or splice_read is made. The code inside the loop in
>> tls_wait_data is the slow path - maybe just move the check and the
>> __unpause into this slow path?
> Yeah, looking closer this problem can arise after we start as well :/
>
> How about we move the __unpause code into the loop tho? Seems like this
> could help with latency. Right now AFAICT we get a TCP socket ready
> notification, which wakes the waiter for no good reason and makes
> strparser queue its work; the work will then call tls_queue() ->
> data_ready, waking the waiting thread a second time, this time with the
> record actually parsed.
>
> Did I get that right? Should the waiter not cancel the work on the
> first wake-up and just kick off the parsing itself?
I was thinking of the same solution too, but a simple check of the
emptiness of the socket's receive queue does not work when we have a
partial record in the queue - __unpause will return without changing
ctx->skb, and with data still in the socket queue we will loop blocked
until new data is received or strp_abort_strp fires because of the
timeout. If you could suggest a proper condition to break the loop,
that would be great.

Or maybe I misunderstood which loop you meant exactly?
Jakub Kicinski Nov. 18, 2020, 11:39 p.m. UTC | #7
On Wed, 18 Nov 2020 20:51:30 +0000 Vadim Fedorenko wrote:
> >> The async nature of the parser is OK for the classic HTTPS server/client
> >> case, because it's good to have the record parsed before the actual call
> >> to recvmsg or splice_read is made. The code inside the loop in
> >> tls_wait_data is the slow path - maybe just move the check and the
> >> __unpause into this slow path?
> > Yeah, looking closer this problem can arise after we start as well :/
> >
> > How about we move the __unpause code into the loop tho? Seems like this
> > could help with latency. Right now AFAICT we get a TCP socket ready
> > notification, which wakes the waiter for no good reason and makes
> > strparser queue its work; the work will then call tls_queue() ->
> > data_ready, waking the waiting thread a second time, this time with the
> > record actually parsed.
> >
> > Did I get that right? Should the waiter not cancel the work on the
> > first wake-up and just kick off the parsing itself?
> I was thinking of the same solution too, but a simple check of the
> emptiness of the socket's receive queue does not work when we have a
> partial record in the queue - __unpause will return without changing
> ctx->skb, and with data still in the socket queue we will loop blocked
> until new data is received or strp_abort_strp fires because of the
> timeout. If you could suggest a proper condition to break the loop,
> that would be great.
>
> Or maybe I misunderstood which loop you meant exactly?

Damn, you may be seeing some problem I'm missing again ;) Running
__unpause can be opportunistic; if it doesn't parse anything, that's
fine. I was thinking:

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 95ab5545a931..6478bd968506 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1295,6 +1295,10 @@ static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock,
                        return NULL;
                }
 
+               __strp_unpause(&ctx->strp);
+               if (ctx->recv_pkt)
+                       return ctx->recv_pkt;
+
                if (sk->sk_shutdown & RCV_SHUTDOWN)
                        return NULL;
 
Optionally it would be nice if unpause cancelled the work if it managed
to parse the record out.
Vadim Fedorenko Nov. 19, 2020, 12:26 a.m. UTC | #8
On 18.11.2020 23:39, Jakub Kicinski wrote:
> On Wed, 18 Nov 2020 20:51:30 +0000 Vadim Fedorenko wrote:
>>>> The async nature of the parser is OK for the classic HTTPS server/client
>>>> case, because it's good to have the record parsed before the actual call
>>>> to recvmsg or splice_read is made. The code inside the loop in
>>>> tls_wait_data is the slow path - maybe just move the check and the
>>>> __unpause into this slow path?
>>> Yeah, looking closer this problem can arise after we start as well :/
>>>
>>> How about we move the __unpause code into the loop tho? Seems like this
>>> could help with latency. Right now AFAICT we get a TCP socket ready
>>> notification, which wakes the waiter for no good reason and makes
>>> strparser queue its work; the work will then call tls_queue() ->
>>> data_ready, waking the waiting thread a second time, this time with the
>>> record actually parsed.
>>>
>>> Did I get that right? Should the waiter not cancel the work on the
>>> first wake-up and just kick off the parsing itself?
>> I was thinking of the same solution too, but a simple check of the
>> emptiness of the socket's receive queue does not work when we have a
>> partial record in the queue - __unpause will return without changing
>> ctx->skb, and with data still in the socket queue we will loop blocked
>> until new data is received or strp_abort_strp fires because of the
>> timeout. If you could suggest a proper condition to break the loop,
>> that would be great.
>>
>> Or maybe I misunderstood which loop you meant exactly?
> Damn, you may be seeing some problem I'm missing again ;) Running
> __unpause can be opportunistic; if it doesn't parse anything, that's
> fine. I was thinking:
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 95ab5545a931..6478bd968506 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1295,6 +1295,10 @@ static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock,
>                          return NULL;
>                  }
>   
> +               __strp_unpause(&ctx->strp);
> +               if (ctx->recv_pkt)
> +                       return ctx->recv_pkt;
> +
>                  if (sk->sk_shutdown & RCV_SHUTDOWN)
>                          return NULL;
>   
> Optionally it would be nice if unpause cancelled the work if it managed
> to parse the record out.
Oh, a simple and clean solution. But wouldn't it be better to unpause the
parser conditionally, only when there is something in the socket queue?
Otherwise the call just wastes cycles. Maybe like this:

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2fe9e2c..97c5f6e 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1295,6 +1295,12 @@ static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock,
                         return NULL;
                 }

+               if (!skb_queue_empty(&sk->sk_receive_queue)) {
+                       __strp_unpause(&ctx->strp);
+                       if (ctx->recv_pkt)
+                               return ctx->recv_pkt;
+               }
+
                 if (sk->sk_shutdown & RCV_SHUTDOWN)
                         return NULL;
Jakub Kicinski Nov. 19, 2020, 12:49 a.m. UTC | #9
On Thu, 19 Nov 2020 00:26:52 +0000 Vadim Fedorenko wrote:
> > Damn, you may be seeing some problem I'm missing again ;) Running
> > __unpause can be opportunistic; if it doesn't parse anything, that's
> > fine. I was thinking:
> >
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index 95ab5545a931..6478bd968506 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -1295,6 +1295,10 @@ static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock,
> >                          return NULL;
> >                  }
> >   
> > +               __strp_unpause(&ctx->strp);
> > +               if (ctx->recv_pkt)
> > +                       return ctx->recv_pkt;
> > +
> >                  if (sk->sk_shutdown & RCV_SHUTDOWN)
> >                          return NULL;
> >   
> > Optionally it would be nice if unpause cancelled the work if it managed
> > to parse the record out.
> Oh, a simple and clean solution. But wouldn't it be better to unpause the
> parser conditionally, only when there is something in the socket queue?
> Otherwise the call just wastes cycles. Maybe like this:
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 2fe9e2c..97c5f6e 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1295,6 +1295,12 @@ static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock,
>                          return NULL;
>                  }
> 
> +               if (!skb_queue_empty(&sk->sk_receive_queue)) {
> +                       __strp_unpause(&ctx->strp);
> +                       if (ctx->recv_pkt)
> +                               return ctx->recv_pkt;
> +               }
> +
>                  if (sk->sk_shutdown & RCV_SHUTDOWN)
>                          return NULL;
> 

LGTM!

Patch

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2fe9e2c..4db6943 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1289,6 +1289,9 @@  static struct sk_buff *tls_wait_data(struct sock *sk, struct sk_psock *psock,
 	struct sk_buff *skb;
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 
+	if (!ctx->recv_pkt && skb_queue_empty(&sk->sk_receive_queue))
+		__strp_unpause(&ctx->strp);
+
 	while (!(skb = ctx->recv_pkt) && sk_psock_queue_empty(psock)) {
 		if (sk->sk_err) {
 			*err = sock_error(sk);