diff mbox series

[v4,1/3] libceph: fail the sparse-read if there still has data in socket

Message ID 20240118105047.792879-2-xiubli@redhat.com (mailing list archive)
State New, archived
Headers show
Series libceph: fix sparse-read failure bug | expand

Commit Message

Xiubo Li Jan. 18, 2024, 10:50 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

Once this happens that means there have bugs.

URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 net/ceph/osd_client.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Jeff Layton Jan. 18, 2024, 2:03 p.m. UTC | #1
On Thu, 2024-01-18 at 18:50 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Once this happens that means there have bugs.
> 
> URL: https://tracker.ceph.com/issues/63586
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  net/ceph/osd_client.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 9be80d01c1dc..f8029b30a3fb 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5912,6 +5912,13 @@ static int osd_sparse_read(struct ceph_connection *con,
>  		fallthrough;
>  	case CEPH_SPARSE_READ_DATA:
>  		if (sr->sr_index >= count) {
> +			if (sr->sr_datalen) {
> +				pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
> +						    sr->sr_datalen, sr->sr_index,
> +						    count);
> +				return -EREMOTEIO;
> +			}
> +

Ok, so the server has (presumably) sent us a longer value for the
sr_datalen than was in the extent map?

Why should the sparse read engine care about that? It was (presumably)
able to do its job of handling the read. Why not just advance past the
extra junk and try to do another sparse read? Do we really need to fail
the op for this?

>  			sr->sr_state = CEPH_SPARSE_READ_HDR;
>  			goto next_op;
>  		}
> @@ -5919,6 +5926,8 @@ static int osd_sparse_read(struct ceph_connection *con,
>  		eoff = sr->sr_extent[sr->sr_index].off;
>  		elen = sr->sr_extent[sr->sr_index].len;
>  
> +		sr->sr_datalen -= elen;
> +
>  		dout("[%d] ext %d off 0x%llx len 0x%llx\n",
>  		     o->o_osd, sr->sr_index, eoff, elen);
>
Xiubo Li Jan. 19, 2024, 4:07 a.m. UTC | #2
On 1/18/24 22:03, Jeff Layton wrote:
> On Thu, 2024-01-18 at 18:50 +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Once this happens that means there have bugs.
>>
>> URL: https://tracker.ceph.com/issues/63586
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   net/ceph/osd_client.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 9be80d01c1dc..f8029b30a3fb 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -5912,6 +5912,13 @@ static int osd_sparse_read(struct ceph_connection *con,
>>   		fallthrough;
>>   	case CEPH_SPARSE_READ_DATA:
>>   		if (sr->sr_index >= count) {
>> +			if (sr->sr_datalen) {
>> +				pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
>> +						    sr->sr_datalen, sr->sr_index,
>> +						    count);
>> +				return -EREMOTEIO;
>> +			}
>> +
> Ok, so the server has (presumably) sent us a longer value for the
> sr_datalen than was in the extent map?
>
> Why should the sparse read engine care about that? It was (presumably)
> able to do its job of handling the read. Why not just advance past the
> extra junk and try to do another sparse read? Do we really need to fail
> the op for this?

Hi Jeff,

I saw the problem just when I first debugging the sparse-read bug, and 
the length will be very large, more detail and the logs please see the 
tracker https://tracker.ceph.com/issues/63586:

      11741055 <4>[180940.606488] libceph: sr_datalen 251723776 sr_index 
0 count 0

In this case the request could cause the same request being retrying 
infinitely. While just in other case the when the ceph send a incorrect 
data length, how should we do ? Should we retry it ? How could we skip 
it if the length is so large ?

Thanks

- Xiubo


>
>>   			sr->sr_state = CEPH_SPARSE_READ_HDR;
>>   			goto next_op;
>>   		}
>> @@ -5919,6 +5926,8 @@ static int osd_sparse_read(struct ceph_connection *con,
>>   		eoff = sr->sr_extent[sr->sr_index].off;
>>   		elen = sr->sr_extent[sr->sr_index].len;
>>   
>> +		sr->sr_datalen -= elen;
>> +
>>   		dout("[%d] ext %d off 0x%llx len 0x%llx\n",
>>   		     o->o_osd, sr->sr_index, eoff, elen);
>>
Jeff Layton Jan. 19, 2024, 11:03 a.m. UTC | #3
On Fri, 2024-01-19 at 12:07 +0800, Xiubo Li wrote:
> On 1/18/24 22:03, Jeff Layton wrote:
> > On Thu, 2024-01-18 at 18:50 +0800, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > Once this happens that means there have bugs.
> > > 
> > > URL: https://tracker.ceph.com/issues/63586
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   net/ceph/osd_client.c | 9 +++++++++
> > >   1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > > index 9be80d01c1dc..f8029b30a3fb 100644
> > > --- a/net/ceph/osd_client.c
> > > +++ b/net/ceph/osd_client.c
> > > @@ -5912,6 +5912,13 @@ static int osd_sparse_read(struct ceph_connection *con,
> > >   		fallthrough;
> > >   	case CEPH_SPARSE_READ_DATA:
> > >   		if (sr->sr_index >= count) {
> > > +			if (sr->sr_datalen) {
> > > +				pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
> > > +						    sr->sr_datalen, sr->sr_index,
> > > +						    count);
> > > +				return -EREMOTEIO;
> > > +			}
> > > +
> > Ok, so the server has (presumably) sent us a longer value for the
> > sr_datalen than was in the extent map?
> > 
> > Why should the sparse read engine care about that? It was (presumably)
> > able to do its job of handling the read. Why not just advance past the
> > extra junk and try to do another sparse read? Do we really need to fail
> > the op for this?
> 
> Hi Jeff,
> 
> I saw the problem just when I first debugging the sparse-read bug, and 
> the length will be very large, more detail and the logs please see the 
> tracker https://tracker.ceph.com/issues/63586:
> 
>       11741055 <4>[180940.606488] libceph: sr_datalen 251723776 sr_index 
> 0 count 0
> 
> In this case the request could cause the same request being retrying 
> infinitely. While just in other case the when the ceph send a incorrect 
> data length, how should we do ? Should we retry it ? How could we skip 
> it if the length is so large ?
> 
> 

If the sparse_read datalen is so long that it goes beyond the length of
the entire frame, then it's clearly malformed and we have to reject it.

I do wonder though whether this is the right place to do that. It seems
like the client ought to do that sort of vetting of lengths before it
starts dealing with the read data.

IOW, maybe there should be a simple check in the
CEPH_SPARSE_READ_DATA_LEN case that validates that the sr_datalen fits
inside the "data_len" of the entire frame?

> 
> 
> > 
> > >   			sr->sr_state = CEPH_SPARSE_READ_HDR;
> > >   			goto next_op;
> > >   		}
> > > @@ -5919,6 +5926,8 @@ static int osd_sparse_read(struct ceph_connection *con,
> > >   		eoff = sr->sr_extent[sr->sr_index].off;
> > >   		elen = sr->sr_extent[sr->sr_index].len;
> > >   
> > > +		sr->sr_datalen -= elen;
> > > +
> > >   		dout("[%d] ext %d off 0x%llx len 0x%llx\n",
> > >   		     o->o_osd, sr->sr_index, eoff, elen);
> > >   
>
Xiubo Li Jan. 22, 2024, 3:17 a.m. UTC | #4
On 1/19/24 19:03, Jeff Layton wrote:
> On Fri, 2024-01-19 at 12:07 +0800, Xiubo Li wrote:
>> On 1/18/24 22:03, Jeff Layton wrote:
>>> On Thu, 2024-01-18 at 18:50 +0800, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> Once this happens that means there have bugs.
>>>>
>>>> URL: https://tracker.ceph.com/issues/63586
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    net/ceph/osd_client.c | 9 +++++++++
>>>>    1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>>> index 9be80d01c1dc..f8029b30a3fb 100644
>>>> --- a/net/ceph/osd_client.c
>>>> +++ b/net/ceph/osd_client.c
>>>> @@ -5912,6 +5912,13 @@ static int osd_sparse_read(struct ceph_connection *con,
>>>>    		fallthrough;
>>>>    	case CEPH_SPARSE_READ_DATA:
>>>>    		if (sr->sr_index >= count) {
>>>> +			if (sr->sr_datalen) {
>>>> +				pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
>>>> +						    sr->sr_datalen, sr->sr_index,
>>>> +						    count);
>>>> +				return -EREMOTEIO;
>>>> +			}
>>>> +
>>> Ok, so the server has (presumably) sent us a longer value for the
>>> sr_datalen than was in the extent map?
>>>
>>> Why should the sparse read engine care about that? It was (presumably)
>>> able to do its job of handling the read. Why not just advance past the
>>> extra junk and try to do another sparse read? Do we really need to fail
>>> the op for this?
>> Hi Jeff,
>>
>> I saw the problem just when I first debugging the sparse-read bug, and
>> the length will be very large, more detail and the logs please see the
>> tracker https://tracker.ceph.com/issues/63586:
>>
>>        11741055 <4>[180940.606488] libceph: sr_datalen 251723776 sr_index
>> 0 count 0
>>
>> In this case the request could cause the same request being retrying
>> infinitely. While just in other case the when the ceph send a incorrect
>> data length, how should we do ? Should we retry it ? How could we skip
>> it if the length is so large ?
>>
>>
> If the sparse_read datalen is so long that it goes beyond the length of
> the entire frame, then it's clearly malformed and we have to reject it.
>
> I do wonder though whether this is the right place to do that. It seems
> like the client ought to do that sort of vetting of lengths before it
> starts dealing with the read data.
>
> IOW, maybe there should be a simple check in the
> CEPH_SPARSE_READ_DATA_LEN case that validates that the sr_datalen fits
> inside the "data_len" of the entire frame?

Well it should be in CEPH_SPARSE_READ_DATA_PRE instead.

I can improve this.

Thanks

- Xiubo

>>
>>>>    			sr->sr_state = CEPH_SPARSE_READ_HDR;
>>>>    			goto next_op;
>>>>    		}
>>>> @@ -5919,6 +5926,8 @@ static int osd_sparse_read(struct ceph_connection *con,
>>>>    		eoff = sr->sr_extent[sr->sr_index].off;
>>>>    		elen = sr->sr_extent[sr->sr_index].len;
>>>>    
>>>> +		sr->sr_datalen -= elen;
>>>> +
>>>>    		dout("[%d] ext %d off 0x%llx len 0x%llx\n",
>>>>    		     o->o_osd, sr->sr_index, eoff, elen);
>>>>
diff mbox series

Patch

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 9be80d01c1dc..f8029b30a3fb 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5912,6 +5912,13 @@  static int osd_sparse_read(struct ceph_connection *con,
 		fallthrough;
 	case CEPH_SPARSE_READ_DATA:
 		if (sr->sr_index >= count) {
+			if (sr->sr_datalen) {
+				pr_warn_ratelimited("sr_datalen %u sr_index %d count %u\n",
+						    sr->sr_datalen, sr->sr_index,
+						    count);
+				return -EREMOTEIO;
+			}
+
 			sr->sr_state = CEPH_SPARSE_READ_HDR;
 			goto next_op;
 		}
@@ -5919,6 +5926,8 @@  static int osd_sparse_read(struct ceph_connection *con,
 		eoff = sr->sr_extent[sr->sr_index].off;
 		elen = sr->sr_extent[sr->sr_index].len;
 
+		sr->sr_datalen -= elen;
+
 		dout("[%d] ext %d off 0x%llx len 0x%llx\n",
 		     o->o_osd, sr->sr_index, eoff, elen);