diff mbox

Btrfs: use larger limit for transition of logical to inode

Message ID 1345712189-6455-2-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Liu Bo Aug. 23, 2012, 8:56 a.m. UTC
This is the change of the kernel side.

Transition of logical to inode used to have a limit 4096 on inode container's
size, but the limit is not large enough for a data with a great many of refs,
so when resolving logical address, we can end up with
"ioctl ret=0, bytes_left=0, bytes_missing=19944, cnt=510, missed=2493"

This changes to regard 4096 as the lowest limit.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/ioctl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Jan Schmidt Aug. 23, 2012, 10:07 a.m. UTC | #1
On Thu, August 23, 2012 at 10:56 (+0200), Liu Bo wrote:
> This is the change of the kernel side.
> 
> Transition of logical to inode used to have a limit 4096 on inode container's
> size, but the limit is not large enough for a data with a great many of refs,
> so when resolving logical address, we can end up with
> "ioctl ret=0, bytes_left=0, bytes_missing=19944, cnt=510, missed=2493"
> 
> This changes to regard 4096 as the lowest limit.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/ioctl.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 9449b84..525915f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3232,7 +3232,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
>  		goto out;
>  	}
>  
> -	size = min_t(u32, loi->size, 4096);
> +	size = max_t(u32, loi->size, 4096);

Hum. I added this because I wanted to avoid allocations > PAGE_SIZE. We're doing
kmalloc GFP_NOFS with whatever one enters as size, I'm not sure that's a good
idea without any sanitizing.

Second, we should probably add a fall back option to vmalloc, in case kmalloc
fails? Or should we even go for vmalloc directly, what do you think?

Thanks,
-Jan

>  	inodes = init_data_container(size);
>  	if (IS_ERR(inodes)) {
>  		ret = PTR_ERR(inodes);

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Aug. 23, 2012, 10:24 a.m. UTC | #2
On 08/23/2012 06:07 PM, Jan Schmidt wrote:
> On Thu, August 23, 2012 at 10:56 (+0200), Liu Bo wrote:
>> This is the change of the kernel side.
>>
>> Transition of logical to inode used to have a limit 4096 on inode container's
>> size, but the limit is not large enough for a data with a great many of refs,
>> so when resolving logical address, we can end up with
>> "ioctl ret=0, bytes_left=0, bytes_missing=19944, cnt=510, missed=2493"
>>
>> This changes to regard 4096 as the lowest limit.
>>
>> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
>> ---
>>  fs/btrfs/ioctl.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 9449b84..525915f 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -3232,7 +3232,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
>>  		goto out;
>>  	}
>>  
>> -	size = min_t(u32, loi->size, 4096);
>> +	size = max_t(u32, loi->size, 4096);
> 
> Hum. I added this because I wanted to avoid allocations > PAGE_SIZE. We're doing
> kmalloc GFP_NOFS with whatever one enters as size, I'm not sure that's a good
> idea without any sanitizing.
> 

Yeah, I agree.

So we do need to make some sanity checks, according to my tests,
we need about 30k to resolve a file shared by 4000 snapshots.

What about 32k as a upside limit?

> Second, we should probably add a fall back option to vmalloc, in case kmalloc
> fails? Or should we even go for vmalloc directly, what do you think?
>

Given loi->size is not reliable, going for vmalloc for an ioctl is reasonable.

thanks,
liubo

> Thanks,
> -Jan
> 
>>  	inodes = init_data_container(size);
>>  	if (IS_ERR(inodes)) {
>>  		ret = PTR_ERR(inodes);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Aug. 23, 2012, 10:30 a.m. UTC | #3
On 08/23/2012 06:24 PM, Liu Bo wrote:
> On 08/23/2012 06:07 PM, Jan Schmidt wrote:
>> On Thu, August 23, 2012 at 10:56 (+0200), Liu Bo wrote:
>>> This is the change of the kernel side.
>>>
>>> Transition of logical to inode used to have a limit 4096 on inode container's
>>> size, but the limit is not large enough for a data with a great many of refs,
>>> so when resolving logical address, we can end up with
>>> "ioctl ret=0, bytes_left=0, bytes_missing=19944, cnt=510, missed=2493"
>>>
>>> This changes to regard 4096 as the lowest limit.
>>>
>>> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
>>> ---
>>>  fs/btrfs/ioctl.c |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index 9449b84..525915f 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -3232,7 +3232,7 @@ static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
>>>  		goto out;
>>>  	}
>>>  
>>> -	size = min_t(u32, loi->size, 4096);
>>> +	size = max_t(u32, loi->size, 4096);
>>
>> Hum. I added this because I wanted to avoid allocations > PAGE_SIZE. We're doing
>> kmalloc GFP_NOFS with whatever one enters as size, I'm not sure that's a good
>> idea without any sanitizing.
>>
> 
> Yeah, I agree.
> 
> So we do need to make some sanity checks, according to my tests,
> we need about 30k to resolve a file shared by 4000 snapshots.
> 

s/4000/2000/g

> What about 32k as a upside limit?
> 
>> Second, we should probably add a fall back option to vmalloc, in case kmalloc
>> fails? Or should we even go for vmalloc directly, what do you think?
>>
> 
> Given loi->size is not reliable, going for vmalloc for an ioctl is reasonable.
> 
> thanks,
> liubo
> 
>> Thanks,
>> -Jan
>>
>>>  	inodes = init_data_container(size);
>>>  	if (IS_ERR(inodes)) {
>>>  		ret = PTR_ERR(inodes);
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Aug. 24, 2012, 4:46 p.m. UTC | #4
On Thu, Aug 23, 2012 at 06:24:02PM +0800, Liu Bo wrote:
> > Hum. I added this because I wanted to avoid allocations > PAGE_SIZE. We're doing
> > kmalloc GFP_NOFS with whatever one enters as size, I'm not sure that's a good
> > idea without any sanitizing.
> 
> Yeah, I agree.
> 
> So we do need to make some sanity checks, according to my tests,
> we need about 30k to resolve a file shared by 4000 snapshots.
> 
> What about 32k as a upside limit?

64k? As it's an upper limit, let's make it a bit higher than what one
can reach normally.

> > Second, we should probably add a fall back option to vmalloc, in case kmalloc
> > fails? Or should we even go for vmalloc directly, what do you think?
> 
> Given loi->size is not reliable, going for vmalloc for an ioctl is reasonable.

Yeah, vmalloc is ok here, we can safely afford to return ENOMEM.

AFAICS, we could possibly reuse the userspace buffer, iff build_ino_list
uses copy_to_user instead of directly writing to the data buffer. This
would eliminate the kernel-side ENOMEM completely at the cost of
function call overhead and the access_ok checks. And compared to the
simplicity of just vmalloc with rare occurence of ENOMEM, I don't think
it's worth the work.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9449b84..525915f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3232,7 +3232,7 @@  static long btrfs_ioctl_logical_to_ino(struct btrfs_root *root,
 		goto out;
 	}
 
-	size = min_t(u32, loi->size, 4096);
+	size = max_t(u32, loi->size, 4096);
 	inodes = init_data_container(size);
 	if (IS_ERR(inodes)) {
 		ret = PTR_ERR(inodes);