Re: [PATCH-v2 2/2] Initialize mempool and elevator only for request-based dm devices

Message ID 200908111435.12020.knikanth@suse.de (mailing list archive)
State Superseded, archived

Commit Message

Nikanth Karthikesan Aug. 11, 2009, 9:05 a.m. UTC
On Tuesday 11 August 2009 13:36:24 Kiyoshi Ueda wrote:
> Hi Nikanth,
>
> On 08/10/2009 07:48 PM +0900, Nikanth Karthikesan wrote:
> > Initialize the request_queue and elevator only when the device is marked
> > as a request-based device. This avoids unnecessary creation of the mempool
> > for requests. We also wrongly initialize the elevator even for bio-based
> > devices. As /sys/block/dm-*/queue/scheduler is exported for
> > device-mapper devices, users can be confused by scheduler options
> > on bio-based devices, where the scheduler is not used at all.
>
> Thank you for working on this.
> Actually, I had tried this delayed allocation thing before,
> but I chose the current implementation since I couldn't solve
> some problems, which your patch also has.
> Please see my comment below.
>

Thanks for the review & comments.

> > @@ -2203,6 +2199,25 @@ int dm_swap_table(struct mapped_device *md, struct
> > dm_table *table) goto out;
> >  	}
> >
> > +	/* new device is being marked as request-based */
> > +	if (!md->map && dm_table_request_based(table)) {
> > +		/* initialize queue for request-based dm */
> > +		r = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> > +		if (r)
> > +			goto out;
>
> Generally, dm must not allocate memory during resume because
> it may cause a deadlock in no memory situation.
> However, there is no I/O on this device at this point,
> so the allocation should be ok for this special case.
> I think some comments are needed here to describe that.
>

Ok. This comment can be added.

> > +
> > +		/*
> > +		 * reinitialize make_request_fn as it was reset to the
> > +		 * default __make_request by blk_init_allocate_queue
> > +		 */
> > +		md->saved_make_request_fn = md->queue->make_request_fn;
> > +		blk_queue_make_request(md->queue, dm_request);
> > +
> > +		blk_queue_softirq_done(md->queue, dm_softirq_done);
> > +		blk_queue_prep_rq(md->queue, dm_prep_fn);
> > +		blk_queue_lld_busy(md->queue, dm_lld_busy);
> > +	}
> > +
> >  	__unbind(md);
> >  	r = __bind(md, table, &limits);
>
> The queue has been registered at the device creation time by
> add_disk() in alloc_dev().
> Since the queue is reconfigured (elevator is attached), you have to
> update the queue registration (e.g. unregister, then re-register).
> But it may not be easy.  At least, there is no exported interface to
> unregister/re-register queue.

Ah, yes. The scheduler attributes will not be exported in 
/sys/block/dm*/queue/iosched. Exporting elv_register_queue() and calling it 
here solves it. Something like..

@@ -2203,6 +2199,29 @@ int dm_swap_table(struct mapped_device *md, struct dm_table *table)
 		goto out;
 	}
 
+	/* new device is being marked as request-based */
+	if (!md->map && dm_table_request_based(table)) {
+		/* initialize queue for request-based dm */
+		r = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
+		if (r)
+			goto out;
+
+		r = elv_register_queue(md->queue);
+		/* if (r)
+		 *	 goto out; Better to ignore, just like add_disk does ;-)
+		 */
+		/*
+		 * reinitialize make_request_fn as it was reset to the
+		 * default __make_request by blk_init_allocated_queue
+		 */
+		md->saved_make_request_fn = md->queue->make_request_fn;
+		blk_queue_make_request(md->queue, dm_request);
+
+		blk_queue_softirq_done(md->queue, dm_softirq_done);
+		blk_queue_prep_rq(md->queue, dm_prep_fn);
+		blk_queue_lld_busy(md->queue, dm_lld_busy);
+	}
+
 	__unbind(md);
 	r = __bind(md, table, &limits);
 
I will post v3 of the patches with this change. Do you see any problems
with this?

This can also be solved by initializing the queue for new devices and then
unregistering the elevator at table load time if the device turns out to be
bio-based. Like...

 out:

But I think delaying the initialization is the best solution.
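
To make the "initialize first, tear down at table load" alternative concrete,
here is a rough userspace sketch. It is only a model under stated assumptions:
struct model_queue, model_queue_init_full() and model_first_table_load() are
hypothetical names, and malloc()/free() stand in for the real elevator and
mempool setup. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the parts of request_queue discussed here. */
struct model_queue {
	void *elevator;	/* models q->elevator */
	void *rq_pool;	/* models q->rq.rq_pool */
};

/* Device creation: allocate everything, whatever the device type turns out to be. */
static int model_queue_init_full(struct model_queue *q)
{
	assert(q != NULL);
	q->elevator = malloc(16);
	q->rq_pool = malloc(64);
	if (!q->elevator || !q->rq_pool) {
		free(q->elevator);
		free(q->rq_pool);
		q->elevator = NULL;
		q->rq_pool = NULL;
		return -1;
	}
	return 0;
}

/* First table load: release what a bio-based device will never use. */
static void model_first_table_load(struct model_queue *q, bool request_based)
{
	assert(q != NULL);
	if (request_based)
		return;	/* request-based devices keep both resources */

	free(q->elevator);
	q->elevator = NULL;
	free(q->rq_pool);
	q->rq_pool = NULL;
}
```

In this model a bio-based device still pays for the allocation once and frees
it later, whereas delayed initialization never allocates at all.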

Thanks
Nikanth

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Kiyoshi Ueda Aug. 12, 2009, 2:15 a.m. UTC | #1
Hi Nikanth,

On 08/11/2009 06:05 PM +0900, Nikanth Karthikesan wrote:
> On Tuesday 11 August 2009 13:36:24 Kiyoshi Ueda wrote:
>> On 08/10/2009 07:48 PM +0900, Nikanth Karthikesan wrote:
>>> +
>>> +		/*
>>> +		 * reinitialize make_request_fn as it was reset to the
>>> +		 * default __make_request by blk_init_allocate_queue
>>> +		 */
>>> +		md->saved_make_request_fn = md->queue->make_request_fn;
>>> +		blk_queue_make_request(md->queue, dm_request);
>>> +
>>> +		blk_queue_softirq_done(md->queue, dm_softirq_done);
>>> +		blk_queue_prep_rq(md->queue, dm_prep_fn);
>>> +		blk_queue_lld_busy(md->queue, dm_lld_busy);
>>> +	}
>>> +
>>>  	__unbind(md);
>>>  	r = __bind(md, table, &limits);
>> The queue has been registered at the device creation time by
>> add_disk() in alloc_dev().
>> Since the queue is reconfigured (elevator is attached), you have to
>> update the queue registration (e.g. unregister, then re-register).
>> But it may not be easy.  At least, there is no exported interface to
>> unregister/re-register queue.
> 
> Ah, yes. The scheduler attributes will not be exported in 
> /sys/block/dm*/queue/iosched. Exporting elv_register_queue() and calling it 
> here solves it. Something like..
> 
> @@ -2203,6 +2199,29 @@ int dm_swap_table(struct mapped_device *md, struct 
> dm_table *table)
>  		goto out;
>  	}
>  
> +	/* new device is being marked as request-based */
> +	if (!md->map && dm_table_request_based(table)) {
> +		/* initialize queue for request-based dm */
> +		r = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> +		if (r)
> +			goto out;
> +
> +		r = elv_register_queue(md->queue);
> +		/* if (r)
> +		 *	 goto out; Better to ignore, just like add_disk does ;-)
> +		 */
> +		/*
> +		 * reinitialize make_request_fn as it was reset to the
> +		 * default __make_request by blk_init_allocate_queue
> +		 */
> +		md->saved_make_request_fn = md->queue->make_request_fn;
> +		blk_queue_make_request(md->queue, dm_request);
> +
> +		blk_queue_softirq_done(md->queue, dm_softirq_done);
> +		blk_queue_prep_rq(md->queue, dm_prep_fn);
> +		blk_queue_lld_busy(md->queue, dm_lld_busy);
> +	}
> +
>  	__unbind(md);
>  	r = __bind(md, table, &limits);
>  
> I would post the v3 of the patches with this change. Do you see any problems 
> in this?

Humm, it might work for now, but I disagree with that.

Since elevator is block internal and dm doesn't really care
(its initialization is actually hidden in blk_init_allocated_queue()),
directly calling elv_register_queue() from dm seems not right.
It will likely introduce a bug by future changes in block layer.

I think the right approach is to define a proper block layer interface
to reflect the queue configuration change.
That's why I said "Updating the queue registration may not be easy".

Thanks,
Kiyoshi Ueda

Nikanth Karthikesan Aug. 12, 2009, 8:47 a.m. UTC | #2
Hi Kiyoshi Ueda,

On Wednesday 12 August 2009 07:45:56 Kiyoshi Ueda wrote:
> Hi Nikanth,
>
> On 08/11/2009 06:05 PM +0900, Nikanth Karthikesan wrote:
> > On Tuesday 11 August 2009 13:36:24 Kiyoshi Ueda wrote:
> >> On 08/10/2009 07:48 PM +0900, Nikanth Karthikesan wrote:
> >>> +
> >>> +		/*
> >>> +		 * reinitialize make_request_fn as it was reset to the
> >>> +		 * default __make_request by blk_init_allocate_queue
> >>> +		 */
> >>> +		md->saved_make_request_fn = md->queue->make_request_fn;
> >>> +		blk_queue_make_request(md->queue, dm_request);
> >>> +
> >>> +		blk_queue_softirq_done(md->queue, dm_softirq_done);
> >>> +		blk_queue_prep_rq(md->queue, dm_prep_fn);
> >>> +		blk_queue_lld_busy(md->queue, dm_lld_busy);
> >>> +	}
> >>> +
> >>>  	__unbind(md);
> >>>  	r = __bind(md, table, &limits);
> >>
> >> The queue has been registered at the device creation time by
> >> add_disk() in alloc_dev().
> >> Since the queue is reconfigured (elevator is attached), you have to
> >> update the queue registration (e.g. unregister, then re-register).
> >> But it may not be easy.  At least, there is no exported interface to
> >> unregister/re-register queue.
> >
> > Ah, yes. The scheduler attributes will not be exported in
> > /sys/block/dm*/queue/iosched. Exporting elv_register_queue() and calling
> > it here solves it. Something like..
> >
> > @@ -2203,6 +2199,29 @@ int dm_swap_table(struct mapped_device *md, struct
> > dm_table *table)
> >  		goto out;
> >  	}
> >
> > +	/* new device is being marked as request-based */
> > +	if (!md->map && dm_table_request_based(table)) {
> > +		/* initialize queue for request-based dm */
> > +		r = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
> > +		if (r)
> > +			goto out;
> > +
> > +		r = elv_register_queue(md->queue);
> > +		/* if (r)
> > +		 *	 goto out; Better to ignore, just like add_disk does ;-)
> > +		 */
> > +		/*
> > +		 * reinitialize make_request_fn as it was reset to the
> > +		 * default __make_request by blk_init_allocate_queue
> > +		 */
> > +		md->saved_make_request_fn = md->queue->make_request_fn;
> > +		blk_queue_make_request(md->queue, dm_request);
> > +
> > +		blk_queue_softirq_done(md->queue, dm_softirq_done);
> > +		blk_queue_prep_rq(md->queue, dm_prep_fn);
> > +		blk_queue_lld_busy(md->queue, dm_lld_busy);
> > +	}
> > +
> >  	__unbind(md);
> >  	r = __bind(md, table, &limits);
> >
> > I would post the v3 of the patches with this change. Do you see any
> > problems in this?
>
> Humm, it might work for now, but I disagree with that.
>
> Since elevator is block internal and dm doesn't really care
> (its initialization is actually hidden in blk_init_allocated_queue()),
> directly calling elv_register_queue() from dm seems not right.
> It will likely introduce a bug by future changes in block layer.
>
> I think the right approach is to define a proper block layer interface
> to reflect the queue configuration change.
> That's why I said "Updating the queue registration may not be easy".
>

I do not see much future overhead with this approach, at least no more than
with a "proper block layer interface". IMHO, unregistering the queue and
registering it again with the elevator basically wastes CPU cycles and could
confuse user-space, which may be watching sysfs. The other option is asking
the block layer to recheck and find what we have changed in the
request_queue. That does not sound like the best solution.

It is better to tell the block-layer that we have added a q->request_fn 
function, so initialize the elevator.

If the block layer exports elv_register_queue() and documents it, it would
become a proper block layer interface, right? Device-mapper will always depend
on block-layer internals to some extent. ;-)

Thanks
Nikanth

Kiyoshi Ueda Aug. 14, 2009, 7:01 a.m. UTC | #3
Hi Nikanth,

On 08/12/2009 05:47 PM +0900, Nikanth Karthikesan wrote:
> Hi Kiyoshi Ueda,
> 
> On Wednesday 12 August 2009 07:45:56 Kiyoshi Ueda wrote:
>> Hi Nikanth,
>>
>> On 08/11/2009 06:05 PM +0900, Nikanth Karthikesan wrote:
>>> On Tuesday 11 August 2009 13:36:24 Kiyoshi Ueda wrote:
>>>> On 08/10/2009 07:48 PM +0900, Nikanth Karthikesan wrote:
>>>>> +
>>>>> +		/*
>>>>> +		 * reinitialize make_request_fn as it was reset to the
>>>>> +		 * default __make_request by blk_init_allocate_queue
>>>>> +		 */
>>>>> +		md->saved_make_request_fn = md->queue->make_request_fn;
>>>>> +		blk_queue_make_request(md->queue, dm_request);
>>>>> +
>>>>> +		blk_queue_softirq_done(md->queue, dm_softirq_done);
>>>>> +		blk_queue_prep_rq(md->queue, dm_prep_fn);
>>>>> +		blk_queue_lld_busy(md->queue, dm_lld_busy);
>>>>> +	}
>>>>> +
>>>>>  	__unbind(md);
>>>>>  	r = __bind(md, table, &limits);
>>>> The queue has been registered at the device creation time by
>>>> add_disk() in alloc_dev().
>>>> Since the queue is reconfigured (elevator is attached), you have to
>>>> update the queue registration (e.g. unregister, then re-register).
>>>> But it may not be easy.  At least, there is no exported interface to
>>>> unregister/re-register queue.
>>> Ah, yes. The scheduler attributes will not be exported in
>>> /sys/block/dm*/queue/iosched. Exporting elv_register_queue() and calling
>>> it here solves it. Something like..
>>>
>>> @@ -2203,6 +2199,29 @@ int dm_swap_table(struct mapped_device *md, struct
>>> dm_table *table)
>>>  		goto out;
>>>  	}
>>>
>>> +	/* new device is being marked as request-based */
>>> +	if (!md->map && dm_table_request_based(table)) {
>>> +		/* initialize queue for request-based dm */
>>> +		r = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
>>> +		if (r)
>>> +			goto out;
>>> +
>>> +		r = elv_register_queue(md->queue);
>>> +		/* if (r)
>>> +		 *	 goto out; Better to ignore, just like add_disk does ;-)
>>> +		 */
>>> +		/*
>>> +		 * reinitialize make_request_fn as it was reset to the
>>> +		 * default __make_request by blk_init_allocate_queue
>>> +		 */
>>> +		md->saved_make_request_fn = md->queue->make_request_fn;
>>> +		blk_queue_make_request(md->queue, dm_request);
>>> +
>>> +		blk_queue_softirq_done(md->queue, dm_softirq_done);
>>> +		blk_queue_prep_rq(md->queue, dm_prep_fn);
>>> +		blk_queue_lld_busy(md->queue, dm_lld_busy);
>>> +	}
>>> +
>>>  	__unbind(md);
>>>  	r = __bind(md, table, &limits);
>>>
>>> I would post the v3 of the patches with this change. Do you see any
>>> problems in this?
>> Humm, it might work for now, but I disagree with that.
>>
>> Since elevator is block internal and dm doesn't really care
>> (its initialization is actually hidden in blk_init_allocated_queue()),
>> directly calling elv_register_queue() from dm seems not right.
>> It will likely introduce a bug by future changes in block layer.
>>
>> I think the right approach is to define a proper block layer interface
>> to reflect the queue configuration change.
>> That's why I said "Updating the queue registration may not be easy".
> 
> I do not see too much of overhead in the future with this approach,
> atleast no more than "proper block layer interface".

I don't think so.
Just exporting elv_register_queue() will cause some maintenance costs
to request-based dm developers as below.

Although the elevator is currently the only queue feature that is
needed only by request-based dm, other such features may be added
to the queue in the future.
Then the developer who adds the feature may not notice that
request-based dm needs to register it here, as long as nothing
critical (e.g. a compile error or panic) happens without it.
As a result, such features would be missing only in request-based dm.
Therefore, request-based dm developers would always have to watch
block-layer changes and write the corresponding registration code.
I think that is a fairly big maintenance cost.

So we should write the code so that block-layer changes take effect
automatically in request-based dm, too, as much as possible.
In this case, you should make/call an interface for the whole queue,
not only for the elevator, since dm can't/shouldn't know how
blk_init_allocated_queue() initializes the queue.
(And the interface should be used in other generic paths, e.g. add_disk().)
That's the proper block-layer interface which I mentioned, and this
approach should have less overhead than yours in the long run.
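
The difference between the two approaches can be illustrated with a small
userspace model. Everything in it is an assumption made for illustration:
the model_* names are hypothetical, MODEL_FEAT_FUTURE stands for some queue
feature that does not exist today, and no such whole-queue interface exists
in the block layer yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue features; MODEL_FEAT_FUTURE is one added later. */
enum { MODEL_FEAT_ELEVATOR, MODEL_FEAT_FUTURE, MODEL_NR_FEATS };

struct model_queue {
	bool configured[MODEL_NR_FEATS];	/* set up by queue initialization */
	bool registered[MODEL_NR_FEATS];	/* visible to user-space, e.g. sysfs */
};

/* Block-layer side: only this function knows which features a queue gets. */
static void model_blk_init_queue(struct model_queue *q)
{
	assert(q != NULL);
	for (size_t i = 0; i < MODEL_NR_FEATS; i++)
		q->configured[i] = true;
}

/*
 * Block-layer side: whole-queue interface. Registers every feature that
 * is configured but not yet registered; returns how many were added.
 */
static int model_blk_update_registration(struct model_queue *q)
{
	int added = 0;

	assert(q != NULL);
	for (size_t i = 0; i < MODEL_NR_FEATS; i++) {
		if (q->configured[i] && !q->registered[i]) {
			q->registered[i] = true;
			added++;
		}
	}
	return added;
}

/* dm side, per-feature approach: dm only knows about the elevator. */
static void model_dm_register_elevator_only(struct model_queue *q)
{
	assert(q != NULL);
	if (q->configured[MODEL_FEAT_ELEVATOR])
		q->registered[MODEL_FEAT_ELEVATOR] = true;
}
```

With model_dm_register_elevator_only(), the later feature silently stays
unregistered until someone updates dm. With model_blk_update_registration(),
dm picks it up without any change on the dm side.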


> IMHO, unregistering the queue and registering the queue again with
> the elevator, is basically wasting CPU cycles and possibly would
> confuse the user-space, which may be watching the sysfs... 

Right, so I said "Updating may not be easy."
(By the way, wasting CPU cycles doesn't matter here, since it happens
 only when we initialize the device and it shouldn't be too much.)


> Or asking block layer to recheck and find what we have changed
> in the request_queue. It does not sound like the best solution.

I think this is a better solution than exposing a part of queue
internals as I described above.


> It is better to tell the block-layer that we have added a q->request_fn 
> function, so initialize the elevator.

I don't think it's better as I described above.
(dm can't/shouldn't know how blk_init_allocated_queue() initializes
 the queue.)



By the way, another approach to optimizing the memory usage would be
to determine whether the dm device is bio-based or request-based
at the device creation time, instead of the table binding time.
We want the delayed allocation, since kernel can't decide the device
type until the first table is bound because of the auto-detection
mechanism.  The auto-detection is good for keeping compatibility with
existing user-space tools.  But once user-space tools are changed to
specify device type at the device creation time, we can eventually
remove the auto-detection.
Then, kernel can decide device type in alloc_dev(), so
the initialization code in kernel will become very simple.
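
As a rough illustration of how simple creation-time typing could be, here is
a userspace sketch. It is not the code from the postings referenced in this
mail: enum model_dm_type, struct model_mapped_device, model_alloc_dev() and
model_free_dev() are hypothetical names, and malloc() stands in for the
request mempool and elevator setup.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device type, supplied by user-space at creation time. */
enum model_dm_type { MODEL_DM_BIO_BASED, MODEL_DM_REQUEST_BASED };

struct model_mapped_device {
	enum model_dm_type type;
	void *rq_pool;	/* models the request mempool */
	void *elevator;	/* models the elevator */
};

static void model_free_dev(struct model_mapped_device *md)
{
	if (!md)
		return;
	free(md->rq_pool);
	free(md->elevator);
	free(md);
}

/*
 * The type is known up front, so request-only resources are allocated
 * here or never; nothing is left to fix up at table binding time.
 */
static struct model_mapped_device *model_alloc_dev(enum model_dm_type type)
{
	struct model_mapped_device *md = calloc(1, sizeof(*md));

	if (!md)
		return NULL;
	md->type = type;
	if (type == MODEL_DM_REQUEST_BASED) {
		md->rq_pool = malloc(64);
		md->elevator = malloc(16);
		if (!md->rq_pool || !md->elevator) {
			model_free_dev(md);
			return NULL;
		}
	}
	return md;
}
```

A bio-based device created this way never owns a request pool or an elevator,
so neither delayed allocation nor re-registration of the queue is needed.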

FYI, actually, I had this approach in a very early stage of
request-based dm development:
    [kernel]     http://marc.info/?l=dm-devel&m=116656637419846&w=2
    [kernel]     http://marc.info/?l=dm-devel&m=116656689701459&w=2
    [kernel]     http://marc.info/?l=dm-devel&m=116656689707043&w=2
    [user-space] http://marc.info/?l=dm-devel&m=116656689906056&w=2
Now, you can change user-space first before kernel, since
request-based dm is already available.

Thanks,
Kiyoshi Ueda


Patch

diff --git a/block/elevator.c b/block/elevator.c
index 2d511f9..864dd29 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -939,9 +939,10 @@ static void __elv_unregister_queue(struct elevator_queue *e)
 
 void elv_unregister_queue(struct request_queue *q)
 {
-	if (q)
+	if (q && q->elevator)
 		__elv_unregister_queue(q->elevator);
 }
+EXPORT_SYMBOL(elv_unregister_queue);
 
 void elv_register(struct elevator_type *e)
 {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8a311ea..f6f77ea 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2203,7 +2203,28 @@ int dm_swap_table(struct mapped_device *md, struct dm_table *table)
 		goto out;
 	}
 
-	__unbind(md);
+	if (md->map)
+		__unbind(md);
+	else if (!dm_table_request_based(table)) {
+		/*
+		 * This is a new bio-based device, which doesn't use the
+		 * elevator or requests.
+		 */
+		struct request_queue *q;
+		q = md->queue;
+		if (q->elevator) {
+			struct request_list *rl = &q->rq;
+			elv_unregister_queue(q);
+			elevator_exit(q->elevator);
+			q->elevator = NULL;
+			if (rl->rq_pool) {
+				mempool_destroy(rl->rq_pool);
+				rl->rq_pool = NULL;
+			}
+		}
+
+	}
+
 	r = __bind(md, table, &limits);