| Message ID | 20140319014557.GA1631@kernel.org (mailing list archive) |
|---|---|
| State | Accepted, archived |
| Delegated to | Mike Snitzer |
On Tue, Mar 18 2014 at 9:45pm -0400,
Shaohua Li <shli@kernel.org> wrote:

> On Tue, Mar 18, 2014 at 05:28:43PM -0400, Mike Snitzer wrote:
> > On Tue, Mar 18 2014 at 3:41am -0400,
> > Shaohua Li <shli@kernel.org> wrote:
> >
> > > On Mon, Mar 17, 2014 at 04:00:40PM -0400, Mike Snitzer wrote:
> > > >
> > > > I folded your changes in, and then committed a patch on top that cleans
> > > > some code up, but added 2 FIXMEs that still speak to pretty fundamental
> > > > problems with the architecture of the dm-insitu-comp target, see:
> > > > https://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=for-3.15-insitu-comp&id=8565ab6b04837591d03c94851c2f9f9162ce12f4
> > > >
> > > > Unfortunately the single insitu_comp_wq workqueue that all insitu-comp
> > > > targets are to share isn't a workable solution.  Each target needs to
> > > > have resource isolation from other targets (imagine insitu-comp used for
> > > > multiple SSDs).  This is important for suspend too because you'll need
> > > > to flush/stop the workqueue.
> > >
> > > Is this just because of suspend? I didn't see a fundamental reason why the
> > > workqueue can't be shared even for several targets.
> >
> > I'm not seeing how you are guaranteeing that all queued work is
> > completed during suspend.  insitu_comp_queue_req() just calls
> > queue_work_on().
> >
> > BTW, queue_work_on()'s comment above its implementation says:
> > "We queue the work to a specific CPU, the caller must ensure it can't go
> > away." -- you're not able to ensure a cpu isn't hotplugged, so... but I
> > also see you've used it in your raid5 perf improvement changes, so you
> > obviously have experience with using this interface.
>
> Good point, I did miss this.  The raid5 case holds a lock, while this case
> doesn't.  A fix is attached below.

I've pushed it to the branch.

> > > > You introduced a state machine for tracking suspending, suspended,
> > > > resumed.  This really isn't necessary.  During suspend you need to
> > > > flush_workqueue().  On resume you shouldn't need to do anything special.
> > > >
> > > > As I noted in the commit, the thin and cache targets can serve as
> > > > references for how you can manage the workqueue across suspend/resume
> > > > and the lifetime of these workqueues relative to .ctr and .dtr.
> > >
> > > As far as I can tell from checking the code, .postsuspend is called after
> > > all requests are finished.  This already guarantees there are no pending
> > > requests running in the insitu-comp workqueue.
> >
> > I could easily be missing something obvious, but I don't see where that
> > guarantee is implemented.
>
> Alright, so this is the divergence.  dm_suspend() calls dm_wait_for_completion()
> and then dm_table_postsuspend_targets().  As far as I understand,
> dm_wait_for_completion() will wait for all pending requests to finish.  The
> comments state this too.  Am I missing anything?
>
> Basically there are two kinds of IO: IO requests from the upper layer, which
> dm_wait_for_completion() guarantees are finished, and metadata IO requests,
> which .postsuspend should make sure are finished.

OK, I see.  You don't have the notion of a transaction, that I can see, so this
begs the question: what kind of crash consistency/recovery are you providing?
It seems the metadata writeback support allows for more potential for lost data
on a crash.

Also, it isn't clear how you're coping with the potential for a crash while
updating an extent (when metadata bits are borrowed from the tail, etc).
Without a transaction, a block (or extent of blocks) could be partially
updated; how are you guaranteeing the corresponding data is either entirely
old or new?  The compressed nature of this data makes atomic updates a
requirement here.  Are you somehow leveraging the Fusion-io SSD to provide
such a guarantee?

So effectively there are concerns about the data integrity of this target that
need answering.  Unfortunately I'm running low on time I can dedicate to
continued review of this target and need to transition to other priorities.

> > > Doing a workqueue flush isn't required.  The writeback thread is
> > > running in the background, and waiting for request completion can't
> > > guarantee the thread isn't running, so we must make sure it is safely
> > > parked.
> >
> > Sure, but you don't need a state machine to do that.  The DM core takes
> > care of calling these hooks, so you just need to stop the writeback
> > thread during suspend and (re)start/kick it on resume (preresume).
>
> Yep, I need to wait for the writeback thread to finish all pending metadata
> IO; the state machine works well here.

I see what you've done; I get that you're using the state machine to wait, but
I still contend it isn't needed.  Like I said before, just stop writeback in
suspend and restart on resume.  No state is needed.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
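[Editor's note] To make the suggestion above concrete, here is a minimal sketch of how a dm target can drop the suspend state machine: stop (park) the writeback thread in .postsuspend and restart it in .preresume. This is an assumption-laden illustration, not code from dm-insitu-comp: the insitu_comp_info fields, the per-target workqueue, and the use of kthread_park()/kthread_unpark() are hypothetical and assume the writeback thread's main loop checks kthread_should_park().

```c
#include <linux/device-mapper.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>

/* Hypothetical per-target state; field names are assumptions. */
struct insitu_comp_info {
	struct task_struct *writeback_thread;
	struct workqueue_struct *io_wq;	/* per-target, allocated in .ctr */
};

static void insitu_comp_postsuspend(struct dm_target *ti)
{
	struct insitu_comp_info *info = ti->private;

	/*
	 * By the time .postsuspend runs, dm_wait_for_completion() has already
	 * drained upper-layer I/O; what remains is to stop background
	 * metadata writeback and any per-CPU work still queued.
	 */
	kthread_park(info->writeback_thread);
	flush_workqueue(info->io_wq);
}

static int insitu_comp_preresume(struct dm_target *ti)
{
	struct insitu_comp_info *info = ti->private;

	/* Simply let writeback run again; no suspending/suspended state. */
	kthread_unpark(info->writeback_thread);
	return 0;
}
```

Making the workqueue a per-target member (rather than the single shared insitu_comp_wq) also addresses the resource-isolation point raised earlier in the thread, since each target can then flush only its own work during suspend.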
Index: linux/drivers/md/dm-insitu-comp.c
===================================================================
--- linux.orig/drivers/md/dm-insitu-comp.c	2014-03-17 17:40:01.106660303 +0800
+++ linux/drivers/md/dm-insitu-comp.c	2014-03-19 08:49:12.314627050 +0800
@@ -704,14 +704,19 @@ static void insitu_comp_queue_req(struct
 		struct insitu_comp_req *req)
 {
 	unsigned long flags;
-	struct insitu_comp_io_worker *worker =
-		&insitu_comp_io_workers[req->cpu];
+	struct insitu_comp_io_worker *worker;
+
+	preempt_disable();
+	if (!cpu_online(req->cpu))
+		req->cpu = cpumask_any(cpu_online_mask);
+	worker = &insitu_comp_io_workers[req->cpu];
 
 	spin_lock_irqsave(&worker->lock, flags);
 	list_add_tail(&req->sibling, &worker->pending);
 	spin_unlock_irqrestore(&worker->lock, flags);
 
 	queue_work_on(req->cpu, insitu_comp_wq, &worker->work);
+	preempt_enable();
 }
 
 static void insitu_comp_queue_req_list(struct insitu_comp_info *info,
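[Editor's note] The apparent rationale of the patch is that, with the 3.14-era hotplug implementation, a CPU cannot finish going offline while any task has preemption disabled (the final takedown goes through stop_machine()), so the preempt_disable()/preempt_enable() pair keeps req->cpu usable between the cpu_online() check and queue_work_on(). The sketch below shows an alternative that takes the CPU hotplug read lock instead; it is purely illustrative, not part of the posted patch, and the two-argument signature is an assumption based on the hunk header. It only works if the caller may sleep, a requirement the posted preempt-disable variant deliberately avoids.

```c
#include <linux/cpu.h>		/* get_online_cpus()/put_online_cpus() (3.14-era API) */
#include <linux/workqueue.h>

/* Illustrative alternative only -- assumes process context where sleeping
 * is allowed, since get_online_cpus() takes a sleeping lock. */
static void insitu_comp_queue_req(struct insitu_comp_info *info,
				  struct insitu_comp_req *req)
{
	unsigned long flags;
	struct insitu_comp_io_worker *worker;

	get_online_cpus();	/* block CPU hotplug until the work is queued */
	if (!cpu_online(req->cpu))
		req->cpu = cpumask_any(cpu_online_mask);
	worker = &insitu_comp_io_workers[req->cpu];

	spin_lock_irqsave(&worker->lock, flags);
	list_add_tail(&req->sibling, &worker->pending);
	spin_unlock_irqrestore(&worker->lock, flags);

	queue_work_on(req->cpu, insitu_comp_wq, &worker->work);
	put_online_cpus();
}
```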