From patchwork Wed Aug 10 02:19:49 2011
X-Patchwork-Submitter: Liu Yuan
X-Patchwork-Id: 1051612
Message-ID: <4E41EAC5.8060001@gmail.com>
Date: Wed, 10 Aug 2011 10:19:49 +0800
From: Liu Yuan
To: Badari Pulavarty
CC: Stefan Hajnoczi, "Michael S. Tsirkin", Rusty Russell, Avi Kivity,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Khoa Huynh
Subject: Re: [RFC PATCH] vhost-blk: In-kernel accelerator for virtio block device
In-Reply-To: <4E4019E1.2090508@us.ibm.com>
List-ID: kvm@vger.kernel.org

On 08/09/2011 01:16 AM, Badari Pulavarty wrote:
> On 8/8/2011 12:31 AM, Liu Yuan wrote:
>> On 08/08/2011 01:04 PM, Badari Pulavarty wrote:
>>> On 8/7/2011 6:35 PM, Liu Yuan wrote:
>>>> On 08/06/2011 02:02 AM, Badari Pulavarty wrote:
>>>>> On 8/5/2011 4:04 AM, Liu Yuan wrote:
>>>>>> On 08/05/2011 05:58 AM, Badari Pulavarty wrote:
>>>>>>> Hi Liu Yuan,
>>>>>>>
>>>>>>> I started testing your patches. I applied your kernel patch to 3.0
>>>>>>> and applied the QEMU part to latest git.
>>>>>>>
>>>>>>> I passed 6 block devices from the host to the guest (4 vcpu, 4GB RAM).
>>>>>>> I ran simple "dd" read tests from the guest on all block devices
>>>>>>> (with various block sizes, iflag=direct).
>>>>>>>
>>>>>>> Unfortunately, the system doesn't stay up. I immediately run into a
>>>>>>> panic on the host. I didn't get time to debug the problem. Wondering
>>>>>>> if you have seen this issue before and/or have a new patchset
>>>>>>> to try?
>>>>>>>
>>>>>>> Let me know.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Badari
>>>>>>>
>>>>>>
>>>>>> Okay, it is actually a bug pointed out by MST on the other
>>>>>> thread: the completion thread needs a mutex.
>>>>>>
>>>>>> Now would you please try this attachment? This patch only applies to
>>>>>> the kernel part, on top of the v1 kernel patch.
>>>>>>
>>>>>> This patch mainly moves the completion thread into the vhost thread as a
>>>>>> function. As a result, both request submission and completion
>>>>>> signalling run in the same thread.
>>>>>>
>>>>>> Yuan
>>>>>
>>>>> Unfortunately, the "dd" tests (4 out of 6) in the guest hung. I see the
>>>>> following messages:
>>>>>
>>>>> virtio_blk virtio2: requests: id 0 is not a head !
>>>>> virtio_blk virtio3: requests: id 1 is not a head !
>>>>> virtio_blk virtio5: requests: id 1 is not a head !
>>>>> virtio_blk virtio1: requests: id 1 is not a head !
>>>>>
>>>>> I still see host panics. I will collect the host panic and see if
>>>>> it's still the same or not.
>>>>>
>>>>> Thanks,
>>>>> Badari
>>>>>
>>>> Would you please show me how to reproduce it step by step? I tried
>>>> dd with two block devices attached, but didn't hit the hang or the panic.
>>>>
>>>> Yuan
>>>
>>> I did 6 "dd"s on 6 block devices..
>>>
>>> dd if=/dev/vdb of=/dev/null bs=1M iflag=direct &
>>> dd if=/dev/vdc of=/dev/null bs=1M iflag=direct &
>>> dd if=/dev/vdd of=/dev/null bs=1M iflag=direct &
>>> dd if=/dev/vde of=/dev/null bs=1M iflag=direct &
>>> dd if=/dev/vdf of=/dev/null bs=1M iflag=direct &
>>> dd if=/dev/vdg of=/dev/null bs=1M iflag=direct &
>>>
>>> I can reproduce the problem within 3 minutes :(
>>>
>>> Thanks,
>>> Badari
>>>
>> Ah... I made an embarrassing mistake: I tried to free() a kmem_cache object.
>>
>> Would you please revert the vblk-for-kernel-2 patch and apply the new
>> one attached in this letter?
>>
> Hmm.. my version of the code seems to have kzalloc() for used_info. I
> don't have a version that is using kmem_cache_alloc(). Would it be possible
> for you to send out a complete patch (with all the fixes applied) for me
> to try? That will avoid all the confusion.
>
> Thanks,
> Badari
>

Okay, please apply the attached patch to the vanilla kernel. :)

Thanks,
Yuan
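
As background to the "free() on a kmem_cache object" mistake mentioned above: objects handed
out by a slab cache must be returned with kmem_cache_free() on the same cache, never kfree().
Below is a minimal, self-contained sketch (hypothetical names, not part of the patch) of the
allocator pairing that the used_info bookkeeping in the attached patch relies on:

#include <linux/slab.h>

/* Hypothetical stand-in for the per-request used_info object. */
struct used_info_example {
        void *status;
        int head;
        int len;
};

static struct kmem_cache *example_cachep;

static int example_alloc_free(void)
{
        struct used_info_example *ui;

        example_cachep = KMEM_CACHE(used_info_example, SLAB_HWCACHE_ALIGN);
        if (!example_cachep)
                return -ENOMEM;

        ui = kmem_cache_zalloc(example_cachep, GFP_KERNEL);
        if (!ui) {
                kmem_cache_destroy(example_cachep);
                return -ENOMEM;
        }

        /* ... fill in and use ui ... */

        kmem_cache_free(example_cachep, ui);    /* correct: return it to its cache */
        /* kfree(ui);  <- the earlier bug: kfree() on a slab-cache object */

        kmem_cache_destroy(example_cachep);
        return 0;
}
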
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..31f8b2e 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,5 @@
 obj-$(CONFIG_VHOST_NET) += vhost_net.o
+obj-m += vhost_blk.o
+
 vhost_net-y := vhost.o net.o
+vhost_blk-y := vhost.o blk.o
diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
new file mode 100644
index 0000000..c372011
--- /dev/null
+++ b/drivers/vhost/blk.c
@@ -0,0 +1,530 @@
+/* Copyright (C) 2011 Taobao, Inc.
+ * Author: Liu Yuan
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * Vhost-blk driver is an in-kernel accelerator, intercepting the
+ * IO requests from KVM virtio-capable guests. It is based on the
+ * vhost infrastructure.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "vhost.h"
+
+#define DEBUG 0
+
+#if DEBUG > 0
+#define dprintk        printk
+#else
+#define dprintk(x...)  do { ; } while (0)
+#endif
+
+enum {
+        virtqueue_max = 1,
+};
+
+#define MAX_EVENTS 128
+
+struct vhost_blk {
+        struct vhost_virtqueue vq;
+        struct vhost_dev dev;
+        int should_stop;
+        struct kioctx *ioctx;
+        struct eventfd_ctx *ectx;
+        struct file *efile;
+        struct task_struct *worker;
+        struct vhost_poll poll;
+};
+
+struct used_info {
+        void *status;
+        int head;
+        int len;
+};
+
+static struct io_event events[MAX_EVENTS];
+
+static struct kmem_cache *used_info_cachep;
+
+static void blk_flush(struct vhost_blk *blk)
+{
+        vhost_poll_flush(&blk->vq.poll);
+        vhost_poll_flush(&blk->poll);
+}
+
+static long blk_set_features(struct vhost_blk *blk, u64 features)
+{
+        blk->dev.acked_features = features;
+        return 0;
+}
+
+static void blk_stop(struct vhost_blk *blk)
+{
+        struct vhost_virtqueue *vq = &blk->vq;
+        struct file *f;
+
+        mutex_lock(&vq->mutex);
+        f = rcu_dereference_protected(vq->private_data,
+                        lockdep_is_held(&vq->mutex));
+        rcu_assign_pointer(vq->private_data, NULL);
+        mutex_unlock(&vq->mutex);
+
+        if (f)
+                fput(f);
+}
+
+static long blk_set_backend(struct vhost_blk *blk, struct vhost_vring_file *backend)
+{
+        int idx = backend->index;
+        struct vhost_virtqueue *vq = &blk->vq;
+        struct file *file, *oldfile;
+        int ret;
+
+        mutex_lock(&blk->dev.mutex);
+        ret = vhost_dev_check_owner(&blk->dev);
+        if (ret)
+                goto err_dev;
+        if (idx >= virtqueue_max) {
+                ret = -ENOBUFS;
+                goto err_dev;
+        }
+
+        mutex_lock(&vq->mutex);
+
+        if (!vhost_vq_access_ok(vq)) {
+                ret = -EFAULT;
+                goto err_vq;
+        }
+
+        file = fget(backend->fd);
+        if (IS_ERR(file)) {
+                ret = PTR_ERR(file);
+                goto err_vq;
+        }
+
+        oldfile = rcu_dereference_protected(vq->private_data,
+                        lockdep_is_held(&vq->mutex));
+        if (file != oldfile)
+                rcu_assign_pointer(vq->private_data, file);
+
+        mutex_unlock(&vq->mutex);
+
+        if (oldfile) {
+                blk_flush(blk);
+                fput(oldfile);
+        }
+
+        mutex_unlock(&blk->dev.mutex);
+        return 0;
+err_vq:
+        mutex_unlock(&vq->mutex);
+err_dev:
+        mutex_unlock(&blk->dev.mutex);
+        return ret;
+}
+
+static long blk_reset_owner(struct vhost_blk *b)
+{
+        int ret;
+
+        mutex_lock(&b->dev.mutex);
+        ret = vhost_dev_check_owner(&b->dev);
+        if (ret)
+                goto err;
+        blk_stop(b);
+        blk_flush(b);
+        ret = vhost_dev_reset_owner(&b->dev);
+err:
+        mutex_unlock(&b->dev.mutex);
+        return ret;
+}
+
+static int kernel_io_setup(unsigned nr_events, struct kioctx **ioctx)
+{
+        int ret = 0;
+        *ioctx = ioctx_alloc(nr_events);
+        if (IS_ERR(ioctx))
+                ret = PTR_ERR(ioctx);
+        return ret;
+}
+
+static inline int kernel_read_events(struct kioctx *ctx, long min_nr, long nr, struct io_event *event,
+                        struct timespec *ts)
+{
+        mm_segment_t old_fs;
+        int ret;
+
+        old_fs = get_fs();
+        set_fs(get_ds());
+        ret = read_events(ctx, min_nr, nr, event, ts);
+        set_fs(old_fs);
+
+        return ret;
+}
+
+static inline ssize_t io_event_ret(struct io_event *ev)
+{
+        return (ssize_t)(((uint64_t)ev->res2 << 32) | ev->res);
+}
+
+static inline void aio_prep_req(struct kiocb *iocb, struct eventfd_ctx *ectx, struct file *file,
+                        struct iovec *iov, int nvecs, u64 offset, int opcode, struct used_info *ui)
+{
+        iocb->ki_filp = file;
+        iocb->ki_eventfd = ectx;
+        iocb->ki_pos = offset;
+        iocb->ki_buf = (void *)iov;
+        iocb->ki_left = iocb->ki_nbytes = nvecs;
+        iocb->ki_opcode = opcode;
+        iocb->ki_obj.user = ui;
+}
+
+static inline int kernel_io_submit(struct vhost_blk *blk, struct iovec *iov, u64 nvecs, loff_t pos, int opcode, int head, int len)
+{
+        int ret = -EAGAIN;
+        struct kiocb *req;
+        struct kioctx *ioctx = blk->ioctx;
+        struct used_info *ui = kmem_cache_zalloc(used_info_cachep, GFP_KERNEL);
+        struct file *f = blk->vq.private_data;
+
+        try_get_ioctx(ioctx);
+        atomic_long_inc_not_zero(&f->f_count);
+        eventfd_ctx_get(blk->ectx);
+
+        req = aio_get_req(ioctx); /* return 2 refs of req */
+        if (unlikely(!req))
+                goto out;
+
+        ui->head = head;
+        ui->status = blk->vq.iov[nvecs + 1].iov_base;
+        ui->len = len;
+        aio_prep_req(req, blk->ectx, f, iov, nvecs, pos, opcode, ui);
+
+        ret = aio_setup_iocb(req, 0);
+        if (unlikely(ret))
+                goto out_put_req;
+
+        spin_lock_irq(&ioctx->ctx_lock);
+        if (unlikely(ioctx->dead)) {
+                spin_unlock_irq(&ioctx->ctx_lock);
+                ret = -EINVAL;
+                goto out_put_req;
+        }
+
+        aio_run_iocb(req);
+        if (!list_empty(&ioctx->run_list)) {
+                while (__aio_run_iocbs(ioctx))
+                        ;
+        }
+        spin_unlock_irq(&ioctx->ctx_lock);
+
+        aio_put_req(req);
+        put_ioctx(blk->ioctx);
+
+        return ret;
+
+out_put_req:
+        aio_put_req(req);
+        aio_put_req(req);
+out:
+        put_ioctx(blk->ioctx);
+        return ret;
+}
+
+static long vhost_blk_ioctl(struct file *f, unsigned int ioctl,
+                        unsigned long arg)
+{
+        struct vhost_blk *blk = f->private_data;
+        struct vhost_vring_file backend;
+        u64 features = VHOST_BLK_FEATURES;
+        int ret = -EFAULT;
+
+        switch (ioctl) {
+        case VHOST_NET_SET_BACKEND:
+                if (copy_from_user(&backend, (void __user *)arg, sizeof backend))
+                        break;
+                ret = blk_set_backend(blk, &backend);
+                break;
+        case VHOST_GET_FEATURES:
+                features = VHOST_BLK_FEATURES;
+                if (copy_to_user((void __user *)arg, &features, sizeof features))
+                        break;
+                ret = 0;
+                break;
+        case VHOST_SET_FEATURES:
+                if (copy_from_user(&features, (void __user *)arg, sizeof features))
+                        break;
+                if (features & ~VHOST_BLK_FEATURES) {
+                        ret = -EOPNOTSUPP;
+                        break;
+                }
+                ret = blk_set_features(blk, features);
+                break;
+        case VHOST_RESET_OWNER:
+                ret = blk_reset_owner(blk);
+                break;
+        default:
+                mutex_lock(&blk->dev.mutex);
+                ret = vhost_dev_ioctl(&blk->dev, ioctl, arg);
+                blk_flush(blk);
+                mutex_unlock(&blk->dev.mutex);
+                break;
+        }
+        return ret;
+}
+
+#define BLK_HDR 0
+#define BLK_HDR_LEN 16
+
+static inline int do_request(struct vhost_virtqueue *vq, struct virtio_blk_outhdr *hdr,
+                        u64 nr_vecs, int head)
+{
+        struct file *f = vq->private_data;
+        struct vhost_blk *blk = container_of(vq->dev, struct vhost_blk, dev);
+        struct iovec *iov = &vq->iov[BLK_HDR + 1];
+        loff_t pos = hdr->sector << 9;
+        int ret = 0, len = 0, status;
+//      int i;
+
+        dprintk("sector %llu, num %lu, type %d\n", hdr->sector, iov->iov_len / 512, hdr->type);
+        //Guest virtio-blk driver doesn't use len currently.
+        //for (i = 0; i < nr_vecs; i++) {
+        //        len += iov[i].iov_len;
+        //}
+        switch (hdr->type) {
+        case VIRTIO_BLK_T_OUT:
+                kernel_io_submit(blk, iov, nr_vecs, pos, IOCB_CMD_PWRITEV, head, len);
+                break;
+        case VIRTIO_BLK_T_IN:
+                kernel_io_submit(blk, iov, nr_vecs, pos, IOCB_CMD_PREADV, head, len);
+                break;
+        case VIRTIO_BLK_T_FLUSH:
+                ret = vfs_fsync(f, 1);
+                /* fall through */
+        case VIRTIO_BLK_T_GET_ID:
+                status = ret < 0 ? VIRTIO_BLK_S_IOERR : VIRTIO_BLK_S_OK;
+                if ((vq->iov[nr_vecs + 1].iov_len != 1))
+                        BUG();
+
+                if (copy_to_user(vq->iov[nr_vecs + 1].iov_base, &status, sizeof status)) {
+                        vq_err(vq, "%s failed to write status!\n", __func__);
+                        vhost_discard_vq_desc(vq, 1);
+                        ret = -EFAULT;
+                        break;
+                }
+
+                vhost_add_used_and_signal(&blk->dev, vq, head, ret);
+                break;
+        default:
+                pr_info("%s, unsupported request type %d\n", __func__, hdr->type);
+                vhost_discard_vq_desc(vq, 1);
+                ret = -EFAULT;
+                break;
+        }
+        return ret;
+}
+
+static inline void handle_kick(struct vhost_blk *blk)
+{
+        struct vhost_virtqueue *vq = &blk->vq;
+        struct virtio_blk_outhdr hdr;
+        u64 nr_vecs;
+        int in, out, head;
+        struct blk_plug plug;
+
+        mutex_lock(&vq->mutex);
+        vhost_disable_notify(&blk->dev, vq);
+
+        blk_start_plug(&plug);
+        for (;;) {
+                head = vhost_get_vq_desc(&blk->dev, vq, vq->iov,
+                                ARRAY_SIZE(vq->iov),
+                                &out, &in, NULL, NULL);
+                /* No available descriptors from Guest? */
+                if (head == vq->num) {
+                        if (unlikely(vhost_enable_notify(&blk->dev, vq))) {
+                                vhost_disable_notify(&blk->dev, vq);
+                                continue;
+                        }
+                        break;
+                }
+                if (unlikely(head < 0))
+                        break;
+
+                dprintk("head %d, in %d, out %d\n", head, in, out);
+                if (unlikely(vq->iov[BLK_HDR].iov_len != BLK_HDR_LEN)) {
+                        vq_err(vq, "%s bad block header length!\n", __func__);
+                        vhost_discard_vq_desc(vq, 1);
+                        break;
+                }
+
+                if (copy_from_user(&hdr, vq->iov[BLK_HDR].iov_base, sizeof hdr)) {
+                        vq_err(vq, "%s failed to get block header!\n", __func__);
+                        vhost_discard_vq_desc(vq, 1);
+                        break;
+                }
+
+                if (hdr.type == VIRTIO_BLK_T_IN || hdr.type == VIRTIO_BLK_T_GET_ID)
+                        nr_vecs = in - 1;
+                else
+                        nr_vecs = out - 1;
+
+                if (do_request(vq, &hdr, nr_vecs, head) < 0)
+                        break;
+        }
+        blk_finish_plug(&plug);
+        mutex_unlock(&vq->mutex);
+}
+
+static void handle_guest_kick(struct vhost_work *work)
+{
+        struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue, poll.work);
+        struct vhost_blk *blk = container_of(vq->dev, struct vhost_blk, dev);
+        handle_kick(blk);
+}
+
+static void handle_completion(struct vhost_work *work)
+{
+        struct vhost_blk *blk = container_of(work, struct vhost_blk, poll.work);
+        struct vhost_virtqueue *vq = &blk->vq;
+        struct timespec ts = { 0 };
+        int ret, i, nr;
+        u64 count;
+
+        mutex_lock(&vq->mutex);
+        do {
+                ret = eventfd_ctx_read(blk->ectx, 1, &count);
+        } while (unlikely(ret == -ERESTARTSYS));
+
+        do {
+                nr = kernel_read_events(blk->ioctx, count, MAX_EVENTS, events, &ts);
+        } while (unlikely(nr == -EINTR));
+        dprintk("%s, count %llu, nr %d\n", __func__, count, nr);
+
+        if (unlikely(nr <= 0)) {
+                mutex_unlock(&vq->mutex);
+                return;
+        }
+
+        for (i = 0; i < nr; i++) {
+                struct used_info *u = (struct used_info *)events[i].obj;
+                int len, status;
+
+                dprintk("%s, head %d complete in %d\n", __func__, u->head, i);
+                len = io_event_ret(&events[i]);
+                //status = u->len == len ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
+                status = len > 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
+                if (copy_to_user(u->status, &status, sizeof status)) {
+                        vq_err(&blk->vq, "%s failed to write status\n", __func__);
+                        BUG(); /* FIXME: maybe a bit radical? */
+                }
+                vhost_add_used(&blk->vq, u->head, u->len);
+                kmem_cache_free(used_info_cachep, u);
+        }
+
+        vhost_signal(&blk->dev, &blk->vq);
+        mutex_unlock(&vq->mutex);
+}
+
+static void eventfd_setup(struct vhost_blk *blk)
+{
+        blk->efile = eventfd_file_create(0, 0);
+        blk->ectx = eventfd_ctx_fileget(blk->efile);
+        vhost_poll_init(&blk->poll, handle_completion, POLLIN, &blk->dev);
+        vhost_poll_start(&blk->poll, blk->efile);
+}
+
+static int vhost_blk_open(struct inode *inode, struct file *f)
+{
+        int ret = -ENOMEM;
+        struct vhost_blk *blk = kmalloc(sizeof *blk, GFP_KERNEL);
+        if (!blk)
+                goto err;
+
+        blk->vq.handle_kick = handle_guest_kick;
+        ret = vhost_dev_init(&blk->dev, &blk->vq, virtqueue_max);
+        if (ret < 0)
+                goto err_init;
+
+        ret = kernel_io_setup(MAX_EVENTS, &blk->ioctx);
+        if (ret < 0)
+                goto err_io_setup;
+
+        eventfd_setup(blk);
+        f->private_data = blk;
+        used_info_cachep = KMEM_CACHE(used_info, SLAB_HWCACHE_ALIGN | SLAB_PANIC);
+        return ret;
+err_init:
+err_io_setup:
+        kfree(blk);
+err:
+        return ret;
+}
+
+static void eventfd_destroy(struct vhost_blk *blk)
+{
+        eventfd_ctx_put(blk->ectx);
+        fput(blk->efile);
+}
+
+static int vhost_blk_release(struct inode *inode, struct file *f)
+{
+        struct vhost_blk *blk = f->private_data;
+
+        blk_stop(blk);
+        blk_flush(blk);
+        vhost_dev_cleanup(&blk->dev);
+        /* Yet another flush? See comments in vhost_net_release() */
+        blk_flush(blk);
+        eventfd_destroy(blk);
+        kfree(blk);
+
+        return 0;
+}
+
+static const struct file_operations vhost_blk_fops = {
+        .owner          = THIS_MODULE,
+        .release        = vhost_blk_release,
+        .open           = vhost_blk_open,
+        .unlocked_ioctl = vhost_blk_ioctl,
+        .llseek         = noop_llseek,
+};
+
+static struct miscdevice vhost_blk_misc = {
+        234,
+        "vhost-blk",
+        &vhost_blk_fops,
+};
+
+int vhost_blk_init(void)
+{
+        return misc_register(&vhost_blk_misc);
+}
+
+void vhost_blk_exit(void)
+{
+        misc_deregister(&vhost_blk_misc);
+}
+
+module_init(vhost_blk_init);
+module_exit(vhost_blk_exit);
+
+MODULE_VERSION("0.0.1");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Liu Yuan");
+MODULE_DESCRIPTION("Host kernel accelerator for virtio_blk");
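
To make the iovec indexing in do_request() and handle_kick() above easier to follow: vhost-blk
assumes each guest request arrives as a fixed 16-byte header descriptor, then the data buffers,
then a one-byte status that the host writes back. A rough sketch of that layout, using the
request header definition from <linux/virtio_blk.h> (shown here only for reference, not added
by the patch):

/* Layout of one virtio-blk request as vhost-blk sees it in vq->iov[]:
 *
 *   iov[0]              16-byte request header (BLK_HDR, BLK_HDR_LEN)
 *   iov[1..nr_vecs]     data buffers for the read or write
 *   iov[nr_vecs + 1]    1-byte status written back by the host
 */
struct virtio_blk_outhdr {
        __u32 type;     /* VIRTIO_BLK_T_IN, _OUT, _FLUSH, _GET_ID, ... */
        __u32 ioprio;   /* request priority hint */
        __u64 sector;   /* 512-byte sector offset; blk.c shifts it by 9 */
};
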
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8e03379..9e17152 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 struct vhost_device;
 
@@ -174,6 +175,16 @@ enum {
                         (1ULL << VHOST_F_LOG_ALL) |
                         (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
                         (1ULL << VIRTIO_NET_F_MRG_RXBUF),
+
+        VHOST_BLK_FEATURES = (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
+                        (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
+                        (1ULL << VIRTIO_RING_F_EVENT_IDX) |
+                        (1ULL << VIRTIO_BLK_F_SEG_MAX) |
+                        (1ULL << VIRTIO_BLK_F_GEOMETRY) |
+                        (1ULL << VIRTIO_BLK_F_TOPOLOGY) |
+                        (1ULL << VIRTIO_BLK_F_SCSI) |
+                        (1ULL << VIRTIO_BLK_F_BLK_SIZE),
+
 };
 
 static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
diff --git a/fs/aio.c b/fs/aio.c
index e29ec48..534d396 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -215,7 +215,7 @@ static void ctx_rcu_free(struct rcu_head *head)
  * Called when the last user of an aio context has gone away,
  * and the struct needs to be freed.
  */
-static void __put_ioctx(struct kioctx *ctx)
+void __put_ioctx(struct kioctx *ctx)
 {
        BUG_ON(ctx->reqs_active);
 
@@ -227,29 +227,12 @@ static void __put_ioctx(struct kioctx *ctx)
        pr_debug("__put_ioctx: freeing %p\n", ctx);
        call_rcu(&ctx->rcu_head, ctx_rcu_free);
 }
-
-static inline void get_ioctx(struct kioctx *kioctx)
-{
-       BUG_ON(atomic_read(&kioctx->users) <= 0);
-       atomic_inc(&kioctx->users);
-}
-
-static inline int try_get_ioctx(struct kioctx *kioctx)
-{
-       return atomic_inc_not_zero(&kioctx->users);
-}
-
-static inline void put_ioctx(struct kioctx *kioctx)
-{
-       BUG_ON(atomic_read(&kioctx->users) <= 0);
-       if (unlikely(atomic_dec_and_test(&kioctx->users)))
-               __put_ioctx(kioctx);
-}
+EXPORT_SYMBOL(__put_ioctx);
 
 /* ioctx_alloc
  *     Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
  */
-static struct kioctx *ioctx_alloc(unsigned nr_events)
+struct kioctx *ioctx_alloc(unsigned nr_events)
 {
        struct mm_struct *mm;
        struct kioctx *ctx;
@@ -327,6 +310,7 @@ out_freectx:
        dprintk("aio: error allocating ioctx %p\n", ctx);
        return ctx;
 }
+EXPORT_SYMBOL(ioctx_alloc);
 
 /* aio_cancel_all
  *     Cancels all outstanding aio requests on an aio context.  Used
@@ -437,7 +421,7 @@ void exit_aio(struct mm_struct *mm)
  * This prevents races between the aio code path referencing the
  * req (after submitting it) and aio_complete() freeing the req.
  */
-static struct kiocb *__aio_get_req(struct kioctx *ctx)
+struct kiocb *__aio_get_req(struct kioctx *ctx)
 {
        struct kiocb *req = NULL;
        struct aio_ring *ring;
@@ -480,7 +464,7 @@ static struct kiocb *__aio_get_req(struct kioctx *ctx)
        return req;
 }
 
-static inline struct kiocb *aio_get_req(struct kioctx *ctx)
+struct kiocb *aio_get_req(struct kioctx *ctx)
 {
        struct kiocb *req;
        /* Handle a potential starvation case -- should be exceedingly rare as
@@ -494,6 +478,7 @@ static inline struct kiocb *aio_get_req(struct kioctx *ctx)
        }
        return req;
 }
+EXPORT_SYMBOL(aio_get_req);
 
 static inline void really_put_req(struct kioctx *ctx, struct kiocb *req)
 {
@@ -659,7 +644,7 @@ static inline int __queue_kicked_iocb(struct kiocb *iocb)
  *     simplifies the coding of individual aio operations as
  *     it avoids various potential races.
  */
-static ssize_t aio_run_iocb(struct kiocb *iocb)
+ssize_t aio_run_iocb(struct kiocb *iocb)
 {
        struct kioctx *ctx = iocb->ki_ctx;
        ssize_t (*retry)(struct kiocb *);
@@ -753,6 +738,7 @@ out:
        }
        return ret;
 }
+EXPORT_SYMBOL(aio_run_iocb);
 
 /*
  * __aio_run_iocbs:
@@ -761,7 +747,7 @@ out:
  * Assumes it is operating within the aio issuer's mm
  * context.
  */
-static int __aio_run_iocbs(struct kioctx *ctx)
+int __aio_run_iocbs(struct kioctx *ctx)
 {
        struct kiocb *iocb;
        struct list_head run_list;
@@ -784,6 +770,7 @@ static int __aio_run_iocbs(struct kioctx *ctx)
                return 1;
        return 0;
 }
+EXPORT_SYMBOL(__aio_run_iocbs);
 
 static void aio_queue_work(struct kioctx * ctx)
 {
@@ -1074,7 +1061,7 @@ static inline void clear_timeout(struct aio_timeout *to)
        del_singleshot_timer_sync(&to->timer);
 }
 
-static int read_events(struct kioctx *ctx,
+int read_events(struct kioctx *ctx,
                        long min_nr, long nr,
                        struct io_event __user *event,
                        struct timespec __user *timeout)
@@ -1190,11 +1177,12 @@ out:
        destroy_timer_on_stack(&to.timer);
        return i ? i : ret;
 }
+EXPORT_SYMBOL(read_events);
 
 /* Take an ioctx and remove it from the list of ioctx's.  Protects
  * against races with itself via ->dead.
  */
-static void io_destroy(struct kioctx *ioctx)
+void io_destroy(struct kioctx *ioctx)
 {
        struct mm_struct *mm = current->mm;
        int was_dead;
@@ -1221,6 +1209,7 @@ static void io_destroy(struct kioctx *ioctx)
        wake_up_all(&ioctx->wait);
        put_ioctx(ioctx);       /* once for the lookup */
 }
+EXPORT_SYMBOL(io_destroy);
 
 /* sys_io_setup:
  *     Create an aio_context capable of receiving at least nr_events.
@@ -1423,7 +1412,7 @@ static ssize_t aio_setup_single_vector(struct kiocb *kiocb)
  *     Performs the initial checks and aio retry method
  *     setup for the kiocb at the time of io submission.
  */
-static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
+ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
 {
        struct file *file = kiocb->ki_filp;
        ssize_t ret = 0;
@@ -1513,6 +1502,7 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
 
        return 0;
 }
+EXPORT_SYMBOL(aio_setup_iocb);
 
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
                         struct iocb *iocb, bool compat)
diff --git a/fs/eventfd.c b/fs/eventfd.c
index d9a5917..6343bc9 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -406,6 +406,7 @@ struct file *eventfd_file_create(unsigned int count, int flags)
 
        return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_file_create);
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {
diff --git a/include/linux/aio.h b/include/linux/aio.h
index 7a8db41..d63bc04 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -214,6 +214,37 @@ struct mm_struct;
 
 extern void exit_aio(struct mm_struct *mm);
 extern long do_io_submit(aio_context_t ctx_id, long nr,
                         struct iocb __user *__user *iocbpp, bool compat);
+extern void __put_ioctx(struct kioctx *ctx);
+extern struct kioctx *ioctx_alloc(unsigned nr_events);
+extern struct kiocb *aio_get_req(struct kioctx *ctx);
+extern ssize_t aio_run_iocb(struct kiocb *iocb);
+extern int __aio_run_iocbs(struct kioctx *ctx);
+extern int read_events(struct kioctx *ctx,
+                       long min_nr, long nr,
+                       struct io_event __user *event,
+                       struct timespec __user *timeout);
+extern void io_destroy(struct kioctx *ioctx);
+extern ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat);
+extern void __put_ioctx(struct kioctx *ctx);
+
+static inline void get_ioctx(struct kioctx *kioctx)
+{
+       BUG_ON(atomic_read(&kioctx->users) <= 0);
+       atomic_inc(&kioctx->users);
+}
+
+static inline int try_get_ioctx(struct kioctx *kioctx)
+{
+       return atomic_inc_not_zero(&kioctx->users);
+}
+
+static inline void put_ioctx(struct kioctx *kioctx)
+{
+       BUG_ON(atomic_read(&kioctx->users) <= 0);
+       if (unlikely(atomic_dec_and_test(&kioctx->users)))
+               __put_ioctx(kioctx);
+}
+
 #else
 static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
 static inline int aio_put_req(struct kiocb *iocb) { return 0; }
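
For anyone who wants to poke at the module without the modified QEMU, the ioctl surface above
can be exercised from a small userspace program. This is only a hypothetical sketch: it assumes
the misc device shows up as /dev/vhost-blk (minor 234), it reuses VHOST_NET_SET_BACKEND for the
disk fd exactly as vhost_blk_ioctl() does, and it skips the VHOST_SET_MEM_TABLE and
VHOST_SET_VRING_* setup that a real VMM such as the patched QEMU has to provide:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

int main(int argc, char **argv)
{
        const char *disk = argc > 1 ? argv[1] : "/dev/sdb";     /* hypothetical backing device */
        struct vhost_vring_file backend;
        unsigned long long features;
        int vhost_fd, disk_fd;

        vhost_fd = open("/dev/vhost-blk", O_RDWR);              /* misc device registered by the module */
        disk_fd = open(disk, O_RDWR | O_DIRECT);
        if (vhost_fd < 0 || disk_fd < 0) {
                perror("open");
                return 1;
        }

        /* Become the owner of this vhost device (handled by vhost_dev_ioctl()). */
        if (ioctl(vhost_fd, VHOST_SET_OWNER, NULL) < 0)
                perror("VHOST_SET_OWNER");

        /* Negotiate features against VHOST_BLK_FEATURES. */
        if (ioctl(vhost_fd, VHOST_GET_FEATURES, &features) == 0)
                ioctl(vhost_fd, VHOST_SET_FEATURES, &features);

        /* VHOST_SET_MEM_TABLE and VHOST_SET_VRING_* calls would go here
         * in a real VMM, before the backend is attached. */

        /* Attach the disk; the patch reuses VHOST_NET_SET_BACKEND for this. */
        backend.index = 0;      /* virtqueue_max is 1, so only index 0 is valid */
        backend.fd = disk_fd;
        if (ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend) < 0)
                perror("VHOST_NET_SET_BACKEND");

        return 0;
}
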