Message ID | 20130506091822.GA26022@upset.ux.pdb.fsc.net (mailing list archive)
---|---
State | New, archived
2013/5/6 Andreas Friedrich <andreas.friedrich@ts.fujitsu.com>:
> To enable the LD_PRELOAD mechanism for the Ceph daemons only, a little
> generic extension in the global section of /etc/ceph/ceph.conf would
> be helpful, e.g.:
>
> [global]
> environment = LD_PRELOAD=/usr/lib64/libsdp.so.1
>
> The appended patch adds 5 lines to the Bobtail (0.56.6) init script.
> The init script will then read the environment setting and - if present -
> call the Ceph daemons with the preceding environment string.

Cool! We are planning the same infrastructure with IB for networking.
Could you share more details about this?
- What performance are you getting, and on which hardware?
- Any issues with IB and SDP?
- Are you able to use the newer rsockets instead of SDP? It also has a
  preload library and is still developed (SDP is deprecated).
On 05/06/2013 11:18 AM, Andreas Friedrich wrote:
> Hello,
>
> we are using Infiniband instead of Ethernet for cluster interconnection.
> Instead of IPoIB (IP-over-InfiniBand Protocol) we want to use SDP
> (Sockets Direct Protocol) as a mid-layer protocol.
>
> To connect the Ceph daemons to SDP without changing the Ceph code, the
> LD_PRELOAD mechanism can be used.
>
> To enable the LD_PRELOAD mechanism for the Ceph daemons only, a little
> generic extension in the global section of /etc/ceph/ceph.conf would
> be helpful, e.g.:
>
> [global]
> environment = LD_PRELOAD=/usr/lib64/libsdp.so.1
>
> The appended patch adds 5 lines to the Bobtail (0.56.6) init script.
> The init script will then read the environment setting and - if present -
> call the Ceph daemons with the preceding environment string.
>

You can also submit this via a pull request on GitHub. That way the
authorship is preserved and you get all the credit for this patch :)

> With best regards
> Andreas Friedrich
> ----------------------------------------------------------------------
> FUJITSU
> Fujitsu Technology Solutions GmbH
> Heinz-Nixdorf-Ring 1, 33106 Paderborn, Germany
> Tel: +49 (5251) 525-1512
> Fax: +49 (5251) 525-321512
> Email: andreas.friedrich@ts.fujitsu.com
> Web: ts.fujitsu.com
> Company details: de.ts.fujitsu.com/imprint
> ----------------------------------------------------------------------
>
On 05/06/2013 07:36 AM, Gandalf Corvotempesta wrote:
> 2013/5/6 Andreas Friedrich <andreas.friedrich@ts.fujitsu.com>:
>> To enable the LD_PRELOAD mechanism for the Ceph daemons only, a little
>> generic extension in the global section of /etc/ceph/ceph.conf would
>> be helpful, e.g.:
>>
>> [global]
>> environment = LD_PRELOAD=/usr/lib64/libsdp.so.1
>>
>> The appended patch adds 5 lines to the Bobtail (0.56.6) init script.
>> The init script will then read the environment setting and - if present -
>> call the Ceph daemons with the preceding environment string.
>
> Cool! We are planning the same infrastructure with IB for networking.
> Could you share more details about this?
> - What performance are you getting, and on which hardware?
> - Any issues with IB and SDP?
> - Are you able to use the newer rsockets instead of SDP? It also has a
>   preload library and is still developed (SDP is deprecated).

Yeah, that's more or less the conclusion I came to as well. With SDP
being deprecated, rsockets is looking like an attractive potential
alternative. I'll let Sage or someone comment on the patch though.

It would be very interesting to hear how SDP does. With IPoIB I've
gotten about 2GB/s on QDR with Ceph, which is roughly also what I can
get in an ideal round-robin setup with 2 bonded 10GbE links.

Mark
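As a side note on the rsockets option mentioned above: in principle the same
proposed `environment` setting could point at the rsockets preload library
instead of libsdp. A rough sketch, assuming the librdmacm rsocket preload
library is installed; its path (shown here as /usr/lib64/rsocket/librspreload.so)
varies by distribution and is an assumption, not something from this thread:

    [global]
    # preload rsockets instead of SDP (path is an assumption)
    environment = LD_PRELOAD=/usr/lib64/rsocket/librspreload.so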
2013/5/6 Mark Nelson <mark.nelson@inktank.com>:
> It would be very interesting to hear how SDP does. With IPoIB I've gotten
> about 2GB/s on QDR with Ceph, which is roughly also what I can get in an
> ideal round-robin setup with 2 bonded 10GbE links.

Yes, but IB costs about a quarter of what 10GbE does and will be much
more expandable in the future.
On 05/06/2013 08:35 AM, Gandalf Corvotempesta wrote:
> 2013/5/6 Mark Nelson <mark.nelson@inktank.com>:
>> It would be very interesting to hear how SDP does. With IPoIB I've gotten
>> about 2GB/s on QDR with Ceph, which is roughly also what I can get in an
>> ideal round-robin setup with 2 bonded 10GbE links.
>
> Yes, but IB costs about a quarter of what 10GbE does and will be much
> more expandable in the future.

QDR shouldn't be that much cheaper. Maybe SDR or DDR. But I agree with
your general sentiment. I think rsockets may be a really good
benefit/cost solution in the short term to get IB support into Ceph. It
sounds like there is some work planned on a kernel implementation, which
would be fantastic on the filesystem side as well.

Mark
Hi,
> Yes, but IB costs about a quarter of what 10GbE does and will be much
> more expandable in the future.
I'm testing Ceph currently with bonded 1 GbE links and contemplating
moving to 10 GbE or IB. I have to pay for the costs out of my own
pocket, so price is a major factor.
I have noted that just recently it has become possible to buy 10 GbE
switches for a lot less than before. I don't know what IB equipment
costs, mainly because I don't know much about IB and hence don't know
which equipment to buy.
My costs for a copper-based 10 GbE setup per server would be approximately:
Switch: 115$
NIC: 345$
Cable: 2$
Total: 462$ (per server)
If anyone could comment on how that compares to IB pricing, I would
appreciate it.
Is it possible to quantify how much Ceph would benefit from the improved
latency between 1 GbE and 10 GbE? (i.e. assuming that 1 GbE gives me
enough bandwidth, would I see any gains from lower latency?)
And similarly, would there be a significant gain from lower latency when
comparing 10 GbE to IB? (I assume that IB has lower latency than 10 GbE)
2013/5/6 Jens Kristian Søgaard <jens@mermaidconsulting.dk>:
> My costs for a copper-based 10 GbE setup per server would be approximately:
>
> Switch: 115$
> NIC: 345$
> Cable: 2$

115$ for a 10GbE switch? Which kind of switch?
Hi again,
> 115$ for a 10GbE switch? Which kind of switch?
No no - not 115$ for a switch. The cost was per port!
So an 8-port switch would be approx. 920$.
I'm looking at just a bare-bones switch that does VLANs, jumbo frames
and port trunking. The network would be used exclusively for Ceph.
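For concreteness, the host-side counterpart of the features mentioned above
(port trunking, jumbo frames, a VLAN dedicated to Ceph) might look roughly like
the following iproute2 sketch; the interface names, bonding mode, VLAN ID and
address are placeholders, not details taken from this thread:

    # bond two 10GbE ports (names and 802.3ad mode are assumptions)
    ip link add bond0 type bond mode 802.3ad
    ip link set eth0 down; ip link set eth0 master bond0
    ip link set eth1 down; ip link set eth1 master bond0
    # jumbo frames on the bond
    ip link set bond0 mtu 9000 up
    # a tagged VLAN carrying only Ceph traffic (ID 100 is a placeholder)
    ip link add link bond0 name bond0.100 type vlan id 100
    ip addr add 192.168.100.1/24 dev bond0.100
    ip link set bond0.100 up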
2013/5/6 Jens Kristian Søgaard <jens@mermaidconsulting.dk>:
> So an 8-port switch would be approx. 920$.
>
> I'm looking at just a bare-bones switch that does VLANs, jumbo frames and
> port trunking. The network would be used exclusively for Ceph.

You should also consider 10GbE for the public network, and there you
might need more features.
Hi,

>> I'm looking at just a bare-bones switch that does VLANs, jumbo frames and
>> port trunking. The network would be used exclusively for Ceph.
> You should also consider 10GbE for the public network, and there you
> might need more features.

I was actually going to put both the public and private networks on the
same 10 GbE. Why do you think I need more features?

My public and private networks are completely dedicated to Ceph - so
nothing else takes place on them.

I have a third network, which is just plain 1 GbE, that handles all
other communication between the servers (and the Internet).
2013/5/6 Jens Kristian Søgaard <jens@mermaidconsulting.dk>:
> I was actually going to put both the public and private networks on the
> same 10 GbE. Why do you think I need more features?

I'm just supposing; I'm also evaluating the same network topology as
you. There is a low-cost 12x 10GBase-T switch from Netgear.

I'm also evaluating creating a full IB network dedicated to Ceph (IB on
the cluster network and IB on the public network). Do you think it will
be possible to use all Ceph services (RBD, RGW, CephFS, QEMU, ...) via
SDP/rsockets?

On eBay there are many IB switches sold at 1000-1500$ with 24 or 36 DDR
ports. If used with SDP/rsockets you should achieve more or less 18Gbps.
Hi,

> I'm just supposing; I'm also evaluating the same network topology as
> you.

I have tested my current setup for a while with just triple-bonded GbE,
and haven't found the need for more features. I might be wrong of course :-)

> There is a low-cost 12x 10GBase-T switch from Netgear.

I was looking at the Netgear XS708E, which is very low cost compared to
what the prices were 6 months ago when I last looked at it.

What I don't know is how those prices compare to IB pricing.

> (IB on the cluster network and IB on the public network). Do you think it
> will be possible to use all Ceph services (RBD, RGW, CephFS, QEMU, ...)
> via SDP/rsockets?

I haven't tried SDP/rsockets, so I don't know. From what I have
experienced in the past, these things don't seem to be completely "drop
in and forget about it" - there's always something that is not
completely compatible with ordinary TCP/IP sockets.

> On eBay there are many IB switches sold at 1000-1500$ with 24 or 36 DDR
> ports. If used with SDP/rsockets you should achieve more or less 18Gbps.

That sounds cheap, yes, but I assume that is used equipment?
I was looking at costs for new equipment.
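One low-risk way to probe the compatibility question raised here is to preload
the library only for a client-side tool and see whether it can still reach the
cluster; a minimal sketch, assuming the libsdp path from earlier in the thread
and the default 'rbd' pool (both assumptions):

    # does the status command still work over the preloaded sockets?
    LD_PRELOAD=/usr/lib64/libsdp.so.1 ceph -s
    # quick 30-second write benchmark against a pool ('rbd' is assumed to exist)
    LD_PRELOAD=/usr/lib64/libsdp.so.1 rados -p rbd bench 30 write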
On 05/06/2013 09:18 AM, Jens Kristian Søgaard wrote:
> Hi,
>
>> I'm just supposing; I'm also evaluating the same network topology as
>> you.
>
> I have tested my current setup for a while with just triple-bonded GbE,
> and haven't found the need for more features. I might be wrong of course :-)
>
>> There is a low-cost 12x 10GBase-T switch from Netgear.
>
> I was looking at the Netgear XS708E, which is very low cost compared to
> what the prices were 6 months ago when I last looked at it.

I'm really not up to speed on the capabilities of cheap 10GbE switches.
I'm sure other people will have comments about what features are
worthwhile, etc. Just from a raw performance perspective, you might want
to make sure that the switch can handle lots of randomly distributed
traffic between all of the ports well. I'd expect that it shouldn't be
too terrible, but who knows.

I expect most IB switches you run into should deal with this kind of
pattern competently enough for Ceph workloads to do OK. On the front-end
portion of the network you'll always have client<->server communication,
so the pattern there will be less all-to-all than the backend traffic
(more 1st half <-> 2nd half).

On really big deployments, static routing becomes a pretty big issue.
Dynamic routing, or at least well-optimized routes, can make a huge
difference. ORNL has done some work in this area for their Lustre IB
networks and I think LLNL is investigating it as well.

Mark

> What I don't know is how those prices compare to IB pricing.
>
>> (IB on the cluster network and IB on the public network). Do you think it
>> will be possible to use all Ceph services (RBD, RGW, CephFS, QEMU, ...)
>> via SDP/rsockets?
>
> I haven't tried SDP/rsockets, so I don't know. From what I have
> experienced in the past, these things don't seem to be completely "drop
> in and forget about it" - there's always something that is not
> completely compatible with ordinary TCP/IP sockets.
>
>> On eBay there are many IB switches sold at 1000-1500$ with 24 or 36 DDR
>> ports. If used with SDP/rsockets you should achieve more or less 18Gbps.
>
> That sounds cheap, yes, but I assume that is used equipment?
> I was looking at costs for new equipment.
Hi,

> Just from a raw performance perspective, you might
> want to make sure that the switch can handle lots of randomly
> distributed traffic between all of the ports well. I'd expect that it
> shouldn't be too terrible, but who knows.

It has been many years since I've seen a switch have problems with that.
This switch has a 160 Gb/s backplane, so I don't foresee any problems.

Especially considering that I'm mixing the public and private networks
on the same switch.

> On really big deployments, static routing becomes a pretty big issue.
> Dynamic routing, or at least well-optimized routes, can make a huge

I use dedicated routers for routing; the switch is just a "dumb" L2 switch.

I'm looking at a very small deployment (probably 8 servers).
On 05/06/2013 10:16 AM, Jens Kristian Søgaard wrote:
> Hi,
>
>> Just from a raw performance perspective, you might
>> want to make sure that the switch can handle lots of randomly
>> distributed traffic between all of the ports well. I'd expect that it
>> shouldn't be too terrible, but who knows.
>
> It has been many years since I've seen a switch have problems with that.
> This switch has a 160 Gb/s backplane, so I don't foresee any problems.
>
> Especially considering that I'm mixing the public and private networks
> on the same switch.
>
>> On really big deployments, static routing becomes a pretty big issue.
>> Dynamic routing, or at least well-optimized routes, can make a huge
>
> I use dedicated routers for routing; the switch is just a "dumb" L2 switch.
>
> I'm looking at a very small deployment (probably 8 servers).

Ha, ok. I suspect you'll be fine. Nothing like over-thinking the
problem, right? :)
On Mon, May 06, 2013 at 02:36:39PM +0200, Gandalf Corvotempesta wrote:
> 2013/5/6 Andreas Friedrich <andreas.friedrich@ts.fujitsu.com>:
> > To enable the LD_PRELOAD mechanism for the Ceph daemons only, a little
> > generic extension in the global section of /etc/ceph/ceph.conf would
> > be helpful, e.g.:
> >
> > [global]
> > environment = LD_PRELOAD=/usr/lib64/libsdp.so.1
> >
> > The appended patch adds 5 lines to the Bobtail (0.56.6) init script.
> > The init script will then read the environment setting and - if present -
> > call the Ceph daemons with the preceding environment string.
>
> Cool! We are planning the same infrastructure with IB for networking.
> Could you share more details about this?
> - What performance are you getting, and on which hardware?

6x storage servers, each with 3x Intel 910 PCIe SSDs (OSDs on xfs) and
the journal in a 1GB RAM disk per OSD, plus 1x client server; each
server with 10GbE, Intel QDR as well as MLX FDR.

The performance values are still scattering ...
(1 fio client job, 128 QD, 1TB of data, 60 seconds)

e.g.
- read_4m_128, read_8m_128, randread_4m_128, randread_8m_128:
  approx. 2.2 GB/s on 56Gb IPoIB_CM and 56Gb SDP
- write_4m_128, write_8m_128, randwrite_4m_128, randwrite_8m_128:
  approx. 500 MB/s on 56Gb IPoIB_CM and 56Gb SDP,
  but 900 MB/s on 40Gb IPoIB_CM

> - Any issues with IB and SDP?

Well, you have to compile the complete OFED stack (ib_core, ... ib_sdp)
from one vendor. MLX is running; the Intel status is 'wip' ... the
cluster is up, but jerking ... (OSDs are getting marked out/in ...
timeouts). Looks like a shortage of resources?!

> - Are you able to use the newer rsockets instead of SDP? It also has a
>   preload library and is still developed (SDP is deprecated).

rsockets.so is up, but stuttering ...
... and for the moment rsockets is userland only, in contrast to sdp.ko.

Cheers,
-Dieter
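To make the benchmark parameters above concrete, a job such as randwrite_4m_128
(1 fio client job, 4 MB blocks, queue depth 128, 60 seconds) might be expressed
roughly as follows; the target path, the libaio engine and the time_based/size
settings are assumptions, not details taken from the original mail:

    # sketch of one of the named fio jobs; /dev/rbd0 and the engine are assumptions
    fio --name=randwrite_4m_128 --filename=/dev/rbd0 --rw=randwrite \
        --bs=4m --iodepth=128 --numjobs=1 --ioengine=libaio --direct=1 \
        --runtime=60 --time_based --size=1T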
2013/5/6 Mark Nelson <mark.nelson@inktank.com>:
> On the front-end portion of the network you'll always have client<->server
> communication, so the pattern there will be less all-to-all than the
> backend traffic (more 1st half <-> 2nd half).

What do you suggest for the frontend portion? Would 10GbE or 2Gb (2x 1GbE
bonded together) be enough in the case of RBD with QEMU?
--- a/src/init-ceph.in	2013-05-03 21:31:07.000000000 +0200
+++ b/src/init-ceph.in	2013-05-06 10:56:56.000000000 +0200
@@ -212,6 +212,12 @@
     # conf file
     cmd="$cmd -c $conf"
 
+    environment=""
+    get_conf environment '' 'environment'
+    if [ ! -z "$environment" ]; then
+        cmd="env $environment $cmd"
+    fi
+
     if echo $name | grep -q ^osd; then
         get_conf osd_data "/var/lib/ceph/osd/ceph-$id" "osd data"
         get_conf fs_path "$osd_data" "fs path"  # mount point defaults so osd data
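With this change applied, the option proposed earlier in the thread would be
read from the global section, e.g.:

    [global]
    environment = LD_PRELOAD=/usr/lib64/libsdp.so.1

and the init script would then prefix each daemon command with that
environment, effectively running something along the lines of the following
(the binary path, daemon id and argument list are illustrative, not the exact
command the script builds):

    env LD_PRELOAD=/usr/lib64/libsdp.so.1 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf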