Aug 05 2013
 

Working on the GlusterFS test environment as mentioned here; so far everything is working, with some headaches.

  1. got 4 nodes up and running and joined them into the same pool; made all the extra disks (sdb, sdc, sdd) XFS (fdisk then mkfs, so the sdb mentioned below is actually sdb1) – see the command sketch after this list
  2. created a distributed-replicated volume gfs_v0 with the brick pairs gfs11:sdb+gfs12:sdb and gfs11:sdc+gfs12:sdc
  3. mounted gfs_v0 on all 4 boxes (I don’t have dedicated client hosts …)
  4. copied 16 tgz files (450M~500M each) to gfs_v0; everything looked fine
  5. untarred two of the tgz files onto gfs_v0, which resulted in 256 directories with ~1100 files in each; it worked smoothly, though the performance was not great – I think that is a known issue, or simply how it is designed to behave
  6. added gfs13:sdd+gfs14:sdd to gfs_v0, then launched a rebalance; both worked as expected, and the rebalance was faster than I expected (compared with the untar in step #5 – I think GlusterFS writes directly to the XFS bricks during rebalance, so performance is better); the add-brick/rebalance commands are sketched after this list
  7. here comes the trouble – since the current layout does not meet the requirement mentioned on the planning page, I needed to replace gfs12:sdc with gfs13:sdb; “replace-brick start” ran without problem, but “replace-brick status” told me “cannot commit on localhost” or something similar; this seems to be a known issue that lots of people have hit, but so far I haven’t found a fix (the sequence is sketched below)
  8. kicked off a “replace-brick commit force” (maybe I did this step wrong, let’s see); “volume info” showed that the replacement was done, but obviously the data had not been properly transferred
  9. I probably should have run a heal at that point, but instead I launched a rebalance, which seemed OK, and healing was going on as well; after ~10 hours it was still running, so I stopped the rebalance and launched a heal to make sure the volume was at least in good enough shape to handle a disaster (see the heal commands below). The heal was really fast, so I guess most things got fixed during the rebalance stage.
  10. added three more pairs of bricks – gfs11:sdd+gfs14:sdb, gfs12:sdc+gfs13:sdc, and gfs12:sdd+gfs14:sdc – and launched another rebalance
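
For reference, here is a minimal sketch of the commands behind steps 1–3. The brick mount points (/bricks/sdb1 and friends) and the client mount point /mnt/gfs_v0 are just placeholders, not necessarily the exact paths I used:

    # format each extra disk as XFS and mount it as a brick (repeat for sdc1/sdd1 on every node)
    mkfs.xfs -i size=512 /dev/sdb1
    mkdir -p /bricks/sdb1
    mount /dev/sdb1 /bricks/sdb1

    # from gfs11: pull the other nodes into the pool
    gluster peer probe gfs12
    gluster peer probe gfs13
    gluster peer probe gfs14

    # distributed-replicated: with "replica 2" the bricks are paired in the order listed
    gluster volume create gfs_v0 replica 2 \
        gfs11:/bricks/sdb1 gfs12:/bricks/sdb1 \
        gfs11:/bricks/sdc1 gfs12:/bricks/sdc1
    gluster volume start gfs_v0

    # mount the volume on each box (no dedicated client hosts here)
    mkdir -p /mnt/gfs_v0
    mount -t glusterfs gfs11:/gfs_v0 /mnt/gfs_v0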
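
Steps 6 and 10 (growing the volume) boil down to add-brick plus a rebalance, with the same placeholder brick paths as above:

    # step 6: add a new replica pair, then spread existing data over it
    gluster volume add-brick gfs_v0 gfs13:/bricks/sdd1 gfs14:/bricks/sdd1
    gluster volume rebalance gfs_v0 start
    gluster volume rebalance gfs_v0 status

    # step 10: same thing, three pairs at once
    gluster volume add-brick gfs_v0 \
        gfs11:/bricks/sdd1 gfs14:/bricks/sdb1 \
        gfs12:/bricks/sdc1 gfs13:/bricks/sdc1 \
        gfs12:/bricks/sdd1 gfs14:/bricks/sdc1
    gluster volume rebalance gfs_v0 start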
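
The replace-brick sequence from steps 7–8 looked roughly like this on the CLI of the version I’m running (again with placeholder brick paths); this is the part that fought back:

    # migrate gfs12:sdc onto gfs13:sdb
    gluster volume replace-brick gfs_v0 gfs12:/bricks/sdc1 gfs13:/bricks/sdb1 start

    # this is where I got the "cannot commit on localhost"-style error
    gluster volume replace-brick gfs_v0 gfs12:/bricks/sdc1 gfs13:/bricks/sdb1 status

    # forced it through; volume info shows the new brick, but the data had not really moved
    gluster volume replace-brick gfs_v0 gfs12:/bricks/sdc1 gfs13:/bricks/sdb1 commit force
    gluster volume info gfs_v0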
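
And the clean-up from step 9; “heal … full” triggers a full self-heal crawl instead of waiting for files to be accessed:

    gluster volume rebalance gfs_v0 stop
    gluster volume heal gfs_v0 full    # full self-heal crawl over the whole volume
    gluster volume heal gfs_v0 info    # see what is still pending heal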

I’m waiting for the last rebalance to finish, and then I’m going to create some disasters :D – power off a node, power-cycle a node, fail a disk, all that sort of stuff – and see what happens.
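
For the disaster round I don’t plan anything fancier than the usual status/heal checks after each failure, roughly:

    gluster peer status               # does the pool still see every node?
    gluster volume status gfs_v0      # which bricks are actually online
    gluster volume heal gfs_v0 info   # files waiting for self-heal once a node is back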

  One Response to “Playing with Gluster File System”

  1. BTW, I was running an untar while adding the new bricks, and it broke the untar process. I don’t fully understand the mechanism behind GlusterFS, but clearly an admin should avoid adding/removing bricks during peak hours – GlusterFS is said to expand/shrink on the fly, but that is not exactly true.

    I will dig around to see if there is anything documented about that.
