Aug 05 2013
 

Working on the GlusterFS test environment as mentioned here; so far everything is working, with some headaches.

  1. got 4 nodes up and running and joined them into the same pool; made all the extra disks (sdb, sdc, sdd) XFS (fdisk then mkfs, so the sdb mentioned below is actually sdb1) – see the command sketch after this list
  2. created a distributed-replicated volume gfs_v0 with the brick pairs gfs11:sdb+gfs12:sdb and gfs11:sdc+gfs12:sdc
  3. mounted gfs_v0 on all 4 boxes (I don’t have dedicated client hosts …)
  4. copied 16 tgz files (450M~500M each) to gfs_v0; everything looked fine
  5. untarred two of the tgz files onto gfs_v0, which resulted in 256 directories with ~1100 files in each; it worked smoothly, though the performance was not great – I think that is a known issue, or simply how it is designed to behave
  6. added gfs13:sdd+gfs14:sdd to gfs_v0, then launched a rebalance; both worked as expected, and the rebalance was faster than I expected (compared with the untar in step #5 – I think GlusterFS writes directly to the XFS bricks during rebalance, so performance is better); the add-brick/rebalance commands are sketched after this list
  7. here comes the trouble – since the current layout does not meet the requirement mentioned on the planning page, I needed to replace gfs12:sdc with gfs13:sdb; “replace-brick start” ran without problem, but “replace-brick status” told me “cannot commit on localhost” or something similar; this seems to be a known issue that lots of people have hit, but so far I haven’t found a fix (the sequence is sketched below)
  8. kicked off a “replace-brick commit force” (maybe I did this step wrong, let’s see); “volume info” showed that the replacement was done, but obviously the data had not been properly transferred
  9. I probably should have run a heal at that point, but instead I launched a rebalance, which seemed OK, and healing was going on as well; after ~10 hours it was still running, so I stopped the rebalance and launched a heal to make sure the volume was at least in good enough shape to handle a disaster (see the heal commands below). The heal was really fast, so I guess most things got fixed during the rebalance stage.
  10. added three more pairs of bricks – gfs11:sdd+gfs14:sdb, gfs12:sdc+gfs13:sdc, and gfs12:sdd+gfs14:sdc – and launched another rebalance
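
For reference, here is a minimal sketch of the commands behind steps 1–3. The brick mount points (/bricks/sdb1 and friends) and the client mount point /mnt/gfs_v0 are just placeholders, not necessarily the exact paths I used:

    # format each extra disk as XFS and mount it as a brick (repeat for sdc1/sdd1 on every node)
    mkfs.xfs -i size=512 /dev/sdb1
    mkdir -p /bricks/sdb1
    mount /dev/sdb1 /bricks/sdb1

    # from gfs11: pull the other nodes into the pool
    gluster peer probe gfs12
    gluster peer probe gfs13
    gluster peer probe gfs14

    # distributed-replicated: with "replica 2" the bricks are paired in the order listed
    gluster volume create gfs_v0 replica 2 \
        gfs11:/bricks/sdb1 gfs12:/bricks/sdb1 \
        gfs11:/bricks/sdc1 gfs12:/bricks/sdc1
    gluster volume start gfs_v0

    # mount the volume on each box (no dedicated client hosts here)
    mkdir -p /mnt/gfs_v0
    mount -t glusterfs gfs11:/gfs_v0 /mnt/gfs_v0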
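
Steps 6 and 10 (growing the volume) boil down to add-brick plus a rebalance, with the same placeholder brick paths as above:

    # step 6: add a new replica pair, then spread existing data over it
    gluster volume add-brick gfs_v0 gfs13:/bricks/sdd1 gfs14:/bricks/sdd1
    gluster volume rebalance gfs_v0 start
    gluster volume rebalance gfs_v0 status

    # step 10: same thing, three pairs at once
    gluster volume add-brick gfs_v0 \
        gfs11:/bricks/sdd1 gfs14:/bricks/sdb1 \
        gfs12:/bricks/sdc1 gfs13:/bricks/sdc1 \
        gfs12:/bricks/sdd1 gfs14:/bricks/sdc1
    gluster volume rebalance gfs_v0 start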
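
The replace-brick sequence from steps 7–8 looked roughly like this on the CLI of the version I’m running (again with placeholder brick paths); this is the part that fought back:

    # migrate gfs12:sdc onto gfs13:sdb
    gluster volume replace-brick gfs_v0 gfs12:/bricks/sdc1 gfs13:/bricks/sdb1 start

    # this is where I got the "cannot commit on localhost"-style error
    gluster volume replace-brick gfs_v0 gfs12:/bricks/sdc1 gfs13:/bricks/sdb1 status

    # forced it through; volume info shows the new brick, but the data had not really moved
    gluster volume replace-brick gfs_v0 gfs12:/bricks/sdc1 gfs13:/bricks/sdb1 commit force
    gluster volume info gfs_v0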
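
And the clean-up from step 9; “heal … full” triggers a full self-heal crawl instead of waiting for files to be accessed:

    gluster volume rebalance gfs_v0 stop
    gluster volume heal gfs_v0 full    # full self-heal crawl over the whole volume
    gluster volume heal gfs_v0 info    # see what is still pending heal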

I’m waiting for the last rebalance to finish, and then I’m going to create some disasters :D – power off a node, power-cycle a node, fail a disk, all that sort of stuff – and see what happens.
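
For the disaster round I don’t plan anything fancier than the usual status/heal checks after each failure, roughly:

    gluster peer status               # does the pool still see every node?
    gluster volume status gfs_v0      # which bricks are actually online
    gluster volume heal gfs_v0 info   # files waiting for self-heal once a node is back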

  One Response to “Playing with Gluster File System”

  1. BTW, I was running an untar while adding the new bricks, and it broke the untar process. I don’t fully understand the mechanism behind GlusterFS, but clearly an admin should avoid adding/removing bricks during peak hours – GlusterFS is said to expand/shrink on the fly, but that is not exactly true.

    I will dig around to see if there is anything documented about that.
