ejabberd cluster

Oct 14, 2013

Time to play with an HA ejabberd setup.

I was told the number of daily active users will be less than 1M. IM is just one feature of the mobile product, so I believe people will spend less than 2 hours on it every day, and they will mainly use the app during rush hours. That is 2M user-hours spread over 7~9am and 6~8pm, 4 hours in total. Assuming the load is evenly distributed, this puts the number of concurrent online users at about 500K. Not a big number, and since IM is a non-key feature, activity will be limited.
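
The estimate above can be reproduced with a few lines of Python. This is only a back-of-envelope sketch: the function and variable names are made up for illustration, and the inputs are the upper bounds quoted in the post, not measured data.

```python
# Back-of-envelope estimate of concurrent users from the figures in the post.
daily_active_users = 1_000_000      # upper bound on daily active users
hours_per_user = 2                  # upper bound on daily usage per user
peak_windows = [(7, 9), (18, 20)]   # 7~9am and 6~8pm


def concurrent_users(dau, hours_each, windows):
    """Spread the total user-hours evenly over the peak windows."""
    total_user_hours = dau * hours_each
    window_hours = sum(end - start for start, end in windows)
    return total_user_hours / window_hours


estimate = concurrent_users(daily_active_users, hours_per_user, peak_windows)
print(f"{estimate:,.0f} concurrent users")  # prints "500,000 concurrent users"
```

Changing any of the three inputs shows how sensitive the 500K figure is to the assumptions, for example a shorter peak window pushes it up proportionally.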

So I think I'd go with a fully replicated ejabberd cluster. "Fully" means I'm going to replicate everything: passwords, rosters, offline messages. I will check the stats of the production environment to see how resource utilization is going, and tune the allocation whenever necessary.

The setup is pretty straightforward once you have gone through it once (I tested on Ubuntu 12.04 LTS, but production will eventually be CentOS 6):

1. Set up node#1: change the host (actually, it is the domain of the XMPP service) in /etc/ejabberd/ejabberd.cfg, add mod_http_bind to the modules (to enable http-bind and http-poll), and change EJABBERD_NODE from 'ejabberd' to 'ejabberd@`hostname -f`' in /etc/default/ejabberd. You need to remove everything under /var/lib/ejabberd except .erlang.cookie, then restart, so that the mnesia DB is recreated under the new node name.
2. Set up node#2 the same way as node#1, but before restarting ejabberd, copy .erlang.cookie over from node#1.
3. Run "ejabberdctl debug" as the ejabberd user, then:
   FirstNode = 'ejabberd@node#1',
   mnesia:stop(),
   mnesia:delete_schema([node()]),
   mnesia:start(),
   mnesia:change_config(extra_db_nodes, [FirstNode]),
   mnesia:change_table_copy_type(schema, node(), disc_copies).
   Note that node#1 here should be the FQDN.
4. Press CTRL-C twice to quit (q(). used to work, but something changed in Erlang and it no longer does), then restart ejabberd. Run "ejabberdctl status" and "ejabberdctl mnesia info" to make sure the service is running properly on this host and that node#1 is listed among the DB nodes.
5. Repeat steps 2~4 on all other nodes (I have 4 nodes in total).
6. Once everything is set, go to http://node#1:5280/admin/ to make sure all nodes are in the cluster.
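
The check in step 4 (is node#1 among the DB nodes?) could be scripted along these lines. This is a rough sketch, not a tested tool: it assumes the output of "ejabberdctl mnesia info" contains a "running db nodes = [...]" line, which is how Mnesia's info printout looks on the versions I know. The sample text and the host names in it are hypothetical.

```python
import re


def running_db_nodes(mnesia_info_text):
    """Extract node names from the 'running db nodes' line of mnesia info output."""
    match = re.search(r"running db nodes\s*=\s*\[(.*?)\]", mnesia_info_text, re.S)
    if match is None:
        return []
    return [n.strip().strip("'") for n in match.group(1).split(",") if n.strip()]


# Hypothetical sample, standing in for real `ejabberdctl mnesia info` output.
SAMPLE = """\
running db nodes   = ['ejabberd@node2.example.com','ejabberd@node1.example.com']
stopped db nodes   = []
"""

nodes = running_db_nodes(SAMPLE)
print("ejabberd@node1.example.com" in nodes)  # prints "True"
```

In practice the text would come from running the command on each node (for example with subprocess.run and capture_output=True) instead of the hard-coded sample.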

Now for the table copies on the other nodes. I believe this is doable in ejabberdctl debug, but I still could not figure out how. It is easier through the web admin interface anyway: I just changed the storage type of every table on all other nodes to match node#1 exactly (RAM, disc, disc and RAM, etc). After that, reboot all machines, then use ejabberdctl status (optionally with --node Node#N) to make sure everything comes up and runs after a reboot.
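
As a complement to ejabberdctl status after the reboot, a plain TCP check can confirm that each node is listening again. This is a minimal sketch under assumptions: 5222 is the standard XMPP client port and 5280 is the http-bind port mentioned in this post, while the function names and any host names you pass in are hypothetical.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_cluster(hosts, ports=(5222, 5280)):
    """Map every (host, port) pair to whether it accepts connections."""
    return {(host, port): port_open(host, port) for host in hosts for port in ports}
```

Calling check_cluster with the list of node host names returns a dictionary that is easy to scan for False entries. It only proves the ports are open, not that the nodes have rejoined the cluster, so it does not replace the mnesia check.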

I'm going to use LVS for the XMPP port and an nginx load balancer for 5280 (http-bind), and will post the results later.

5 Responses to “ejabberd cluster”

Hang says:

2013-10-15 at 10:22

CentOS 6 is a little bit different (packages from EPEL): /etc/default/ejabberd becomes /etc/ejabberd/ejabberdctl.cfg, and the mnesia DB is located at /var/lib/ejabberd/spool.
Hang says:

2013-10-16 at 15:44

In the end I decided to go with HAProxy, which can proxy all the ports, is easier to deploy than LVS (IPVS), and is faster than Nginx.

I'm injecting users now. The target is 5M registrants, and I'm at 1.5M after half a day (using ejabberdctl, as I had no time to dig into my own code). Also, I picked SleekXMPP and Python for the next step of testing.
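
For reference, bulk registration through ejabberdctl could be driven by a small Python script like the one below. It is a sketch, not the script I actually used: it relies on the "ejabberdctl register USER HOST PASSWORD" command, while the user name prefix, the random password scheme and the function names are assumptions made for illustration.

```python
import secrets
import subprocess


def register_commands(count, domain, prefix="testuser", start=0):
    """Yield one `ejabberdctl register USER HOST PASSWORD` argv list per user."""
    for i in range(start, start + count):
        user = f"{prefix}{i:07d}"
        password = secrets.token_hex(8)  # 16 hex characters, random per user
        yield ["ejabberdctl", "register", user, domain, password]


def inject(count, domain, dry_run=True):
    """Register `count` users; with dry_run=True only print the commands."""
    done = 0
    for argv in register_commands(count, domain):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # stops at the first failure
        done += 1
    return done
```

Running inject(3, "example.com") prints three commands without touching the server. Pass dry_run=False on a cluster node to execute them. Spawning one ejabberdctl process per user is slow, which matches the rate reported above.
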
Hang says:

2013-10-28 at 17:45

Refer to this for changing table storage to local (replace 'NODE_NAME' with the name of the node that already holds the tables):

[{Tb, mnesia:add_table_copy(Tb, node(), Type)} || {Tb, [{'NODE_NAME', Type}]} <- [{T, mnesia:table_info(T, where_to_commit)} || T <- mnesia:system_info(tables)]].

My mistake was trying to change the table copy type while the table had no copy on this node yet. 🙂
Hang says:

2013-11-07 at 11:22

To remove a node from the cluster, first stop ejabberd on that node, then run this on all other nodes:

mnesia:del_table_copy(schema, 'node_to_be_removed').

Got it from http://www.blinkenlights.ch/ccms/linux/ejabberd.html
