May 05 2009
 

Now I believe C++ guys are definitely anti-Unix style.

There is a data preparation program that takes way too long to finish – the data copying takes 24~30 hours, the pre-process takes 8~12 hours, and the final stage takes ~4 hours on 30 nodes. Since we lost some nodes recently, I started looking into the whole process to see how we could fit it onto 2~3 machines.

Then I found they were using a single rsync to copy thousands of files from the west coast to the east coast, so no wonder it took that long. I wrapped up a simple shell script to launch 20 rsyncs at the same time, and now copying the data takes only 2 hours. Since bandwidth is the bottleneck, launching 20 processes on one machine doesn’t overload the system (sure, the load number looks ugly, but it’s just a number); I can still run ls/top/iostat etc. without significant latency.
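The wrapper is nothing fancy; roughly the sketch below, where the file list, source, and destination are placeholders rather than the real paths: split the list into 20 chunks and hand each chunk to its own background rsync.

    #!/bin/bash
    # Split the file list into 20 chunks and run one rsync per chunk in parallel.
    # FILELIST, SRC and DEST are placeholders for illustration only.
    FILELIST=files.txt
    SRC=westcoast:/data/
    DEST=/data/

    total=$(wc -l < "$FILELIST")
    lines=$(( (total + 19) / 20 ))        # ceil(total / 20) lines per chunk
    split -l "$lines" "$FILELIST" chunk.

    for f in chunk.*; do
        rsync -a --files-from="$f" "$SRC" "$DEST" &
    done
    wait                                  # all 20 copies must finish before the next stage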

Then I thought I should be able to improve the performance of the pre-process, and this is where the complaint about the C++ guys comes in. The Unix culture is to build a bunch of small tools, each taking care of just one task and exchanging data through pipes/files, so the user can easily glue them together with a shell script to fit reality. Pity the pre-process was done in C++, and it … has EVERYTHING in a single program, without multi-processing, so with all those hundreds of GB of data we can only run a single process even though we have an 8 core/8 GB machine with RAID disks.
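To make the contrast concrete, the small-tools style looks like this (a made-up example, not our actual pre-process): each tool does one job, and the pipe glues them together.

    # Count the most common values in field 3 across all partitions;
    # every stage is a tiny tool, the shell wires them up.
    cat part-*.dat | cut -f3 | sort | uniq -c | sort -rn | head -20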

And because it is “all-in-one”, it’s really hard to split it into small tasks and run them in parallel. Luckily I found that the data (prepared by another team) does a great job here: they partitioned the data and put nice hints in some meta files (how many files, the fields in each file, etc.) and in the data file itself (“what’s the next file”). So I can fake a dummy “next file”, launch the pre-process program so that it deals with one file only, and then launch multiple such processes in parallel.
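In shell the trick looks roughly like the sketch below; the file names, the hint reader, and the pre-process binary are all hypothetical placeholders, since the real layout depends on the other team’s format.

    #!/bin/bash
    # Hypothetical sketch: part-*.dat, print_next_hint and preprocess are
    # placeholders.  Idea: give each partition a private work dir where the
    # file it points to as "next" exists only as an empty dummy, so the
    # pre-process stops after that one partition.  Run 8 at a time (one per core).
    n=0
    for f in part-*.dat; do
        next=$(./print_next_hint "$f")        # however the "next file" hint is read
        mkdir -p "work.$f"
        ln -sf "$PWD/$f" "work.$f/$f"         # the real partition
        : > "work.$f/$next"                   # dummy, empty "next file"
        ( cd "work.$f" && ../preprocess "$f" ) &
        n=$((n + 1))
        [ $((n % 8)) -eq 0 ] && wait          # simple batching: 8 jobs per wave
    done
    wait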

The improvement is not that impressive: now I can finish the pre-process in about 4 hours. But anyway, that’s roughly 50% shorter, so I can live with it until someday I get time to rewrite the program.

The final step will be tough to deal with. It is a memory-hungry application, and we have never managed to run more than one instance on the same box, even though we have enough memory. The kernel always panics after a while (within an hour) if we launch two or more instances; I should ask the kernel team to look into it. However, in the worst case I can just run it on 3 machines, and the time will be (assuming linear scaling) 4 hours * 30 nodes / 3 nodes = 40 hours. Overall, I can finish the whole process on 3 nodes in 2 days, which I’d say is a nice move compared with the previous 30 nodes in 2 days.

I will post the results of the last stage here if there is anything impressive.

By the way, I worked in ASM/C, then C++, then VB/Delphi, then Perl/PHP, and now shell. Does that mean I have upgraded, or downgraded? I haven’t figured it out …

Posted at 23:15
May 05 2009
 

It seems some long-time employees left in the last week – I’m now the 313th, jumping up from last week’s 321st.

I’m trying my best to stay here for 10 years; I guess I can break into the top 200 by then.

Let’s see.

Posted at 16:54