Nov 17 2011
 

I got 700K lines of Apache log files from a friend’s web server and imported them into the test Hadoop instance running on my MacBook. Following the instructions listed here, I successfully ran some analysis.

Note that the last section of the article, the one about the Apache Weblog Data, doesn’t seem to be correct: the regex is missing a space (” “) after each ^, and it gave me quite a headache since I’m not familiar with Java regular expressions. Luckily, Hive issue 662, which is mentioned in the article, gave me the correct regex to get things done.
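To show why that missing space matters, here is a minimal sketch in Python (not Java, and not Hive itself). The pattern is my assumption, modelled on the commonly quoted combined-log regex. I have not checked that it is character-for-character the one from Hive issue 662, and the field names and the sample line are made up for illustration.

```python
import re

# Assumed pattern for Apache combined log lines. Each "[^ ]*" needs the
# space after "^" so that a field stops at the next space character.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) '
    r'([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
    r'(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?'
)

# Hypothetical column names, one per capture group.
FIELDS = ("host", "identity", "user", "time", "request",
          "status", "size", "referer", "agent")


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


# Made-up sample line in combined log format.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
parsed = parse_line(sample)
print(parsed["host"], parsed["status"], parsed["size"])
```

Without the space inside the brackets, the character class no longer means "anything but a space", so the fields are not split where you expect.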

It seems I can only learn and play with Hive/Hadoop this way, because Hadoop on a MacBook is still a single-node installation, which is … SLOW. So far I’m fine with it as I don’t have a high volume of data to process. As a reference, getting the top accessed IPs (which I used to figure out potential abusers) took 83 seconds. The HiveQL is simple, something like “select host, count(*) cc from apachelog group by host order by cc desc limit 10;”.
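For comparison, the same top-10 count can be sketched in plain Python on a single machine. This is only an illustration under my own assumptions: the function name top_hosts and the sample lines are made up, and it simply treats the first space-separated field of each line as the host instead of going through the apachelog table.

```python
from collections import Counter


def top_hosts(lines, limit=10):
    """Count requests per host (first space-separated field), top `limit` only."""
    counts = Counter()
    for line in lines:
        host = line.split(" ", 1)[0]
        if host:  # skip empty lines
            counts[host] += 1
    # Roughly "group by host order by cc desc limit N".
    return counts.most_common(limit)


# Made-up sample data, not taken from the real log files.
sample_lines = [
    '10.0.0.1 - - [17/Nov/2011:10:00:00 +0800] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [17/Nov/2011:10:00:01 +0800] "GET /a HTTP/1.1" 200 128',
    '10.0.0.1 - - [17/Nov/2011:10:00:02 +0800] "GET /b HTTP/1.1" 404 64',
    '10.0.0.3 - - [17/Nov/2011:10:00:03 +0800] "GET /c HTTP/1.1" 200 256',
    '10.0.0.2 - - [17/Nov/2011:10:00:04 +0800] "GET /d HTTP/1.1" 200 256',
    '10.0.0.1 - - [17/Nov/2011:10:00:05 +0800] "GET /e HTTP/1.1" 200 256',
]

for host, cc in top_hosts(sample_lines, limit=2):
    print(host, cc)
```

In real use you would pass an open file object instead of a list, since the function only iterates over lines.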

Hadoop is a rich man’s game: it seems to improve performance only when you have lots of nodes, since that is when it can distribute the tasks well.

BTW, Hadoop: The Definitive Guide is a good book 🙂 .

Nov 07 2011
 

Got a MacBook Pro 13″, but it’s an old model, as the newest models are still pending “regulator’s approval” in China, per Apple’s web site. I believe the problem comes from the new Thunderbolt port.

Mine has 8G of memory, so I can run 3 VMs (under VirtualBox, each with 1G of memory) plus all the Microsoft Office applications (Word, Excel, PowerPoint, and Outlook), Safari, and several small utilities at the same time without any performance problem. Most of the time I only need to run 1 VM, so it should be OK.

VirtualBox rocks, while Parallels Desktop and VMware Fusion both suck. Alright, I know I’m not familiar with either of them, which is partly why VirtualBox gained my respect, but considering that VirtualBox is free, it simply kills those two.

Installing Xcode was a nightmare. The App Store in China didn’t work, at least not for huge packages like Xcode (I installed some small stuff without problems), so in the end I downloaded Xcode from Apple’s developer web site. As far as I can see, there wasn’t much else to install other than QQ and MPlayer. I’m trying to get Pidgin running now, but it seems I have to run it under X11, and if that is the case I will move to Adium, though I do want to have customized smileys. Maybe Adium has that already; I need to check it out.

Overall, the MBP is a nice laptop. It’s quiet and cool (compared with the HOT MBP ppc I had 8 years ago). The only problem is that I have to take off my watch while working on it, as the watch’s chain would scratch the shell.

I ordered a keyboard screen and a bag for it, and should get them this week.