After creating the Python bindings for the HBase Thrift interface, the generated Python modules should be put in /Library/Python/2.7/site-packages/ (2.7 is the default for OS X Lion).
Just a note.
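For what it's worth, here is a rough sketch of that copy step. It assumes the Thrift compiler left the generated package in a directory such as gen-py/hbase (that path, the function name, and its parameters are my own illustration, not something fixed by HBase), and it needs Python 3.8+ for dirs_exist_ok, so it won't run as-is on the stock 2.7.

```python
import shutil
import site
from pathlib import Path


def install_generated_package(src_pkg, dest_dir=None):
    """Copy a generated Python package directory into a site-packages directory.

    src_pkg  -- path to the package directory (hypothetical example: gen-py/hbase)
    dest_dir -- target site-packages; defaults to the interpreter's first one
    """
    src = Path(src_pkg)
    if not (src / "__init__.py").is_file():
        raise ValueError(f"{src} does not look like a Python package")
    # Fall back to the first site-packages directory of the running interpreter.
    dest_root = Path(dest_dir) if dest_dir else Path(site.getsitepackages()[0])
    target = dest_root / src.name
    # dirs_exist_ok (Python 3.8+) lets a re-run overwrite an earlier copy.
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target


# Example call (commented out so nothing is written by accident):
# install_generated_package("gen-py/hbase", "/Library/Python/2.7/site-packages")
```

Writing into the system site-packages usually needs sudo, so a plain cp -r from the shell does the same job.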
Just installed HBase on my Hadoop node, so now I have a runnable HBase instance.
There were some issues and I want to list things clearly:
Moved to a new office last week. Now I’m away from my team, but … I’m leaving soon so it doesn’t matter too much.
The wifi connection in the office is pretty bad, bad, bad. I’m using the wired network for my work laptop and wifi for the MBP … and I lose the connection on the MBP almost every 30 minutes.
Also it’s cold here :) .
I believe this is the first time in my life that my team members are working hard on something, even during the weekend, while I’m doing nothing.
It feels weird, but I still believe leaving them “as-is” is the best way for all of us.
I got 700K lines of Apache log files from a friend’s web server and imported them into the test Hadoop instance running on my MacBook. Following the instructions listed here, I successfully ran some analysis.
Note: the last section of the article, the one about the Apache Weblog Data, doesn’t seem to be correct. It lacks some spaces (” “) after ^, and that gave me quite a headache since I’m not familiar with Java regular expressions. Luckily, Hive issue 662 mentioned in the article gave me the correct regex to get things done.
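A cheap way to avoid that kind of headache is to try a pattern on one sample line before handing it to Hive. The sketch below is in Python, not Java, and the pattern is my own guess at the common/combined log layout. It is not the regex from the article or from the Hive issue. The names LOG_PATTERN and parse_line are just for illustration.

```python
import re

# Assumed layout: host identity user [time] "request" status size,
# optionally followed by "referer" "user-agent" (combined format).
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_line(sample))
```

Note how every field is separated by a literal space in the pattern. Lines with escaped quotes inside the request or user agent will not match this simple version.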
It seems I can only learn to play with Hive/Hadoop this way, because Hadoop running on a MacBook is still a single-node installation, which is … SLOW. So far I’m fine with it, as I don’t have a high volume of data to process. As a reference, getting the top accessed IPs (which I used to figure out potential abusers) took 83 seconds. The HiveQL is simple, something like “select host, count(*) cc from apachelog group by host order by cc desc limit 10;”.
Hadoop is a rich man’s game: it seems to improve performance only when you have lots of nodes, so that it can distribute the tasks well.
BTW, Hadoop: The Definitive Guide is a good book :) .
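For a sanity check on a small sample, the same "top hosts" count can be sketched in plain Python. This assumes the host is the first whitespace-separated field of each log line. The function name top_hosts is only for illustration, and this is not what Hive does internally.

```python
from collections import Counter


def top_hosts(lines, n=10):
    """Count requests per host (first field of each line) and return the top n."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)  # split off the first field only
        if parts:                    # skip blank lines
            counts[parts[0]] += 1
    return counts.most_common(n)


sample_lines = [
    '1.1.1.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 10',
    '2.2.2.2 - - [10/Oct/2000:13:55:37 -0700] "GET /a HTTP/1.0" 200 20',
    '1.1.1.1 - - [10/Oct/2000:13:55:38 -0700] "GET /b HTTP/1.0" 404 0',
]
for host, hits in top_hosts(sample_lines):
    print(host, hits)
```

To run it on a real file, pass the open file object instead of the list, e.g. top_hosts(open("access.log")) with whatever the log file is actually called.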
I need to go to the bookstore tomorrow for:
I think I’m going to spend 200~300 RMB to get 3~4 books, wish me luck.
Got a MacBook Pro 13″, but it’s an old model, as the newest models are still pending “regulator’s approval” in China, per Apple’s web site. I believe the new Thunderbolt port is the reason.
Mine has 8 GB of memory, so I can run 3 VMs (under VirtualBox, each with 1 GB of memory) plus all the Microsoft Office applications (Word, Excel, PowerPoint, and Outlook), Safari, and several small utilities at the same time without any performance problem. Most of the time I only need to run 1 VM, so it should be OK.
VirtualBox rocks, while Parallels Desktop and VMware Fusion both suck. Alright, I know I’m not really familiar with either of those two, which is partly why VirtualBox gained my respect, but considering that VirtualBox is free, it simply kills them.
Installing Xcode was a nightmare. The App Store in China didn’t work, at least not for huge packages like Xcode (I installed some small stuff without problems), so in the end I downloaded Xcode from Apple’s developer web site. As far as I can see there wasn’t much else to install, other than QQ and MPlayer. I’m trying to get Pidgin running now, but it seems I have to run it under X11, and if that is the case I will move to Adium, though I do want to have customized smileys. Maybe Adium has that already; I need to check.
Overall, the MBP is a nice laptop: it’s quiet and cool (compared with the HOT MBP ppc I had 8 years ago). The only problem is that I have to take off my watch while working on it, as the watch’s chain would scratch the shell.
I ordered a keyboard screen and a bag for it; they should arrive this week.
Hadoop time: I’m trying to set up a 6-node cluster, and once it works I will play with code for a while.