May 2012 – Flying Bug

MySQL was down for 2 days …

May 30, 2012

It’s my fault for not checking the status after upgrading the system, but evidently the MySQL upgrade on Ubuntu hit some problem and stopped midway, leaving MySQL totally inaccessible, which means my site was down for two days.

Well, I was really busy with job-related stuff, dealing with contextual algorithms while playing with the network, but that is not the real cause. I do need a monitoring tool to make sure my site is up and running, even though it is not that critical.
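A minimal sketch of the kind of check such a monitoring tool could run, assuming plain Python and only the standard library. The function name `is_site_up` and the `opener` parameter are made up for illustration; this is not any particular monitoring product.

```python
import urllib.error
import urllib.request


def is_site_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with an HTTP status below 400.

    `opener` is an illustrative hook (an assumption, not part of any
    library API) so the check can be exercised without a real network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses raised by urlopen, DNS failures,
        # refused connections and timeouts.
        return False
```

Run from cron every few minutes and wired to an email or SMS alert, something like this would have flagged the outage within minutes instead of days. It only proves the web server answers; a page that renders but hides a database error would need a content check as well.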

Chinese-English mixed words in segmentation

May 24, 2012

OK, if you don’t know what I’m saying, don’t bother to ask.

I’m using MMSEG as the segmentation algorithm to extract Chinese terms from web pages and then do some follow-up analysis. MMSEG is pretty good: easy to understand, fast, data driven, small, … alright, lots of advantages. However, it does not handle Chinese-English mixed words properly, as it just takes out English (Latin) words and returns each one as a token, so you can never extract words like U盘 or T恤 (sorry for the Chinese here; the first word is USB disk, and the second is T-shirt).

A colleague suggested treating Latin words as Chinese characters. I found this solution really simple, neat, and easy to implement. Within ~30 minutes I finished the code change and unit tests, and it works just fine :). In the coming days I need to handle words like “C++”, which the current implementation does not cover.

People say Yahoo does not handle C++/C# properly, while Bing does not deal with U盘 correctly. Actually this sounds weird to me, as supposedly they both use the same web search system …
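To illustrate the idea, here is a rough Python sketch. It is not the actual C/C++ change and not full MMSEG (which scores multi-word chunks); it uses plain forward maximum matching, and the tiny `DICTIONARY`, `MAX_WORD_UNITS`, and the function names are all made up for the example. The only point is that a run of Latin letters or digits becomes one "character" before dictionary lookup.

```python
import re

# Hypothetical toy dictionary; a real segmenter loads a large lexicon.
DICTIONARY = {"U盘", "T恤", "一个"}

# Longest dictionary word, measured in units (an assumed limit).
MAX_WORD_UNITS = 4

# A unit is either a whole run of Latin letters/digits or a single
# non-space character, so "U" or "USB" counts like one Chinese character.
_UNIT_RE = re.compile(r"[A-Za-z0-9]+|\S")


def to_units(text):
    """Split text into matching units, dropping whitespace."""
    return _UNIT_RE.findall(text)


def segment(text):
    """Forward maximum matching over units instead of raw characters."""
    units = to_units(text)
    tokens = []
    i = 0
    while i < len(units):
        # Try the longest candidate first, falling back to a single unit.
        for n in range(min(MAX_WORD_UNITS, len(units) - i), 0, -1):
            candidate = "".join(units[i:i + n])
            if n == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += n
                break
    return tokens
```

With this toy dictionary, `segment("我买了一个U盘和T恤")` yields `['我', '买', '了', '一个', 'U盘', '和', 'T恤']`. A term like “C++” still falls apart, because `+` is not part of the Latin-run pattern, which matches the limitation mentioned above.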

Frustrated

May 24, 2012

Zune, or Windows Phone, makes me mad.

There is no simple way to drag and drop photos from the phone to a PC, and the phone cannot act as a USB drive. I mean, there may be some way to do it, but it is definitely not easy or direct. I searched online and found that Library/Pictures/<Device Name> is the place …

I also tried to enable Internet Sharing so that I could upload those photos to the Windows Live site and then access them from there, but I failed because Internet Sharing cannot be enabled without the mobile data connection enabled. What the H***? I have Wi-Fi access, so why ask me to turn on the expensive mobile data network?

Actually I know it may not be all Windows Phone’s problem. I have not been in a good mood for the past several days, frustrated by all sorts of technical-support kinds of tasks. I hope things get better in the coming days.

Performance of EC2 nodes

May 21, 2012

I was running a crawler-like application on my EC2 nodes, which grabs a set of web pages, saves them locally, then does some follow-up processing. Evidently I hit the bottleneck of the EC2 nodes.

It seems EC2’s storage, at least EBS, performs quite badly. I don’t have numbers for comparison, but for the follow-up jobs, which have nothing to do with the network, EC2 is hundreds or thousands of times slower than a regular modern machine.

I guess traditional applications on EC2 don’t rely on EBS too much, since most data is written to S3/RDS/SDB, etc. However, I do have to pay more attention to this, as a friend is going to run some critical jobs and I recommended EC2 … I will do some more tests and post the results here if I find anything significant.
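For the follow-up tests, a crude way to get a first number is to time a sequential write that is forced to disk, once on an EBS-backed directory and once on a local machine. The sketch below is only that: the function name and default sizes are arbitrary choices for illustration, and it measures sequential write throughput only, not random I/O or reads.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, total_mb=64, block_kb=1024):
    """Write roughly total_mb of random data into `directory`, fsync it,
    and return the observed throughput in MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            # Force the data out of the page cache so the timing
            # reflects the storage device, not just memory.
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    written_mb = blocks * block_kb / 1024
    return written_mb / elapsed
```

Calling it as `measure_write_throughput("/path/on/ebs")` (a placeholder path) on each machine gives numbers that can at least be compared side by side. Several runs are needed, since a single run can be noisy.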

Jumping among programming languages

May 15, 2012

Freaky!

Here is what I have done in the past several days:

The majority of my time was spent on tokenizer/segmentation stuff, which is purely C/C++
I used PHP as the glue to try out my segmentation code, so I did various PHP stuff
I needed to collect some data so other people could test the results, so obviously I had to deal with Hive and HQL
I had to create various ad hoc shell scripts while dealing with the above
Someone asked for a Python extension for the segmentation library, so I also spent a (little) time on Python code
I was reading a C# program, trying to translate it to C++. I cannot believe I made it, and I have to say, C# is some sort of “advanced Visual Basic” …
Again, lots of segmentation algorithms are implemented in Java (thanks Lucene/Nutch/Solr), so occasionally I need to read Java code and use it as a reference for my C++ programs, mostly for edge cases

Actually it was fun, other than sometimes mistakenly putting $ in front of a variable in a C++ program, or using ; to end a statement in Python, and getting confused between C++ containers and Java containers …

Get to re-organize home network

May 2, 2012

To get better network connection quality, I’m going to flash a Cisco Linksys E1000 with DD-WRT and then use it as a repeater. I will get this done tomorrow, after somebody from Comcast finishes cabling my house.
