Oct 31, 2012
 

I need to rewrite the function that counts Chinese words in UTF-8 once I have time, and I need to refer to this page, which contains UTF-8 characters.

In short, here are the Chinese-related Unicode blocks:

  • U+2E80 … U+2EFF: CJK Radicals Supplement
  • U+3000 … U+303F: CJK Symbols and Punctuation
  • U+31C0 … U+31EF: CJK Strokes
  • U+3200 … U+32FF: Enclosed CJK Letters and Months
  • U+3300 … U+33FF: CJK Compatibility
  • U+3400 … U+4DBF: CJK Unified Ideographs Extension A
  • U+4E00 … U+9FFF: CJK Unified Ideographs
  • U+F900 … U+FAFF: CJK Compatibility Ideographs
  • U+FE30 … U+FE4F: CJK Compatibility Forms
  • U+20000 … U+2A6DF: CJK Unified Ideographs Extension B
  • U+2A700 … U+2B73F: CJK Unified Ideographs Extension C
  • U+2B740 … U+2B81F: CJK Unified Ideographs Extension D
  • U+2F800 … U+2FA1F: CJK Compatibility Ideographs Supplement

For the remaining blocks … let’s simply treat them as Western characters, though they are not.

Oct 26, 2012
 

Here is the to-do list for the mail/IM setup:

  1. IM messages are archived to MySQL, but I need to compose each session into a mail and drop it into a mailbox to make the backup more reliable (mail is automatically copied to a Gmail account)
  2. Mail alerts are not done; at this stage I need to determine the right XMPP message type to use. Logic-wise, the design is done
  3. Need a quick design for mail search
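
For item 1, composing an archived session into a mail could look roughly like the Python sketch below. Everything about the data shape is an assumption: I pretend a session is a list of `(timestamp, sender_jid, body)` tuples already read from the archive, and `session_to_mail` and the example addresses are hypothetical names. Only the standard library `email` package is used.

```python
from datetime import datetime
from email.message import EmailMessage


def session_to_mail(messages, archive_addr):
    """Turn one IM session into a plain-text mail.

    messages: list of (timestamp, sender_jid, body) tuples, oldest first.
    This tuple layout is an assumption, not the real archive schema.
    """
    if not messages:
        raise ValueError("empty session")

    participants = sorted({sender for _, sender, _ in messages})
    start = messages[0][0]

    msg = EmailMessage()
    msg["From"] = archive_addr
    msg["To"] = archive_addr
    msg["Subject"] = "IM session {} ({})".format(
        start.strftime("%Y-%m-%d %H:%M"), ", ".join(participants)
    )

    # One line per IM message: [time] sender: body
    lines = [
        "[{}] {}: {}".format(ts.strftime("%H:%M:%S"), sender, body)
        for ts, sender, body in messages
    ]
    msg.set_content("\n".join(lines))
    return msg


example = session_to_mail(
    [
        (datetime(2012, 10, 26, 9, 0, 0), "alice@example.org", "hello"),
        (datetime(2012, 10, 26, 9, 0, 5), "bob@example.org", "hi"),
    ],
    "archive@example.org",
)
```

The returned message could then be handed to the local mail server, for instance with `smtplib.SMTP.send_message`, so it follows the same copy-to-Gmail path as ordinary mail. How to split the archive into sessions is left open here.
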
Oct 19, 2012
 

I set up a mail server and an IM server, again … just for whatever reason. It’s fun though.

I’m still with OpenLDAP, Postfix, MySQL, and ejabberd, but Dovecot replaces Courier for IMAP, amavisd integrated with SpamAssassin and ClamAV replaces the previous SpamAssassin-only system, and Roundcube replaces SquirrelMail. Things are more or less easier to set up.

There are two things left. One is that I need to direct spam to the spam folder. This was done by maildrop, but since I’ve moved away from Courier, procmail may be a more reasonable choice; I still need to evaluate it. The other is that I want to send all incoming and outgoing mail to some other Gmail accounts so that I can keep a copy of everything, but I haven’t decided on the best approach. Also, if it is possible to back up XMPP messages somewhere, that would be great.

I was thinking of building a search feature for this mail system. I haven’t got an exact design yet, but I have some features in mind: close to real time (tens of seconds of latency), attachment friendly (dig into attachments to find context), and scriptable, i.e. the core engine can be (or has to be) C/C++, but lots of the external stuff should be doable in PHP, Perl, etc.

Let’s see.

Oct 05, 2012
 

A book table like this:

...
PRIMARY KEY (`bookid`),
KEY `authorid` (`authorid`)
...

and the query is something like this:

SELECT * FROM book WHERE authorid = 123

but MySQL refused to use the authorid index and did a full table scan instead. How stupid!

However, it later turned out that the stupid one was me. The authorid column was, for whatever reason, defined as varchar instead of int. Comparing a string column with a number makes MySQL convert the column values to numbers before comparing, so it could not use the index.
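
To convince myself why a string index is useless for a numeric comparison, here is a small Python illustration. It is only an analogy: a sorted list of strings stands in for the varchar index, and Python's `float()` stands in for MySQL's string-to-number conversion, which differs in its details. The sample values are made up.

```python
import bisect

# A sorted list of strings plays the role of the varchar index on authorid.
index = sorted(
    [" 123", "01", "0123", "1", "1.23e2", "10", "12", "123", "124", "13", "99"]
)

# Numeric comparison: every entry must be converted and checked,
# because the strings equal to 123 are scattered across the sorted order.
numeric_matches = [s for s in index if float(s) == 123]

# String comparison: a binary search on the sorted order is enough.
lo = bisect.bisect_left(index, "123")
hi = bisect.bisect_right(index, "123")
string_matches = index[lo:hi]

print(numeric_matches)  # several different strings, not adjacent in the index
print(string_matches)   # only the exact string "123"
```

Several different strings are equal to 123 as numbers, and they do not sit next to each other in string order, so the only safe way to find them all is to look at every row. That is the full table scan.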

Anyway, I was stupid, but MySQL is stubborn.