January – 2009 – Backend Technology (后端技术) by Tim Yang

Archive for January, 2009

A Chinese Lunar Calendar Built on Google App Engine

Saturday, Jan 17th, 2009 by Tim | 3 Comments
Filed under: Programming

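My phone has no Chinese lunar calendar. I wanted a WAP version that I could check at any time, but I never found a suitable one. Then I came across some free Python source code, so I ported and adapted it to run on Google App Engine and added a web interface. I had never seriously used Google App Engine before, and I had never written a Python program either, yet it felt fairly easy to use.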

Development and Porting

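I spent a few minutes reading the Hello World example in the documentation and getting it to run, then replaced the Hello World program with the Python lunar calendar code I had found. Unexpectedly, it failed with an error caused by the Chinese text: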

<type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)

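I asked arbow and found that changing u"中文" to "中文" fixed it; the u prefix had been over-engineering on my part. Next I changed the print statements in the original program to write calls on a StringIO-style output stream, so the output goes to the response. I also added previous/next paging, with one month per page. The request and response objects in App Engine resemble the request and response in JSP, so they are easy to understand.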
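The error message above is the usual symptom of Chinese text being pushed through the ascii codec. The following is a minimal sketch of that failure, written for Python 3 where the encoding step has to be spelled out (the original code ran on Python 2, where the encoding happened implicitly). The helper name `to_bytes` is made up for this example and is not part of App Engine.

```python
# Sketch: why Chinese text cannot pass through the 'ascii' codec.
# `to_bytes` is a hypothetical helper, not part of App Engine.

def to_bytes(text, encoding):
    """Encode text, returning the bytes or the name of the error raised."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return type(exc).__name__

text = "农历"

# ASCII only covers code points 0-127, so this fails.
ascii_result = to_bytes(text, "ascii")

# UTF-8 can represent every code point; each of these characters takes 3 bytes.
utf8_result = to_bytes(text, "utf-8")

print(ascii_result)
print(len(utf8_result))
```

Keeping the text as UTF-8 bytes from the source file to the response, as the fix described above does, avoids the implicit ASCII step altogether.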

It ran correctly in local debugging, so I uploaded it to the server and opened it on my phone: it worked.

Source Code

# coding=utf-8
# Chinese Calendar for App Engine
# author Tim: iso1600 (at) gmail (dot) com, the Chinese calendar code searched from Internet.
from google.appengine.api import users
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

g_lunar_month_day = [
    0x4ae0, 0xa570, 0x5268, 0xd260, 0xd950, 0x6aa8, 0x56a0, 0x9ad0, 0x4ae8, 0x4ae0,   #1910
    0xa4d8, 0xa4d0, 0xd250, 0xd548, 0xb550, 0x56a0, 0x96d0, 0x95b0, 0x49b8, 0x49b0,   #1920
    0xa4b0, 0xb258, 0x6a50, 0x6d40, 0xada8, 0x2b60, 0x9570, 0x4978, 0x4970, 0x64b0,   #1930
    0xd4a0, 0xea50, 0x6d48, 0x5ad0, 0x2b60, 0x9370, 0x92e0, 0xc968, 0xc950, 0xd4a0,   #1940
    0xda50, 0xb550, 0x56a0, 0xaad8, 0x25d0, 0x92d0, 0xc958, 0xa950, 0xb4a8, 0x6ca0,   #1950
    0xb550, 0x55a8, 0x4da0, 0xa5b0, 0x52b8, 0x52b0, 0xa950, 0xe950, 0x6aa0, 0xad50,   #1960
    0xab50, 0x4b60, 0xa570, 0xa570, 0x5260, 0xe930, 0xd950, 0x5aa8, 0x56a0, 0x96d0,   #1970
    0x4ae8, 0x4ad0, 0xa4d0, 0xd268, 0xd250, 0xd528, 0xb540, 0xb6a0, 0x96d0, 0x95b0,   #1980
    0x49b0, 0xa4b8, 0xa4b0, 0xb258, 0x6a50, 0x6d40, 0xada0, 0xab60, 0x9370, 0x4978,   #1990
    0x4970, 0x64b0, 0x6a50, 0xea50, 0x6b28, 0x5ac0, 0xab60, 0x9368, 0x92e0, 0xc960,   #2000
    0xd4a8, 0xd4a0, 0xda50, 0x5aa8, 0x56a0, 0xaad8, 0x25d0, 0x92d0, 0xc958, 0xa950,   #2010
    0xb4a0, 0xb550, 0xb550, 0x55a8, 0x4ba0, 0xa5b0, 0x52b8, 0x52b0, 0xa930, 0x74a8,   #2020
    0x6aa0, 0xad50, 0x4da8, 0x4b60, 0x9570, 0xa4e0, 0xd260, 0xe930, 0xd530, 0x5aa0,   #2030
    0x6b50, 0x96d0, 0x4ae8, 0x4ad0, 0xa4d0, 0xd258, 0xd250, 0xd520, 0xdaa0, 0xb5a0,   #2040
    0x56d0, 0x4ad8, 0x49b0, 0xa4b8, 0xa4b0, 0xaa50, 0xb528, 0x6d20, 0xada0, 0x55b0,   #2050
]

g_lunar_month = [
    0x00, 0x50, 0x04, 0x00, 0x20,   #1910
    0x60, 0x05, 0x00, 0x20, 0x70,   #1920
    0x05, 0x00, 0x40, 0x02, 0x06,   #1930
    0x00, 0x50, 0x03, 0x07, 0x00,   #1940
    0x60, 0x04, 0x00, 0x20, 0x70,   #1950
    0x05, 0x00, 0x30, 0x80, 0x06,   #1960
    0x00, 0x40, 0x03, 0x07, 0x00,   #1970
    0x50, 0x04, 0x08, 0x00, 0x60,   #1980
    0x04, 0x0a, 0x00, 0x60, 0x05,   #1990
    0x00, 0x30, 0x80, 0x05, 0x00,   #2000
    0x40, 0x02, 0x07, 0x00, 0x50,   #2010
    0x04, 0x09, 0x00, 0x60, 0x04,   #2020
    0x00, 0x20, 0x60, 0x05, 0x00,   #2030
    0x30, 0xb0, 0x06, 0x00, 0x50,   #2040
    0x02, 0x07, 0x00, 0x50, 0x03    #2050
]

#==================================================================================

from datetime import date, datetime
from calendar import Calendar as Cal

START_YEAR = 1901

def is_leap_year(tm):
    y = tm.year
    return (y % 4 == 0 and y % 100 != 0) or (y % 400 == 0)

def show_month(tm, out):
    (ly, lm, ld) = get_ludar_date(tm)
    out.write('\r\n')
    out.write("%d年%d月%d日" % (tm.year, tm.month, tm.day))
    out.write(" " + week_str(tm))
    out.write("|农历：" + y_lunar(ly) + m_lunar(lm) + d_lunar(ld))
    out.write('\r\n')
    out.write("日|一|二|三|四|五|六\r\n")

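    # Weeks start on Sunday (firstweekday=6) to match the header row above.
    c = Cal(6)
    count = 0
    for d in c.itermonthdays(tm.year, tm.month):
        # Break the line before the first cell of every week after the first.
        if count and count % 7 == 0:
            out.write('\r\n')
        count += 1
        if d == 0:
            # Padding cell for a day that belongs to an adjacent month.
            out.write("| ")
            continue

        (ly, lm, ld) = get_ludar_date(datetime(tm.year, tm.month, d))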
        d_str = str(d)
        if d == tm.day:
            d_str = "*" + d_str
        out.write(d_str + d_lunar(ld) + "|")
    out.write('\r\n')

def this_month(out):
    show_month(datetime.now(), out)

def week_str(tm):
    a = '星期一 星期二 星期三 星期四 星期五 星期六 星期日'.split()
    return a[tm.weekday()]

def d_lunar(ld):
    a = '初一 初二 初三 初四 初五 初六 初七 初八 初九 初十\
         十一 十二 十三 十四 十五 十六 十七 十八 十九 二十\
         廿一 廿二 廿三 廿四 廿五 廿六 廿七 廿八 廿九 三十'.split()
    return a[ld - 1]

def m_lunar(lm):
    a = '正月 二月 三月 四月 五月 六月 七月 八月 九月 十月 十一月 十二月'.split()
    return a[lm - 1]

def y_lunar(ly):
    y = ly
    tg = '甲 乙 丙 丁 戊 己 庚 辛 壬 癸'.split()
    dz = '子 丑 寅 卯 辰 巳 午 未 申 酉 戌 亥'.split()
    sx = '鼠 牛 虎 兔 龙 蛇 马 羊 猴 鸡 狗 猪'.split()
    return tg[(y - 4) % 10] + dz[(y - 4) % 12] + ' ' + sx[(y - 4) % 12] + '年'

def date_diff(tm):
    return (tm - datetime(1901, 1, 1)).days

def get_leap_month(lunar_year):
    # Two years share one byte; // keeps the index an integer on any Python version.
    flag = g_lunar_month[(lunar_year - START_YEAR) // 2]
    if (lunar_year - START_YEAR) % 2:
        return flag & 0x0f
    else:
        return flag >> 4

def lunar_month_days(lunar_year, lunar_month):
    # Returns (high, low): low is the length of the regular month, and high is
    # the length of the leap month that follows it (0 if there is none).
    if lunar_year < START_YEAR:
        return (0, 30)

    high, low = 0, 29
    leap = get_leap_month(lunar_year)
    iBit = 16 - lunar_month

    # Months after the leap month are stored one bit lower.
    if leap and lunar_month > leap:
        iBit -= 1

    if g_lunar_month_day[lunar_year - START_YEAR] & (1 << iBit):
        low += 1

    if lunar_month == leap:
        if g_lunar_month_day[lunar_year - START_YEAR] & (1 << (iBit - 1)):
            high = 30
        else:
            high = 29

    return (high, low)

def lunar_year_days(year):
    days = 0
    for i in range(1, 13):
        (high, low) = lunar_month_days(year, i)
        days += high
        days += low
    return days

def get_ludar_date(tm):
    span_days = date_diff(tm)

    if span_days < 49:
        year = START_YEAR - 1
        if span_days < 19:
            month = 11
            day = 11 + span_days
        else:
            month = 12
            day = span_days - 18
        return (year, month, day)

    span_days -= 49
    year, month, day = START_YEAR, 1, 1
    tmp = lunar_year_days(year)
    while span_days >= tmp:
        span_days -= tmp
        year += 1
        tmp = lunar_year_days(year)

    (foo, tmp) = lunar_month_days(year, month)
    while span_days >= tmp:
        span_days -= tmp
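        if (month == get_leap_month(year)):
            (tmp, foo) = lunar_month_days(year, month)
            if (span_days < tmp):
                # The date falls inside the leap month, which shares its
                # number with the regular month just skipped.
                return (year, month, span_days + 1)
            span_days -= tmp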
        month += 1
        (foo, tmp) = lunar_month_days(year, month)

    day += span_days
    return (year, month, day)

from datetime import date, timedelta
class MainPage(webapp.RequestHandler):
  def get(self):
    fontsize = 1
    pc = False
    # A missing User-Agent header is treated as a non-PC client.
    ua = self.request.headers.get('User-Agent', '')
    if "MSIE" in ua or "Firefox" in ua:
      pc = True
    if pc:
      fontsize = 2
    mon = int(self.request.get("mon", default_value='0'))
    nm = datetime.now()
    out = self.response.out
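    if mon != 0:
      # Step by whole calendar months; adding multiples of 31 days can land
      # in the wrong month (for example, March 1 minus 31 days is in January).
      dt = datetime.now()
      total = dt.year * 12 + (dt.month - 1) + mon
      nm = datetime(total // 12, total % 12 + 1, 1)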
    if pc == False:
      self.response.headers['Content-Type'] = 'application/xhtml+xml; charset=UTF-8'

    out.write("""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>WAP/Mobile/Web农历</title>
</head>
<body>
<h3>WAP/Mobile/Web农历</h3>
<div style="font-size:%dex; line-height:2ex;">
<a href="/nongli?mon=%d">上一月</a>
<a href="/nongli?mon=%d">下一月</a>
<a href="/">Tim's App Engine</a>
</div>
<pre style="font-size:%dex; line-height:2ex;">""" % (fontsize, mon - 1, mon + 1, fontsize))
    show_month(nm, out)
    self.response.out.write("</pre></body></html>")

application = webapp.WSGIApplication(
                                     [('/nongli', MainPage)],
                                     debug=True)
def main():
  run_wsgi_app(application)

if __name__ == "__main__":
  main()
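The g_lunar_month table in the listing packs two years into each byte, which is easy to miss when reading get_leap_month. The sketch below decodes a small excerpt on its own. LEAP_EXCERPT copies the six bytes of g_lunar_month that cover 2001 to 2012; EXCERPT_START and leap_month are names invented for this example. Because 2001 is an even number of years after START_YEAR (1901), the high/low nibble order matches the full table.

```python
# Sketch: decoding the leap-month table, two years per byte.
# LEAP_EXCERPT copies the six g_lunar_month bytes covering 2001-2012;
# EXCERPT_START and leap_month are names made up for this example.

EXCERPT_START = 2001
LEAP_EXCERPT = [0x40, 0x02, 0x07, 0x00, 0x50, 0x04]

def leap_month(year):
    """Return the leap month number for a lunar year, or 0 if it has none."""
    offset = year - EXCERPT_START
    packed = LEAP_EXCERPT[offset // 2]
    if offset % 2:
        return packed & 0x0F   # odd offset: low nibble
    return packed >> 4         # even offset: high nibble

for y in (2009, 2010, 2012):
    print(y, leap_month(y))
```

For 2009 this yields leap month 5, which is why dates inside a leap month need their own branch in get_ludar_date.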

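Where to Access It

WAP/Mobile/Web Lunar Calendar

The layout is a little messy because I removed all the spaces to fit a phone screen. As a next step I am considering wrapping the output in a TABLE, which would make the page tidier.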
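One possible shape for that TABLE layout, as a rough sketch rather than the code that was eventually deployed: the standard calendar module already groups a month into weeks, so each week can become one table row. The function name `month_table` and its `label` callback are hypothetical; in the real page, `label` could return the day number followed by its lunar day name.

```python
# Sketch: laying out a month grid as an HTML table instead of "|" separators.
# `month_table` and the `label` callback are hypothetical names.
from calendar import Calendar

def month_table(year, month, label=str):
    """Return an HTML table with one row per week, Sunday first."""
    rows = []
    for week in Calendar(6).monthdayscalendar(year, month):
        # Day 0 marks a padding cell that belongs to an adjacent month.
        cells = ["<td>%s</td>" % (label(d) if d else "&nbsp;") for d in week]
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

html = month_table(2009, 1)
print(html)
```

A table keeps the columns aligned regardless of how wide each cell's text is, which is the problem the space-stripped pre block cannot solve.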

Java Garbage Collection Tuning

Wednesday, Jan 7th, 2009 by Tim | 8 Comments
Filed under: Java | Tags: GC, Java

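In Java, communication servers are usually sensitive to GC (garbage collection). Such a server typically handles a large number of inbound and outbound packets per second; each packet must be parsed, broken down into business-logic objects, and processed, so a large number of temporary objects are created and discarded. If the server also keeps per-user state, it accumulates many long-lived objects as well, such as user sessions. The more complex the application, the more objects each user session tends to reference. In extreme cases this leads to one of two problems: long GC pause times or out-of-memory errors.

I. To fix long pause times, first understand how the JVM heap is structured

[Figure: Java GC heap layout]

Why is the Java heap split into several generations? Because 80% to 98% of objects are short-lived, keeping most new objects in the young generation lets them be reclaimed efficiently without traversing every object.
The young and old generations use completely different memory management algorithms. In the young generation few objects survive, so marking, sweeping, and then compacting the survivors would be comparatively slow; it is more efficient to simply copy the live objects into another space. The old generation is the opposite: its set of live objects changes little, so mark-sweep-compact is the better fit.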
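To make the copying idea concrete, here is a toy model in Python. It is only an illustration of the principle, not how the JVM implements it: objects are plain dicts, and every name in it is made up for the example. The point to notice is that the work done depends on the number of live objects, not on the size of the space being collected.

```python
# Toy model of a copying (scavenging) collection of the young generation.
# Objects are dicts with a "refs" list; all names here are illustrative.

def copy_live(roots):
    """Copy every object reachable from roots into a fresh to-space list."""
    to_space = []
    forwarded = {}            # id(old object) -> its copy in to-space

    def copy(obj):
        if id(obj) in forwarded:
            return forwarded[id(obj)]
        clone = {"name": obj["name"], "refs": []}
        forwarded[id(obj)] = clone
        to_space.append(clone)
        clone["refs"] = [copy(child) for child in obj["refs"]]
        return clone

    new_roots = [copy(r) for r in roots]
    return new_roots, to_space

# 1000 short-lived objects, of which only two are still reachable.
session = {"name": "session", "refs": []}
packet = {"name": "packet", "refs": [session]}
from_space = [{"name": "tmp%d" % i, "refs": []} for i in range(998)]
from_space += [session, packet]

new_roots, to_space = copy_live([packet])
print(len(from_space), len(to_space))
```

Only the two reachable objects are visited and copied; the other 998 are never touched, and the whole from-space can then be reused at once.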

II. The four garbage collection algorithms in Java

Java has four different collectors, selected with the following startup flags:
-XX:+UseSerialGC
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:+UseConcMarkSweepGC

1. Serial Collector
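The default on most platforms, or when java -client is forced.
young generation algorithm = serial
old generation algorithm = serial (mark-sweep-compact)
The drawbacks are obvious: it is stop-the-world and slow. Not recommended for server applications.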

2. Parallel Collector
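The default on Linux x64; on other platforms it becomes the default only with the java -server flag.
young = parallel: multiple threads copy at the same time
old = mark-sweep-compact (same as collector 1)
Pros: young generation collections are faster. Since most of the GC a system performs is in the young generation, this improves throughput (the share of CPU time spent outside GC).
Cons: on an 8 GB or 16 GB server, pause times become too long when the old generation holds too many live objects.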

3. Parallel Compact Collector (ParallelOld)
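young = parallel (same as collector 2)
old = parallel: the space is divided into independent regions; a region with few live objects is collected, and one with many is skipped
Pros: better old generation performance than the parallel collector.
Cons: on most server systems old generation occupancy reaches 60% to 80%, so there are few of the ideal regions with very few live objects that can be reclaimed quickly. The compaction cost is also not noticeably lower than with the parallel collector.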

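4. Concurrent Mark-Sweep (CMS) Collector
young generation = parallel collector (same as collector 2)
old = cms
It also performs no compaction.
Pros: pause times drop. Option 4 is the recommended choice when the application is pause-sensitive and the CPU has spare capacity.
Cons: it uses too much CPU, so it is unsuitable for CPU-bound servers. It also causes heavy fragmentation: placing each object means walking through n entries of a linked free list, and wasted space grows as well.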
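The fragmentation cost of skipping compaction can be shown with a toy model. This is a simplified illustration, not a description of CMS internals: the heap is a Python list of fixed-size slots, and the function names are made up for the example.

```python
# Toy model of fragmentation in a heap that is swept but never compacted.
# The heap is a list of slots: None means free, anything else is occupied.

def largest_free_run(heap):
    """Length of the longest run of adjacent free slots."""
    best = run = 0
    for slot in heap:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best

def compact(heap):
    """Slide live objects to the front, leaving one contiguous free area."""
    live = [slot for slot in heap if slot is not None]
    return live + [None] * (len(heap) - len(live))

heap = ["obj%d" % i for i in range(8)]
for i in range(0, 8, 2):
    heap[i] = None            # sweep: every other object has died

free_total = heap.count(None)
print(free_total, largest_free_run(heap), largest_free_run(compact(heap)))
```

Half of the heap is free after the sweep, yet no allocation larger than one slot can succeed until the heap is compacted.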

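A few rules of thumb:
1. Use java -server.
2. Set Xms = Xmx = 3/4 of physical memory.
3. On a CPU-bound server use -XX:+UseParallelOldGC; otherwise use -XX:+UseConcMarkSweepGC.
4. Young generation: with Parallel/ParallelOld it can be set larger than 1/4 of Xmx; with CMS it can be set smaller, below 1/4 of Xmx.
5. Optimize the program, especially the collections held in each user's session. One of our modules used to keep a ConcurrentHashMap per user in the session, usually holding only a few entries; after we switched to an array, each machine saved roughly 1 to 2 GB of memory.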
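Rules 1 to 4 can be sketched as a small helper that assembles a java command line. The function `jvm_flags` and its parameters are hypothetical, and the young generation fractions used here (1/3 and 1/8 of the heap) are arbitrary picks on either side of the 1/4 threshold rather than values from the rules above. The flags themselves (-server, -Xms, -Xmx, -Xmn and the two collector switches) are standard HotSpot options of that era.

```python
# Sketch: turning the rules of thumb into a java command line.
# `jvm_flags` and its parameters are hypothetical; the young generation
# fractions (1/3 and 1/8 of the heap) are arbitrary picks on either side
# of the 1/4 threshold, not values from the article.

def jvm_flags(physical_mb, cpu_bound):
    heap_mb = physical_mb * 3 // 4                 # rule 2: Xms = Xmx = 3/4 RAM
    if cpu_bound:
        collector = "-XX:+UseParallelOldGC"        # rule 3
        young_mb = heap_mb // 3                    # rule 4: above 1/4 of Xmx
    else:
        collector = "-XX:+UseConcMarkSweepGC"      # rule 3
        young_mb = heap_mb // 8                    # rule 4: below 1/4 of Xmx
    return [
        "-server",                                 # rule 1
        "-Xms%dm" % heap_mb,
        "-Xmx%dm" % heap_mb,
        "-Xmn%dm" % young_mb,
        collector,
    ]

flags = jvm_flags(8192, cpu_bound=False)
print(" ".join(["java"] + flags))
```

Any such starting point should be checked against GC logs from the real workload before it is relied on.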

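Overall, though, Java's GC algorithms seem to be the most mature in the industry. Many other languages and frameworks now support GC too, but most of them only reach the level of Java's serial collector, and some do not even consider generations. JDK 7 further improves on CMS by adopting an algorithm called G1 (Garbage-First Garbage Collection). The Garbage-First paper (PDF) actually came out in 2004, so I expect that by JDK 7 it will be ready for demanding production environments. When I have time I will write more about G1.
In addition, Improving Java Performance (PDF), the talk Joey Shen gave at this year's Sun Tech Days, is a good introductory tutorial on Java GC tuning.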
