twitter – 后端技术 (Backend Technology) by Tim Yang

Posts Tagged ‘twitter’

Paginating Web Data with Twitter-Style Cursors

Tuesday, Jan 19th, 2010 by Tim | 16 Comments
Filed under: Web | Tags: mysql, performance, twitter

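This post compares the performance of different ways to implement data pagination in a web application.

Taking MySQL as an example, the pagination feature shown above is usually implemented as: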

select * from msgs where thread_id = ? limit ?, ?  -- bound as: offset = page * count, row count = count

However, reading the Twitter API, we find that quite a few endpoints use a cursor rather than the more intuitive page and count parameters, for example the followers ids endpoint:

URL:
http://twitter.com/followers/ids.format
Returns an array of numeric IDs for every user following the specified user.

Parameters:
* cursor. Required. Breaks the results into pages. Provide a value of -1 to begin paging. Provide values as returned to in the response body’s next_cursor and previous_cursor attributes to page back and forth in the list.
o Example: http://twitter.com/followers/ids/barackobama.xml?cursor=-1
o Example: http://twitter.com/followers/ids/barackobama.xml?cursor=-1300794057949944903

http://twitter.com/followers/ids.format

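As the description above shows, a call to http://twitter.com/followers/ids.xml pages through results with a cursor parameter rather than the traditional url?page=n&count=n form. What is the advantage? Does each cursor hold a snapshot of the data set at that moment, to prevent duplicate results when the result set changes in real time?
In the Google Groups thread Cursor Expiration, Twitter architect John Kalucki explains: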

A cursor is an opaque deletion-tolerant index into a Btree keyed by source
userid and modification time. It brings you to a point in time in the
reverse chron sorted list. So, since you can’t change the past, other than
erasing it, it’s effectively stable. (Modifications bubble to the top.) But
you have to deal with additions at the list head and also block shrinkage
due to deletions, so your blocks begin to overlap quite a bit as the data
ages. (If you cache cursors and read much later, you’ll see the first few
rows of cursor[n+1]’s block as duplicates of the last rows of cursor[n]’s
block. The intersection cardinality is equal to the number of deletions in
cursor[n]’s block). Still, there may be value in caching these cursors and
then heuristically rebalancing them when the overlap proportion crosses some
threshold.
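
The stability John describes can be illustrated with a small in-memory sketch. It is not Twitter's implementation: a sorted Python list stands in for the Btree, the keys are ascending integers rather than a reverse-chronological list, and `page_by_offset` and `page_by_cursor` are hypothetical helpers. It only shows that a key-based cursor does not skip rows when already-read rows are deleted between requests, whereas a row-count offset does. It does not model the block overlap he mentions for cached cursors.

```python
import bisect

def page_by_offset(keys, offset, count):
    """Offset paging: the position is a row count from the start of the list."""
    return keys[offset:offset + count]

def page_by_cursor(keys, cursor, count):
    """Cursor paging: the position is a key value.

    Returns the rows whose key is greater than the cursor, plus the next cursor.
    """
    start = bisect.bisect_right(keys, cursor)
    rows = keys[start:start + count]
    next_cursor = rows[-1] if rows else cursor
    return rows, next_cursor

# Sorted keys standing in for Btree entries (hypothetical data).
keys = list(range(1, 11))

first_offset_page = page_by_offset(keys, 0, 3)
first_cursor_page, cursor = page_by_cursor(keys, 0, 3)

# Two rows that were already read are deleted before the next request.
keys.remove(1)
keys.remove(2)

# The offset now points two rows too far, so keys 4 and 5 are never returned.
second_offset_page = page_by_offset(keys, 3, 3)
# The cursor still points just after key 3, so nothing is skipped.
second_cursor_page, cursor = page_by_cursor(keys, cursor, 3)
```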

In another thread, new cursor-based pagination not multithread-friendly, John adds:

The page based approach does not scale with large sets. We can no
longer support this kind of API without throwing a painful number of
503s.

Working with row-counts forces the data store to recount rows in an O
(n^2) manner. Cursors avoid this issue by allowing practically
constant time access to the next block. The cost becomes O(n/
block_size) which, yes, is O(n), but a graceful one given n < 10^7 and
a block_size of 5000. The cursor approach provides a more complete and
consistent result set.

Proportionally, very few users require multiple page fetches with a
page size of 5,000.

Also, scraping the social graph repeatedly at high speed is could
often be considered a low-value, borderline abusive use of the social
graph API.
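
To put rough numbers on John's comparison, here is a back-of-the-envelope sketch. It assumes that an offset-based query has to walk past all skipped rows before returning a page, and that a cursor-based query touches only the rows it returns. Both are simplifying assumptions, not measurements of Twitter's data store, and the function names are hypothetical.

```python
import math

def offset_rows_scanned(n, block_size):
    """Total rows touched when reading all n rows page by page with offsets.

    Page p (0-based) walks past p * block_size rows, then reads one block.
    """
    pages = math.ceil(n / block_size)
    return sum(min((p + 1) * block_size, n) for p in range(pages))

def cursor_rows_scanned(n, block_size):
    """Total rows touched with cursors: each row is read exactly once."""
    return n

n, block_size = 10**7, 5000               # the upper bound and block size from the quote
requests = math.ceil(n / block_size)      # 2,000 fetches either way
offset_total = offset_rows_scanned(n, block_size)
cursor_total = cursor_rows_scanned(n, block_size)
```

Under these assumptions both approaches need 2,000 requests, but offsets touch about 10 billion rows in total while cursors touch 10 million.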

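These two passages make it clear: for large result sets, the main purpose of cursors is a dramatic performance gain. Using MySQL again as the example, paging to around row 100,000 without a cursor requires this SQL: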

select * from msgs limit 100000, 100

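On a table with a million rows, the first execution of this statement takes more than 5 seconds.
If we use the table's primary key value as cursor_id, the SQL for cursor-based paging can be optimized to: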

select * from msgs where id > cursor_id order by id limit 100;

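On the same table this usually takes under 100 ms, a speedup of several dozen times. For more on the performance of MySQL LIMIT, see an immature post I wrote three years ago, MySQL LIMIT 的性能问题 (The Performance Problem with MySQL LIMIT).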
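
The two query shapes can be compared side by side in a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory table in place of MySQL, so it demonstrates only that both queries return the same page, not the timings quoted above. The table contents and the helper names `page_by_offset`, `page_by_cursor`, and `iter_all` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO msgs (id, body) VALUES (?, ?)",
    ((i, f"msg {i}") for i in range(1, 1001)),
)

def page_by_offset(conn, offset, count):
    """Offset paging: the database must walk past `offset` rows first."""
    sql = "SELECT id, body FROM msgs ORDER BY id LIMIT ? OFFSET ?"
    return conn.execute(sql, (count, offset)).fetchall()

def page_by_cursor(conn, cursor_id, count):
    """Cursor paging: seek directly to the first id greater than the cursor."""
    sql = "SELECT id, body FROM msgs WHERE id > ? ORDER BY id LIMIT ?"
    rows = conn.execute(sql, (cursor_id, count)).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor

def iter_all(conn, count):
    """Walk the whole table sequentially, one cursor page at a time."""
    cursor_id = 0
    while True:
        rows, next_cursor = page_by_cursor(conn, cursor_id, count)
        if not rows:
            return
        yield from rows
        cursor_id = next_cursor

offset_page = page_by_offset(conn, 500, 100)
cursor_page, next_cursor = page_by_cursor(conn, 500, 100)
```

Because the ids here are contiguous, offset 500 and cursor 500 land on the same rows. With gaps in the id sequence the offset and the cursor value would differ, but the cursor query would still seek by key.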

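Conclusion

For paging through large data sets in a web application, this cursor approach is recommended. Its drawback is that paging must be sequential: you cannot jump to an arbitrary page.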

Some Recent Leaps in the Twitter API

Wednesday, Dec 30th, 2009 by Tim | 4 Comments
Filed under: SNS | Tags: firehose, twitter

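In a recent open email to developers, Platform announcements from LeWeb, Twitter's Director of Platform Ryan Sarver said Twitter plans to raise the API request limit tenfold, from the current 150 requests per hour to 1,500 requests per hour (while also shortening the search window to 7 days).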

*Auth announcements*
With the recent launches of Retweet, Lists and Geotagging we have seen
applications struggle to provide the experience they want for their users
within the 150 req/hr limit. We are excited to open the skies up a bit and
provide some more room for developers to work within. Starting in a few
weeks all OAuth requests to api.twitter.com/1/ will be able to take
advantage of a 10x rate limit increase. Basic Whitelisting still exists and
is unchanged. We look forward to what this means in terms of the increased
richness around the user experience in Twitter apps.

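Note that the increase applies only to OAuth clients. Why only OAuth? Because OAuth clients are easier to control: if an app is suspected of abusing the API, Twitter can simply suspend that app. The other authentication method, Basic Authentication, does not require the client to pass an app id, so it is hard for the server to identify abuse by a particular app.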

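As I see it, the defining characteristic of microblogging is that it is realtime (the other being the social graph). Even with a higher rate limit, REST over HTTP can never be truly realtime. And if every client made 25 requests per minute (1,500 / 60 = 25), it is doubtful whether Twitter's servers, whose stability has never had a good reputation, could withstand the load.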
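
The arithmetic behind that figure can be worked through in a few lines. This sketch assumes a client spends its entire hourly budget polling a single resource at an even pace. `polling_budget` is a hypothetical helper, not part of any Twitter client library.

```python
def polling_budget(requests_per_hour):
    """Return (requests per minute, minimum seconds between requests)."""
    per_minute = requests_per_hour / 60
    min_interval_s = 3600 / requests_per_hour
    return per_minute, min_interval_s

old_per_minute, old_interval_s = polling_budget(150)    # the current limit
new_per_minute, new_interval_s = polling_budget(1500)   # the announced limit
```

Even at the new limit, a polling client can check for updates at most once every 2.4 seconds, compared with once every 24 seconds today.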

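To achieve true realtime delivery, the current options include HTTP callbacks and XMPP. Callbacks are usually not feasible because clients tend to sit behind firewalls. XMPP has a large protocol stack, which makes both servers and clients tedious to write, and it was designed for IM, so it is not a great fit for the Twitter API either.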

The email also says that Twitter plans to open the stream of all the latest updates (which Twitter calls the Firehose) to everyone.

*Firehose for everyone*
Finally, the announcement that has garnered the most coverage and
excitement. As I stated in the session at LeWeb we are committed to
providing a framework for any company big or small, rich or poor to do a
deal with us to get access to the Firehose in the same way we did deals with
Google and Microsoft. We want everyone to have the opportunity — terms will
vary based on a number of variables but we want a two-person startup in a
garage to have the same opportunity to build great things with the full feed
that someone with a billion dollar market cap does. There are still a lot of
details to be fleshed out and communicated, but this a top priority for us
and we look forward to what types of companies and products get built on top
of this unique and rich stream.

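The Firehose can be thought of as a tap for all of Twitter's latest updates. It is currently open only to a few companies such as Microsoft and Google. Twitter says that in future even "a two-person startup in a garage" will be able to get Firehose access: "We want everyone to have the opportunity". Many companies will surely be excited, even ecstatic, about this. An open Firehose gives third parties unlimited creative room, but it also threatens Twitter's own services such as search and geotag, so taking this step requires a lot of courage.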

Most of the above was first posted on Sina Weibo and then compiled into this blog post. You are welcome to follow me on Sina Weibo: http://t.sina.com.cn/timyang

Twitter's System Operations Experience

Monday, Nov 2nd, 2009 by Tim | 10 Comments
Filed under: tech | Tags: cache, memcached, twitter, velocity

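Another video on Twitter's technology that I watched recently [Slides] [Video (GFWed)] is a talk by Twitter's John Adams at Velocity 2009, mainly covering Twitter's experience with system operations. Most of the points collected here were already posted on my Twitter account (@xmpp); this post gathers them together and fills in the gaps.

Twitter does not own its hardware. It is all provided by NTTA, which also handles hardware-related concerns such as networking, bandwidth, and load balancing. The Twitter operations team focuses only on the core business: performance, availability, capacity planning, configuration management, and so on. This may differ from typical internet companies in China.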

1. Operations experience

* Metrics

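Twitter's monitoring console consists almost entirely of charts (critical metrics), like the tachometer and speedometer in a cockpit, so operators can grasp the system's current state at a glance. By comparison, the monitoring consoles we have built contain plenty of data but often require the viewer to do further analysis. We have not gone as far as a full screen of charts, and can learn from this. According to John, the charts reveal the system's bottleneck, its weakest link (web, mq, cache, db?).
The charts also make it possible to plan capacity systematically rather than firefighting after the fact.

* Configuration management

Every system needs an automated configuration management system, the earlier the better. This point drew many replies when I posted it on Twitter.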

* Darkmode

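A configuration interface can enable or disable features that are computationally expensive or I/O-heavy. This amounts to graceful degradation: when the system is under too much load, non-core but resource-hungry features are switched off.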
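
A minimal sketch of such a switch follows. The talk as summarized here does not describe how Twitter implements Darkmode, so the `DarkMode` class, the `trending_topics` feature name, and `render_timeline` are all hypothetical. The point is only that the expensive code path is guarded by a flag that operators can flip at runtime.

```python
class DarkMode:
    """Runtime on/off switches for expensive, non-core features."""

    def __init__(self, features):
        self._enabled = {name: True for name in features}

    def disable(self, name):
        self._enabled[name] = False

    def enable(self, name):
        self._enabled[name] = True

    def is_enabled(self, name):
        # Unknown features are treated as disabled.
        return self._enabled.get(name, False)

def render_timeline(flags, tweets):
    """Build a page, skipping the costly section when it is switched off."""
    page = {"tweets": tweets}
    if flags.is_enabled("trending_topics"):
        # Stand-in for an expensive computation.
        page["trending"] = sorted(set(tweets))[:3]
    return page

flags = DarkMode(["trending_topics"])
normal_page = render_timeline(flags, ["b", "a", "c"])

flags.disable("trending_topics")      # the system is under heavy load
degraded_page = render_timeline(flags, ["b", "a", "c"])
```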

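* Process management

Twitter wrote a "Seppuku" patch that makes a daemon kill itself after it has completed n requests, so that memory usage stays low and healthy. As far as I know, quite a few companies in China do the same.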
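
The idea can be simulated in-process as follows. This is not Twitter's patch: the `Worker` class and the `serve` loop are hypothetical, and a real daemon would exit and be restarted by its supervisor, where this sketch simply creates a fresh object.

```python
class Worker:
    """A request handler that asks to be retired after max_requests."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.handled = 0

    def handle(self, request):
        self.handled += 1
        return f"handled {request}"

    @property
    def should_exit(self):
        return self.handled >= self.max_requests

def serve(requests, max_requests):
    """Supervisor loop: replace the worker once it has served its quota."""
    worker = Worker(max_requests)
    workers_started = 1
    responses = []
    for request in requests:
        responses.append(worker.handle(request))
        if worker.should_exit:
            # A real daemon would exit here and a fresh process would take over.
            worker = Worker(max_requests)
            workers_started += 1
    return responses, workers_started

responses, workers_started = serve(range(10), max_requests=4)
```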

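* Hardware

After switching CPUs from AMD to Xeon, Twitter gained 30% in performance. After moving from dual-core and quad-core CPUs to 8-core CPUs, it cut CPU by 40%. John also said, however, that this kind of upgrade does not suit companies that buy their own hardware.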

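2. Code collaboration experience

* Review policy

Twitter has hundreds of modules. Without a good process, conflicting code changes arise easily and problems reach end users. So Twitter enforces mandatory source code review: if the svn comment of a commit does not contain "reviewed by xxx", the pre-commit script rejects the commit. Once committed, reviewed code is rolled out to hundreds of servers by the automated configuration management system. @xiaomics immediately asked on Twitter whether the time cost is acceptable, and what happens with urgent features. In my view, for an urgent change it is not hard to have two people present, one making the change and one reviewing it.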
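
The check at the heart of such a hook could look like the sketch below. Twitter's actual script and the exact message format it expects are not given in the talk summary, so the regular expression and the `commit_allowed` function are assumptions. In a Subversion pre-commit hook, the log message would typically be read with `svnlook log -t TXN REPOS`, and a non-zero exit status rejects the commit.

```python
import re

# Assumed convention: the phrase "reviewed by" followed by a reviewer name.
REVIEWED_BY = re.compile(r"reviewed by\s+\S+", re.IGNORECASE)

def commit_allowed(log_message):
    """Return True only if the commit message names a reviewer."""
    return bool(REVIEWED_BY.search(log_message))

accepted = commit_allowed("Fix timeline cache key. Reviewed by alice")
rejected = commit_allowed("Fix timeline cache key.")
```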

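* Deployment management

The deployment charts show how CPU and latency change with each release. If the latency chart jumps noticeably upward at a new release, that release has a problem. The monitoring home page also lists when each module's latest version was deployed, which gives a clear view of the current state of the code base.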
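
The same comparison can be expressed as an automated check. This sketch assumes latency samples are available for a window before and after a deploy. The `latency_regressed` function and its 20% threshold are hypothetical, since the talk summary describes reading the charts by eye.

```python
from statistics import mean

def latency_regressed(before_ms, after_ms, threshold=1.2):
    """Flag a release whose mean latency rose by more than the threshold ratio."""
    return mean(after_ms) > mean(before_ms) * threshold

bad_release = latency_regressed([100, 110, 90], [150, 160, 140])
good_release = latency_regressed([100, 110, 90], [101, 99, 105])
```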

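* Team communication

The team collaborates through Campfire. Campfire is a bit like a group chat, but better suited to collaborative work. I will not describe it further here; see Campfire's official documentation.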

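3. Cache

Memcache key hashing: Twitter uses FNV hash instead of MD5 because FNV is faster.
Twitter developed the Cache Money plugin (Ruby), which gives applications a read-through, write-through cache. It acts like a hook on database access: reads and writes to the database update the cache automatically, which avoids tedious cache-update code.
"Evictions make the cache unreliable for important configuration data": one lesson from Twitter's use of memcache is that different types of data should go into different memcached pools to avoid eviction. This matches the experience described in my earlier post Memcached数据被踢(evictions>0)现象分析 (an analysis of memcached evictions > 0).
Memcached SEGVs: a memcached crash (the cold cache problem) is said to be disastrous for Web 2.0 systems that depend this heavily on cache. I do not know exactly how Twitter deals with it.
At the web tier, Twitter uses Varnish as a reverse proxy and rates it highly.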
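
The key-hashing and read-through, write-through points above can be sketched together. The talk summary does not say which FNV variant Twitter uses, so 32-bit FNV-1a is an assumption here. `pick_server` and `ReadWriteThroughCache` are hypothetical names, a dict stands in for the database, and none of this reflects Cache Money's actual API.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h

def pick_server(key, servers):
    """Map a cache key to one server by hashing the key."""
    return servers[fnv1a_32(key.encode("utf-8")) % len(servers)]

class ReadWriteThroughCache:
    """Reads fill the cache on a miss; writes update database and cache together."""

    def __init__(self, db):
        self.db = db          # a dict standing in for the database
        self.cache = {}
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1    # cache miss: fall through to the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.db[key] = value
        self.cache[key] = value

store = ReadWriteThroughCache({"user:1": "tim"})
first = store.get("user:1")     # miss, reads the database
second = store.get("user:1")    # hit, served from the cache
store.put("user:2", "john")     # written to both layers
third = store.get("user:2")     # hit, no database read
```

The application code only calls `get` and `put`, so the cache-update logic lives in one place instead of being repeated at every database call site.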
