Thursday, May 14, 2009

Multicore CPU vs. GPGPU

As the semiconductor industry advances, Moore's Law continues to hold, but spending additional transistors on a single CPU core yields diminishing performance returns, and integrating more cores on a die has become the more economical alternative. For the same transistor budget, the multicore approach delivers more raw computing performance, that is, higher peak floating-point throughput. Consequently, CPUs have abandoned their original development roadmap and now pursue performance through multiple threads running on multiple cores. At the same time, GPUs have become more flexible and programmable in graphics rendering, especially since the emergence of DX10's unified shader architecture, so they are moving in a general-purpose direction and can take on heavy loads of generic floating-point work. If we regard each stream processor of a GPGPU as a micro processor core, the GPGPU is in effect another multithreaded implementation of a CPU-like architecture.

Because CPUs and GPUs were originally designed for quite different applications, their architectures differ in many ways. For instance, a CPU hides memory latency with caches, while a GPU hides it by switching threads, so the CPU's cache mechanism is much more complicated than the GPU's. CPU thread scheduling is implemented by the OS, while GPU thread scheduling relies largely on hardware. A CPU runs far fewer concurrent threads than a GPU. CPU cache coherence is maintained in hardware, while consistency among the caches of different GPU shader clusters is handled by software. A GPU also contains a substantial number of units dedicated to graphics processing, such as the read-only texture cache. Even though Intel borrows a lot from GPU design, its teraflops implementation, Larrabee, retains several typical CPU characteristics. Similarly, the primary mission of Nvidia's GPGPU product, the GeForce GTX 280, is still to run 3D games better, so its general-purpose computing is still shaped by its nature as a GPU.
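
To make the latency-hiding point a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes an idealized model that is mine, not any vendor's: every thread computes for a fixed number of cycles, then stalls on memory for a fixed number of cycles, and switching threads costs nothing. The function name and the cycle counts are hypothetical.

```python
import math


def threads_to_hide_latency(compute_cycles: int, memory_latency_cycles: int) -> int:
    """Estimate how many resident threads keep one core busy.

    Idealized assumption: each thread runs for `compute_cycles`, then waits
    `memory_latency_cycles` for memory, and thread switching is free.
    While one thread waits, the others must supply enough work to cover
    the whole stall, hence 1 + ceil(latency / compute).
    """
    if compute_cycles <= 0:
        raise ValueError("compute_cycles must be positive")
    if memory_latency_cycles < 0:
        raise ValueError("memory_latency_cycles must not be negative")
    return 1 + math.ceil(memory_latency_cycles / compute_cycles)


if __name__ == "__main__":
    # Hypothetical numbers: 20 cycles of work per 400-cycle memory stall.
    print(threads_to_hide_latency(20, 400))
```

Under this model a core with no cache needs dozens of threads in flight, which is roughly why a GPU wants so many more concurrent threads than a CPU that leans on its cache instead.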

Unlike the single-core era, in which x86 was clearly dominant, the multicore era is too young to predict whether many-core CPUs or GPGPUs will become the mainstream of multithreaded computing. Compared with hardware engineers, who only need to upgrade high-performance processors within the framework established by their employer, software engineers have the harder task, because they face the challenge of cross-platform development. If every multithreaded application had to be completely rewritten when ported from the Nvidia platform to the Intel platform, software engineers would never live to see tomorrow's sun. GPU development faced a similar problem earlier in its history, and the debut of DX solved it: programmers only need to call fixed APIs to perform the required functions, without directly addressing a wide range of GPU hardware. Multithreaded computing is going through the same transition today with DX11's Compute Shader and OpenCL. Given such a middle layer, programmers only need to be familiar with the APIs and do not have to deal directly with processors of different architectures. To adapt to this change, compilation is also divided into two stages. First, the compiler translates the source code into a unified intermediate language. Then the intermediate code is either compiled dynamically by the host CPU on each platform into binary code that fits the multithreaded processor, or compiled into suitable binary code when the application is installed; at execution time the binary is downloaded to the multithreaded processor's local memory (such as GPU memory) to run.

Thus, in the multithreading era, compilers can be divided into two categories. The first compiles source code into the intermediate language. Its optimizations are independent of the architecture: it uses general automatic multithreading and single-thread optimization techniques. The second compiles the intermediate language into binary code, so it must consider the characteristics of the hardware architecture and apply targeted optimizations. Comparatively speaking, the former is relatively easy to implement with existing multithreading tools, while work on the latter is scarce in both theory and practice, and it must keep evolving along with the hardware. This is the research direction I am interested in.
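
The two-stage split can be pictured with a toy example. This is only a sketch under my own assumptions: the tuple-based AST, the three-address IR, and the made-up `vmul`/`vadd` mnemonics are all hypothetical and do not correspond to any real compiler or instruction set. Stage 1 knows nothing about the hardware; stage 2 takes the SIMD width of the target as a parameter.

```python
from itertools import count


def lower_to_ir(ast):
    """Stage 1 (architecture-independent): flatten a nested
    (op, lhs, rhs) tuple into a list of three-address IR operations."""
    ir = []
    temps = count()

    def visit(node):
        if isinstance(node, str):      # a leaf is just an array name
            return node
        op, lhs, rhs = node
        left = visit(lhs)
        right = visit(rhs)
        dst = f"t{next(temps)}"        # fresh temporary for the result
        ir.append((op, dst, left, right))
        return dst

    visit(ast)
    return ir


def lower_to_target(ir, n_elements, simd_width):
    """Stage 2 (architecture-specific): expand each element-wise IR op
    over n_elements into vector instructions of the target's SIMD width."""
    chunks = -(-n_elements // simd_width)   # ceiling division
    program = []
    for op, dst, lhs, rhs in ir:
        for chunk in range(chunks):
            start = chunk * simd_width
            lanes = min(simd_width, n_elements - start)  # last chunk may be partial
            program.append(f"v{op}.{lanes} {dst}[{start}], {lhs}[{start}], {rhs}[{start}]")
    return program


if __name__ == "__main__":
    ir = lower_to_ir(("add", ("mul", "a", "b"), "c"))   # a*b + c, element-wise
    for line in lower_to_target(ir, n_elements=64, simd_width=16):
        print(line)
```

The same IR fed to a 16-wide target and to a 4-wide target yields different instruction streams, which is the point: only the second stage has to change when the hardware does.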

In my opinion, when compiling the intermediate language to binary code, the following aspects of computer architecture will become important factors affecting performance:

  1. Instruction-level parallelism (ILP). For example, Intel's Larrabee architecture uses SIMD instructions with 16 operands. To execute efficiently on such hardware, existing ILP extraction methods must be developed further and new ones introduced.
  2. Core-to-core communication and communication between a core and the different levels of its memory hierarchy. Whatever the architecture, NV or Intel, the core-to-core bandwidth of a multithreaded processor is far below the throughput of each core, and the bandwidth that local memory can offer a core is limited by its bit width and frequency. To improve the efficiency of a multithreaded processor, we need to manage each core's own cache well, reduce inter-core communication, and reduce local memory accesses.
  3. Rearranging thread instruction streams and data structures. The number of light-weight threads (LWTs) that different architectures can execute varies greatly; for example, the Intel architecture handles far fewer light-weight threads than the NV architecture. The thread count must therefore be set appropriately, the tasks and corresponding data sets of those threads restructured, and the thread dispatch scheme optimized.
  4. Optimization across multiple multithreaded processors. Multi-CPU and multi-GPU systems are already common, and systems with multiple multithreaded processors will also become a popular configuration. Load balancing and communication optimization among those processors are therefore topics worth studying.
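
As a small illustration of the load-balancing question in item 4, here is one naive static scheme: split the work items in proportion to each processor's relative throughput. This is a sketch under my own assumptions, not a recommended scheduler; it ignores communication cost entirely, and the function name and throughput figures are hypothetical.

```python
def partition_work(total_items, throughputs):
    """Split total_items across processors in proportion to throughput.

    Uses the largest-remainder method so that the shares are integers
    and always add up to total_items. Throughputs are relative numbers
    (for example, measured items per second); only their ratios matter.
    """
    if total_items < 0:
        raise ValueError("total_items must not be negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive numbers")

    total = sum(throughputs)
    exact = [total_items * t / total for t in throughputs]
    shares = [int(x) for x in exact]          # round every share down first
    leftover = total_items - sum(shares)

    # Hand the remaining items to the processors with the largest fractions.
    by_fraction = sorted(range(len(throughputs)),
                         key=lambda i: exact[i] - shares[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical system: one processor three times as fast as the other.
    print(partition_work(1000, [1, 3]))
```

A real system would also have to weigh the cost of moving each processor's data set, which is exactly the communication optimization mentioned above.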

Sunday, September 14, 2008

First Impression

It has been four days since I arrived in Japan. Most of that time was working hours, so I haven't had a chance to see anything of the city except my commute route. On September 10th I got up at 6:30 AM to catch the plane. We landed at Narita Airport and took a one-hour train to Kawasaki, the city where the TOSMEC headquarters is located. Then we went directly to the office with our baggage and started setting up the environment. It was about 4:00 PM, and we hadn't had lunch. The first day already made us feel the tension in the Japanese office, like the tight schedule of this project. We work 13 hours a day, 9 AM to 10 PM. It is said that the Japanese colleagues go home at about 1 AM; I don't know this for sure, but every day we are the first to leave the office. They don't speak English; only the managers speak some lousy English, which I can barely understand. This is kind of a good thing, since I don't need to talk to them. Our team leader has been here many times, and he takes care of the communication.

The Japanese are very polite, careful and thrifty. They have many rules in daily life about not disturbing the public and protecting the environment, and they are good at following them. Garbage must be sorted, and each category is collected on a specific day of the week, which I can't remember. People keep quiet in trains and elevators, queue up spontaneously in every scenario you can imagine, and greet each other when entering or leaving the office. Sometimes the female colleagues bow in greeting when you run into them in the office, which makes me nervous because I don't bow back. The PCs, printers and monitors in the lab are very old models, far behind the American office; even the Shanghai office is better equipped. They don't have a water dispenser. Instead they use a small electric kettle of the kind you may see in some motels in China, so small that it needs frequent refilling. Moreover, they don't provide free tea bags, coffee, or anything else, just hot water. Everything is about cutting costs. If you want some tea, bring your own.

We live in a motel about a 15-minute walk from the office. It cannot compare with the California Hotel, but at least it is better than my room in Shanghai. It has an air conditioner, a TV that I turned on only once because all the programs were in Japanese, a microwave oven, a kitchen counter with an electric stove, a table, a chair, a bed, a balcony, many cabinets and Internet access. The pillow is extremely uncomfortable, so I have to sleep without it. Telephone calls here are expensive, unlike in the US, so I can only be reached over the Internet.

Japanese food is delicious, tiny and expensive. Bentō is available at every convenience store and supermarket, costs from 400 to 800 yen, looks fancy and delicate, and tastes good. There are many small restaurants nearby with a variety of flavors, and I got my appetite back as soon as I saw the alluring pictures outside. The anorexia is cured, but I have to restrain myself so as not to hand all my "blood-and-sweat" money back to the Japanese.

After a day's heavy work I go back to the motel, exhausted but feeling good, better than on the routine and meaningless days in Shanghai. Devoting everything to one thing, with no time or energy to consider anything else, is an effective way to escape the anxiety and depression of real life. It is also the best way to improve oneself and experience the joy of achievement.

Life is hard, but not necessarily boring. It is all about trying something new.

Friday, March 28, 2008

Once Upon a Time in America

I remember Hunter telling me that the young Jennifer in this movie was astonishingly, unbelievably beautiful in her first appearance and took his heart at first glance. And I am pretty sure LaoCai was showing his truest and deepest admiration for the GREAT Robert De Niro when he recommended it to me. America has always been heaven for dreamers. It's good to have a place like this on our planet, where we can have what we want if we try.

The three-month adventure still feels like a dream to me; it was like a gift from God and couldn't have been any better. I met all kinds of amazing people from across the globe: Singapore, Malaysia, Hong Kong, Taiwan, India, Japan, Korea, Thailand, Switzerland, and of course the Americans. People here are so nice to each other that I couldn't help liking them. Their different backgrounds bring different perspectives to the whole society, which makes their United States strong, prosperous, and most importantly free! It represents the most highly developed human civilization and attracts the most brilliant and creative people in the world to come together for a single reason: to lead mankind to infinity, and beyond!

This precious memory is frozen and isolated in my brain, disconnected from the others, because it is completely different from any of my earlier experiences, and I can't risk contaminating it. But from time to time I recall it and relive it, just to feel its power and refill myself with energy, like having a spare heart pumping blood through every vessel in my body. The smell of the free air I breathed on the Stanford campus is still familiar; the overwhelming feeling of setting foot on the Golden Gate Bridge remains the same; Mother Nature strikes me as if it were yesterday with the lovely black bear, the giant redwoods and the magnificent mountains and falls of Yosemite Valley; and human intelligence amazes me again and again as I explore the Intel Museum, wander the Google campus and pass by all those fairy-tale high-tech companies in Silicon Valley. Not to mention the sun shining and the Pacific beating on the glorious California beach. Just close your eyes and imagine what a life it was.

I had always been thinking about what I want and how I want to live my life, and at that point I felt I had an answer, at least for now. I can also answer the question Hunter asked me a year ago: what will you be seven years from now? Young man, follow your heart. You have everything you need: a healthy body, a determined will, a scientific method and an open mind. Nothing can stop you; the dream is coming true!

Monday, August 27, 2007

The Return Of The King

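Loyalty shows in adversity; in this life, a black-and-white soul. In dreams I visit the Delle Alpi, yearning for the city of Turin. A true, awe-inspiring Zebra; a noble Old Lady! This heart will never regret; forever with Juve!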

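In a hundred years of history this was the first relegation, and everyone who loves the Zebras feels a pain that is hard to put into words. We lost two league titles, we lost star players and our coach, we lost sponsors, but the true Juventini would not leave you. Nedved, Buffon, Del Piero, Trezeguet and Camoranesi stayed, the true Juventus spirit was preserved, and we are still the Zebras who never give up and never admit defeat. After a year of baptism by blood and fire, the new Zebras were reborn from the ashes, and at that moment the black and the white were incomparably pure and noble.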
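Our squad may not be what you would call strong, but never underestimate the heart of a champion. Putting on the black-and-white shirt and stepping onto the grass of the Delle Alpi, a sense of sacred mission arises. The team of eleven know that no one of them is fighting alone: they represent the glorious tradition of the Zebra cavalry sweeping invincibly across the Apennines, they now represent a century of Juventus reigning over Serie A, and they are about to issue a thunderous declaration of the return of the king!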
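David Trezeguet, the elegant swordsman who once "changed this European Championship," with his keen nose for goal and exquisite finishing, draws blood whenever he draws his sword. His hat trick came at just the right time to fire me up. Pavel Nedved, the iron god of war, runs tirelessly and leaves behind the image of his flying blond hair. But he is 35 now, and every time he is chopped down by an opponent and lies on the ground in pain, I feel tears glistening in my eyes. Yet perhaps this is exactly what sets the Old Lady apart: our will and spirit to win are handed down from generation to generation, and the spiritual leader of each generation inspires his teammates with his outstanding ability and leads them in taking solid steps forward.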

Loyalty, that's what you can get as a Bianconero.

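Baseness is the passport of the base. To Inter, who took advantage of our misfortune and kicked us when we were down, I have just one thing to say: what goes around comes around, and one day you will have to pay it back! KeChui, do you feel it?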

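Not to be outdone, ShenJi also contributes a work of his own:
Last year, ups and downs and hardship;
today, a return in splendor.
Through the turbulent years we never abandoned you;
united from top to bottom, forever at your side.
We smile at the perils of the times.

Sparrows often bully the great bird,
and fowl keep mocking the eagle.
In Serie A the battle rages and the beacon fires burn;
the swords are crossed, and the overlord returns.
Glory to the might of our Zebras!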

I am blind, not deaf!

---Demon Hunter
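That gives a glimpse of just how ferocious demons are. On Sunday at Old Trafford, the devils took the field with a thirst for blood. Tottenham's spending this season set a club record, but they started badly, losing two of their first three games, both to weaker sides, and Martin Jol's job was hanging by a thread. The team's leading striker, the Bulgarian forward Berbatov, had spread his wings after a brilliant first Premier League season and begun to stand up to the manager. Yet despite the rift between coach and star, the eleven Spurs players on the pitch showed no discord; they raised the banner of technical football and played a delightful passing attack along the ground. Berbatov was the most eye-catching star on the pitch: he deftly brought the ball down, turned and curled a shot toward the far corner that missed by a hair; after falling he alertly poked a shot that Ferdinand cleared off the line; facing the onrushing goalkeeper, he clamped the ball between his feet, hopped past and volleyed, only for Brown to block it with his body. Brilliantly talented, he tested the nerves of the Red Devils fans with the most elegant and most deadly skills, and people both loved and hated him. Rumor has it that Sir Alex intends to sign him to solve the urgent shortage up front. In my view that would be the perfect solution, and his performance in this match can only have strengthened Sir Alex's resolve.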
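Four rounds into the Premier League, the impression from Manchester United's four games is that they cannot find a way to score: plenty of pressure, no goals. Rooney is out injured for two months, Ronaldo is suspended for three games after a red card, Saha and Ole have not recovered from injury, and Dong Fangzhuo is not up to the job, so having no strikers available is certainly one reason. But the fundamental problem is that there is no aerial presence up front, no target man who can hold the ball up with his back to goal, so United can only rely on the wings to open a gap. That was exactly United's winning formula last season, but this season opponents have studied it thoroughly and guard the flanks heavily, giving no chance to break through. Think back: in the first round, the draw with Reading, De la Cruz marked Rooney tightly and made several successful rescues. In the second round, away to Portsmouth, Sol Campbell, as sharp as ever, stood like an iron gate and stopped United's attack in its tracks. In the third round, the Manchester derby, rising star Richards shone, won his one-on-ones at an astonishing rate, and United never really threatened the opposing goal. In this round against Tottenham, French international Chimbonda, the Premier League's best right back last season, again froze the flank. Use a fast, strong and agile black full back to mark United's most active wide player tightly and deny him any space to accelerate, then have the midfield drop back to cover: this trick works against United every time, and anyone with eyes can see it. Even though United are being countered so thoroughly, Sir Alex is the clever housewife who cannot cook without rice. The consequence of having no strong centre forward is that the midfield builds attack after attack, only to reach the edge of the box, find no space, and either pass back or be dispossessed. Possession is high, but it is all thunder and no rain, and the only hope is a world-class long shot from outside the box.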
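Those who survive a great disaster are bound for good fortune afterwards. After United's own goal had nearly fallen several times, Nani stepped up: he won the ball in the attacking third with a deft backheel, turned, took one touch forward, and unleashed a furious shot from near the top of the penalty arc. The ball flew like a shell from a cannon and screamed into the back of the net. As "Ronaldo the Second" roared ferociously after scoring, his eyes almost bursting, the stands boiled over; the passion the fans had stored up for three weeks finally erupted, and such a terrifying atmosphere at the devils' home ground is enough to chill any opponent. From then on the match settled into the rhythm the Red Devils know best, and victory followed naturally. But don't forget that we are already five points behind leaders Chelsea and our attacking problem has not been fundamentally solved. The road to defending the title will be perilous.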

Friday, August 10, 2007

Thursday, August 9, 2007

Leading Innovation

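The new Premier League season kicks off this weekend, but there still seems to be no solution to the broadcasting problem. Because Tiansheng bought out the broadcast rights, the free matches previously provided by local stations have been cancelled; for 588 RMB a year (49 RMB a month on average), you can watch live European football 24 hours a day. Fans are unwilling to pay, and local stations are unwilling to compromise with Tiansheng, so Tiansheng is currently in a very awkward position: it has bought out the rights, but no TV station will cooperate with it. So I don't know how I will be able to watch the matches; I guess PPLive will carry the signal of a foreign (or Hong Kong) station when the time comes.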
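Last weekend I watched the Community Shield between Manchester United and Chelsea (it is not an official league match, so it is not covered by the buyout). As the curtain-raiser of the new season, the league champions and the cup winners both showed fans brand-new line-ups. With the futures of Ballack and Robben undecided, Chelsea started more new faces, and neither captain, Neville nor Terry, played. Although these were not the strongest line-ups, the match was still fluent, exciting and fierce. After the drowsy Asian Cup and the summer collapse of China's national teams that left Chinese fans too ashamed to show their faces, seeing the Premier League's trademark flying tackles and players sprinting all over the pitch lifted my spirits at once. Each side scored once in 90 minutes and the match went straight to penalties. Van der Sar played like a man possessed, saving Chelsea's first three penalties in a row, while United did not miss, and lifted the trophy after three rounds. The player who stood out most was French international Malouda, who had just moved from Lyon to Chelsea: he burst forward, used his pace to beat Ferdinand, and, facing Van der Sar and off balance, poked the ball into the far corner before he hit the ground to equalize, a perfect solo show. Chelsea continued their fine tradition of treasure hunting in France: from Drogba to Essien to Malouda, each is an absolute superstar in his position in today's game. The French league has so much talent.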
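The new Red Devils shirt is a bit like a bodysuit: the players' muscles, full of enormous energy, are clearly visible, which makes me itch with envy. Rooney has taken over United's legendary number 10 shirt, the symbol of the team's core, and this honor will spur him to realize even more of his potential. After two months away he has bulked up even more, and together with that fuzzy, youthful face of his, he has the look of a beast. What I have always liked most about Rooney is his fighting spirit. He runs and fights for the ball tirelessly and with all his strength, and the most valuable thing about that is that it can fire up his teammates and ignite the fans' passion at any moment; as long as you see Rooney still running, you will never give up lightly. Whoever the opponent is, he does not go easy on the weak, still less fear the strong. He was born for the big occasion: he plays without giving the opponent a thought and does things his own way, and he does not stand on his dignity as a superstar but attends to everything himself, arguing red-faced with the linesman even over a throw-in. He has the nature of a beast. But don't be fooled by his rough appearance and temperament: the prodigy's technique is second to none. His control, passing and shooting are all precise and sensible, never just brute force. In FM his preferred move is "likes to lob the keeper"; doesn't that scene look familiar? After all, on the pitch it is ability that does the talking, and the King of the Orcs deserves his title.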
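With Tevez's signing finalized, United's forward line this season will be a partnership of two little orcs. They lack height, but their attacking thrust is still fearsome; my only worry is that without an aerial advantage up front, more will be demanded of the midfield's build-up play. Hargreaves has finally returned to England. His defensive ability and professionalism were plain for all to see at the World Cup, and he gives Sir Alex more midfield options; after all, Scholes is not getting any younger, and both he and Carrick are better going forward than defending. The heartbreaking news is that United veteran Heinze is pushing to leave. He is an extremely good defender who can play at left back or left centre back, and I think his crossing is fully the equal of Sagnol, the king of crosses from the right. On the wings, the club spent heavily on Nani, the young Portuguese international hailed as the next Cristiano Ronaldo, and Anderson, Porto's 19-year-old Brazilian prodigy. Attacking down both flanks has always been United's winning formula, and these two wingers with limitless futures are well worth the money. The brave and fierce Alan Smith could not win a regular starting place, and after a serious injury and a season of waiting he had no choice but to join the Magpies. This is a man whose mental attributes are nearly all 20; I wish him all the best. As for tactics in the new season, I will venture a few suggestions of my own. Last season Ronaldo enjoyed boundless glory, sweeping three individual player awards, which in fact reflects the focus of the team's tactical thinking. Without a strong centre forward we cannot play crude long balls from the back as Chelsea do; our tactics depend on the team as a whole. After winning the ball at the back, play goes through the midfield and out to the two flanks, where the full backs overlap and cross and the wide midfielders cut inside and shoot, so the attacking ability of the wide players is very important. Another distinctive feature of United's wing play is that the two wingers can play on either side and switch positions with great understanding: Giggs can play on the right, and Ronaldo is just as sharp breaking through on the left, which disrupts the opposition's man-marking on the flanks. In addition, both forwards, Rooney and Tevez, are very good on the ball, and they can drop deep to receive and organize, drawing the opposing centre backs out; midfielders arriving late from deep then pose a great threat, especially since our midfielders all shoot well from distance. Another United scoring weapon is the corner: the deliveries of Giggs and Nani are of very high quality, allowing Vidic, the new prince of headers, to unleash enormous power in attack. This is only the 4-4-2 approach; it can be adjusted to 4-5-1 depending on injuries and the opponent's characteristics. If a good tall centre forward can be found in the winter transfer window, the tactical system will be all the more complete.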
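Arsenal could not keep their king, Henry, and can now only hope that van Persie suddenly explodes. Their strength has declined, and I do not expect them to make a new breakthrough this year. Liverpool have landed Torres, the Spanish golden boy they had long coveted. Add the windfall they picked up in last season's winter window, Argentine international Mascherano, billed as the strongest defensive midfielder ever, of whom it is said that "with him alone at the back, everyone else can attack without worry" (the only man in the world who can contain Kaka); the hard-working Kuyt and Alonso, content to toil for others' glory; and the "beanpole" Crouch, who is hitting his stride and occasionally does something astonishing. After several consecutive seasons of huge investment, the team of long-shot king Gerrard has become truly fearsome and is a strong contender for the league title. Mourinho's rouble army seems to have kept a fairly low profile in the transfer market this year. Captain Terry renewed on a high salary, a record 130,000 pounds a week, and Chelsea may yet make a big move before the transfer window closes on August 31. With Owen back from injury, plus Viduka, Martins and Smith, Newcastle's forward line is very dangerous, and the midfield has been reinforced with Barton and Geremi; they are fully capable of a good season. Manchester City have brought in Eriksson as manager and reportedly splashed out 53 million pounds in the transfer market on solid players such as Petrov, Bojinov, Bianchi and Elano, plus Richards, the heir to Neville in the England team. If the manager can use these players well, City will surely put last season's toothless attack behind them, and I believe the "Chinese Sun" (Sun Jihai) can also improve further. The new Premier League season is packed with stars. The old saying "Serie A has no weak teams" can now be changed to "the Premier League has no weak teams." Every team is sharpening its swords, and I hope the 07/08 Premier League season brings fans an even more perfect customer experience.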