*Note: This is a joint translation by Joy Dantong Ma and Jeffrey Ding -- these are informal translations, and all credit for the original work goes to the authors of the original text linked below. Others are welcome to share excerpts from these translations as long as the translation is cited. Commenters should be aware that the Google Doc is also publicly shareable by link. These translations are part of the ChinAI newsletter, a weekly-updated library of translations from Chinese thinkers on AI-related issues: chinai.substack.com
In December 1989, as cold rain drizzled over Taipei, Samsung head Lee Kun-hee went to Taiwan on a study trip. He secretly invited Morris Chang, the founder of TSMC, to breakfast for one purpose: to poach the 58-year-old veteran.
At this time, TSMC had been quietly in operation for two years and was still unknown in the industry. Its "foundry" model was not the mainstream approach in the chip field then, and people couldn't understand it. In 1987, the year TSMC was founded, Samsung founder Lee Byung-chul passed away, and his third son, Lee Kun-hee, took charge of Samsung. As soon as he took office, he shouted the slogan "Start-up Again" and made a foray into electronics and semiconductors.
Morris Chang was exactly the talent Lee Kun-hee needed. In 1983, Chang stepped down from the "third-in-command" position at Texas Instruments. Although he could have simply enjoyed the American dream of "one house, two cars, three dogs," he remained restless. So two years later, when Sun Yun-suan, widely seen as Chiang Ching-kuo's successor, invited him to become president of the Industrial Technology Research Institute in Taiwan, Morris Chang decided to take the risk and leave his comfort zone.
Morris Chang grew up on the mainland, left in 1949, and then studied and worked in the United States for more than 30 years. He was not familiar with Taiwan, but he could not resist the temptation of "being the top figure." Only one year after arriving in Taiwan, Morris Chang, at 56, decided to take another big gamble: start a business and build a company that specialized in manufacturing chips for other chip companies. This model was not favored by peers or investors at the time.
At the breakfast table, Lee Kun-hee laid it out: semiconductors require a lot of capital and talent, and the risks are great; Samsung, meanwhile, was developing well. He wanted Morris Chang to come to a Samsung factory and take a look. Lee Kun-hee's words carried both admiration and sympathy for Chang, much as when Cao Cao looked at Liu Bei and said, "the only true heroes are the two of us" -- he wished that Morris Chang would immediately run off with him to Seoul.
Morris Chang was struggling at the time to win orders, but he was tired of playing second or third fiddle, and he firmly refused Samsung. Lee Kun-hee did not give up and invited him to visit Samsung. Morris Chang readily accepted, and after touring the factory for half a day, he even praised Samsung's production capacity as "impressive" -- but he still refused to leave TSMC. Seeing his firm attitude, Lee Kun-hee had to give up.
After parting ways this time, the trajectories of the two would continue to intersect: TSMC and Samsung each followed their own route to become global semiconductor giants, and then launched a bloody duel. At the peak, both sides threw money around with bloodshot eyes, at one point buying 40% of the world's semiconductor manufacturing equipment between them. This was both a business war between two companies and a path to industrial upgrading that neither region could avoid.
For quite a long time in the past, and for a long time in the future, TSMC's biggest rival is Samsung, not SMIC.
Part I. Overtaking: From the edge of the industry to the center of the stage

Although TSMC is a Taiwanese company, Morris Chang injected an "American soul" into it: the management experience from his time at Texas Instruments, IBM-licensed technology, and a large number of talents returning from the United States.
For example, Hu Chenming (Calvin Hu), TSMC's first chief technology executive, was a professor at the University of California, Berkeley, and his favorite pupil Liang Mong-Song also switched from AMD to TSMC; R&D team leader Chiang Shang-Yi previously worked at Texas Instruments and HP; Cai Lixing (Rick Tsai), who later took over from Morris Chang as CEO, was a PhD graduate of Cornell; Yu Zhenhua (Douglas Yu), a core technical figure, held a PhD from the Georgia Institute of Technology.
TSMC received not only talent but also orders from the United States. The semiconductor industry originated in the United States and then began to shift to Japan. Japanese semiconductor companies cultivated formidable, battle-tested competencies, and in the field of memory chips, relying on shrewd management and cost advantages, utterly routed American companies. In the late 1980s, six of the world's top ten semiconductor companies were Japanese.
If you can't beat them, change the race track. So the United States withdrew from memory chips and concentrated on logic chips such as CPUs. Unlike memory chips, which demand tight integration, logic chips allow the design and production segments to be separated, and this brought opportunity to TSMC. In 1988, Intel sent the first large order and gave guidance on more than 200 processes -- what might be described as "funding to get you on the horse and technology to carry you far along the ride."
Of course, these benefits were not given for free. The fast-growing TSMC fed value back into the US semiconductor industry: because they no longer had to bear the huge cost of building their own fabs, a large number of start-up chip design companies could travel light and develop rapidly, and American chip companies regained the commanding heights of the global chip industry. Today's giants such as Qualcomm, NVIDIA, and Marvell all benefited from this.
In 1995, Nvidia founder Jensen Huang hit a commercial bottleneck, so he wrote to Morris Chang for help. Soon, in his noisy office, he received a call from Morris. Jensen excitedly said to the people around him: Quiet! Morris called me. Subsequently, TSMC successfully completed Nvidia's order, helping it quickly capture the market.
Jensen Huang was so moved by this that he had the experience captured in a comic, which he gave to Morris Chang.
Although TSMC was finding its wings and its model had gained recognition from the industry, its technology was licensed from IBM and its technological autonomy was lacking, so Silicon Valley companies still considered it a second-rate company. Morris Chang was reluctant to become a technological vassal of the United States and had been waiting for the chance to become master of his own fate. Finally, the opportunity came, and it came twice: i) a counterattack in copper interconnect technology, and ii) a breakthrough in lithography machines.
The counterattack in copper interconnect technology ended IBM's technological hegemony. In 2003, IBM hoped to sell its newly developed copper interconnect technology to TSMC, but Morris Chang believed IBM's technology was not mature and that TSMC could do better on its own. Chiang Shang-Yi led the development team, making full use of the process-management discipline he had learned at Texas Instruments: to avoid contaminating materials, R&D personnel had to strictly follow the routes drawn on the floor even when walking.
More than a year later, TSMC achieved the copper interconnect breakthrough first, with six people playing central roles: Yu Zhenhua, Liang Mong-Song, Sun Yuancheng, Chiang Shang-Yi, Yang Guanglei, and Lin Benjian. IBM's version had not yet made it out of the laboratory, and its technological hegemony over foundries was ended. Ten years later, IBM paid another 1.5 billion US dollars to transfer its foundry business to GlobalFoundries and completely withdrew from the field.
As for the breakthrough in lithography machines, it not only let TSMC overtake technologically but also helped cultivate a solid equipment ally. In 2004, TSMC decided to develop wet (immersion) lithography technology through a new method, and the initiator of this project, Lin Benjian, also came from IBM.
This technology ran counter to the prevailing dry-method scheme, and Lin Benjian was mocked for "standing in the way of an aircraft carrier." The Japanese "lithography machine overlords" Canon and Nikon also resisted the plan; only the small Dutch firm ASML was willing to try. In the end, TSMC completed the technological breakthrough, and ASML rose quickly, becoming an industry giant, defeating the Japanese hegemons, and forming a deep friendship with TSMC.
Today, ASML is under heavy pressure from the United States. This is a pain in the neck for SMIC, and it also limits Huawei's options for finding foundry partners.
Learning from the U.S. and then surpassing it, these two "overtakings on a curve" showed that TSMC's foundry technology was extraordinary. In 2004, TSMC won half of the world's chip foundry orders, ranking in the semiconductor industry's top ten by scale. Ranked second was South Korea's Samsung, which had won 30% of the world's market share by battling Japan to the death in memory chips, while Japan's once-brilliant semiconductor industry had only three companies remaining in the top ten.
Seeing that the overall landscape had solidified, Morris Chang was eager to leave more time for his family. So in June 2005, he announced his retirement as CEO and stepped back to an advisory post. He got off work at exactly 7 o'clock to accompany his family, eat dinner, and attend concerts. At this very moment, his old friend Lee Kun-hee was massing troops to strike into TSMC's hinterland.
Part II. Awakened: Lee Kun-hee Attacks, Morris Chang Returns to the Furnace

Lee Kun-hee was the chief designer behind Samsung's semiconductor division. In 1974, he proposed the plan to set it up and traveled to the United States more than 50 times to bring back technology. This so moved his father Lee Byung-chul that he declared he must launch this business before his eyes closed for the last time. However, facing a Japan at the apex of its power in the 1980s, Samsung Semiconductor had lost 300 million US dollars by the time of Lee Byung-chul's death, with nothing to show for it.
Then in 1987, when Lee Kun-hee took over, Samsung Semiconductor's opportunity finally came. That year, the United States discovered Toshiba's secret sale of equipment to the Soviet Union. Having been crushed by Japanese semiconductors, the United States immediately seized the chance to wield the sanctions stick: not only did it impose a 100% tariff on Japanese memory chips, it also launched a "Section 301" investigation that frightened many trading nations.
Thrilled and rejoicing at the news that Japanese semiconductors were being pressed to the ground by the United States, Samsung rushed over and also pressed its foot down.
The secret of Lee Kun-hee's success was "buy people at high prices and sell goods at low prices": he paid triple wages to hollow out Toshiba's engineering ranks, quickly upgraded Samsung's technology, and then launched an offensive of rock-bottom prices. He also played the "emotional" card, calling on overseas Korean engineers to return home and join the fray. Chae Dae Je, a key technical talent who had worked at IBM for 7 years, was swept up with excitement on hearing the call and immediately returned to Korea to join the battle.
During the Asian financial crisis in 1997, Samsung still maintained high capital expenditures, dared to fight price wars, and made counter-cyclical investments, investing more even as it lost more. Finally, in 2004, it overwhelmed companies such as Japan's Toshiba and became the overlord of memory chips. Then Lee Kun-hee waved his hand and directed his artillery fire at TSMC, beginning the march on chip foundries.
Lee Kun-hee's strategy was simple: leverage Samsung's diversified advantages, fan out from single points into a whole expanse, and break the encirclement. Facing Qualcomm, TSMC's largest customer, Samsung adopted the method of "using purchasing to find selling opportunities": Samsung phones bought Qualcomm chips, and Samsung then persuaded Qualcomm to hand over the chip manufacturing. For Apple, Samsung bundled storage chips, display panels, and chip foundry services into one sales package -- the so-called strategy of "selling a cabbage and throwing in some spring onions."
Samsung's bundling strategy is a common way for semiconductor companies to develop business. Qualcomm, for example, uses its strength in baseband processors to pair unsalable goods with goods that sell well; AMD likewise sells CPUs and GPUs together. Samsung's foundry revenue was only $75 million in 2006, but four years later it had tripled, with Apple's A-series chip orders accounting for 85%. It can be said that Samsung's chip division grew up by eating apples one by one.
In 2009, Samsung, riding a winning streak, held an internal meeting at which Lee Kun-hee's eldest son, Lee Jae-yong, ambitiously announced a plan: Kill Taiwan -- first eliminate Taiwan's panel and memory chip industries, and then defeat TSMC, the Mount Everest of Taiwan, so that Samsung could completely dominate the advanced electronics industry and lay the foundation for his own succession to his father's power.
It was only in 2013 that this plan was disclosed by Taiwan's "New Weekly." By then, Taiwan's panel and memory chip industries had all been chopped down by Samsung, leaving only TSMC.
Looking at the situation in 2009, it was not impossible for Samsung to defeat TSMC. At that time, TSMC's leaky house was suffering through continuous rain: after the 2008 financial crisis, profits fell sharply and the company was forced to lay off workers. The chief architect of the layoffs was CEO Cai Lixing, who had worked at TSMC for nearly 20 years. Calm and strong in execution, he had once led TSMC's first 8-inch fab and was known as "a mini Morris Chang."
Cai Lixing implemented a stricter performance evaluation system than in the Morris Chang era. For the bottom 5% of employees in the evaluation, he canceled the previous observation period and directly announced layoffs. Cutting fast to optimize costs was, in essence, a routine operation, but it angered people because it was not carried out humanely.
One employee who had worked there for more than ten years and had received excellence awards was thrown onto the elimination list because he could not work "996" hours while caring for his pregnant wife. The employee's elderly father wrote a tearful letter to Morris Chang, begging the company not to lay off his son. Some people even hung banners at the door of Morris Chang's house, accusing him of being dishonest. Embarrassed by this stain on his reputation, Chang hurriedly had his wife send pastries and condolences.
At the same time, the yield rate of the company's new production line was not improving, and customers had even cancelled orders. Morris Chang grew anxious: the menacing Samsung was about to fight its way to the gates, and Cai Lixing was still showing off his financial-reporting skills at suppressing costs and inflating profits! In mid-June, TSMC held a board meeting, and in less than ten minutes Morris Chang reassigned Cai Lixing to a photovoltaic department with fewer than ten people.
Then the 78-year-old announced his own return to the furnace, setting no deadline for when he would leave again.
After his return, Morris Chang did two things: he expanded the army and the armaments. He declared the previous layoffs invalid -- those willing to come back could immediately return to their posts -- and he invited the retired Chiang Shang-Yi to a meal in his office, where he discussed neither salary nor compensation and simply gave an order: come back, and spend the company's new USD 1 billion R&D budget increase as soon as possible.
To boost morale, Morris Chang even quoted, in a speech to TSMC employees, Shakespeare's line on Henry V in battle: "Once more unto the breach, dear friends!" Henry V is regarded as a national hero by the British: he led a weak force of fewer than 6,000 infantry and defeated elite French troops six times their number. Morris Chang's reference was made for obvious reasons -- the hope that TSMC could likewise create a miracle of the weak over the strong.
The veteran led from the front, doing the work of two people. TSMC won customers back and upgraded its technology. Most importantly, for the key 28-nanometer process, Chiang Shang-Yi chose the gate-last integration scheme for high-k/metal gate (HKMG) transistors instead of the gate-first scheme that Samsung was developing. With correct judgment and strict process discipline, TSMC's yield greatly improved, while Samsung made no progress.
Even so, Morris Chang's brow stayed furrowed. He knew that to defeat Samsung and eliminate future trouble, the decisive point was not in Seoul but thousands of miles away on the West Coast of the United States, where there was a super-customer that could carry any foundry to the pinnacle of the world.
PART III. Uprising: Apple Fights Against Korea, TSMC Hands Over A Knife
In 2010, Morris Chang received a special guest at home: Apple's COO Jeff Williams. The two talked about building a factory over red wine and, after a long discussion, reached a cooperation agreement: Apple would order an entire generation of chips from TSMC, provided that TSMC secured US$9 billion in funding to build factories and 6,000 workers to ensure production capacity.
The meeting was welcome to both parties, as Apple was struggling for air with Samsung's hands around its neck. From 2008 to 2011, Samsung's global smartphone share increased six-fold, reaching 20%; yet more than half of Apple's key parts had to be purchased from Samsung. Gasping for air, Apple gradually set in motion a plan to replace Samsung as a supplier: in 2008 it shifted flash memory chip orders from Samsung to Toshiba, and two years later it diverted part of its screen orders to Sharp.
In April 2011, Apple filed 16 accusations in one go against Samsung, alleging that Samsung's mobile phones plagiarized Apple's. Samsung, holding Apple's lifeline of key component supplies, refused to surrender and immediately counter-sued, accusing Apple of infringing 10 Samsung patents and demanding a complete halt to iPhone sales in the US. Fighting in court with a hand on its throat, Apple urged TSMC to accelerate its R&D. At the end of 2011, Chang sent out TSMC's best troops: "One Team."
This team was composed of more than a hundred R&D engineers drawn from across departments. They quietly flew to the US from Taipei, Hsinchu, and other places and stationed themselves at Apple's headquarters in Cupertino -- which happens to be Hsinchu's sister city, and less than 10 miles from the courthouse where Samsung and Apple were battling. These engineers signed strict confidentiality agreements; their secret task was to develop A8 chips with Apple while bypassing Samsung's patents.
Since Samsung held many core patents, if TSMC used similar technologies, Samsung would sue TSMC into bankruptcy; the task therefore had to be kept secret. Samsung was not shy about its intentions either: whenever it communicated with stock analysts, it would emphasize, "if TSMC dares to do so, we will sue them without a doubt." To dispel Apple's concerns, TSMC first participated in the design of the A6 chip to demonstrate its capabilities, and shared all of its own patents with Apple for verification, without reservation.
To ensure success, TSMC developed two A8 versions for Apple to choose from, vowing never to stop until it had completely bypassed Samsung's patent wall. At the same time, TSMC expanded its infrastructure at unprecedented speed and ramped up production capacity: Fab 12 in Hsinchu, Fab 15 in Taichung, and Fab 14 in Tainan were expanding at nearly three times the usual speed. Over Taiwan's western coastline, planes transported chip manufacturing equipment from Europe, America, and Japan non-stop.
Between 2011 and 2013, only three ultra-large 12-inch fabs were newly built in the world, and TSMC alone built one new fab and expanded two others. In 2013, half of TSMC's revenue was poured into expansion. It would not be a stretch to say Chang bet all his chips. Samsung, though unaware of the secret deal between Apple and TSMC, saw all the expansion and came up with a plan.
Samsung proactively contacted TSMC and expressed interest in having TSMC manufacture Samsung's 4G chip. The order was just a facade: what Samsung really wanted was to probe TSMC's technology, processes, and production capacity. Chang saw clearly where Samsung's real intention lay, so he asked Samsung to work with a Taiwanese chip design company, which would in turn cooperate with TSMC -- leaving no direct interaction between Samsung and TSMC. Samsung had to drop its plan.
In 2014, Apple finally announced the foundry list for its A8 chip: TSMC was the one and only. TSMC's share price soared, and employees were relieved: "if it were not for the sense of honor and the hope of defeating Samsung, who would want to be separated from their families for so long?" Taiwanese media reported with enthusiasm: "mobile phones save Taiwan," "Morris Chang unravels Samsung."
But who could have known that a few months later, a man who claimed to have TSMC in his blood would stand with Samsung and push TSMC to the edge of a cliff yet again.
PART IV. Setbacks: Samsung Blessed with One General, TSMC Faced One Failure After Another
Chang kept a picture of his wife and himself in his office. The person who took the picture was one of Chang's favorite pupils: Liang Mong-Song.
Liang was a protégé of Chiang Shang-Yi and a critical member of TSMC's copper interconnect breakthrough in 2002, ranking second on the Taiwan authorities' award list. He was the most promising candidate to succeed Chiang as director of R&D. However, just four months before Morris Chang's return in 2009, Liang handed in his resignation and left TSMC after a 17-year tenure.
Liang did not leave TSMC because of poor performance; rather, he felt pushed out of the company. When Chiang retired in 2006, Liang considered himself the best successor in the company. What came next, however, was not a promotion but an order transferring him to lead a new initiative called "Beyond Moore('s Law)." The plan sounded fancy, but it came with only a small office that could fit four people, and it was later implemented in just two less-advanced fabs.
The office started to feel like an icebox to Liang. For more than eight months, he sat in it alone, without stepping out or seeing colleagues, feeling embarrassed and insulted. Yet the current CEO, Wei Zhejia (C. C. Wei), had also once led the "Beyond Moore" initiative; Wei won order after order and eventually revived those two small fabs. The "initiative" may well have been a test of Liang by Morris Chang.
Icebox or test, Liang lost all hope and resigned in February 2009. He taught at National Tsing Hua University for a few months, then, through an introduction from his wife's family (who are Korean), went to teach telecommunications at Sungkyunkwan University. Two years later, when Liang's non-compete agreement expired, he found a new employer -- TSMC's old rival Samsung -- as the technical lead of its semiconductor division.
Samsung thirsted for talents like Liang. Liang's annual salary was rumored to be as high as NT$135 million, triple the standard at TSMC and even exceeding that of Samsung's co-CEO. Lee Kun-hee often said that three Bill Gateses could elevate South Korea, and that his task was to find three such geniuses. This emphasis on talent regardless of cost is undoubtedly worth studying for mainland China 10 years later.
However, Samsung's gift had not been in Liang's hands for three months when a cold legal complaint from TSMC arrived. The complaint was sternly worded, demanding that Liang stop divulging secrets and immediately resign from Samsung. Looking at it, Liang might have wondered: wasn't his previous employer supposed to be out of his life by now?
TSMC filed the stern complaint because it had noticed something: Samsung had changed.
Morris Chang had called Samsung a "300-pound gorilla" in the marketplace but a mere "blip on the radar" when it came to technology. Since Liang's resignation, however, Samsung seemed to have acquired a secret cookbook, achieving technology breakthroughs year after year: 45nm, 32nm, 28nm. By 2011, it was almost on par with TSMC. The same catch-up took mainland China's SMIC more than 7 years.
Coincidence or not, after some investigation TSMC did find many clues that Liang might have violated the non-compete agreement: Sungkyunkwan University, where Liang taught, is known as South Korea's Tsinghua and is funded by Samsung; its campus sits inside Samsung's headquarters; the actual location of Liang's classes was inside a Samsung factory. And whenever Liang traveled between Korea and Taiwan in those years, it was by Samsung's private plane.
The thought of Liang violating his non-compete and working for Samsung enraged TSMC. Liang, for his part, also felt wronged by TSMC. In a courtroom that is usually confrontational and combative, Liang poured out all his sorrows.
Almost choking up, Liang said, "Based on my seniority, on what grounds was I sent to a small unit that was so limiting? I felt deceived and insulted. The leadership at TSMC simply forgot about me. Why is TSMC so ruthless? As a person who has dedicated his life to TSMC, I hoped to fight for TSMC once again, but silence was the only response I received."
He continued passionately, "I'm a man of my word, not a rebel who fled to the enemy's camp as the media has pictured. This narrative is a great insult to me and has caused great harm to my family."
Liang's passionate display of frustration was moving. In the end, the judge ruled that two years had passed since Liang's resignation, well beyond the period of the non-compete agreement, and TSMC's complaint was dismissed. TSMC was hit hard by the ruling, and before it could recover, another shock landed: the Apple A9 chip order, which TSMC had thought a sure deal, went to Samsung. What won the order for Samsung was an unmatched technology: the world's first 14nm FinFET process.
Hu Chenming is the inventor of the FinFET and led TSMC's effort to bring it into production. The FinFET is a type of 3D transistor that TSMC had been working on for nearly a decade, widely recognized as the key to unlocking processes below 20nm. And Hu's favorite student was none other than Liang Mong-Song. Who could have known that Liang's switch from TSMC to Samsung would let the latter jump ahead? It is just like the old saying: the master teaches the apprentice, and the apprentice starves the master.
Taiwanese media regretfully commented, "the technical advantages of TSMC have been wiped out overnight."
After Apple turned to Samsung, Qualcomm followed suit and handed its latest chip foundry order to Samsung. As clients left, investors started to worry: Credit Suisse, which had been bullish on TSMC for five consecutive years, gave it a negative rating for the first time, and CLSA (Lyon Securities) projected that TSMC would lose 80% of Apple's orders and more than $1 billion.
At the January 2015 shareholders meeting, Chang looked into the camera and admitted solemnly, "Yes, we are a bit behind."
PART V. Counterattack: 100,000 youths, 100,000 livers
On the day Chang admitted TSMC's disadvantage, its stock rose by 8%. Investors believed that Morris was very angry and there would be serious consequences. Indeed, TSMC began preparing a counterattack on multiple fronts.
Get up where you fall down -- and so TSMC did. The company assembled an R&D team unprecedented in the industry: the Nightingale Army, a team that worked at night. Learning from Foxconn's assembly lines, TSMC built a three-shift R&D department to ensure 24-hour uninterrupted R&D. Nightingale pay was much higher than that of assembly workers or regular R&D personnel: a 30% increase in base salary and 50% in dividends.
Attracted by the rewards, the Nightingale Army quickly gathered more than 400 people. Because staying up late harms the liver, the Nightingale model is also called "liver-busting." Sayings began to spread in Taiwan: "100,000 young people, 100,000 livers"; "the tougher the liver, the more money." In 2014, the average annual working hours of Taiwanese laborers totaled 2,135, far exceeding the rest of the world. When Intel was beaten by TSMC's technology in 2017, some Intel employees went to TSMC to figure out why, and the answer was: you snooze, you lose; you've been sleeping too much for too long.
With the technical assault team in place, Chang turned to the lawyers and ordered them to "fight to the end." A new judge, who had a Korean husband and was familiar with Korean cases, replaced the old one. TSMC's lawyers also dug up crucial evidence: 10 of Liang's students were senior engineers at Samsung; Liang had started using Samsung's internal mailbox as early as 2009; and 7 key features of Samsung's process were similar to TSMC's.
In the end, the court ruled that Liang could not return to Samsung until the end of 2015. This was the first ruling in Taiwan prohibiting a former employee from working for a competitor even after the non-compete period had ended. TSMC's victory was a joint effort of Taiwanese commercial, political, and legal circles: Taiwan specifically revised its Trade Secrets Act to cover industrial espionage, and the judge even declared publicly, "if we don't protect companies like TSMC, who will we protect?"
Liang, of course, disagreed with the ruling and appealed to Taiwan's Supreme Court. The lawsuit dragged on until August 2015, and Liang lost again. At this point, Liang could simply have waited four months and then returned to work at Samsung, but another shock was coming: while Liang was tied up in the lawsuit, Samsung rushed production and floundered on the A9 chips.
Netizens found that iPhone 6s chips from Samsung's foundry lasted 2 hours less than TSMC's, with temperatures more than 10% higher. Tutorials even appeared online showing how to identify iPhones with Samsung chips so buyers could return them as soon as possible. Although Apple denied the performance difference, it quietly transferred the A9 order from Samsung to TSMC. On the foundry lists from the A10 onward, only TSMC was left.
TSMC once again won more than half of the foundry market, becoming the backbone of Taiwan's economy, stock market, and even population growth. TSMC's new fabs consumed a third of Taiwan's newly added power generation; its 40,000-plus employees also contributed more than 1% of Taiwan's total newborns in 2019. On average, Taiwan had one large fab per 1,000 square kilometers, the highest foundry density in the world.
In June 2018, the 86-year-old Chang announced his retirement for the second time. At his last shareholder meeting, he declared amid applause, "TSMC's miracle will never stop!" Chang left the industry crowned with success, while his old rival Lee Kun-hee had been ill since 2014 and, to this day, has not stepped out of Samsung Medical Center. Lee's eldest son Lee Jae-yong, the de facto head of Samsung, was imprisoned for bribery in 2017 and released 6 months later.
In July 2019, Japan cut off the supply of key semiconductor materials to South Korea, and Lee Jae-yong had to rush to Taiwan to plead for raw materials. This signaled the complete bankruptcy of his plan to "Kill Taiwan." In December, TSMC's market value briefly surpassed Samsung's, making it the global semiconductor leader. The showdown that began in 2004 had gone through four stages, and the curtain finally fell.
However, just as Chang cautioned, what TSMC won was merely a battle. The entire semiconductor industry has been at war since the 1970s; why would that war end now?
TSMC has become a totem in Taiwan, which is attributable to Chang's personal charisma, the independent R&D strategy, the hardworking spirit, and the principles Chang set at the beginning: stay neutral, serve everyone, and win partners' trust.
It is on this trust that not only Apple but also Huawei can hand over orders without worry. All Kirin chips are manufactured by TSMC, and they contribute more than 10% of TSMC's total orders. In this regard, Taiwan's IT godfather Shi Zhenrong (施振榮/Stan Shih) once said, "Taiwan is a friend of the world, while Samsung is the enemy of the world," meaning that Taiwan is determined to specialize and work for technology companies all over the world, while Samsung wants to expand across the whole sector. [af][ag]
Of course, TSMC is still too young. When the Pacific’s tectonic plates begin to collide, mainland China and the United States are like two invisible walls. At this time, can TSMC really make friends as it pleases?
Looking back at history, the four treasures at TSMC's founding were talent from Berkeley, management experience from Texas Instruments, IBM's technology licensing, and orders from American chip companies. It was Apple, with the United States behind it, that decided the balance between Samsung and TSMC, and TSMC's shareholders are mostly on Wall Street, holding 80% of total equity. [ah][ai]
Behind TSMC's back, there is always Samsung's sharp knife. Although Samsung stumbled on the 7nm process and lags well behind TSMC, Apple and Qualcomm fear a TSMC hegemony and will never completely discard Samsung. The history of the electronics industry teaches: never underestimate the madness and guts of the Koreans.
With high walls on both sides and a sharp knife at its back, can TSMC hold the commanding heights of the global chip industry? Can it enjoy its neutral position as before? No one can answer that. Morris Chang will no longer return to the battlefield, but his old colleagues and subordinates are standing across the strait, jumping into the torrent of history.
At the end of 2016, Chiang Shang-Yi, TSMC's No. 2, knocked on the door of Chang's office and told him that he would join SMIC as a director. A month later, Chiang's entire family moved to the mainland.
Six months later, Liang Mong-Song, who resigned from Samsung, accepted an annual salary of only $200,000 and joined SMIC with his team as a co-chief executive. In only three quarters, Liang and his team brought an important 14nm technological breakthrough to SMIC.
In 2019, Yang Guanglei, one of the "Six R&D Gentlemen" from TSMC's defeat of IBM, took over Chiang Shang-Yi's post as an independent board director and moved to mainland China.
Chiang's next post was CEO of HSMC in Wuhan. At a summit last year, he commented, "Moore's Law is slowing down. This offers an excellent opportunity for mainland China's semiconductor companies to catch up and take over."
In addition to these executives, countless Taiwanese engineers rushed to the mainland to join in writing the history of the chip industry. From HiSilicon in Shenzhen to XMC in Wuhan, from JCET in Jiangyin to CXMT in Hefei, streams of engineers are returning from overseas, forming a turbulent wave and contributing to the last battle of China's industrialization. [aj]
Therefore, TSMC is caught in the middle of the US-China technology war. It benefits from both sides but also faces a dilemma: it wants to be "neutral," but that is hard to achieve. It was, after all, founded by the descendants of Yan Di and Huang Di [ak] (the Chinese people), so how can it truly be "neutral"?
[a]previous translation had this profile of Zhang Zhongmou: "At about the same time that Zhang Rujing was taken to Taiwan by his parents, 17-year-old Zhang Zhongmou, a native of Ningbo in Zhejiang, also boarded a ship in Shanghai, crammed into a narrow cabin room with his family, and set off for Hong Kong.
After staying in Hong Kong for a few months, Zhang Zhongmou immediately applied to Harvard University, becoming the only Chinese student among the school’s 1,000 or so freshmen. He later transferred to the Massachusetts Institute of Technology and obtained a master's degree. In 1958, Zhang Zhongmou joined Texas Instruments, and worked his way to the company's No. 3 position. Zhang Rujing, who joined Texas Instruments in 1977, nominally had a "colleague" relationship with Zhang Zhongmou for eight years, but contrary to the media hype, their paths did not intersect during this period.
In 1985, Zhang Zhongmou resigned from a high-paying position at Texas Instruments and returned to Taiwan to become president of the Industrial Technology Research Institute of Taiwan. Prior to this, Zhang Zhongmou, who was in his fifties, had never lived in Taiwan for a long time. In 1987, Zhang Zhongmou founded TSMC and received strong support from the government. By the time Zhang Rujing resigned from Texas Instruments and returned to Taiwan, Zhang Zhongmou had already become the industrial hero of Taiwan in the same way as Akio Morita of Japan.
[b]Thanks for the heads up! I was translating it into Morris Chang according to TSMC press release but can change it pretty easily. One thought is maybe at the first mention of him, we can include both names since Morris is the name that's more widely reported and used in western media (forbes, qz, etc.). That way we can make sure people know who he is.
[c]Oh wow I didn't even register that the first time around translating -- yeah Morris Chang is definitely better to use actually
[d]yeah, i think Chinese folks know him as Zhongmou while American know him as Morris. There are few more names like that in this article. And it gets a bit messier when you mix in Taiwanese spelling coz it's different than pinyin.. I will highlight them all once I finish the second part translation and we can streamline them.
[af]Very interesting perception of Samsung's business model. Question: How does that fit to Samsung's plans to ramp up their contract foundry business? taipeitimes.com
[ag]i wonder if this is more of a "we also want to do foundry business" or "we are going to *exclusively* focus on foundry business." as long as samsung spans across the entire supply chain (aka making both chips and phones), the collaborative-competitive dynamic persists.
[ah]This paragraph is confusingly worded. What is the connection between Apple deciding it and TSMC shareholders?
[ai]I changed the "Whereas" to "and" -- I think the connection is just that there's a lot of American capital behind TSMC -- whether it's human capital, investments, chip orders, etc.
[aj]the last block of the industrialization pyramid.
[ak]炎黄子孙 -- Yan Di and Huang Di were two legendary rulers of remote antiquity
$IDEX - GREEN Act: US Democrats to push EV sales cap + E-BIKE Act. There is also legislation on the table to introduce tax credits for electric bicycles. https://www.electrive.com/2021/02/12/green-act-us-democrats-to-push-ev-sales-cap-e-bike-act/
The story of IDEA vs VS Code is a story of low-end disruption, straight from the textbook. There is an emergent competitor that is not yet feature-rich, but neither is it elephantine, and its technology has a unique advantage. The key technology here is probably the Language Server Protocol; it offloads programming-language-related smarts directly to the compiler (typically), thus relieving the IDE from supporting a hundred programming languages with all their warts and twists. As the compiler is the ultimate authority on matters of the language, that seems like a wise move: reusing instead of reimplementing.

The other technology is Electron and, by extension, a browser-engine-based GUI. Given how much money was put into browsers, that is probably the most mature multiplatform GUI framework in existence. That situation may catch the IDEA family between a rock and a hard place: it has to outdo compilers on one side and browser engines on the other, specifically in its primary line of business. Which may easily become an unwinnable bet.

So far it looked like textbook low-end disruption, except for one thing: the disruptor is an old fat incumbent while the disrupted party is a (relatively) young emergent competitor. That is the funny part. Clearly, the MS leadership has read the book.

What bothers me about JetBrains is the fact that they seem to be playing along. Judging by the talks at their recent conference, IDEA goes more elephantine, more corporate, more complex. That is easy to understand: a corporate user pays a lot. But that is the classic trajectory of the low-end disrupted, the one that leads to the very same corner where IBM DB2 sits. They will retain some deep-pocketed customers, but all the interesting things will be happening somewhere else. In this regard I will mention two announcements: the CodeWithMe technology and a lightweight "IDEA viewer" product. CodeWithMe allows one IDEA instance to run another instance remotely to allow for collaborative work. Even the color scheme and the shortcuts are provided by the master instance, if I understood that correctly. That is very Java and very corporate, no doubt.

The IDEA viewer is another step in the same direction: let the IDE run somewhere else, maybe on a server, and let people use a lightweight client "on a laptop in a cafe". I seriously doubt it will work well in an average cafe, though. Their WiFi tends to be unreliable. Given how dependent the client machine becomes, spotty WiFi will cause a developer to smash his laptop against the coffee machine. Then, the developer will have to pay for both. I experienced similar issues with JetBrains YouTrack in the past: it was an awesome issue tracker, unless you used a poor connection or the server was too far away. Then, it was unnerving. So I assume that this will be stress-free on a wired connection only. Whether the developer is using a laptop or a workstation is of secondary importance then: it is wired. Still, that feature fits some corporate behavior patterns, so it may entrench IDEA better in that environment.

In addition to the rock and the hard place, IDEA seems to be caught in one more aspect. GitHub is now an MS property, so in the future VS Code is expected to be more and more integrated into GitHub. That pretty much catches all the young and hip audience in an MS-owned behavior loop. Which certainly shows some very competent strategic planning. Meanwhile, JetBrains seems to be staying afloat mostly thanks to hard work and sheer luck.
Artificial Neural Networks (ANNs) are built around the backpropagation algorithm, which lets you perform gradient descent on a network of neurons. When we feed training data through an ANN, backpropagation tells us how the weights should change.
ANNs are good at inference problems. Biological Neural Networks (BNNs) are good at inference too. ANNs are built out of neurons. BNNs are built out of neurons too. It makes intuitive sense that ANNs and BNNs might be running similar algorithms.
There is just one problem: BNNs are physically incapable of running the backpropagation algorithm.
We do not know quite enough about biology to say it is impossible for BNNs to run the backpropagation algorithm. However, "a consensus has emerged that the brain cannot directly implement backprop, since to do so would require biologically implausible connection rules".
The backpropagation algorithm has three steps.
Flow information forward through a network to compute a prediction.
Compute an error by comparing the prediction to a target value.
Flow the error backward through the network to update the weights.
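The three steps above can be sketched as a tiny two-layer network in NumPy. This is a minimal illustration, not code from this post; the network shape, data, and learning rate are all arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4)) * 0.5   # input -> hidden weights
W2 = rng.normal(size=(4, 1)) * 0.1   # hidden -> output weights
x = np.array([[1.0, -0.5, 0.5]])     # one training example
target = np.array([[1.0]])           # its target value
lr = 0.1
losses = []

for _ in range(500):
    # Step 1: flow information forward to compute a prediction.
    h = np.tanh(x @ W1)
    pred = h @ W2
    # Step 2: compute an error by comparing the prediction to the target.
    err = pred - target
    losses.append(float(err ** 2))
    # Step 3: flow the error backward through the network to update weights.
    # The backward pass reuses the forward weights in transposed form (W2.T),
    # i.e. information must travel back along the same connections.
    dW2 = h.T @ err
    dh = (err @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ dh
    W2 -= lr * dW2
    W1 -= lr * dW1
```

Over the loop the squared error shrinks toward zero; the point to notice is that step 3 pushes the error backward through the very weights used in the forward pass, which is the kind of bidirectional flow discussed next.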
The backpropagation algorithm requires information to flow forward and backward along the network. But biological neurons are one-directional. An action potential goes from the cell body down the axon to the axon terminals to another cell's dendrites. An action potential never travels backward from a cell's terminals to its body.
Hebbian theory
Predictive coding is the idea that BNNs generate a mental model of their environment and then transmit only the information that deviates from this model. Predictive coding treats error and surprise as the same thing. Hebbian theory is a specific mathematical formulation of predictive coding.
Predictive coding is biologically plausible. It operates locally. There are no separate prediction and training phases which must be synchronized. Most importantly, it lets you train a neural network without sending axon potentials backwards.
Predictive coding is easier to implement in hardware. It is locally-defined; it parallelizes better than backpropagation; it continues to function when you cut its substrate in half. (Corpus callosotomy is used to treat epilepsy.) Digital computers break when you cut them in half. Predictive coding is something evolution could plausibly invent.
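As a concrete, heavily simplified illustration of such a local rule, here is a single predictive-coding layer learning to predict an observed value. Everything here is illustrative (one linear top-down prediction, one error node, a delta-rule-style update); it follows the generic Hebbian "activity times local error" form, not any particular paper's exact equations.

```python
import numpy as np

W = np.zeros((3, 1))                  # top-down prediction weights
x = np.array([[1.0, 0.5, -0.5]])      # activity at the layer above
target = np.array([[0.7]])            # observed value at the layer below
lr = 0.2

for _ in range(200):
    pred = x @ W          # top-down prediction of the layer below
    eps = target - pred   # local error node: the "surprise" at this layer
    # The weight update uses only presynaptic activity (x) and the error
    # computed right here (eps) -- nothing flows backward through axons.
    W += lr * (x.T @ eps)
```

The update is locally defined: each weight change depends only on quantities available at that connection, which is what makes this style of learning plausible in biological hardware.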
Unification
The paper Predictive Coding Approximates Backprop Along Arbitrary Computation Graphs "demonstrate[s] that predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules." The authors have unified predictive coding and backpropagation into a single theory of neural networks. Predictive coding and backpropagation are separate hardware implementations of what is ultimately the same algorithm.
There are two big implications of this.
This paper permanently fuses artificial intelligence and neuroscience into a single mathematical field.
This paper opens up possibilities for neuromorphic computing hardware.
All C++20 core language features with examples Apr 2, 2021
Introduction
The story behind this article is very simple: I wanted to learn about the new C++20 language features and to have a brief summary of all of them on a single page. So, I decided to read all the proposals and create this "cheat sheet" that explains and demonstrates each feature. This is not a "best practices" kind of article; it serves a purely demonstrational purpose. Most examples were inspired by or directly taken from the corresponding proposals; all credit goes to their authors and to the members of the ISO C++ committee for their work. Enjoy!
“Most mathematicians prove what they can, von Neumann proves what he wants”
It is supremely difficult to refute the claim that John von Neumann is likely the most intelligent person who has ever lived. By the time of his death in 1957, at the age of only 53, the Hungarian polymath had not only revolutionized several subfields of mathematics and physics but also made foundational contributions to pure economics and statistics, and played key parts in the invention of the atomic bomb, nuclear energy and digital computing.
Known now as "the last representative of the great mathematicians," von Neumann's genius was legendary even in his own lifetime. Stories and anecdotes about his brilliance, from Nobel Prize-winning physicists to world-class mathematicians, abound:
”You know, Herb, Johnny can do calculations in his head ten times as fast as I can. And I can do them ten times as fast as you can, so you can see how impressive Johnny is” — Enrico Fermi (Nobel Prize in Physics, 1938)
“One had the impression of a perfect instrument whose gears were machined to mesh accurately to a thousandth of an inch.” — Eugene Wigner (Nobel Prize in Physics, 1963)
“I have sometimes wondered whether a brain like von Neumann’s does not indicate a species superior to that of man” — Hans Bethe (Nobel Prize in Physics, 1967)
An émigré to America in 1933, von Neumann’s life was one famously dedicated to cognitive and creative pursuits, but also the enjoyments of life. Twice married and wealthy, he loved expensive clothes, hard liquor, fast cars and dirty jokes, according to his friend Stanislaw Ulam. Almost involuntarily, his posthumous biographer Norman Macrae recounts, people took a liking to von Neumann, even those who disagreed with his conservative politics (Regis, 1992).
This essay aims to highlight some of the unbelievable feats of “Johnny” von Neumann’s mind. Happy reading!
Early years (1903–1921)
Neumann János Lajos (John Louis Neumann in English) was born (or "arrived") on December 28th 1903 in Budapest, Hungary. Born to wealthy non-observant Jewish bankers, his upbringing can be described as privileged. His father held a doctorate in law, and he grew up in an 18-room apartment on the top floor above the Kann-Heller offices at 62 Bajcsy-Zsilinszky Street in Budapest (Macrae, 1992).
John von Neumann at age 7 (1910)
Child prodigy
"Johnny" von Neumann was a child prodigy. Even from a young age, there were strange stories of little John Louis' abilities: dividing two eight-digit numbers in his head and conversing in Ancient Greek at age six (Henderson, 2007), proficiency in calculus at age eight (Nasar, 1998), and reading Émile Borel's Théorie des Fonctions ("Theory of Functions") at age twelve (Leonard, 2010). Reportedly, von Neumann possessed an eidetic memory and so was able to recall complete novels and pages of the phone directory on command. This enabled him to accumulate an almost encyclopedic knowledge of whatever he read, such as the history of the Peloponnesian War, the trial of Joan of Arc, and Byzantine history (Leonard, 2010). A Princeton professor of the latter topic once stated that by the time he was in his thirties, Johnny had greater expertise in Byzantine history than he did (Blair, 1957).
Left: John von Neumann at age 11 (1915) with his cousin Katalin Alcsuti. (Photo: Nicholas Vonneumann). Right: The Neumann brothers Miklós (1911–2011), Mihály (1907–1989) and János Lajos (1903–1957)
"One of his remarkable abilities was his power of absolute recall. As far as I could tell, von Neumann was able on once reading a book or article to quote it back verbatim; moreover, he could do it years later without hesitation. He could also translate it at no diminution in speed from its original language into English. On one occasion I tested his ability by asking him to tell me how A Tale of Two Cities started. Whereupon, without any pause, he immediately began to recite the first chapter and continued until asked to stop after about ten or fifteen minutes."
- Excerpt, The Computer from Pascal to von Neumann by Herman Goldstine (1980)
An unconventional parent, von Neumann's father Max would reportedly bring his workaday banking decisions home to the family and ask his children how they would have reacted to particular investment possibilities and balance-sheet risks (Macrae, 1992). He was home-schooled until 1914, as was the custom in Hungary at the time. Starting at the age of 11, he was enrolled in the German-speaking Lutheran Gymnasium in Budapest. He would attend the high school until 1921, famously overlapping with the high school years of three other "Martians" of Hungary:
Leo Szilard (att. 1908–16 at Real Gymnasium), the physicist who conceived of the nuclear chain reaction and in late 1939 wrote the famous Einstein-Szilard letter to Franklin D. Roosevelt that resulted in the formation of the Manhattan Project, which built the first atomic bomb
Eugene Wigner (att. 1913–21 at Lutheran Gymnasium), the 1963 Nobel Prize laureate in Physics who worked on the Manhattan Project, including the theory of the atomic nucleus, elementary particles and Wigner's theorem in quantum mechanics
Edward Teller (att. 1918–26 at Minta School), the "father of the hydrogen bomb", an early member of the Manhattan Project and contributor to nuclear and molecular physics, spectroscopy and surface physics
Although all of similar ages and interests, as Macrae (1992) writes:
"The four Budapesters were as different as four men from similar backgrounds could be. They resembled one another only in the power of the intellects and in the nature of their professional careers. Wigner [...] is shy, painfully modest, quiet. Teller, after a lifetime of successful controversy, is emotional, extroverted and not one to hide his candle. Szilard was passionate, oblique, engagé, and infuriating. Johnny [...] was none of these. Johnny's most usual motivation was to try to make the next minute the most productive one for whatever intellectual business he had in mind."- Excerpt, John von Neumann by Norman Macrae (1992)
Yet still, the four would work together off and on as they all emigrated to America and got involved in the Manhattan Project.
By the time von Neumann enrolled in university in 1921, he had already written a paper with one of his tutors, Mihály Fekete, on "A generalization of Fejér's theorem on the location of the roots of a certain kind of polynomial" (Ulam, 1958). Fekete had, along with László Rátz, reportedly taken notice of von Neumann and begun tutoring him in university-level mathematics. According to Ulam, even at the age of 18, von Neumann was already recognized as a full-fledged mathematician. Of an early set theory paper written by a 16-year-old von Neumann, Abraham Fraenkel (of Zermelo-Fraenkel set theory fame) himself later stated (Ulam, 1958):
Letter from Abraham Fraenkel to Stanislaw Ulam
Around 1922-23, being then professor at Marburg University, I received from Professor Erhard Schmidt, Berlin [...] a long manuscript of an author unknown to me, Johann von Neumann, with the title Die Axiomatisierung der Mengenlehre, this being his eventual doctoral dissertation which appeared in the Zeitschrift only in 1928 [...] I was asked to express my view since it seemed incomprehensible. I don't maintain that I understood everything, but enough to see that this was an outstanding work, and to recognize ex ungue leonem [the claws of the lion]. While answering in this sense, I invited the young scholar to visit me in Marburg, and discussed things with him, strongly advising him to prepare the ground for the understanding of so technical an essay by a more informal essay which could stress the new access to the problem and its fundamental consequences. He wrote such an essay under the title Eine Axiomatisierung der Mengenlehre and I published it in 1925.
In University (1921–1926)
As Macrae (1992) writes, there was never much doubt that Johnny would one day attend university. Johnny's father, Max, initially wanted him to follow in his footsteps and become a well-paid financier, worrying about the financial stability of a career in mathematics. However, with encouragement from Hungarian mathematicians such as Lipót Fejér and Rudolf Ortvay, his father eventually acquiesced and let von Neumann pursue his passion, financing his studies abroad.
Johnny, apparently in agreement with his father, initially decided to pursue a career in chemical engineering. As he had no knowledge of chemistry, it was arranged that he would take a two-year non-degree course in chemistry at the University of Berlin. He did so from 1921 to 1923, afterwards sitting for and passing the entrance exam to the prestigious ETH Zurich. Still interested in pursuing mathematics, he simultaneously entered Pázmány Péter University (now Eötvös Loránd University) in Budapest as a Ph.D. candidate in mathematics. His Ph.D. thesis, officially written under the supervision of Fejér, concerned the axiomatization of Cantor's set theory. As he was officially in Berlin studying chemistry, he completed his Ph.D. largely in absentia, appearing at the university in Budapest only at the end of each term for exams. While in Berlin, he collaborated with Erhard Schmidt on set theory and also attended courses in physics, including statistical mechanics taught by Albert Einstein. At ETH, starting in 1923, he continued both his studies in chemistry and his research in mathematics.
“Evidently, a Ph.D. thesis and examinations did not constitute an appreciable effort” — Eugene Wigner
Two portraits of John von Neumann (1920s)
In mathematics, he first studied Hilbert's theory of consistency with the German mathematician Hermann Weyl. He eventually graduated both as a chemical engineer from ETH and with a Ph.D. in mathematics, summa cum laude, from the University of Budapest in 1926, at 22 years old.
“There was a seminar for advanced students in Zürich that I was teaching and von Neumann was in the class. I came to a certain theorem, and I said it is not proved and it may be difficult. von Neumann didn’t say anything but after five minutes he raised his hand. When I called on him he went to the blackboard and proceeded to write down the proof. After that I was afraid of von Neumann” — George Pólya
From von Neumann's fellowship application to the International Education Board (1926)
His application to the Rockefeller-financed International Education Board (above) for a six-month fellowship to continue his research at the University of Göttingen mentions Hungarian, German, English, French and Italian as spoken languages, and was accompanied by letters of recommendation from Richard Courant, Hermann Weyl and David Hilbert, three of the world's foremost mathematicians at the time (Leonard, 2010).
In Göttingen (1926–1930)
The Auditorium Maximum at the University of Göttingen, 1935
Johnny traveled to Göttingen in the fall of 1926 to continue his work in mathematics under David Hilbert, likely the world's foremost mathematician at that time. Reportedly, according to Leonard (2010), von Neumann was initially attracted to Hilbert's stance in the debate over so-called metamathematics, also known as formalism, and this is what drove him to study under Hilbert. In particular, in his fellowship application, he wrote of his wish to conduct (Leonard, 2010):
"Research over the bases of mathematics and of the general theory of sets, especially Hilbert's theory of uncontradictoriness [...], [investigations which] have the purpose of clearing up the nature of antinomies of the general theory of sets, and thereby to securely establish the classical foundations of mathematics. Such research render it possible to explain critically the doubts which have arisen in mathematics"
Very much in both the vein and the language of Hilbert, von Neumann was likely referring to the fundamental questions regarding the nature of infinite sets that Georg Cantor had posed starting in the 1880s. von Neumann, along with Wilhelm Ackermann and Paul Bernays, would eventually become one of Hilbert's key assistants in the elaboration of his Entscheidungsproblem ("decision problem"), initiated in 1918. By the time he arrived in Göttingen, von Neumann was already well acquainted with the topic, having, in addition to his Ph.D. dissertation, already published two related papers while at ETH.
Set theory
John von Neumann wrote a cluster of papers on set theory and logic while in his twenties:
von Neumann (1923). His first set theory paper is entitled Zur Einführung der transfiniten Zahlen ("On the introduction of transfinite numbers") and regards Cantor's 1897 definition of ordinal numbers as order types of well-ordered sets. In the paper, von Neumann introduces a new theory of ordinal numbers, which regards an ordinal as the set of the preceding ordinals (Van Heijenoort, 1970).

von Neumann (1925). His second set theory paper is entitled Eine Axiomatisierung der Mengenlehre ("An axiomatization of set theory"). It is the first paper to introduce what would later be known as von Neumann-Bernays-Gödel set theory (NBG), and includes the first introduction of the concept of a class, defined using the primitive notions of functions and arguments. In the paper, von Neumann takes a stance in the foundations-of-mathematics debate, objecting to Brouwer and Weyl's willingness to 'sacrifice much of mathematics and set theory', and to the logicists' 'attempts to build mathematics on the axiom of reducibility'. Instead, he argued for the axiomatic approach of Zermelo and Fraenkel, which, in von Neumann's view, replaced vagueness with rigor (Leonard, 2010).

von Neumann (1926). His third paper, Az általános halmazelmélet axiomatikus felépítése, was his doctoral dissertation, and contains the main points which would be published in German for the first time in his fifth paper.

von Neumann (1928). In his fourth set theory paper, entitled Die Axiomatisierung der Mengenlehre ("The Axiomatization of Set Theory"), von Neumann formally lays out his own axiomatic system. With its single page of axioms, it was the most succinct set of set theory axioms developed at the time, and formed the basis for the system later developed by Gödel and Bernays.

von Neumann (1928). His fifth paper on set theory, Über die Definition durch transfinite Induktion und verwandte Fragen der allgemeinen Mengenlehre ("On the Definition by Transfinite Induction and related questions of General Set Theory"), proves the possibility of definition by transfinite induction. That is, in the paper von Neumann demonstrates the significance of axioms for the elimination of the paradoxes of set theory, proving that a set does not lead to contradictions if and only if its cardinality is not the same as the cardinality of all sets, which implies the axiom of choice (Leonard, 2010).

von Neumann (1929). In his sixth set theory paper, Über eine Widerspruchsfreiheitsfrage in der axiomatischen Mengenlehre, von Neumann discusses questions of relative consistency in set theory (Van Heijenoort, 1970).
Summarized, von Neumann’s main contribution to set theory is what would become the von Neumann-Bernays-Gödel set theory (NBG), an axiomatic set theory that is considered a conservative extension of the accepted Zermelo-Fraenkel set theory (ZFC). It introduced the notion of class (a collection of sets defined by a formula whose quantifiers range only over sets) and can define classes that are larger than sets, such as the class of all sets and the class of all ordinal numbers.
Left: John von Neumann in the 1920s. Right: von Neumann, J (1923). Zur Einführung der transfiniten Zahlen ("On the introduction of transfinite numbers"). Acta Litterarum ac Scientiarum Regiae Universitatis Hungaricae Francisco-Josephinae, sectio scientiarum mathematicarum, 1, pp. 199–208.
Inspired by the works of Georg Cantor, Ernst Zermelo's 1908 axioms for set theory, and the 1922 critiques of Zermelo's set theory by Fraenkel and Skolem, von Neumann's work provided solutions to some of the problems of Zermelo set theory, leading to the eventual development of Zermelo-Fraenkel set theory (ZFC). The problems he helped resolve include:
The problem of developing Cantor's theory of ordinal numbers in Zermelo set theory. von Neumann redefined ordinals using sets that are well-ordered by the ∈-relation.
The problem of finding a criterion identifying classes that are too large to be sets. von Neumann introduced the criterion that a class is too large to be a set if and only if it can be mapped onto the class of all sets.
Zermelo's somewhat imprecise concept of a 'definite propositional function' in his axiom of separation. von Neumann formalized the concept with his functions, whose construction requires only finitely many axioms.
The problem of Zermelo's foundations, which built sets from the empty set and an infinite set by iterating the axioms of pairing, union, power set, separation and choice to generate new sets. Fraenkel had introduced an axiom to exclude such sets; von Neumann revised Fraenkel's formulation in his axiom of regularity to exclude non-well-founded sets.
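von Neumann's ordinal redefinition (an ordinal as the set of all preceding ordinals, well-ordered by membership) has a compact modern statement; the following notation is standard and added here only for illustration:

```latex
0 = \emptyset, \qquad 1 = \{0\}, \qquad 2 = \{0, 1\}, \qquad \ldots, \qquad
\alpha = \{\beta : \beta < \alpha\}
```

so each ordinal is literally the set of all smaller ordinals, the order relation $\beta < \alpha$ coincides with membership $\beta \in \alpha$, and the successor operation is simply $\alpha + 1 = \alpha \cup \{\alpha\}$.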
Of course, following the critiques and further revisions of Zermelo’s set theory by Fraenkel, Skolem, Hilbert and von Neumann, a young mathematician by the name of Kurt Gödel in 1930 announced a result (published the following year) which would effectively end von Neumann’s efforts in formalist set theory, and indeed Hilbert’s formalist program altogether: his incompleteness theorem. von Neumann happened to be in the audience when Gödel first presented it:
"At a mathematical conference preceding Hilbert's address, a quiet, obscure young man, Kurt Gödel, only a year beyond his PhD, announced a result which would forever change the foundations of mathematics. He formalized the liar paradox, "This statement is false", to prove roughly that for any effectively axiomatized consistent extension T of number theory (Peano arithmetic) there is a sentence s which asserts its own unprovability in T.

John von Neumann, who was in the audience, immediately understood the importance of Gödel's incompleteness theorem. He was at the conference representing Hilbert's proof theory program and recognized that Hilbert's program was over.

In the next few weeks von Neumann realized that by arithmetizing the proof of Gödel's first theorem, one could prove an even better one, that no such formal system T could prove its own consistency. A few weeks later he brought his proof to Gödel, who thanked him and informed him politely that he had already submitted the second incompleteness theorem for publication."

- Excerpt, Computability: Turing, Gödel, Church and Beyond by Copeland et al. (2015)
One of Gödel’s lifelong supporters, von Neumann later stated that
“Gödel is absolutely irreplaceable. In a class by himself.”
By the end of 1927, von Neumann had published twelve major papers in mathematics. His habilitation (qualification to conduct independent university teaching) was completed in December of 1927, and he began lecturing as a Privatdozent at the University of Berlin in 1928 at the age of 25, the youngest Privatdozent ever elected in the university’s history in any subject.
"By the middle of 1927 it was clearly desirable for the young eagle Johnny to soar from Hilbert's nest. Johnny had spent his undergraduate years explaining what Hilbert had got magnificently right but was now into his postgraduate years where he had to explain what Hilbert had got wrong."

- Excerpt, John von Neumann by Norman Macrae (1992)
Game theory

Around the same time he was making contributions to set theory, von Neumann also proved a theorem known as the minimax theorem for zero-sum games, which would later lay the foundation for the new field of game theory as a mathematical discipline. The minimax theorem may be summarized as follows:
The Minimax Theorem (von Neumann, 1928) The minimax theorem provides the conditions that guarantee that the max-min inequality is also an equality, i.e. that every finite, zero-sum, two-person game has optimal mixed strategies.
The proof was published in Zur Theorie der Gesellschaftsspiele (“On the Theory of Games of Strategy”) in 1928. In collaboration with economist Oskar Morgenstern, von Neumann later published the definitive book on such cooperative, zero-sum games, Theory of Games and Economic Behavior (1944).
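The theorem’s claim can be checked numerically for a concrete game. The sketch below (a brute-force grid search of our own devising, nothing like von Neumann’s proof technique) evaluates mixed strategies for the classic game of matching pennies, where the optimal mix for both players is 50/50 and the game value is 0:

```python
# Payoff matrix for the row player in matching pennies (a zero-sum game):
A = [[1, -1],
     [-1, 1]]

def row_value(p):
    # Guaranteed payoff when the row player mixes rows with probs (p, 1-p)
    # and the column player responds with the worst-for-us pure column.
    return min(p * A[0][j] + (1 - p) * A[1][j] for j in range(2))

def col_value(q):
    # Worst case for the column player mixing columns with probs (q, 1-q).
    return max(q * A[i][0] + (1 - q) * A[i][1] for i in range(2))

grid = [i / 1000 for i in range(1001)]
maxmin = max(row_value(p) for p in grid)  # best the row player can guarantee
minmax = min(col_value(q) for q in grid)  # best cap the column player can force

# The minimax theorem: with mixed strategies, the two coincide (here, at 0).
assert abs(maxmin - minmax) < 1e-9
```

For games with more than two strategies per player the same quantities are found by linear programming rather than grid search, but the equality maxmin = minmax is exactly what the theorem guarantees.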
Left: von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele (“On the Theory of Games of Strategy”). Right: First edition copy of Theory of Games and Economic Behavior (1944) by John von Neumann and Oskar Morgenstern (Photo: Whitmore Rare Books).

By the end of 1929, von Neumann’s number of published major papers had risen to 32, averaging almost one major paper per month. In 1929 he briefly became a Privatdozent at the University of Hamburg, where he found the prospects of becoming a professor to be better.
Quantum mechanics

In a shortlist von Neumann himself submitted to the National Academy of Sciences later in his life, he listed his work on quantum mechanics in Göttingen (1926) and Berlin (1927–29) as the “most essential”. The term quantum mechanics, largely devised by Göttingen’s own twenty-three-year-old wunderkind Werner Heisenberg the year before, was still hotly debated, and in the same year von Neumann arrived, Erwin Schrödinger, then working from Switzerland, had rejected Heisenberg’s formulation as completely wrong (Macrae, 1992). As the story goes:
"In Johnny's early weeks at Göttingen in 1926, Heisenberg lectured on the difference between his and Schrödinger's theories. The aging Hilbert, professor of mathematics, asked his physics assistant, Lothar Nordheim, what on earth this young man Heisenberg was talking about. Nordheim sent to the professor a paper that Hilbert still did not understand. To quote Nordheim himself, as recorded in Heims's book: "When von Neumann saw this, he cast it in a few days into elegant axiomatic form, much to the liking of Hilbert." To Hilbert's delight, Johnny's mathematical exposition made much use of Hilbert's own concept of Hilbert space."- Excerpt, John von Neumann by Norman Macrae (1992)
Starting with the incident above, in the following years, von Neumann published a set of papers which would establish a rigorous mathematical framework for quantum mechanics, now known as the Dirac-von Neumann axioms. As Van Hove (1958) writes,
"By the time von Neumann started his investigations on the formal framework of quantum mechanics this theory was known in two different mathematical formulations: the "matrix mechanics" of Heisenberg, Born and Jordan, and the "wave mechanics" of Schrödinger. The mathematical equivalence of these formulations had been established by Schrödinger, and they had both been embedded as special cases in a general formalism, often called "transformation theory", developed by Dirac and Jordan.This formalism, however, was rather clumsy and it was hampered by its reliance upon ill-defined mathematical objects, the famous delta-functions of Dirac and their derivatives. [..] [von Neumann] soon realized that a much more natural framework was provided by the abstract, axiomatic theory of Hilbert spaces and their linear operators."- Excerpt, Von Neumann's Contributions to Quantum Theory by Léon Van Hove (1958)
In the period from 1927–31, von Neumann published five highly influential papers relating to quantum mechanics:
von Neumann (1927). Mathematische Begründung der Quantenmechanik (“Mathematical Foundation of Quantum Mechanics”) in Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse pp. 1–57.von Neumann (1927). Wahrscheinlichkeitstheoretischer Aufbau der Quantenmechanik (“Probabilistic Theory of Quantum Mechanics”) in Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse pp. 245–272.von Neumann (1927). Thermodynamik quantenmechanischer Gesamtheiten (“Thermodynamics of Quantum Mechanical Quantities”) in Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse. pp. 273–291.von Neumann (1930). Allgemeine Eigenwerttheorie Hermitescher Funktionaloperatoren (“General Eigenvalue Theory of Hermitian Functional Operators”) in Mathematische Annalen 102 (1) pp 49–131.von Neumann (1931). Die Eindeutigkeit der Schrödingerschen Operatoren (“The uniqueness of Schrödinger operators”) in Mathematische Annalsen 104 pp 570–578.
His basic insight, which neither Heisenberg, Bohr nor Schrödinger had, was in the words of Paul Halmos “that the geometry of the vectors in a Hilbert space has the same formal properties as the structure of the states of a quantum mechanical system” (Macrae, 1992). That is, von Neumann realized that a state of a quantum system could be represented by a point in a complex Hilbert space that, in general, could be infinite-dimensional even for a single particle. In this formal view of quantum mechanics, observable quantities such as position or momentum are represented as linear operators acting on the Hilbert space associated with the quantum system (Macrae, 1992). The uncertainty principle, for instance, is translated in von Neumann’s system into the non-commutativity of the two corresponding operators.
Summarized, von Neumann’s contributions to quantum mechanics can be said to broadly be two-fold, consisting of:
1. The mathematical framework of quantum theory, where states of the physical system are described by Hilbert space vectors and measurable quantities (such as position, momentum and energy) by unbounded hermitian operators acting upon them; and
2. The statistical aspects of quantum theory. In the course of his formulation of quantum mechanics in terms of vectors and operators in Hilbert spaces, von Neumann also gave the basic rule for how the theory should be understood statistically (Van Hove, 1958). That is, as the result of the measurement of a given physical quantity on a system in a given quantum state, its probability distribution should be expressed by means of a vector representing the state and the spectral resolution of the operator representing the physical quantity.
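Both points can be made concrete in the smallest possible Hilbert space, C², using plain Python. This is a toy spin-1/2 sketch of our own, not anything from von Neumann’s papers: states are unit vectors, observables are Hermitian matrices, and predictions are expectation values.

```python
# A minimal sketch of the Hilbert-space picture for a spin-1/2 system,
# using plain 2x2 matrices as nested lists (no external libraries).

def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expectation(op, psi):
    """<psi| op |psi> for a state vector psi in C^2."""
    op_psi = [sum(op[i][k] * psi[k] for k in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * op_psi[i] for i in range(2))

# Observables are Hermitian operators on the Hilbert space
# (here, two of the Pauli matrices):
sx = [[0, 1], [1, 0]]
sz = [[1, 0], [0, -1]]

# A pure state is a unit vector in C^2:
s = 2 ** -0.5
psi = [s, s]

assert abs(expectation(sx, psi) - 1) < 1e-12  # psi is an eigenstate of sx
assert abs(expectation(sz, psi)) < 1e-12      # sz is maximally uncertain here

# The uncertainty principle appears as non-commutativity of the operators:
assert matmul(sx, sz) != matmul(sz, sx)
```

The same state thus has a definite value for one observable and a completely indefinite one for another, precisely because the two operators fail to commute.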
First edition copy of Mathematische Grundlagen der Quantenmechanik (1932) by John von Neumann

His work on quantum mechanics was eventually collected in the highly influential 1932 book Mathematische Grundlagen der Quantenmechanik (“Mathematical Foundations of Quantum Mechanics”), considered the first rigorous and complete mathematical formulation of quantum mechanics.
Quantum mechanics was very fortunate indeed to attract, in the very first years after its discovery in 1925, the interest of a mathematical genius of von Neumann’s stature. As a result, the mathematical framework of the theory was developed and the formal aspects of its entirely novel rules of interpretation were analysed by one single man in two years (1927–1929). — Van Hove (1958)
Operator theory

Following his work in set theory and quantum mechanics, while still in Berlin, von Neumann next turned his attention to algebra, in particular to operator theory, which concerns the study of linear operators on function spaces. The most familiar examples are the differential and integral operators we all remember from calculus. von Neumann introduced the study of rings of operators through his invention of what are now known as von Neumann algebras, defined as follows:
Definition of a von Neumann algebra A von Neumann algebra is a *-algebra of bounded operators on a Hilbert space that is closed in the weak operator topology and contains the identity operator.
The work was published in the paper Zur Algebra der Funktionaloperationen und Theorie der normalen Operatoren (“On the Algebra of Functional Operations and Theory of Normal Operators”) in 1930.
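The central result of that paper, now known as the double commutant (or bicommutant) theorem, gives a purely algebraic characterization of the topological closure condition in the definition above:

```latex
% Commutant of a set of operators M acting on a Hilbert space H:
M' = \{\, T \in B(H) \;:\; TS = ST \ \text{for all } S \in M \,\}

% Double commutant theorem (von Neumann, 1930): a unital *-subalgebra
% M of B(H) is closed in the weak operator topology, i.e. is a
% von Neumann algebra, if and only if it equals its own bicommutant:
M = M'' = (M')'
```

The surprise is that a topological property (weak closure) coincides with an algebraic one (being a bicommutant), which is why the theorem is usually taken as the starting point of the subject.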
In America

John von Neumann first travelled to America, while still a Privatdozent at the University of Hamburg, in October 1929, when he was invited to lecture on quantum theory at Princeton University. The visit led to an invitation to return as a visiting professor, which he did in the years 1930–33. The same year this tenure finished, Adolf Hitler came to power in Germany, leading von Neumann to abandon his academic posts in Europe altogether, stating about the Nazi regime that
“If these boys continue for two more years, they will ruin German science for a generation — at least”
By most accounts, of course, von Neumann’s prediction turned out true. The following year, when asked by the Nazi minister of education “How is mathematics going at Göttingen, now that it is free from the Jewish influence?”, Hilbert is said to have replied:
“There is no mathematics in Göttingen anymore.”
At Princeton University (1930–1933)

The circumstances under which von Neumann (and a plethora of other first-rate mathematicians and physicists) would find themselves in Princeton, New Jersey in the mid-1930s are by now well known.
In the case of von Neumann in particular, he was recruited alongside his Lutheran high school contemporary Eugene Wigner by Princeton University professor Oswald Veblen, following a recommendation, according to Wigner (Macrae, 1992), to:
"..invite not a single person but at least two, who already knew each other, who wouldn't suddenly feel put on an island where they had no intimate contact with anybody. Johnny's name was of course well known by that time the world over, so they decided to invite Johnny von Neumann. They looked: who wrote articles with John von Neumann? They found: Mr. Wigner. So they sent a telegram to me also."- Excerpt, John von Neumann by Norman Macrae (1992)
And so von Neumann first came to Princeton in 1930 as a visiting professor. Regarding his work while there, von Neumann himself later in life especially highlighted his work on ergodic theory.
Ergodic theory

Ergodic theory is the branch of mathematics that studies the statistical properties of deterministic dynamical systems. Formally, ergodic theory is concerned with the states of dynamical systems with an invariant measure. Informally, think of how the planets move according to Newtonian mechanics in a solar system: the planets move, but the rule governing their motion remains constant. In two papers published in 1932, von Neumann made foundational contributions to the theory of such systems, including von Neumann’s mean ergodic theorem, considered the first rigorous mathematical basis for the statistical mechanics of liquids and gases. The two papers are titled Proof of the Quasi-ergodic Hypothesis (1932) and Physical Applications of the Ergodic Hypothesis (1932).
A subfield of measure theory, ergodic theory in other words concerns the behavior of dynamical systems which are allowed to run for a long time. von Neumann’s ergodic theorem is one of the two most important theorems in the field, the other being Birkhoff’s (1931). According to Halmos (1958):
"The profound insight to be gained from [von Neumann's] paper is that the whole problem is essentially group-theoretic in character, and that, in particular, for the solvability of the problem of measure the ordinary algebraic concept of solvability of a group is relevant. Thus, according to von Neumann, it is the change of group that makes a difference, not the change of space; replacing the group of rigid motions by other perfectly reasonable groups we can produce unsolvable problems in R2 and solvable ones in R3."- Excerpt, Von Neumann on Measure and Ergodic Theory by Paul R. Halmos (1958)
“If von Neumann had never done anything else, they would have been sufficient to guarantee him mathematical immortality” — Paul Halmos (1958)
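The ergodic idea (the time average of an observable along a single orbit agreeing with its average over the whole space) can be illustrated numerically with an irrational rotation of the circle. This is a standard textbook example, not one drawn from von Neumann’s papers:

```python
import math

# Dynamical system: rotate the circle [0, 1) by an irrational angle.
# The rotation preserves Lebesgue measure and is ergodic, so the time
# average of an observable along an orbit converges to its space
# average, the integral of the observable over the circle.

alpha = math.sqrt(2) - 1                        # irrational rotation angle
f = lambda x: math.sin(2 * math.pi * x) ** 2    # an observable on the circle

n = 200_000
x, total = 0.0, 0.0
for _ in range(n):
    total += f(x)
    x = (x + alpha) % 1.0                       # one step of the dynamics

time_average = total / n
space_average = 0.5      # integral of sin^2(2*pi*x) over [0, 1)

assert abs(time_average - space_average) < 1e-3
```

Strictly speaking, von Neumann’s mean ergodic theorem asserts convergence in the L² sense and Birkhoff’s theorem the pointwise version illustrated here, but the moral is the same: orbit statistics recover the invariant measure.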
At the Institute for Advanced Study

Following his three-year stay as a visiting professor at Princeton in the period 1930–33, von Neumann was offered a lifetime professorship on the faculty of the Institute for Advanced Study (IAS) in 1933. He was 30 years old. The offer came after the institute’s plan to appoint von Neumann’s former professor Hermann Weyl fell through (Macrae, 1992). The institute having been founded only three years prior, von Neumann became one of the IAS’ first six professors, the others being J. W. Alexander, Albert Einstein, Marston Morse, Oswald Veblen and, eventually, Hermann Weyl.
Institute for Advanced Study in Princeton, New Jersey (Photo: Cliff Compton)

When he joined in 1933, the Institute was still located in the math department of Princeton University’s Fine Hall. Founded in 1930 by Abraham Flexner and funded by philanthropy money from Louis Bamberger and Caroline Bamberger Fuld, the Institute for Advanced Study was and is still a university unlike any other. Inspired by Flexner’s experiences at Heidelberg University, All Souls College, Oxford and the Collège de France, the IAS has been described as
“ A first-rate research institution with no teachers, no students, no classes, but only researchers protected from the vicissitudes and pressures of the outside world.” — Sylvia Nasar (1998)
The Institute for Advanced Study, which in 1939 moved to its own campus centered on Fuld Hall, in a matter of a few years in the early 1930s effectively inherited the University of Göttingen’s throne as the foremost center of the mathematical universe. The dramatic and swift change has since become known as the “Great Purge” of 1933, as a number of top-rate academics fled Europe, fearing for their safety. Among them, in addition to von Neumann and Wigner, were of course Einstein (1933), Max Born (1933), fellow Budapest natives Leó Szilárd (1938) and Edward Teller (1933), as well as Edmund Landau (1927), James Franck (1933) and Richard Courant (1933), among others.
Left: Photo of part of the faculty at the Institute for Advanced Study, including its most famous resident Albert Einstein, and John von Neumann, visible in the background. Right: Julian Bigelow, Herman Goldstine, J. Robert Oppenheimer and John von Neumann in front of MANIAC, the Institute for Advanced Study computer.

Geometry

While at the Institute for Advanced Study, von Neumann founded the field of continuous geometry, an analogue of complex projective geometry where, instead of the dimension of a subspace lying in the discrete set 0, 1, …, n, it can be any element of the unit interval [0,1].
A continuous geometry is a lattice L with the following properties:
- L is modular
- L is complete
- The lattice operations satisfy a continuity property
- Every element in L has a complement
- L is irreducible, meaning the only elements with unique complements are 0 and 1
As with his result in ergodic theory, von Neumann published two papers on continuous geometry, one proving its existence and discussing its properties, and one providing examples:
von Neumann (1936). Continuous geometry. Proceedings of the National Academy of Sciences 22 (2) pp. 92–100.
von Neumann (1936). Examples of continuous geometries. Proceedings of the National Academy of Sciences 22 (2) pp. 101–108.
The Manhattan Project (1937–1945)

In addition to his academic pursuits, beginning in the mid-to-late 1930s, von Neumann developed an expertise in the science of explosions, phenomena which are very hard to model mathematically. In particular, von Neumann became a leading authority on the mathematics of shaped charges, explosive charges shaped to focus the effect of the explosive’s energy.
By 1937, according to Macrae, von Neumann had decided for himself that war was clearly coming. Although obviously suited for advanced strategic and operations work, he instead humbly applied to become a lieutenant in the reserve of the ordnance department of the U.S. Army. As a member of the Officers’ Reserve Corps, he could get trouble-free access to various sorts of explosion statistics, which he thought would be fascinating (Macrae, 1992).
Left: The photo from von Neumann’s Los Alamos ID badge. Right: John von Neumann talking with Richard Feynman and Stanislaw Ulam in Los Alamos

Needless to say, von Neumann’s main contributions to the atomic bomb would not be as a lieutenant in the reserve of the ordnance department, but rather in the concept and design of the explosive lenses that were needed to compress the plutonium core of the Fat Man weapon that was later dropped on Nagasaki.
A member of the Manhattan Project in Los Alamos, New Mexico, von Neumann in 1944 showed that the pressure increase from an explosion’s shock wave reflecting off solid objects was greater than previously believed, depending on its angle of incidence. The discovery led to the decision to detonate atomic bombs some kilometers above the target, rather than on impact (Macrae, 1992). von Neumann was present at the Trinity test on July 16th, 1945 in the New Mexico desert, the first successful detonation of an atomic bomb.
Work on philosophy
von Neumann speaking at the American Philosophical Society in 1957. Photo: Alfred Eisenstaedt

Macrae (1992) makes the point that in addition to being one of the foremost mathematicians of his lifetime, in many ways, von Neumann should perhaps also be considered one of his era’s most important philosophers. Professor of philosophy John Dorling at the University of Amsterdam highlights in particular von Neumann’s contributions to the philosophy of mathematics (including set theory, number theory and Hilbert spaces), physics (especially quantum theory), economics (game theory), biology (cellular automata), computers and artificial intelligence.
His work on the latter two, computers and artificial intelligence (AI), began while he was in Princeton in the mid-1930s, where he first met the 24-year-old Alan Turing when the latter spent a year at the IAS in 1936–37. Turing began his career by working in the same fields as von Neumann had: set theory, logic and Hilbert’s Entscheidungsproblem. By the time he finished his Ph.D. at Princeton in 1938, Turing had extended the work of von Neumann and Gödel and introduced ordinal logic and the notion of relative computing, augmenting his previously devised Turing machines with so-called oracle machines, allowing the study of problems that lay beyond the capabilities of Turing machines. Although von Neumann inquired about retaining him as a postdoctoral research assistant following his Ph.D., Turing declined and instead travelled back to England (Macrae, 1992).
Work on computing
"After having been here for a month, I was talking to von Neumann about various kinds of inductive processes and evolutionary processes, and just as an aside he said, "Of course that's what Turing was talking about." And I said, "Who's Turing?". And he said, "Go look up Proceedings of the London Mathematical Society, 1937".

The fact that there is a universal machine to imitate all other machines ... was understood by von Neumann and few other people. And when he understood it, then he knew what we could do." - Julian Bigelow

- Excerpt, Turing's Cathedral by George Dyson (2012)
Although Turing left, von Neumann continued thinking about computers through the end of the 1930s and the war. Following his experiences working on the Manhattan Project, he was drawn into the ENIAC project at the Moore School of Engineering at the University of Pennsylvania during the summer of 1944. Having observed the large amounts of calculation needed to predict blast radii, plan bomb paths and break encryption schemes, von Neumann saw early on the need for substantial increases in computing power.
In 1945, von Neumann proposed a description for a computer architecture now known as the von Neumann architecture, which includes the basics of a modern electronic digital computer including:
- A processing unit that contains an arithmetic logic unit and processor registers;
- A control unit that contains an instruction register and a program counter;
- A memory unit that can store data and instructions;
- External storage; and
- Input and output mechanisms.
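The defining feature of the architecture, a single memory holding both instructions and data, stepped through by a program counter, can be sketched in a few lines of Python. The instruction set below is invented here purely for illustration, not taken from any historical machine:

```python
# A toy stored-program machine: instructions and data share one memory,
# a program counter (pc) steps through it, and a single accumulator (acc)
# stands in for the processor registers.

def run(memory):
    acc, pc = 0, 0
    while True:
        op, arg = memory[pc], memory[pc + 1]    # fetch
        pc += 2
        if op == "LOAD":                        # decode + execute
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory

# Program and data sit side by side in the same memory:
memory = ["LOAD", 8, "ADD", 9, "STORE", 10, "HALT", 0,
          2, 3, 0]    # cells 8 and 9 hold data; cell 10 receives the result

assert run(memory)[10] == 5
```

Because the program is itself data in memory, a program can in principle read or rewrite its own instructions, the property that made stored-program machines so much more flexible than their hard-wired predecessors.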
John von Neumann with the IAS machine, sometimes called the “von Neumann machine”, stored in the basement of Fuld Hall from 1942–1951 (Photo: Alan Richards)

The same year, in software engineering, von Neumann invented the so-called merge sort algorithm, which divides arrays in half before sorting them recursively and then merging them. von Neumann himself wrote the first 23-page sorting program for the EDVAC computer in ink. In addition, in a pioneering 1953 paper entitled Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, von Neumann was the first to introduce stochastic computing, though the idea was so groundbreaking that it could not be implemented for another decade or so (Petrovik & Siljak, 1962). Relatedly, von Neumann created the field of cellular automata through his rigorous mathematical treatment of the structure of self-replication, which preceded the discovery of the structure of DNA by several years.
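The merge sort idea mentioned above fits in a dozen lines of modern Python, a minimal sketch far removed from the hand-written EDVAC original:

```python
def merge_sort(a):
    """Sort a list by splitting it in half, sorting each half
    recursively, and merging the two sorted halves."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge: repeatedly take the smaller of the two head elements.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

assert merge_sort([5, 1, 4, 2, 3]) == [1, 2, 3, 4, 5]
```

Each level of recursion does linear work merging, and there are logarithmically many levels, giving the O(n log n) running time that made the algorithm attractive for early machines.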
Although influential in his own right, throughout his life von Neumann made sure to acknowledge that the central concept of the modern computer was due to Turing’s 1936 paper On Computable Numbers, with an Application to the Entscheidungsproblem (Frankel, 1972).
”von Neumann firmly emphasised to me, and to others I am sure, that the fundamental conception is owing to Turing — insofar as not anticipated by Babbage, Lovelace and others.” — Stanley Frankel (1972)
"The only part of your thinking we'd like to bid for systematically is that which you spend shaving: we'd like you to pass on to us any ideas that come to you while so engaged."

- Excerpt, letter from the head of the RAND Corporation to von Neumann (Poundstone, 1992)
Throughout his career in America, von Neumann held a number of consultancies for various private, public and defense contractors, including the National Defense Research Council (NDRC), the Weapons Systems Evaluation Group (WSEG), the Central Intelligence Agency (CIA), the Lawrence Livermore National Laboratory (LLNL) and the RAND Corporation, in addition to being an advisor to the Armed Forces Special Weapons Project, a member of the General Advisory Committee of the Atomic Energy Commission, a member of the Scientific Advisory Group of the United States Air Force and, in 1955, a commissioner of the Atomic Energy Commission (AEC).
Personality

Despite his many appointments, responsibilities and copious research output, von Neumann lived a rather unusual lifestyle for a mathematician. As described by Vonneuman and Halmos:
“Parties and nightlife held a special appeal for von Neumann. While teaching in Germany, von Neumann had been a denizen of the Cabaret-era Berlin nightlife circuit.” — Vonneuman (1987)
“The parties at the von Neumann’s house were frequent, and famous, and long.” — Halmos (1958)
John von Neumann with his wife Klari Dan and their dog (Photo: Alan Richards)
His first wife, Klara, said that he could count everything except calories.
von Neumann also enjoyed Yiddish and dirty jokes, especially limericks (Halmos, 1958). Though a non-smoker himself, at the IAS he received complaints for regularly playing extremely loud German march music on the gramophone in his office, distracting those in neighboring offices, including Albert Einstein. Indeed, von Neumann claimed to do some of his best work in noisy, chaotic environments, such as in the living room of his house with the television blaring. Despite being a bad driver, he loved driving, often while reading books, which led to various arrests and accidents.
Von Neumann in the Florida Everglades in 1938 (Photo: Marina von Neumann Whitman)

As a thinker

Stanislaw Ulam, one of von Neumann’s close friends, described von Neumann’s mastery of mathematics as follows:
“Most mathematicians know one method. For example, Norbert Wiener had mastered Fourier transforms. Some mathematicians have mastered two methods and might really impress someone who knows only one of them. John von Neumann had mastered three methods: 1) A facility for the symbolic manipulation of linear operators, 2) An intuitive feeling for the logical structure of any new mathematical theory; and 3) An intuitive feeling for the combinatorial superstructure of new theories.”
Biographer Sylvia Nasar illustrates von Neumann’s own “thinking machine” with the following, now well-known anecdote regarding the so-called “two trains puzzle”:
Two bicyclists start twenty miles apart and head toward each other, each going at a steady rate of 10 m.p.h. At the same time, a fly that travels at a steady 15 m.p.h. starts from the front wheel of the southbound bicycle and flies to the front wheel of the northbound one, then turns around and flies to the front wheel of the southbound one again, and continues in this manner till he is crushed between the two front wheels. Question: what total distance did the fly cover?

There are two ways to answer the problem. One is to calculate the distance the fly covers on each leg of its trips between the two bicycles and finally sum the infinite series so obtained. The quick way is to observe that the bicycles meet exactly an hour after they start so that the fly had just an hour for his travels; the answer must therefore be 15 miles. When the question was put to von Neumann, he solved it in an instant, and thereby disappointed the questioner: “Oh, you must have heard the trick before!” “What trick,” asked von Neumann, “all I did was sum the infinite series.”

- Excerpt, A Beautiful Mind (Nasar, 1998)
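Both of Nasar’s solution routes take only a few lines to verify. On each leg, the fly (15 m.p.h.) and the oncoming bicycle (10 m.p.h.) close their gap at 25 m.p.h., and each completed leg shrinks the gap between the bicycles to a fifth of its previous value, so the legs form a geometric series:

```python
# Sum the infinite series leg by leg, the way von Neumann claims he did.
gap, fly_distance = 20.0, 0.0   # miles between the bicycles; fly's total
for _ in range(50):             # 50 legs of the rapidly shrinking series
    leg_time = gap / (15 + 10)  # hours until the fly meets the oncoming bike
    fly_distance += 15 * leg_time
    gap -= 20 * leg_time        # bicycles close at 10 + 10 m.p.h.

# The shortcut: the bicycles meet after exactly 1 hour, so the fly,
# travelling at 15 m.p.h. for that hour, covers 15 miles.
assert abs(fly_distance - 15.0) < 1e-9
```

After 50 legs the remaining gap is astronomically small, so the partial sum is already indistinguishable from the closed-form answer of 15 miles.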
As a supervisor
In the paper Szeged in 1934 (Lorch, 1993), Edgar R. Lorch describes his experience of working as an assistant for von Neumann in the 1930s, including his duties:
- Attending von Neumann’s lectures on operator theory, taking notes, completing unfinished proofs and circulating them to all American university libraries;
- Assisting von Neumann in his role as editor of the Annals of Mathematics by reading through every manuscript accepted to the publication, underlining Greek letters in red and German letters in green, circling italics, writing notes to printers in the margins and going once per week to the printers in order to instruct them in the art of typesetting;
- Translating von Neumann’s numerous 100-page papers into English.
"His fluid line of thought was difficult for those less gifted to follow. He was notorious for dashing out equations on a small portion of the available blackboard and erasing expressions before students could copy them."- Excerpt, John von Neumann: As Seen by his Brother by N.A. Vonneuman (1987)
President Dwight D. Eisenhower (left) presenting John von Neumann (right) the Presidential Medal of Freedom in 1956

In 1955, von Neumann was diagnosed with what was likely either bone, pancreatic or prostate cancer (accounts differ on which diagnosis was made first). He was 51 years old. Following two years of illness, which at the end confined him to a wheelchair, he died on the 8th of February 1957, at 53 years old. On his deathbed, he reportedly entertained his brother by reciting the first few lines of each page of Goethe’s Faust, word for word, by heart (Blair, 1957).
He is buried at Princeton Cemetery in Princeton, New Jersey alongside his lifelong friends Eugene Wigner and Kurt Gödel. Gödel wrote him a letter a year before his death, which has been made public. The letter is discussed in detail by Hartmanis (1989) in his working paper The Structural Complexity Column. An excerpt is included below:
Letter from Kurt Gödel to von Neumann, March 20th 1956

Dear Mr. von Neumann:

With the greatest sorrow I have learned of your illness. The news came to me as quite unexpected. Morgenstern already last summer told me of a bout of weakness you once had, but at that time he thought that this was not of any greater significance. As I hear, in the last months you have undergone a radical treatment and I am happy that this treatment was successful as desired, and that you are now doing better. I hope and wish for you that your condition will soon improve even more and that the newest medical discoveries, if possible, will lead to a complete recovery.

[...]

I would be very happy to hear something from you personally. Please let me know if there is something that I can do for you. With my best greetings and wishes, as well to your wife,

Sincerely yours, Kurt Gödel

P.S. I heartily congratulate you on the award that the American government has given to you.
Interview on Television

Remarkably, there exists a video interview with von Neumann, from the NBC show America’s Youth Wants to Know in the early 1950s.
For anyone interested in learning more about the life and work of John von Neumann, I especially recommend his friend Stanislaw Ulam’s 1958 essay John von Neumann 1903–1957 in the Bulletin of the American Mathematical Society 64 (3) pp. 1–49 and the book John von Neumann by Norman Macrae (1992).
This essay is part of a series of stories on math-related topics, published in Cantor’s Paradise, a weekly Medium publication. Thank you for reading!
If the new calculation is correct, then physicists may have spent 20 years chasing a ghost. But the Theory Initiative’s prediction relied on a different calculational approach that has been honed over decades, and it could well be right. In that case, Fermilab’s new measurement constitutes the most exciting result in particle physics in years.
“This is a very sensitive and interesting situation,” said Zoltan Fodor, a theoretical particle physicist at Pennsylvania State University who is part of the BMW team.
BMW’s calculation itself is not breaking news; the paper first appeared as a preprint last year. Aida El-Khadra, a particle theorist at the University of Illinois who co-organized the Theory Initiative, explained that the BMW calculation should be taken seriously, but that it wasn’t factored into the Theory Initiative’s overall prediction because it still needed vetting. If other groups independently verify BMW’s calculation, the Theory Initiative will integrate it into its next assessment.
Dominik Stöckinger, a theorist at the Technical University of Dresden who participated in the Theory Initiative and is a member of the Fermilab Muon g-2 team, said the BMW result creates “an unclear status.” Physicists can’t say whether exotic new particles are pushing on muons until they agree about the effects of the 17 Standard Model particles they already know about.
Regardless, there’s plenty of reason for optimism: Researchers emphasize that even if BMW is right, the puzzling gulf between the two calculations could itself point to new physics. But for the moment, the past 20 years of conflict between theory and experiment appear to have been replaced by something even more unexpected: a battle of theory versus theory.
Momentous Muons

The reason physicists have eagerly awaited Fermilab’s new measurement is that the muon’s magnetic moment — essentially the strength of its intrinsic magnetism — encodes a huge amount of information about the universe.
A century ago, physicists assumed that the magnetic moments of elementary particles would follow the same formula as larger objects. Instead they found that electrons rotate in magnetic fields twice as much as expected. Their “gyromagnetic ratio,” or “g-factor” — the number relating their magnetic moment to their other properties — seemed to be 2, not 1, a surprise discovery later explained by the fact that electrons are “spin-1/2” particles, which return to the same state after making two full turns rather than one.
For years, both electrons and muons were thought to have g-factors of exactly 2. But then in 1947, Polykarp Kusch and Henry Foley measured the electron’s g-factor to be 2.00232. The theoretical physicist Julian Schwinger almost immediately explained the extra bits: He showed that the small corrections come from an electron’s tendency to momentarily emit and reabsorb a photon as it moves through space.
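Schwinger's leading-order correction can be checked in a couple of lines. As a quick back-of-the-envelope illustration (the fine-structure constant value below is the standard modern figure, not taken from the article), the single-photon term α/2π reproduces the measured 2.00232:

```python
import math

# Schwinger's leading QED correction: g = 2 * (1 + alpha / (2*pi))
alpha = 1 / 137.035999  # fine-structure constant
g = 2 * (1 + alpha / (2 * math.pi))

print(f"{g:.5f}")  # matches Kusch and Foley's 2.00232 to five decimal places
```

Higher-order fluctuations (two photons, virtual electron-positron pairs, hadrons and so on) contribute the further decimal places that the article goes on to describe.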
Many other fleeting quantum fluctuations happen as well. An electron or muon might emit and reabsorb two photons, or a photon that briefly becomes an electron and a positron, among countless other possibilities that the Standard Model allows. These temporary manifestations travel around with an electron or muon like an entourage, and all of them contribute to its magnetic properties. “The particle you thought was a bare muon is actually a muon plus a cloud of other things that appear spontaneously,” said Chris Polly, another leader of the Fermilab Muon g-2 experiment. “They change the magnetic moment.”
The rarer a quantum fluctuation, the less it contributes to the electron or muon’s g-factor. “As you go further into the decimal places you can see where suddenly the quarks start to appear for the first time,” said Polly. Further along are particles called W and Z bosons, and so on. Because muons are 207 times heavier than electrons, they’re about 207² (or 43,000) times more likely to conjure up heavy particles in their entourage; these particles therefore alter the muon’s g-factor far more than an electron’s. “So if you’re looking for particles that could explain the missing mass of the universe — dark matter — or you’re looking for particles of a theory called supersymmetry,” Polly said, “that’s where the muon has a unique role.”
For decades, theorists have strived to calculate contributions to the muon’s g-factor coming from increasingly unlikely iterations of known particles from the Standard Model, while experimentalists measured the g-factor with ever-increasing precision. If the measurement were to outstrip the expectation, this would betray the presence of strangers in the muon’s entourage: fleeting appearances of particles beyond the Standard Model.
Muon magnetic moment measurements began at Columbia University in the 1950s and were picked up a decade later at CERN, Europe’s particle physics laboratory. There, researchers pioneered the measurement technique still used at Fermilab today.
High-speed muons are shot into a magnetized ring. As a muon whips around the ring, passing through its powerful magnetic field, the particle’s spin axis (which can be pictured as a little arrow) gradually rotates. Millionths of a second later, typically after speeding around the ring a few hundred times, the muon decays, producing an electron that flies into one of the surrounding detectors. The varying energies of electrons emanating from the ring at different times reveal how quickly the muon spins are rotating.
In the 1990s, a team at Brookhaven National Laboratory on Long Island built a 50-foot-wide ring to fling muons around and began collecting data. In 2001, the researchers announced their first results, reporting 2.0023318404 for the muon’s g-factor, with some uncertainty in the final two digits. Meanwhile, the most comprehensive Standard Model prediction at the time gave the significantly lower value of 2.0023318319.
It instantly became the world’s most famous eighth-decimal-place discrepancy.
“Hundreds of newspapers covered it,” said Polly, who was a graduate student with the experiment at the time.
Brookhaven’s measurement overshot the prediction by nearly three times its supposed margin of error, known as a three-sigma deviation. A three-sigma gap is significant, unlikely to be caused by random noise or an unlucky accumulation of small errors. It strongly suggested that something was missing from the theoretical calculation, something like a dark matter particle or an extra force-carrying boson.
But unlikely sequences of events sometimes happen, so physicists require a five-sigma deviation between a prediction and a measurement to definitively claim a discovery.
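These sigma thresholds map directly onto tail probabilities of a normal distribution. A minimal sketch of that conversion (a one-sided Gaussian tail, which is a simplification of the collaborations' full statistical treatment):

```python
import math

def sigma_to_p(sigma):
    """One-sided tail probability of a Gaussian fluctuation of `sigma` standard deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

# 3 sigma: roughly 1-in-740 odds of a chance fluctuation; 5 sigma: roughly 1-in-3.5-million
for s in (3.0, 4.2, 5.0):
    print(f"{s} sigma -> p = {sigma_to_p(s):.1e}")
```

This is why a three-sigma gap is taken seriously but only five sigma counts as a discovery: the chance of random noise producing the deviation falls by orders of magnitude between the two thresholds.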
Trouble With Hadrons

A year after Brookhaven’s headline-making measurement, theorists spotted a mistake in the prediction. A formula representing one group of the tens of thousands of quantum fluctuations that muons can engage in contained a rogue minus sign; fixing it in the calculation reduced the difference between theory and experiment to just two sigma. That’s nothing to get excited about.
But as the Brookhaven team accrued 10 times more data, their measurement of the muon’s g-factor stayed the same while the error bars around the measurement shrank. The discrepancy with theory grew back to three sigma by the time of the experiment’s final report in 2006. And it continued to grow, as theorists kept honing the Standard Model prediction for the g-factor without seeing the value drift upward toward the measurement.
The Brookhaven anomaly loomed ever larger in physicists’ psyches as other searches for new particles failed. Throughout the 2010s, the $20 billion Large Hadron Collider in Europe slammed protons together in hopes of conjuring up dozens of new particles that might complete the pattern of nature’s building blocks. But the collider found only the Higgs boson — the last missing piece of the Standard Model. Meanwhile, a slew of experimental searches for dark matter found nothing. Hopes for new physics increasingly rode on wobbly muons. “I don’t know if it is the last great hope for new physics, but it certainly is a major one,” Matthew Buckley, a particle physicist at Rutgers University, told me.
The original Muon g-2 experiment was constructed at Brookhaven National Laboratory on Long Island in the 1990s. Rather than build a new experiment from scratch, physicists used a series of barges and trucks to move the 700-ton electromagnetic ring down the Atlantic coast, across the Gulf of Mexico, and up the Mississippi, Illinois and Des Plaines rivers to the Fermi National Laboratory in Illinois. Thousands of people came out to celebrate its arrival in July 2013.
Everyone knew that in order to cross the threshold of discovery, they would need to measure the muon’s gyromagnetic ratio again, and more precisely. So plans for a follow-up experiment got underway. In 2013, the giant magnet used at Brookhaven was loaded onto a barge off Long Island and shipped down the Atlantic Coast and up the Mississippi and Illinois rivers to Fermilab, where the lab’s powerful muon beam would let data accrue much faster than before. That and other improvements would allow the Fermilab team to measure the muon’s g-factor four times more accurately than Brookhaven had.
In 2016, El-Khadra and others started organizing the Theory Initiative, seeking to iron out any disagreements and arrive at a consensus Standard Model prediction of the g-factor before the Fermilab data rolled in. “For the impact of such an exquisite experimental measurement to be maximized, theory needs to get its act together, basically,” she said, explaining the reasoning at the time. The theorists compared and combined calculations of different quantum bits and pieces that contribute to the muon’s g-factor and arrived at an overall prediction last summer of 2.0023318362. That fell a hearty 3.7 sigma below Brookhaven’s final measurement of 2.0023318416.
But the Theory Initiative’s report was not the final word.
Uncertainty about what the Standard Model predicts for the muon’s magnetic moment stems entirely from the presence in its entourage of “hadrons”: particles made of quarks. Quarks feel the strong force (one of the three forces of the Standard Model), which is so strong it’s as if quarks are swimming in glue, and that glue is endlessly dense with other particles. The equation describing the strong force (and thus, ultimately, the behavior of hadrons) can’t be exactly solved.
That makes it hard to gauge how often hadrons pop up in the muon’s midst. The dominant scenario is the following: The muon, as it travels along, momentarily emits a photon, which morphs into a hadron and an antihadron; the hadron-antihadron pair quickly annihilate back into a photon, which the muon then reabsorbs. This process, called hadronic vacuum polarization, contributes a small correction to the muon’s gyromagnetic ratio starting in the seventh decimal place. Calculating this correction involves solving a complicated mathematical sum for each hadron-antihadron pair that can arise.
Uncertainty about this hadronic vacuum polarization term is the primary source of overall uncertainty about the g-factor. A small increase in this term can completely erase the difference between theory and experiment. Physicists have two ways to calculate it.
With the first method, researchers don’t even try to calculate the hadrons’ behavior. Instead, they simply translate data from other particle collision experiments into an expectation for the hadronic vacuum polarization term. “The data-driven approach has been refined and optimized over decades, and several competing groups using different details in their approaches have confirmed each other,” said Stöckinger. The Theory Initiative used this data-driven approach.
But in recent years, a purely computational method has been steadily improving. In this approach, researchers use supercomputers to solve the equations of the strong force at discrete points on a lattice instead of everywhere in space, turning the infinitely detailed problem into a finite one. This way of coarse-graining the quark quagmire to predict the behavior of hadrons “is similar to a weather forecast or meteorology,” Fodor explained. The calculation can be made ultra-precise by putting lattice points very close together, but this also pushes computers to their limits.
The 14-person BMW team — named after Budapest, Marseille and Wuppertal, the three European cities where most team members were originally based — used this approach. They made four chief innovations. First they reduced random noise. They also devised a way of very precisely determining scale in their lattice. At the same time, they more than doubled their lattice’s size compared to earlier efforts, so that they could study hadrons’ behavior near the center of the lattice without worrying about edge effects. Finally, they included in the calculation a family of complicating details that are often neglected, like mass differences between types of quarks. “All four [changes] needed a lot of computing power,” said Fodor.
The researchers then commandeered supercomputers in Jülich, Munich, Stuttgart, Orsay, Rome, Wuppertal and Budapest and put them to work on a new and better calculation. After several hundred million core hours of crunching, the supercomputers spat out a value for the hadronic vacuum polarization term. Their total, when combined with all other quantum contributions to the muon’s g-factor, yielded 2.00233183908. This is “in fairly good agreement” with the Brookhaven experiment, Fodor said. “We cross-checked it a million times because we were very much surprised.” In February 2020, they posted their work on the arxiv.org preprint server.
The Theory Initiative decided not to include BMW’s value in their official estimate for a few reasons. The data-driven approach has a slightly smaller error bar, and three different research groups independently calculated the same thing. In contrast, BMW’s lattice calculation was unpublished as of last summer. And although the result agrees well with earlier, less precise lattice calculations that also came out high, it hasn’t been independently replicated by another group to the same precision.
The Theory Initiative’s decision meant that the official theoretical value of the muon’s magnetic moment had a 3.7-sigma difference with Brookhaven’s experimental measurement. It set the stage for what has become the most anticipated reveal in particle physics since the Higgs boson in 2012.
The Revelations

A month ago, the Fermilab Muon g-2 team announced that they would present their first results today. Particle physicists were ecstatic. Laura Baudis, a physicist at the University of Zurich, said she was “counting the days until April 7,” after anticipating the result for 20 years. “If the Brookhaven results are confirmed by the new experiment at Fermilab,” she said, “this would be an enormous achievement.”
And if not — if the anomaly were to disappear — some in the particle physics community feared nothing less than “the end of particle physics,” said Stöckinger. The Fermilab g-2 experiment is “our last hope of an experiment which really proves the existence of physics beyond the Standard Model,” he said. If it failed to do so, many researchers might feel that “we now give up and we have to do something else instead of researching physics beyond the Standard Model.” He added, “Honestly speaking, it might be my own reaction.”
The 200-person Fermilab team revealed the result to themselves only six weeks ago in an unveiling ceremony over Zoom. Tammy Walton, a scientist on the team, rushed home to catch the show after working the night shift on the experiment, which is currently in its fourth run. (The new analysis covers data from the first run, which makes up 6% of what the experiment will eventually accrue.) When the all-important number appeared on the screen, plotted along with the Theory Initiative’s prediction and the Brookhaven measurement, Walton was thrilled to see it land higher than the former and pretty much smack dab on top of the latter. “People are going to be crazy excited,” she said.
Papers proposing various ideas for new physics are expected to flood the arxiv in the coming days. Yet beyond that, the future is unclear. What was once an illuminating breach between theory and experiment has been clouded by a far foggier clash of calculations.
It’s possible that the supercomputer calculation will turn out to be wrong — that BMW overlooked some source of error. “We need to have a close look at the calculation,” El-Khadra said, stressing that it’s too early to draw firm conclusions. “It is pushing on the methods to get that precision, and we need to understand if the way they pushed on the methods broke them.”
That would be good news for fans of new physics.
Interestingly, though, even if the data-driven method is the approach with an unidentified problem under the hood, theorists have a hard time understanding what the problem could be other than unaccounted-for new physics. “The need for new physics would only shift elsewhere,” said Martin Hoferichter of the University of Bern, a leading member of the Theory Initiative.
Researchers who have been exploring possible problems with the data-driven method over the past year say the data itself is unlikely to be wrong. It comes from decades of ultraprecise measurements of 35 hadronic processes. But “it could be that the data, or the way it is interpreted, is misleading,” said Andreas Crivellin of CERN and other institutions, a coauthor (along with Hoferichter) of one paper studying this possibility.
It’s possible, he explained, that destructive interference happens to reduce the likelihood of the hadronic processes arising in certain electron-positron collisions, without affecting hadronic vacuum polarization near muons; then the data-driven extrapolation from one to the other doesn’t quite work. In that case, though, another Standard Model calculation that’s sensitive to the same hadronic processes gets thrown off, creating a different tension between the theory and data. And this tension would itself suggest new physics.
It’s tricky to resolve this other tension while keeping the new physics “elusive enough to not have been observed elsewhere,” as El-Khadra put it, yet it’s possible — for instance, by introducing the effects of hypothetical particles called vector-like leptons.
Thus the mystery swirling around muons might lead the way past the Standard Model to a more complete account of the universe after all. However things turn out, it’s safe to say that today’s news — both the result from Fermilab, as well as the publication of the BMW calculation in Nature — is not the end for particle physics.
Although the historical annual improvement of about 40% in central processing unit performance is slowing, the combination of CPUs packaged with alternative processors is improving at a rate of more than 100% per annum. These unprecedented and massive improvements in processing power combined with data and artificial intelligence will completely change the way we think about designing hardware, writing software and applying technology to businesses.
Every industry will be disrupted. You hear that all the time. Well, it’s absolutely true and we’re going to explain why and what it all means.
In this Breaking Analysis, we’re going to unveil some data that suggests we’re entering a new era of innovation where inexpensive processing capabilities will power an explosion of machine intelligence applications. We’ll also tell you what new bottlenecks will emerge and what this means for system architectures and industry transformations in the coming decade.
Is Moore’s Law really dead?
We’ve heard it hundreds of times in the past decade. EE Times has written about it, MIT Technology Review, CNET, SiliconANGLE and even industry associations that marched to the cadence of Moore’s Law. But our friend and colleague Patrick Moorhead got it right when he said:
Moore’s Law, by the strictest definition of doubling chip densities every two years, isn’t happening anymore.
And that’s true. He’s absolutely correct. However, he couched that statement with “by the strictest definition” for a reason: he’s smart enough to know that the chip industry is full of masters at figuring out workarounds.
Historical performance curves are being shattered

The graphic below is proof that the death of Moore’s Law by its strictest definition is irrelevant.
The fact is that the historical outcome of Moore’s Law is actually accelerating, quite dramatically. This graphic digs into the progression of Apple Inc.’s system-on-chip developments from the A9 through the A14 five-nanometer Bionic system on a chip.
The vertical axis shows operations per second and the horizontal axis shows time for three processor types: the CPU, measured in terahertz (the blue line, which you can hardly see); the graphics processing unit or GPU, measured in trillions of floating point operations per second (orange); and the neural processing unit or NPU, measured in trillions of operations per second (the exploding gray area).
Many folks will remember that historically, we rushed out to buy the latest and greatest personal computer because the newer models had faster cycle times, that is, more gigahertz. The outcome of Moore’s Law was that performance would double every 24 months or about 40% annually. CPU performance improvements have now slowed to roughly 30% annually, so technically speaking, Moore’s Law is dead.
Apple’s SoC performance shatters the norm

Combined, the improvements in Apple’s SoC since 2015 have run at roughly 118% annual improvement for the three processor types shown above. The real figure is higher still: the graphic doesn’t count the digital signal processors and accelerator components of the system, which would push the rate up further.
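The arithmetic behind these growth figures is simple compounding. A minimal sketch (using the A9-to-A14 span of roughly five years as an illustrative assumption):

```python
def annual_rate(doubling_years):
    """Convert a doubling period into an equivalent compound annual growth rate."""
    return 2 ** (1 / doubling_years) - 1

# Doubling every 24 months is about 41% per year: the classic Moore's Law cadence
print(f"Moore's Law pace: {annual_rate(2):.0%} per year")

# Five years of 118% annual improvement compounds to roughly a 49x gain
print(f"Five years at 118%/yr: {(1 + 1.18) ** 5:.0f}x")
```

The contrast is the whole story of the graphic: a ~40% curve looks flat next to one compounding at more than 100% a year.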
Apple’s A14 shown above on the right is quite amazing with its 64-bit architecture, multiple cores and alternative processor types. But the important thing is what you can do with all this processing power – in an iPhone! The types of AI continue to evolve from facial recognition to speech and natural language processing, rendering videos, helping the hearing impaired and eventually bringing augmented reality to the palm of your hand.
Processing goes to the edge – networks and storage become the bottlenecks

We recently reported Microsoft Corp. Chief Executive Satya Nadella’s epic quote that we’ve reached peak centralization. The graphic below paints a telling picture. We just shared above that processing power is accelerating at unprecedented rates. And costs are dropping like a rock. Apple’s A14 costs the company $50 per chip. Arm at its v9 announcement said that it will have chips that can go into refrigerators that will optimize energy use and save 10% annually on power consumption. They said that chip will cost $1 — a buck to shave 10% off your electricity bill from the fridge.
Processing is plentiful and cheap. But look at where the expensive bottlenecks are: networks and storage. So what does this mean?
It means that processing is going to get pushed to the edge – wherever the data is born. Storage and networking will become increasingly distributed and decentralized, with custom silicon and processing power placed throughout the system and AI embedded to optimize workloads for latency, performance, bandwidth, security and other dimensions of value.
And remember, most of the data – 99% – will stay at the edge. We like to use Tesla Inc. as an example. The vast majority of data a Tesla car creates will never go back to the cloud. It doesn’t even get persisted. Tesla saves perhaps five minutes of data. But some data will connect occasionally back to the cloud to train AI models – we’ll come back to that.
But this picture above says if you’re a hardware company, you’d better start thinking about how to take advantage of that blue line, the explosion of processing power. Dell Technologies Inc., Hewlett Packard Enterprise Co., Pure Storage Inc., NetApp Inc. and the like are either going to start designing custom silicon or they’re going to be disrupted, in our view. Amazon Web Services Inc., Google LLC and Microsoft are all doing it for a reason, as are Cisco Systems Inc. and IBM Corp. As cloud consultant Sarbjeet Johal has said, “this is not your grandfather’s semiconductor business.”
And if you’re a software engineer, you’re going to be writing applications that take advantage of all the data being collected and bringing to bear this immense processing power to create new capabilities like we’ve never seen before.
AI everywhere

Massive increases in processing power and cheap silicon will power the next wave of AI, machine intelligence, machine learning and deep learning.
We sometimes use artificial intelligence and machine intelligence interchangeably. This notion comes from our collaborations with author David Moschella. Interestingly, in his book “Seeing Digital,” Moschella says “there’s nothing artificial” about this:
There’s nothing artificial about machine intelligence just like there’s nothing artificial about the strength of a tractor.
It’s a nuance, but precise language can often bring clarity. We hear a lot about machine learning and deep learning and think of them as subsets of AI. Machine learning applies algorithms and code to data to get “smarter” – make better models, for example, that can lead to augmented intelligence and better decisions by humans, or machines. These models improve as they get more data and iterate over time.
Deep learning is a more advanced type of machine learning that uses multilayered neural networks and more complex math.
The right side of the chart above shows the two broad elements of AI. The point we want to make here is that much of the activity in AI today is focused on building and training models. And this is mostly happening in the cloud. But we think AI inference will bring the most exciting innovations in the coming years.
AI inference unlocks huge value

Inference is the deployment of the model: taking real-time data from sensors, processing data locally, applying the training that has been developed in the cloud and making micro-adjustments in real time.
Let’s take an example. We love car examples and observing Tesla is instructive and a good model as to how the edge may evolve. So think about an algorithm that optimizes the performance and safety of a car on a turn. The model takes inputs with data on friction, road conditions, angles of the tires, tire wear, tire pressure and the like. And the model builders keep testing and adding data and iterating the model until it’s ready to be deployed.
Then the intelligence from this model goes into an inference engine, which is a chip running software, that goes into a car and gets data from sensors and makes micro adjustments in real time on steering and braking and the like. Now as we said before, Tesla persists the data for a very short period of time because there’s so much data. But it can choose to store certain data selectively if needed to send back to the cloud and further train the model. For example, if an animal runs into the road during slick conditions, maybe Tesla persists that data snapshot, sends it back to the cloud, combines it with other data and further perfects the model to improve safety.
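The loop described above can be sketched in a few lines. This is a toy illustration of the pattern, not Tesla's actual pipeline; the frame fields, the `is_notable` trigger and the buffer length are all hypothetical:

```python
from collections import deque

def edge_loop(frames, model, is_notable, buffer_len=3000):
    """Run local inference on each sensor frame, keep a short rolling buffer
    (~5 minutes at 10 Hz with the default length), and flag only notable
    frames for upload back to the cloud to further train the model."""
    buffer = deque(maxlen=buffer_len)  # old frames silently fall off the back
    to_upload = []
    for frame in frames:
        action = model(frame)          # real-time, on-device inference
        buffer.append((frame, action)) # transient persistence only
        if is_notable(frame):          # e.g. an animal on a slick road
            to_upload.append(frame)    # snapshot saved for cloud retraining
    return to_upload

# Hypothetical stand-ins for the model and the notability rule
frames = [{"friction": 0.9, "obstacle": False}] * 5 + [{"friction": 0.2, "obstacle": True}]
model = lambda f: "brake" if f["obstacle"] else "cruise"
notable = lambda f: f["obstacle"] and f["friction"] < 0.4

print(len(edge_loop(frames, model, notable)))  # only the slick-road obstacle frame is kept
```

The design choice the sketch highlights is the asymmetry: every frame gets inference, almost no frame gets persisted, and only rare, information-rich snapshots ever travel back over the network.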
This is just one example of thousands of AI inference use cases that will further develop in the coming decade.
AI value shifts from modeling to inferencing

The conceptual chart below shows the percent of spend over time on modeling versus inference. You can see some of the applications that get attention today and how these apps will mature over time as inference becomes more mainstream. The opportunities for AI inference at the edge and in the “internet of things” are enormous.
Modeling will continue to be important. Today’s prevalent modeling workloads in fraud, adtech, weather, pricing, recommendation engines and more will just keep getting better and better. But inference, we think, is where the rubber meets the road, as shown in the previous example.
And in the middle of the graphic we show the industries, which will all be transformed by these trends.
One other point on that: Moschella in his book explains why historically, vertical industries remained pretty stovepiped from each other. They each had their own “stack” of production, supply, logistics, sales, marketing, service, fulfillment and the like. And expertise tended to reside and stay within that industry and companies, for the most part, stuck to their respective swim lanes.
But today we see so many examples of tech giants entering other industries. Amazon entering grocery, media and healthcare, Apple in finance and EV, Tesla eyeing insurance: There are many examples of tech giants crossing traditional industry boundaries and the enabler is data. Auto manufacturers over time will have better data than insurance companies for example. DeFi or decentralized finance or platforms using the blockchain will continue to improve with AI and disrupt traditional payment systems — and on and on.
Hence we believe the oft-repeated bromide that no industry is safe from disruption.
Snapshot of AI in the enterprise

Last week we showed you the chart below from Enterprise Technology Research.
The data shows Net Score, or spending momentum, on the vertical axis. The horizontal axis is Market Share, or pervasiveness in the ETR data set. The red line at 40% is our subjective anchor; anything above 40% is really good in our view.
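Net Score is a netting of spenders against cutters across survey responses. A hedged sketch of the idea (the response categories and formula below are an approximation of ETR's methodology, not its exact published formula):

```python
def net_score(adoption, increase, flat, decrease, replacement):
    """Approximate ETR-style Net Score: the share of survey accounts adopting
    or increasing spend on a platform, minus the share decreasing or replacing it."""
    total = adoption + increase + flat + decrease + replacement
    return (adoption + increase - decrease - replacement) / total

# Hypothetical survey counts for one vendor: 10 new adoptions, 45 increasing,
# 35 flat, 7 decreasing, 3 replacing -> Net Score of 45%, above the 40% line
print(f"{net_score(10, 45, 35, 7, 3):.0%}")
```

The useful property of a netted metric like this is that a vendor with many flat accounts scores lower than one whose installed base is actively expanding spend, even at the same market share.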
Machine learning and AI are the No. 1 area of spending velocity and have been for a while, hence the four stars. Robotic process automation is increasingly an adjacency to AI, and you could argue that cloud, where most of the machine learning action is taking place today, is another adjacency, although we think AI continues to move out of the cloud for the reasons we just described.
Enterprise AI specialists carve out positions
The chart below shows some of the vendors in the space that are gaining traction. These are the companies that chief information officers and information technology buyers associate with their AI/ML spend.
This graph above uses the same Y/X coordinates – Spending Velocity on the vertical by Market Share on the horizontal axis, same 40% red line.
The big cloud players, Microsoft, AWS and Google, dominate AI and ML with the most presence. They have the tooling and the data. As we said, lots of modeling is going on in the cloud, but this will be pushed into remote AI inference engines that will have massive processing capabilities collectively. We are moving away from peak centralization and this presents great opportunities to create value and apply AI to industry.
Databricks Inc. is seen as an AI leader and stands out with a strong Net Score and a prominent Market Share. SparkCognition Inc. is off the charts in the upper left with an extremely high Net Score albeit from a small sample. The company applies machine learning to massive data sets. DataRobot Inc. does automated AI – they’re super high on the Y axis. Dataiku Inc. helps create machine learning-based apps. C3.ai Inc. is an enterprise AI company founded and run by Tom Siebel. You see SAP SE, Salesforce.com Inc. and IBM Watson just at the 40% line. Oracle is also in the mix with its autonomous database capabilities and Adobe Inc. shows as well.
The point is that these software companies are all embedding AI into their offerings. And incumbent companies that are trying not to get disrupted can buy AI from software companies. They don’t have to build it themselves. The hard part is how and where to apply AI. And the simple answer is: Follow the data.
Key takeaways
There’s so much more to this story, but let’s leave it there for now and summarize.
We’ve been pounding the table about the post-x86 era, the importance of volume in terms of lowering the costs of semiconductor production, and today we’ve quantified something that we haven’t really seen much of and that’s the actual performance improvements we’re seeing in processing today. Forget Moore’s Law being dead – that’s irrelevant. The original premise is being blown away this decade by SoC and the coming system on package designs. Who knows with quantum computing what the future holds in terms of performance increases.
These trends are a fundamental enabler of AI applications and, as is most often the case, the innovation is coming from consumer use cases; Apple continues to lead the way. Apple’s integrated hardware and software approach will increasingly be adopted in the enterprise. Clearly the cloud vendors are moving in that direction, and you see it with Oracle Corp. too. It just makes sense that optimizing hardware and software together will gain momentum because there’s so much opportunity for customization in chips, as we discussed last week with Arm Ltd.’s announcement – and it’s the direction new CEO Pat Gelsinger is taking Intel Corp.
One aside – Gelsinger may face massive challenges at Intel, but he’s right that semiconductor demand is increasing with no end in sight.
If you’re an enterprise, you should not stress about inventing AI. Rather, your focus should be on understanding what data gives you competitive advantage and how to apply machine intelligence and AI to win. You’ll buy AI, not build it.
Data, as John Furrier has said many times, is becoming the new development kit. He said that 10 years ago and it’s more true now than ever before:
Data is the new development kit.
If you’re an enterprise hardware player, you will be designing your own chips and writing more software to exploit AI. You’ll be embedding custom silicon and AI throughout your product portfolio and you’ll be increasingly bringing compute to data. Data will mostly stay where it’s created. Systems, storage and networking stacks are all being disrupted.
If you develop software, you now have incredible processing capabilities in the palm of your hand, and you’re going to write new applications to take advantage of this and use AI to change the world. You’ll have to figure out how to get access to the most relevant data, secure your platforms and innovate.
And finally, if you’re a services company you have opportunities to help companies trying not to be disrupted. These are many. You have the deep industry expertise and horizontal technology chops to help customers survive and thrive.
Privacy? AI for good? Those are whole topics on their own, extensively covered by journalists. We think for now it’s prudent to gain a better understanding of how far AI can go before we determine how far it should go and how it should be regulated. Protecting our personal data and privacy should be something that we most definitely care about – but generally we’d rather not stifle innovation at this point.
Also, check out this ETR Tutorial we created, which explains the spending methodology in more detail. Note: ETR is a separate company from Wikibon/SiliconANGLE. If you would like to cite or republish any of the company’s data, or inquire about its services, please contact ETR at email@example.com.
Cerebras Systems has unveiled its new Wafer Scale Engine 2 processor with a record-setting 2.6 trillion transistors and 850,000 AI-optimized cores. It’s built for supercomputing tasks, and it’s the second time since 2019 that Los Altos, California-based Cerebras has unveiled a chip that is basically an entire wafer.
Chipmakers normally slice a wafer from a 12-inch-diameter ingot of silicon to process in a chip factory. Once processed, the wafer is sliced into hundreds of separate chips that can be used in electronic hardware.
But Cerebras, started by SeaMicro founder Andrew Feldman, takes that wafer and makes a single, massive chip out of it. Each piece of the chip, dubbed a core, is interconnected in a sophisticated way to other cores. The interconnections are designed to keep all the cores functioning at high speeds so the transistors can work together as one.
Twice as good as the CS-1
Above: Comparing the CS-1 to the biggest GPU.
Image Credit: Cerebras
In 2019, Cerebras could fit 400,000 cores and 1.2 trillion transistors on its first wafer-scale chip, which powered the CS-1 system. It was built with a 16-nanometer manufacturing process. But the new chip is built with a high-end 7-nanometer process, meaning the width between circuits is seven billionths of a meter. With such miniaturization, Cerebras can cram a lot more transistors into the same 12-inch wafer, Feldman said. It cuts that circular wafer into a square that is eight inches by eight inches and ships the device in that form.
“We have 123 times more cores and 1,000 times more memory on chip and 12,000 times more memory bandwidth and 45,000 times more fabric bandwidth,” Feldman said in an interview with VentureBeat. “We were aggressive on scaling geometry, and we made a set of microarchitecture improvements.”
Now Cerebras’ WSE-2 chip has more than twice as many cores and transistors. By comparison, the largest graphics processing unit (GPU) has only 54 billion transistors – 2.55 trillion fewer than the WSE-2. The WSE-2 also has 123 times more cores and 1,000 times more high-performance on-chip memory than GPU competitors. Many of the Cerebras cores are redundant in case one part fails.
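The multiples quoted here are easy to sanity-check from the headline figures. Note that the GPU core count below is our own back-calculation from the quoted 123x figure, not a number from Cerebras or Nvidia:

```python
# Sanity-checking the comparison arithmetic from the quoted figures.
wse2_transistors = 2.6e12   # 2.6 trillion
gpu_transistors = 54e9      # 54 billion, the largest competing GPU
gap = (wse2_transistors - gpu_transistors) / 1e12
print(f"{gap:.2f} trillion fewer transistors")  # 2.55 trillion fewer transistors

wse2_cores = 850_000
# Back-calculated from the "123 times more cores" claim -- an assumption,
# not an official GPU core count.
implied_gpu_cores = wse2_cores / 123
print(f"~{implied_gpu_cores:.0f} implied GPU cores")  # ~6911 implied GPU cores
```

The implied figure of roughly 6,900 cores is in the ballpark of a modern flagship GPU's core count, which suggests the 123x claim is a straight core-for-core comparison.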
“This is a great achievement, especially when considering that the world’s third largest chip is 2.55 trillion transistors smaller than the WSE-2,” said Linley Gwennap, principal analyst at The Linley Group, in a statement.
Feldman half-joked that this should prove that Cerebras is not a one-trick pony.
“What this avoids is all the complexity of trying to tie together lots of little things,” Feldman said. “When you have to build a cluster of GPUs, you have to spread your model across multiple nodes. You have to deal with device memory sizes and memory bandwidth constraints and communication and synchronization overheads.”
The CS-2’s specs
Above: TSMC put the CS-1 in a chip museum.
Image Credit: Cerebras
The WSE-2 will power the Cerebras CS-2, the industry’s fastest AI computer, designed and optimized for 7 nanometers and beyond. Manufactured by contract manufacturer TSMC, the WSE-2 more than doubles all performance characteristics on the chip — the transistor count, core count, memory, memory bandwidth, and fabric bandwidth — over the first generation WSE. The result is that on every performance metric, the WSE-2 is orders of magnitude larger and more performant than any competing GPU on the market, Feldman said.
TSMC put the first WSE-1 chip in a museum of innovation for chip technology in Taiwan.
“Cerebras does deliver the cores promised,” said Patrick Moorhead, an analyst at Moor Insights & Strategy. “What the company is delivering is more along the lines of multiple clusters on a chip. It does appear to give Nvidia a run for its money but doesn’t run raw CUDA. That has become somewhat of a de facto standard. Nvidia solutions are more flexible as well, as they can fit into nearly any server chassis.”
With every component optimized for AI work, the CS-2 delivers more compute performance at less space and less power than any other system, Feldman said. Depending on workload, from AI to high-performance computing, CS-2 delivers hundreds or thousands of times more performance than legacy alternatives, and it does so at a fraction of the power draw and space.
A single CS-2 replaces clusters of hundreds or thousands of graphics processing units (GPUs) that consume dozens of racks, use hundreds of kilowatts of power, and take months to configure and program. At only 26 inches tall, the CS-2 fits in one-third of a standard datacenter rack.
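The "one-third of a standard datacenter rack" claim checks out against standard rack-unit arithmetic. The 1.75-inch rack unit and 42U rack height below are common industry conventions we are assuming, not figures from Cerebras:

```python
# Rough check of the rack-space claim. The 1.75-inch rack unit and the 42U
# full-height rack are standard industry assumptions, not Cerebras figures.
RACK_UNIT_IN = 1.75    # height of one rack unit (1U), in inches
RACK_HEIGHT_U = 42     # a common full-height datacenter rack

cs2_height_in = 26
cs2_units = cs2_height_in / RACK_UNIT_IN      # ~15U
fraction = cs2_units / RACK_HEIGHT_U          # ~0.35 of a rack
print(f"{cs2_units:.1f}U, {fraction:.0%} of a rack")  # 14.9U, 35% of a rack
```

At roughly 15U out of 42U, "one-third of a rack" is a fair characterization.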
“Obviously, there are companies and entities interested in Cerebras’ wafer-scale solution for large data sets,” said Jim McGregor, principal analyst at Tirias Research, in an email. “But, there are many more opportunities at the enterprise level for the millions of other AI applications and still opportunities beyond what Cerebras could handle, which is why Nvidia has the SuperPOD and Selene supercomputers.”
He added, “You also have to remember that Nvidia is targeting everything from AI robotics with Jetson to supercomputers. Cerebras is more of a niche platform. It will take some opportunities but will not match the breadth of what Nvidia is targeting. Besides, Nvidia is selling everything they can build.”
Lots of customers
Above: Comparing the new Cerebras chip to its rival, the Nvidia A100.
Image Credit: Cerebras
And the company has proven itself by shipping the first generation to customers. Over the past year, customers have deployed the Cerebras WSE and CS-1, including Argonne National Laboratory; Lawrence Livermore National Laboratory; Pittsburgh Supercomputing Center (PSC) for its Neocortex AI supercomputer; EPCC, the supercomputing center at the University of Edinburgh; pharmaceutical leader GlaxoSmithKline; Tokyo Electron Devices; and more. Customers praising the chip include those at GlaxoSmithKline and the Argonne National Laboratory.
Kim Branson, senior vice president at GlaxoSmithKline, said in a statement that the company has increased the complexity of the encoder models it generates while decreasing training time by 80 times. At Argonne, the chip is being used for cancer research and has reduced the experiment turnaround time on cancer models by more than 300 times.
“For drug discovery, we have other wins that we’ll be announcing over the next year in heavy manufacturing and pharma and biotech and military,” Feldman said.
The new chips will ship in the third quarter. Feldman said the company now has more than 300 engineers, with offices in Silicon Valley, Toronto, San Diego, and Tokyo.