The Economics of TSMC’s Giga-Fabs
In 2018, TSMC broke ground on Fab 18 near Tainan City in the south of Taiwan. Fab 18 is a monster. It sits on 103 acres and has a total floor space of 950,000 square meters (10.2 million square feet).
That is about 3 times the size of AT&T Stadium in Texas - home of the Dallas Cowboys. (Thanks for the correction that the stadium isn’t actually in Dallas).
In total, across all of its phases, Fab 18 will cost TSMC nearly $20 billion to build and operate. More than the cost of the USS Gerald R. Ford, the US Navy's most advanced aircraft carrier.
In this video, we are going to look at why TSMC's fabs are getting bigger and more expensive than ever before. And why that makes a lot of economic sense for the Taiwanese chip maker.
The Many Fabs
There are a lot of ways to classify a fab. Generally, the place to start would be to look at their sheer numerical output, denominated in the number of wafers.
TSMC’s bread and butter are logic fabs, which make microprocessors and the like. The company has three volume classifications:
Mini-fabs, with 10,000 to 30,000 wafer starts a month;
Currently, TSMC is building a $12 billion 5 nanometer fab in Arizona. Fab 21, as it is called, sits on a 1,100 acre (445 hectare) plot. In Phase 1, the company is targeting 20,000 wafer starts a month by the first quarter of 2024. This would make it a mini-fab.
Mega-fabs, which do between 30,000 and 100,000 wafer starts a month;
And the Giga-fab, with capacity for over 100,000 wafer starts a month. This is a TSMC marketing term that the company has apparently trademarked.
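The three volume tiers can be written down as a small lookup. This is only a sketch: the function name is made up for illustration, and because the ranges above overlap at exactly 30,000, the code assumes that boundary value counts as a mega-fab.

```python
def classify_fab(wafer_starts_per_month: int) -> str:
    """Map monthly wafer starts to the volume tiers described in the text.

    Assumption (not from the source): a fab at exactly 30,000 starts is
    treated as a mega-fab, and one at exactly 100,000 is still a mega-fab,
    since the Giga-fab tier is described as "over 100,000".
    """
    if wafer_starts_per_month > 100_000:
        return "giga-fab"
    if wafer_starts_per_month >= 30_000:
        return "mega-fab"
    if wafer_starts_per_month >= 10_000:
        return "mini-fab"
    return "below the mini-fab range"


# Arizona Fab 21, Phase 1 target of 20,000 wafer starts a month
print(classify_fab(20_000))
```

By this rule, the Phase 1 target for the Arizona fab lands in the mini-fab tier, as the text says.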
But the wafer start number by itself is kind of useless. You got to have some context. Different fabs have different levels of profitable production.
Modern day logic fabs for instance generally target 80,000 wafers a month.
Then, there are memory fabs, like the types owned by Samsung, Yangtze Memory and SK Hynix. These produce memory modules like DRAM and NAND.
A memory fab needs to do a whole lot more, typically exceeding 120,000 wafers a month. This, and the fact that memory is a more generalized product, is why the memory industry sees such brutal business cycles of overproduction.
You also have your more niche outputs. These fabs are going to be a lot smaller.
Analog chips, which convert signals between the physical world - light, heat, pressure, such and such - and the digital world;
CMOS image sensor chips. They go into your digital cameras;
MEMS, which are tiny little electro-mechanical systems. These are pretty cool and I will probably do a video about them in the future;
TFT LCD panel fabs, and so on.
Scaling Philosophy
Giga-fabs are conceptually the same as the mega and mini fabs. They run the same instruments and processes, but scaled up and in higher quantity.
Usually, a development fab pioneers the first recipe for a particular node process. The company then needs to scale it to its other fabs so that they can use it too. Every company has their own way of doing this.
Intel's scaling paradigm is famously called "Copy EXACTLY!". It attempts to do that technology transfer without compromising quality or yield by implementing the existing recipe without any modifications whatsoever.
Early on, TSMC used this policy for their own fab scaling. Their version was "Copy Exact". But over time, they seem to have tweaked it a little, allowing for more flexibility. Process engineers monitor what is going on with all their lines in all their fabs.
If a process engineer tweaks something and it works out to be a little better than the baseline, then all the other process lines can be re-aligned to follow suit. They call this new philosophy “fab matching”.
More Expensive
Fab facilities are starting to cost a whole lot more. In 1983, UMC spent about $200 million to build a 1.2 micron fab. In 1999, they spent $1.8 billion for a 0.35 micron fab. In 2007, a 65 nanometer fab would cost $10 billion.
In 2010, TSMC invested about $9.4 billion in total across four phases to build its 28 nanometer GigaFab in Taichung - Fab 15. Eight years later, that 5 nanometer fab in Tainan - Fab 18 - will cost nearly double that.
The TSMC Giga-Fab's sheer size is partially why it costs tens of billions of dollars. Like I mentioned, the Giga-Fab is just a scaled-up big brother. More equipment, more money. But the equipment itself has also gotten more expensive over the years.
Much of that price increase is tied to the ballooning cost of cutting edge photolithography equipment. Such equipment accounts for 70% of total manufacturing costs and a staggering 20% of a fab's entire budget.
Photolithography is generally a bottleneck in the fabrication process, so foundries want to buy as many systems as they can feasibly afford and install.
These systems today cost a whole lot more as engineers struggle to overcome various physical limitations. The price tag for new, cutting-edge systems like EUV and its follow-up High-NA EUV is pushing $300 million per system.
These systems will not only cost more outright, their sheer size and complexity mean that TSMC and other foundries will have to build and maintain larger clean rooms to operate them.
Cleaner Rooms
Clean rooms are one of the core drivers of fab expense. Increasing demands on air purity - ranging from class 1 to class 10 and beyond - make them more expensive to build and run. While at the same time, they are getting larger.
TSMC's Fab 14 clean room is 31,000 square meters, or about 4.3 soccer fields. Fab 15's clean room is 104,000 square meters, 14.5 soccer fields. Fab 18 has them all beat with a 160,000 square meter clean room, 22.4 soccer fields.
The cleanliness requirements were ratcheted up for early EUV. TSMC's early EUV machines lacked something called a pellicle, a polymer film just 1 micrometer thick, commonly used to keep stray particles from messing up the pattern projected onto the wafer.
ASML eventually developed a suitable pellicle, but before then, TSMC operated under ultra-strict cleanliness standards. Fewer than one particle over 52 nm wide, the size of a small virus, each week.
EUV added countless engineering complications, right down to the very floors. Since Taiwan has issues with earthquakes and its Tainan fabs are located near the High Speed Rail, the clean rooms have to be elevated and placed on dampeners to prevent vibration issues from ruining the wafers.
This further complicates the use of EUV, because those machines use CO2 lasers to generate EUV light. Much of the infrastructure for that machinery has to go underneath the clean room floor - another substantial challenge to overcome.
Do It Cheap?
With all these fabs costing tens of billions of dollars, you sorta have to think. In 2020, TSMC made $45 billion in revenue and had $23 billion of cash on their books. They have all these fabs, including the four Giga-fabs, and they cost billions to run.
What are the economics here? How does TSMC justify spending the equivalent of nearly half a year's revenue on a single facility? Is this financially sustainable?
TSMC started building these increasingly larger fabs for a number of interlinked reasons. Their biggest concern is in maximizing their speed, capacity, and customer value.
You can do it right, do it fast, or do it cheap. Pick two. I have repeatedly seen that TSMC is not as concerned about doing it cheap as it is about doing it *fast*.
One example took place during the Chinese New Year. Taiwan celebrates CNY by sending everyone home for a week to be with their family. But TSMC told its builders that it would pay an extra $140 (4000 NTD) per day to each worker who worked during the holiday.
Another example. TSMC's Night Hawk Army, R&D engineers who work overnight shifts. Inspired by Foxconn's hard driving 24/7 operations style, Night Hawk workers receive 30% higher salaries and 50% higher bonuses.
In Taiwanese culture, to work overnight is said to hurt your liver - so the shift is referred to as a "liver buster".
With this in mind, TSMC is not really a low-cost leader. For instance, it is known that Samsung Foundry frequently offers lower prices. Slashing prices is not TSMC's preferred way of doing business.
TSMC's Business Value
Rather, TSMC seeks to offer value. Their fabless customers tend to have good margins, and thus can pay higher prices if it means that they get good product on time.
An example of this strategy is how the company invested in upgrading its customers' experience and lowering their entry costs.
Starting in 1997, Morris Chang began pushing the company to become more of a service-oriented enterprise. They created the e-foundry initiative, which sought to better manage the flow of information between the foundry and its customers. Fabless companies can now know where exactly their order is in the development cycle.
TSMC also focused on bending the cost curve for end users with programs like its CyberShuttle service. Fabless companies need to create masks to prototype their chips - something called a New Tape Out or NTO.
But an NTO can cost millions of dollars, which small companies cannot always afford. So TSMC offered a service where different customers can put multiple NTOs on a single wafer, sharing their costs.
Faster Cycle Times
A fab needs to produce. You are investing immense capital costs to build the whole facility and fill it with equipment. If that equipment is not being used, then money is being wasted and customers start getting impatient and mean.
Thus, TSMC thinks about cycle time as much as it thinks about yield. Cycle time measures how much time it takes to run through a single layer on a wafer.
Cycle time is measured in days per mask layer or DPML. The lower your DPML, the better.
Faster cycle times are no longer a nice-to-have as process nodes get more sophisticated. Wafers have multiple layers that need to get prepped, etched and washed. Over the years, the layers have gone up in number as the process nodes advanced.
Back in the late 1990s, a 180 nanometer chip had about 25 layers and it took about 2 days to do one of those layers. So the full 25 layers would take about 2 months. TSMC's N14 and N16 processes had 60 layers. N10 has about 78. The N7 process, 87 layers. And N5 would have been 115 layers had EUV not been used to replace a few - it stands right now at 81.
Had TSMC not improved their cycle times, one N5 wafer would take nearly 8 months to do. Such long lead times are unacceptable in the fast moving, super-competitive electronics market.
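The arithmetic behind these lead times is simple enough to sketch in a few lines of Python. This is a back-of-the-envelope check only. It assumes the late-1990s pace of 2 days per mask layer never improves and that a month is 30 days. The function and variable names are made up for illustration, and TSMC's actual DPML figures are not given here.

```python
DAYS_PER_MONTH = 30  # rough conversion, an assumption for illustration


def total_cycle_days(mask_layers: int, dpml: float) -> float:
    """Total fab time in days = mask layers x days per mask layer (DPML)."""
    return mask_layers * dpml


# Layer counts as quoted in the text above
layer_counts = {
    "180 nm": 25,
    "N14/N16": 60,
    "N10": 78,
    "N7": 87,
    "N5 (with EUV)": 81,
    "N5 (without EUV)": 115,
}

LATE_1990S_DPML = 2.0  # "about 2 days to do one of those layers"

for node, layers in layer_counts.items():
    days = total_cycle_days(layers, LATE_1990S_DPML)
    print(f"{node}: {layers} layers -> {days:.0f} days "
          f"(~{days / DAYS_PER_MONTH:.1f} months)")
```

At that unimproved pace, the 115-layer figure works out to 230 days, a bit under 8 months, while the 81-layer figure comes to 162 days.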
The Benefits of a Bigger Fab
Bigger fabs help contribute to those faster cycle times. The Giga-Fab is made up of many phases connected together by an Automatic Material Handling System. But the whole process goes only as fast as its bottleneck phase.
Equipment frequently breaks down or has to be taken offline for maintenance. For instance, ion implanters are down 30-40% of the cycle. When a machine goes down - planned or otherwise - every wafer in the production line has to wait for its turn in the machine.
So, having four tools in your clean room instead of one makes it less likely that all four go down at the same time. Wafer batches can immediately and automatically be routed to the next process step without bottlenecking the process.
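A rough way to see the effect: if each tool is down some fraction of the time, and the tools fail independently of one another (a simplifying assumption, since real outages can be correlated), the chance that every tool is down at once is that fraction raised to the number of tools. The 35% figure below is just the midpoint of the 30-40% range quoted for ion implanters, and the function name is hypothetical.

```python
def prob_all_down(downtime_fraction: float, tool_count: int) -> float:
    """Chance that every tool in a group is down at the same moment.

    Assumes each tool is down independently with the same probability,
    which is a simplification for illustration.
    """
    return downtime_fraction ** tool_count


DOWNTIME = 0.35  # midpoint of the 30-40% range quoted above

for tools in (1, 2, 4):
    print(f"{tools} tool(s): all down {prob_all_down(DOWNTIME, tools):.1%} of the time")
```

Under these assumptions, a lone implanter blocks the line about a third of the time, while four of them are all down together only about 1.5% of the time.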
Simulations done in 2006 found that, depending on the various tool workstations and utilization rates, growing a fab eight times over can reduce its cycle times by nearly 50%.
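The 2006 simulations are not reproduced here, but a textbook queueing model shows the same direction of effect. The sketch below uses an M/M/c queue with the Erlang C formula, which assumes random (Poisson) lot arrivals, exponentially distributed processing times, and identical tools sharing one queue. None of that comes from the source, and the 80% utilization is picked only for illustration. It compares a workstation with one tool against one with eight tools running at the same utilization.

```python
from math import factorial


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving lot has to wait (Erlang C, M/M/c queue)."""
    if offered_load >= servers:
        raise ValueError("utilization must be below 100%")
    rho = offered_load / servers
    top = offered_load ** servers / factorial(servers) / (1 - rho)
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_cycle_time(servers: int, utilization: float, service_time: float = 1.0) -> float:
    """Mean time a lot spends at the workstation: queueing wait plus processing."""
    offered_load = servers * utilization  # arrival rate x service time
    wait = erlang_c(servers, offered_load) * service_time / (servers * (1 - utilization))
    return wait + service_time


UTILIZATION = 0.8  # illustrative assumption

for tools in (1, 2, 4, 8):
    ct = mean_cycle_time(tools, UTILIZATION)
    print(f"{tools} tool(s) at {UTILIZATION:.0%} utilization: "
          f"cycle time = {ct:.2f}x the raw processing time")
```

In this toy model the queueing delay shrinks as tools are pooled, even though every tool stays just as busy. The exact percentages depend entirely on the assumed utilization and distributions, so they should not be read as a check on the 50% figure above.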
In order to take full advantage of these scale benefits, fab operators create sophisticated planning models and queue processes so that each wafer batch arrives at the right workstation at the ideal time.
Lead Customers
One other thing that TSMC has that other foundries do not is its massive roster of customers. Companies like Apple, AMD, Qualcomm, Nvidia and MediaTek are unique in the industry. They are able to afford higher prices while also buying in high volume. TSMC leverages this customer base to its advantage.
It is famously known that Apple on occasion provides a few billion dollars up front for a supplier to build a facility. In return, they get a guaranteed supply of a particular component.
I have heard speculation that Apple might be doing the same for TSMC. But I haven’t seen any evidence to confirm it. TSMC likes to own their fabs outright and they are more than capable of raising their own capital.
Rather, it is more that Apple’s buying heft allows TSMC to forecast their future demand better and justify a massive Giga-fab scale investment. It knows that it can rely on its lead customers to take capacity each year. Might not sound like that big a deal in this current chip shortage environment, but market crashes often happen.
Furthermore by servicing them, TSMC keeps these prestige customers from taking their profitable business elsewhere. Smaller foundries like GlobalFoundries are stuck with lower profit customers lacking in scale. So it’s no surprise that GlobalFoundries makes negative gross margins.
Really does make TSMC the Apple of the foundry business. Which is why like Apple, they take much of the profits in the whole industry.
Cannibalizing
What makes TSMC yet harder to compete with is how aggressively they eat their own children. TSMC offers its customers a variety of node processes. You have the leading edge, medium, and mature processes.
TSMC spends billions to develop a leading edge process for a premier customer like Apple. They will charge a lot for it, but because the yield is low and depreciation expense is high, gross margins won't be so great.
But once the leading edge is done, TSMC's engineers very rapidly turn around and start applying its benefits - not to the mature processes, but the medium ones with the explicit goal of cannibalizing those medium node users.
I mentioned a specific example of this earlier, where TSMC uses EUV to replace layers that used to be done with immersion DUV in older processes.
In 2018, TSMC debuted the N7 process. A year later, it moved on to create N6, an evolutionary step introduced with the intention of rapidly replacing N7. And it has done so at a speed that has surprised many industry observers - from 15% to 50% in a single year.
I suspect N6 is very profitable for TSMC. And it echoes what I said in my previous video about the UK's IBM - the business and strategic importance of maintaining technical supremacy in the industry. When you are the best in the world at something, you will eventually find a way to monetize it.
Conclusion
TSMC is very secretive about their fabs. Very open about many other things - their product offerings, customer service strategy, financial performance, so on. But I have a hard time finding any material on their fabs and how they work. The paper trail goes cold some time in 2010.
It is interesting to compare this with ASML, which is very open about its engineering research findings. The European company is constantly publishing material about its latest efforts.
Who knows if there is a particular reason for this. But here's my irresponsible guess. In 2009, TSMC won a lawsuit against a competitor, SMIC, for stealing trade secrets and recipes. The Shanghai company quickly took share in the China market and abroad using these secrets.
After winning, I reckon TSMC clamped down hard on any content about how their fabs work and what goes on inside. I guess it makes a whole lot of sense, but it definitely makes it much more difficult to dig out what they do and how they do it.