The Growing Semiconductor Design Bottleneck
In 1997, American chip consortium SEMATECH sounded an alarm to the industry about the chip design productivity gap.
They observed that, at the time, integrated circuit manufacturing capability was expanding at about 40% a year.
Yet IC design capability was growing at only about half that rate. A potential crisis was thus brewing, one in which design capabilities would lag far behind manufacturing.
This crisis never took place, for reasons we will discuss later. In its place, however, a new challenge for the industry has emerged: Verification.
In this video, we are going to look at why this previously unheralded step has become a rather big deal in today's competitive chip design world.
The Chip Design Productivity Boom
There are a few reasons why the SEMATECH chip productivity crisis never materialized.
The first is that the pace at which chip manufacturing advanced drastically slowed down as the science and its commercialization got harder.
I have done enough EUV videos to make this point clear.
The second is that EDA tools vastly scaled up their productivity. They are now capable of handling much higher levels of abstraction.
And third, designers have begun reusing chip IP to deal with product complexity and to get products out the door faster. From 2007 to 2012, the creation of new logic declined by 32% - replaced by pre-designed and purchased IP.
For instance, Cadence and Synopsys not only make the EDA software for their clients but also license out useful IP blocks for standard chip functions like I/O. Designers can now kick off their projects with those prebuilt functions already done.
Thus, relatively small teams can still design a billion transistor chip using the latest industry software and repurposed IP. The data generally bears out this thesis.
A 2019 report by Mentor Graphics found that between 2007 and 2014, industry job growth for design engineers was just 3.8% a year. The number of chip designers on a single project has stayed largely the same.
That is good. But over the same period of time, the growth rate for verification engineer jobs was 12.6%. And in 2018, the average project had more verification engineers than design engineers.
Functional Verification
It is now a good time to start talking about what this verification thing is actually about. The goal of chip verification is to identify and correct a design defect before it goes to the manufacturing stage. You want to avoid a situation where a microchip does not perform as expected.
There are many types of verification in the industry: functional verification, timing verification, layout verification, functional test, and built-in self-test, or BIST. Of these, however, functional verification is the most time-consuming and resource-heavy process.
If you have not already, I recommend that you go through my earlier piece on EDA software - what it is and how it helps with the overall chip design process - for background.
In that video, I talked about logic design, also known as circuit design. This is where the engineer translates abstract requirements into circuits with the logic capable of meeting those requirements. I used the metaphor of a UX designer who crafts how a software feature might look, act, and feel.
The output of the logic design stage is a cluster of groupings of logic and memory circuits connected together with wires. This grouping is referred to as a "netlist". But the designer is not directly working with those circuits on an individual basis.
There are millions of circuits; working with them one by one is simply not practical.
Rather, the engineer creates the logic using a "hardware description language" or HDL, a high-level abstraction supported by the EDA tools. The two major languages are Verilog and VHDL. Both are open standards.
The engineer is able to create the circuits with this high level language. But how do we know that the whole thing works as it is intended, and covers all the relevant corner cases? That is what functional verification is for. Finding errors amidst a sea of HDL code.
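To make "a sea of HDL code" a little more concrete, here is a toy sketch I put together purely for illustration - not something taken from a real design. It is a 2-to-1 multiplexer in Verilog, plus a tiny testbench that checks exactly one hand-picked case. Functional verification is the discipline of asking whether tests like this actually cover every case that matters.

```systemverilog
// Toy illustration only: a 2-to-1 multiplexer and a one-case testbench.
module mux2 (
    input  wire sel,  // select signal
    input  wire a,    // input chosen when sel = 0
    input  wire b,    // input chosen when sel = 1
    output wire y     // selected output
);
    assign y = sel ? b : a;
endmodule

module mux2_tb;
    reg  sel, a, b;
    wire y;

    mux2 dut (.sel(sel), .a(a), .b(b), .y(y));

    initial begin
        // One hand-written test case. The verification question is:
        // do cases like this cover everything that matters?
        sel = 0; a = 1; b = 0;
        #1;
        if (y !== 1'b1) $display("BUG: expected y=1, got %b", y);
        $finish;
    end
endmodule
```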
A Practical Explanation
Okay, so that might have made intuitive sense while still feeling super abstract. Here is a basic example.
A company is hired by the city to build a controller for a traffic light directing traffic between two streets: Red Street and Blue Street.
The city tells the company to have the traffic light check the sensors embedded in its intersection. If a sensor shows traffic waiting at a street, then the light turns green for that street for one minute.
The company writes up an algorithm to meet this specific spec.
They design a chip to execute this algorithm using HDL code.
The HDL can then be used to fab a chip that controls the light.
Then a problem emerges. The algorithm fulfills the requested spec.
But the resulting behavior neglects part of the intent: the light needs to let cars from each side through in a fair manner.
The way the algorithm was written, it does not do this. Instead, it checks Red Street first, and thus always lets the cars waiting at Red Street through. Blue Street is starved, its cars unable to pass until Red Street is entirely empty.
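Here is a minimal sketch of what that flawed controller logic might look like in Verilog. The signal names are invented for illustration, and the one-minute timer is omitted to keep it short. The point is simply that checking Red Street first bakes the unfairness into the hardware.

```systemverilog
// Hypothetical, simplified sketch of the flawed controller (timer omitted).
module traffic_ctrl (
    input  wire clk,
    input  wire red_waiting,   // sensor: cars waiting on Red Street
    input  wire blue_waiting,  // sensor: cars waiting on Blue Street
    output reg  red_green,     // 1 = green light for Red Street
    output reg  blue_green     // 1 = green light for Blue Street
);
    always @(posedge clk) begin
        if (red_waiting) begin           // Red Street is always checked first...
            red_green  <= 1'b1;
            blue_green <= 1'b0;
        end else if (blue_waiting) begin // ...so Blue Street only gets a green
            red_green  <= 1'b0;          // once Red Street is completely empty.
            blue_green <= 1'b1;
        end else begin
            red_green  <= 1'b0;
            blue_green <= 1'b0;
        end
    end
endmodule
```

The code meets the letter of the spec - a street with waiting cars gets a green light - while quietly violating the intent.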
In the software world, you can simply push up an update to fix this. But this is hardware, which means an entirely new chip needs to be made for the traffic light controller.
Now, this is an ultra-simplified example. A real company would not fab an entirely new chip for such a rudimentary purpose. But you get the point.
This particular example addresses functionality. But security is a significant concern as well. Today's devices are intimately woven into our lives. You want to avoid hardware-level vulnerabilities that might expose people to data theft or worse.
Thus we might say that verification is the design process in reverse. In design, you start with the specification and create an implementation. With verification, you take the implementation and trace it back to determine whether it meets the specification as well as the intent of the user.
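To see what tracing an implementation back to the intent can look like in practice, here is a hedged example: a SystemVerilog assertion one might write against the toy controller sketched above, encoding the fairness requirement that a car waiting on Blue Street must eventually get a green light. The 120-cycle bound is an arbitrary number chosen for illustration.

```systemverilog
// Illustrative checker bound to the hypothetical controller's signals.
module traffic_ctrl_checker (
    input wire clk,
    input wire blue_waiting,
    input wire blue_green
);
    // Intent, not just spec: a waiting car on Blue Street must get a green
    // light within some reasonable bound (120 cycles is a made-up figure).
    property blue_street_not_starved;
        @(posedge clk) blue_waiting |-> ##[1:120] blue_green;
    endproperty

    assert property (blue_street_not_starved)
        else $error("Blue Street starved: no green within 120 cycles");
endmodule
```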
Verification Life Cycle
So what do modern verification flows look like today? Of course, these things vary from company to company - same as with the chip design flows I discussed in the last video. But here is a skeleton outline.
Verification should be embedded throughout the entire design process. So you start with verification planning, creating the test plan, defining the IPs needed, and so on.
The next step is to verify that the architecture is correct, making sure to catch any high-level protocol errors while everything is still very abstract and at an early stage.
You might want to create "architectural models", which simulate the device's target use cases and workload. This helps flesh out the device's security, performance and power needs. It also aids with low-level software and firmware development, which is often being done at the same time.
After that is a very intensive step called pre-silicon verification. Here, pre-designed chip IPs are tested to see whether they work as advertised. If an IP has already been pre-verified, that step can be skipped - a significant time savings. Then the pre-designed IP is integrated into the overall SOC and tested at the system level. This is repeated as other IPs are added to the final design.
As part of this pre-silicon verification step, it is general practice to prototype the hardware code on emulators or on reconfigurable chips called Field Programmable Gate Arrays, or FPGAs.
An FPGA is designed to be configurable after fabrication, making it ideal for testing hardware/software use cases like an OS boot.
Once everything works, the design goes to a fab to be manufactured into an actual chip.
Once the fab does its job, the chip goes through another lengthy and detailed process called post-silicon verification. This is the last step before the chip goes out to customers.
Here, the major benefit is that you are able to run the use case tests on actual silicon hardware. The major drawback however, is that debugging is much more difficult since obviously you cannot easily dissect what exactly is going on inside the thing.
The Verification Gap
Over time, verification has come to take up a significant portion of total product development time. It is often said that functional verification alone takes up 70% of the design team's energy and time in the product development cycle.
I do want to note that this statistic is clearly not a scientific measurement - I have also seen claims of 40%, 50%, and 60% depending on the source. Regardless, the point is clear: this stuff is time-consuming.
I spoke about the rapid growth in design verification engineer jobs in the past ten or so years. The same study finds that individual chip designers spend a lot of their working time on verification:
> In 2014, design engineers spent on average 47 percent of their time involved in design activities and 53 percent of their time in verification. While in 2018 we found that design engineers spent on average 54 percent of their time involved in design activities and 46 percent of their time in verification.
The expansion in verification jobs and time spent is because of what we call the "verification gap". Verification tools simply cannot keep up with the accelerating complexity of today's chip designs.
The sheer number of possible test cases - in some designs estimated to be around 10 to the 80th power - means that it is impossible to check every one. For perspective, a single 256-bit input alone has 2^256, or roughly 10^77, possible values. The exploding numbers are exacerbated by a few modern design trends - two of which I want to specifically call out here.
Trend 1: The System on Chip
One of the fastest growing trends in modern chip design is related to the aforementioned SOC. An SOC integrates a whole bunch of different, often pre-designed IP components together on one microchip. The benefits of an SOC design are gains in performance, size, and power consumption.
A well known example is the Apple A-series chips. These are systems on chips that integrate a CPU with dedicated neural network hardware, GPU, image processing, and more.
The additional design work associated with building an SOC is not that great. But the verification impact gets very large. This is because you are essentially integrating other people's code into your own project, and it all needs to be tested.
Imagine today's web programming projects, where a single finished project relies on bunches of other people's projects. Despite people's best efforts, bugs in these dependencies can easily cascade throughout the whole thing. Which is why a lot of software programmers concern themselves with code coverage.
Intel studied and classified the 7,855 bugs in its Pentium 4 processor design prior to tapeout. Pentium 4 was released in 2000. Old, I know.
They found that the two most common bug categories stemmed from careless coding - typos and cut-and-paste errors - and from miscommunication. These types of bugs grow linearly with the lines of code in a project. And since projects are indeed growing larger at a rapid rate, that means more bugs are lurking inside and need to be found.
Trend 2: The Shortening Design Cycle
The second major trend in the industry has to do with the shortening industry design cycle. It used to be that teams had a lot of time, about 3-4 years, from the exploration step to production.
But intense competition has aggressively shortened that cycle. Companies like Nvidia consider their rapid iteration and release cycle a key part of their overall business strategy. And if one company does it, then the rest of them need to speed up lest they lose market share.
This increases the risk of shipping a product with an undetected bug. Even with multiple teams working in unison, a shorter product cycle makes it easier for errors and misunderstandings to slip through the cracks.
Recent reliable survey data for what are called "respins", an immensely costly situation where a chip has to go back to the drawing board, is hard to come by. Proprietary information, I guess.
But I managed to dig up something from a 2002 industry study that claims that over half of all chip designs in North America fail the very first time they are fabbed.
And a white paper from Wilson Research Group - commissioned by Siemens, so take it with a grain of salt - corroborates this, finding that only about 32% of projects in 2020 achieved success on the first "spin". That means 68% failed.
New Solutions
The explosion in verification work in the late 2000s told companies that they had underestimated the scope of the task. And it gave them a kick in the pants to mature their verification tools and procedures.
Today's verification teams use simulation and FPGA emulation a whole lot more than before. Simulators are a backbone of EDA product offerings. And FPGA emulation allows the hardware and software to be tested together. The one big drawback with FPGAs, of course, is that managing such a flow is a major engineering effort in and of itself.
Another popular solution is a methodology called constrained random verification. It replaces traditional hand-written directed tests with a system that generates a whole bunch of random tests. The verification engineer only needs to provide the system with a template.
The example I have seen used to explain this is a keyboard. You want to test if hitting the A-button properly creates the signal for the letter "A". You create a scenario where you hit the A-button a few times.
Well, it worked fine in that one scenario, but perhaps it might fail in a different one? Like after you hit the B-button? But how do you test all the different scenarios in which you hit the letter "A"?
Thus, you tell the random test generator to generate a huge number of scenarios - random key presses in a variety of different sequences for your keyboard. If the A-button passes all of them, then you can be reasonably certain that the A-button works fine.
So kind of like that, but expanded from just 1 button to many possible hardware inputs.
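As a rough sketch of what that "template" looks like in practice, here is a made-up SystemVerilog example: a transaction class describing a single key press, with constraints that keep the random values legal. The class and field names are invented purely for illustration; a real environment would wrap this in a full testbench methodology like UVM.

```systemverilog
// Illustrative constrained-random stimulus "template" (names invented).
class key_txn;
    rand bit [7:0]    keycode;     // which key gets pressed
    rand int unsigned hold_cycles; // how long it is held down

    // Constraints keep values legal; the tool picks everything else at random.
    constraint legal_key  { keycode inside {[8'h04 : 8'h27]}; }  // letter/number keys, USB HID-style codes
    constraint hold_range { hold_cycles inside {[1:20]}; }
endclass

module keyboard_random_tb;
    initial begin
        key_txn txn = new();
        repeat (1000) begin
            if (!txn.randomize()) $error("randomization failed");
            // In a real bench this would drive a keyboard model;
            // here we just print the generated stimulus.
            $display("press key 0x%0h for %0d cycles", txn.keycode, txn.hold_cycles);
        end
    end
endmodule
```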
There are many ways to go from here. Engineering teams are starting to apply machine learning and data mining to the constrained random verification procedure to automate the creation of the test template. That sounds pretty cool, but is far beyond the scope of this little video.
Conclusion
I have dabbled in web and iOS programming a little bit. One of the coolest parts was all the open source libraries, offering cool features for anyone to add to their own projects. Slap together a few libraries, wire them up and BOOM, you have something that kind of looks professional on the surface.
Now of course, there are vast differences. Chip design teams are not afforded the same flexibilities as their software counterparts. But in making this video, I was a little surprised by the connections between that scenario and today's hardware SOCs.
Verification largely remains as much an art as a science. You can measure whether a simulation exercises a line of code, but that number says nothing about whether that line fully meets its intended functionality.
As the industry continues to grind ahead into the future, and more integrated circuits enter our lives, this verification gap is going to matter a lot more in terms of functionality and security.