Floorplanning a Better Chip with AI
Machine learning has been in the news a lot lately. Some of the early hype has died down, but the trend still lives on. And now it has really started to make waves in the chip design world.
Machine learning and AI in chip design is such a sprawling field that I started to lose myself in all the research. So I decided to focus on one recent breakthrough in the chip design field: floorplanning.
Google has been applying the same AI prowess that allowed them to badly beat the best Go masters to this obscure, but important sub-category of the field.
And a recent paper in the field by (and I am going to butcher this name) Azalia Mirhoseini and Anna Goldie illustrated their success with this approach.
TechTechPotato already did a video mentioning this. It was a great piece. Go check it out.
For my take, I will try to avoid treading old ground and go a little bit deeper. Feel free to scream at me in the comments if I fail.
Floorplanning in Physical Design
I recommend that you go through my earlier piece on EDA software - what it is and how it helps with the overall chip design process.
Floorplanning is the first major step in physical design. If you recall from our discussion about how chips are designed, the physical design stage comes after the logic design stage.
After the logic designer finishes their work, the chip is a cluster of groupings of logic and memory circuits connected together with wires. This grouping is referred to as a "netlist".
In modern chip design, additional abstractions are introduced. Tens of millions of logic gates are grouped together into things called standard cells. Thousands of memory blocks are grouped into things called macro blocks.
The physical designer's job is to put all of those onto the chip canvas and then wire them up with tens of kilometers of wiring. Tens of kilometers of wiring on a chip the size of a fingernail - modern technology is amazing.
Considerations in Floorplanning
The goal of floorplanning is to place and arrange the blocks and superblocks in a way that best meets all the requirements without any overlaps. Moving blocks around. Sounds literally like child's play, right?
One of the big goals is to minimize the amount of "white space", which is any space in the floorplan uncovered by a block.
In addition to this, however, the floorplanning chip designer needs to keep in mind a bundle of other factors.
One is that in modern VLSI design the area and size and shape of the chip's floorplan tends to be fixed. In other words, someone else has already decided that the chip cannot be larger than this or that size. This is often the case for a mobile phone, where the smaller the chip is, the more battery can be stuffed inside.
Another consideration is wire placement. The chip components should be placed and the wires between them should be connected in a way that minimizes latency, power consumption, and heat. Intel has run studies and found that 51% of a microprocessor's power is consumed when driving signals through its interconnects.
Traditional EDA tools are not really equipped for this. They are better suited to positioning millions of small cells within specified parameters. Floorplanning presents the opposite problem: placing large blocks with little information about the chip's future parameters.
It is like trying to figure out where to place all the doors, hallways, wall sockets, water connections, and windows in a house. But without yet knowing where the rooms are and how they will be used. Oh and the house has over 10 million rooms.
The possibilities get simply mind boggling. If Go is a whole level more complicated than chess, then chip floorplanning is yet another level higher.
The Mirhoseini and Goldie paper says that chess has about 10 to the 123rd power possible states.
Go, 10 to the 360th power.
Chip floorplanning? Up to 10 to the 9000th power.
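To get a feel for how numbers like these blow up, here is a rough back-of-the-envelope sketch in Python. It uses a deliberately simplified model that is my own assumption, not the paper's counting method: n distinct blocks dropped onto n distinct slots, one block per slot, which gives n factorial arrangements. The function name is made up for illustration.

```python
import math

def log10_arrangements(n: int) -> float:
    """Return log10(n!), the base-10 exponent of the number of ways to
    assign n distinct blocks to n distinct slots (one block per slot)."""
    # lgamma(n + 1) is ln(n!); dividing by ln(10) converts it to base 10.
    return math.lgamma(n + 1) / math.log(10)

for n in (10, 100, 1000):
    exponent = log10_arrangements(n)
    print(f"{n} blocks -> roughly 10^{exponent:.0f} arrangements")
```

Even in this toy model, 1,000 blocks already produce a count with more than 2,500 digits, and a real netlist has far more than 1,000 things to place.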
Every time TSMC or Samsung debuts a new leading edge node, the netlists get even larger and the problem gets even harder.
Old Ways: Simulated Annealing
Currently, chip designers tackle the issue with a variety of approaches.
One of the most popular is called simulated annealing. Simulated annealing uses an objective equation to calculate a particular floorplan's cost. This cost is usually based on a few objective input factors - usually how long the wires are and how large the area size is.
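Here is a minimal sketch of what such a cost equation might look like. The specifics are my assumptions rather than anything a particular tool does: wire length is estimated with half-perimeter wirelength (a common shortcut), area is the bounding box around all the blocks, and the weight alpha and the block names are hypothetical.

```python
from typing import Dict, List, Tuple

# A block is (x, y, width, height); (x, y) is its lower-left corner.
Block = Tuple[float, float, float, float]

def bounding_area(blocks: Dict[str, Block]) -> float:
    """Area of the smallest rectangle that encloses every block."""
    x_min = min(x for x, _, _, _ in blocks.values())
    y_min = min(y for _, y, _, _ in blocks.values())
    x_max = max(x + w for x, _, w, _ in blocks.values())
    y_max = max(y + h for _, y, _, h in blocks.values())
    return (x_max - x_min) * (y_max - y_min)

def wirelength(blocks: Dict[str, Block], nets: List[List[str]]) -> float:
    """Half-perimeter estimate: for each net, the width plus height of the
    box drawn around the centers of the blocks that net connects."""
    total = 0.0
    for net in nets:
        xs = [blocks[name][0] + blocks[name][2] / 2 for name in net]
        ys = [blocks[name][1] + blocks[name][3] / 2 for name in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def cost(blocks: Dict[str, Block], nets: List[List[str]], alpha: float = 0.5) -> float:
    """Weighted sum of area and wirelength; lower is better."""
    return alpha * bounding_area(blocks) + (1 - alpha) * wirelength(blocks, nets)

blocks = {"cpu": (0, 0, 4, 4), "cache": (4, 0, 2, 4), "io": (0, 4, 6, 1)}
nets = [["cpu", "cache"], ["cpu", "io"], ["cache", "io"]]
print(cost(blocks, nets))
```

For this tiny layout the area is 30 and the estimated wirelength is 11, so the cost works out to 20.5. A real cost function would normalize the two terms, since area and length are not in the same units.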
By turning the floorplan into an equation, a computer can now iteratively search for the lowest-cost solution it can find, ideally the global optimum. Kind of like how your graphing calculator tried to solve an equation back in math class.
There are a few flaws with simulated annealing. For one thing, the program might get stuck at a local optimum close to where it started its search. The program thus errs in declaring victory too early - unaware that an even better global optimum lies just over the hill.
Thus, sometimes we might ask the algorithm to modify its behavior when going up or down the "hills" of the cost curve. These "hill-climbing methods" help the algorithm to escape these local optima in search of the best solution.
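The search loop described above can be sketched in a few lines of Python. This is a toy under my own assumptions, not how any production tool works: the "floorplan" is just a single row of blocks, the cost is the total distance between connected blocks, and the temperature schedule numbers are arbitrary. The key line is the acceptance test, which occasionally takes a worse layout so the search can climb out of a local optimum.

```python
import math
import random

def total_wirelength(order, nets):
    """Sum of distances between connected blocks placed in a single row."""
    position = {block: slot for slot, block in enumerate(order)}
    return sum(abs(position[a] - position[b]) for a, b in nets)

def anneal(blocks, nets, start_temp=5.0, cooling=0.995, steps=5000, seed=0):
    rng = random.Random(seed)
    current = list(blocks)
    current_cost = total_wirelength(current, nets)
    best, best_cost = current[:], current_cost
    temp = start_temp
    for _ in range(steps):
        # Propose a small change: swap two blocks.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = total_wirelength(current, nets)
        delta = new_cost - current_cost
        # Always accept improvements; accept a worse layout with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        else:
            # Rejected: undo the swap.
            current[i], current[j] = current[j], current[i]
        temp *= cooling
    return best, best_cost

blocks = list("ABCDEFGH")
nets = [("A", "H"), ("H", "B"), ("B", "G"), ("G", "C"),
        ("C", "F"), ("F", "D"), ("D", "E")]
start_cost = total_wirelength(blocks, nets)
best, best_cost = anneal(blocks, nets)
print(start_cost, best_cost, "".join(best))
```

Early on, when the temperature is high, almost any swap gets accepted. As it cools, the loop behaves more and more like a plain "only take improvements" search.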
The sheer number of placement possibilities leads to very long search times even if the cost function is by itself not all that difficult to run. As a result, designers have introduced very creative tweaks to the approach.
One tweak would be to use "compacted floorplans" - where no module can move any further left or down. This allows the floorplan to be represented as an ordered binary tree of nodes. Then we can use a computer to try to optimize each node's x and y coordinates in relation to the root node. Pretty creative I have to say.
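As a rough illustration of the tree idea, here is a small sketch. I am assuming the convention used by the B*-tree representation, which the text above seems to describe but does not name: a node's left child sits immediately to the right of its parent, and its right child sits above the parent at the same x. The class and function names are mine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    name: str
    width: float
    height: float
    left: Optional["Node"] = None   # placed immediately to the right of this block
    right: Optional["Node"] = None  # placed above this block, at the same x

def pack(root: Node) -> Dict[str, Tuple[float, float]]:
    """Turn the tree into lower-left (x, y) coordinates for every block."""
    placed: List[Tuple[float, float, float]] = []  # (x_start, x_end, top)
    coords: Dict[str, Tuple[float, float]] = {}

    def place(node: Node, x: float) -> None:
        x_end = x + node.width
        # The block rests on top of whatever already occupies its x-range.
        y = max((top for s, e, top in placed if s < x_end and e > x), default=0.0)
        coords[node.name] = (x, y)
        placed.append((x, x_end, y + node.height))
        if node.left:
            place(node.left, x_end)
        if node.right:
            place(node.right, x)

    place(root, 0.0)
    return coords

root = Node("A", 4, 4, left=Node("B", 2, 4), right=Node("C", 6, 1))
print(pack(root))
```

Here B lands to the right of A at x = 4, and C sits on top of both at y = 4. The appeal is that the optimizer only has to shuffle tree nodes around; the coordinates fall out of the packing step.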
There is more than one way to skin a cat. Before we close, here are two other methods: The Analytical Method and Partitioning.
The Analytical Method uses an objective mathematical formula and a set of constraints to try to calculate the smallest possible floorplan. It is like trying to solve one of those algebra equations you get in school.
And Partitioning attempts to overcome the larger problem by breaking it into smaller circuits. You take passes and in each pass you divide the canvas into smaller sets, optimizing all the way down, and so on.
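To make the "divide the canvas in passes" idea a little more concrete, here is a heavily simplified sketch. Real partitioners try to cut as few wires as possible when they split a circuit; this toy ignores wiring entirely and only balances block area between the two halves, alternating vertical and horizontal cuts. The block names, areas, and function name are all hypothetical.

```python
from typing import Dict, Tuple

Region = Tuple[float, float, float, float]  # x, y, width, height

def bisect(areas: Dict[str, float], region: Region,
           vertical: bool = True) -> Dict[str, Region]:
    """Recursively split the blocks into two groups, and the region into two
    sub-regions sized in proportion to each group's total area.
    Assumes every block has a positive area."""
    names = sorted(areas, key=areas.get, reverse=True)
    if len(names) == 1:
        return {names[0]: region}
    # Greedy balancing: hand each block to whichever group is lighter so far.
    group_a: Dict[str, float] = {}
    group_b: Dict[str, float] = {}
    for name in names:
        lighter = group_a if sum(group_a.values()) <= sum(group_b.values()) else group_b
        lighter[name] = areas[name]
    share = sum(group_a.values()) / sum(areas.values())
    x, y, w, h = region
    if vertical:
        first, second = (x, y, w * share, h), (x + w * share, y, w * (1 - share), h)
    else:
        first, second = (x, y, w, h * share), (x, y + h * share, w, h * (1 - share))
    # Alternate the cut direction on each pass down.
    result = bisect(group_a, first, not vertical)
    result.update(bisect(group_b, second, not vertical))
    return result

areas = {"cpu": 16.0, "gpu": 10.0, "cache": 8.0, "io": 6.0}
regions = bisect(areas, (0.0, 0.0, 8.0, 5.0))
for name, region in regions.items():
    print(name, region)
```

Because the canvas here has exactly the same total area as the blocks, every block ends up with a region of its own size. The shapes can come out long and thin, which is one reason real tools do a lot more work than this.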
I am not really satisfied with any of these explanations, but this is not an electrical engineering course.
The reality is that none of these are by themselves a silver bullet. In the end, designers do this semi-manually with a potpourri of methods that best fit their circumstances and constraints as well as a healthy dose of human intuition. No one uses just one thing.
Since it takes so long to test and iterate, it becomes a real drag on chip team productivity and iteration speed. If we had years to develop and release a new chip design, then this would not be a game-breaker. But deadlines exist.
So the whole thing is ripe for trying a new approach. And that is where machine learning comes into the picture.
An Intro to Machine Learning
So what is the big deal about machine learning and AI? I will take a pause to briefly explain it.
The word AI means nothing now - a broad blanket term kind of like "the cloud". Many things have AI. Video game characters and robot vacuums, for instance. Machine learning, however, does mean something very specific and is not necessarily interchangeable with AI.
Scientists have been developing AI since the 1950s. However, those approaches centered on the concept of “symbolic AI”. Symbolic AI assumes that human knowledge can be approximated using large amounts of hand-written rules and sets.
This had its successes. But as it turned out, researchers underestimated how much implicit knowledge we humans all share about the world and ourselves. Basic things like, "If President Biden is in Washington, then his left foot is also in Washington." Common sense, essentially.
Machine learning is a new AI approach dedicated to overcoming this problem, enabled by recent advancements in GPU processing power. Rather than having humans write the rules, let data do it.
Neural networks are one implementation of the concept. You feed the neural network some data. The network's results are compared against pre-labelled results. The difference between the two is called the "error".
The goal is to minimize the error without over-fitting to the data. Depending on the results, tweaks are made and the whole thing retrained or retested.
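The feed, compare, and tweak loop can be shown with the smallest possible example. To be clear about what this is: it is a single "neuron" fitting a straight line with plain gradient descent, not a real neural network, and the data, learning rate, and epoch count are made up for illustration.

```python
def train(samples, learning_rate=0.05, epochs=2000):
    """Fit y = weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, label in samples:
            # The "error": the model's output minus the pre-labelled answer.
            error = (weight * x + bias) - label
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # The "tweak": nudge each parameter in the direction that shrinks the error.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

# Pre-labelled data generated from the hidden rule y = 2x + 1.
samples = [(x, 2 * x + 1) for x in range(5)]
weight, bias = train(samples)
print(round(weight, 3), round(bias, 3))
```

Nobody told the program the rule. It recovers a weight close to 2 and a bias close to 1 purely from the labelled examples, which is the "let data do it" idea in miniature.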
Sometimes you are going to get some weird results. And in those situations you might have to make more extensive tweaks to the entire process.
But when done right, well-trained networks can indeed work uncannily well.
Machine Learning How to Floorplan
The Google team approached this problem as if it were a game. And it kind of is. Like Go, there is a board (the chip canvas) and varying pieces to place on that board (your netlist blocks). There are even "win conditions", though these depend on the relative importance of the various floorplan evaluation metrics.
The goal thus would be to train a neural network capable of dynamically helping the chip designer with their floorplanning efforts. In other words, help them win the game.
So if you look at how AlphaGo works, a trained neural network helps the player identify the best moves and the winning percentages of those moves. The player can decide whether or not to actually place the stone, making the network a tool.
My best guess is that Google's AI Floorplanner acts similarly - though I cannot quite find an explanation in the paper of how the actual tool works. But it does go over how they created the tool.
First, the team set the broad parameters of the chip - netlist metadata, process technology node, for instance. These are fed into the neural network.
The neural network is then trained. This is done by "showing it" many episodes of states, actions and rewards. The computer is shown a chip canvas - a "state".
It then places a series of macro blocks (which if you recall from earlier are the aggregations of memory blocks) onto the chip canvas - the "actions". The standard cells or logic circuits are then placed with a standard EDA tool.
The final canvas is evaluated for its reward. The reward is negatively correlated with wirelength, congestion, and density: the lower those are, the higher the reward. These are standard measures of a floorplan’s “favor”. The reward information is then provided back to the network for future training.
Over time, the network is shown enough "episodes" of this state+action+reward cycle that it can look at any chip "canvas" and know where to set its macros so as to maximize the final reward.
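The state, action, and reward cycle can be sketched with a toy. Everything here is a stand-in of my own and not the paper's setup: a 4 x 4 grid instead of a chip canvas, three hypothetical macros, a reward based on wirelength alone, and a simple table of preferences trained with a basic policy-gradient (REINFORCE) update instead of a neural network. There is also no standard-cell placement step.

```python
import math
import random

GRID = 4                      # toy canvas: 4 x 4 cells
MACROS = ["m0", "m1", "m2"]   # hypothetical macro names
NETS = [("m0", "m1"), ("m1", "m2")]

def reward(placement):
    """Negative Manhattan wirelength: shorter wires mean a higher reward."""
    total = 0
    for a, b in NETS:
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return -total

def run_episode(prefs, rng):
    """One episode: place each macro in turn on a free cell (the actions)."""
    placement, steps, free = {}, [], list(range(GRID * GRID))
    for m, macro in enumerate(MACROS):
        top = max(prefs[m][c] for c in free)
        weights = [math.exp(prefs[m][c] - top) for c in free]  # softmax, stabilized
        total = sum(weights)
        cell = rng.choices(free, weights=weights)[0]
        steps.append((m, cell, {c: w / total for c, w in zip(free, weights)}))
        placement[macro] = divmod(cell, GRID)
        free.remove(cell)  # occupied cells are masked out of later choices
    return placement, steps

def train(episodes=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0] * (GRID * GRID) for _ in MACROS]
    baseline, history = 0.0, []
    for _ in range(episodes):
        placement, steps = run_episode(prefs, rng)
        r = reward(placement)
        history.append(r)
        advantage = r - baseline           # better or worse than usual?
        baseline += 0.05 * (r - baseline)  # running average of past rewards
        for m, cell, probs in steps:
            for c, p in probs.items():
                # Gradient of log-probability of the chosen cell under softmax.
                grad = (1.0 if c == cell else 0.0) - p
                prefs[m][c] += lr * advantage * grad
    return prefs, history

prefs, history = train()
print(sum(history[:300]) / 300, sum(history[-300:]) / 300)
```

The two printed numbers are the average reward over the first and last 300 episodes. The point of the exercise is the feedback loop: placements that score better than usual become more likely next time, and nobody ever writes down a rule for where a macro should go.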
The final results are interesting. In some ways, the neural network performs as well as a human. Most notably, it can do the job much faster.
During the design phase of Google's latest TPU chip, a human expert would have had to iterate for months on the floorplan with the latest EDA tools. They would take the netlist code, manually place the blocks, and then brew coffee for 72 hours while the EDA tool evaluated a floorplan iteration.
With this slow iteration cycle, a TPU-v4 floorplan took about 6-8 weeks for a human to do. But the ML floorplanner, with its trained ability to look far ahead on the proverbial chessboard, was able to do it in just 24 hours.
The paper emphasized the speed. And indeed the speed gains are quite great and have significant worth. But the paper sometimes also said it was "better", and for that I am not quite sure.
Yes, on occasion, the ML floorplanner did better. But the human also did better on other occasions. And then sometimes the industry-standard EDA tool had them both beat. For the most part, on all metrics other than speed, the three methodologies were very close, usually within a 2-5% range.
And I think that is something to consider when it comes to this stuff. Machine learning is not superhuman magic. It is based on data created or curated by humans - and that generally means it will perform about as well as them.
Apple is probably the best chip designer out of the American tech giants. But Google is taking rapid strides in developing their own chip proficiency. Strategically, they seem to have recognized that having their own hardware is critical to achieving superior cost and performance thresholds.
They first began showing this in their data centers, and now in their consumer hardware. The paper notes that the new neural network has already been applied to the chip design process of its latest Google tensor processing unit (TPU).
They have also started to bulk up their hiring presence in Taiwan, so we should expect to see a lot more out of Google in the chip space going forward.
One last thing: if machine learning can help place macro blocks onto a chip canvas much faster than a human can, then there are many other possible use cases for this type of skill. These include hardware design, city planning, and vaccine planning and distribution.
The possibilities are very intriguing and I look forward to seeing this technology appear in more use cases in the future.