ICE5 FPGA utilization

The biggest ICE5UL is sold as "4K" LUTs - but that name is already highly misleading, since the actual LUT counts are

Naming    Actual LUTs
"1K"      1100
"2K"      2048
"4K"      3520

Eh, where did the other 500-odd LUTs go? (If "4K" means 4096, the shortfall is 576.)
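The gap between the name and the real count can be worked out with a small sketch. It assumes "nK" nominally means n x 1024 LUTs, which is an assumption about the naming rather than anything the vendor states; the actual counts are the ones in the table above.

```python
# Marketing names vs. actual LUT counts, taken from the table above.
ACTUAL_LUTS = {"1K": 1100, "2K": 2048, "4K": 3520}


def nominal(name: str) -> int:
    """Nominal LUT count, ASSUMING "nK" means n * 1024."""
    return int(name.rstrip("K")) * 1024


def shortfall(name: str) -> int:
    """Nominal minus actual; a negative value means more LUTs than the name implies."""
    return nominal(name) - ACTUAL_LUTS[name]


for name in ACTUAL_LUTS:
    print(f"{name}: nominal {nominal(name)}, actual {ACTUAL_LUTS[name]}, "
          f"shortfall {shortfall(name)}")
```

Under that assumption only the "4K" part falls short of its name; the "1K" part actually has slightly more LUTs than 1024.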

So far the HDL I am putting in it has a headline count of 2269 LUTs after the Synopsys synthesis tools have run on it.

Input Design Statistics
    Number of LUTs          :   2269

So there's plenty of room yet, right?

Design Legalization

Structurally the ICE5 groups its LUTs in 8s in "PLB"s, Programmable Logic Blocks. LUTs inside the same PLB have privileged access to each other, especially in terms of fast carry generation.

https://warmcat.com/ice5-plb.png

So this structure creates a discontinuity between LUTs that are inside the same PLB and LUTs that are outside it. If you instantiate a counter, for example, the LUTs doing the bits for the counter need to go in the same PLB to get access to the hardware carry acceleration.

After the generic synthesis from the Synopsys tools, which gives the 2269 LUT figure, there is a "Design Legalization" step. The generic synthesis does not know about PLBs and other device-specific restrictions; the "Legalization" step does, so it modifies the synthesized netlist to meet the restrictions by placing the LUTs in PLBs and adding "feedthru LUTs".

No documentation I could find, either in the package or via Google, discusses the exact set of rules applied during legalization. But the results are really expensive, for my design anyway.

Design Legalization Statistics
    Number of feedthru LUTs inserted to legalize input of DFFs      :   508
    Number of feedthru LUTs inserted for LUTs driving multiple DFFs :   0
    Number of LUTs replicated for LUTs driving multiple DFFs        :   5
    Number of feedthru LUTs inserted to legalize output of CARRYs   :   93
    Number of feedthru LUTs inserted to legalize global signals     :   1
    Number of feedthru CARRYs inserted to legalize input of CARRYs  :   3
    Number of inserted LUTs to Legalize IOs with PIN_TYPE= 01xxxx   :   16
    Number of inserted LUTs to Legalize IOs with PIN_TYPE= 10xxxx   :   20
    Number of inserted LUTs to Legalize IOs with PIN_TYPE= 11xxxx   :   0
    Total LUTs inserted                                             :   643
    Total CARRYs inserted                                           :   3

What exactly made the "input of DFFs" 'illegal'? ("Number of feedthru LUTs inserted to legalize output of CARRYs" I can maybe understand: it looks like if you generate any CARRYs, you have to ripple them to the end of the PLB, where there is an external carry output. But that's only 93 LUTs.)
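To see where the inserted LUTs come from, the per-category lines of the report can be tallied with a short script. This is only a sketch: the line format is copied from the report pasted above, and the parsing rules (anything starting with "feedthru CARRYs" counts as a CARRY, everything else as a LUT) are my reading of the labels, not documented behaviour.

```python
import re

# Per-category lines from the Design Legalization Statistics above.
REPORT = """\
    Number of feedthru LUTs inserted to legalize input of DFFs      :   508
    Number of feedthru LUTs inserted for LUTs driving multiple DFFs :   0
    Number of LUTs replicated for LUTs driving multiple DFFs        :   5
    Number of feedthru LUTs inserted to legalize output of CARRYs   :   93
    Number of feedthru LUTs inserted to legalize global signals     :   1
    Number of feedthru CARRYs inserted to legalize input of CARRYs  :   3
    Number of inserted LUTs to Legalize IOs with PIN_TYPE= 01xxxx   :   16
    Number of inserted LUTs to Legalize IOs with PIN_TYPE= 10xxxx   :   20
    Number of inserted LUTs to Legalize IOs with PIN_TYPE= 11xxxx   :   0
"""

LINE = re.compile(r"\s*Number of (.+?)\s*:\s*(\d+)\s*$")


def tally(report: str) -> tuple[int, int]:
    """Return (inserted LUTs, inserted CARRYs) summed over the category lines."""
    luts = carrys = 0
    for line in report.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        desc, count = m.group(1), int(m.group(2))
        if desc.startswith("feedthru CARRYs"):
            carrys += count
        else:
            luts += count
    return luts, carrys


SYNTH_LUTS = 2269  # headline figure from the synthesis step
luts, carrys = tally(REPORT)
print(f"inserted LUTs: {luts}, inserted CARRYs: {carrys}")
print(f"LUT overhead vs synthesis: {100.0 * luts / SYNTH_LUTS:.1f}%")
print(f"share due to DFF inputs: {100.0 * 508 / luts:.1f}%")
```

The sums agree with the "Total LUTs inserted" and "Total CARRYs inserted" lines, and make it plain that the DFF-input feedthrus account for most of the overhead.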

Basically it added 643 LUTs (plus 3 CARRYs) to the design, bloating the LUT count by 28%. The key post-legalization headline figures become

Device Utilization Summary
    LogicCells                  :   2945/3520
    PLBs                        :   411/440
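Turning those "used/available" pairs into percentages makes the mismatch between the two resources easier to see. A minimal sketch, assuming the summary lines always look like the two above (name, colon, used/available):

```python
import re

# Device Utilization Summary lines as shown above.
SUMMARY = """\
LogicCells                  :   2945/3520
PLBs                        :   411/440
"""

PAIR = re.compile(r"^[ \t]*(\w+)[ \t]*:[ \t]*(\d+)/(\d+)[ \t]*$", re.MULTILINE)


def utilization(summary: str) -> dict[str, tuple[int, int, float]]:
    """Map resource name -> (used, available, percent used)."""
    result = {}
    for m in PAIR.finditer(summary):
        used, avail = int(m.group(2)), int(m.group(3))
        result[m.group(1)] = (used, avail, 100.0 * used / avail)
    return result


for name, (used, avail, pct) in utilization(SUMMARY).items():
    print(f"{name}: {used}/{avail} = {pct:.1f}%")
```

So with under 84% of the logic cells in use, over 93% of the PLBs are already taken.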

If the design initially requires more PLBs than are available, the tools try to reallocate LUTs to reduce the number of PLBs. There seem to be placement choices that make the PLB count somewhat fluid, even if it was initially overcommitted; presumably this is about merging two partially used PLBs into one more completely used one. For example, initially (it calls itself an error, but it is not fatal yet):

E2070: Unable to fit the design into the selected device. Number of PLBs in design = 451, available in device = 440

Then after the placer step runs

Device Utilization Summary
    LogicCells                  :   3258/3520
    PLBs                        :   437/440

and the Place and Route seems to complete OK. Even so, this makes the "new normal" living on borrowed time... and the timing of the routed design takes a hit from not being able to place the PLBs as desired.

But if that doesn't work out, your design simply cannot be implemented...

E2070: Unable to fit the design into the selected device. Number of PLBs in design = 462, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 447, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2071: Placement feasibility check failed
E2055: Error while doing placement of the design

... and the build stops dead at placement.
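When watching a build, it helps to pull the PLB numbers out of those E2070 lines and see how far the packer got on each retry. A rough sketch, assuming the message text is exactly as in the log above (the regular expression is fitted to those lines, not taken from any tool documentation):

```python
import re

# Placer output from the failing run above.
LOG = """\
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 462, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 447, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2071: Placement feasibility check failed
E2055: Error while doing placement of the design
"""

E2070 = re.compile(
    r"E2070: .*Number of PLBs in design = (\d+), available in device = (\d+)"
)


def plb_attempts(log: str) -> list[tuple[int, int]]:
    """Return (PLBs needed, PLBs available) for each E2070 line, in order."""
    return [(int(need), int(avail)) for need, avail in E2070.findall(log)]


attempts = plb_attempts(LOG)
overcommit = [need - avail for need, avail in attempts]
print("PLBs over the limit per attempt:", overcommit)
print("gave up:", "E2071" in LOG)
```

In this run the packer claws back 16 PLBs between the first and third attempts, then stalls 6 PLBs over the limit.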

So before you need to worry about running out of LUTs, you need to worry about running out of PLBs.

Nondeterministic packing

Running out of resources is never going to be pleasant, but it's particularly difficult to deal with on ICE5. Although there is a rough relationship between synthesis LUT usage and post-legalization usage (more LUTs means more PLBs), the exact outcome is unpredictable.

Here is a plot of my VHDL synthesis results from the last few days, showing LUT consumption vs PLB consumption

https://warmcat.com/plot-lut-plb.png

You can see even if you are at around 2800 post-legalization LUT usage (80% occupied) you may be using 93% of the PLBs already. Conversely once you are right up against the PLB limit, you can stay there over a spread of 250 LUTs being used or not. So it is very difficult to know when you will actually fail.

Even more confusing, there were two times after I started monitoring this when removing logic from the HDL, and so reducing the LUT usage, increased the corresponding PLB usage, presumably due to some quirk of the packer step.

Basically the PLB usage figure counts a PLB as used if even one of its 8 LUTs is in use. There's no way to see whether you can use whatever spare LUTs are left in a PLB except to try it out. Considering this state of affairs starts at a synthesis LUT usage of 2600, on a device that notionally has ~1000 additional free LUTs, you should significantly derate your expectation of how many LUTs are going to be usable before you enter a kind of twilight scavenging world every time you place and route your design.
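One way to get a feel for this is to split the notionally free LUTs into those sitting in completely empty PLBs and those stranded inside PLBs that are already partly used. The sketch below does that for the two utilization summaries shown earlier. It assumes 8 LUT slots per PLB and treats the LogicCells figure as the number of occupied LUT slots; the second point is my assumption, the tools do not say so.

```python
LUTS_PER_PLB = 8  # PLB size described above


def slack(cells_used: int, plbs_used: int, plbs_total: int) -> dict[str, float]:
    """Break down the free LUT slots, ASSUMING one logic cell occupies one slot."""
    return {
        # Slots in PLBs nothing has been placed in yet.
        "free_in_empty_plbs": (plbs_total - plbs_used) * LUTS_PER_PLB,
        # Slots left over inside PLBs that already count as used.
        "free_in_used_plbs": plbs_used * LUTS_PER_PLB - cells_used,
        # Average occupancy of the PLBs in use.
        "avg_fill": cells_used / plbs_used,
    }


# Figures from the two Device Utilization Summaries above.
for label, cells, plbs in (("fits easily", 2945, 411), ("barely fits", 3258, 437)):
    s = slack(cells, plbs, 440)
    print(f"{label}: {s['free_in_empty_plbs']} free in empty PLBs, "
          f"{s['free_in_used_plbs']} stranded in used PLBs, "
          f"average fill {s['avg_fill']:.2f} LUTs/PLB")
```

On these numbers most of the "free" LUTs are the stranded kind, which the tools give no way to inspect other than attempting another place and route.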