In this paper, we present CASCADE, a standard super-cell-based design methodology, together with its supporting automated design flow and associated design tools, for three-dimensional (3D) implementations of a class of interconnect-heavy application-specific very-large-scale integrated (VLSI) circuits. In CASCADE, a system is first partitioned and synthesized with standard 2D design tools into a set of super-cells of equal height and varying width. This reduces the 3D design problem to 3D super-cell placement and 3D via assignment. A congestion-driven simulated annealing method finds a 3D placement of super-cells that minimizes the total wire length, the length of the longest wire, the number of 3D vias, and the routing density. To estimate the routing density of a 3D grid space efficiently within the optimization loop, we have developed a simple probabilistic congestion model with incremental congestion computation. Once the super-cell placement is fixed, the problem of assigning 3D vias so as to achieve minimal 2D routing densities and a uniform 3D via distribution is solved by an efficient min-cost max-flow method. The proposed methods have been implemented and tested on a set of ISPD98 circuit benchmarks. Experimental results show that the proposed congestion-driven 3D super-cell placement and flow-based 3D via assignment tools yield placements with small area, low congestion, short wire length, and few, uniformly distributed 3D vias. Further, we observe an excellent correlation between the routing density estimated by our model and the actual routing performed by a commercial router. We have applied the proposed 3D design methodology, tools, and flows to tape out a Low-Density Parity-Check (LDPC) decoder of more than 4 million gates in a 3-tier 0.18 µm FDSOI 3D CMOS process manufactured by MIT Lincoln Laboratory. 
Post-layout simulation of this DRC-clean layout showed an approximately 10× improvement in the power-delay-area product compared with a 2D implementation in the same process.
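
To make the flow-based 3D via assignment step concrete, the following is a minimal illustrative sketch, not the CASCADE implementation. It assumes that each net has a single preferred crossing point, that candidate via sites sit at fixed coordinates, that the assignment cost is the Manhattan distance between the two, and that a per-site capacity limit stands in for the uniform-distribution objective. The names `MinCostFlow` and `assign_vias` are hypothetical, and the solver is a plain successive-shortest-path min-cost max-flow rather than any particular algorithm from the paper.

```python
from collections import deque

class MinCostFlow:
    """Successive-shortest-path min-cost max-flow (illustrative, not optimized)."""

    def __init__(self, n):
        self.n = n
        # Each edge is [to, residual_capacity, cost, index_of_reverse_edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            # Queue-based Bellman-Ford: handles negative residual costs.
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, i)
                        if not in_queue[v]:
                            queue.append(v)
                            in_queue[v] = True
            if prev[t] is None:  # no augmenting path left
                return flow, cost
            # Find the bottleneck along the path, then push flow through it.
            push, v = float("inf"), t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:
                u, i = prev[v]
                self.graph[u][i][1] -= push
                self.graph[v][self.graph[u][i][3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]

def assign_vias(nets, sites, capacity):
    """Assign each net to a via site; returns ({net_index: site_index}, total_cost).

    nets, sites: lists of (x, y) points. capacity: assumed max vias per site.
    """
    n, m = len(nets), len(sites)
    source, sink = 0, n + m + 1
    mcf = MinCostFlow(n + m + 2)
    for k, (nx, ny) in enumerate(nets):
        mcf.add_edge(source, 1 + k, 1, 0)
        for j, (sx, sy) in enumerate(sites):
            mcf.add_edge(1 + k, 1 + n + j, 1, abs(nx - sx) + abs(ny - sy))
    for j in range(m):
        mcf.add_edge(1 + n + j, sink, capacity, 0)
    _, total_cost = mcf.solve(source, sink)
    assignment = {}
    for k in range(n):
        for v, cap, _, _ in mcf.graph[1 + k]:
            if v > n and cap == 0:  # saturated net-to-site edge carries the via
                assignment[k] = v - n - 1
    return assignment, total_cost
```

In this sketch, lowering `capacity` forces vias to spread across more sites at the expense of longer 2D detours, which is one simple way to trade routing density against via uniformity; a net is left out of the returned mapping if the sites run out of capacity.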