Edge processing architectures in today’s autonomous and AI military systems process an ever-growing amount of sensor data. Many of the systems and devices used for edge processing in forward-deployed environments need to be small, rugged and agile. To handle this extreme workload, system architects must design boards using the fastest field-programmable gate array (FPGA) devices and multicore processors. These devices cannot provide peak performance without massive amounts of high-speed DDR4 memory for resident data and real-time execution. On top of that, the system architect must design these systems to meet the size, weight and power (SWaP) constraints of the smaller, more agile edge processing platforms integral to our warfighters’ mission success. To support the system requirements, each embedded board within the system could need a minimum of 64GB of memory per processor, equating to more than 128 separate commercial-grade memory devices, or multiple dual inline memory modules (DIMMs), for layout on a printed circuit board. This is not a feasible solution for the embedded boards at the core of ultra-compact edge processing architectures in military systems operating in harsh, forward-deployed environments. Instead, high-density, military-grade memory manufactured with state-of-the-art 3D packaging technology must be used to save space and power while maintaining reliability in harsh environments.
The problems are stacking up
The complexity of die stacking and wire bonding increases with each additional die required for a high-density memory design, such as Mercury’s single 16GB DDR4 device or a custom multi-chip module (MCM) device. With so many circuits in a tightly stacked configuration, signal integrity (SI) is at the forefront of design considerations. The two main contributors to compromised SI in the context of this discussion are crosstalk and poor return loss performance.
- Crosstalk is the unwanted voltage noise coupling due to strong mutual inductance and mutual capacitance. More simply stated, it is the interference to a signal in one circuit caused by signal transmission in an adjacent circuit in the die stack.
- Return loss is the loss of signal energy due to impedance discontinuities, which reflect a portion of the signal’s energy back to its source instead of carrying it through to the final termination.
Left unaddressed, these performance issues limit data speeds in stacked memory devices, thereby compromising overall system performance and reliability. In mission-critical military applications, this can lead to catastrophic system failure.
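To make the return loss definition concrete, here is a minimal sketch of the textbook relationship between an impedance step and the energy it reflects. It assumes purely resistive impedances and a 50 ohm reference line; both are illustrative assumptions, not values from Mercury's designs, and the function names are hypothetical. It uses the negative-dB convention found in this post, where a more negative number means less reflection.

```python
import math


def reflection_coefficient(z_load_ohm: float, z0_ohm: float) -> float:
    """Voltage reflection coefficient at a step from z0 into z_load (real impedances)."""
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)


def return_loss_db(z_load_ohm: float, z0_ohm: float) -> float:
    """Return loss in negative dB: 20*log10(|gamma|). More negative is better."""
    gamma = abs(reflection_coefficient(z_load_ohm, z0_ohm))
    if gamma == 0.0:
        return float("-inf")  # perfect match, nothing is reflected
    return 20.0 * math.log10(gamma)


# Illustrative impedance steps on an assumed 50 ohm line.
for z_load in (55.0, 65.0, 75.0):
    print(f"{z_load:.0f} ohm into 50 ohm: {return_loss_db(z_load, 50.0):.1f} dB")
```

In this simplified model, a single 75 ohm discontinuity on a 50 ohm line reflects one fifth of the incident voltage, or roughly -14 dB. A real die stack has many discontinuities whose reflections add with frequency-dependent phase, so this is only a first-order picture.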
Traditional die stacking design topologies have their limits
Traditional multi-chip stacking design methodologies use a branch topology, where multiple transmission lines are routed from the same electrical node. This is an effective design method for DDR2 and DDR3 devices, as it supports the data rates and densities those generations of devices can deliver. Skillfully designed, stacked DDR4 devices are feasible with this method, as seen in Mercury’s 4GB DDR4 device. However, there are inherent limitations at higher capacities: the increased termination path, or bus length, causes signal distortion, and the resulting reflections limit the maximum bandwidth of the transmission line.
As the number of stacked die increases, crosstalk and return loss performance continue to degrade to a detrimental point. Branch topology reaches its maximum capability, which rules it out for highly dense, high-speed DDR4 and DDR5 devices. SI engineers must look at alternative design methodologies to enable the highest-density DDR4 and DDR5 devices for the next generation of smaller, more agile military systems.
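A common rule of thumb helps explain why branches cap bandwidth: an unterminated stub acts like a quarter-wave resonator and puts a notch in the channel response at the frequency where the stub is a quarter wavelength long. The sketch below is a rough estimate only. It assumes an ideal open-ended stub, and the stub lengths and effective dielectric constant are hypothetical values chosen for illustration.

```python
import math

C0_M_PER_S = 299_792_458.0  # speed of light in vacuum


def stub_notch_frequency_hz(stub_length_m: float, eps_eff: float) -> float:
    """First quarter-wave notch of an ideal open stub: f = v / (4 * L)."""
    velocity = C0_M_PER_S / math.sqrt(eps_eff)  # propagation speed in the medium
    return velocity / (4.0 * stub_length_m)


# Hypothetical stub lengths with an assumed effective dielectric constant of 4.0.
for length_mm in (2.0, 5.0, 10.0):
    f_ghz = stub_notch_frequency_hz(length_mm * 1e-3, 4.0) / 1e9
    print(f"{length_mm:.0f} mm stub: first notch near {f_ghz:.1f} GHz")
```

The notch frequency is inversely proportional to stub length, so every extra branch length pulls the first notch down toward the signal's frequency content. Capacitive loading from die pads, which this sketch ignores, would move the notch lower still.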
Eye diagrams are used in this blog to illustrate the quality, or bit-error performance, of high-speed digital signals. Each eye diagram provides a visual representation of millions of transmitted bits, with signal amplitude on the vertical axis and unit interval (UI), or bit period, on the horizontal axis. As digital signals are degraded by frequency-dependent impairments such as crosstalk, the actual signal deviates from the ideal signal. Deviation in signal amplitude is called noise, and deviation in signal timing is called jitter. These unwanted frequency-dependent impairments compromise signal quality, ultimately reducing the signal-to-noise ratio (SNR) below receiver detection thresholds and producing bit errors. The diamond in the center represents the bit error compliance mask, which sets both minimum amplitude and minimum bit period limits. If any bits violate the bit error compliance mask, the digital signal fails to meet the minimum performance requirements.
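The mask test described above can be sketched in a few lines. This example folds each sample's time stamp into a single unit interval and checks whether the sample lands inside a diamond-shaped keep-out region centered in the eye. The mask dimensions and sample values are hypothetical, not taken from any JEDEC specification, and real compliance tools work on measured waveforms with far more samples.

```python
def violates_diamond_mask(t_ui: float, volts: float,
                          half_width_ui: float, half_height_v: float,
                          center_ui: float = 0.5, center_v: float = 0.0) -> bool:
    """True if the sample lies strictly inside the diamond keep-out region."""
    dx = abs(t_ui - center_ui) / half_width_ui
    dy = abs(volts - center_v) / half_height_v
    return dx + dy < 1.0  # interior of a diamond in normalized coordinates


def count_mask_violations(samples, half_width_ui: float, half_height_v: float) -> int:
    """Fold (time_in_UI, volts) samples into one UI and count mask hits."""
    return sum(
        violates_diamond_mask(t % 1.0, v, half_width_ui, half_height_v)
        for t, v in samples
    )


# Hypothetical samples: time in unit intervals, amplitude in volts.
samples = [(0.5, 0.30), (1.5, 0.02), (2.1, 0.00), (3.5, -0.30)]
print(count_mask_violations(samples, half_width_ui=0.25, half_height_v=0.10))
```

Only the second sample fails here: it sits at the center of the eye with almost no amplitude. The third sample also has zero amplitude but falls near a bit transition, outside the mask, which is where zero crossings are expected.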
High-density DDR4 innovation realized
To reach the high-speed requirements of DDR4, SI engineers face two main challenges:
- Reducing crosstalk, which is prominent in designs using non-transverse-electromagnetic (non-TEM) conduits such as a redistribution layer (RDL) and bond wires.
- Meeting a minimum of -12 dB return loss performance.
The solution is an enhanced interconnect layer built on a coplanar topology that supports higher-frequency operation than branch topology. This method shortens the path between the two terminations while eliminating stubs, which improves signal integrity and timing. Routing signal paths sequentially from one die to the next eliminates the reflections associated with the stubs or extra traces seen in branched designs. High-speed data rates are achieved by creating a contiguous signal return path and a linear bus path using microstrip transmission line technology. Additionally, careful choice of signal and return path trace geometry further enables higher data rates and improves return loss.
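To illustrate why trace geometry matters for return loss, the sketch below uses the widely quoted IPC-2141 closed-form approximation for surface microstrip impedance. This is a generic textbook estimate, not Mercury's design method, and it is usually considered reasonable only for width-to-height ratios of roughly 0.1 to 2. The dimensions and dielectric constant are hypothetical.

```python
import math


def microstrip_z0_ohm(width: float, height: float, thickness: float, eps_r: float) -> float:
    """IPC-2141 style approximation for surface microstrip impedance.

    width, height (dielectric thickness) and thickness (of the trace) must
    share the same length unit, since only their ratios matter.
    """
    return (87.0 / math.sqrt(eps_r + 1.41)) * math.log(
        5.98 * height / (0.8 * width + thickness)
    )


# Hypothetical geometry in millimetres with an assumed dielectric constant of 4.3.
for width_mm in (0.25, 0.35, 0.50):
    z0 = microstrip_z0_ohm(width_mm, height=0.20, thickness=0.035, eps_r=4.3)
    print(f"width {width_mm:.2f} mm: about {z0:.1f} ohm")
```

Widening the trace lowers its impedance, so a change in width along the bus creates exactly the kind of impedance discontinuity that degrades return loss. Keeping the signal trace and its return path uniform from die to die avoids that.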
With this topology, achieving a return loss of -16 dB through a delicate balance with crosstalk enables the miniaturization of 18 memory devices into a single compact package while offering 2666 Mbps data rates over military temperature ranges. With this achievement, Mercury introduced the first 16GB DDR4 device. However, while return loss is optimized with this method, crosstalk performance must still improve to meet DDR5 data speeds.
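For a sense of scale, the difference between the -12 dB minimum and the -16 dB achieved can be expressed as the fraction of incident power reflected back toward the source. The conversion below is the standard decibel power relationship; the function name is hypothetical.

```python
def reflected_power_fraction(return_loss_db: float) -> float:
    """Fraction of incident power reflected for a given return loss.

    Accepts either sign convention (-16 or 16) by using the magnitude.
    """
    return 10.0 ** (-abs(return_loss_db) / 10.0)


for rl_db in (-12.0, -16.0):
    pct = 100.0 * reflected_power_fraction(rl_db)
    print(f"{rl_db:.0f} dB return loss: {pct:.1f}% of power reflected")
```

At -12 dB about 6.3% of the incident power is reflected, while at -16 dB the figure drops to about 2.5%, less than half as much.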
The path to military-grade DDR5
With double the bandwidth and density of DDR4 expected, along with improvements to power and channel efficiency, advanced edge processing architectures will use DDR5 devices to increase performance. However, even with Mercury’s advancements in the coplanar topology for a high-density multi-chip package discussed above, the higher data speeds of DDR5 still cannot be attained.
Further improvements to crosstalk performance and the inter-die network are necessary. Developing a unique multi-planar ground and signal trace layout applied to the RDL increases crosstalk isolation, resulting in a performance improvement of 6 dB. No other known die stacking design methodology is available today for the commercialization of high-density DDR5 in a single device with data rates up to 6400 Mbps.
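Two quick calculations show what these numbers mean in practice: how much the bit period shrinks at DDR5 speeds, and what a 6 dB isolation gain buys. This sketch assumes the data rates are per-pin rates in megabits per second and that the 6 dB figure is a voltage ratio (20 log10). Both are assumptions, since the post does not say.

```python
def unit_interval_ps(data_rate_mbps: float) -> float:
    """Bit period in picoseconds for a per-pin data rate in Mbps."""
    return 1e6 / data_rate_mbps


def db_to_voltage_ratio(db: float) -> float:
    """Convert decibels to a voltage ratio, assuming a 20*log10 definition."""
    return 10.0 ** (db / 20.0)


for rate in (2666.0, 6400.0):
    print(f"{rate:.0f} Mbps: unit interval of {unit_interval_ps(rate):.2f} ps")

print(f"6 dB isolation gain: coupled noise voltage cut by {db_to_voltage_ratio(6.0):.2f}x")
```

Going from 2666 Mbps to 6400 Mbps shrinks the unit interval from about 375 ps to 156.25 ps, so the same absolute jitter and noise consume a much larger share of the eye. Under the voltage-ratio assumption, 6 dB of added isolation roughly halves the coupled noise amplitude.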
With the DDR5 JEDEC standard still in development, commercial DDR5 devices are set for release in 2019. Mercury’s military-grade, high-density devices, supporting speeds up to 6400 Mbps using our new advanced topology techniques, will follow shortly after, with release in 2020. Designers and users of next-generation military edge processing systems will soon realize the maximum performance of their high-speed multicore systems by integrating high-capacity, high-speed stacked DDR5, while also benefiting from a much smaller system footprint.