SemiAnalysis: Intel's EMIB-T Closes Gap with TSMC in Advanced Packaging Race, Custom HBM Reshapes AI Accelerator Economics

ECTC 2026 Conference, July 2, 2026

The advanced packaging arms race took a decisive turn at this year's Electronic Components and Technology Conference, where Intel disclosed the most complete picture yet of its EMIB-T roadmap and validated performance at bump pitches that directly challenge TSMC's CoWoS platform dominance. Meanwhile, Marvell's technical deep-dive into custom HBM reveals how accelerator designers can reclaim 60% of die area currently consumed by memory interfaces, a shift with profound implications for GPU economics as Nvidia prepares to deploy the technology in its Feynman architecture.

Intel EMIB-T Roadmap Targets Google TPU v9 Win

Intel's presence at ECTC was overwhelming, with 12 papers presented compared to TSMC's meager three, and the technical substance matched the volume. The company demonstrated EMIB-T functioning at 36 micron bump pitch on packages carrying two reticles' worth of silicon, a roughly 56% increase in bump density over the 45 micron pitch deployed in Granite Rapids. More importantly, Intel is now expanding validation to packages with 4.5 reticles of silicon, with certification targeted by year-end 2026.
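The density figure follows from simple geometry. Below is a minimal sketch, assuming bumps sit on a square grid so that density scales with the inverse square of pitch. Hexagonal grids and keep-out zones change the absolute counts but not the ratio between two pitches.

```python
# Sketch: relative bump density for an area-array bump field.
# ASSUMPTION: bumps sit on a square grid, so density scales as 1 / pitch**2.

def bumps_per_mm2(pitch_um: float) -> float:
    """Bumps per square millimeter on a square grid of the given pitch (microns)."""
    pitch_mm = pitch_um / 1000.0
    return 1.0 / (pitch_mm ** 2)


def density_gain(old_pitch_um: float, new_pitch_um: float) -> float:
    """Fractional increase in bump density when shrinking from old to new pitch."""
    return (old_pitch_um / new_pitch_um) ** 2 - 1.0


if __name__ == "__main__":
    for pitch in (45, 36, 25):
        print(f"{pitch} um pitch: {bumps_per_mm2(pitch):.0f} bumps/mm^2")
    print(f"45 -> 36 um: +{density_gain(45, 36):.1%}")
    print(f"45 -> 25 um: +{density_gain(45, 25):.1%}")
```

Under this assumption, the 45 to 36 micron shrink yields about 56% more bumps per unit area. The 25 micron test vehicle reaches roughly 3.2 times the Granite Rapids density.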

The pitch roadmap extends further. Intel showed a test vehicle operating at 25 micron bump pitch connecting two one-reticle dies through a single 3 millimeter by 18 millimeter EMIB-T bridge. Below 25 microns, however, Intel acknowledged that solder volume constraints become severe enough that the limiter shifts from bridge routing density to bump formation, placement accuracy, and assembly yield.

Perhaps most striking was Intel's quarter-panel demonstration: a 240 millimeter by 240 millimeter test vehicle equivalent to roughly 67 reticles of area. The booth sample exhibited severe warpage at that scale, but the message was clear. Intel is evaluating advanced lithography approaches to maintain overlay tolerances at quarter-panel and even full-panel dimensions, though substrate handling and panel-level patterning remain first-order constraints.
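Reticle-equivalent figures like the 67 quoted here are package area divided by the maximum lithography field. Here is a short sketch. It assumes the standard 26 by 33 millimeter (858 square millimeter) reticle, which the article does not state explicitly.

```python
# Sketch: express a rectangular area in reticle equivalents.
# ASSUMPTION: one reticle is the standard 26 mm x 33 mm exposure field
# (858 mm^2); the article does not say which field size it uses.

RETICLE_MM2 = 26.0 * 33.0


def reticle_equivalents(width_mm: float, height_mm: float) -> float:
    """Area of a width x height rectangle in units of one reticle field."""
    return width_mm * height_mm / RETICLE_MM2


if __name__ == "__main__":
    print(f"240 x 240 mm quarter panel: {reticle_equivalents(240, 240):.1f} reticles")
    print(f"510 x 515 mm full panel: {reticle_equivalents(510, 515):.1f} reticles")
```

The 240 by 240 millimeter vehicle works out to about 67 reticles, matching the figure above. A 510 by 515 millimeter full panel would be roughly 306.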

The EMIB-T architecture itself has evolved substantially beyond the embedded bridges shipping in current products. Intel's cross-sections revealed 10 metal layers including four routing layers, with metal-insulator-metal capacitors integrated between M1 and M2. The through-silicon vias that give EMIB-T its name enable vertical power delivery directly through the bridge, reducing DC voltage drop by 68% to 80% compared to conventional EMIB where power must spread laterally through package and die-side routing.

Intel disclosed a capacitance density of 500 nanofarads per square millimeter for the on-bridge MIM capacitors, roughly comparable to the MIM capacitance density of Intel 18A. The company claims these capacitors improve power delivery network AC impedance by more than 82% versus an EMIB-T package without bridge capacitors, directly addressing the HBM4E power delivery challenge.
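To put the density in context, the sketch below estimates the total capacitance a bridge could hold and the ideal impedance it would present. The bridge footprint borrows the 3 by 18 millimeter dimensions of Intel's 25 micron test vehicle. Two simplifications are illustrative assumptions, not Intel data: that the entire footprint carries capacitors, and that the capacitors are ideal, with no series resistance or inductance. The results are therefore upper bounds.

```python
import math

# Sketch: upper-bound decoupling capacitance on a bridge and its ideal impedance.
# ASSUMPTIONS (hypothetical): the whole 3 mm x 18 mm bridge footprint carries
# capacitors, and each capacitor is ideal (no ESR or ESL). Real bridges reserve
# area for routing and TSVs, and ESL dominates at high frequency.

def total_capacitance_uf(density_nf_per_mm2: float, area_mm2: float) -> float:
    """Total capacitance in microfarads."""
    return density_nf_per_mm2 * area_mm2 / 1000.0


def ideal_impedance_ohm(capacitance_uf: float, freq_hz: float) -> float:
    """|Z| = 1 / (2 * pi * f * C) for an ideal capacitor."""
    return 1.0 / (2.0 * math.pi * freq_hz * capacitance_uf * 1e-6)


if __name__ == "__main__":
    bridge_area_mm2 = 3.0 * 18.0
    for density in (500.0, 2500.0):  # EMIB-T MIM vs. Intel's disclosed >2,500 nF/mm^2
        c_uf = total_capacitance_uf(density, bridge_area_mm2)
        z_mohm = ideal_impedance_ohm(c_uf, 10e6) * 1e3
        print(f"{density:.0f} nF/mm^2: {c_uf:.0f} uF, |Z| at 10 MHz ~{z_mohm:.2f} mOhm")
```

At the same footprint, the 2,500 nanofarad per square millimeter capacitors Intel has disclosed elsewhere would offer five times the capacitance and one fifth the ideal impedance.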

For HBM4E specifically, Intel simulated channel performance at speeds from 12 to 16 gigabits per second. At 12 gigabits per second, the company showed an eye width of approximately 67% of the unit interval (UI) without receiver equalization, improving to 72.5% with a one-tap decision feedback equalizer. With a modest reduction in pad capacitance, the eye width remained above 60% of the UI at all tested speeds.
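An eye width quoted as a fraction of the unit interval can be converted to absolute time, which makes the shrinking timing margin at higher data rates easier to see. Here is a minimal sketch, assuming NRZ signaling so that one UI equals one bit period.

```python
# Sketch: convert an eye width quoted as a fraction of the unit interval (UI)
# into picoseconds. ASSUMPTION: NRZ signaling, so one UI is the bit time,
# 1 / data rate.

def unit_interval_ps(rate_gbps: float) -> float:
    """Bit period in picoseconds for an NRZ lane at rate_gbps."""
    return 1000.0 / rate_gbps


def eye_width_ps(rate_gbps: float, eye_fraction_ui: float) -> float:
    """Horizontal eye opening in picoseconds."""
    return eye_fraction_ui * unit_interval_ps(rate_gbps)


if __name__ == "__main__":
    cases = [
        (12, 0.67, "12 Gb/s, no equalization"),
        (12, 0.725, "12 Gb/s, 1-tap DFE"),
        (16, 0.60, "16 Gb/s, 60% floor"),
    ]
    for rate, frac, label in cases:
        print(f"{label}: UI = {unit_interval_ps(rate):.1f} ps, "
              f"eye = {eye_width_ps(rate, frac):.1f} ps")
```

Even at the 60% floor, the absolute eye at 16 gigabits per second is about 37.5 picoseconds. That is narrower than the unequalized 12 gigabit per second eye of about 55.8 picoseconds, because the UI itself shrinks by a quarter.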

Despite these advances, Intel remains behind TSMC on several vectors. TSMC has already deployed deep-trench capacitor integration and is further along on integrated voltage regulators and active local silicon interconnect. Intel has disclosed substrate-core embedded deep-trench capacitor concepts and capacitors exceeding 2,500 nanofarads per square millimeter, but neither has appeared in shipping EMIB products. EMIB-T narrows the gap meaningfully, but Intel is still catching up to an ecosystem that has been executing in volume for years. The disclosures strongly suggest EMIB-T is positioned for Google's TPU v9, representing Intel's most credible path back into large-package AI accelerator manufacturing.

Marvell Custom HBM Solves the Shoreline Problem

Marvell's ECTC presentations finally provided the package-level detail behind custom HBM, a concept the company announced at its 2024 Industry Analyst Day but left frustratingly vague until now. The economics are straightforward and brutal: JEDEC-standard HBM forces every accelerator to implement standard PHYs and route an extremely wide parallel interface with standardized pad placement. As packages grow larger and HBM speeds increase, that fixed boundary makes it progressively harder to optimize shoreline, routing density, power delivery and signal integrity.

Custom HBM keeps the DRAM core dies unchanged but replaces the base die with a custom version fabricated on an advanced logic process. That custom base die integrates the HBM controller, management and monitoring capabilities, custom logic, and expansion interfaces. Marvell claims this approach reduces the host ASIC footprint dedicated to HBM PHYs and associated logic by approximately 60%, directly freeing area for more compute, cache or I/O.

The routing improvement is equally significant. Marvell's example used 1,024 channels at 32 gigabits per second, reaching 4.1 terabytes per second, equivalent to a 2,048-bit JEDEC HBM4E interface at 16 gigabits per second. The custom interface shortened interposer channel length from 6.5 millimeters to 1.5 millimeters, allowing Marvell to maintain the same nine routing layers and two micron line and space while increasing bandwidth.
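The bandwidth equivalence is straightforward lane arithmetic. The sketch below uses the figures above, assuming decimal units, 8 bits per byte, and no protocol or ECC overhead.

```python
# Sketch: aggregate raw bandwidth = lanes x per-lane data rate.
# ASSUMPTIONS: decimal units, 8 bits per byte, no protocol or ECC overhead.

def aggregate_tb_per_s(lanes: int, gbps_per_lane: float) -> float:
    """Raw aggregate bandwidth in terabytes per second."""
    return lanes * gbps_per_lane / 8.0 / 1000.0


if __name__ == "__main__":
    custom = aggregate_tb_per_s(1024, 32)  # Marvell custom HBM interface
    jedec = aggregate_tb_per_s(2048, 16)   # 2,048-bit JEDEC HBM4E
    print(f"custom: {custom:.3f} TB/s, JEDEC HBM4E: {jedec:.3f} TB/s")
```

Both configurations land at 4.096 terabytes per second, the 4.1 figure quoted above. The custom interface gets there with half as many lanes to route.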

Marvell's implementation uses an organic redistribution layer interposer rather than silicon, reducing packaging costs. Organic RDL is limited to much coarser line and space than the silicon interposers in CoWoS-S or the silicon bridges in CoWoS-L and EMIB-T, forcing Marvell to rely on customized shielding and routing patterns in different sections to maximize bandwidth density while controlling crosstalk.

The strategic implications extend beyond a single product. At GTC, Nvidia announced that Feynman would use custom HBM, and the rationale aligns with Marvell's: higher bandwidth, lower power, and dramatically less accelerator die area consumed by HBM interfaces. SemiAnalysis estimates that approximately 16% of Rubin GPU die area is dedicated to HBM-related logic and PHYs. Custom HBM allows Nvidia to offload much of that burden onto the HBM base die, recovering silicon for revenue-generating compute.

Custom HBM also enables expansion interfaces beyond the standard link. Rather than forcing all memory traffic through the limited accelerator die shoreline, the base die can function as a secondary memory controller and fan out to additional memory, whether higher-capacity lower-bandwidth LPDDR or even a second layer of HBM. This architecture is directly relevant to AMD's upcoming MI450 and future MI500 GPUs, which will support LPDDR for increased memory capacity.

HBM4E Interposer Complexity Doubles

Samsung's HBM4E interposer presentation quantified the packaging challenge facing the industry. HBM4E pushes data rates to 12 gigabits per second and above while doubling the I/O pin count, increasing routing complexity to the point where HBM4E could require twice the interposer layers versus HBM3E and five times as many as HBM2. Power consumption is expected to increase 86% versus HBM3E and 5.6 times versus HBM2 due to the increased I/O count and higher data rates.

Samsung proposed an eight-layer silicon interposer it claims reduces layer count by 20% versus the estimated requirement. The interposer uses a repeated two-signal one-ground staggered arrangement to shield high-speed signals, with 75% of layers allocated for signal routing. The design incorporates ultra-high-density capacitors, likely similar to Intel EMIB-T MIM capacitors or TSMC CoWoS deep-trench capacitors, but these can only be placed on the M1 layer, which is also heavily used for signal routing.

If routing is unbalanced, capacitors get pushed to one side of the interface, creating uneven power delivery network behavior between logic and HBM sides. Samsung's layout redistributes routing across M1 and other layers so capacitors can be placed more evenly across the entire interface, reducing PDN impedance and voltage noise while keeping routing density manageable.

Samsung also addressed HBM thermals, particularly with hybrid bonding. With 16-high HBM, thermal resistance remains acceptable, but future generations moving to 20-high and 24-high HBM require new approaches. Samsung compared thermal compression bonding and hybrid copper bonding for HBM on 2.5D GPU packages similar to Nvidia Blackwell, with two GPU dies and eight HBM stacks. With hybrid copper bonding, internal HBM thermal resistance dropped by 12.2% with air cooling and 12.9% with liquid cooling, while total HBM thermal resistance dropped by 3.5% and 7.7%, respectively.

The thermal benefit is uneven because hybrid copper bonding only addresses part of the thermal network. Samsung separated the path into internal resistance, system-level resistance, and GPU-to-HBM crosstalk. Internal resistance and crosstalk dropped by approximately 12.5% and 9.8% respectively, but system-level resistance including thermal interface materials and cooling increased by roughly 2.3%.
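A series-resistance model shows how the dilution works. The sketch below assumes the three components add in series. The share of the total assigned to each component is a hypothetical placeholder, since Samsung's absolute values were not given; only the percentage changes come from the text.

```python
# Sketch: roll component-level changes up into a total thermal resistance,
# assuming the components add in series. The component SHARES are
# hypothetical placeholders; only the percentage changes come from the text.

def total_change(shares: dict[str, float], changes: dict[str, float]) -> float:
    """Fractional change in total resistance.

    shares:  each component's fraction of the baseline total (sums to 1).
    changes: each component's fractional change (negative means lower R).
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * changes[name] for name, share in shares.items())


if __name__ == "__main__":
    changes = {"internal": -0.125, "system": 0.023, "crosstalk": -0.098}
    shares = {"internal": 0.30, "system": 0.60, "crosstalk": 0.10}  # hypothetical
    print(f"total change in HBM thermal resistance: {total_change(shares, changes):+.1%}")
```

When system-level resistance is assumed to be the largest share, the 12.5% internal improvement shrinks to a total change of only a few percent.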

As more power moves into the HBM base die, such as in memory-bound workloads or custom HBM implementations where the memory controller and more logic move into the base die, GPU-to-HBM thermal crosstalk becomes a smaller share of total thermal resistance, falling from 13% at baseline base-die power to 5% at three times base-die power. Samsung estimates that moving to hybrid copper bonding could allow inlet temperatures to rise by one to two degrees Celsius at constant package power, or package power to increase by approximately 4% at constant temperature, with cooling power falling by roughly 7%.

Microfluidic Cooling Enables Five-Kilowatt Packages

TSMC demonstrated direct-to-silicon cooling on a large GPU-like test vehicle on CoWoS-R, using a 3.3-reticle interposer with four SoC dies and eight HBM stacks. The company compared three approaches: a conventional lidded cold plate package, a lidless cold plate package, and its micropillar direct-to-silicon design, in which micropillars are formed directly on the backside of the SoC dies.

Using relatively warm 40 degree Celsius deionized water at one to two liters per minute, the conventional lidded package dissipated 1.9 to 2.3 kilowatts and the lidless package 2.5 to 3.0 kilowatts. Both cold plate solutions saturate beyond four liters per minute because the thermal interface material becomes the bottleneck. The micropillar test vehicle matched the lidless cold plate result at two liters per minute, then pulled ahead at higher flow rates, dissipating four kilowatts at four liters per minute and 5.3 kilowatts at eight liters per minute. Across the full test vehicle, TSMC reported uniform power dissipation above five kilowatts.
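An energy balance helps explain why flow rate matters. Coolant temperature rise equals power divided by mass flow times specific heat, so doubling flow halves the rise. Once the thermal interface material dominates, however, a cooler coolant does little for the die. The sketch below assumes water properties near 40 degrees Celsius and that all package heat enters the coolant.

```python
# Sketch: coolant temperature rise from a steady-state energy balance,
# delta_T = P / (m_dot * c_p).
# ASSUMPTIONS: water properties near 40 C, all package heat goes into the
# coolant, and no heat is lost to the surroundings.

WATER_DENSITY_KG_PER_L = 0.992   # approx. at 40 C
WATER_CP_J_PER_KG_K = 4179.0     # approx. at 40 C


def coolant_rise_c(power_w: float, flow_l_per_min: float) -> float:
    """Outlet minus inlet coolant temperature, in degrees Celsius."""
    mass_flow_kg_per_s = flow_l_per_min * WATER_DENSITY_KG_PER_L / 60.0
    return power_w / (mass_flow_kg_per_s * WATER_CP_J_PER_KG_K)


if __name__ == "__main__":
    for power_w, flow in ((2300, 2), (4000, 4), (5300, 8)):
        print(f"{power_w / 1000:.1f} kW at {flow} L/min: "
              f"coolant rises ~{coolant_rise_c(power_w, flow):.1f} C")
```

Under these assumptions, the micropillar vehicle's 5.3 kilowatts at eight liters per minute raises the coolant by roughly 9.6 degrees Celsius. Four kilowatts at four liters per minute raises it by roughly 14.5 degrees.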

Microsoft took a different approach, using straight microchannels etched into GPU silicon rather than micropillars. More significantly, Microsoft tested on an actual Nvidia GH200 GPU rather than a thermal test vehicle, capturing real thermal distribution and hotspots more accurately. Microsoft tested a variety of workloads on the GPU including HPCG and HPL, each with different compute and memory stress characteristics.

Across these workloads, Microsoft reported 51% to 60% lower junction-to-inlet thermal resistance for the GPU at a one liter per minute flow rate. The HBM improved less, at only 27% to 37%, because it was still cooled through a cold plate and thermal interface material. Overall, the package achieved a 50% reduction in thermal resistance.
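Junction-to-inlet thermal resistance is the temperature difference between junction and coolant inlet divided by power. With the junction limit and inlet temperature held fixed, allowable power therefore scales inversely with resistance. The sketch below illustrates this relationship. It deliberately ignores other limits such as power delivery or HBM temperature.

```python
# Sketch: junction-to-inlet thermal resistance, R = (T_junction - T_inlet) / P.
# With the junction limit and inlet temperature held fixed, allowable power
# scales as 1 / R, so cutting R by a fraction r permits 1 / (1 - r) times
# the power. This ignores other limits (power delivery, HBM temperature).

def power_headroom(resistance_reduction: float) -> float:
    """Allowable-power multiplier for a fractional cut in thermal resistance."""
    if not 0.0 <= resistance_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - resistance_reduction)


if __name__ == "__main__":
    reported = {
        "GPU, low end": 0.51,
        "GPU, high end": 0.60,
        "HBM, low end": 0.27,
        "HBM, high end": 0.37,
    }
    for label, cut in reported.items():
        print(f"{label}: {cut:.0%} lower R -> {power_headroom(cut):.2f}x power")
```

By this measure, the 51% to 60% GPU reduction translates to roughly 2.0 to 2.5 times the allowable GPU power. The HBM's 27% to 37% improvement yields only about 1.4 to 1.6 times.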

Microsoft also provided preliminary reliability data, critical for datacenter deployments requiring high reliability and low downtime. Over six months, Microsoft recorded only nine potential clogging events across approximately 4,370 observations. The rate declined over time, suggesting early instability after installation followed by a more stable operating period. Even after six months, there was no measurable silicon erosion in the microchannels. At the node level, the GH200 successfully completed three weeks of repeated benchmarking followed by a one-week continuous run at stable package power. Microsoft is still testing cluster-level mean time between failures and availability.

Marvell and Lightmatter Push Optical Interconnects

Marvell's presentations on its Optical Multi-Chip Interconnect Bridge and Photonic Fabric, both acquired through its purchase of Celestial AI earlier this year, revealed a more practical near-term approach to photonic integration than full photonic interposers. Rather than fabricating a multi-reticle photonic interposer with challenging yield implications from reticle stitching, Marvell embeds a photonic integrated circuit in the organic RDL interposer only where needed, using electrical bridges in other regions.

As the PIC is embedded in the RDL, its grating couplers would normally be obstructed after overmolding. Marvell places a silicon or glass optical block over the grating region before molding to maintain an optical path to the top surface where the fiber array unit can be attached. Marvell's OMIB test vehicle has one primary XPU die and six EIC dies on top, with six PICs, six electrical bridges and 12 deep-trench capacitor dies embedded in the interposer. The roughly two-reticle RDL interposer uses four layers at two micron line and space.

Marvell showed a conceptual multi-die XPU with optical chip-to-chip interconnects to reduce latency and hop count. The company claims that OMIB removes shoreline limitations since the same bridge can route both on-package die-to-die links and external optical interconnects. Marvell cites a bandwidth density of 1.8 terabits per second per square millimeter with this approach.

In the near term, vertically stacked optical engines like TSMC's COUPE are more achievable than OMIB-style connections or a full photonic interposer. Marvell connects the EIC and PIC using microbumps at 50 micron pitch, then mounts the resulting engine to either the package substrate or an interposer. The substrate configuration can use a UCIe-S-like parallel bus at a coarse 130 micron C4 pitch, while the interposer configuration can use a UCIe-A interface at a tighter 40 to 45 micron pitch. Marvell favors the substrate approach due to its simplicity and better thermal isolation.

Marvell tested an optical engine using a five nanometer EIC, likely TSMC N5, with four 56 gigabits per second transmit-receive pairs for 224 gigabits per second in each direction. The design uses electro-absorption modulators instead of the micro-ring modulators preferred by other companies, citing better thermal stability and a wider operating wavelength range. While these advantages are real, SemiAnalysis believes that EAMs will prove difficult to manufacture at scale.

Marvell also compared the thermal behavior of an optical engine in three configurations: connected via UCIe-S on the organic substrate, and via UCIe-A either on a silicon interposer or over a silicon bridge. Under full XPU load, the PIC temperature rose by less than five degrees Celsius on the substrate, versus approximately 25 degrees Celsius on the interposer and approximately 20 degrees Celsius with the bridge. The organic substrate's low thermal conductivity and relatively large millimeter-scale air gap isolate the PIC. In both UCIe-A configurations, the fine-pitch silicon close to the XPU provides a low-resistance thermal path.

The thermal transients begin within approximately 30 milliseconds of an XPU power-state change. The PIC heats at approximately 10 degrees Celsius per second on the organic substrate, versus approximately 100 degrees Celsius per second with the bridge and approximately 120 degrees Celsius per second on the interposer. Marvell argues that the EAM bias voltage can be adjusted electronically fast enough to track these changes, while ring modulators require heater-and-feedback loops constrained by slower time constants.
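Combining the heating rates with the temperature rises from the previous paragraph gives a rough time constant for each configuration. The sketch below models PIC heating as a single first-order (RC) response. It rests on two assumptions: that the quoted heating rates are initial slopes, and that the quoted rises are steady-state values. Real packages show several time constants, so the results are only order-of-magnitude estimates.

```python
import math

# Sketch: first-order (single RC) model of PIC heating after a power step,
# T(t) = dT_ss * (1 - exp(-t / tau)). The initial slope is dT_ss / tau,
# so tau can be estimated as dT_ss / slope.
# ASSUMPTIONS: the quoted heating rates are initial slopes and the quoted
# temperature rises are steady-state values; real packages show several
# time constants, so these are rough estimates only.

def time_constant_s(steady_rise_c: float, initial_slope_c_per_s: float) -> float:
    """Estimate tau in seconds from steady-state rise and initial slope."""
    return steady_rise_c / initial_slope_c_per_s


def rise_after_c(t_s: float, steady_rise_c: float, tau_s: float) -> float:
    """Temperature rise t_s seconds after a power step."""
    return steady_rise_c * (1.0 - math.exp(-t_s / tau_s))


if __name__ == "__main__":
    configs = {
        "organic substrate (UCIe-S)": (5.0, 10.0),   # rise is an upper bound
        "silicon bridge (UCIe-A)": (20.0, 100.0),
        "silicon interposer (UCIe-A)": (25.0, 120.0),
    }
    for name, (rise, slope) in configs.items():
        tau = time_constant_s(rise, slope)
        t90 = tau * math.log(10.0)  # time to reach 90% of the final rise
        print(f"{name}: tau ~{tau * 1e3:.0f} ms, 90% settled after ~{t90 * 1e3:.0f} ms")
```

Read this way, the substrate-mounted PIC has a time constant of at most about half a second. Both UCIe-A placements respond in roughly 200 milliseconds, which is the timescale a bias or heater control loop would need to track.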

Lightmatter provided a much deeper look at the assembly process, fiber attachment, and packaging results for integrating the multi-reticle photonic interposer with ASIC chiplets in its Passage M1000. The test vehicle uses chip-on-wafer assembly to attach 15 ASIC chiplets to a four-tile M1000 interposer. SemiAnalysis estimates the interposer measures approximately 2,100 square millimeters, about half of the 4,000 square millimeter eight-tile configuration shown at Hot Chips 2025.

Attaching a silicon interposer of this size to an organic substrate creates severe warpage. The module reached approximately 59 microns of warpage at the 260 degree Celsius reflow temperature, and approximately 56 microns after cooling back to room temperature. With a 118 micron thick interposer and C4 bumps at approximately 176 micron pitch, this is enough to compromise joint formation. Lightmatter used a magnetic fixture to hold the substrate flat during attachment and reported greater than 95% electrical assembly yield, with healthy microbump and C4 joints across the package.

Lightmatter used a thermal test chip with four independently powered quadrants, each dissipating 170 watts, resulting in a power density of 1.47 watts per square millimeter across the 369 square millimeter active area. At this power, the photonic interposer reached approximately 100 degrees Celsius using a 25 degree Celsius coolant flowing at 1.8 liters per minute per kilowatt. This validates cooling 680 watts from a concentrated test-chip area in a package designed for more than 900 watts across nearly three reticles of ASIC silicon.

Hybrid Bonding Approaches 450 Nanometer Pitch

Progress in hybrid copper bonding centered on two material approaches addressing the persistent challenge of maintaining extremely flat and clean interfaces while reducing bonding temperature. The first uses organic dielectrics whose mechanical compliance increases tolerance to particles and surface roughness while reducing bonding stress. Mitsui Chemicals and ASE demonstrated pressure-less copper and polymer bonding at 200 degrees Celsius and 10 micron pitch. TOK and NYCU demonstrated a 10-second bonding process at 150 degrees Celsius, with samples bonded at 200 degrees Celsius maintaining stable resistance through reliability testing.

The second approach uses fine-grain copper. Its higher grain-boundary density accelerates copper diffusion at lower temperatures, with subsequent grain growth increasing conductivity. Intel combined fine-grain copper with a low-temperature dielectric stack, achieving uniform wafer bonding after 175 degree Celsius and 200 degree Celsius anneals. Electrical yield was around 60% in two of three samples, although Intel described these results as a lower bound due to test vehicle and probing limitations. The experiments used wafer-to-wafer test vehicles rather than the die-to-wafer process targeted by the technology.

The most aggressive pitch came from Applied Materials and EV Group, which demonstrated 450 nanometer pitch wafer-to-wafer bonding at 98% yield across a chain of 20 million links. Failure analysis associated open links with carbon-rich benzotriazole residue at the copper interface. A PVD TaN and Ta barrier stack significantly improved yield. CEA-Leti separately achieved greater than 97% yield after a 100 degree Celsius anneal without plasma activation.
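A chain yield can be translated into a per-link failure rate, which is a more portable figure for comparing test vehicles of different sizes. The sketch below assumes yield means the fraction of chains that test continuous. It also assumes links fail independently with equal probability, and that a single open link fails the whole chain. The 100 million bond die in the usage line is hypothetical.

```python
import math

# Sketch: per-link failure probability implied by daisy-chain yield.
# ASSUMPTIONS: "yield" means the fraction of chains that test continuous,
# links fail independently with equal probability p, and a single open link
# fails the chain, so Y = (1 - p) ** N.

def per_link_failure(chain_yield: float, links: int) -> float:
    """Solve Y = (1 - p) ** N for p; expm1 keeps precision for tiny p."""
    return -math.expm1(math.log(chain_yield) / links)


def expected_chain_yield(p: float, links: int) -> float:
    """Y = (1 - p) ** N, computed via log1p for numerical stability."""
    return math.exp(links * math.log1p(-p))


if __name__ == "__main__":
    p = per_link_failure(0.98, 20_000_000)
    print(f"per-link failure probability ~{p:.2e}")
    # Hypothetical die with 100 million hybrid bonds at the same defect rate.
    print(f"100M-bond die yield ~{expected_chain_yield(p, 100_000_000):.1%}")
```

Under these assumptions, 98% chain yield over 20 million links implies roughly one failing link per billion. A hypothetical die with 100 million bonds at that defect rate would yield about 90%.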

Together, these results demonstrate that reducing pitch and bonding temperature requires the copper, dielectric, chemical mechanical polishing, surface preparation, and annealing to be co-optimized to achieve hybrid bonding with low warpage and no cracking. Continued refinement by material suppliers and equipment vendors should improve post-bond yield from 2027 onward.

Glass Substrates Progress But SeWaRe Remains Unsolved

Glass substrate momentum dimmed somewhat this year, with fewer innovative papers presented at ECTC. The unsolved problem remains SeWaRe, the lateral crack that begins at a diced glass edge under RDL stress. Georgia Tech characterized the failure experimentally while Corning used finite element analysis, peridynamics, and analytical fracture mechanics to model its propagation, showing that stiff copper layers drove cracks towards the glass midplane while compliant polymer layers changed the crack path. Corning also found that low coefficient of thermal expansion polymers combined with appropriate glass selection could reduce failure risk.

STATS ChipPAC investigated assembly and reliability of large glass-core packages. Its 74 millimeter by 74 millimeter glass-core packages failed every test segment without edge coating, while edge-coated packages completed assembly and reliability testing without abnormalities. The edge coating also reduced warpage by 33.5% relative to uncoated glass-core packages. Build-up pull-back and edge coating increasingly look like requirements for reliable glass-core substrate assembly.

On a positive note, Intel demonstrated an industry-first 510 millimeter by 515 millimeter, 24-layer glass-core panel with fully copper-filled through-glass vias, two embedded EMIB bridges, and optical waveguides co-formed between the TGVs. The large prototype was displayed at Intel's booth and processed on existing organic-substrate lines, while singulated units showed no SeWaRe after thermal shock testing. Among OSATs, Amkor and STATS ChipPAC measured 30% to 40% lower substrate-level warpage with thinner glass cores than with their organic reference substrates, although assembly defects and TGV filling problems show the process remains immature. Glass is making real progress, but this year's data still support manufacturing development rather than high-volume adoption.

RDL Approaches One Micron Line and Space

RDL line and space continues to shrink even as package sizes grow, driven primarily by UCIe 3.0, which supports speeds up to 64 gigatransfers per second for future ASIC-to-ASIC and ASIC-to-HBM links. The roadmap has progressed from 10 micron line and space around 2015 to two microns today, with one micron emerging as the next target. Reaching the submicron era will require major changes to both RDL routing architectures and manufacturing processes. The process is shifting from semi-additive plating towards damascene for sub-two micron copper, where chemical mechanical polishing planarization and low-shrinkage dielectrics become the key gating steps.

Resonac used polymer damascene and panel CMP to form two micron line and space on a 320 millimeter by 320 millimeter glass panel, including a four-layer via-and-trench structure. Imec and Fujifilm pushed damascene to one micron line and space on 300 millimeter wafers. Ushio resolved 1.5 micron line and space over an 18-reticle field without stitching, with 16 exposures covering a full 510 millimeter by 515 millimeter panel. Sumitomo Bakelite and Georgia Tech showed a fully imidized liquid dielectric that cures at a relatively low 200 degrees Celsius with only 4% shrinkage and supports two micron line and space.

As the most advanced RDL manufacturer, TSMC collaborated with GUC to present work on eight-layer RDL scaling, believed to be the near-term limit of the CoWoS-R platform. GUC demonstrated an STCO-based design and validation flow for integrating a 64-bit UCIe-A interface fabricated on TSMC N3 and integrated on an eight-layer CoWoS-R RDL. Its STCO framework uses ground-signal-ground interleaved transmission lines to control crosstalk and skew, while simulations show that C4-side integrated passive devices provide localized decoupling and reduce voltage fluctuation at the chiplet microbumps.

The design targets 16 to 36 gigatransfers per second with a 64-bit, 10-column UCIe-A interface at 45 micron bump pitch. Signal traces were routed at two micron line and space across six layers, with the seventh reserved for power delivery. The test chip achieved a measured on-die eye width of 0.77 unit interval at 32 gigatransfers per second, while simulations showed an eye width of 0.74 unit interval at 36 gigatransfers per second. The results demonstrate that organic interposers can meet signal and power integrity requirements for heterogeneous chiplet systems.
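For scale, the sketch below converts the UCIe-A figures into raw module bandwidth and absolute eye width. It assumes one bit per lane per transfer, decimal units, and no protocol overhead.

```python
# Sketch: raw per-direction bandwidth of a 64-lane UCIe-A module and the
# absolute eye width behind the quoted UI figures.
# ASSUMPTIONS: one bit per lane per transfer, decimal units, no protocol overhead.

LANES = 64


def module_gbytes_per_s(gt_per_s: float, lanes: int = LANES) -> float:
    """Raw bandwidth in GB/s, one direction."""
    return lanes * gt_per_s / 8.0


def eye_ps(gt_per_s: float, eye_ui: float) -> float:
    """Eye width in picoseconds; one UI is 1000 / (GT/s) ps."""
    return eye_ui * 1000.0 / gt_per_s


if __name__ == "__main__":
    for rate in (16, 32, 36):
        print(f"{rate} GT/s: {module_gbytes_per_s(rate):.0f} GB/s per direction")
    print(f"measured eye at 32 GT/s: {eye_ps(32, 0.77):.1f} ps")
    print(f"simulated eye at 36 GT/s: {eye_ps(36, 0.74):.1f} ps")
```

At 32 gigatransfers per second, the module carries about 256 gigabytes per second per direction. The measured 0.77 UI eye corresponds to roughly 24 picoseconds.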
