# Solid State Drive End to End Reliability Prediction, Characterization and Control

Mohd Azman Abdul Latif, Erwan Basiron


Abstract—A flaw or drift from expected operational performance in one component (NAND, PMIC, controller, DRAM, etc.) may affect the reliability of the entire Solid State Drive (SSD) system. Therefore, it is important to ensure the required quality of each individual component through qualification testing specified using standards or user requirements. Qualification testing is time-consuming and comes at a substantial cost for product manufacturers. A highly technical team drawn from all the key stakeholders embarks on reliability prediction from the beginning of new product development, identifies critical-to-reliability parameters, performs full characterization to embed margin into product reliability, and establishes controls to ensure that product reliability is sustainable in mass production. The paper discusses a comprehensive development framework that covers the SSD end to end, from design to assembly, in-line inspection and in-line testing, and that can predict and validate product reliability at the early stage of new product development. During the design stage, the SSD goes through an intense reliability margin investigation focused on assembly process attributes, process equipment control and in-process metrology, while also taking the forward-looking product roadmap into account. Once these pillars are completed, the next step is to perform process characterization and build the reliability prediction model. Next, for the design validation process, the reliability prediction, specifically a solder joint simulator, is established. The SSD tests are stratified into Non-Operating and Operating tests with focus on solder joint reliability and connectivity/component latent failures, addressed by prevention through design intervention and containment through the Temperature Cycle Test (TCT). Some of the SSDs are subjected to physical solder joint analysis, namely Dye and Pry (DP) and cross-section analysis. The results are fed back to the simulation team for any corrective actions required to further improve the design. Once the SSD is validated and proven working, it enters the monitor phase, in which the Design for Assembly (DFA) rules are updated. At this stage, the design changes and the process and equipment parameters are in control. Predictable product reliability early in product development enables on-time delivery of qualification samples to the customer, optimizes product development validation, makes effective use of development resources, and avoids forced late investment to patch end-of-life product failures. Understanding the critical-to-reliability parameters earlier allows the team to focus on increasing the product margin, which in turn increases customer confidence in product reliability.

Keywords-e2e reliability prediction, SSD, TCT, Solder Joint Reliability, NUDD, connectivity issues, qualifications, characterization and control.

#### I. INTRODUCTION

The rapid development of semiconductor integration technology and micro-machining technology has been driving the packaging technology of electronic devices toward high density and high reliability. Owing to its nearly ideal packaging density and excellent high-frequency performance, Ball Grid Array (BGA) technology is becoming the mainstream in the microelectronic packaging industry. As the integration of microelectronic circuits improves significantly, the number of solder joints keeps increasing and the diameter of each solder joint keeps shrinking, which makes the joints more vulnerable and failures more likely to occur. It has been reported that 70% of electronic device failures are caused by the electronic package, and the main cause of package failures is the solder joint [1].

Reliability is the most critical requirement in data centers, and attaining it is the biggest challenge because Enterprise SSDs integrate heterogeneous entities with mismatched Coefficients of Thermal Expansion (CTE). Failure to achieve this target results in high warranty cost and, at times, loss of business. In addition, CTE mismatch is one of the biggest Time-To-Market (TTM) killers.

Fig. 1 shows the typical product lifecycle. The end goal of this work is to start design simulation well upfront, prior to the physical SSD assembly development phase, shifting left from the current practice of starting at the Engineering Validation phase.

#### II. THE FRAMEWORK

The need for a structured reliability assessment framework throughout the product development life cycle has been identified, with special focus on the following factors in order to minimize the reliability risks they introduce:

- Integrity of material, manufacturing process, design and analysis methodology.
- Robust testing and inspection plan which is built on a foundation of statistics.

The definition of reliability considered in this framework is "the ability of product to perform its intended function without failure for a specified period of time under its field operation conditions at a predefined confidence level" [10]. In line with this definition, the risks related to the failure mechanisms that impact the life of the product are considered in the assessment framework [2]. Risk mitigations are identified by means of design and process controls. Attention is given to these risks because the material manufacturing process itself differs from the conventional methods of mass manufacturing and subtractive manufacturing of components.

For this reliability framework, testing is stratified into Non-Operating and Operating tests with focus on solder joint reliability and connectivity/component latent failures, addressed by prevention through design intervention and containment through temperature acceleration and cycling; see Fig. 2.

Mohd Azman Abdul Latif and Erwan Basiron are with the SSD Reliability Engineering, Package Technology Development and Integration, Malaysia (email: mabdul2my@gmail.com, mohderwan1555@gmail.com).



Fig. 2 The proposed reliability assessment framework

#### A. NUDD Reliability Margin Investigation

The development of this reliability framework depends entirely on a risk assessment that uses the NUDD process (New, Unique, Different, Difficult). This risk assessment is comprehensive: it includes lessons learned from previous issues, comprising technology development issues, manufacturing issues and customer issues. All these inputs are fed back to the design team for corrective actions and improvements.

Every newly introduced SSD component goes through the New Electrical Component Request (NECR) process, which triggers the team to perform a NUDD analysis. The intention is to detect every component quality and reliability constraint early. This stage also serves as the component selection step of the Design Phase. Critical component selection includes the following:

1. Controller
2. NAND
3. Dynamic random access memory (DRAM)
4. Connector
5. Power Management (PMIC/Regulator)
6. Clock
7. Oscillator

During this phase, the component structure is analyzed, simulated and characterized. The structures include:

1. Quad-Flat-No-Leads (QFN)
2. Ball Grid Array (BGA)
3. Fine pitch
4. Weight (Zarrow Equation)
5. Bottom termination

The NUDD reliability margin investigation requires an extensive set of parameters covering the attributes below:

1. Assembly Attributes
   - Stand-off Height
   - Package & PCB Warpage (CTE mismatch)
   - Pad Design Size
   - Solder Metallurgy
   - PCB Hybrid Non-Solder Mask Defined (NSMD)/Solder Mask Defined (SMD)
   - New solder paste material
2. Process & Equipment Control
   - Stencil Printing Parameter
   - Reflow Profile Parameter
   - Assembly Excursion Killer Defect
3. In-Process Metrology
   - Standoff Height
   - Warpage

#### B. Simulation and Characterization

To build a reliability model, a failure physics model must first be established for the failure mechanism. For common mechanical failures such as fatigue, wear-out and erosion, several failure physics models have been proposed in the literature. Taking creep-fatigue failure as an example, the construction of a reliability model based on the creep-fatigue mechanism is outlined here [3].

For the creep-fatigue problem in structures, failure arises as a result of cumulative damage [4]; the total damage is given as:

$$\sum D = \sum_{i=1}^{m} \frac{n_i}{N_i} = \sum_{i=1}^{m} \frac{n_i}{C} \left( \lambda\left(T_i, t_i\right) \Delta\epsilon_{pi} \right)^{n} \tag{1}$$

where $n$ and $C$ are material parameters, $\Delta\epsilon_{pi}$ is the plastic strain range at load level $i$, $m$ is the number of load levels and $i$ is their sequence number, $n_i$ is the number of cycles at each load level, $N_i$ is the number of cycles to failure at that load level, and $\lambda$ is the creep correction coefficient.

If the cumulative damage reaches the specified damage $D_0$, the structure is considered failed. The creep-fatigue failure criterion of the structure is:

$$Z = D_0 - \sum D = D_0 - \sum_{i=1}^{m} \frac{n_i}{N_i} = D_0 - \sum_{i=1}^{m} \frac{n_i}{C} \left( \lambda\left(T_i, t_i\right) \Delta\epsilon_{pi} \right)^{n} \le 0 \tag{2}$$
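As a minimal sketch of how the cumulative damage sum and the failure criterion could be evaluated numerically, the Python snippet below sums the per-level damage and checks whether $Z \le 0$. The class and function names, the material constants `C_EXAMPLE` and `N_EXAMPLE`, the load history, and the choice $D_0 = 1$ are all hypothetical values chosen for illustration; the creep correction coefficient is passed in as a precomputed number because its functional form is not given here.

```python
from dataclasses import dataclass


@dataclass
class LoadLevel:
    cycles_applied: float        # n_i
    plastic_strain_range: float  # plastic strain range at level i
    creep_correction: float      # lambda at level i, supplied precomputed


def cumulative_damage(levels, C, n):
    """Sum n_i / C * (lambda_i * strain_range_i) ** n over all load levels."""
    return sum(
        lv.cycles_applied / C * (lv.creep_correction * lv.plastic_strain_range) ** n
        for lv in levels
    )


def limit_state(levels, C, n, D0=1.0):
    """Z = D0 - total damage; the structure is considered failed when Z <= 0."""
    return D0 - cumulative_damage(levels, C, n)


# Hypothetical material constants and load history, for illustration only.
C_EXAMPLE, N_EXAMPLE = 0.5, 2.0
history = [
    LoadLevel(cycles_applied=1000, plastic_strain_range=0.01, creep_correction=1.0),
    LoadLevel(cycles_applied=500, plastic_strain_range=0.02, creep_correction=1.0),
]
damage = cumulative_damage(history, C_EXAMPLE, N_EXAMPLE)
z = limit_state(history, C_EXAMPLE, N_EXAMPLE)
print(f"damage={damage:.3f}, Z={z:.3f}, failed={z <= 0}")
```

With real inputs, the strain ranges would come from the simulation and the constants from material characterization.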



Fig. 3 Modeling flow of the reliability predictions

Fig. 3 illustrates the modeling flow of the reliability predictions.

## Reliability Prediction Analysis

The reliability prediction is based on a comprehensive solder joint simulation that focuses on the solder joint (SJ) height calculation.

## Solder Joint Simulation

Solder shear strain increases as stand-off height decreases. To reduce solder shear strain and gain temperature cycling margin, the stand-off height must be increased. Fig. 4 illustrates the analytical model of how solder shear strain arises and its mechanisms.



$t_s$: solder standoff height; $G_s$: solder effective shear modulus; $G_s t_s \propto$ solder vol.

Fig. 4 Analytical model of how solder shear strain happens and its mechanisms
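The inverse relation between stand-off height and shear strain can be illustrated with a common first-order approximation, $\gamma \approx L \, \Delta\alpha \, \Delta T / t_s$, where $L$ is the distance from the joint to the neutral point. This textbook estimate is an assumption of the sketch and is not the simulation model used in this work. The geometry, CTE mismatch and stand-off values below are hypothetical; only the 100 °C temperature swing follows the 0 °C to 100 °C TCT profile used in this work.

```python
def shear_strain(dnp_mm, delta_cte_ppm_per_c, delta_t_c, standoff_um):
    """First-order estimate: gamma = DNP * delta_alpha * delta_T / t_s."""
    dnp_um = dnp_mm * 1000.0                  # distance to neutral point in um
    delta_alpha = delta_cte_ppm_per_c * 1e-6  # CTE mismatch in 1/degC
    return dnp_um * delta_alpha * delta_t_c / standoff_um


# Hypothetical inputs, for illustration only; delta T spans 0 degC to 100 degC.
DNP_MM, DELTA_CTE_PPM, DELTA_T_C = 10.0, 10.0, 100.0
strains = {
    ts: shear_strain(DNP_MM, DELTA_CTE_PPM, DELTA_T_C, ts)
    for ts in (300.0, 400.0, 500.0)
}
for ts, gamma in strains.items():
    print(f"t_s = {ts:.0f} um -> shear strain = {gamma:.4f}")
```

The estimate ignores joint shape, solder volume and creep, so it only shows the trend: a taller joint sees less shear strain for the same thermal mismatch.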

## Solder Joint Height Calculation

The simulation was based on a TCT profile of 0 $^{\circ}$C to 100 $^{\circ}$C at 40 minutes per cycle. This profile was chosen as the most stringent one so that the simulation covers the worst-case scenario.

| Critical functional ball | Metric | POR | High standoff (const. volume) | High standoff (high volume) |
|---|---|---|---|---|
| ◆ 570 µm | | | • 550 µm | + 570 µm |
| Pkg side | SED | 1.90E-01 | 1.46E-01 | 1.78E-01 |
| | Predicted charz life (X) | 2470 | 3210 | 2636 |
| | Acceleration Factor | 1 | 1.299 | 1.067 |
| PCB side | SED | 4.64E-01 | 4.57E-01 | 4.32E-01 |
| | Predicted charz life (X) | 1016 | 1032 | 1091 |
| | Acceleration Factor | 1 | 1.015 | 1.074 |

Fig. 5 Solder Joint Reliability Simulation Result
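The Acceleration Factor rows in Fig. 5 appear to be the predicted life of each option divided by the predicted life of the POR. This reading is an assumption inferred from the numbers, which it reproduces to within rounding of the third decimal. A short Python check, with the life values copied from Fig. 5 and hypothetical function and variable names, is sketched below.

```python
# Predicted charz life values copied from Fig. 5 (units as given there).
predicted_life = {
    "Pkg side": {
        "POR": 2470,
        "High standoff (const. volume)": 3210,
        "High standoff (high volume)": 2636,
    },
    "PCB side": {
        "POR": 1016,
        "High standoff (const. volume)": 1032,
        "High standoff (high volume)": 1091,
    },
}


def life_ratio_vs_por(lives):
    """Return each option's predicted life divided by the POR predicted life."""
    por = lives["POR"]
    return {name: life / por for name, life in lives.items()}


ratios = {side: life_ratio_vs_por(lives) for side, lives in predicted_life.items()}
for side, by_option in ratios.items():
    for name, ratio in by_option.items():
        print(f"{side:8s} {name:30s} {ratio:.3f}")
```

On this reading, the PCB side has the shorter predicted life in every column, so it would limit the joint life in all three options.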

The simulation result in Fig. 5 shows that increasing the height (volume) from x $\mu$m to y $\mu$m may gain z% life. Wiese et al. [5] also found that improved accuracy was obtained in simulating stress-strain hysteresis loops when plastic flow was incorporated into the model. Clech [6] simulated a hysteresis loop measured by Hall [11] and found that the stress range was over-predicted when only steady-state creep was incorporated.

## Inline X-Ray Solder Joint Height

After the simulation is done, the SJ height is verified using the inline X-ray machine. The samples are taken from units that have gone through cross-section analysis; see Fig. 6 for how the X-ray analysis is done.

The typical solder joint failures detected by X-ray inspection include: broken solder joints, shorts or bridging, voiding (an open area devoid of solder located inside or on the surface of a solder joint), wetting problems of cut surfaces, insufficient through-hole solder joints, delamination (separated layers within a device structure), missing solder fillets or connection points, misalignment (lifted or tombstoned components), poor coplanarity (open or misaligned), and unacceptable size and shape variations in solder volume [7].



Fig. 6 Inline X-Ray Solder Joint Height Analysis

## TCT DP Modeling/Validation

The objective of the DP modeling is to predict the likely pass or fail outcome as the TCT progresses. Since the TCT takes a few weeks to complete, having an outcome prediction at the mid-point readout gives the team time for re-qualification if needed.

Before the details of DP modeling are explained, the DP procedure must first be understood.

## DP Procedure

Printed Circuit Board Assemblies (PCBAs) are soaked in red dye (Dykem) under vacuum to allow the dye to penetrate any cracks. The PCBA is then baked dry for several hours at 125 °C. The components are then pried from the PCBA, and the exposed interfaces on both the PCBA and the components are inspected using an optical microscope. DP highlights the degree of cracking in a solder joint. The Dye and Pry procedure follows IPC-A-610G (Acceptability of Electronic Assemblies, section 8.3.12) [8]; Fig. 7 illustrates how the DP analysis is done.



Fig. 7 DP Analysis

DP modeling is critical for predicting the percentage of dye penetration that the DP analysis would show on actual samples. This upfront work eases the decision on whether to build the qualification samples, which requires a large investment, and the DP analysis itself is very time-consuming. Therefore, the modeling needs to be done well upfront, especially during the mechanical builds prior to any functional sample build for New Product Introduction (NPI) qualifications.

Fig. 8 shows the schematic of five typical failure modes on Non-Via-in-Pad and Via-in-Pad solder joint structures.



Fig. 8 Five Typical Failure Modes

- Failure mode A: Intermetallic compound (IMC) crack on the component side. It is a typical failure for lead-free solder, especially for Sn-4Ag-0.5Cu solder.
- Failure mode B: Bulk solder crack at the solder ball neck close to the component side.
- Failure mode C: IMC crack on the PCB side. It is also a typical failure for lead-free solder. Frequency of occurrence depends on the solder joint strength on the component side. If the solder joint strength is higher on the component side, this failure could occur more frequently.
- Failure mode D: The crack occurs in the PCB buildup layer.
- Failure mode E: The crack occurs in the via structure, which is one of the weakest points in the Via-in-Pad structure.

From Fig. 7, it is observed that an IMC crack is a brittle failure, while a bulk solder failure is closer to a ductile failure. The location of the crack depends on where the weakest point is. Normally, for lead-free solder, the weakest point is in the IMC. With the improvement of solder joint reliability on the package side, the IMC on the PWB side is prone to crack too. The strength of the IMC layer depends on the solder, reflow process, pad finish, etc. Compared with lead-free solder, Sn-37Pb is more susceptible to bulk solder failure. Sometimes a combination of failure modes A and B may occur too: the crack starts from the IMC and propagates into the bulk solder, or vice versa [9].

Fig. 9 shows the Bivariate Fit Curve of the DP Prediction.



Fig. 9 DP Modeling Analysis

Example of cumulative crack calculation: (Mode C) 4 × 0.25 + (Mode E) 1 × 0.5 = 1.5



Fig. 10 Cumulative Ball Crack Analysis

The modeling objective is to find the cumulative ball crack based on the solder joint failure modes.

Fig. 10 is an example of the cumulative ball crack illustration.
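The worked example of the cumulative crack calculation can be reproduced with a short Python sketch. It assumes that each term is the number of cracked balls in a failure mode multiplied by a crack fraction per ball; that interpretation of the second factor, along with the function and variable names, is an assumption of this sketch and not a definition taken from the text.

```python
def cumulative_crack(observations):
    """Sum ball_count * crack_fraction over (mode, count, fraction) entries."""
    return sum(count * fraction for _mode, count, fraction in observations)


# Worked example from the text: 4 Mode C balls at 0.25 and 1 Mode E ball at 0.5.
example = [("C", 4, 0.25), ("E", 1, 0.5)]
total = cumulative_crack(example)
print(f"cumulative crack = {total}")
```

Keeping the failure mode label with each entry makes it straightforward to compute per-mode subtotals, such as a Mode C only figure, from the same readout.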

For every BGA component, the cumulative crack analysis is compared against historical, highly correlated components with known pass/fail outcomes as the crack progresses from Time0 to the completion of TCT (Fig. 8). Statistical analysis is used; see Fig. 11 for a snapshot summary of the DP prediction outcome.

#### C. Validation

After the reliability simulation is done and has been verified to work across manufacturing conditions, the actual samples are built in the following phases.

- Board Level Reliability (BLR)-Daisy Chain
- Non-operating Mechanical Line Validation Build
- Accelerated TCT/Thermal Shock
- Hardware (HW) Validation Build

| Model | DP Pass | DP Fail |
|---|---|---|
| All Modes Total Ball Crack @ T250 | < 20 @ 82% $R^2$ | > 21 @ 76% $R^2$ |
| Mode C Anchor & Functional Cum Crack @ T250 | < 16 @ 78% $R^2$ | > 20 @ 75% $R^2$ |

Fig. 11 DP Prediction Analysis Result
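The thresholds in Fig. 11 can be applied to a T250 readout as a simple decision rule, sketched below in Python. The threshold values are transcribed from Fig. 11. The dictionary keys, the function name and the "indeterminate" outcome for values that fall between the pass and fail limits are assumptions of this sketch, since Fig. 11 does not state how such values are treated.

```python
# Thresholds transcribed from Fig. 11 (cumulative crack at the T250 readout).
THRESHOLDS = {
    "all_modes_total": {"pass_below": 20, "fail_above": 21},
    "mode_c_anchor_functional": {"pass_below": 16, "fail_above": 20},
}


def classify_dp(model, cum_crack):
    """Return 'pass', 'fail' or 'indeterminate' for a T250 cumulative crack."""
    limits = THRESHOLDS[model]
    if cum_crack < limits["pass_below"]:
        return "pass"
    if cum_crack > limits["fail_above"]:
        return "fail"
    # Fig. 11 does not say how values between the two limits are treated;
    # reporting them as "indeterminate" is an assumption of this sketch.
    return "indeterminate"


print(classify_dp("all_modes_total", 12))
print(classify_dp("mode_c_anchor_functional", 18))
```

The $R^2$ values in Fig. 11 describe the fit quality of each model and are not used by this rule.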

#### D. Implement and Monitor

Once the design modeling and simulation are validated using the actual samples across all NPI milestones, the DFA design rules are updated and frozen. All design change, process and equipment parameter controls are finalized and become the Plan of Record (POR).

#### III. CONCLUSION AND FUTURE SCOPE

A potentially useful approach to Packaging Design for Reliability in the product design and development process is to configure a reliability prediction model as a design trade-off tool that shows design engineers the reliability consequences of the design choices that affect solder joint performance. We have piloted the initial version of this tool using the actual product and the standard data and practices available to most reliability engineers.

Future work on the prediction-based design for reliability tool will address the following issue: Modeling stresses at the level of failure modes rather than at a part level will allow a more detailed model of how stresses affect an assembly's reliability. Hence, a version of the model in which the multiplicative reliability model is applied to failure modes, and then the failure mode results are added up to arrive at a component failure rate model will be explored.

#### ACKNOWLEDGMENT

The authors would like to thank Western Digital Batu Kawan, Penang, specifically the SSD Reliability laboratory team and the SSD Package Technology Development and Integration team, for the project support and funding.

#### References

- [1] G. O. Young, "Synthetic structure of industrial plastics (Book style with paper title and editor)," in *Plastics*, 2nd ed. vol. 3, J. Peters, Ed. New York: McGraw-Hill, 1964, pp. 15–64.
- [2] J.T. Selvik, T. Aven, A framework for reliability and risk centered maintenance, Reliability Engineering & System Safety, Volume 96, Issue 2, 2011
- [3] G. K. Mui, X. -. Wu, K. X. Hu, C. -. Yeh and K. Wyatt, "Solder joint formation simulation and finite element analysis," 1997 Proceedings 47th Electronic Components and Technology Conference, San Jose, CA, USA, 1997, pp. 436-443, doi: 10.1109/ECTC.1997.606207.
- [4] Lu, Hua & Shi, Helen & Zhou, Ming. (2006). Thermally induced deformation of solder joints in real packages: Measurement and analysis. Microelectronics Reliability. 46. 1148-1159. 10.1016/j.microrel.2005.10.002.
- [5] Wiese, S. & Rzepka, S. (2004). Time-independent elastic-plastic behaviour of solder materials. Microelectronics Reliability. 44. 1893-1900. 10.1016/j.microrel.2004.04.015.

- [6] Jean-Baptiste Libot, Joël Alexis, Olivier Dalverny, Lionel Arnaud, Philippe Milesi, et al. Experimental SAC305 shear stress-strain hysteresis loop construction using Hall's one-dimensional model based on strain gages measurements. Journal of Electronic Packaging, American Society of Mechanical Engineers, 2019, 141 (2), pp. 021002-1-021002-15. doi: 10.1115/1.4042806. hal-02359785.
- [7] M. S. Laghari, R. Hijer and G. A. Khuwaja, "Efficient techniques for BGA solder joint identification in low resolution X-ray images," 2011 IEEE GCC Conference and Exhibition (GCC), Dubai, 2011, pp. 128-131, doi: 10.1109/IEEEGCC.2011.5752465.
- [8] IPC-A-610G (Acceptability of Electronic Assemblies, section 8.3.12).
- [9] Jing-en Luan, Tong Yan Tee, Xueren Zhang and E. Hussab, "Solder joint failure modes, mechanisms, and life prediction models of IC packages under board level drop impact," 2005 6th International Conference on Electronic Packaging Technology, Shenzhen, 2005, pp. 382-388, doi: 10.1109/ICEPT.2005.1564714.
- [10] Reliasoft Corporation (2004, May) Reliability Basics, Reliability HotWire e-Magazine (39). Retrieved from https://www.weibull.com/hotwire/issue39/index.htm
- [11] P.M. Hall, "Forces, moments, and displacements during thermal chamber cycling of leadless ceramic chip carriers soldered to printed boards", IEEE Transactions on Components, Hybrids, and Manufacturing Technology, Vol. CHMT-7, No. 4, Dec. 1984, pp. 314-327.



**Mohd Azman Abdul Latif** received the B.Sc. degree in Electrical Engineering from Lehigh University, Bethlehem, PA, USA in 1997. He then received the M.Sc. degree in Electrical and Electronics Engineering from Petronas University of Technology (UTP) in 2012. He was with Intel Corporation from 1997 until 2014 in the Quality and Reliability Department. His area of focus was in Burn-In Development and in System-On-Chip (SOC) Q&R development. In late 2014, he joined the SanDisk Corporation start-up team focusing on Q&R and is now with Western Digital as Solid State Drive (SSD) Reliability Manager.

His current research interests include Thermal Cycle Test Enhancement in SSD, Accelerated Life Test, e2e reliability prediction, characterization and control.



**Erwan Basiron** received the B.Eng. degree in Mechanical Engineering from UiTM Shah Alam, Malaysia. After graduation, he worked with Intel Corporation from 1997 until 2015 in Packaging Assembly Development and Manufacturing on flip chip thermal compression bonding. After 2015, he joined SanDisk Corporation, working on SSD assembly technology development and recently starting up the Technology Development Q&R function within the development team.