The Information Source for Systems Testability and Diagnostics

Reliability Terms

Basic Reliability – The duration or probability of failure-free performance under stated conditions. (Compare with Logistics Reliability and Mission Reliability).

Compensating Features – Special inspections, tests, controls, instructions, drawing notes or other provisions applied to a single point failure mode item to improve reliability and lessen chances of failure.

Compensating Provisions – Actions that are available or can be taken to negate or mitigate the effect of a failure on a system.

Conditional Probability – The probability of an event, given that another event is known to have occurred.
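
As an illustration of the definition above, the sketch below computes P(A | B) = P(A and B) / P(B) for a reliability question: the chance that an item survives to 1500 hours given that it has already survived 1000 hours. The constant failure rate of 0.001 per hour and the function names are assumptions chosen for the example, not values from this glossary.

```python
import math

def conditional_probability(p_a_and_b, p_b):
    """Return P(A | B) = P(A and B) / P(B)."""
    if not 0.0 < p_b <= 1.0:
        raise ValueError("P(B) must be in (0, 1]")
    return p_a_and_b / p_b

def exponential_reliability(t_hours, failure_rate):
    """Probability of surviving to t_hours under a constant failure rate."""
    return math.exp(-failure_rate * t_hours)

# Hypothetical item with an assumed constant failure rate of 0.001 per hour.
FAILURE_RATE = 1e-3

# A = "survives to 1500 h", B = "survives to 1000 h".
# Surviving to 1500 h implies surviving to 1000 h, so P(A and B) = P(A).
p_b = exponential_reliability(1000.0, FAILURE_RATE)
p_a_and_b = exponential_reliability(1500.0, FAILURE_RATE)
p_a_given_b = conditional_probability(p_a_and_b, p_b)
print(f"P(survive 1500 h | survived 1000 h) = {p_a_given_b:.4f}")
```

With a constant failure rate, the result equals the probability of surviving a fresh 500 hours, which is the memoryless property of the exponential distribution.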

Corrective Action – The identification of actions, automatic or manual, which can be taken to circumvent a failure. Also, a documented design, process, procedure, or materials change implemented and validated to correct the cause of failure or design deficiency.

Correlated or Sympathetic Failure – The inability of two or more items to perform their function as the result of some single event, thus possibly negating redundancy and acting as a single point failure mode.

Criticality – A relative measure that combines both the consequences (i.e., severity) of a particular failure mode and its frequency of occurrence.

Criticality Analysis (CA) – A procedure by which each potential failure mode is ranked according to the combined influence of severity and probability of occurrence.

Criticality Matrix – A graphical representation of the failure mode and effects, usually graphed as probability of occurrence vs. severity level.

Degradation Analysis – Analysis involving the measurement and extrapolation of degradation or performance data that can be directly related to the presumed failure of the product in question.

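Derating – The practice of operating a component below its rated stress limits (for example, voltage, power or temperature) to reduce its failure rate. In reliability prediction, the adjustment of a component's failure rate according to the ratio of applied stress to rated stress.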

Durability – A measure of useful life.

Duty Cycle – The percentage of time that a system, device or component is active.

End Effect – In a FMEA, the consequence(s) of a failure mode on the operation, function, or status of the highest indenture level. (Compare with Local Effect and Next Higher Level Effect).

Exponential Distribution – The most widely known and used distribution in reliability evaluation of systems, most often used where the rate at which events occur does not vary. A Weibull Distribution reduces to an exponential distribution when the beta (slope) parameter is set to 1.0.
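
The relationship between the two distributions can be checked numerically. The sketch below assumes the common two-parameter Weibull form with shape beta and scale eta (eta is not named in this glossary), and compares it with an exponential distribution whose rate is 1/eta.

```python
import math

def weibull_pdf(t, beta, eta):
    """Two-parameter Weibull pdf with shape beta and scale eta, for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))

def exponential_pdf(t, rate):
    """Exponential pdf with a constant event rate."""
    return rate * math.exp(-rate * t)

# Assumed scale parameter, in hours, for illustration only.
ETA = 1000.0

# With beta = 1.0 the Weibull pdf matches the exponential pdf with rate 1/eta.
for t in (10.0, 500.0, 2000.0):
    assert math.isclose(weibull_pdf(t, 1.0, ETA), exponential_pdf(t, 1.0 / ETA))
```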

Fail Operational – The ability to sustain a failure and retain sufficient operational capability for safe mission continuation.

Fail Safe – The ability to sustain a failure and retain the capability to successfully terminate the mission.

Failure – The loss of ability of a system, device or process to perform a required function. The manifestation of a fault. (See also Hardware Failure and Software Failure).

Failure Analysis – The logical, systematic examination of a system to identify the probability, causes, and consequences of potential failures.

Failure Cause – The circumstances during design, manufacturing or use which have induced or activated a failure mechanism. The basic reason(s) for a failure. (Compare with Failure Mechanism).

Failure Distribution – A mathematical model that describes the probability of failure occurring over time. Also known as a probability density function (pdf), this function can be utilized to determine the probability that a failure takes place in a given time interval.

Failure Effect – The consequences a failure has on the operation, function, or status of a device. (See also Object Failure Effect and Design Failure Effect).

Failure Mechanism – The physical, chemical, electrical, thermal or other process that results in failure. For a system, the failure mechanism is the process of error propagation following a component failure which leads to a system failure. (Compare with Failure Cause).

Failure Mode – The characteristic manner in which a failure occurs. Within a failure mode diagnostic model, failure modes represent specific ways in which a system, device or process can fail. (See also Object Failure Mode).

Failure Modes and Effects Analysis (FMEA) – An inductive, bottom-up method of analyzing the system effects of individual failure modes.

Failure Modes, Effects and Criticality Analysis (FMECA) – A FMEA that also includes criticality calculations for each failure mode and effect.

Failure Rate – A function that describes the number of failures of a system, device or component that can be expected to take place over a given unit of time, most often expressed as the number of failures per million hours. When the failure rate is constant, it can be computed as the inverse of MTBF.
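
The conversion between MTBF and a failure rate in failures per million hours can be sketched as follows. This assumes a constant failure rate; the function names and the 50,000-hour figure are hypothetical.

```python
HOURS_PER_MILLION = 1_000_000.0

def failure_rate_per_million_hours(mtbf_hours):
    """Failure rate in failures per million hours, assuming a constant rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_MILLION / mtbf_hours

def mtbf_from_failure_rate(failures_per_million_hours):
    """MTBF in hours from a constant failure rate in failures per million hours."""
    if failures_per_million_hours <= 0:
        raise ValueError("failure rate must be positive")
    return HOURS_PER_MILLION / failures_per_million_hours

# Hypothetical component with an MTBF of 50,000 hours.
rate = failure_rate_per_million_hours(50_000.0)
print(rate)  # 20.0 failures per million hours
```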

Failure Reporting and Corrective Action System (FRACAS) – A process by which failures are identified and analyzed so that corrective actions can be implemented back into the design and/or manufacturing process.

Fault Avoidance – Actions taken to assure a fault cannot occur.

Fault Tolerance – The built-in capability of a system to provide continued correct execution in the presence of one or more failures.

Fault Tree – A tree-like representation of failure causes that can be used to determine the probability of the outcomes of those failures.

Fault Tree Analysis – An analysis approach in which each potential system failure is traced back to all faults that could cause the failure. It is a top-down approach, whereas the FMEA is a bottom-up approach.

Functional Failure Mode Effects Analysis (Functional FMEA or FFMEA) – A FMEA that identifies the potential impact of each functional failure mode on mission success.

Functional Hazard Analysis (FHA) – An analysis of the effects, risk, severity and probability of potential faults that is performed during the specification and design stages, typically before failure modes are identified. The FHA then becomes the baseline for the FMECAs developed later.

Indenture Level – A designation which identifies an item's relative complexity as an assembly or function. In a system, the first indenture level is the system. Examples of lower indenture levels could be system segments (level 2), prime items (level 3), subsystems (level 4), components (level 5), subassemblies or circuit cards (level 6), and parts (level 7).

Local Effect – In a FMEA, the consequence(s) of a failure mode on the operation, function, or status of the specific item being analyzed. (Compare with Next Higher Level Effect and End Effect).

Logistics Related Reliability – The probability that no corrective (or unscheduled) maintenance, unscheduled removals, and/or unscheduled demands for spare parts will occur following the completion of a specific mission profile.

Logistics Reliability – The ability of a system to perform failure free, under specified operating conditions and time without demand on the support system, measured as a mean time between maintenance actions. Also, a measure of a system's ability to operate without logistics support. All failures, whether the mission is or can be completed, are counted.

Lognormal Distribution – A failure distribution similar to the Normal distribution except that the logarithms of the values of random variables, rather than the values themselves, are assumed to be normally distributed.

Logistics Support Analysis (LSA) – A systems engineering and design process selectively applied during all life cycle phases of the system/equipment to help ensure supportability objectives are met. (MIL-STD-1785)

LSA Record (LSAR) – That portion of LSA documentation consisting of detailed data pertaining to the identification of logistic support resource requirements of an equipment.

Mean Mission Duration – For on-orbit space systems, the average time the system is operational before a mission-critical failure occurs. The mean mission duration is equivalent to mean time to failure for nonrepairable ground systems.

Mean Time Between Critical Failure (MTBCF) – The mean time between failures of mission-essential functions, calculated as the ratio of active hours (those excluding scheduled maintenance) to the number of critical failures.

Mean Time Between Downing Events (MTBDE) – A measure calculated as the total uptime over the number of downing events.

Mean Time Between Failure (MTBF) – The mean equipment operating time between failures of any type, calculated by dividing uptime by the total number of failures.

Mean Time To Failure (MTTF) – A system, subsystem or device's mean time to failure, as calculated at a specific point in time. This differs from MTBF in that it changes over time as the system is maintained.

Mean Time To First Failure (MTTFF) – The Mean Time to Failure starting from when the system is first made to be Mission Capable.

Mean Time to Repair (MTTR) – The total amount of time spent performing all corrective maintenance repairs divided by the total number of those repairs.
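
Several of the measures defined above (MTBCF, MTBDE, MTBF and MTTR) share one form: a total time divided by an event count. The sketch below applies that form to a made-up operating record; all of the figures and variable names are hypothetical.

```python
def mean_time(total_hours, event_count):
    """Generic 'mean time' measure: total hours divided by the number of events."""
    if event_count <= 0:
        raise ValueError("at least one event is required")
    return total_hours / event_count

# Hypothetical operating record for one system.
uptime_hours = 9_000.0   # equipment operating time
active_hours = 9_500.0   # hours excluding scheduled maintenance
total_failures = 30      # failures of any type
critical_failures = 5    # failures of mission-essential functions
downing_events = 12      # events that took the system down
repair_hours = 45.0      # time spent on corrective maintenance repairs
repairs = 30             # number of corrective maintenance repairs

mtbf = mean_time(uptime_hours, total_failures)      # 300.0 hours
mtbcf = mean_time(active_hours, critical_failures)  # 1900.0 hours
mtbde = mean_time(uptime_hours, downing_events)     # 750.0 hours
mttr = mean_time(repair_hours, repairs)             # 1.5 hours
```

Only the numerator and the kind of event being counted differ between the four measures, so the choice of which hours and which events to count determines the result.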

Mission Reliability – The probability that the system is operable and capable of performing its required function for a stated mission duration or for a specified time into a mission. (Compare with Basic Reliability and Logistics Reliability).

Mixed Weibull Distribution – A variation of the Weibull Distribution used to model data with distinct subpopulations that may represent different failure characteristics over the lifetime of a product. Separate Weibull parameters are calculated for each subpopulation, and the results are combined in a mixed Weibull distribution to represent all of the subpopulations in one function.

Monte Carlo Simulation – A method of generating values from a known distribution for the purposes of experimentation. This is often accomplished by generating uniform random variables and using them in an inverse reliability equation to produce failure times that would conform to the desired input distribution.
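
A minimal sketch of this approach, assuming a two-parameter Weibull input distribution with shape beta and scale eta: the reliability function R(t) = exp(-(t/eta)^beta) is inverted to t = eta * (-ln u)^(1/beta), where u is a uniform random number. The function names, parameter values and seed are illustrative assumptions.

```python
import math
import random

def sample_weibull_failure_time(beta, eta, rng):
    """Draw one failure time by inverting the Weibull reliability function."""
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    return eta * (-math.log(u)) ** (1.0 / beta)

def simulate_failure_times(n, beta, eta, seed=0):
    """Generate n simulated failure times from a seeded random generator."""
    rng = random.Random(seed)
    return [sample_weibull_failure_time(beta, eta, rng) for _ in range(n)]

# With beta = 1.0 the input distribution is exponential, so the sample mean
# should land close to eta (1000 hours here).
times = simulate_failure_times(100_000, beta=1.0, eta=1000.0)
sample_mean = sum(times) / len(times)
print(f"sample mean = {sample_mean:.1f} hours")
```

Seeding the generator makes a run repeatable, which helps when comparing simulated results across design alternatives.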

Next Higher Level Effect – In a FMEA, the consequence(s) of a failure mode on the operation, function, or status of the items in the next indenture level above the indenture level under consideration. (Compare with Local Effect and End Effect).

Normal Distribution – A widely used distribution that is symmetric, allowing the curve to be defined by a mean and a standard deviation.

Probabilistic Risk Assessment – An approach for documenting risk profiles based on the failures.

Probability Density Function – A mathematical model that describes the probability of failure occurring over time. This function can be utilized to determine the probability that a failure takes place in a given time interval.

Rayleigh Distribution – A Weibull distribution whose beta (slope) value is equal to 2.0.

Redundancy – The existence of backup equipment that can be used to perform primary functions, in the event that the primary equipment should fail. Also, the existence of more than one way of performing a function.

Reliability – The probability that a system will perform satisfactorily for a given time when used under specified operating conditions. More generally, reliability is the capacity of parts, components, equipment, products and systems to perform their required functions for desired periods of time without failure, in specified environments and with a desired confidence. (See also Basic Reliability and Mission Reliability). Also, the engineering discipline concerned with predicting, monitoring, testing, and improving the reliability of a system, device or process.

Reliability Analysis – A quantification of the sources of failures in a system, with emphasis on the most significant contributors towards the overall system unreliability, in order to correct them and therefore improve the reliability of the fielded system.

Reliability Block Diagram (RBD) – A diagram that represents how the components (represented by "blocks") are arranged and related reliability-wise in a larger system. This is often, but not necessarily, the same as the way that the components are physically related.
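
As a rough illustration of how an RBD is evaluated, the sketch below combines block reliabilities for the two most common arrangements, series and parallel. It assumes the blocks fail independently of one another; that assumption, the function names and the example values are not part of the definition above.

```python
from math import prod

def series_reliability(block_reliabilities):
    """All blocks must work: multiply the individual reliabilities."""
    return prod(block_reliabilities)

def parallel_reliability(block_reliabilities):
    """At least one block must work: one minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in block_reliabilities)

# Hypothetical system: two redundant pumps (0.90 each) feeding one controller (0.95).
pumps = parallel_reliability([0.90, 0.90])
system = series_reliability([pumps, 0.95])
print(f"pump pair = {pumps:.4f}, system = {system:.4f}")
```

If the blocks are subject to a correlated failure, the independence assumption no longer holds and the parallel result will overstate the benefit of redundancy.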

Reliability Distribution Curve – A curve that characterizes the changes to the probability of failure over time.

Reliability Engineering – The set of design, development and manufacturing tasks by which reliability is achieved.

Reliability Growth – The improvement in a reliability parameter caused by successfully correcting design or manufacturing deficiencies.

Reliability Prediction – The primary calculation in Reliability Analysis, referred to as the Failure Rate or number of failures during a period of time.

Safety Case – A final study that provides proof that the system will remain acceptably safe for a particular failure scenario.

Severity – The worst potential consequence of a failure, determined by the degree of injury, property damage, or system damage that could ultimately occur.

Single Point Failure (SPF) – Any single hardware failure or software error which results in irreversible degradation of item mission performance below contractually specified levels. The failure of an item which would result in failure of the system and is not compensated for by redundancy or alternative operational procedures.

Single Point Failure Mode (SPFM) – The way or manner in which the single point failure of an item occurs.

Software Reliability – The probability that software will contribute to failure-free system performance for a specified time under specified conditions. The probability depends on information input into the system, system use, and the existence of software faults encountered during input. Calculated as the total CPU processing time over the number of software failures.

Steady-State Failure Rate – The constant failure rate after one year of operation.

Undetectable Failure – A postulated failure mode in a FMEA for which there is no failure detection method by which the operator is made aware of the failure.

Uniform Distribution – A simple failure calculation algorithm where a random number is simply limited to a range.

Weibull Distribution – A statistical distribution that is widely used for matching field data, due to its versatility and the fact that the Weibull probability density function can assume different shapes based on the parameter (beta factor) values.