Industrial IC design: Functional Safety and Smart Embedded Processing
Blog: NASSCOM Official Blog
Among all market segments addressed by the integrated circuit (IC) industry, Industrial is probably the most diverse, with applications ranging from plant automation, power conversion (both conventional and alternative), remote monitoring, robotics, security, battery management and man-machine interfaces (MMI) to general sensing and actuation. Add to this the natural adjacency to automotive and medical, and the segmentation becomes even more diverse.
Numerous factors contribute to the acceptance of an Industrial IC, including the richness of its analog interfaces, the wired and wireless connectivity interfaces supported, the availability of a wide selection of package footprints, FLASH/SRAM content, reliability metrics in terms of the range of operating conditions supported, the richness of available reference boards and software ecosystem, and the generality or specificity of the industrial problem the IC addresses. While all of these factors contribute significantly to the viability and successful adoption of an Industrial-targeted IC, we focus here on two specific criteria that are considered critical: functional safety and smart embedded processing.
Functional Safety
Although there is a bewildering array of independent safety standards with relevance to Industrial (EN50128/9, IEC61511, IEC60880, IEC62061, IEC61513 and IEC61800, to name just a few), they fall under the overarching umbrella of the IEC 61508 standard, “Functional Safety of Electrical / Electronic / Programmable Electronic safety-related systems”. Many of these standards do not provide technical measures or engineering recommendations that can serve as a concrete guideline towards achieving a specific Safety Integrity Level (SIL) for an Industrial IC; they restrict themselves to prescribing life-cycle and process-related considerations (e.g. documentation, reviews, management). In addition, there are related standards such as ISO26262 (“Functional safety in E/E systems in road vehicles”) which, though automotive-targeted, attract interest in the Industrial world because of the related safety considerations and because many of the primary Industrial IC vendors are also players in the Automotive IC world.
A few years ago, the general approach towards functional safety was largely one of self-regulation. Because of the fragmentation of the Industrial segment (often more than 10,000 customers in a single geography, each with dramatically divergent silicon volume requirements), many Industrial semiconductor vendors did not see sufficient ROI in performing formal certifications against numerous industrial safety standards. Instead, they created devices adhering to a ‘robust minima’ of safety features drawn from proven internal design recipes. The silicon vendor would then support the eventual industrial customer, often with the help of specialized third-party consulting houses, as that customer took its product through the specific flavor of certification relevant to its industry segment. Today, the voluntary ‘robust minima’ approach has given way to the need for mandatory, formal certification by a recognized agency (e.g. TUV) against IEC61508 of every component that goes into the silicon design: physical libraries, EDA tools and S/W toolchains, in addition to the IC design flow itself.
From an IC engineering perspective, numerous safety-centric techniques are employed to improve the SIL of a device. Some general techniques in the logic domain include Error Correction Code (ECC) protection on memories, parity on buses, Memory Protection Units (MPUs) for access control, watchdog timers for monitoring program flow health, run-time (as opposed to boot-time) diagnostics of hardware units, and multi-bit implementation of key control bits with majority voting schemes. Devices aiming to achieve higher SILs implement specific variations on these basic safety monitoring features, four examples of which are listed below:
- The basic watchdog functionality is enhanced with the following features:
  - The watchdog is clocked by a clock that is frequency- and phase-asynchronous to the CPU clock (on which the code is executing); in other words, the monitoring process uses an independent time-base.
  - The code that periodically holds off the watchdog is itself monitored in hardware to ensure that the hold-off occurs only within a pre-defined time window (this offers some protection against code that has gone rogue but is still regularly, and incorrectly, holding off the watchdog).
- While many serial communication standards include a signature computation scheme embedded in the protocol (and implemented in hardware in the link layer), an additional common hardware engine is included in the ASIC with the following features:
  - Capable of checking signatures over varying-length payloads.
  - Programmable to implement standard CRCs (e.g. CRC16, CRC32) as well as arbitrary polynomials.
  - Fly-by mode, in which blocks of data are CRC-checked as they ‘fly by’ between a peripheral and memory.
  - Dedicated DMA mode, in which memory-resident blocks of data are subjected to CRC checks.
- ‘Idiot’-proofing of critical interfaces (e.g. making it impossible to abruptly stop a high-speed motor) by including specific hardware features:
  - Ensuring guaranteed real-time interrupt performance for critical interfaces by carefully crafting the interrupt architecture (e.g. hardware vectoring, prioritization and pre-emption of several hundred interrupt sources, rather than the few tens typical of the consumer world).
  - Enforcing privilege levels and root-of-trust authentication mechanisms in hardware, such that an interface cannot be activated or deactivated except from a particular run-time privilege level.
- Borrowing from safety-critical automotive applications (e.g. power-train, airbag deployment), Functional Redundancy Check (FRC) implementations have started appearing in ultra-safety-critical Industrial usages (e.g. nuclear plant automation). FRC is an extreme form of program execution sequence monitoring at both temporal and logical levels: an entire processor core, or even an entire computational subsystem, is duplicated (sometimes even triplicated) such that one copy operates in lock-step with the other and reports a fault or initiates fail-safe procedures as soon as lock-step is lost. Since the cost in chip die size is usually prohibitive, FRC implementations are not commonplace except in the most safety-critical Industrial applications.
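The windowed watchdog from the first example in the list can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `WindowedWatchdog`, the helper `run`, and the window values (open at tick 8, timeout at tick 12) are hypothetical, and calling `tick()` stands in for the independent watchdog clock rather than modeling true asynchrony.

```python
class WindowedWatchdog:
    """Toy model of a windowed watchdog, counted in ticks of its own clock.

    All names and numbers here are illustrative assumptions, not taken
    from any specific device.
    """

    def __init__(self, window_open: int, timeout: int) -> None:
        if not 0 <= window_open < timeout:
            raise ValueError("need 0 <= window_open < timeout")
        self.window_open = window_open  # earliest tick at which a kick is legal
        self.timeout = timeout          # tick at which an un-kicked watchdog fires
        self.counter = 0                # ticks since the last valid kick
        self.tripped = False            # latched fault flag

    def tick(self) -> None:
        """Advance the watchdog's own time-base by one tick."""
        if self.tripped:
            return
        self.counter += 1
        if self.counter >= self.timeout:
            self.tripped = True  # the kick came too late, or never

    def kick(self) -> None:
        """Hold-off request issued by the monitored software."""
        if self.tripped:
            return
        if self.counter < self.window_open:
            self.tripped = True  # the kick came too early: suspect rogue code
        else:
            self.counter = 0     # valid kick inside the window


def run(kick_period: int, ticks: int = 100) -> bool:
    """Return True if software kicking every `kick_period` ticks trips the watchdog."""
    wd = WindowedWatchdog(window_open=8, timeout=12)
    for t in range(1, ticks + 1):
        wd.tick()
        if t % kick_period == 0:
            wd.kick()
    return wd.tripped
```

With these assumed values, `run(10)` returns `False` (healthy code kicking inside the window), while `run(3)` and `run(20)` both return `True`: the first kicks too early, as a runaway loop might, and the second kicks too late.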
In the physical domain, safety-centric physical design techniques include symmetry avoidance in layout (especially on clock paths), physical over-design and closure against stringent IR/EM/OCV/DFX targets.
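Returning to the logic domain, the shared signature-checking engine described in the list above can be approximated in software by a parameterized, bit-serial CRC. The sketch below assumes the commonly used width / polynomial / init / reflect / xor-out parameterization; `CrcConfig`, `CrcEngine` and the two presets are hypothetical names, and the choice of CRC-16/CCITT-FALSE as "the" CRC16 is an assumption, since the text does not name a variant. Calling `update()` once per chunk loosely mimics fly-by operation on data streaming past.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CrcConfig:
    """Assumed parameter set for a programmable CRC (width must be >= 8)."""
    width: int
    poly: int
    init: int
    refin: bool
    refout: bool
    xorout: int


def _reflect(value: int, bits: int) -> int:
    """Reverse the order of the lowest `bits` bits of `value`."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


class CrcEngine:
    """Bit-serial CRC that accepts data in arbitrary-length chunks."""

    def __init__(self, cfg: CrcConfig) -> None:
        self.cfg = cfg
        self.mask = (1 << cfg.width) - 1
        self.top = 1 << (cfg.width - 1)
        self.reg = cfg.init & self.mask

    def update(self, data: bytes) -> None:
        """Fold another chunk into the running CRC register."""
        for byte in data:
            if self.cfg.refin:
                byte = _reflect(byte, 8)
            self.reg ^= byte << (self.cfg.width - 8)
            for _ in range(8):
                if self.reg & self.top:
                    self.reg = ((self.reg << 1) ^ self.cfg.poly) & self.mask
                else:
                    self.reg = (self.reg << 1) & self.mask

    def digest(self) -> int:
        """Return the final CRC value without disturbing the register."""
        reg = _reflect(self.reg, self.cfg.width) if self.cfg.refout else self.reg
        return (reg ^ self.cfg.xorout) & self.mask


# Two presets; any other polynomial can be programmed the same way.
CRC32 = CrcConfig(32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF)
CRC16_CCITT_FALSE = CrcConfig(16, 0x1021, 0xFFFF, False, False, 0x0000)

# Chunked ("fly-by" style) use: the payload arrives in two pieces.
engine = CrcEngine(CRC32)
engine.update(b"1234")
engine.update(b"56789")
crc32_ok = engine.digest() == zlib.crc32(b"123456789")  # cross-check against zlib
```

The cross-check against `zlib.crc32` is there only to show that the generic engine reproduces a standard CRC32; a DMA-mode check over a memory-resident block would be a single `update()` call over the whole buffer.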
Smart Embedded Processing
The challenge of achieving higher performance at reduced power budgets has been the primary driver for the proliferation of increasingly ‘smart’ Industrial control devices. The industrial IC space has been dominated for decades by 8/16-bit computing architectures, with hundreds of millions of devices deployed in the form of custom ASICs or ‘general-purpose’ Industrial MCUs. Over the past few years, 32-bit architectures (predominantly the ARM Cortex-M and Cortex-R series) have forced their way into these spaces, offering at one stroke significantly improved computational horsepower along with 8-bit-class code and power efficiencies. These 32-bit architectures, along with a robust tool and software ecosystem, have brought a near-100% C-based development flow to the Industrial mainstream, in place of the assembly-based flow that used to be the norm.
The integration of dedicated Floating-Point Engines (FPEs) alongside the main integer unit has allowed greater computational accuracy in Industrial applications (e.g. high-performance motor control), while allowing these applications to be developed natively at a higher and more natural abstraction level (e.g. in MATLAB) without the effort and inaccuracies involved in manual or semi-automated floating-to-fixed-point conversion. FPE integration has followed one of two routes: a fully IEEE 754 compliant unit in the case of Industrial MCUs, or a stripped-down FPE supporting only a subset of floating-point math operations in custom implementations targeting a specific industrial application. The majority of these FPE implementations for Industrial have limited themselves to single-precision formats.
Embedded ‘smarts’ in the Industrial domain come in many unique forms. Even a function as routine as a timer is usually unrecognizable next to its counterpart in the consumer world: the same timer can perform fractional counting at tens of frequencies, be scaled by tens of pre-scalers, use triggers from tens of peripherals, skip or swallow pulses, average count values, self-calibrate its time-base against precision clock references, and perform multi-channel DMA unassisted by an external DMA controller, all in addition to performing standard industrial control functions like compare- or capture-based timing. Often these timers are implemented as complex inter-linked state machines that run into several tens of kilo-gates; in the industrial SoC world the humble timer is neither humble nor a timer.
Today's devices integrate hardware acceleration engines for single sensor / actuator interfaces or groups of them. For example, in an electric meter application, the Metrology ASIC may integrate a dedicated Hall-Effect Sensor Engine that performs signal-change detection, signal filtering, pattern matching, position storage and completion of the control loop without CPU intervention. On the ‘output’ side, Pulse Width Modulation (PWM) modules that once were ‘just’ duty-cycle modulation counters have become so sophisticated that it is not uncommon to see complete micro-coded hardware engines integrated as programmable off-load engines.
Product teams designing ICs for the Industrial market need to carefully evaluate the challenges and trade-offs involved in achieving certifiable functional safety ratings and intelligent computational capability in order to enable successful market adoption of these devices.
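To give a feel for the inaccuracies that floating-to-fixed-point conversion can introduce, the sketch below runs the same first-order low-pass filter in floating point and in 16-bit Q15 fixed point. Everything here is an illustrative assumption: the Q15 format, the helper names, and the filter coefficient are not from the text, and Python floats are double precision rather than the single precision typical of the FPEs discussed above.

```python
Q = 15
SCALE = 1 << Q  # Q15: value = integer / 32768


def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a saturated Q15 integer."""
    return max(-SCALE, min(SCALE - 1, round(x * SCALE)))


def q15_mul(a: int, b: int) -> int:
    """Q15 x Q15 -> Q15 multiply with rounding and saturation."""
    prod = (a * b + (1 << (Q - 1))) >> Q
    return max(-SCALE, min(SCALE - 1, prod))


def lowpass_float(x: float, alpha: float, steps: int) -> float:
    """y += alpha * (x - y), evaluated in floating point."""
    y = 0.0
    for _ in range(steps):
        y += alpha * (x - y)
    return y


def lowpass_q15(x: float, alpha: float, steps: int) -> int:
    """The same filter with every quantity held as a Q15 integer."""
    x_q, a_q, y_q = to_q15(x), to_q15(alpha), 0
    for _ in range(steps):
        y_q += q15_mul(a_q, x_q - y_q)
    return y_q


# Step input of 0.5 with a slow filter coefficient.
y_float = lowpass_float(0.5, 0.001, 20000)
y_fixed = lowpass_q15(0.5, 0.001, 20000) / SCALE
stall_counts = to_q15(0.5) - lowpass_q15(0.5, 0.001, 20000)
```

In this setup the floating-point filter settles on 0.5, while the Q15 version stalls 496 counts (about 0.015) short, because the product of the small coefficient and the remaining error rounds to zero. Working around effects like this (wider accumulators, rescaling, error feedback) is the kind of manual conversion effort a hardware FPE can help avoid.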
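One plausible reading of "fractional counting" is a divider whose average output rate is a non-integer fraction of the input clock, implemented by swallowing selected pulses. The sketch below shows that idea with a simple accumulator; the function name and the `num` / `den` parameters are hypothetical, and real timer blocks may implement this quite differently.

```python
from typing import List


def fractional_divider(n_clocks: int, num: int, den: int) -> List[int]:
    """Emit, on average, `num` output ticks per `den` input clocks.

    Returns one entry per input clock: 1 where an output tick is
    produced, 0 where the pulse is swallowed. Assumes 0 <= num <= den.
    """
    if not 0 <= num <= den:
        raise ValueError("need 0 <= num <= den")
    acc = 0
    out = []
    for _ in range(n_clocks):
        acc += num
        if acc >= den:      # accumulator overflow: let this pulse through
            acc -= den
            out.append(1)
        else:               # otherwise swallow the pulse
            out.append(0)
    return out


pattern = fractional_divider(10, 3, 10)   # 3 ticks for every 10 clocks
long_run = fractional_divider(1000, 3, 10)
```

Here `pattern` is `[0, 0, 0, 1, 0, 0, 1, 0, 0, 1]`: the ticks are spread as evenly as an integer clock grid allows, and over 1000 input clocks exactly 300 ticks are produced, i.e. an effective divide-by-3.33.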
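For reference, the ‘just a duty-cycle modulation counter’ baseline that modern PWM engines have grown beyond can be sketched in a few lines: a free-running counter compared against a threshold. The names `pwm_samples` and `duty_cycle` and the parameter values are hypothetical and only meant to show the compare principle.

```python
from typing import List


def pwm_samples(period: int, compare: int, cycles: int = 1) -> List[int]:
    """Model an edge-aligned PWM: output is high while counter < compare.

    `period` is the number of counter steps per PWM cycle; `compare`
    (0..period) sets how many of those steps the output stays high.
    """
    if period <= 0 or not 0 <= compare <= period:
        raise ValueError("need period > 0 and 0 <= compare <= period")
    out = []
    for _ in range(cycles):
        for counter in range(period):   # counter wraps at `period`
            out.append(1 if counter < compare else 0)
    return out


def duty_cycle(samples: List[int]) -> float:
    """Fraction of samples for which the output is high."""
    return sum(samples) / len(samples)


wave = pwm_samples(period=10, compare=3, cycles=2)
duty = duty_cycle(wave)
```

With the values above, `wave` is high for the first 3 of every 10 counter steps and `duty` evaluates to 0.3. The micro-coded engines described in the paragraph above layer programmable behavior on top of this basic compare mechanism.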