Author: Warren Mitchell
In Part 1 of this blog post, we described an important group of technical challenges process industry organizations face as they develop the ‘data foundations’ for their digital transformation and Industry 4.0 programs: the connectivity, network, security and data transport challenges these companies deal with when managing diverse OT data sets at the scale that is now possible with modern IT. In this post, we discuss the strategies and approaches organizations have been taking to address these challenges and make recommendations.
Solutions to Address OT Data Communication Challenges
Many across the process industries have begun their data foundation programs by experimenting with single edge-to-cloud connectivity solutions, which interface individual data sources to an OT or enterprise IT data lake. Each data source is handled independently, requiring the integration of a variety of connectivity, network tunneling, security, compression, load balancing and Store and Forward (SaF) solutions as organizations have recognized the various challenges described. Often, starting with the enterprise historian, companies have developed proprietary solutions or found commercial point connectivity solutions that allow them to reliably transport historical data as well as stream live data directly to on-premises applications or even cloud-based platforms. For many, this has enabled their data science teams to begin the thoughtful exploration of their operational data and act on a variety of use cases in modern cloud environments.
Additionally, the industrial connectivity landscape has grown rapidly in recent years as organizations look for efficient ways to manage their data. Most OT vendors have developed, or are developing, edge gateways that decouple communications from their platforms and help future-proof their systems as the demand for operational data grows. The edge-to-cloud architecture has emerged and is certainly growing as organizations explore connectivity options and establish wider communications with their operational data.
While these approaches have been widely adopted and successfully used, they are proving to have challenges of their own. Where the integration of a limited volume of data or a small number of data sources is required, point solutions for connectivity, tunneling, security, compression, load balancing, and SaF are a viable means of landing operational data in a host system like an OT or enterprise cloud. In some instances, a single vendor may supply solutions for more than one of these important communication functions. Edge-to-cloud solutions also work well provided company security policies and industry regulations accommodate these types of architectures. SCADA platforms collecting remote well site or wind turbine data are an example of where this architecture has routinely been adopted. Most organizations with large industrial operations, however, would not allow volumes of operational data to be transported around existing network and data security infrastructure. In these instances, network tunneling solutions must be adopted to transport data across firewalls and independent IP address domains.
For a growing number of companies, as their digital programs expand to encompass multiple operations or even the enterprise, the requirement to manage and support dozens or even hundreds of individual point solutions becomes onerous. The challenges associated with managing the communications between existing data sources and a host of widely deployed applications are familiar to most process industry companies. As illustrated in the figure below, the number of connectivity solutions can grow geometrically as the number of software applications and data sources grows.
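The geometric growth can be illustrated with simple arithmetic: integrating every application directly with every data source requires one connector per pairing, while brokering all traffic through a central hub requires only one connector per source and one per application. A minimal sketch (the counts and function names here are illustrative, not from the post):

```python
def point_to_point(sources: int, apps: int) -> int:
    """Connectors needed when every application integrates every source directly."""
    return sources * apps

def hub_and_spoke(sources: int, apps: int) -> int:
    """Connectors needed when all traffic is brokered through a central hub."""
    return sources + apps

# 20 data sources feeding 10 applications:
# direct integration needs 20 * 10 = 200 point solutions to maintain,
# while a brokered architecture needs only 20 + 10 = 30 connectors.
```

Adding one more application to the direct-integration model means one new connector per source; with a hub, it means exactly one new connector.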
Connector Services
In the Connector Landscape column of the figure above, several connector service objects are shown, configured from L1 all the way up to the cloud at L5. Connectors have several functions:
First, connectors are implemented to communicate with the available interface software on the host system. It is important to point out that multiple connectors communicating with individual data sources from the same location can be coordinated to communicate through a single TCP connection in existing firewalls. The inmation ‘core’ service shown brokers the communication between individual data sources and the system itself. The connectors are intelligent and can be configured in several ways to address the communication challenges we’ve described. By default, data is encrypted using current standards and compressed with ‘Snappy’ lossless data compression, ensuring secure and efficient data transport. Importantly, the connectors can also be configured and tuned to control the network bandwidth they consume, ensuring that transporting data from the host does not disrupt the networks on which production-critical systems reside. As described in Part 1, each of these functions has been a challenge in the data foundation projects of process industry organizations over the past several years.
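The bandwidth-control idea can be sketched with a classic token-bucket limiter. This is a generic pattern, not inmation's actual throttling implementation or configuration parameters: a connector-style service would wrap its outbound writes with `acquire()` so its transfers never exceed a configured rate.

```python
import time

class TokenBucket:
    """Generic token-bucket rate limiter. Tokens (bytes) accumulate at a fixed
    rate up to a burst capacity; each send must first acquire tokens, which
    caps sustained throughput at `rate_bytes_per_s`."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def acquire(self, nbytes: int) -> None:
        """Block until nbytes may be sent without exceeding the configured rate."""
        while True:
            now = time.monotonic()
            # Refill tokens for the elapsed interval, capped at the burst size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            # Not enough budget yet: sleep roughly until enough tokens accrue.
            time.sleep((nbytes - self.tokens) / self.rate)
```

A connector configured this way can saturate a short burst but is then paced, which is what keeps historian backfill traffic from crowding out production-critical systems on the same network.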
Second, connectors provide native store and forward (SaF) capabilities such that if a connection is lost between the connector service and the inmation core service, data is buffered by the connector until the connection returns. The buffered data is then transported and backfilled in the inmation repository. This functionality ensures raw data is not lost to the final host system, eliminating the requirement to cleanse or repair incomplete data sets prior to downstream analytics. Missing data makes it difficult, for example, to report accurate production, build reliable inferred property models, run fault detection algorithms, or develop more advanced data science experiments.
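The SaF behavior described above can be sketched as a simple buffered publisher. This is an illustrative in-memory model only; a real connector would persist its buffer to disk and bound its size:

```python
from collections import deque

class StoreAndForward:
    """Minimal store-and-forward sketch: samples are buffered while the uplink
    is down and flushed oldest-first when it returns, so the repository can
    backfill a gap-free history."""

    def __init__(self, send):
        self.send = send       # callable that raises ConnectionError while the link is down
        self.buffer = deque()  # pending (timestamp, value) samples, oldest first

    def publish(self, timestamp, value):
        self.buffer.append((timestamp, value))
        self.flush()

    def flush(self):
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return             # uplink still down; keep buffering
            self.buffer.popleft()  # sent successfully; drop from the buffer
```

Because the buffer drains in arrival order, the host receives the outage-period samples with their original timestamps and can backfill them into the historical record.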
Third, because connectors are microservices that can execute software scripts, they can be configured to add functionality at the edge if desired. Scripts running as part of the connector service can be used to contextualize data, perform analytics or even spawn entirely new objects for the system to exploit. Connectors can thus provide sophisticated edge computing functions at any data source for a wide variety of purposes. inmation uses an open-source scripting engine called Lua for this purpose. Lua is well known in the computer gaming industry for its speed and efficiency (www.lua.org) and is an integral part of the platform.
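One common use of such edge scripts is contextualization: enriching a raw sample with engineering metadata before it leaves the source. The platform does this in Lua; the pattern is sketched here in Python for illustration, and the tag names and metadata fields are hypothetical, not an inmation API:

```python
# Hypothetical lookup table an edge script might hold: raw tag -> asset context.
ASSET_CONTEXT = {
    "TI-101.PV": {"unit": "degC", "asset": "Reactor-1", "signal": "temperature"},
}

def contextualize(tag: str, value: float, ts: str) -> dict:
    """Attach asset metadata to a raw time-series sample at the edge, so the
    record arrives at the repository already carrying its context."""
    meta = ASSET_CONTEXT.get(tag, {})
    return {"tag": tag, "value": value, "ts": ts, **meta}
```

Run at the edge, the downstream system receives `{"tag": ..., "value": ..., "asset": "Reactor-1", ...}` rather than a bare number it must later reconcile against an asset model.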
System:inmation was developed using MongoDB as its data repository. Classified as a NoSQL database, MongoDB uses JSON-like documents with flexible schemas to store data of all types. Unlike legacy process historian platforms, the inmation repository is capable of efficiently storing and recalling any and all operational data types commonly used across the process industries. Most today recognize that it is not only the time-series operational data generated in the plant control systems that is useful. The consolidation, synchronization, organization and data modeling functions of inmation bring context to the combined data set in a way that is, in practice, impossible otherwise. Arguably, it is the present inability to efficiently bring data together and deliver it in context to a user’s job function that is a major barrier to the digital transformation of the process industries. The capabilities of modern database technologies, in combination with the requisite connectivity, data modeling and visualization functionality of platforms like system:inmation, have finally begun to eliminate barriers in these businesses. Today, modern cloud computing and database technologies are completely transforming the OT data world as they allow businesses to manage diverse OT data in high performance, scalable environments with a myriad of open-source and proprietary toolsets. What was impossible merely a decade ago is being accomplished today by organizations who manage high resolution data streams spanning their enterprises and numbering in the millions.
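The value of a flexible schema is easy to see with two differently shaped OT records side by side: a time-series sample and an alarm/event record. A document store such as MongoDB accepts both in one collection without a fixed table definition. The field names below are illustrative, not inmation's actual document layout:

```python
import json

# Two differently-shaped OT records as JSON-like documents. A fixed-schema
# historian table would force these into separate, rigid structures; a
# document store holds both shapes together. Fields are hypothetical.
docs = [
    {"type": "sample", "tag": "FI-204.PV", "t": "2024-01-01T00:00:00Z",
     "v": 12.7, "q": "good"},
    {"type": "alarm", "tag": "FI-204.PV", "t": "2024-01-01T00:03:10Z",
     "severity": 700, "message": "HIGH flow", "ack": False},
]

payload = json.dumps(docs)      # both shapes serialize cleanly together
restored = json.loads(payload)  # and come back with their differing fields intact
```

Queries can then select by shared fields (`tag`, `t`) across both record types, which is what makes consolidating time series, events and other OT data in one repository practical.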
IT/OT Integration:
A primary objective in the development of inmation was to provide users across process industry enterprises with straightforward, open, high-performance access to operational data. To accomplish this, multiple open interfaces were developed for system:inmation, providing access to consolidated and contextualized information from the inmation repository.
OPC-UA (Unified Architecture):
As described, OPC is a platform-independent, open industrial communication standard used to expose the OT data from industrial systems. For engineering, operations and other OT users, inmation was developed with a fully compliant OPC-UA server, making connectivity to commonly used OT visualization and analytic toolsets straightforward. Today, most OT applications have been developed with OPC clients for this purpose. The OPC-UA and OPC Classic interfaces expose real-time, historical, and alarm/event data per the requirements of the standard.
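Reading a value over OPC-UA from a client application is typically only a few lines. The sketch below uses the community FreeOpcUa `opcua` Python package (`pip install opcua`) as an illustration; the host name and node id are placeholders, not an inmation deployment:

```python
def ua_endpoint(host: str, port: int = 4840) -> str:
    """Build a standard OPC-UA TCP endpoint URL (4840 is the registered OPC-UA port)."""
    return f"opc.tcp://{host}:{port}"

def read_node(host: str, node_id: str):
    """Read one current value from an OPC-UA server using the FreeOpcUa client.
    Hypothetical usage sketch; the server address and node id are placeholders."""
    from opcua import Client  # imported lazily so the sketch loads without the package
    client = Client(ua_endpoint(host))
    client.connect()
    try:
        return client.get_node(node_id).get_value()
    finally:
        client.disconnect()

# e.g. read_node("historian-host", "ns=2;s=Reactor-1.TI-101.PV")
```

Any OPC-UA-compliant toolset follows the same connect/browse/read pattern, which is why a compliant server on the consolidation layer makes so many existing OT applications immediately usable against it.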
A WebAPI (Application Programming Interface) was also developed to provide access to system:inmation. The WebAPI was developed using Remote Procedure Calls (RPCs) and is hosted in a Windows Service. It can be used by any external application over the HTTP or WebSocket interface and is commonly used to connect external applications to the data repository. As an example, VisualKPI from Transpara (www.transpara.com) is a mature web-based visualization and KPI reporting tool that leverages the inmation WebAPI extensively.
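The general shape of such an HTTP/JSON call can be sketched with the Python standard library. The base URL, path and payload fields below are hypothetical placeholders to show the pattern, not the documented inmation WebAPI contract, and the request is constructed without being sent:

```python
import json
import urllib.request

def build_read_request(base_url: str, items: list) -> urllib.request.Request:
    """Construct a POST request asking a hypothetical web API to read a list
    of repository items. Path and body shape are illustrative only."""
    body = json.dumps({"items": items}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/api/read",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_read_request("https://host:8002", ["/System/Core/Reactor-1/TI-101"])
# urllib.request.urlopen(req) would send it; omitted here since no server is running.
```

Because the transport is plain HTTP with JSON bodies, any language or tool with an HTTP client can integrate the same way, which is the point of exposing a web API alongside OPC-UA.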
Lua is a proven, robust and very fast open-source scripting language that has been embedded in the inmation platform (www.lua.org). Lua gives inmation the ability to customize the platform for the needs of individual users or use cases at hand. Lua can be used to fetch data, create objects, modify object properties, or perform calculations and contextualize data for example. Lua is key to the flexibility and customizability of system:inmation.
API clients have been developed for the following environments: .NET, Node.js, Node-RED, and Python, utilizing the inmation WebAPI. These client APIs allow end users to work in a variety of environments, leveraging a potentially enormous variety of tools and applications. They allow users to read and write values to objects in the system:inmation namespace, read historical data, subscribe and unsubscribe to data changes, and much more.
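The subscribe/unsubscribe semantics such client libraries expose can be sketched with a small in-memory hub. This is illustrative only; the real client APIs deliver data changes from the server over HTTP/WebSocket rather than from a local object:

```python
class SubscriptionHub:
    """Sketch of client-side subscription semantics: callbacks are registered
    per object path and invoked on each reported data change."""

    def __init__(self):
        self._subs = {}  # object path -> list of registered callbacks

    def subscribe(self, path, callback):
        self._subs.setdefault(path, []).append(callback)

    def unsubscribe(self, path, callback):
        self._subs.get(path, []).remove(callback)

    def on_data_change(self, path, value):
        """Called when the server reports a new value for `path`."""
        for cb in list(self._subs.get(path, [])):
            cb(path, value)
```

An application subscribes once and then reacts to pushed changes instead of polling, which is the usual pattern for live dashboards and event-driven processing against the repository.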
Data Studio:
Finally, Data Studio is the main client application for system:inmation. It is designed to be a secure, singular configuration and management interface for the platform. Through Data Studio, users are able to access the entire network of data sources, the database and the various APIs. User profiles and security settings provide access to only those elements of the system permitted. The Data Studio interface and customizable toolset give regular users the ability to create, configure and control their workspace.