Written by:
Dr Emma Taylor CEng FIMechE FSaRS, Head of Digital Safety, RazorSecure
An award-winning engineer with more than 30 years of experience across transport, aerospace and energy.
Recently, I took part in the kick-off meeting of the new IEC rail cybersecurity standard (IEC 63452), along with national delegation representatives from around the world. This is the latest step towards a harmonised implementation of rail cybersecurity that integrates with the broader regulatory framework. It builds on CENELEC TS 50701 and will incorporate, through consensus, the expertise and experience of manufacturers and operators from across the globe. Closer to home, I’ve been working with some of our partners to think through the challenges of creating a clear and consistent framework that incorporates regulatory requirements for all aspects of software-intensive digital systems. We're building software-based systems, but how do we know we're doing the right thing at each step of the way, and how do we support the watchmen?
With more than 100 years of direct engineering, software, safety and cybersecurity experience between myself and the blog contributors, we’ve drawn on our own ample experience of where things get difficult to bring forward some insights. This blog outlines some of our collective thoughts on the specific challenges ahead as we move from principles and goals to requirements and implementation. If you are starting to consider what practical looks like when it comes to implementing cybersecurity alongside existing safety regulation, and wondering where the biggest challenges might be, I hope that this may be of use to you:
Cybersecurity cannot just be ‘added’ on top of safety, as it may generate ambiguity.
Many excellent standards and frameworks exist, both for general and tailored software and for cyber security assurance, but these haven’t yet been brought together into one integrated process. A recent Code of Practice published by the IET on safety-security highlighted some of the reasons why and identified points of coordination. Without an integrated process, it is harder to identify and calibrate what good looks like: bringing the standards together creates ambiguity, which ultimately makes it harder for people to be sure what is and is not compliant against the combined set of standards and frameworks.
A practical approach using real world scenarios can be used to help create clarity.
A number of software-related incidents have been analysed and reported widely, and more are discussed within individual organisations. By default, these have been examined because they caused or could have caused safety-related issues. They are rich in data and understanding, and by using them as reference cases we can test whether our existing safety and cybersecurity standards and frameworks are a complete solution as they stand. Put simply, if you had applied the standards and frameworks ‘as is’, would you have been able to spot the incident before or as it occurred, and identify the immediate cause, the causal factors and the point in the system life cycle at which they arose? Based on this, would you be able to recommend what measures needed to be taken? It is easy to do this in hindsight, which is why it is important to train people on both existing real world scenarios and what might happen in the future, i.e. create reasonably foreseeable scenarios. It’s best to build on what is already there, including our existing understanding of digital-physical systems, and to take a hands-on, case study based approach to developing knowledge and understanding.
A system-level view of a digitally-informed system is required.
Building on the position that the existing standards don’t yet fit fully together, and that a practical case study based approach is needed to help find the gaps, it’s worth thinking clearly about what framework to use to decide whether what is there is ‘good enough’ to meet requirements. Our perspective is that, to be effective, such a framework needs to focus less on compliance with individual requirements and standards and more on how the addition of software intensive digital technology affects the resilience, safety and security of the overall system, and on how potential defects in process and compliance can be easily identified. This is a true system-level view.
Software intensive digital technology affects system resilience, driving the need for a whole system view.
The question is then, what exactly is the boundary of that digitally-informed system? TS 50701 provides a good starting point. We believe it is essential to include all duty holder systems that contribute to the operation of the railway, where their loss could affect the company’s ability to operate. This includes control equipment, monitoring systems, storage systems and company administration. A holistic approach is needed to be rigorous. Typically this thinking triggers the use of systems engineering standards and theoretical models, but this may take things in the wrong direction in terms of practicalities. A simple ‘seed’ is needed as a ‘starter’ to help set a baseline, i.e. what ‘good’ looks like.
Four common factors can be used to describe what ‘good’ looks like.
Building on recent publications such as the IET CoP, we’ve created four simple ‘seed’ questions that clearly address safety, software and cyber security, and can provide a framework for organisations to “manage the health and safety risks that fall out of cyber security failures e.g. Overcrowding; disruption; signalling failures etc.”. Does an organisation:
Have an awareness of the external environment including threats and vulnerabilities, supplier relationship and supply chain management (understand whole system)
Understand the hazardous states, the chain of events (causal sequences) that lead to hazardous states and the part that digital technology can play, even if not previously seen or considered very unlikely to occur (understand what the whole system might do i.e. reasonably foreseeable scenarios)
Implement continuous monitoring of the system’s condition or state, identifying (pre-)hazardous states and responding appropriately (understand what the whole system is doing at any particular time)
Implement management of risk arising from external environmental changes and internal changes to the system’s condition or state (understand how to maintain control of a changing complex system)
Asking simple questions around supply chain (‘who made this’) and taking a consequence-driven view (‘what happens if it goes wrong’) helps as well.
These common factor ‘seed’ questions are also consistent with supply chain management, and highlight the through-life challenges associated with software procurement (specification, design, build, implementation and on-going operation/maintenance). They also help to distinguish between updates (fixing issues) and upgrades (planned improvements), although there often isn’t a clear line between reactive and proactive digital maintenance. It can be useful to create a bowtie-type model to start linking causes and consequences, although it will be a challenge to create a comprehensive map.
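As a rough illustration of the bowtie idea, the sketch below links causes to consequences through a single central event and lists where no barrier has been recorded. It is a minimal sketch only: the class name, field names and the example entries are all hypothetical and are not taken from any standard or from the IET Code of Practice.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Bowtie:
    """Hypothetical bowtie: causes on the left, consequences on the right."""
    top_event: str
    # cause -> preventive barriers recorded against it
    causes: dict[str, list[str]] = field(default_factory=dict)
    # consequence -> mitigating barriers recorded against it
    consequences: dict[str, list[str]] = field(default_factory=dict)

    def paths(self):
        """Every cause-to-consequence pair that passes through the top event."""
        return [(c, self.top_event, q)
                for c, q in product(self.causes, self.consequences)]

    def unprotected(self):
        """Causes or consequences with no barrier recorded against them."""
        gaps = [c for c, barriers in self.causes.items() if not barriers]
        gaps += [q for q, barriers in self.consequences.items() if not barriers]
        return sorted(gaps)


# Illustrative entries only; a real model would come from hazard analysis.
bowtie = Bowtie(
    top_event="Onboard software update fails in service",
    causes={
        "Update pushed reactively to fix an issue": ["Staged rollout"],
        "Compromised supplier package": [],
    },
    consequences={
        "Train withdrawn from service": ["Rollback procedure"],
        "Overcrowding on following services": [],
    },
)

for cause, event, consequence in bowtie.paths():
    print(f"{cause} -> {event} -> {consequence}")
print("No barrier recorded:", bowtie.unprotected())
```

Even a small model like this makes the gaps visible, which is the point of the exercise; it does not attempt to be the comprehensive map described above.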
There may be opportunities for cross-sector learning.
For example, other highly regulated sectors might provide a useful steer on setting the level of what good looks like and how much detail can be generated. Working across multiple sectors with NIS as the inspection framework, it has been possible to set a baseline against the NIS CAF in a number of sectors, e.g. not achieved, partially achieved or achieved against Indicators of Good Practice. This was done despite challenges caused by a lack of information provided by the companies and the need to further develop skills across all involved in the activity. It might not yet be possible to stretch three levels to five (such as the Initial, Repeatable, Defined, Managed and Optimizing levels used in the software maturity model SW-CMM), but the field is rapidly changing. The more that different people can assess the same practical scenarios and consistently come up with the same answer, the greater the accuracy of the assessment and the granularity of recommendations on what needs improving first.
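One simple way to see how consistently different people rate the same scenario is to measure how many of them chose the most common level. The sketch below assumes the three-level scale mentioned above; the function name, scenario names and ratings are made up for illustration and do not represent real assessment data or an official CAF method.

```python
from collections import Counter

# The three-level scale described in the text.
LEVELS = ("not achieved", "partially achieved", "achieved")


def agreement(ratings):
    """Return the most common level and the share of assessors who chose it."""
    unknown = set(ratings) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown level(s): {sorted(unknown)}")
    level, votes = Counter(ratings).most_common(1)[0]
    return level, votes / len(ratings)


# Hypothetical ratings from three assessors per scenario.
assessments = {
    "scenario A": ["achieved", "achieved", "achieved"],
    "scenario B": ["partially achieved", "achieved", "partially achieved"],
    "scenario C": ["not achieved", "partially achieved", "achieved"],
}

summary = {name: agreement(r) for name, r in assessments.items()}
for name, (level, share) in summary.items():
    print(f"{name}: most common = {level}, agreement = {share:.0%}")
```

A scenario where agreement is low is a candidate for further discussion or training before any attempt to move to a finer-grained scale.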
Diverse backgrounds and individual experience from multiple perspectives add value.
Getting a good cross-sector perspective needs a diverse range of people and backgrounds to look at the problem. No single aspect of safety, reliability (resilience of operations) and security can address the likelihood and impact of software-based and cyber security incidents. A holistic approach across all three themes is required for rigour, and forms the basis of the combined core themes of our approach. We, like everyone else, may be constrained by our prior experience, including in writing this blog, but we continue to engage widely with people with a wide range of experience, engineering and non-engineering, across all types of professional standards and certifications, up to senior roles in industry, academia and government. We’d like to encourage people to build broad teams and consult widely.
Dive into the hidden interfaces between IT and OT systems.
Just as IT systems may be used to host maintenance information for OT systems on rolling stock and trains, encryption keys for those systems may also be managed through IT infrastructure (PKI). Within a sector, different organisations' interfaces between IT and OT vary considerably in definition, maturity and effectiveness. As our recent work on Digital Maintenance and various depot visits have shown, it is also important to consider the use of IT and OT in depots and mobile equipment, as well as in signalling infrastructure and rolling stock. Map out and look at data flows to understand which systems manage which information, and identify where an interruption in one area might lead to operational problems in another. It’s a practical systems thinking approach.
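The data flow mapping described above can be sketched as a small directed graph, where an edge means "this system supplies information to that one". The example below is a minimal sketch under that assumption: the system names are illustrative placeholders rather than a real architecture, and `downstream` is a hypothetical helper that uses a plain breadth-first search.

```python
from collections import deque


def downstream(flows, start):
    """Return every system that directly or indirectly receives data from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in flows.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Illustrative flows only: supplier system -> systems that consume its data.
flows = {
    "IT PKI": ["depot laptops", "onboard systems"],
    "IT maintenance database": ["depot laptops"],
    "depot laptops": ["onboard systems"],
}

for system in flows:
    affected = sorted(downstream(flows, system))
    print(f"Interruption to {system} could affect: {affected}")
```

Running the same query for each IT system gives a first view of which interruptions could reach OT, and can be refined as the real interfaces are documented.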
Blending safety and cyber security is inherently a complex problem.
What works well for one established mainline (‘heavy’ rail) operator in one country might not be appropriate for a new urban metro development in another. Local factors, commercial pressures and technical maturity all play a part in setting what good looks like and what can reasonably be achieved. This complexity doesn’t necessarily mean that the starting point has to be complex.
Emma and her RazorSecure colleagues would like to acknowledge the valuable contributions made to this article by Mike StJohn-Green, Evan Jones of Complete Cyber and Richard Thomas of BCRRE, including the ideas and concepts around an integrated approach and the dependencies, constraints and priorities highlighted in this blog. RazorSecure would also like to thank their colleagues in various CENELEC, ISO and IEC projects for their contribution to creating a common understanding of the important topics of safety and cybersecurity, for the transport sector and more broadly.