Technology Infrastructure Resilience
Resilience refers to the capacity of a system to respond and adapt to change without significant degradation of its overall functioning and structure. Infrastructure resilience refers to the capacity of an economy’s technology infrastructure to deal with external and internal alterations that may affect its continued operations [1]. The resilience of an infrastructure system is a product of both external factors, namely the system’s dependence upon its environment, and internal factors, namely the system’s network structure, its overall degree of connectivity and its dependence upon centralized hubs and critical bridging links. In modeling infrastructure robustness, we are interested in how failures occur, how they spread within the system and how resilient the system is to those failures. Researchers are particularly interested in critical infrastructure, because a failure there affects the basic functioning of society. We are so dependent upon these infrastructure systems that we hardly notice them until a fault occurs. Therefore, the ability to model and analyze the behavior of these critical infrastructures and their interdependencies is of vital importance.
Critical infrastructures are defined by US Homeland Security as follows: ‘Critical infrastructure is the backbone of our nation’s economy, security, and health. We know it as the power we use in our homes, the water we drink, the transportation that moves us, and the communication systems we rely on to stay in touch with friends and family. Critical infrastructure is the assets, systems, and networks, whether physical or virtual, so vital to the United States that their incapacitation or destruction would have a debilitating effect on security, national economic security, national public health or safety, or any combination thereof.’
Due to a number of features of the industrial age model of design and technology development, our industrial infrastructure has evolved to become highly unsustainable and, along many dimensions, fragile. Key features of the industrial age model that have contributed to this are its linear model of take, make and dispose, which requires a high input of resources from the environment; its centralized structure, which creates critical hubs; and its model of batch processing, which requires standardization and thus reduces the diversity in the system. Added to this, globalization and information technology have networked our world, creating many interdependencies between different infrastructure systems. The Amsterdam electricity exchange, for example, was the first power exchange to be conducted entirely through the Internet, making the electrical infrastructure dependent upon IT infrastructure. Today almost all of our products depend upon the working of a globally distributed supply network. Thus, we are increasingly dependent upon global networks whose complex inter-linkages and interdependencies we only partially understand. With every new shock to the system, like the financial crisis of 2008, we become more aware of these global networks and of the need to be able to properly model and analyze them.
Resiliency & Robustness
What we are really interested in is the continued functioning of these infrastructure systems, and this is what we call their resilience. Resiliency is the capacity of a system to maintain functionality despite some internal or external perturbation, which is very similar to robustness, the ability to withstand or overcome adverse conditions. We can understand robustness along a number of different parameters, primarily relating to the system’s dependency upon its external environment and to the internal structure and make-up of the system. In terms of the system’s dependency on its environment, we are asking: What inputs or range of inputs does the system require? The technology infrastructure that runs our global economy is a dynamical system, and like all dynamical systems it requires an almost constant input of resources and energy from the environment to maintain its state, 24/7, around the globe. Without that input, it will start to degrade very quickly. Like all dynamical systems, these infrastructures are in a precarious situation: engineers and administrators try to maintain a high level of functionality and resource throughput when things can go wrong at any time.
As we all know, our modern infrastructure systems have developed to become highly dependent upon a particular subset of energy and resource inputs. This has become a key source of vulnerability, as manufactured products of all kinds, from plastic to shampoo to hairspray, depend upon a stable input of petroleum, as does much of our energy use, from heating to transportation to electrical generation. As an analogy we might think of a tree that receives all of its nutrients through its trunk, which then ramifies out to all the branches. Being so dependent upon a single input is a vulnerability that reduces the system’s robustness. Moving towards distributed generation will help to diversify this set of inputs and increase the system’s dependability. Moving towards a circular economy is another factor that reduces dependency upon the input of raw materials into the system. We can also think about this in terms of connectivity: Can the dynamical system ensure its continued access to the resources required for its functioning? Thus, we are interested in what will happen if we remove one or more of these linkages. With the advent of network science, much of this analysis can now be done using network theory: a system with a high level of dependency upon a single input corresponds to a centralized network, whilst diversifying these dependencies results in a distributed network, a structure generally found to be more robust. Network analysis of infrastructure systems is becoming a key tool and a rising topic of research.
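The question of what happens when an input linkage is removed can be sketched in a few lines of Python. This is a minimal illustration and not a model of any real infrastructure: the node names ("oil", "solar", "wind", "grid", "city") and the helper functions `reachable` and `still_supplied` are hypothetical, chosen only to contrast a single-input system with a diversified one.

```python
from collections import deque

def reachable(graph, start, removed_links=frozenset()):
    """Return all nodes reachable from start, ignoring any removed links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if (node, nxt) in removed_links or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen

def still_supplied(graph, sources, consumer, removed_links=frozenset()):
    """True if at least one source can still reach the consumer."""
    return any(consumer in reachable(graph, s, removed_links) for s in sources)

# Hypothetical directed supply graphs: an edge u -> v means "u feeds v".
centralized = {"oil": ["grid"], "grid": ["city"]}
diversified = {"oil": ["grid"], "solar": ["grid"], "wind": ["grid"],
               "grid": ["city"]}

cut = {("oil", "grid")}  # the linkage we remove

print(still_supplied(centralized, ["oil"], "city", cut))
print(still_supplied(diversified, ["oil", "solar", "wind"], "city", cut))
```

In this toy setup the single-input system loses its supply when the one linkage is cut, while the diversified system does not. Note that the "grid" node is still a shared hub in both graphs, so diversifying inputs alone does not remove every point of dependency.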
Next, we want to consider the internal structure and makeup of the system. Again, we can represent this as a network. We want to know how centralized the network is, because a centralized system such as a hub-and-spoke air traffic network will be susceptible to strategic attack: taking down one major hub will drastically reduce the network’s level of connectivity and may result in its disintegration. This is why distributed systems like peer-to-peer file sharing networks are typically very robust. They often come under attack from law enforcement agencies due to copyright violations, but because the system is distributed there is no single node or cluster of major nodes through which the entire network can be damaged. These distributed networks typically have a low level of specialization between components, meaning any node’s function can easily be taken over by another node or simply duplicated at another location. The first generation of Internet peer-to-peer networks, like Napster, relied on a central server. Due to this, it was possible to take the network down. The second and third generations of P2P networks are able to operate without any central server, thus eliminating the central vulnerability by connecting users directly to each other.
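The difference between a centralized and a distributed structure under a strategic attack can be illustrated with a small Python sketch. It is a toy comparison under stated assumptions: the two topologies (`star` for hub-and-spoke, `ring_with_chords` for a simple distributed network) and the function names are hypothetical, and "attack" here just means removing the single best-connected node.

```python
def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once removed nodes are dropped."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

def star(n):
    """Hub-and-spoke: node 0 is the hub, nodes 1..n-1 are the spokes."""
    adj = {i: set() for i in range(n)}
    for i in range(1, n):
        adj[0].add(i)
        adj[i].add(0)
    return adj

def ring_with_chords(n):
    """Distributed: every node links to the nodes 1 and 2 steps away on a ring."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in (1, 2):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def attack_top_hub(adj):
    """Remove the highest-degree node; return the largest surviving component."""
    hub = max(adj, key=lambda node: len(adj[node]))
    return largest_component(adj, removed={hub})

print(attack_top_hub(star(10)))              # spokes are left isolated
print(attack_top_hub(ring_with_chords(10)))  # the rest stays connected
```

With ten nodes, removing the hub of the star leaves only isolated nodes (largest component of size 1), whereas removing any one node from the ring with chords leaves the other nine connected.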
This kind of distributed network has a very low level of criticality. Such networks are extremely resilient and can be, for all intents and purposes, virtually impossible to destroy. This is in strong contrast to many of our centralized industrial systems, such as broadcast media, cities and airports, which all exhibit a high level of criticality because the networks are dependent upon centralized nodes. But it is not just dependence upon a small set of major hubs that matters for robustness, but also dependence upon a limited number of linkages. These critical linkages between nodes are called bridging connections. Peer-to-peer networks have a low level of linkage criticality, which also contributes to their resilience. Any linkage between two computers can be replaced by an alternative pathway, for example through a proxy server, meaning the network is not dependent upon any specific connection. This independence from any particular node or edge is central to achieving robustness.
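Bridging connections can be found mechanically: a link is a bridge if removing it alone disconnects the network. The sketch below uses a deliberately simple brute-force check rather than an optimized algorithm, and it assumes the network is connected to begin with. The node labels and the function names `is_connected` and `bridging_links` are hypothetical.

```python
def is_connected(nodes, edges):
    """True if every node can be reached from every other (undirected)."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes

def bridging_links(nodes, edges):
    """Links whose individual removal disconnects the network (brute force).

    Assumes the network is connected before any link is removed.
    """
    edges = list(edges)
    return [edge for i, edge in enumerate(edges)
            if not is_connected(nodes, edges[:i] + edges[i + 1:])]

# Hypothetical network: two triangles joined by the single link C-D.
nodes = ["A", "B", "C", "D", "E", "F"]
two_clusters = [("A", "B"), ("B", "C"), ("C", "A"),
                ("C", "D"),
                ("D", "E"), ("E", "F"), ("F", "D")]

print(bridging_links(nodes, two_clusters))
# Adding one alternative pathway (A-F) removes the critical link.
print(bridging_links(nodes, two_clusters + [("A", "F")]))
```

In the first network the link between C and D is the only bridge; once an alternative pathway is added, no single link is critical any more, which is the property the peer-to-peer example relies on.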
Next, we need to consider how failures spread within the system. A primary consideration here is the overall degree of connectivity of the network. With a relatively isolated system, like a small farm in a rural community, failures don’t spread very far. Isolation through low connectivity is the most basic barrier to contagion. But if we take an urban center like central Hong Kong, a dense network of many interconnected infrastructure systems has to be working for it to run smoothly, and small glitches propagate quickly. In these highly interconnected and coordinated systems, we can also get positive feedback loops that amplify some small change into a large effect. This is the butterfly effect that we previously mentioned, and it is often the source of major systemic shocks such as bank runs or cascading failures in power grids.
Key barriers to disaster propagation are redundancy and buffers. These can be engineered into the network, and they also emerge from maintaining diversity within the system. There is often a trade-off between diversity and optimization. Supply chain networks are a good example of this. Holding just the right amount of inventory is crucial to optimizing costs. After all, inventory costs are incurred every hour of every day in areas including warehouse storage, heating and electricity, staffing, product decay and obsolescence. This makes for a strong drive towards ever-increasing optimization and just-in-time practices, which can lead to self-organized criticality: we reduce the diversity of the components and the buffers between them to such a low level that we position the entire network at a critical point where a small event can trigger an avalanche of failures. There is a core tension here between the optimization of components and the system’s overall robustness. It takes intelligent design and management to integrate both, thus maintaining an efficient, sustainable system.
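The effect of buffers on failure propagation can be illustrated with a toy load-redistribution model. This is only a sketch under strong assumptions, not a model of a real supply chain or power grid: nodes sit on a ring, each carries one unit of load, each can carry up to `1 + buffer`, and a failed node hands its load to its surviving neighbours. The function name `cascade` and all parameters are hypothetical.

```python
from collections import deque

def cascade(n, buffer, first_failure=0):
    """Count failed nodes on a ring of n >= 3 nodes after one initial failure.

    Every node starts with load 1.0 and fails once its load exceeds
    1.0 + buffer. A failed node splits its load equally between its
    surviving ring neighbours; if none survive, the load is simply lost.
    """
    load = [1.0] * n
    capacity = 1.0 + buffer
    failed = [False] * n
    failed[first_failure] = True
    queue = deque([first_failure])
    while queue:
        node = queue.popleft()
        neighbours = [(node - 1) % n, (node + 1) % n]
        alive = [m for m in neighbours if not failed[m]]
        if not alive:
            continue  # nowhere left to shed the load
        share = load[node] / len(alive)
        load[node] = 0.0
        for m in alive:
            load[m] += share
            if load[m] > capacity:
                failed[m] = True
                queue.append(m)
    return sum(failed)

print(cascade(10, buffer=0.2))  # thin buffers: the failure spreads
print(cascade(10, buffer=0.6))  # larger buffers: the failure is absorbed
```

With ten nodes and a buffer of 0.2, the single initial failure takes down all ten nodes; with a buffer of 0.6 the neighbours absorb the extra load and only the initial node fails. The sharp change in outcome between the two settings is the kind of critical-point behavior the paragraph above describes.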
As different technologies and systems converge, the interconnections and interdependencies across different infrastructure systems increase, and so does the number of unknown linkages. A basic premise of complexity theory is that we never know all of the linkages in these complex systems. As an example of unknown interdependencies, we might think of the 2011 flooding in Thailand, a country that accounted for approximately 25% of the world’s computer hard disk production. This flooding disrupted the manufacturing supply chains for automobile production and caused a global shortage of hard disks, which lasted throughout 2012. Some people know that Thailand is a major producer of hard drives. Fewer people know that these hard drives are in our cars, and very few know of the automotive industry’s dependency on this critical supply linkage.
This is an example of one tiny linkage in the vast complex system of our global economy, a system that is highly interdependent but that no one manages or fully understands. There is no instruction manual where every interdependency is listed. This is the nature of our distributed, globalized world, where IT enables people to set up their own networks. Companies, financial institutions, engineers, software developers, criminal gangs, government security agencies and hackers all just set up these connections. They don’t have to tell anyone, and there is no government of globalization to keep track of it all. We simply do not know all of these connections, and often we only really find out about them when the system breaks down. Because we can never know all of the linkages within a complex system, we can never say it is fully fault tolerant; instead, often the best option is to build robustness into the system through diversity.
1. (2017). Dhs.gov. Retrieved 6 July 2017, from https://www.dhs.gov/xlibrary/assets/niac/niac-a-framework-for-establishing-critical-infrastructure-resilience-goals-2010-10-19.pdf