I’m about to finish my PhD, and this blog was supposed to be my research diary. I’m not sure if I will keep it after my PhD, but in case I don’t, I would like to dedicate a series of posts to the authors and books that have shaped my thinking during my doctoral research. This first post is dedicated to John Holland’s book ‘Signals and Boundaries: Building Blocks for Complex Adaptive Systems’.
Researchers on regime shifts are still debating what constitutes a ‘real’ regime shift, what can be considered evidence, and what the best techniques for detection are. However, in my opinion, much of the disagreement stems from ambiguous terminology and arbitrary selection of system boundaries in both space and time. When I started reading Holland’s work I was searching for rigorous methods that help distinguish system boundaries. What I found was an elegant set of ideas about the fundamental principles of complex adaptive systems and the challenges ahead when it comes to the computational techniques required to capture their dynamics.
The first two chapters outline the fundamental principles of complex adaptive systems (CAS), that is, the features shared by markets, ecosystems, cells, language, interacting atoms, or governments. CAS are characterised by: i) a diversity of interacting elements (agents), ii) recirculation mechanisms that allow multiplier effects (reproduction or trends), iii) niches and hierarchies that allow pockets of experimentation and mutation, and ultimately iv) coevolution. CAS scholars not only need to acknowledge such features in their systems of study, but they also need to investigate the mechanisms that give rise to them. In chapter two Holland introduces the need for a theory to explore such mechanisms. He first reviews the existing theories used to study CAS and their shortcomings: Lotka-Volterra type models (ordinary differential equations), cellular automata, agent-based models, artificial chemistry models, neural networks, classifier systems and network theory. I really enjoyed that part. Then he introduces the requirements for a signal/boundary theory, an ambitious theory that would make the most of the previous attempts while overcoming their shortcomings. The requirements are a formal grammar that specifies allowable combinations of building blocks (genes, letters, species, etc.), an underlying geometry that allows for inhomogeneities, and a grammar that allows programmable agents to execute signal-processing programs that include reproduction. Armed with these concepts, the book takes you step by step through what is needed to recreate the features of CAS: agents and signal processing, networks and flows, adaptation, recombination and reproduction, boundaries and hierarchies. The last chapters tie all the parts together and formalise the computational procedures for creating the grammar and representing such models mathematically.
Chapters 7 and 8 were the ones relevant to my questions about system boundaries. Holland relies strongly on the ‘billiard ball’ and urn models used in chemistry to explain reactions between agents and how membranes evolved to let certain elements through and not others. His simple prose and clear examples helped me to better appreciate basic probability theory, and to see how similar the problem of differentiating system units is to the problem of identifying network communities or defining niches in ecology. The data should be able to reveal the system boundaries if one has complete information about the interactions of its elements. That’s seldom the case in ecology, but there are certainly growing datasets to play around with in other disciplines, such as cell phone data, emails, or trade. See for example language community identification for Belgium based on cell phone data: paper here and other visualisations here by Vincent Blondel’s group. Although the data-driven identification of system boundaries is possible, the result will never be as sharp a line as one would expect. For example, not all the genes in your body are yours: microbial communities living inside us mix their genes with the ones in our cells, help us maintain good health and even influence our mood.
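To make the idea of letting interaction data reveal a boundary a bit more concrete, here is a minimal sketch of my own (it is not taken from Holland’s book, and not necessarily the method used in the cell phone study). It assumes a made-up toy network of six agents forming two tightly knit triangles joined by a single link, and it scores every possible two-group split with Newman’s modularity, a common quality measure in network community detection. The function names `modularity` and `best_bipartition` are hypothetical, and the brute-force search is only feasible for tiny networks.

```python
from itertools import product


def modularity(edges, membership):
    """Newman's modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict mapping node -> group label.
    """
    m = len(edges)
    internal = {}  # number of edges with both ends inside each group
    degree = {}    # summed degree of the nodes in each group
    for u, v in edges:
        gu, gv = membership[u], membership[v]
        degree[gu] = degree.get(gu, 0) + 1
        degree[gv] = degree.get(gv, 0) + 1
        if gu == gv:
            internal[gu] = internal.get(gu, 0) + 1
    return sum(internal.get(g, 0) / m - (d / (2 * m)) ** 2
               for g, d in degree.items())


def best_bipartition(nodes, edges):
    """Try every split of the nodes into two groups and keep the best Q."""
    first, rest = nodes[0], nodes[1:]
    best_q, best_membership = float("-inf"), None
    # Fixing the first node in group 0 avoids counting mirror-image splits twice.
    for labels in product((0, 1), repeat=len(rest)):
        membership = {first: 0, **dict(zip(rest, labels))}
        q = modularity(edges, membership)
        if q > best_q:
            best_q, best_membership = q, membership
    return best_q, best_membership


# Toy data (an assumption for illustration): two triangles joined by one link.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

q, membership = best_bipartition(nodes, edges)
print(f"best modularity: {q:.3f}")
print("groups:", membership)
```

In this toy case the boundary falls on the single link between the two triangles. Real interaction data are rarely that clean, which is one reason the recovered boundaries tend to be fuzzy rather than sharp.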
Although the book applies a lot of probability theory to represent interactions, a couple of quotes drew my attention to the limits of statistical approaches for studying complex systems.
In attempting to answer the questions [how do agents arise? How do agents specialise? How do agents aggregate into hierarchical organizations?], it is important to examine the formation of agent boundaries, both internal and external, and the effects of those boundaries on the flows. This emphasis directs attention to building blocks that can be combined to define boundaries. The building blocks must, of course, be based on available data. Though there are extensive data sets for most signal/boundary systems, and we can rather easily derive a great array of reliable, sophisticated statistics from such data, such statistics do not, of themselves, reveal building blocks or mechanisms. Anatoly Rapoport, one of the founders of mathematical biology, long ago pointed out that you cannot learn the rules of chess by keeping only the statistics of observed moves (Rapoport 1960). We confront the same difficulty when using statistics to study signal/boundary interactions. The interactions are just too complex (nonlinear) to allow theory to be built with the linear techniques of statistics. (Holland, 2012:38)
And then, toward the end of the book, he offers a nice bottom-up definition of niche, in contrast to the one we have in ecology, while returning to the argument about the limits of statistical approaches for understanding the mechanisms underlying complexity:
The concept of community within a network (Newman, Barabasi, and Watts 2006) provides a starting point that leads naturally to an overarching definition of niche. A niche is a diverse array of agents that regularly exchange resources and depend on that exchange for continued existence. Most signal/boundary systems exhibit counterparts of ecological niche interactions – symbiosis, mutualism, predation, and the like. From this definition of niche, it is relatively easy to move to an evolutionary dynamic of niches, because the conditional actions that underpin the interlocking activities can be defined and compared in a uniform way by means of tag-sensitive rules (See chapter 3).
Defining niches in terms of tag-based rules leads to an important mechanism-oriented question: Can mechanisms for manipulating tags (such as recombination), in combination with selection for the ability to collect resources, lead from simple niches to complex niches? The conditional actions of the agents lead to non-additive effects that cannot be usefully averaged, so a purely statistical approach isn’t likely to provide answers to this question. Statistical approaches wind up in the same cul-de-sac as statistical approaches to understanding computer programs. The ‘trends’ suggested by a series of ‘snapshots’ based on the average over agents’ activities rarely give reliable predictions or opportunities for control – instead of ‘clearing’ of a market, we get ‘bubbles’ and ‘crashes’. (Holland, 2012:287)
I’ve long questioned the role of statistics in identifying causality in ecosystems; most statistical methods rely on linear regression, which often implies avoiding collinearity and assuming independence. Most statistical procedures, from neural networks to structural equation modelling, are not suitable for understanding feedback mechanisms. Only recently has a different approach been developed that embraces the interdependent nature of variables in ecosystems: convergent cross mapping. It remains to be seen what we can learn about causality in ecosystems by using such methods.
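For readers curious about how convergent cross mapping works, here is a rough sketch of the procedure as I understand it, not a reference implementation. The intuition is that if x influences y, then the history of y carries an imprint of x, so x should be recoverable from the delay-embedded history of y, and the recovery should improve as more data are used. Everything specific here is an assumption made for illustration: the toy pair of coupled logistic maps and their parameter values, the embedding dimension `E=2` and lag `tau=1`, and the hypothetical function names. A real analysis should rely on a tested package and on carefully chosen embedding parameters.

```python
import math


def coupled_logistic(n, bxy=0.02, byx=0.1):
    """Toy system: x affects y with strength byx, y affects x with strength bxy."""
    x, y = [0.4], [0.2]
    for _ in range(n - 1):
        xt, yt = x[-1], y[-1]
        x.append(xt * (3.8 - 3.8 * xt - bxy * yt))
        y.append(yt * (3.5 - 3.5 * yt - byx * xt))
    return x, y


def embed(series, E, tau=1):
    """Delay embedding: the point at time t is (s[t], s[t-tau], ..., s[t-(E-1)*tau])."""
    start = (E - 1) * tau
    points = [tuple(series[t - k * tau] for k in range(E))
              for t in range(start, len(series))]
    return start, points


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def cross_map_skill(manifold_series, target, E=2, tau=1):
    """Estimate `target` from the shadow manifold of `manifold_series`.

    Returns the correlation between the estimated and the observed target.
    """
    start, points = embed(manifold_series, E, tau)
    actual = target[start:]
    predicted = []
    for i, p in enumerate(points):
        # E + 1 nearest neighbours on the manifold, excluding the point itself.
        nearest = sorted((math.dist(p, q), j)
                         for j, q in enumerate(points) if j != i)[:E + 1]
        d1 = max(nearest[0][0], 1e-12)  # guard against division by zero
        weights = [math.exp(-d / d1) for d, _ in nearest]
        estimate = sum(w * actual[j] for w, (_, j) in zip(weights, nearest))
        predicted.append(estimate / sum(weights))
    return pearson(predicted, actual)


x, y = coupled_logistic(400)
for L in (50, 100, 200, 400):  # growing library length
    x_from_y = cross_map_skill(y[:L], x[:L])  # evidence that x influences y
    y_from_x = cross_map_skill(x[:L], y[:L])  # evidence that y influences x
    print(L, round(x_from_y, 3), round(y_from_x, 3))
```

What one would look for is convergence: cross-map skill that rises and levels off as the library length `L` grows, rather than a single high correlation at one length.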
Holland’s book was an inspiring companion on Stockholm’s public transport: I read most of it while commuting to the climbing hall.