Friday, 22 September 2017

Computational meta'omics for microbial community studies

Segata et al. (2013): Computational meta'omics for microbial community studies

This article reviews "the technological and computational meta’omics approaches that are already available, those that are under active development, their success in biological discovery, and several outstanding challenges". As the abstract says, the technologies that are already available allow researchers to "comprehensively and accurately characterize microbial communities and their interactions with their environments and hosts".

What kinds of approaches is this review about? The authors write:
Although the ubiquity and complexity of microbial communities have been well studied for decades, advances in high-throughput sequencing have provided new tools that supplement culture-based approaches both in their molecular detail and in their accessibility to a broad scientific community. [...] More recently, genome-wide sequencing approaches, such as metagenomics and metatranscriptomics, have further expanded the experimental tools available for studying the microbiome. Such ‘meta’omic’ approaches expose the genes, transcripts, and eventually proteins and metabolites from thousands of microbes to analysis of biochemical function and systems-level microbial interactions. [...] Metagenomic, metatranscriptomic, and other whole-community functional assays provide new ways to study complex ecosystems involving host organisms, biogeochemical environments, pathogens, biochemistry and metabolism, and the interactions among them. Interaction modeling is particularly relevant for human health, and current host–microbe–microbiome systems most often rely on mouse models of the interplay of commensal microbes, pathogens, and hosts. [...] [I]ntegrative meta’omic approaches and advanced computational tools are key for a system-level understanding of relevant biomedical and environmental processes[.]
What is the aim of a meta’omic study and how is it done? Quoting the authors of this paper:
A meta’omic study typically aims to identify a panel of microbial organisms, genes, variants, pathways, or metabolic functions characterizing the microbial community populating an uncultured sample. [...] Metagenomic sequencing, if performed at a sufficiently high coverage, can in some cases allow reconstruction of complete genomes of organisms in a community. [...] [R]ecent years have seen an explosion of metagenome-specific assemblers, which use strategies to tease apart sequencing artifacts from true biological ambiguity within communities. [...] Whole-genome assembly from metagenomes is impossible in most cases, and such assemblers instead aim to provide the largest reliable and useful contigs achievable from their input sequence reads.
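Many short-read assemblers, metagenomic ones included, represent reads as a de Bruijn graph of overlapping k-mers. The following minimal sketch is not taken from the review; it assumes error-free reads and an arbitrary k = 4, and only shows how reads that share k-mers collapse into a single contig. Real metagenome assemblers add error correction and coverage-based disambiguation on top of this idea.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three hypothetical overlapping reads from one genome fragment
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = de_bruijn_graph(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTGCAA
```

Where two different sequences share a (k-1)-mer, a node gets several successors and the walk stops; resolving such branches is exactly the "true biological ambiguity" that metagenome-specific assemblers must handle.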
These approaches "rely on reference genome catalogs" such as the Human Microbiome Project and the Genomic Encyclopedia of Bacteria and Archaea, which "are systematically filling the gaps in the sequenced portion of the phylogeny".

Another application of meta’omics is "gene function annotation and metabolic reconstruction":
Microbial communities can be seen not only as groups of individual microbes, but also as collections of biochemical functions affecting and responding to an environment or host organism. Metagenomics can thus also identify the genes and pathways carried by a microbial community, and metatranscriptomics can profile their expressed function. [...] Functional profiling using reference information can be based either on reference genome read mapping (at the nucleotide level) or on translated protein database searches.
Meta’omics can also be used to investigate "microbial ecosystem interaction and association networks", but:
All of these current approaches, however, identify only the descriptive covariation of multiple microbes; they characterize neither the mechanisms of nor the regulatory ramifications of such variation. There is thus a pressing need for multiorganism metabolic models to explain such interactions and for a systems-level understanding of their effect on microbial signaling and growth.
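One of the simplest descriptive approaches is to correlate taxon abundances across samples and keep strongly correlated pairs as network edges. This minimal sketch assumes a hypothetical abundance table and an arbitrary cutoff of |r| ≥ 0.8. It shares exactly the limitation quoted above: an edge records covariation and says nothing about mechanism.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def covariation_network(abundances, threshold=0.8):
    """Return (taxon, taxon, r) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundances), 2):
        r = pearson(abundances[a], abundances[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical read counts of three taxa across five samples
table = {
    "taxon_A": [10, 20, 30, 40, 50],
    "taxon_B": [12, 22, 29, 41, 52],
    "taxon_C": [50, 40, 30, 20, 10],
}
for edge in covariation_network(table):
    print(edge)
```

Real abundance data are compositional (relative abundances sum to one), so dedicated tools use correlation measures that account for this; plain Pearson correlation is used here only to keep the sketch short.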
Metatranscriptomics in particular can be used to unravel community expression patterns:
Most current meta’omic tools and studies focus on metagenomic DNA sequencing, but metatranscriptomics is becoming increasingly practical as a window into the regulation and dynamics of microbial community transcription. [...] The major challenge faced in metatranscriptomics is the isolation of microbial mRNA, which usually makes up only a small percentage of total microbial RNA and an even smaller proportion of total RNA if host nucleotides are present.
Single-cell sequencing provides an alternative approach to accessing novel information about uncultured microbes. [...] Current single-cell approaches first isolate single microbial cells by sorting them, lyse them separately, amplify and label them separately, and sequence the resulting pool. The subsequent analysis of single-cell sequence data thus relies much more heavily than do meta’omics on assembly, but fortunately in a less-challenging setting. Recently, elegant combinations of both single-cell genomics and metagenomics have begun to emerge, e.g., in the sequencing of a novel, low-salinity ammonia-oxidizing archaeon from an enrichment culture. Such a combinatorial approach may continue to prove very useful, as the single-cell perspective on novel organism-specific sequences tends to complement whole-metagenome and metatranscriptome overviews of diverse communities.
Meta’omics provides an important tool for studying evolution within microbial communities, which can occur on two very different time scales. Over the course of days, weeks, or the years of a host’s lifetime, microbial genome plasticity allows remarkably rapid acquisitions of novel mutations and laterally transferred genes. Over the course of millennia, however, the overall structure of host-associated communities, their phylogenetic composition, and their microbial pan-genomes can evolve more slowly in tandem with their hosts’ physiology and immune systems. [...] Characterizing the coevolution of quickly evolving complex microbial communities with relatively slowly evolving eukaryotic hosts remains a challenging and largely unexplored field.
One of the ultimate goals of microbial community systems biology is to develop predictive models of the whole-community response to changing stimuli, be it their temperature or pH in the environment, or dietary components in a host gut. Such models may be mechanistic, relying on joint metabolic networks as discussed above, or a descriptive systems biology of microbial physiological ‘rules’ may emerge as a simpler alternative. No unifying approach yet exists, although meta’omic data have provided training input for several first attempts. [...] Given the complexity of most ‘wild’ microbial communities, one of the most promising approaches for such validation has been in the construction of model microbial communities. These have been successful both entirely in vitro, by scaling up the ex vivo coculture of multiple organisms, and when associated with hosts in vivo.
The authors conclude:
In combination with innovative computational models, meta’omics in such environments and in vivo will continue to improve our understanding of microbial community systems biology.

Exploring atomic resolution physiology using molecular dynamics simulations

Dror et al. (2010): Exploring atomic resolution physiology on a femtosecond to millisecond timescale using molecular dynamics simulations

The article begins with a dramatic introduction:
Recent dramatic methodological advances have made all-atom molecular dynamics (MD) simulations an ever more useful partner to experiment because MD simulations capture the atomic resolution behavior of biological systems on timescales spanning 12 orders of magnitude, covering a spatiotemporal domain where experimental characterization is often difficult if not impossible.
The motivation for using such simulations:
Computational models, especially those arising from MD simulations, are useful because they can provide crucial mechanistic insights that may be difficult or impossible to garner otherwise[.]
This is further explained in the introduction:
An all-atom MD simulation typically comprises thousands to millions of individual atoms representing, for example, all the atoms of a membrane protein and of the surrounding lipid bilayer and water bath. The simulation progresses in a series of short, discrete time steps; the force on each atom is computed at each time step, and the position and velocity of each atom are then updated according to Newton’s laws of motion. Each atom in the system under study is thus followed intimately: its position in space, relative to all the other atoms, is known at all times during the simulation. This exquisite spatial resolution is accompanied by the unique ability to observe atomic motion over an extremely broad range of timescales—12 orders of magnitude - from about 1 femtosecond (10^-15 s), less than the time it takes for a chemical bond to vibrate, to >1 ms (10^-3 s), the time it takes for some proteins to fold, for a substrate to be actively transported across a membrane, or for an action potential to be initiated by the opening of voltage-gated sodium channels. MD simulations thus allow access to a spatiotemporal domain that is difficult to probe experimentally.
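The update cycle described in the quote (compute forces, then advance positions and velocities by Newton's laws) is commonly implemented with the velocity Verlet integrator. The paper does not name an integrator, so treat this as a minimal sketch under simplifying assumptions: one particle in a 1-D harmonic potential, arbitrary units, and a hand-picked time step. Production MD codes apply the same loop to every atom with a full force field and time steps on the order of femtoseconds.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # force at the new position
        v = v_half + 0.5 * dt * f / mass  # complete the velocity update
        trajectory.append(x)
    return trajectory, v


k = 1.0  # spring constant (arbitrary units)
traj, v_final = velocity_verlet(
    x=1.0, v=0.0, force=lambda x: -k * x, mass=1.0, dt=0.01, n_steps=628
)
energy = 0.5 * v_final ** 2 + 0.5 * k * traj[-1] ** 2
print(f"x after ~one period: {traj[-1]:.4f}, total energy: {energy:.6f}")
```

Velocity Verlet is a popular choice because it is time-reversible and keeps the total energy close to its initial value over long runs, which a naive forward Euler update does not.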
What is this for? The authors write:
Simulations can be particularly valuable for membrane proteins, for which experimental characterization of structural dynamics tends to be challenging. [...] A wide variety of physiological processes are amenable to study at the atomic level by MD simulation. Examples relevant to membrane protein function include the active transport of solutes across bilayers by antiporters and symporters; the passive transport of water, ions, and other solutes by structurally diverse channels; the interconversion of transmembrane electrochemical gradients and chemical potential energy by pumps such as the F1F0-ATPase and the Na+/K+-ATPase; the transmission of extracellular stimuli to the cell interior by G protein–coupled receptors (GPCRs) and tyrosine kinase receptors; and the structural coupling of cells and organelles to one another by integrins and membrane curvature modulators.
The paper further presents several case studies, such as "Permeation through a water channel: aquaporin 0 (AQP0)", "Reconciling discordant experimental results: β2-adrenergic receptor (β2AR)" and "Permeation and gating of an ion channel: Kv1.2".

Among the "major strengths and limitations of MD as a technique for molecular physiology", the authors first discuss "accessible timescales" ("MD simulations have historically been most powerful for simulating motions that take place on submicrosecond timescales"). A further paragraph in this section deals with "accuracy and errors". "System size" must also be considered when designing an MD simulation study. Finally, the authors note:
Classical MD simulations treat covalent bonds as unchanging. To simulate chemical reactions, one must use alternative techniques such as quantum mechanics/molecular mechanics simulations, in which the bulk of the system is simulated as in classical MD, but a small part is evaluated using more computationally intensive quantum mechanical approaches.

Computational imaging in cell biology

Eils et al. (2003): Computational imaging in cell biology

This paper deals with "computational methods that (semi-) automatically quantify objects, distances, concentrations, and velocities of cells and subcellular structures" and thus generate quantitative data that "provide the basis for mathematical modeling of protein kinetics and biochemical signaling networks".

In the introduction, the authors write:
Fluorescent dyes such as fluorescein and rhodamine, together with recombinant fluorescent protein technology and voltage- and pH-sensitive dyes allow virtually any cellular structure to be tagged. In combination with techniques in live cells like FRAP and fluorescence resonance energy transfer, it is now possible to obtain spatio-temporal, biochemical, and biophysical information about the cell in a manner not imaginable before.
This is followed by a discussion of "methods for segmentation and tracking of cells":
Nowadays, techniques for fully automated analysis and time–space visualization of time series from living cells involve either segmentation and tracking of individual structures, or continuous motion estimation. For tracking a large number of small particles that move individually and independently from each other, single particle tracking approaches are most appropriate.
For the determination of more complex movement, two independent approaches were initially developed, but recently have been merged. Optical flow methods estimate the local motion directly from local gray value changes in image sequences. Image registration aims at identifying and allocating certain objects in the real world as they appear in an internal computer model. The main application of image registration in cell biology is the automated correction of rotational and translational movements over time (rigid transformation). This allows the identification of local dynamics, in particular when the movement is a result of the superposition of two or more independent dynamics. Registration also helps to identify global movements when local changes are artifacts and should be neglected.
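For the single particle tracking mentioned above, the step after detecting particles in each frame is linking detections between consecutive frames. A minimal sketch of that linking step, assuming positions have already been detected and using a greedy shortest-displacement-first assignment with a hypothetical maximum displacement; many dedicated trackers instead solve a global assignment problem and may add motion models.

```python
from math import dist  # Python 3.8+


def link_frames(prev, curr, max_disp):
    """Link particles between two frames, shortest displacements first."""
    candidates = sorted(
        (dist(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
        if dist(p, c) <= max_disp
    )
    links, used_prev, used_curr = {}, set(), set()
    for _, i, j in candidates:
        if i not in used_prev and j not in used_curr:
            links[i] = j  # particle i in `prev` becomes particle j in `curr`
            used_prev.add(i)
            used_curr.add(j)
    return links


frame0 = [(0.0, 0.0), (5.0, 5.0)]
frame1 = [(5.3, 4.8), (0.2, 0.1), (20.0, 20.0)]  # third spot: a newly appearing particle
print(link_frames(frame0, frame1, max_disp=1.0))  # {0: 1, 1: 0}
```

The maximum displacement encodes the assumption that particles move "individually and independently" and only a short distance per frame; unlinked detections in the new frame are treated as new tracks.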
Several paragraphs follow that explain how these methods work. The paper also mentions computer vision, visualization and quantitative image analysis.
A great advantage of the combination of segmentation and surface reconstruction is the immediate access to quantitative information that corresponds to visual data. These approaches were designed to deal particularly with the high degree of anisotropy typical for 4-D live-cell recordings and to directly estimate quantitative parameters, e.g., the gray values in the segmented area of corresponding images can be measured to determine the amount and concentration of fluorescently labeled proteins in the segmented cellular compartments.
A challenge for future work is to better understand the biomechanical behavior of cellular structures, e.g., cellular membranes, by fitting a biophysical model to the data - an approach already successfully implemented in various fields of medical image analysis.
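The idea of reading protein amounts from gray values inside a segmented region can be illustrated on a toy image. This minimal sketch assumes a single 2-D frame stored as nested lists and a global threshold chosen by hand; it does not reproduce the anisotropic 4-D methods the paper describes. The integrated intensity serves as a proxy for the amount of labeled protein and the mean intensity as a proxy for its concentration, in practice only after background correction, which is omitted here.

```python
def segment_and_measure(image, threshold):
    """Segment pixels brighter than `threshold`; return area, integrated and mean intensity."""
    values = [v for row in image for v in row if v > threshold]
    area = len(values)
    total = sum(values)
    mean = total / area if area else 0.0
    return area, total, mean


# Hypothetical 4x4 fluorescence image (gray values)
img = [
    [10, 12, 11, 10],
    [11, 80, 90, 12],
    [10, 85, 95, 11],
    [12, 10, 11, 10],
]
print(segment_and_measure(img, threshold=50))  # (4, 350, 87.5)
```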
Finally, the paper mentions a couple of applications and concludes:
In combination with models of biochemical processes and regulatory networks, computational imaging as part of the emerging field of systems biology will lead to the identification of novel principles of cellular regulation derived from the huge amount of experimental data that are currently generated.

Applications of genome-scale metabolic reconstructions

Oberhardt et al. (2009): Applications of genome-scale metabolic reconstructions

This is a review that examines "the many uses and future directions of genome-scale metabolic reconstructions" and highlights "trends and opportunities in the field that will make the greatest impact on many fields of biology" ten years after the publication of the first genome-scale metabolic reconstruction, a metabolic model of Haemophilus influenzae (Edwards et al. (1999): Systems properties of the Haemophilus influenzae Rd metabolic genotype).
[T]oday [more than] 50 genome-scale metabolic reconstructions have been published[.] [...] Of all organisms that have been analyzed through a constraint-based metabolic reconstruction, Escherichia coli has gained the most attention as a model organism.
Since there has already been a review focusing on E. coli (Feist et al. (2008): The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli), this paper excludes E. coli and focuses on the other organisms instead.

The studies covered by this review fall into five categories:
(1) contextualization of high-throughput data, (2) guidance of metabolic engineering, (3) directing hypothesis-driven discovery, (4) interrogation of multi-species relationships, and (5) network property discovery[.]
The authors summarize the process of metabolic reconstruction as follows:
First, an initial reconstruction is built from gene-annotation data coupled with information from online databases such as KEGG and EXPASY, which link known genes to functional categories and help bridge the genotype–phenotype gap. Second, the initial reconstruction is curated through an examination of the primary literature. Then, the reconstruction as a knowledge base is converted into a mathematical model that can be analyzed through constraint-based approaches. Third, the reconstruction is validated through comparison of model predictions to phenotypic data. In a final fourth step, a metabolic reconstruction is subjected to continued wet- and dry-lab cycles, which improve accuracy and allow investigation of key hypotheses.
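Converting the reconstruction into a model that "can be analyzed through constraint-based approaches" essentially means writing down a stoichiometric matrix S (metabolites × reactions), requiring steady state S·v = 0, and bounding each flux in v. This minimal sketch assumes a hypothetical three-reaction toy network and only checks whether a given flux vector satisfies those constraints; flux balance analysis would go further and optimize an objective such as biomass production over this feasible set with a linear-programming solver.

```python
# Hypothetical toy network: uptake -> A, A -> B (convert), B -> export
REACTIONS = ["uptake", "convert", "export"]
S = [
    # uptake, convert, export
    [1, -1, 0],  # metabolite A
    [0, 1, -1],  # metabolite B
]
BOUNDS = {"uptake": (0, 10), "convert": (0, 1000), "export": (0, 1000)}


def is_feasible(v, tol=1e-9):
    """Return True if flux vector v satisfies S·v = 0 and all flux bounds."""
    balanced = all(
        abs(sum(s * flux for s, flux in zip(row, v))) <= tol for row in S
    )
    within_bounds = all(
        BOUNDS[r][0] <= flux <= BOUNDS[r][1] for r, flux in zip(REACTIONS, v)
    )
    return balanced and within_bounds


print(is_feasible([10, 10, 10]))  # True: steady state and within bounds
print(is_feasible([12, 12, 12]))  # False: uptake exceeds its upper bound
print(is_feasible([10, 5, 5]))    # False: metabolite A would accumulate
```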
What data does this process deliver to us? The authors write:
Through gap analysis and subsequent pathway analysis, studies have elucidated both the stoichiometry of certain reactions and the most efficient pathways for production of certain metabolites, and in some cases have even proposed methods for engineering more efficient strains. Also, it is common for reconstruction efforts to provide high-quality estimates of cellular parameters such as growth yield, specific fluxes, P/O ratio, and ATP maintenance costs, and these theoretical values are often used for hypothesis building or validation in biological studies. Several published metabolic reconstruction studies also include in silico predictions for minimal medium design.
Which organisms have been reconstructed and what kind of data have we gained by this? The paper provides the following answer:
Metabolic GENREs of prokaryotes encompass an average of 600 metabolites, 650 genes, and 800 reactions, whereas metabolic GENREs of eukaryotes include on average 1200 metabolites, 1000 genes, and 1500 reactions. Excluding the two existing reconstructions of Homo sapiens metabolism lowers the average eukaryotic network size to 800, 800, and 1300, metabolites, genes, and reactions, respectively, a closer but still higher distribution to that of prokaryotes. [...] Existing reconstructions span the domains Eukaryota, Bacteria, and Archaea. The most represented domain is bacteria, with 25 species reconstructed.
Now comes something that is interesting for us - the relationship with Computational Systems Biology:
With biology increasingly becoming a data-rich field, an emerging challenge has been determining how to organize, sort, interrelate, and contextualize all of the high-throughput datasets now available. This challenge has motivated the field of top–down systems biology, wherein statistical analyses of high-throughput data are used to infer biochemical network structures and functions.
This metabolic data is "often linked with other data types, such as protein expression data, protein–protein interaction data, protein–metabolite interaction data, and physical interaction data." It can also be used for metabolic engineering, which is "the use of recombinant DNA technology to selectively alter cell metabolism and improve a targeted cellular function".

Regarding hypothesis-driven discovery, the authors write:
Gene microarrays serve as a prime example; a traditional hypothesis-driven study might include examination of 1 or 2 genes in a microarray that are of particular interest. This approach would ignore the thousands of other genes on the chip, however, and could miss important information or trends embedded in those data. Therefore, a systematic framework for incorporating genome-scale data available from multiple high-throughput methods would allow hypothesis-driven biology to benefit from the full range of tools available today. Metabolic GENREs represent concise collections of existing hypotheses, and taken together as a broad context they enable systematic identification of new hypotheses that can be tested and resolved. Therefore, they represent a crucial framework for incorporating the flood of biological data now available into the biological discovery process.
Metabolic GENREs intrinsically represent a simplification of cellular function. The distinct biochemical networks categorized by scientists (e.g. metabolism, regulation, and signaling) blend together in a living cell, creating a far more complicated web of interactions than is convenient or possible to model. This web is fundamentally stochastic, and co-habits the cell with many other simultaneous phenomena including transcription and translation, protein modification, cell division, adhesion, motility, and mechanical transduction of external forces. The very simplifications that make metabolic GENREs powerful tools also make them challenging to use for the study of totally unknown or novel phenomena.
About the interrogation of multi-species relationships the authors write:
A promising direction for computational systems biology is the incorporation of network-level analysis into the field of comparative genomics, which is currently driven by bioinformatics. [...] However, most multi-species analyses reported to date have involved either sub-genome-scale metabolic models or models that have not been carefully annotated. [...] Of the five categories of uses of metabolic GENREs described in this paper, multi-species studies have been represented the least in literature so far. With more genome-scale metabolic models being built and an increased focus on studying multicellular systems, however, we anticipate that this field will see a major increase in activity in the coming years.
Finally, regarding the fifth category, network property discovery, the main point conveyed by the authors of this paper is:
The field of computational systems biology has produced a rich array of methods for network-based analysis, offering tremendous insight into the functioning of metabolic networks. However, many of these methods produce results that can be difficult to link to observable phenotypes. Forging this link poses the greatest challenge toward development of useful network-based tools. For instance, several methods exist to analyze redundancy in metabolic networks. Although these techniques define ‘redundancy’ intuitively in terms of the number of available paths between a given set of inputs and outputs, relating ‘redundancy’ to an observable phenotype poses a difficult challenge.
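The intuitive notion of "redundancy" as the number of available paths between inputs and outputs can be sketched as path counting on a directed graph. This minimal sketch assumes a hypothetical metabolite graph with placeholder node names and counts simple paths by depth-first search. It ignores stoichiometry, cofactors, and reaction capacities, which illustrates why such a count is hard to relate to an observable phenotype.

```python
def count_paths(graph, source, target, visited=frozenset()):
    """Count simple directed paths from source to target by depth-first search."""
    if source == target:
        return 1
    visited = visited | {source}
    return sum(
        count_paths(graph, nxt, target, visited)
        for nxt in graph.get(source, [])
        if nxt not in visited
    )


# Hypothetical network: the input reaches the output via two alternative routes
network = {
    "input": ["m1"],
    "m1": ["m2", "m3"],
    "m2": ["output"],
    "m3": ["output"],
}
print(count_paths(network, "input", "output"))  # 2
```

Path enumeration grows exponentially on genome-scale networks, so real redundancy analyses use more structured measures; the toy version only makes the definition concrete.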
Each section of the paper comes with a wealth of examples and references to concrete research projects that illustrate what has been done in the respective field so far.

A Strategy for Integrative Computational Physiology

Hunter et al. (2005): A Strategy for Integrative Computational Physiology

This paper describes a "quantitative modeling framework" being developed "under the auspices of the Physiome and Bioengineering Committee (co-chaired by P. Hunter and A. Popel) of the International Union of Physiological Sciences (IUPS)" that can deal with organ function "through knowledge of molecular and cellular processes within the constraints of structure-function relations at the tissue level".

It follows what other authors have called a "top-down approach":
The challenge is to develop mathematical models of structure-function relations appropriate to each (limited) spatial and temporal domain and then to link the parameters of a model at one scale to a more detailed description of structure and function at the level below.
In the authors' opinion, the concept of a "field" as defined by physicists of the 19th century is essential for this endeavour:
The application of continuum field concepts and constitutive laws, whose parameters are derived from separate, finer-scale models, is the key to linking molecular systems biology (with its characterization of molecular processes and pathways) to larger-scale systems physiology (with its characterization of the integrated function of the body’s organ systems).
The authors also propose a name for this branch of science:
The appropriate name for this application of physical and engineering principles to physiology is computational physiology. The term systems biology, currently inappropriately limited to the molecular scale, needs to be associated with all spatial scales.
Next, the authors state that computational modeling must be applied "at the scale of whole organs", "at the tissue level" and "even at the protein level".
Good progress is being made on modeling the anatomy and biophysics of the heart, the lungs, the digestive system, and the musculoskeletal system. [...] Linking the organ and organ systems together to yield models that can predict and interpret multiorgan physiological behavior is the focus of systems physiology. [...] The organ-level models [...] are based on finite-element models of the anatomic fields (geometry and tissue structure) encoded in a markup language called FieldML (
For "modeling cell function", a framework "has been developed over the past five years by the Bioengineering Institute at the University of Auckland". It employs a markup language called CellML. At the time of the paper, the CellML website offered about 300 models in various categories, such as signal transduction or metabolic pathway models.

The next chapter of the paper focuses on models of the heart. The authors explain:
Molecular dynamics (MD) models of the atomic structure of ion channels, pumps, exchangers, etc. are needed that can predict the open-channel permeation of the channels, the voltage dependence of the channel permeability, and the time- and voltage-dependent gating behavior. [...] MD calculations, based on ~100,000 atoms in current models, are very expensive and are typically run for periods of only 10 ns. Sometimes homology modeling is used in combination with MD simulation to generate, test, and refine models of mammalian potassium channels based on bacterial templates. The structures of sodium and calcium channels are also on the horizon, as well as those of key pumps and exchangers.
A major challenge now is to develop coarse-grained models of these ion channels and other proteins with parameters calculated from the MD models. This will allow the models to include transient gating behavior for time intervals up to ~100 ms. [...] One of the challenges now for the Heart Physiome Project is to derive the parameters of the Hodgkin-Huxley or Markov models from the MD models via coarse-grained intermediate models as the molecular structures of these proteins become available.
The next stage of development of cell models will need to take account of the spatial distribution of proteins within a cell and subcellular compartments, where second messengers (Ca2+, IP3, cAMP, etc.) are localized. [...] Developing 3-D models at the cellular level will help to fill the large gap in spatial scales between proteins and intact cells.
Current work is linking myocardial mechanics to the fluid mechanics of blood flow in the ventricles and to the function of the heart valves. Future work will need to include models of the Purkinje network and the autonomic nervous system.
In their conclusions, the authors appear to be very optimistic:
Anatomically and biophysically based models of 4 of the 12 organ systems in the human body are now quite well developed at the organ and tissue levels (the cardiovascular, respiratory, digestive, and musculoskeletal systems). Others (the lymphatic system, the kidney and urinary system, the skin, the female reproductive system, and the special sense organs) are at an early stage of development, and the remainder (the endocrine, male reproductive, and brain and nervous systems) will be addressed over the next few years.
An important goal for the Physiome Project is also to use this modeling framework to help interpret clinical images for diagnostic purposes and to aid in the development of new medical devices. Another goal is to apply the anatomically and physiologically based models to virtual surgery, surgical training, and education. A longer-term goal is to help lower the cost of drug discovery by providing a rational multiscale and multiphysics modeling-based framework for dealing with the enormous complexity of physiological systems in the human body.

Thursday, 21 September 2017

Computational Cell Biology: Spatiotemporal Simulation of Cellular Events

Slepchenko et al. (2002): Computational Cell Biology: Spatiotemporal Simulation of Cellular Events

This is an introduction to Computational Cell Biology focusing on the system the authors developed, called "Virtual Cell". It also mentions several other programs, in particular StochSim and MCell. To illustrate their ideas, the authors provide examples concerning "RNA trafficking" and "neuronal calcium dynamics".

The paper first mentions several technologies that have driven progress in Cell Biology over the past twenty years:
Confocal and two-photon excited fluorescence microscopies permit investigators to study the structure and dynamics of living cells with submicrometer three-dimensional (3D) spatial resolution and with time resolutions as fast as milliseconds. These quantitative microscopies can be combined with fluorescent indicators and fluorescent protein constructs to enable the study of the spatiotemporal behavior of individual molecules in cells. Patch clamp electrophysiological recording can be used to study ion currents through single-channel proteins or across the entire cell membrane. All these techniques can be further combined with methods to impart specific perturbations to cells such as photorelease of caged compounds to deliver controlled doses of second messengers or laser tweezer manipulations to determine the response of cells to mechanical stresses.
These advances have yielded the following data:
Massive structural biology efforts have produced extensive databases of 3D protein structures. High-throughput molecular biology and molecular genetics technologies have led to descriptions of the full genomes of several organisms, including, of course, the human genome. More recently, high-throughput proteomics technologies promise to catalog, for a given state of a given cell, the dynamic levels of and interactions between all proteins and their posttranslational modifications.
To "link all the molecular-level data to the cellular processes that can be probed with the microscope", computational approaches are needed.

Regarding the mathematical knowledge required to implement these approaches, the authors write:
The concentrations of reacting molecular species as a function of time in a well-mixed reactor can be obtained by solving ordinary differential equations (ODEs) that specify the rate of change of each species as a function of the concentrations of the molecules in the system. If membrane transport and electrical potential are to be included in the model, the rate expressions can become more complex but can still be formulated in terms of a system of ODEs. However, when diffusion of molecules within the complex geometry of a cell is also considered, the resultant “reaction/diffusion” system requires the solution of partial differential equations (PDEs) that describe variations in concentration over space and time.
The finite volume method, developed originally for problems in heat transfer, is especially well-suited to simulations in cell biological systems. It is closely related to finite difference methods but allows for good control of boundary conditions and surface profile assumptions while preserving the conservative nature of the equations. Most importantly, the finite volume formalism accommodates the heterogeneous spatial organization of cellular compartments. [...] Within such elements, the rate of change of the concentration of a given molecular species is simply the sum of fluxes entering the volume element from its adjacent neighbors plus the rate of production of the given species via reactions. [...] Linear solvers based on Krylov space approximations, such as the conjugate gradient method, in conjunction with a preconditioner (an operator that approximates the inverse of the matrix but can be applied at a low computational cost), become powerful and robust. There are commercial packages that implement a range of Krylov space methods, as well as many of the well-known preconditioners (e.g., PCGPAK, Scientific Computing Associates, New Haven, Connecticut).
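The finite-volume bookkeeping in this passage (the rate of change in each element equals the net flux from its neighbours plus local production) can be sketched in one dimension. This minimal sketch assumes a hypothetical row of equal-sized elements, a constant diffusion coefficient, zero-flux boundaries, explicit Euler time stepping, and first-order degradation as the only reaction; it is not Virtual Cell's solver. Implicit time stepping, which avoids the stability limit noted in the code, would instead require solving a sparse linear system at each step, the kind of problem the quoted Krylov methods address.

```python
def fv_step(conc, D, dx, dt, k_deg):
    """One explicit finite-volume step: 1-D diffusion plus first-order degradation."""
    n = len(conc)
    new = list(conc)
    for i in range(n):
        # Net diffusive flux into element i from its neighbours (Fick's law).
        # Missing neighbours contribute nothing: zero-flux (sealed) boundaries.
        flux_in = 0.0
        if i > 0:
            flux_in += D * (conc[i - 1] - conc[i]) / dx
        if i < n - 1:
            flux_in += D * (conc[i + 1] - conc[i]) / dx
        # Rate of change = net flux per element length + reaction term
        new[i] = conc[i] + dt * (flux_in / dx - k_deg * conc[i])
    return new


conc = [0.0] * 10
conc[0] = 1.0  # initial pulse in the first element
for _ in range(1000):
    # D * dt / dx**2 = 0.1 keeps the explicit scheme stable (limit 0.5 in 1-D)
    conc = fv_step(conc, D=1.0, dx=1.0, dt=0.1, k_deg=0.0)
print(round(sum(conc), 6))  # 1.0: mass is conserved when k_deg = 0
```

Because every flux leaving one element enters its neighbour, the scheme conserves mass exactly (up to rounding), which is the "conservative nature" the authors emphasize.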
When can we use deterministic models and when do we have to use stochastic models instead? The authors write:
If the number of molecules involved in a process is relatively small, the fluctuations can become important. In this case, the continuous description is no longer sufficient and stochastic effects have to be included in a model. Single-channel ionic currents are one such example. [...] Stochastic fluctuations of macromolecules are crucial for understanding the dynamics of vesicles and granules driven by competing molecular motors. In the case of a relatively small number of participating particles, a system that would be described deterministically by reaction-diffusion PDEs requires fully stochastic treatment. In this approach, diffusion is described as Brownian random walks of individual particles, and chemical kinetics is simulated as stochastic reaction events. Numerical stochastic simulations in this case are based on pseudo-random-number generation. They are often called Monte Carlo simulations (the term, originally introduced by Ulam and von Neumann in the days of the Manhattan Project) since throwing a dice is actually a way to generate a random number.
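The Brownian random-walk part of such a Monte Carlo simulation takes only a few lines. This sketch assumes free 1D diffusion with Gaussian steps of variance 2*D*dt, drawn from a seeded generator in Python's standard `random` module. Names and parameter values are illustrative. The mean squared displacement of many particles should approach the diffusion result 2*D*t.

```python
"""Brownian random walk of individual particles (Monte Carlo sketch)."""
import math
import random


def random_walk(n_particles, n_steps, diff, dt, seed=0):
    """Move every particle by a Gaussian step with variance 2*D*dt per time step
    (1D, no boundaries). Returns final displacements from the origin."""
    rng = random.Random(seed)                 # seeded pseudo-random-number generator
    sigma = math.sqrt(2.0 * diff * dt)
    positions = [0.0] * n_particles
    for _ in range(n_steps):
        positions = [x + rng.gauss(0.0, sigma) for x in positions]
    return positions


def mean_squared_displacement(positions):
    return sum(x * x for x in positions) / len(positions)


if __name__ == "__main__":
    diff, dt, n_steps = 1.0, 0.01, 100
    pos = random_walk(5000, n_steps, diff, dt)
    t = n_steps * dt
    print(f"MSD = {mean_squared_displacement(pos):.3f}, theory 2*D*t = {2 * diff * t:.3f}")
```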
They also provide an example of a stochastic model:
As an example, in the Hodgkin-Huxley model, the membrane voltage is treated as a continuous deterministic variable described through a set of differential equations, whereas the single channel behavior is random. A natural way to introduce stochasticity in the model is to replace open probabilities by the actual numbers of open channels. In fact, Hodgkin and Huxley introduced variables in their model to represent the proportion of open gates for various ions. The number of open channels is random and is governed by a corresponding Markov kinetic model that explicitly incorporates the internal workings of the ion channels. Mathematically, the membrane potential is now described by a stochastic differential equation with a discrete random process.
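Replacing an open probability with a count of open channels can be sketched as a population of independent two-state (closed/open) channels. The opening and closing rates `alpha` and `beta` are assumptions for illustration, not Hodgkin-Huxley gating kinetics. So is the per-step approximation of the Markov chain, which is only valid when `alpha*dt` and `beta*dt` are small. The time-averaged open fraction should fluctuate around the deterministic value alpha/(alpha+beta).

```python
"""Stochastic two-state ion channels: counting open channels instead of using an open probability."""
import random


def simulate_channels(n_channels, alpha, beta, dt, n_steps, seed=1):
    """Each closed channel opens with probability alpha*dt and each open channel
    closes with probability beta*dt per step (a discrete-time approximation of
    a two-state Markov chain). Returns the number of open channels per step."""
    rng = random.Random(seed)
    n_open = 0
    trace = []
    for _ in range(n_steps):
        opened = sum(rng.random() < alpha * dt for _ in range(n_channels - n_open))
        closed = sum(rng.random() < beta * dt for _ in range(n_open))
        n_open += opened - closed
        trace.append(n_open)
    return trace


if __name__ == "__main__":
    alpha, beta, n = 2.0, 3.0, 100
    trace = simulate_channels(n, alpha, beta, dt=0.01, n_steps=5000)
    settled = trace[500:]                     # discard the initial transient
    mean_open = sum(settled) / len(settled) / n
    print(f"mean open fraction {mean_open:.3f} vs deterministic {alpha / (alpha + beta):.3f}")
```

In a full stochastic model the open count would feed the membrane current, so channel noise propagates into the voltage equation.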
The authors further mention two papers on stochastic methods by Gillespie, which they deem "especially relevant" for Computational Cell Biology (Gillespie (1977): Exact stochastic simulation of coupled chemical reactions; Gillespie (2001): Approximate accelerated stochastic simulation of chemically reacting systems). Regarding the pros and cons of the algorithm described in these papers, the authors write:
The extraordinary efficiency of the Gillespie stochastic kinetics algorithm is achieved by restricting the decision process to selecting which reaction will occur and adjusting the time step accordingly. Focusing exclusively on the reaction avoids consideration of the properties of individual reactive species as discrete entities, which minimizes processing time when the number of reacting species is large. However, processing time increases in proportion to the number of different reactions. Furthermore, the Gillespie approach does not easily accommodate the existence of multiple states of different substrates, which may affect their reactivities, and since individual reactive species are not identified as discrete elements, their states, positions, and velocities within the reaction volume cannot be followed over time.
This type of approach has been utilized in the Virtual Cell to combine the deterministic description of a continuously distributed species (RNA) with the stochastic treatment of discrete particles (RNA granules)[.]
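The decision process in the Gillespie quote above (choose which reaction fires next, then advance time accordingly) can be sketched as the direct method from the 1977 paper. The reaction scheme A + B <-> C, the stochastic rate constants, and all function names are hypothetical. The 2001 approximate variant is not shown.

```python
"""Gillespie's direct method for A + B <-> C (sketch)."""
import math
import random

# State changes (dA, dB, dC) for the two reactions.
CHANGES = [(-1, -1, +1),    # A + B -> C
           (+1, +1, -1)]    # C -> A + B


def propensities(state, k):
    """Propensities a_j for stochastic rate constants k = (k_bind, k_unbind)."""
    a, b, c = state
    return [k[0] * a * b, k[1] * c]


def choose_reaction(props, u):
    """Pick reaction j with probability a_j / sum(a), given u uniform in [0, sum(a))."""
    cumulative = 0.0
    for j, a in enumerate(props):
        cumulative += a
        if u < cumulative:
            return j
    return max(j for j, a in enumerate(props) if a > 0)   # guard against round-off


def gillespie(state, k, t_end, seed=0):
    """Return [(t, state), ...]. Each iteration only decides which reaction
    fires and when; molecules are never tracked individually."""
    rng = random.Random(seed)
    t, state = 0.0, tuple(state)
    history = [(t, state)]
    while True:
        props = propensities(state, k)
        total = sum(props)
        if total == 0.0:
            break                                        # nothing can react
        t += -math.log(1.0 - rng.random()) / total       # exponential waiting time
        if t > t_end:
            break
        j = choose_reaction(props, rng.random() * total)
        state = tuple(x + d for x, d in zip(state, CHANGES[j]))
        history.append((t, state))
    return history


if __name__ == "__main__":
    hist = gillespie((100, 80, 0), k=(0.01, 0.1), t_end=10.0)
    t, (a, b, c) = hist[-1]
    print(f"{len(hist) - 1} reaction events; at t={t:.3f}: A={a} B={b} C={c}")
```

The state is just a vector of copy numbers. This is why, as the quote says, the states, positions and velocities of individual molecules cannot be followed with this approach.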
What follows is a review of programs used in Computational Neuroscience. The authors mention the programs NEURON and GENESIS, both of which "use cable theory to treat the dynamics of electrical signals in the complex geometries of neurons"; this approach "solves the equation for membrane potential in a series of connected segments with the overall topology of the neuron". Further, they mention the model description language NMODL, which has been added to NEURON, and the interface KINETIKIT, which makes GENESIS work with chemical kinetics.
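The compartmental idea behind cable-theory simulators, solving for the membrane potential in a chain of connected segments, can be sketched as a passive (leak-only) cable integrated with explicit Euler. All parameter values and units are arbitrary assumptions. NEURON and GENESIS use their own, more robust numerical schemes, and nothing here reflects their APIs.

```python
"""Passive compartmental cable: membrane potential in a chain of connected segments (sketch)."""


def step(v, dt, c_m, g_leak, e_leak, g_axial, i_inj):
    """One explicit-Euler step of
    C dV_i/dt = -g_L (V_i - E_L) + g_a * sum over neighbors j of (V_j - V_i) + I_i."""
    n = len(v)
    new_v = []
    for i in range(n):
        axial = 0.0
        if i > 0:
            axial += g_axial * (v[i - 1] - v[i])
        if i < n - 1:
            axial += g_axial * (v[i + 1] - v[i])
        dv = (-g_leak * (v[i] - e_leak) + axial + i_inj[i]) / c_m
        new_v.append(v[i] + dt * dv)
    return new_v


def run(n_seg=10, t_end=100.0, dt=0.01, i_first=0.5):
    """Inject a constant current into segment 0 and integrate from rest.
    Units are arbitrary; all parameter values are hypothetical."""
    e_leak = -65.0
    v = [e_leak] * n_seg
    i_inj = [i_first] + [0.0] * (n_seg - 1)
    for _ in range(int(round(t_end / dt))):
        v = step(v, dt, c_m=1.0, g_leak=0.1, e_leak=e_leak, g_axial=1.0, i_inj=i_inj)
    return v


if __name__ == "__main__":
    print(" ".join(f"{x:.2f}" for x in run()))
```

At steady state the depolarization decays along the chain, and the total leak current balances the injected current.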

The authors also write about software that is supposed "to build complex biochemical reaction pathways and numerically simulate the time course of the individual molecular species within them", such as GEPASI, Jarnac/Scamp, DBSolve, Berkeley Madonna, ECELL, BioSpice and JSIM.
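At their core, such tools numerically integrate the ODEs of a reaction pathway. A minimal illustration is a single reversible binding reaction A + B <-> C under mass-action kinetics. The reaction, the rate constants and the hand-written fixed-step fourth-order Runge-Kutta integrator are assumptions for illustration. Real packages typically use adaptive solvers.

```python
"""Mass-action kinetics for A + B <-> C in a well-mixed volume (sketch)."""


def rates(state, kf, kr):
    """Return (dA/dt, dB/dt, dC/dt) for the reversible reaction A + B <-> C."""
    a, b, c = state
    net = kf * a * b - kr * c            # net forward reaction rate
    return (-net, -net, net)


def rk4_step(state, dt, kf, kr):
    """Advance all concentrations by one classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = rates(state, kf, kr)
    k2 = rates(shifted(k1, dt / 2), kf, kr)
    k3 = rates(shifted(k2, dt / 2), kf, kr)
    k4 = rates(shifted(k3, dt), kf, kr)
    return tuple(s + dt / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))


def simulate(state, t_end, dt, kf, kr):
    """Integrate from t = 0 to t_end with a fixed step; return final concentrations."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, kf, kr)
    return state


if __name__ == "__main__":
    a, b, c = simulate((1.0, 0.5, 0.0), t_end=50.0, dt=0.01, kf=2.0, kr=0.5)
    print(f"A={a:.4f} B={b:.4f} C={c:.4f}  C/(A*B)={c / (a * b):.3f}")
```

At equilibrium the forward and backward rates balance, so C/(A*B) approaches kf/kr = 4, while A + C and B + C stay constant.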

Then, the authors introduce StochSim:
In this program individual molecules or molecular complexes are represented as discrete software objects or intracellular automata. The time step is set to accommodate the most rapid reaction in the system. [...] When a reaction occurs the system is updated according to the stoichiometry of the reaction. Molecules that exist in more than one state are encoded as “multi-state molecules” using a series of binary flags to represent different states of the molecule such as conformation, ligand binding, or covalent modification. The flags can modify the reactivity of the molecule, and reactions can modify the flags associated with a multi-state molecule.
Compared to the Gillespie algorithm, StochSim is supposed to be faster "in systems where molecules can exist in multiple states".
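The multi-state idea (binary flags that both change a molecule's reactivity and are changed by reactions) can be sketched as follows. This is a simplified illustration loosely in the spirit of StochSim, not its actual algorithm. Here one molecule is tested per fixed time step, the ligand is treated as an implicit constant pool, and the flag names and per-step probabilities are hypothetical.

```python
"""Multi-state molecules encoded as binary flags, loosely in the spirit of StochSim (sketch)."""
import random

BOUND = 0b01        # flag: ligand bound
PHOSPHO = 0b10      # flag: covalently modified (e.g. phosphorylated)

# Hypothetical per-step reaction probabilities; binding depends on the PHOSPHO flag.
P_BIND = {0: 0.05, PHOSPHO: 0.20}
P_UNBIND = 0.02
P_PHOSPHORYLATE = 0.01   # only possible while bound and not yet modified


def step(molecules, rng):
    """Pick one molecule at random and test one possible reaction for it.
    The flags change its reactivity, and reactions change its flags."""
    i = rng.randrange(len(molecules))
    flags = molecules[i]
    u = rng.random()
    if not flags & BOUND:
        if u < P_BIND[flags & PHOSPHO]:
            molecules[i] = flags | BOUND
    elif u < P_UNBIND:
        molecules[i] = flags & ~BOUND
    elif u < P_UNBIND + P_PHOSPHORYLATE and not flags & PHOSPHO:
        molecules[i] = flags | PHOSPHO


def simulate(n_molecules=200, n_steps=50_000, seed=2):
    """Run a fixed number of steps; return every molecule's flags and the state counts."""
    rng = random.Random(seed)
    molecules = [0] * n_molecules          # all unbound and unmodified
    for _ in range(n_steps):
        step(molecules, rng)
    counts = {}
    for flags in molecules:
        counts[flags] = counts.get(flags, 0) + 1
    return molecules, counts


if __name__ == "__main__":
    _, counts = simulate()
    for flags in sorted(counts):
        print(f"flags={flags:02b}  n={counts[flags]}")
```

Because each molecule carries its own flags, adding more states only adds bits rather than new reaction channels. This is the situation in which the quote credits StochSim with an advantage.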

Next, they write about MCell, "a general Monte Carlo simulator of cellular microphysiology":
MCell utilizes Monte Carlo random-walk and chemical reaction algorithms using pseudo-random-number generation. One of MCell’s convenient features is checkpointing, which involves stopping and restarting a simulation as many times as desired. [...] To speed up simulations, MCell is optimized by using 3D spatial partitioning that makes computing speed virtually independent of microdomain geometric complexity. Running parallel computations, another way to speed up Monte Carlo simulations, is also being pursued in MCell.
The paper mentions "microphysiology of synaptic transmission, [...] statistical chemistry, diffusion theory, single-channel simulation and data analysis, noise analysis, and Markov processes" as possible applications of MCell.
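The speed-up from 3D spatial partitioning rests on a general idea: only objects in nearby subvolumes need to be tested. A minimal uniform-grid sketch of that idea follows. It bins points rather than surface elements, and the voxel size, names and data are assumptions, not MCell's implementation. The brute-force comparison checks that the grid query finds exactly the same neighbors while inspecting only a few voxels.

```python
"""Uniform 3D spatial partitioning: a query only inspects nearby voxels (sketch)."""
import math
import random
from collections import defaultdict


def voxel_of(p, size):
    """Integer (i, j, k) index of the cubic voxel of edge `size` containing p."""
    return tuple(math.floor(c / size) for c in p)


def build_grid(points, size):
    """Map each voxel index to the indices of the points it contains."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[voxel_of(p, size)].append(idx)
    return grid


def neighbors_within(grid, points, q, radius, size):
    """Sorted indices of points within `radius` of q. With radius <= size,
    only the 3 x 3 x 3 block of voxels around q can contain such points."""
    if radius > size:
        raise ValueError("radius must not exceed the voxel size")
    ci, cj, ck = voxel_of(q, size)
    found = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                for idx in grid.get((ci + di, cj + dj, ck + dk), ()):
                    if math.dist(points[idx], q) <= radius:
                        found.append(idx)
    return sorted(found)


if __name__ == "__main__":
    rng = random.Random(3)
    pts = [(rng.random(), rng.random(), rng.random()) for _ in range(2000)]
    grid = build_grid(pts, size=0.1)
    q = (0.5, 0.5, 0.5)
    fast = neighbors_within(grid, pts, q, radius=0.1, size=0.1)
    brute = sorted(i for i, p in enumerate(pts) if math.dist(p, q) <= 0.1)
    print(f"{len(fast)} neighbors found; matches brute force: {fast == brute}")
```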

Finally comes the main part of the publication, which is about the authors' own program, Virtual Cell:
Simulations of both nonspatial (i.e., ODEs) and spatial (PDEs) models can be performed. For nonspatial models, compartments are assigned appropriate volume fractions relative to their parents in the model topology and surface-to-volume ratios for the proper treatment of membrane fluxes. In spatial models, the segmented regions within a 1D, 2D, or 3D image are connected to the corresponding compartments in the topology. The geometry is prepared for a model in a separate Geometry workspace and can come from a segmented experimental image or can be defined analytically.
The Virtual Cell software displays spatial and nonspatial simulation solutions for the variables over time. The spatial data viewer displays a single plane section of a 3D data set and can sample the solution along an arbitrary curve (piecewise linear or Bezier spline) or at a set of points. Membranes are displayed as curves superimposed on the volume mesh, and membrane variables are displayed along these curves. The nonspatial data viewer plots any number of variables over time on the same plot.
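The nonspatial treatment described above, with volume fractions and surface-to-volume ratios for membrane fluxes, can be illustrated with two compartments exchanging a species across a membrane. The sketch assumes that the volume fraction means V_child / V_parent and that the surface-to-volume ratio refers to the child compartment. The passive permeability flux and all values are hypothetical. This is not Virtual Cell's internal formulation.

```python
"""Nonspatial two-compartment model: a membrane flux turned into concentration
changes through a volume fraction and a surface-to-volume ratio (sketch)."""


def derivatives(c_parent, c_child, permeability, vol_fraction, sv_ratio):
    """Assumes vol_fraction = V_child / V_parent and sv_ratio = A_membrane / V_child.
    The flux density j (amount per membrane area per time) goes child -> parent."""
    j = permeability * (c_child - c_parent)
    dc_child = -j * sv_ratio                      # amount lost per V_child: j * A / V_child
    dc_parent = j * sv_ratio * vol_fraction       # A / V_parent = (A / V_child) * (V_child / V_parent)
    return dc_parent, dc_child


def simulate(c_parent, c_child, t_end, dt, **params):
    """Forward-Euler integration; returns the final (parent, child) concentrations."""
    for _ in range(int(round(t_end / dt))):
        dp, dc = derivatives(c_parent, c_child, **params)
        c_parent += dt * dp
        c_child += dt * dc
    return c_parent, c_child


if __name__ == "__main__":
    # Hypothetical values: the child compartment occupies 10% of the parent volume.
    p, c = simulate(0.1, 5.0, t_end=20.0, dt=0.001,
                    permeability=0.05, vol_fraction=0.1, sv_ratio=10.0)
    print(f"parent={p:.4f} child={c:.4f}")
```

The same flux changes the two concentrations by different amounts because the compartments have different volumes. The total amount, proportional to c_parent + vol_fraction * c_child, stays constant.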
The authors summarize two studies conducted using Virtual Cell, which are about a "model of Calcium dynamics in a neuronal cell" and "stochastic models for RNA trafficking", and conclude:
The Virtual Cell program has several important advantages for stochastic modeling in eukaryotic cells. First, realistic image-based cell geometries are used to define intracellular reaction volumes, which constrain the stochastic behavior of intracellular reactants in unexpected ways. Second, definitions of reactive species can include multiple states described as either discrete parameters or continuous variables, which provide extraordinary contextual richness and behavioral versatility. Third, dynamic transformation and translocation of multiple individual reactive species can be tracked over time, facilitating integration of spatially heterogeneous stochastic models with simultaneous deterministic reaction/diffusion models. A major future challenge for the Virtual Cell will be to integrate dynamic shape changes in the reaction volume within the powerful and flexible stochastic modeling platform already developed. If this can be accomplished, the holy grail of stochastic modeling of cell motility may be attainable using the Virtual Cell.
In the last chapter of the publication, the authors address future challenges for Computational Cell Biology. Among other things, they write:
To improve stability, accuracy, and overall efficiency of numerical simulations, the issues of reaction stiffness in the PDEs, more accurate representation of irregular boundaries, and choice of effective linear solvers need to be addressed. [...] [A]dditional features are being developed, including modeling membrane potential, stochastic processes, lateral diffusion in membranes, and one-dimensional structures such as microtubules and microfilaments. [...] Also needed are computational tools to treat cell structural dynamics to enable the construction of models of such processes as cell migration or mitosis.
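The stiffness issue mentioned above shows up even for the simplest fast reaction, dy/dt = -k*y with a large k. This textbook sketch, with arbitrary values, compares explicit and implicit (backward) Euler when the step size is chosen for slower processes. It is not the Virtual Cell solver.

```python
"""Why stiffness matters: explicit vs. implicit Euler on a fast-decaying reaction (sketch)."""


def explicit_euler(y0, k, dt, n):
    """y_new = y + dt * (-k * y); stable only if dt < 2 / k."""
    y = y0
    for _ in range(n):
        y = y + dt * (-k * y)
    return y


def implicit_euler(y0, k, dt, n):
    """Solves y_new = y + dt * (-k * y_new) exactly; stable for any dt > 0."""
    y = y0
    for _ in range(n):
        y = y / (1.0 + dt * k)
    return y


if __name__ == "__main__":
    k, dt, n = 1000.0, 0.01, 20      # fast reaction, step sized for slow processes
    print("explicit:", explicit_euler(1.0, k, dt, n))
    print("implicit:", implicit_euler(1.0, k, dt, n))
```

With dt*k = 10, the explicit update multiplies y by -9 at each step and blows up, while the implicit update divides by 11 and decays as it should. Implicit methods need a linear solve per step in PDE models, which is why the choice of linear solver appears in the same list of challenges.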

Wednesday, 20 September 2017

Computational disease modeling – fact or fiction?

Tegnér et al. (2009): Computational disease modeling - fact or fiction?

In the abstract, we learn about the two main conceptual traditions in computational modeling of biological systems:
There are two conceptual traditions in biological computational-modeling. The bottom-up approach emphasizes complex intracellular molecular models and is well represented within the systems biology community. On the other hand, the physics-inspired top-down modeling strategy identifies and selects features of (presumably) essential relevance to the phenomena of interest and combines available data in models of modest complexity.
[T]he development of predictive hierarchical models spanning several scales beyond intracellular molecular networks was identified as a major objective. This contrasts with the current focus within the systems biology community on complex molecular modeling.
A few more quotes from the paper:
Successful modeling of diseases is greatly facilitated by standards for data-collection and storage, interoperable representation, and computational tools enabling pattern/network analysis and modeling. There are several important initiatives in this direction, such as the ELIXIR program providing sustainable bioinformatics infrastructure for biomedical data in Europe. Similar initiatives are in progress in the USA and Asia.
Across different application areas, a key question concerns the handling of model uncertainty. This refers to the fact that for any biological system there are numerous competing models. Any discursive model of a biological system therefore involves uncertainty and incompleteness. Computational model selection has to cope systematically with the fact that there could be additional relevant interactions and components beyond those that are represented in the discursive model. For instance, there is often insufficient experimental determination of kinetic values for mechanisms contemplated in a verbal model, leading to serious indetermination of parameters in a computational model. Hence, biological models, unlike models describing physical laws, are as a rule highly over-parameterized with respect to the available data. This means that different regions of the parameter space can describe the available data equally well from a statistical point-of-view.
A successful strategy in computational neuroscience has been to identify minimal models that adequately describe and predict the biology, but at the potential price of selecting a too narrowly focused model. This approach is justified if adequate knowledge of the underlying mechanisms involved in a given condition exists.
An alternative approach, recently employed within the systems biology and computational neuroscience fields, is to search for parameter dimensions (as opposed to individual parameter sets) that are important for model performance. This concept of model ensembles represents a promising approach.
[A] mechanistic model is not very helpful unless there are experimental means to assess its predictive validity[.]
It appears that the systems biology community focuses on intracellular networks whereas computational neuroscience emphasizes top-down modeling.
It must also be recognized that top-down models of insufficient richness may excessively constrain model space and lose predictive ability.
There is a lack of theory for how to integrate model selection with constraint propagation across several layers of biological organization. Development of such a theory could be useful in modeling complex diseases even when only sparse data is available. One useful practical first approximation is the notion of disease networks – i.e. network representations of shared attributes among different diseases and their (potential) molecular underpinnings.
[In computational systems biology], much attention is given to formal methods of model selection and data-driven model construction. In contrast, in computational neuroscience (with the notable exception of computational neuroimaging), formal model selection methods are almost completely absent.
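The over-parameterization point quoted above (different regions of parameter space fitting the data equally well), and the idea of looking for important parameter directions rather than single parameter sets, can be illustrated with a deliberately extreme toy model. In it, only the product of two parameters affects the output. The model, the "measurements" and the eigen-analysis of J^T J are all assumptions for illustration. J^T J eigen-analysis is one common way of finding such directions, not necessarily the one the authors have in mind.

```python
"""Over-parameterization in miniature: only a combination of parameters is constrained (sketch)."""
import math


def model(t, k_cat, enzyme):
    """Hypothetical substrate decay S(t) = exp(-k_cat * E * t); only the
    product k_cat * E influences the observable."""
    return math.exp(-k_cat * enzyme * t)


def sse(params, times, data):
    """Sum of squared errors between model and data."""
    return sum((model(t, *params) - y) ** 2 for t, y in zip(times, data))


def fisher_like_matrix(params, times):
    """J^T J for sensitivities with respect to (log k_cat, log E)."""
    k_cat, enzyme = params
    # dS/dlog k_cat = dS/dlog E = -k_cat * E * t * S(t): the two columns coincide.
    col = [-k_cat * enzyme * t * model(t, k_cat, enzyme) for t in times]
    s = sum(c * c for c in col)
    return [[s, s], [s, s]]


def sym_eig_2x2(m):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - d)       # angle of the leading eigenvector
    return (mean + r, mean - r), ((math.cos(theta), math.sin(theta)),
                                  (-math.sin(theta), math.cos(theta)))


if __name__ == "__main__":
    times = [0.0, 0.5, 1.0, 2.0, 4.0]
    data = [1.00, 0.62, 0.36, 0.14, 0.02]        # hypothetical measurements
    for params in [(2.0, 0.5), (0.5, 2.0), (10.0, 0.1)]:
        print(params, "SSE =", round(sse(params, times, data), 6))
    vals, vecs = sym_eig_2x2(fisher_like_matrix((2.0, 0.5), times))
    print("eigenvalues:", vals)
    print("unconstrained direction in (log k_cat, log E):", vecs[1])
```

All three parameter sets give the same error. The zero eigenvalue marks the direction (raise one parameter, lower the other) along which the data say nothing. Real models are rarely this degenerate, but the same analysis then separates well-constrained from poorly constrained parameter combinations.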