Studying molecular evolution presents a lot of opportunities to venture into interesting side quests, such as the content of this post. Here, I showcase ways to visualise complex networks with some degree of interactivity.
These little renderings were conceived for lectures, talks and videos, and they rarely make their way into an academic publication despite their didactic value. The reason is very simple: publications are a two-dimensional, almost static form of media. However, technologies such as Jupyter, WebGL and other rendering backend libraries have been available for a long time now, and their APIs have become very easy to use for everybody.
Viewer I.#
The recommended way to play with the viewer is in a single window, which can be accessed here; a legacy version with draggable nodes is available here.
What is this network? : The genetic code.#
In its simplest form at least, the theory is also the perfect segue into understanding the origin of important features of living systems: the robustness of phenotypes to perturbations of the genotype, and evolvability, the origin of the great variability observed in phenotypes.
These evolutionary features are also somewhat universal. They are observed in systems of very different scale and nature, such as molecules, gene-regulation circuits and metabolic networks.
Both robustness and evolvability are linked to the topology of the neutral genotype networks, in which every possible genotype is a node and two nodes are linked if they are a single mutation apart.
Take for instance the following table containing the genetic code:
Genetic code | | |
---|---|---|---
Phe UUU | Ser UCU | Tyr UAU | Cys UGU
Phe UUC | Ser UCC | Tyr UAC | Cys UGC
Leu UUA | Ser UCA | Stp UAA | Stp UGA
Leu UUG | Ser UCG | Stp UAG | Trp UGG
Leu CUU | Pro CCU | His CAU | Arg CGU
Leu CUC | Pro CCC | His CAC | Arg CGC
Leu CUA | Pro CCA | Gln CAA | Arg CGA
Leu CUG | Pro CCG | Gln CAG | Arg CGG
Ile AUU | Thr ACU | Asn AAU | Ser AGU
Ile AUC | Thr ACC | Asn AAC | Ser AGC
Ile AUA | Thr ACA | Lys AAA | Arg AGA
Met AUG | Thr ACG | Lys AAG | Arg AGG
Val GUU | Ala GCU | Asp GAU | Gly GGU
Val GUC | Ala GCC | Asp GAC | Gly GGC
Val GUA | Ala GCA | Glu GAA | Gly GGA
Val GUG | Ala GCG | Glu GAG | Gly GGG
Each cell contains, on the left, the amino acid (phenotype) associated with the codon sequence (genotype) on the right. In general, for an alphabet of four characters such as \( \{U,G,A,C\} \) there are \( 4^L \) unique sequences of length \( L \), each with \( 3L \) neighbour sequences that differ from it by one character. For codons \( L = 3 \), which gives \( 4^3 = 64 \) sequences with 9 neighbours each. Using this and grouping the codons by amino acid we get the following plot.
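The counting above is easy to check in a few lines of Python. This is only a minimal sketch and not the code behind the viewers: the names `GENETIC_CODE`, `neighbours`, `edges` and `groups` are made up for the example, and the amino acid list is simply the table above read column by column.

```python
from itertools import product

BASES = "UCAG"

# Amino acids in the order first base, second base, third base,
# each running over U, C, A, G ("Stp" marks a stop codon, as in the table).
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stp Stp Cys Cys Stp Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()

# All 4^3 codons (genotypes), mapped to their amino acid (phenotype).
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def neighbours(seq, alphabet=BASES):
    """Return every sequence that differs from `seq` in exactly one position."""
    return [
        seq[:i] + base + seq[i + 1:]
        for i in range(len(seq))
        for base in alphabet
        if base != seq[i]
    ]


# Undirected links of the network: one per pair of codons a mutation apart.
edges = {frozenset((codon, other)) for codon in CODONS for other in neighbours(codon)}

# Group the codons by amino acid, which is how the nodes are arranged in the plot.
groups = {}
for codon, amino_acid in GENETIC_CODE.items():
    groups.setdefault(amino_acid, []).append(codon)

print(len(CODONS), "codons,", len(edges), "links,", len(groups), "groups")
```

The `neighbours` function works for any sequence length, so the same sketch can be used to check the \( 3L \) rule on longer genotypes.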
Now, the most important bit: evolvability, and how it is reflected in these diagrams. By choosing an amino acid and plotting only the links from its nodes to the rest of the system, we can see how easy it is to navigate the network from one amino acid to all the others with only a few mutations, especially for the phenotypes with the largest number of codons.
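The same idea can be put into numbers. The sketch below is an illustration under my own assumptions rather than the code used for the plots: `reachable` collects the phenotypes that are one mutation away from any codon of a given amino acid (a crude proxy for evolvability), and `robustness` is the fraction of single mutations that leave the amino acid unchanged. Both function names, and these particular definitions, are chosen just for this example.

```python
from itertools import product

BASES = "UCAG"

# Same table as above: first, second and third base each run over U, C, A, G.
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stp Stp Cys Cys Stp Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()

GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def neighbours(codon):
    """Return the codons that differ from `codon` in exactly one base."""
    return [
        codon[:i] + base + codon[i + 1:]
        for i in range(len(codon))
        for base in BASES
        if base != codon[i]
    ]


def codons_of(amino_acid):
    """Return all codons (genotypes) that encode `amino_acid`."""
    return [c for c, aa in GENETIC_CODE.items() if aa == amino_acid]


def reachable(amino_acid):
    """Other phenotypes one mutation away from any codon of `amino_acid`."""
    found = {GENETIC_CODE[n] for c in codons_of(amino_acid) for n in neighbours(c)}
    found.discard(amino_acid)
    return found


def robustness(amino_acid):
    """Fraction of single mutations that keep `amino_acid` unchanged."""
    codons = codons_of(amino_acid)
    mutants = [n for c in codons for n in neighbours(c)]
    return sum(GENETIC_CODE[m] == amino_acid for m in mutants) / len(mutants)


# Compare a phenotype with many codons against one with a single codon.
for amino_acid in ("Leu", "Met"):
    print(amino_acid, len(codons_of(amino_acid)),
          sorted(reachable(amino_acid)), round(robustness(amino_acid), 2))
```

Comparing Leu (six codons) with Met (one codon) shows the effect described above: the larger group touches more of the other phenotypes, while the single codon of Met has no synonymous neighbour at all.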
As we can see from these awesome plots (to my taste), even this simple system contains enough complexity to make the graphical representation of the main ideas quite convoluted. For comparison and for fun, all the information presented so far is also shown in the viewer at the beginning of this entry and, in the same format as the plots above, in the following renderer.
Viewer II.#
Final comments.#
Finally, these are the links to the repository containing the code, as well as to the viewers.