Reference details
Author(s) | Year | Title | Reference | View/Download
Greg Warr, Les Hatton | 2022f | Reproducibility script for JSysBiology submission on protein multiplicity | | WarrHatton_JSysBioAug2022.zip
Synopsis and invited feedback
This work was or is being reviewed by domain-specific experts appointed independently.
If you would like to provide feedback, just e-mail me here.
Synopsis | Invited Feedback | Importance (/10, author rated :-) )
If we consider proteins as a single evolving system, then knowledge of the emergent, global properties of this system is essential to understanding their evolution. Here we focus on the global property of multiplicity, defined as the number of species (or their equivalent) in which a given protein occurs identically. Conservation of Hartley-Shannon Information (CoHSI) is a probabilistic theory of discrete systems based in information theory and statistical mechanics (and thus mechanism-independent) that makes the following prediction: proteins of identical length and identical amino acid sequence that are shared between species will show a Zipfian power-law distribution of multiplicity. This prediction was tested by interrogation of the full UniProtKB/TrEMBL protein sequence database (219,174,961 entries in release 2021_03), which was found to contain over 13 million such sequences (over 6% of the database) whose multiplicities ranged from 2 to approximately 10,600 species or equivalent, distributed across the 3 domains of life as well as the viruses. The multiplicities of these proteins show a distribution of remarkable mathematical precision; when the number of proteins with particular multiplicities was plotted in rank order, an extremely precise and statistically highly robust Zipfian power law was seen, satisfying criteria of both necessity and sufficiency. The power law spans a range of over 5,000-fold in multiplicity and over one million-fold in the number of different sequences of a given multiplicity. The viral sequences contribute to the precision of the power-law distribution even though they represent fewer than 3% of the identified proteins. Considered separately, the protein multiplicities of each of the 3 domains of life and the viruses also show statistically robust power laws.
The high precision and vast provenance of these results essentially rule out explanations based in coincidence or particular mechanisms, and we propose that purely probabilistic explanations that are independent of mechanism can be considered for the emergence of global properties in evolving systems. | None yet | 9
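As a rough illustration of the quantity described in the synopsis, the sketch below computes multiplicity (the number of distinct species in which an identical sequence occurs), tallies how many sequences have each multiplicity, and estimates a log-log slope from the rank-ordered counts. It is not the authors' reproducibility script: the function names, the toy `records` data, and the use of a plain least-squares fit are all assumptions made for illustration, and a least-squares slope on its own is not a rigorous test of a power law.

```python
import math
from collections import Counter, defaultdict


def multiplicities(records):
    """Map each sequence to the number of distinct species it occurs in.

    `records` is an iterable of (species, sequence) pairs (hypothetical format).
    """
    species_by_seq = defaultdict(set)
    for species, sequence in records:
        species_by_seq[sequence].add(species)
    return {seq: len(sp) for seq, sp in species_by_seq.items()}


def multiplicity_histogram(mult, minimum=2):
    """Count how many distinct sequences have each multiplicity >= minimum."""
    return Counter(m for m in mult.values() if m >= minimum)


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    Counts are ranked in descending order; all must be positive and
    there must be at least two of them.
    """
    ranked = sorted(counts, reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data (invented): the duplicate ("human", "MKT") entry is counted once.
records = [
    ("human", "MKT"), ("mouse", "MKT"), ("rat", "MKT"),
    ("human", "AAA"), ("mouse", "AAA"),
    ("yeast", "GGG"),
    ("human", "MKT"),
]
mult = multiplicities(records)
hist = multiplicity_histogram(mult)

# Synthetic counts that follow an exact 1/rank law, to exercise the fit.
synthetic_counts = [1000 / r for r in range(1, 6)]
slope = loglog_slope(synthetic_counts)
```

On real data the input would be species and sequence pairs parsed from a UniProtKB/TrEMBL release, and the fitted slope would be only a first diagnostic before any formal goodness-of-fit analysis.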
Auto-generated: $Revision: 1.64 $, $Date: 2022/05/20 08:41:34 $, Copyright Les Hatton 2001-