Tuesday 26 July 2022

That Nei distance chart

In 2015, Rewilding Europe published a PDF with a Nei distance chart showing the purported distance of 34 European cattle breeds to the one resolved aurochs genome. Many thought that we finally knew which cattle breeds are genetically closest to the aurochs. But interpreting the chart that way would be a huge oversimplification and, in my opinion, inaccurate. In fact, I think we do not know anything more now than we knew before the Nei distance chart. With this post, I want to give my reasons for thinking so. Mind that I am not a geneticist, so this is completely my own interpretation. Critique is much appreciated.

The Nei distance chart published by Rewilding Europe

1. The Nei distance analysis looked at only a very small fraction of the genome 

 

The cattle genome has about 3 billion base pairs; the Nei distance analysis looked at 700,000 of them. More precisely, the analysis studied single nucleotide polymorphisms (SNPs). 700,000 SNPs is certainly a lot, but it is only a very small fraction of the total genome (roughly 0.02%), so it does not tell us much about the genetic closeness of the cattle breeds to the aurochs. With another 700,000 SNPs the results could be completely different.
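To put that proportion into numbers, here is a small Python calculation. It uses only the two figures given above (about 3 billion base pairs and 700,000 SNPs); the function and variable names are my own and purely illustrative.

```python
def sampled_fraction(n_sites: int, genome_size: int) -> float:
    """Return the share of the genome covered by n_sites single positions."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_sites / genome_size


GENOME_SIZE_BP = 3_000_000_000  # approximate size of the cattle genome (from the text)
N_SNPS = 700_000                # SNPs used in the Nei distance analysis (from the text)

fraction = sampled_fraction(N_SNPS, GENOME_SIZE_BP)
print(f"Share of the genome covered: {fraction * 100:.3f} %")
```

This only counts positions. It says nothing about how the SNPs were chosen or how they are distributed across the genome.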

 

 

3. Only one aurochs genome was used for the analysis 

 

We have only one resolved complete aurochs genome for now, which is a problem when trying to analyse the genetic closeness of cattle breeds to their wildtype. It is unlikely (or actually impossible) that one individual carried the full genetic diversity found in the wildtype populations, so there must have been wildtype alleles that were present in other aurochs but not in this one particular individual. The problem is that such wildtype alleles would be counted as domestic alleles if they are found in a modern cattle individual but not in the one aurochs individual that had its genome resolved. It could be the case that Caldela, a breed scoring very low on the chart, actually has a lot of wildtype alleles that just happen to come from other aurochs individuals than the one whose genome we have, and so it ends up scoring low in the chart. That reduces the relevance of the Nei distance analysis considerably.
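How easily a single individual can miss a wildtype allele can be illustrated with a small calculation. This is only a sketch under a simplifying assumption of mine (random mating, so that a diploid individual draws its two copies independently from the population); the allele frequencies below are hypothetical and are not taken from any aurochs data.

```python
def prob_missing_in_one_individual(allele_freq: float) -> float:
    """Probability that one diploid individual carries no copy of an allele
    that has frequency allele_freq in the population (assumes random mating)."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    # Both copies must be a different allele: (1 - p) * (1 - p)
    return (1.0 - allele_freq) ** 2


# Hypothetical frequencies of a wildtype allele in the aurochs population
for freq in (0.05, 0.3, 0.5):
    missing = prob_missing_in_one_individual(freq)
    print(f"frequency {freq:.2f}: missing from one individual with probability {missing:.4f}")
```

Under this assumption, even an allele carried by half of all chromosomes in the population would be absent from a single individual in one out of four cases.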

 

4. Nei distance might not be the ideal tool for analysing the situation of aurochs and cattle 

 

The Nei distance was developed for populations that diverged in isolation by mutation and genetic drift. But this is not what happened in the case of aurochs and cattle. In the domestication of cattle, we have first a drastic genetic bottleneck (since modern domestic cattle go back to only about 80 female founders), then massive directional selection (for tameness and economic value), during which many wildtype alleles might have been lost and mutated (= domestic) alleles became fixed, and then local introgression from different types of aurochs in different regions of the world into the cattle genome (in Africa and Europe at least). Not to forget the not uncommon intermixing between taurine and indicine cattle, which descend from two rather divergent aurochs subspecies.
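For readers who want to see what such a distance actually computes, here is a minimal Python sketch of Nei's standard genetic distance (Nei 1972), D = -ln(J_XY / sqrt(J_X * J_Y)), for SNPs with two alleles. I do not know which variant of the Nei distance the Rewilding Europe PDF used, so this is an assumption on my part. The function name and the allele frequencies are made up for illustration and are not data from the chart.

```python
import math
from typing import Sequence


def nei_standard_distance(freqs_x: Sequence[float], freqs_y: Sequence[float]) -> float:
    """Nei's (1972) standard genetic distance for biallelic loci.

    freqs_x[i] and freqs_y[i] are the frequencies of the same reference
    allele at SNP i in population X and population Y.
    """
    if len(freqs_x) != len(freqs_y) or len(freqs_x) == 0:
        raise ValueError("need two equally long, non-empty frequency lists")
    n = len(freqs_x)
    # Average probability that two alleles drawn within X (or within Y) are identical
    j_x = sum(p * p + (1 - p) ** 2 for p in freqs_x) / n
    j_y = sum(q * q + (1 - q) ** 2 for q in freqs_y) / n
    # Average probability that one allele from X and one from Y are identical
    j_xy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(freqs_x, freqs_y)) / n
    identity = j_xy / math.sqrt(j_x * j_y)
    if identity <= 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(identity)


# Hypothetical example: a single aurochs can only show 0, 0.5 or 1 per SNP
aurochs = [1.0, 0.5, 0.0, 1.0]
breed_a = [0.9, 0.4, 0.2, 0.7]
breed_b = [0.5, 0.5, 0.5, 0.5]
print(nei_standard_distance(aurochs, breed_a))
print(nei_standard_distance(aurochs, breed_b))
```

Note that the measure only compares allele frequencies. It has no notion of bottlenecks, selection or introgression, which is exactly why I doubt it fits the history of aurochs and cattle.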

 

That is why I have covered the Nei distance chart in only one post on my blog so far. I think it does not tell us much about the actual genetic distance between the aurochs and domestic cattle breeds. That would explain why the scores in the chart seem rather coincidental and why there is no correlation between a less-derived phenotype and the purported genetic closeness to the one aurochs individual analysed in the chart. For example, Fleckvieh scores higher than the Spanish fighting bull. Of course, it is possible that a breed with a rather derived morphology shares more alleles with the aurochs than one with a less-derived morphology, since many of the differences between aurochs and cattle might not be visible, for example in immunology, development, endocrinology, neurology, metabolism or other physiological aspects. But I consider that highly unlikely in the case of Lidia and Fleckvieh, because Fleckvieh has experienced far more intense selective breeding than the Spanish fighting bull. Their scores in the Nei distance chart are not evidence for it either, as outlined above.

Furthermore, which aurochs are “the aurochs”? Even if we only care about the primigenius subspecies, that question is complicated to answer. British cattle landraces have been found to share nuclear alleles with the British aurochs, likely due to introgression, which we cannot expect for Iberian, Italian or Near Eastern breeds, and vice versa.

Also, the total genetic closeness to the aurochs does not tell us which alleles are present in which breed, which is crucial information if one wants to unite aurochs alleles in one population by selective breeding. A breed scoring low in overall genetic closeness to the aurochs might have alleles that none of the other breeds have, and this is exactly the case in this Nei chart: Nelore, as an indicine breed, will share alleles with the British aurochs which taurine cattle have lost, because this is what was found by Orlando et al. 2015 (by the way, if the Tauros Programme is indeed aiming to breed for “aurochs alleles”, why aren’t they breeding with zebus? There is no other way to get these alleles into the population. The answer is: because they are not breeding on a genetic level, contrary to what they claim in press releases…).

Thus, I think that this Nei distance chart does not tell us anything of value for “breeding-back”. Why did the Tauros Programme conduct this analysis, then? I think that’s because they needed to publish something “genetic”, in order to back up their claims that they are breeding for aurochs-like genetics. The project was very content with the results of the chart. They claimed that the breeds used in their project were particularly high-ranking. As you can see in the chart (the Tauros Programme breeds are those written in bold), this is not necessarily the case. Their breeds seem to be rather evenly distributed along the chart. However, this is not relevant for “breeding-back” anyway, at least in my opinion, for the reasons outlined in this post. 

 

Literature

 

Orlando, L.: The first aurochs genome reveals the breeding history of British and European cattle. 2015. 

 

10 comments:

  1. Reginald Winkler
    Thank you very much for the insightful contribution, the criticism of which I basically share. However, corresponding reservations, references to the provisional nature of the conclusions and the need for further research are also clearly stated in the cited study by Ronald Goderie et al. Many of your critical comments are confirmed there. It would be interesting to know whether and with what results the further investigations called for in the study have been carried out in the meantime. And it would be exciting if Chillingham, Betizu and Camargue cattle were also included. And of course also Lidia cattle, especially Casta-Navarra.

    1. I wouldn't really call that PDF a study though, since it was not published in a peer-reviewed scientific journal.

  2. Even if there were genetic analyses of many aurochs genomes and with many SNPs, a breeding effort to select for as many aurochs wildtype genes as possible in a herd would do less than half the job.

    There would probably still be many active genes from domestic cattle in such a herd.

    If they were kept in a domestic environment, epigenetic activation would probably switch on many of the genes from domestic cattle.

    Human selection for phenotype + selection for certain aurochs genes + natural selection in a wild environment + epigenetic activation in a wild environment and lifestyle = that would probably bring back as much of the original aurochs as possible.

    1. I don't know if epigenetics play such a big role in the case of aurochs and cattle. The way I see it, the differences between both forms are more likely to be caused by mutations of genes, and changes in the number of copies of genes. And many wildtype alleles are probably lost from the modern cattle population altogether.

  3. Hi Daniel, as a long-time follower of your blog, I very much appreciate all the great stories, explanations and summaries you have written here. However, as you are asking for critical comments, here you go:

    About 1: SNP arrays usually cover SNPs more or less equally distributed throughout the entire genome (minus some more problematic ones like highly repetitive regions). 700k is quite a substantial array at that; a typical human SNP array is 450k and is used very frequently nowadays in all those ancestry services on the internet. While there are many other problems with those services, the results are definitely not random and you will be able to determine the 'ethnicity' of pretty much every human with such an array. I suspect that it will be more or less the same for cattle. (I do not know though how they determined those 700k SNPs. In humans, 450k arrays are derived from frequent variants that are known to be heterogeneous in the entire population, which I assume they will have mirrored for this study.)

    About 2, and this is why I am actually writing this comment: No, no and no, it is not only coding regions that influence the phenotype. While this is still an area of active research (and your understanding may have been considered somewhat mainstream just 30+ years ago), there is so much evidence now that non-coding, especially regulatory, regions of any genome play a big role in the manifestation of phenotypic traits. Studies of kids with a rare genetic disease (i.e. due to a mutation in the child) can resolve at most 50% of cases when looking at the exome (that's mostly the coding regions). The reason why it is 50% is that those cases are due to high-impact mutations where one SNP on its own causes a (fatal) disease (and these are certainly enriched in coding regions). However, most mutations have far less dramatic consequences. When you are looking for a mutation that causes 10% smaller horns, you're much better off looking at regulatory (non-coding) regions that influence how much mRNA of a protein involved in horn growth is made. In fact, current theories propose that most (protein-coding) genes are pretty much functionally equivalent between all mammalian species and it is 'only' the regulation of when and where to express which gene that makes the mouse and the human look different. (I have been thinking of what might be a good resource if one wanted to understand more about this. As a start, one could look up the 'ENCODE project' or 'genome-wide association study' on Wikipedia?)

    Nevertheless, I am also a bit puzzled about what the Rewilding Europe study was expecting to find. We are more or less certain that all our cattle are derived from aurochs. Compared to most breeds (maybe ignoring the Indian breed that may be derived from zebu), the aurochs will be the outgroup, meaning the cattle breeds are more closely related to each other than to the aurochs itself. At that point, genetic distance becomes a statistical measure and I find it hard to say that one is really more closely related than the other (is it the human or the gorilla that is more closely related to the mouse?).

    Anyways, keep it up. Thanks for many, many interesting blog posts!

    1. Hi, many thanks for your comment & explanations!
      concerning 1) I see that 700k SNPs would be efficient at determining the breed identity of an individual, but they were aiming to find out which breed has the largest genetic match to the aurochs (at least based on what they have written), i.e. which breeds share the highest number of identical sequences in the genome. Aren't 700k nucleotides far too small a sample to determine that, considering that the bovine genome is 3 billion bp large?
      concerning 2) What I wrote was definitely wrong, thanks for clearing that up. I knew that "junk" DNA is considered to have a crucial impact on the organism, I should have considered that when I wrote that. But apart from that, isn't there still the danger that some of the SNPs are from regions that are neither regulatory nor coding, or do such regions not exist?
      Regarding your last paragraph, I think Rewilding Europe did not want to find out the phylogenetic relationships of those cattle breeds to the aurochs, which of course will always have the aurochs as an outgroup as you say, but that they wanted to find out which breed has the largest genetic match to the aurochs, i.e. the most wildtype alleles and the fewest domestic mutations. I don't know if the Nei distance is suitable for determining this.

    2. Hi Daniel,

      regarding 1) Well, I would assume that 700k SNPs is quite good. This is a statistical sample that obviously has some kind of error, but it will definitely not be random. The total size of the genome does not matter for that, only the size of the sample (when I want to predict election results in Switzerland and the United States, sampling 1k participants in the former won't give a better result than in the latter just because the country is smaller). I am not sure how divergent cattle breeds are compared to humans, but since both have been through population bottlenecks, I would guess that the predictions are comparable.

      regarding 2) As I understand it, this is still an open question. In general, there is a fitness cost to every organism for having more DNA than necessary, so evolution should slowly minimize genome size by removing 'junk' bits. However, there has been a long history of 'selfish genes' that proliferate only for their own good. And then there are ultraconserved elements that have the identical sequence in human and mouse, but when you remove them, nothing happens. So yes, some (or rather: many) of the SNPs will be from regions that will not have an immediate phenotypic effect. But again, that's why you have 700k, to have a sound statistical sample. Similarly, among the physical traits there will be some that are less relevant for a breeding-back program per se (idk. blood group, eye color, meat texture?). It probably depends on whether you strive to get a genetic equivalent of the original aurochs (and not just the ecological equivalent).

    3. What made me question the significance of the Nei distance chart right from the beginning was that there is absolutely no correlation between a less-derived overall phenotype and scoring high in the chart. For example, the very derived Fleckvieh scores higher than Spanish fighting cattle, which are arguably the least derived taurine breed. Not only does Fleckvieh have a much more domestic morphology, I also doubt that this breed is any closer to the aurochs in physiology, endocrinology, neurology, development and other aspects that have been affected by domestication, because Fleckvieh experienced much more selective breeding than fighting cattle, has a much more domestic behaviour and is husbanded much more intensely, which certainly must have an influence on the fitness of the animals (due to relaxed selection, mutation accumulation etc.). How can it be that Fleckvieh, which is much more strongly domesticated than fighting cattle, scores higher than fighting cattle in the chart, if the chart really does reflect the extent of genetic match to the aurochs? And secondly, wouldn't the results of another 700k SNPs be potentially different from those in this Nei chart?

    4. Well, the Nei distance between aurochs and Fleckvieh is 0.1365 and between aurochs and Lidia is 0.1386. I would not necessarily deduce from this that aurochs and Fleckvieh are really more closely related, if the difference is significant at all. Another 700k SNPs will probably give a different order, simply because the differences between most modern European breeds are relatively small and insignificant compared to their distance from the aurochs, but not a 'completely different' one.

      Also, why do you assume there is more selective pressure on Fleckvieh than on Lidia, as the latter are also selected heavily for their fighting strength? There is always some kind of selection going on. What probably matters more is that effective population sizes should be much larger for Fleckvieh and that many different European breeds have been crossed into it.
      And to add another point, genetic analyses also place American and European bison at different positions in the bovine lineage tree. Another example: humans living in two neighboring villages in Africa can be more genetically distinct than any pair of non-African humans. All this tells us is that similarity in phenotype does not necessarily imply similarity in genotype.

    5. Fleckvieh is domesticated much more intensely than Lidia. It is true that Lidia are heavily selected for "fighting spirit", but a recent study suggests that this may be caused by one gene alone (the MAO-A gene). Also, Lidia live free all year round, and the breeders let the bulls fight for dominance; the winner is the one chosen for covering the cows, which mimics natural selection to some degree.
      A recent study identified two genes which were of importance in the domestication of the horse, and another study identified more than 200 genes involved in the domestication of the yak. I think that if something similar were done for the aurochs, and the identified genes were compared between different cattle breeds, it would show that some breeds have more genes with wildtype alleles at the respective loci than others. I think that would tell us more about the genetic similarity to the aurochs than the Nei distance of SNPs, but I could be wrong.
