Information Processing

Pessimism of the Intellect, Optimism of the Will

Sunday, August 31, 2014

Metabolic costs of human brain development


This paper quantifies the unusually high energetic cost of brain development in humans. Brain energy requirements and body-weight growth rate are anti-correlated in childhood. Given these results, it would be surprising if nutritional limitations that prevented individuals from achieving their genetic potential in height didn't also lead to sub-optimal cognitive development. Nutritional deprivation likely stunts both mind and body.

See also Brainpower ain't free.
Metabolic costs and evolutionary implications of human brain development
(PNAS doi:10.1073/pnas.1323099111)

Significance
The metabolic costs of brain development are thought to explain the evolution of humans’ exceptionally slow and protracted childhood growth; however, the costs of the human brain during development are unknown. We used existing PET and MRI data to calculate brain glucose use from birth to adulthood. We find that the brain’s metabolic requirements peak in childhood, when it uses glucose at a rate equivalent to 66% of the body’s resting metabolism and 43% of the body’s daily energy requirement, and that brain glucose demand relates inversely to body growth from infancy to puberty. Our findings support the hypothesis that the unusually high costs of human brain development require a compensatory slowing of childhood body growth.

Abstract
The high energetic costs of human brain development have been hypothesized to explain distinctive human traits, including exceptionally slow and protracted preadult growth. Although widely assumed to constrain life-history evolution, the metabolic requirements of the growing human brain are unknown. We combined previously collected PET and MRI data to calculate the human brain’s glucose use from birth to adulthood, which we compare with body growth rate. We evaluate the strength of brain–body metabolic trade-offs using the ratios of brain glucose uptake to the body’s resting metabolic rate (RMR) and daily energy requirements (DER) expressed in glucose-gram equivalents (glucosermr% and glucoseder%). We find that glucosermr% and glucoseder% do not peak at birth (52.5% and 59.8% of RMR, or 35.4% and 38.7% of DER, for males and females, respectively), when relative brain size is largest, but rather in childhood (66.3% and 65.0% of RMR and 43.3% and 43.8% of DER). Body-weight growth (dw/dt) and both glucosermr% and glucoseder% are strongly, inversely related: soon after birth, increases in brain glucose demand are accompanied by proportionate decreases in dw/dt. Ages of peak brain glucose demand and lowest dw/dt co-occur and subsequent developmental declines in brain metabolism are matched by proportionate increases in dw/dt until puberty. The finding that human brain glucose demands peak during childhood, and evidence that brain metabolism and body growth rate covary inversely across development, support the hypothesis that the high costs of human brain development require compensatory slowing of body growth rate.
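To make the abstract's "glucose-gram equivalents" concrete, here is a back-of-envelope conversion using the childhood peak of roughly 66% of RMR. The child's RMR value and the ~4 kcal/g energy density of glucose are illustrative assumptions, not numbers taken from the paper; this is a rough sketch of the conversion, nothing more.

    # Back-of-envelope: convert the childhood peak (brain glucose use ~66% of
    # resting metabolic rate) into grams of glucose per day.
    child_rmr_kcal_per_day = 1000.0   # assumed RMR for a young child (illustrative)
    brain_fraction_of_rmr = 0.66      # peak glucose_rmr% from the abstract
    kcal_per_gram_glucose = 4.0       # approximate energy density of glucose

    brain_kcal = brain_fraction_of_rmr * child_rmr_kcal_per_day
    brain_glucose_grams = brain_kcal / kcal_per_gram_glucose
    print(f"brain: ~{brain_kcal:.0f} kcal/day, ~{brain_glucose_grams:.0f} g glucose/day")

On these assumptions the childhood brain burns on the order of 150-200 grams of glucose per day, which is why the trade-off with body growth is so stark.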

Thursday, August 28, 2014

Determination of Nonlinear Genetic Architecture using Compressed Sensing

It is a common belief in genomics that nonlinear interactions (epistasis) in complex traits make the task of reconstructing genetic models extremely difficult, if not impossible. In fact, it is often suggested that overcoming nonlinearity will require much larger data sets and significantly more computing power. Our results show that in broad classes of plausibly realistic models, this is not the case.
Determination of Nonlinear Genetic Architecture using Compressed Sensing (arXiv:1408.6583)
Chiu Man Ho, Stephen D.H. Hsu
Subjects: Genomics (q-bio.GN); Applications (stat.AP)

We introduce a statistical method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. The computational and data resource requirements are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. Our method uses a generalization of compressed sensing (L1-penalized regression) applied to nonlinear functions of the sensing matrix. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using both real and simulated human genomes.
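For readers who want a concrete picture of what "L1-penalized regression applied to nonlinear functions of the sensing matrix" means, here is a toy sketch: expand the genotype matrix with pairwise interaction terms and run the Lasso on the expanded matrix. This is an illustration of the general idea, not the authors' code; the sample size, number of SNPs, and effect sizes are made up for the example.

    # Toy sketch: recover a sparse nonlinear (pairwise-epistatic) genetic model by
    # L1-penalized regression on an expanded sensing matrix.
    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 500, 50                                   # individuals, SNPs (0/1/2 minor-allele counts)
    G = rng.binomial(2, 0.3, size=(n, p)).astype(float)

    # Sparse ground truth: a few additive effects plus a few gene-gene interactions.
    additive = [2, 7, 19]
    interactions = [(3, 11), (19, 30)]
    y = G[:, additive].sum(axis=1)
    for i, j in interactions:
        y += 0.8 * G[:, i] * G[:, j]
    y += rng.normal(scale=0.5, size=n)

    # Expand the sensing matrix with all pairwise products, then fit the Lasso.
    pairs = list(combinations(range(p), 2))
    X = np.hstack([G] + [G[:, [i]] * G[:, [j]] for i, j in pairs])
    fit = Lasso(alpha=0.05, max_iter=50_000).fit(X, y)

    labels = [f"SNP{i}" for i in range(p)] + [f"SNP{i}xSNP{j}" for i, j in pairs]
    selected = [labels[k] for k in np.flatnonzero(np.abs(fit.coef_) > 0.1)]
    print("selected terms:", selected)

With a sparse true model, both the additive terms and the interaction terms are typically recovered from a sample not much larger than what a purely linear model would need, which is the point of the paper.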

Cosmopolitans -- Whit Stillman returns on Amazon

The pilot isn't bad -- American expats in Paris :-) The cinematography is beautiful, but then it's hard to go wrong in Paris.

More Whit Stillman.

Rabbit genome: domestication via soft sweeps



Domestication -- genetic change in response to a drastic change in environment -- happened via allele frequency changes at many loci. I expect a similar pattern in humans due to, e.g., agriculture.

I don't know why some researchers find this result surprising -- it seemed quite likely to me that "adaptation to domestication" is a complex trait controlled by many loci. Hence a shift in the phenotype is likely to be accomplished through changes in the frequencies of many alleles.
Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication (Science DOI: 10.1126/science.1253714)

The genetic changes underlying the initial steps of animal domestication are still poorly understood. We generated a high-quality reference genome for the rabbit and compared it to resequencing data from populations of wild and domestic rabbits. We identified more than 100 selective sweeps specific to domestic rabbits but only a relatively small number of fixed (or nearly fixed) single-nucleotide polymorphisms (SNPs) for derived alleles. SNPs with marked allele frequency differences between wild and domestic rabbits were enriched for conserved noncoding sites. Enrichment analyses suggest that genes affecting brain and neuronal development have often been targeted during domestication. We propose that because of a truly complex genetic background, tame behavior in rabbits and other domestic animals evolved by shifts in allele frequencies at many loci, rather than by critical changes at only a few domestication loci.
From the paper:
... directional selection events associated with rabbit domestication are consistent with polygenic and soft sweep modes of selection (18) that primarily acted on standing genetic variation in regulatory regions of the genome. This stands in contrast with breed-specific traits in many domesticated animals that often show a simple genetic basis with complete fixation of causative alleles (19). Our finding that many genes affecting brain and neuronal development have been targeted during rabbit domestication is fully consistent with the view that the most critical phenotypic changes during the initial steps of animal domestication probably involved behavioral traits that allowed animals to tolerate humans and the environment humans offered. On the basis of these observations, we propose that the reason for the paucity of specific fixed domestication genes in animals is that no single genetic change is either necessary or sufficient for domestication. Because of the complex genetic background for tame behavior, we propose that domestic animals evolved by means of many mutations of small effect, rather than by critical changes at only a few domestication loci.
I'll repeat again that simply changing a few hundred allele frequencies in humans could make us much much smarter ...
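To see why shifting many allele frequencies a little can move a polygenic trait a lot, here is a small numerical illustration. The number of loci, the effect sizes, and the 3% frequency shift are assumptions chosen for the example, not estimates from the rabbit paper or from any human data; the point is only the scaling: the change in the trait mean is the sum of 2 x (frequency shift) x (effect size) over loci.

    # Illustration: small frequency shifts at many loci add up to a large shift in
    # the mean of a polygenic trait. All numbers here are made-up assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    n_loci = 500
    p = rng.uniform(0.1, 0.9, n_loci)             # starting allele frequencies
    beta = rng.normal(scale=0.05, size=n_loci)    # per-allele effects, in trait SD units

    var_g = np.sum(2 * p * (1 - p) * beta**2)     # additive variance from these loci
    dp = 0.03 * np.sign(beta)                     # shift each frequency 3% toward the favored allele
    delta_mean = np.sum(2 * dp * beta)            # resulting change in the trait mean

    print(f"additive variance captured: {var_g:.2f} (trait variance units)")
    print(f"mean shift from 3% frequency changes at {n_loci} loci: {delta_mean:.2f} SD")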

Wednesday, August 27, 2014

Neural Networks and Deep Learning 2

Inspired by the topics discussed in this earlier post, I've been reading Michael Nielsen's online book on neural nets and deep learning. I particularly liked the subsection quoted below. For people who think deep learning is anything close to a solved problem, or who anticipate a near-term, quick take-off to the Singularity, I suggest they read the passage below and grok it deeply.
Neural Networks and Deep Learning (Chapter 3):

You have to realize that our theoretical tools are very weak. Sometimes, we have good mathematical intuitions for why a particular technique should work. Sometimes our intuition ends up being wrong [...] The questions become: how well does my method work on this particular problem, and how large is the set of problems on which it works well. -- Question and answer with neural networks researcher Yann LeCun

Once, attending a conference on the foundations of quantum mechanics, I noticed what seemed to me a most curious verbal habit: when talks finished, questions from the audience often began with "I'm very sympathetic to your point of view, but [...]". Quantum foundations was not my usual field, and I noticed this style of questioning because at other scientific conferences I'd rarely or never heard a questioner express their sympathy for the point of view of the speaker. At the time, I thought the prevalence of the question suggested that little genuine progress was being made in quantum foundations, and people were merely spinning their wheels. Later, I realized that assessment was too harsh. The speakers were wrestling with some of the hardest problems human minds have ever confronted. Of course progress was slow! But there was still value in hearing updates on how people were thinking, even if they didn't always have unarguable new progress to report.

You may have noticed a verbal tic similar to "I'm very sympathetic [...]" in the current book. To explain what we're seeing I've often fallen back on saying "Heuristically, [...]", or "Roughly speaking, [...]", following up with a story to explain some phenomenon or other. These stories are plausible, but the empirical evidence I've presented has often been pretty thin. If you look through the research literature you'll see that stories in a similar style appear in many research papers on neural nets, often with thin supporting evidence. What should we think about such stories?

In many parts of science - especially those parts that deal with simple phenomena - it's possible to obtain very solid, very reliable evidence for quite general hypotheses. But in neural networks there are large numbers of parameters and hyper-parameters, and extremely complex interactions between them. In such extraordinarily complex systems it's exceedingly difficult to establish reliable general statements. Understanding neural networks in their full generality is a problem that, like quantum foundations, tests the limits of the human mind. Instead, we often make do with evidence for or against a few specific instances of a general statement. As a result those statements sometimes later need to be modified or abandoned, when new evidence comes to light.

[ Sufficiently advanced AI will come to resemble biology, even psychology, in its complexity and resistance to rigorous generalization ... ]

One way of viewing this situation is that any heuristic story about neural networks carries with it an implied challenge. For example, consider the statement I quoted earlier, explaining why dropout works (from ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012): "This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons." This is a rich, provocative statement, and one could build a fruitful research program entirely around unpacking the statement, figuring out what in it is true, what is false, what needs variation and refinement. Indeed, there is now a small industry of researchers who are investigating dropout (and many variations), trying to understand how it works, and what its limits are. And so it goes with many of the heuristics we've discussed. Each heuristic is not just a (potential) explanation, it's also a challenge to investigate and understand in more detail.

Of course, there is not time for any single person to investigate all these heuristic explanations in depth. It's going to take decades (or longer) for the community of neural networks researchers to develop a really powerful, evidence-based theory of how neural networks learn. Does this mean you should reject heuristic explanations as unrigorous, and not sufficiently evidence-based? No! In fact, we need such heuristics to inspire and guide our thinking. It's like the great age of exploration: the early explorers sometimes explored (and made new discoveries) on the basis of beliefs which were wrong in important ways. Later, those mistakes were corrected as we filled in our knowledge of geography. When you understand something poorly - as the explorers understood geography, and as we understand neural nets today - it's more important to explore boldly than it is to be rigorously correct in every step of your thinking. And so you should view these stories as a useful guide to how to think about neural nets, while retaining a healthy awareness of the limitations of such stories, and carefully keeping track of just how strong the evidence is for any given line of reasoning. Put another way, we need good stories to help motivate and inspire us, and rigorous in-depth investigation in order to uncover the real facts of the matter.
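As a concrete companion to the dropout passage above, here is a minimal numpy sketch of (inverted) dropout on one hidden layer: each training pass sees a different random subset of neurons, so no neuron can rely on particular others being present. The layer sizes and drop probability are arbitrary choices for the illustration.

    # Minimal sketch of inverted dropout on one hidden layer.
    import numpy as np

    rng = np.random.default_rng(0)

    def hidden_layer(x, W, b, drop_prob=0.5, train=True):
        h = np.maximum(0.0, x @ W + b)            # ReLU activations
        if train:
            mask = rng.random(h.shape) > drop_prob
            h = h * mask / (1.0 - drop_prob)      # rescale so the expected activation is unchanged
        return h

    x = rng.normal(size=(4, 10))                  # batch of 4 inputs with 10 features
    W = rng.normal(scale=0.1, size=(10, 32))
    b = np.zeros(32)

    print(hidden_layer(x, W, b, train=True).shape)    # training pass: random subset of neurons active
    print(hidden_layer(x, W, b, train=False).shape)   # test pass: all neurons active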
See also here from an earlier post:
... evolution has [ encoded the results of a huge environment-dependent optimization ] in the structure of our brains (and genes), a process that AI would have to somehow replicate. A very crude estimate of the amount of computational power used by nature in this process leads to a pessimistic prognosis for AI even if one is willing to extrapolate Moore's Law well into the future. [ Moore's Law (Dennard scaling) may be toast for the next decade or so! ] Most naive analyses of AI and computational power only ask what is required to simulate a human brain, but do not ask what is required to evolve one. I would guess that our best hope is to cheat by using what nature has already given us -- emulating the human brain as much as possible.
If indeed there are good (deep) generalized learning architectures to be discovered, that will take time. Even with such a learning architecture at hand, training it will require interaction with a rich exterior world -- either the real world (via sensors and appendages capable of manipulation) or a computationally expensive virtual world. Either way, I feel confident in my bet that a strong version of the Turing test (allowing, e.g., me to communicate with the counterpart over weeks or months; to try to teach it things like physics and watch its progress; eventually for it to teach me) won't be passed until at least 2050 and probably well beyond.

Turing as polymath: ... In a similar way Turing found a home in Cambridge mathematical culture, yet did not belong entirely to it. The division between 'pure' and 'applied' mathematics was at Cambridge then as now very strong, but Turing ignored it, and he never showed mathematical parochialism. If anything, it was the attitude of a Russell that he acquired, assuming that mastery of so difficult a subject granted the right to invade others.

Friday, August 22, 2014

Two reflections on SCI FOO 2014

Two excellent blog posts on SCI FOO by Jacob Vanderplas (astronomer and data scientist at the University of Washington) and Dominic Cummings (former director of strategy for the Conservative Party in the UK).

Hacking Academia: Data Science and the University (Vanderplas)

Almost a year ago, I wrote a post I called the Big Data Brain Drain, lamenting the ways that academia is neglecting the skills of modern data-intensive research, and in doing so is driving away many of the men and women who are perhaps best equipped to enable progress in these fields. This seemed to strike a chord with a wide range of people, and has led me to some incredible opportunities for conversation and collaboration on the subject. One of those conversations took place at the recent SciFoo conference, and this article is my way of recording some reflections on that conversation. ...

The problem we discussed is laid out in some detail in my Brain Drain post, but a quick summary is this: scientific research in many disciplines is becoming more and more dependent on the careful analysis of large datasets. This analysis requires a skill-set as broad as it is deep: scientists must be experts not only in their own domain, but in statistics, computing, algorithm building, and software design as well. Many researchers are working hard to attain these skills; the problem is that academia's reward structure is not well-poised to reward the value of this type of work. In short, time spent developing high-quality reusable software tools translates to less time writing and publishing, which under the current system translates to little hope for academic career advancement. ...




Few scientists know how to use the political system to effect change. We need help from people like Cummings.
AUGUST 19, 2014 BY DOMINIC CUMMINGS

... It was interesting that some very eminent scientists, all much cleverer than ~100% of those in politics [INSERT: better to say 'all with higher IQ than ~100% of those in politics'], have naive views about how politics works. In group discussions, there was little focused discussion about how they could influence politics better even though it is clearly a subject that they care about very much. (Gershenfeld said that scientists have recently launched a bid to take over various local government functions in Barcelona, which sounds interesting.)

... To get things changed in politics, scientists need mechanisms a) to agree priorities in order to focus their actions on b) roadmaps with specifics. Generalised whining never works. The way to influence politicians is to make it easy for them to fall down certain paths without much thought, and this means having a general set of goals but also a detailed roadmap the politicians can apply, otherwise they will drift by default to the daily fog of chaos and moonlight.

...

3. High status people have more confidence in asking basic / fundamental / possibly stupid questions. One can see people thinking ‘I thought that but didn’t say it in case people thought it was stupid and now the famous guy’s said it and everyone thinks he’s profound’. The famous guys don’t worry about looking stupid and they want to get down to fundamentals in fields outside their own.

4. I do not mean this critically but watching some of the participants I was reminded of Freeman Dyson’s comment:

‘I feel it myself, the glitter of nuclear weapons. It is irresistible if you come to them as a scientist. To feel it’s there in your hands. To release the energy that fuels the stars. To let it do your bidding. And to perform these miracles, to lift a million tons of rock into the sky, it is something that gives people an illusion of illimitable power, and it is in some ways responsible for all our troubles... this is what you might call ‘technical arrogance’ that overcomes people when they see what they can do with their minds.’

People talk about rationales for all sorts of things but looking in their eyes the fundamental driver seems to be – am I right, can I do it, do the patterns in my mind reflect something real? People like this are going to do new things if they can and they are cleverer than the regulators. As a community I think it is fair to say that outside odd fields like nuclear weapons research (which is odd because it still requires not only a large collection of highly skilled people but also a lot of money and all sorts of elements that are hard (but not impossible) for a non-state actor to acquire and use without detection), they believe that pushing the barriers of knowledge is right and inevitable. ...

Sunday, August 17, 2014

Genetic Architecture of Intelligence (arXiv:1408.3421)

This paper is based on talks I've given in the last few years. See here and here for video. Although there isn't much that hasn't already appeared in the talks or on this blog (other than some Compressed Sensing results for the nonlinear case), it's nice to have it in one place. The references are meant to be useful to people seriously interested in this subject, although I imagine they are nowhere near comprehensive. Apologies to anyone whose work I missed.

If you don't like the word "intelligence" just substitute "height" and everything will be OK. We live in strange times.
On the genetic architecture of intelligence and other quantitative traits (arXiv:1408.3421)
Categories: q-bio.GN
Comments: 30 pages, 13 figures

How do genes affect cognitive ability or other human quantitative traits such as height or disease risk? Progress on this challenging question is likely to be significant in the near future. I begin with a brief review of psychometric measurements of intelligence, introducing the idea of a "general factor" or g score. The main results concern the stability, validity (predictive power), and heritability of adult g. The largest component of genetic variance for both height and intelligence is additive (linear), leading to important simplifications in predictive modeling and statistical estimation. Due mainly to the rapidly decreasing cost of genotyping, it is possible that within the coming decade researchers will identify loci which account for a significant fraction of total g variation. In the case of height analogous efforts are well under way. I describe some unpublished results concerning the genetic architecture of height and cognitive ability, which suggest that roughly 10k moderately rare causal variants of mostly negative effect are responsible for normal population variation. Using results from Compressed Sensing (L1-penalized regression), I estimate the statistical power required to characterize both linear and nonlinear models for quantitative traits. The main unknown parameter s (sparsity) is the number of loci which account for the bulk of the genetic variation. The required sample size is of order 100s, or roughly a million in the case of cognitive ability.
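The abstract's sample-size estimate is simple arithmetic once the sparsity s is fixed; the snippet below just spells it out. The ~100 samples per causal locus and s ~ 10,000 for cognitive ability are the figures quoted in the abstract.

    # Sample size needed for phenotype prediction, per the abstract: n ~ 100 * s.
    s = 10_000                 # sparsity: loci accounting for the bulk of genetic variance
    samples_per_locus = 100    # constant of proportionality quoted in the abstract
    n_required = samples_per_locus * s
    print(f"required sample size: ~{n_required:,} individuals")   # ~1,000,000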

Saturday, August 16, 2014

Neural Networks and Deep Learning



One of the SCI FOO sessions I enjoyed the most this year was a discussion of deep learning by AI researcher Juergen Schmidhuber. For an overview of recent progress, see this paper. Also of interest: Michael Nielsen's pedagogical book project.

An application which especially caught my attention is described by Schmidhuber here:
Many traditional methods of Evolutionary Computation [15-19] can evolve problem solvers with hundreds of parameters, but not millions. Ours can [1,2], by greatly reducing the search space through evolving compact, compressed descriptions [3-8] of huge solvers. For example, a Recurrent Neural Network [34-36] with over a million synapses or weights learned (without a teacher) to drive a simulated car based on a high-dimensional video-like visual input stream.
More details here. They trained a deep neural net to drive a car using visual input (pixels from the driver's perspective, generated by a video game); output consists of steering orientation and accelerator/brake activation. There was no hard coded structure corresponding to physics -- the neural net optimized a utility function primarily defined by time between crashes. It learned how to drive the car around the track after less than 10k training sessions.
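The "compressed description" trick is worth a sketch: instead of evolving every synapse, evolve a small grid of low-frequency coefficients and expand them (DCT-style) into the full weight matrix, so the search space stays tiny even though the network is huge. The 8x8 genome, the matrix size, and the cosine basis below are illustrative choices, not the exact parameterization used in the cited work.

    # Sketch of a compressed weight encoding: a handful of evolved coefficients
    # generate a large weight matrix via a low-frequency 2-D cosine basis.
    import numpy as np

    def expand_weights(coeffs, n_rows, n_cols):
        """Expand a small (k x k) coefficient grid into an (n_rows x n_cols) matrix
        using the low-frequency corner of a 2-D cosine (DCT-like) basis."""
        k = coeffs.shape[0]
        r = np.arange(n_rows)[:, None]
        c = np.arange(n_cols)[:, None]
        basis_r = np.cos(np.pi * (r + 0.5) * np.arange(k)[None, :] / n_rows)   # (n_rows, k)
        basis_c = np.cos(np.pi * (c + 0.5) * np.arange(k)[None, :] / n_cols)   # (n_cols, k)
        return basis_r @ coeffs @ basis_c.T

    rng = np.random.default_rng(0)
    genome = rng.normal(size=(8, 8))           # 64 evolved parameters ...
    W = expand_weights(genome, 1000, 1000)     # ... expanded into a million synaptic weights
    print(genome.size, "parameters ->", W.shape, "weight matrix")

An evolutionary algorithm then searches over the 64 numbers in the genome rather than the million entries of W, which is how a network with over a million weights can be evolved at all.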

For some earlier discussion of deep neural nets and their application to language translation, see here. Schmidhuber has also worked on Solomonoff universal induction.

These TED videos give you some flavor of Schmidhuber's sense of humor :-) Apparently his younger brother (mentioned in the first video) has transitioned from theoretical physics to algorithmic finance. Schmidhuber on China.



Friday, August 15, 2014

Y Combinator: "fund for the pivot"

I'm catching up on podcasts a bit now that I'm back in Michigan. I had an iTunes problem and was waiting for the next version release while on the road.

EconTalk did a nice interview with Y Combinator President Sam Altman. Y Combinator has always been entrepreneur-centric, to the point that the quality of the founders is one of the main factors they consider (i.e., more important than the startup idea or business plan). At around 19 minutes, Altman reveals that they often "fund for the pivot" -- meaning that sometimes they want to place a bet on the entrepreneur even if they think the original idea is doomed. Altman also reveals that Y Combinator never looks at business plans or revenue projections. I can't count the number of times an idiot MBA demanded a detailed revenue projection from one of my startups, at a stage where the numbers and projections were completely meaningless.

Another good observation concerns the importance of communication skills in a founder. The leadership team is a central nexus that has to informationally equilibrate the rest of the company + investors + partners + board members + journalists + customers ... This is helped tremendously by having someone who is articulate, succinct, and able to "code switch" so as to speak the native language of an engineer, sales rep, or VC.

@30 min or so:
Russ: ... one of the things that happens to me when I come out here in the summer--I live outside of Washington, D.C. and I come out every 6 or 7 weeks in the summer, and come to Stanford--I feel like I'm at the center of the universe. You know, Washington is--everyone in Washington, except for me--

Guest: Thinks they are--

Russ: Thinks they are in the center. And there are things they are in the center in. Obviously. But it's so placid there. And when I come to Stanford, the intellectual, the excitement about products and transforming concepts into reality, is palpable. And then I run into start-up people and venture capitalists. And they are so alive, compared to, say, a lobbyist in Washington, say, just to pick a random example. And there are certain things that just--again, it's almost palpable. You can almost feel them. So the thing is that I notice being here--which are already the next big thing, which at least they feel like they are.  [ Visiting Washington DC gives me hives! ]
I recall a Foo Camp (the O'Reilly one, not SCI FOO at Google; perhaps 2007-2010 or so) session led by Paul Graham and some of the other Y Combinator founders/funders. At the time they weren't sure at all that their model would work. It was quite an honest discussion and I think even they must be surprised at how successful they've been since then.

Wednesday, August 13, 2014

Designer babies: selection vs editing



The discussion in this video is sophisticated enough to make the distinction between embryo selection -- the parents get a baby whose DNA originates from them, but the "best baby possible" -- and active genetic editing, which can give the child genes that neither parent had.

The movie GATTACA focuses on selection -- the director made a deliberate decision to eliminate reference to splicing or editing of genes. (Possibly because Ethan Hawke's character Vincent would have no chance competing against edited people.)

At SCI FOO, George Church seemed confident that editing would be an option in the near future. He is convinced that off-target mutations are not a problem for CRISPR. I have not yet seen this demonstrated in the literature, but of course George knows a lot more than what has been published. (Warning: I may have misunderstood his comments as there was a lot of background noise when we were talking.)

One interesting genetic variant (Lrp5?) that I learned about at the meeting, of obvious interest to future splicers and editors, apparently conveys a +8 SD increase in bone strength!

My views on all of this:
... given sufficient phenotype|genotype data, genomic prediction of traits such as cognitive ability will be possible. If, for example, 0.6 or 0.7 of total population variance is captured by the predictor, the accuracy will be roughly plus or minus half a standard deviation (e.g., a few cm of height, or 8 IQ points). The required sample size to extract a model of this accuracy is probably on the order of a million individuals. As genotyping costs continue to decline, it seems likely that we will reach this threshold within five years for easily acquired phenotypes like height (self-reported height is reasonably accurate), and perhaps within the next decade for more difficult phenotypes such as cognitive ability. At the time of this writing SNP genotyping costs are below $50 USD per individual, meaning that a single super-wealthy benefactor could independently fund a crash program for less than $100 million.

Once predictive models are available, they can be used in reproductive applications, ranging from embryo selection (choosing which IVF zygote to implant) to active genetic editing (e.g., using powerful new CRISPR techniques). In the former case, parents choosing between 10 or so zygotes could improve their expected phenotype value by a population standard deviation. For typical parents, choosing the best out of 10 might mean the difference between a child who struggles in school, versus one who is able to complete a good college degree. Zygote genotyping from single cell extraction is already technically well developed [25], so the last remaining capability required for embryo selection is complex phenotype prediction. The cost of these procedures would be less than tuition at many private kindergartens, and of course the consequences will extend over a lifetime and beyond.

The corresponding ethical issues are complex and deserve serious attention in what may be a relatively short interval before these capabilities become a reality. Each society will decide for itself where to draw the line on human genetic engineering, but we can expect a diversity of perspectives. Almost certainly, some countries will allow genetic engineering, thereby opening the door for global elites who can afford to travel for access to reproductive technology. As with most technologies, the rich and powerful will be the first beneficiaries. Eventually, though, I believe many countries will not only legalize human genetic engineering, but even make it a (voluntary) part of their national healthcare systems [26]. The alternative would be inequality of a kind never before experienced in human history.
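A quick Monte Carlo check of the numbers quoted above: if the predictor captures a fraction r^2 of trait variance, selecting the best of n embryos on the predicted value gains roughly E[max of n standard normals] x sqrt(r^2) in trait SD, and the residual prediction error is sqrt(1 - r^2) SD. Treating the n embryos as independent draws from the population is a simplification (siblings share the parental midpoint), so this is a check of the scaling rather than of the exact model in the paper.

    # Monte Carlo: gain from selecting the best of n embryos on a noisy predictor.
    import numpy as np

    rng = np.random.default_rng(0)
    n_embryos, r2, trials = 10, 0.65, 200_000

    g_hat = rng.normal(size=(trials, n_embryos))             # predicted (standardized) values
    noise = rng.normal(size=(trials, n_embryos))
    g_true = np.sqrt(r2) * g_hat + np.sqrt(1 - r2) * noise   # true values, corr(g_hat, g_true) = sqrt(r2)

    picked = g_true[np.arange(trials), np.argmax(g_hat, axis=1)]
    print(f"mean gain from best-of-{n_embryos}: {picked.mean():.2f} SD")      # ~1.2 SD
    print(f"residual prediction error: {np.sqrt(1 - r2):.2f} SD")             # ~0.6 SD, roughly 9 IQ points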

Here is the version of the GATTACA scene that was cut. The parents are offered the choice of edited or spliced genes conferring rare mathematical or musical ability.

Monday, August 11, 2014

SCI FOO 2014: photos

The day before SCI FOO I visited Complete Genomics, which is very close to the Googleplex.




Self-driving cars:



SCI FOO festivities:







I did an interview with O'Reilly. It should appear in podcast form at some point and I'll post a link.




Obligatory selfie:

Friday, August 08, 2014

Next Super Collider in China?

If you're in particle physics you may have heard rumors that the Chinese government is considering getting into the collider business. Since no one knows what will happen in our field post-LHC, this is a very interesting development. A loose international collaboration has been pushing a new linear collider for some time, perhaps to be built in Japan. But since (1) the results from LHC are thus far not as exciting as some had anticipated, and (2) colliders are very very expensive, the future is unclear.

While in China and Taiwan I was told that it was very likely that a next generation collider project would make it into the coming 5 year science plan. It was even said that the location for the new machine (combining both linear and hadronic components) would be in my maternal ancestral homeland of Shandong province. (Korean physicists will be happy about the proximity of the site :-)

Obviously for the Chinese government the symbolic value of taking the lead in high energy physics is very high -- perhaps on par with putting a man on the moon. In the case of a collider, we're talking about 20 year timescales, so this is a long term project. Stay tuned!

On the importance of experiments, from Voting and Weighing:
There is an old saying in finance: in the short run, the market is a voting machine, but in the long run it's a weighing machine. ...

You might think science is a weighing machine, with experiments determining which theories survive and which ones perish. Healthy sciences certainly are weighing machines, and the imminence of weighing forces honesty in the voting. However, in particle physics the timescale over which voting is superseded by weighing has become decades -- the length of a person's entire scientific career. We will very likely (barring something amazing at the LHC, like the discovery of mini-black holes) have the first generation of string theorists retiring soon with absolutely no experimental tests of their *lifetime* of work. Nevertheless, some have been lavishly rewarded by the academic market for their contributions.

Thursday, August 07, 2014

@ SCI FOO 2014



Sorry for the lack of blog activity. I just returned from Asia and am in Palo Alto for SCI FOO 2014. Hopefully I'll post some cool photos from the event, which starts tomorrow evening. If you are there and read this blog then come over and say hello. If I had free t-shirts I'd give you one, but don't get your hopes up!

Earlier SCI FOO posts.

Saturday, August 02, 2014

It's all in the gene: cows


Some years ago a German driver took me from the Perimeter Institute to the Toronto airport. He was an immigrant to Canada and had a background in dairy farming. During the ride he told me all about driving German farmers to buy units of semen produced by highly prized Canadian bulls. The use of linear polygenic models in cattle breeding is already widespread, and the review article below gives some idea as to the accuracy.

See also Genomic Prediction: No Bull and Plenty of room at the top.
Invited Review: Reliability of genomic predictions for North American Holstein bulls

Journal of Dairy Science Volume 92, Issue 1, Pages 16–24, January 2009.
DOI: http://dx.doi.org/10.3168/jds.2008-1514

Genetic progress will increase when breeders examine genotypes in addition to pedigrees and phenotypes. Genotypes for 38,416 markers and August 2003 genetic evaluations for 3,576 Holstein bulls born before 1999 were used to predict January 2008 daughter deviations for 1,759 bulls born from 1999 through 2002. Genotypes were generated using the Illumina BovineSNP50 BeadChip and DNA from semen contributed by US and Canadian artificial-insemination organizations to the Cooperative Dairy DNA Repository. Genomic predictions for 5 yield traits, 5 fitness traits, 16 conformation traits, and net merit were computed using a linear model with an assumed normal distribution for marker effects and also using a nonlinear model with a heavier tailed prior distribution to account for major genes. The official parent average from 2003 and a 2003 parent average computed from only the subset of genotyped ancestors were combined with genomic predictions using a selection index. Combined predictions were more accurate than official parent averages for all 27 traits. The coefficients of determination (R2) were 0.05 to 0.38 greater with nonlinear genomic predictions included compared with those from parent average alone. Linear genomic predictions had R2 values similar to those from nonlinear predictions but averaged just 0.01 lower. The greatest benefits of genomic prediction were for fat percentage because of a known gene with a large effect. The R2 values were converted to realized reliabilities by dividing by mean reliability of 2008 daughter deviations and then adding the difference between published and observed reliabilities of 2003 parent averages. When averaged across all traits, combined genomic predictions had realized reliabilities that were 23% greater than reliabilities of parent averages (50 vs. 27%), and gains in information were equivalent to 11 additional daughter records. Reliability increased more by doubling the number of bulls genotyped than the number of markers genotyped. Genomic prediction improves reliability by tracing the inheritance of genes even with small effects.


Results and Discussion: ... Marker effects for most other traits were evenly distributed across all chromosomes with only a few regions having larger effects, which may explain why the infinitesimal model and standard quantitative genetic theories have worked well. The distribution of marker effects indicates primarily polygenic rather than simple inheritance and suggests that the favorable alleles will not become homozygous quickly, and genetic variation will remain even after intense selection. Thus, dairy cattle breeders may expect genetic progress to continue for many generations.

... Most animal breeders will conclude that these gains in reliability are sufficient to make genotyping profitable before breeders invest in progeny testing or embryo transfer. Rates of genetic progress should increase substantially as breeders take advantage of these new tools for improving animals (Schaeffer, 2008). Further increases in number of genotyped bulls, revisions to the statistical methods, and additional edits should increase the precision of future genomic predictions.

Table 3 (reliability, %)

Trait                 PA published  PA observed  GP expected  GP linear  GP nonlinear  Gain (nonlinear vs. published PA)
Net merit                  30            14           67          53           53                   23
Milk yield                 35            32           69          56           58                   23
Fat yield                  35            17           69          65           68                   33
Protein yield              35            31           69          58           57                   22
Fat percentage             35            29           69          69           78                   43
Protein percentage         35            32           69          62           69                   34
Productive life            27            28           55          42           45                   18

(PA = parent average; GP = genomic prediction.)
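One way to build intuition for the reliabilities in the table is to translate them into equivalent daughter records using the standard progeny-test approximation, reliability ~ n / (n + k) with k = (4 - h^2) / h^2. The heritability below is an assumed illustrative value, and the paper's own accounting of daughter equivalents is more careful, so treat this only as an order-of-magnitude sanity check on the "equivalent to 11 additional daughter records" figure quoted in the abstract.

    # Translate reliability into equivalent daughter records (rough approximation).
    def daughter_equivalents(reliability, h2):
        k = (4 - h2) / h2                       # number of records needed for reliability 0.5
        return reliability * k / (1 - reliability)

    h2 = 0.30                                   # assumed heritability, for illustration only
    for label, rel in [("parent average", 0.27), ("combined genomic prediction", 0.50)]:
        print(f"{label}: reliability {rel:.2f} ~ {daughter_equivalents(rel, h2):.1f} daughter equivalents")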


"Horses ain't like people, man. They can't make themselves better than they're born. See, with a horse, it's all in the gene. It's the fucking gene that does the running. The horse has got absolutely nothing to do with it." --- Paulie (Eric Roberts) in The Pope of Greenwich Village

Tuesday, July 29, 2014

HKUST IAS

I have a new candidate for coolest research institute architecture. HKUST's Institute for Advanced Study is housed in an amazing building with a view of Clearwater Bay in HK. The members of the institute will be mostly theoretical physicists and mathematicians :-)

Stiff competition from Benasque's Center and the Perimeter Institute, however. Also Caltech's IQIM!













Compare to Dr. No's interrogation chamber :-)

Sunday, July 27, 2014

SNPs and SATS

This paper provides additional evidence that the GWAS hits found by SSGAC affect cognitive ability. My guess is that UK age 14 SATS scores are pretty g-loaded. Note that this is an ethnically homogeneous sample of students.

If the effect size per allele is about 1/30 SD, it would take ~1000 such alleles to account for normal population variation. These are the first loci detected, so the typical effect size of alleles affecting cognitive ability is probably smaller. This seems consistent with my estimate of ~10k causal variants.
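The arithmetic behind that "~1000" figure, spelled out: if each allele has effect beta on a trait with unit variance, a single locus accounts for roughly beta^2 of the variance, or 2p(1-p)beta^2 under the standard additive accounting (the p = 0.5 below is an illustrative assumption). Either way the answer is of order a thousand loci.

    # Back-of-envelope: how many alleles of effect ~1/30 SD to account for all variance?
    beta = 1 / 30                                  # effect per allele, in trait SD
    crude = 1 / beta**2                            # ignoring allele frequency: ~900
    p = 0.5                                        # illustrative allele frequency
    with_2pq = 1 / (2 * p * (1 - p) * beta**2)     # standard additive-variance accounting: ~1800
    print(f"~{crude:.0f} to ~{with_2pq:.0f} loci")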

Genetic Variation Associated with Differential Educational Attainment in Adults Has Anticipated Associations with School Performance in Children (PLoS July 17, 2014 DOI: 10.1371/journal.pone.0100248)

Genome-wide association study results have yielded evidence for the association of common genetic variants with crude measures of completed educational attainment in adults. Whilst informative, these results do not inform as to the mechanism of these effects or their presence at earlier ages and where educational performance is more routinely and more precisely assessed. Single nucleotide polymorphisms exhibiting genome-wide significant associations with adult educational attainment were combined to derive an unweighted allele score in 5,979 and 6,145 young participants from the Avon Longitudinal Study of Parents and Children with key stage 3 national curriculum test results (SATS results) available at age 13 to 14 years in English and mathematics respectively. Standardised (z-scored) results for English and mathematics showed an expected relationship with sex, with girls exhibiting an advantage over boys in English (0.433 SD (95%CI 0.395, 0.470), p < 10^-10) with more similar results (though in the opposite direction) in mathematics (0.042 SD (95%CI 0.004, 0.080), p = 0.030). Each additional adult educational attainment increasing allele was associated with 0.041 SD (95%CI 0.020, 0.063), p = 1.79×10^-4 and 0.028 SD (95%CI 0.007, 0.050), p = 0.01 increases in standardised SATS score for English and mathematics respectively. Educational attainment is a complex multifactorial behavioural trait which has not had heritable contributions to it fully characterised. We were able to apply the results from a large study of adult educational attainment to a study of child exam performance marking events in the process of learning rather than realised adult end product. Our results support evidence for common, small genetic contributions to educational attainment, but also emphasise the likely lifecourse nature of this genetic effect. Results here also, by an alternative route, suggest that existing methods for child examination are able to recognise early life variation likely to be related to ultimate educational attainment.

Saturday, July 26, 2014

Success, Ability, and all that

I came across this nice discussion at LessWrong which is similar to my old post Success vs Ability. The illustration below shows why even a strong predictor of outcome is seldom able to pick out the very top performer: e.g., taller people are on average better at basketball, but the best player in the world is not the tallest; smarter people are on average better at making money, but the richest person in the world is not the smartest, etc.


This seems like a trivial point (as are most things, when explained clearly), but it still eludes the vast majority. For example, in the Atlantic article I linked to in the earlier post Creative Minds, the neuroscientist professor who studies creative genius misunderstands the implications of the Terman study. She repeats the common claim that Terman's study fails to support the importance of high cognitive ability to "genius"-level achievement: none of the Termites won a Nobel prize, whereas Shockley and Alvarez, who narrowly missed the (verbally loaded) Stanford-Binet cut for the study, each won for work in experimental physics. But luck, drive, creativity, and other factors, all at least somewhat independent of intelligence, influence success in science. Combine this with the fact that there are exponentially more people a bit below the Terman cut than above it, and Terman's results do little more than confirm that cognitive ability is positively but not perfectly correlated with creative output.


In the SMPY study, the probability of having published a literary work or earned a patent increased with ability even within the top 1%. The "IQ over 120 doesn't matter" meme falls apart if one measures individual likelihood of success, as opposed to the total number of individuals at, e.g., IQ 120 vs IQ 145 who have achieved some milestone. The base population of the former is roughly 100 times that of the latter!
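The base-rate point is just normal-tail arithmetic; here is the calculation (mean 100, SD 15). Under this standard model the ratio comes out at several tens, which is the order-of-magnitude gap referred to above.

    # How many more people are above IQ 120 than above IQ 145? (normal model, SD 15)
    from scipy.stats import norm

    above_120 = norm.sf((120 - 100) / 15)
    above_145 = norm.sf((145 - 100) / 15)
    print(f"fraction above 120: {above_120:.4f}")          # ~0.091
    print(f"fraction above 145: {above_145:.5f}")          # ~0.00135
    print(f"ratio: ~{above_120 / above_145:.0f}x")         # a few tens of times more people above 120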

This topic came up last night in Hong Kong, at dinner with two hedge funders (Caltech/MIT guys with PhDs) who have had long careers in finance. Both observed that 20 years ago it was nearly impossible to predict which of their colleagues and peers would go on to make vast fortunes, as opposed to becoming merely rich.

Tuesday, July 22, 2014

Genome editing excises HIV

See also CRISPR Symposium at MSU and Genetic engineering of monkeys using CRISPR.
The Scientist: ... The researchers, led by Kamel Khalili at Temple University in Philadelphia, Pennsylvania, used the CRISPR/Cas9 genome-editing system to excise HIV from several human cell lines, including microglia and T cells. They targeted both the 5’ and 3’ ends of the virus, called the long terminal repeats (LTRs), so that the entire viral genome was removed.

“We were extremely happy with the outcome,” Khalili told The Scientist. “It was a little bit . . . mind-boggling how this system really can identify a single copy of the virus in a chromosome, which is highly packed DNA, and exactly cleave that region.”

His team showed that not only could Cas9 excise one copy of the HIV genome, but—operating in the same cell—it could also clip out another copy lurking in a different chromosome. Often, Khalili said, a cell can have several copies of latent HIV distributed across various chromosomes. “Most likely the technology is going to clean up the viral DNA” in a cell, he said.

... One limitation of the CRISPR/Cas9 approach is that it can chop up unintended regions of the genome, producing so-called off-target effects. Khalili’s group performed whole-genome sequencing to look for off-target effects, but didn’t find any. T.J. Cradick, the director of the protein engineering core facility at Georgia Tech, said that a more thorough analysis of potential off-target effects is still required to make sure nothing has been overlooked. Nonetheless, “latent HIV provirus is a very exciting target and . . . a very promising way forward,” said Cradick, who did not participate in the study.

W. Hu et al., “RNA-directed gene editing specifically eradicates latent and prevents new HIV-1 infection,” PNAS, doi:10.1073/pnas.1405186111, 2014

Monday, July 21, 2014

The Creative Mind



 See also Anne Roe's The Making of a Scientist.
The Atlantic: ... One after another, my writer subjects came to my office and spent three or four hours pouring out the stories of their struggles with mood disorder—mostly depression, but occasionally bipolar disorder. A full 80 percent of them had had some kind of mood disturbance at some time in their lives, compared with just 30 percent of the control group—only slightly less than an age-matched group in the general population. (At first I had been surprised that nearly all the writers I approached would so eagerly agree to participate in a study with a young and unknown assistant professor—but I quickly came to understand why they were so interested in talking to a psychiatrist.) 
The Vonneguts turned out to be representative of the writers’ families, in which both mood disorder and creativity were overrepresented—as with the Vonneguts, some of the creative relatives were writers, but others were dancers, visual artists, chemists, architects, or mathematicians. This is consistent with what some other studies have found. When the psychologist Kay Redfield Jamison looked at 47 famous writers and artists in Great Britain, she found that more than 38 percent had been treated for a mood disorder; the highest rates occurred among playwrights, and the second-highest among poets. When Joseph Schildkraut, a psychiatrist at Harvard Medical School, studied a group of 15 abstract-expressionist painters in the mid-20th century, he found that half of them had some form of mental illness, mostly depression or bipolar disorder; nearly half of these artists failed to live past age 60. ... 
This time around, I wanted to examine a more diverse sample of creativity, from the sciences as well as the arts. My motivations were partly selfish—I wanted the chance to discuss the creative process with people who might think and work differently, and I thought I could probably learn a lot by listening to just a few people from specific scientific fields. After all, each would be an individual jewel—a fascinating study on his or her own. Now that I’m about halfway through the study, I can say that this is exactly what has happened. My individual jewels so far include, among others, the filmmaker George Lucas, the mathematician and Fields Medalist William Thurston, the Pulitzer Prize–winning novelist Jane Smiley, and six Nobel laureates from the fields of chemistry, physics, and physiology or medicine. Because winners of major awards are typically older, and because I wanted to include some younger people, I’ve also recruited winners of the National Institutes of Health Pioneer Award and other prizes in the arts. 
Apart from stating their names, I do not have permission to reveal individual information about my subjects. And because the study is ongoing (each subject can take as long as a year to recruit, making for slow progress), we do not yet have any definitive results—though we do have a good sense of the direction that things are taking. By studying the structural and functional characteristics of subjects’ brains in addition to their personal and family histories, we are learning an enormous amount about how creativity occurs in the brain, as well as whether these scientists and artists display the same personal or familial connections to mental illness that the subjects in my Iowa Writers’ Workshop study did. ... 
As I hypothesized, the creative people have shown stronger activations in their association cortices during all four tasks than the controls have. (See the images on page 74.) This pattern has held true for both the artists and the scientists, suggesting that similar brain processes may underlie a broad spectrum of creative expression. Common stereotypes about “right brained” versus “left brained” people notwithstanding, this parallel makes sense. Many creative people are polymaths, people with broad interests in many fields—a common trait among my study subjects.

Saturday, July 19, 2014

Bell Curve @20 @Harvard



The host is Harvard professor Harvey Mansfield. I'm not sure who all of the other panelists are, but they seem to include a professor of government and another of economics. The Asian physics guy is probably Peter Lu.
The Program on Constitutional Government at Harvard University

March 14, 2014: Charles Murray, on “The Bell Curve Revisited.” Charles Murray is a Fellow at the American Enterprise Institute, and the author of famous and influential books, among them Losing Ground (1984), The Bell Curve: Intelligence and Class Structure in American Life (1994, with Richard Herrnstein), and most recently, Coming Apart: The State of White America, 1960-2010 (2013). He declares himself a libertarian, has written for many journals, and has received the Irving Kristol award from AEI and the Bradley Prize from the Bradley Foundation. He is Harvard ’65 and received a PhD in political science from M.I.T. in 1974. He is also the author of several “Murray’s laws” of social behavior.

Hail Britannia -- 100k whole genomes

Progress! Genotyping of large, well-phenotyped samples.
TechnologyReview: The British government says that it plans to hire the U.S. gene-sequencing company Illumina to sequence 100,000 human genomes in what is the largest national project to decode the DNA of a populace. ...

Some other countries are also considering large national sequencing projects. The U.K. project will focus on people with cancer, as well as adults and children with rare diseases. Because all Britons are members of the National Health Service, the project expects to be able to compare DNA data with detailed centralized health records (see “Why the U.K. Wants a Genomic National Health Service”).

While the number of genomes to be sequenced is 100,000, the total number of Britons participating in the study is smaller, about 70,000. That is because for cancer patients Genomics England intends to obtain the sequence of both their inherited DNA as well as that of their cancers.
BGI bid for this work but their transition to the upgraded Complete Genomics technology is still in progress. This delay has affected our cognitive genomics project as well.

Big data sets are also being assembled in the US (note in this case only SNP genotyping; cost is less than $100 per person now):
AKESOgen announced today that it has been awarded a $7.5M contract by the U.S. Department of Veterans Affairs (VA) for genotyping samples from U.S. veterans as part of the Million Veteran Program (MVP). This award covers the genotyping of 105,000 veterans in the first year of a five year contract.

"The VA's Million Veteran Program is one of the largest genetic initiatives ever undertaken in the US and its visionary genomics and genetics approach will provide new insights about how genes affect health. The goal is to improve healthcare for veterans by understanding the genetic basis of many common conditions. The data will ultimately be beneficial to the healthcare of all veterans and of the wider community. We are delighted to have been selected by the VA for this unique endeavor and we will provide genetic data of the highest quality to the VA." said Bob Boisjoli, CEO of AKESOgen. To fulfill the VA contract, AKESOgen will utilize a custom designed array based genotyping solution from Affymetrix, Inc. ...
My prediction is that of order a million phenotype:genotype pairs will be enough to deduce the genetic architecture of complex traits such as height or cognitive ability. SNPs will be enough to solve most of the problem, so that cost is now ~ $100M or less -- interested billionaires please contact me :-)

Wednesday, July 16, 2014

Conor Mcgregor

Win or lose, he's entertaining. Definitely the biggest character in the UFC.







Friday, July 11, 2014

Minds and Machines


HLMI = ‘high–level machine intelligence’ = one that can carry out most human professions at least as well as a typical human. I'm more pessimistic than the average researcher in the poll. My 95 percent confidence interval has earliest HLMI about 50 years from now, putting me at ~ 80-90th percentile in this group as far as pessimism. I think human genetic engineering will be around for at least a generation or so before machines pass a "strong" Turing test. Perhaps a genetically enhanced team of researchers will be the ones who finally reach the milestone, ~ 100 years after Turing proposed it :-)
These are the days of miracle and wonder
This is the long-distance call
The way the camera follows us in slo-mo
The way we look to us all
The way we look to a distant constellation
That’s dying in a corner of the sky
These are the days of miracle and wonder
And don’t cry baby don’t cry
Don’t cry -- Paul Simon

Future Progress in Artificial Intelligence: A Poll Among Experts

Vincent C. Müller & Nick Bostrom

Abstract: In some quarters, there is intense concern about high–level machine intelligence and superintelligent AI coming up in a few decades, bringing with it significant risks for humanity; in other quarters, these issues are ignored or considered science fiction. We wanted to clarify what the distribution of opinions actually is, what probability the best experts currently assign to high–level machine intelligence coming up within a particular time–frame, which risks they see with that development and how fast they see these developing. We thus designed a brief questionnaire and distributed it to four groups of experts. Overall, the results show an agreement among experts that AI systems will probably reach overall human ability around 2040-2050, and move on to super-intelligence in less than 30 years thereafter. The experts say the probability is about one in three that this development turns out to be ‘bad’ or ‘extremely bad’ for humanity.

Thursday, July 10, 2014

Chimp intelligence is heritable


A natural place to look for alleles of large effect is among the otherwise conserved (from mouse through chimp) variants that are different in humans. See The Genetics of Humanness and The Essential Difference.

My guess (without checking the paper to see if they report it) is that test-retest correlation for chimps is well below the 0.9--0.95 often found for (human) g. Thus the h2 = 0.5 figure reported below could be significantly higher if corrected for reliability.
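The correction for attenuation mentioned here is simple: divide the observed heritability by the test's reliability. The chimp test-retest reliabilities below are assumed values for illustration; the point is just that if reliability is well below the human 0.90-0.95 range, the corrected h2 moves up accordingly.

    # Correction for attenuation: h2_true ~ h2_observed / test_reliability.
    h2_observed = 0.5
    for reliability in (0.95, 0.80, 0.65):      # assumed test-retest reliabilities
        print(f"reliability {reliability:.2f}: corrected h2 ~ {h2_observed / reliability:.2f}")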
Nature News: Smart chimpanzees often have smart offspring, researchers suggest in one of the first analyses of the genetic contribution to intelligence in apes. The findings, published online today in Current Biology1, could shed light on how human intelligence evolved, and might even lead to discoveries of genes associated with mental capacity.

A team led by William Hopkins, a psychologist at Georgia State University in Atlanta, tested the intelligence of 99 chimpanzees aged 9 to 54 years old, most of them descended from the same group of animals housed at the Yerkes National Primate Research Center in Atlanta. The chimps faced cognitive challenges such as remembering where food was hidden in a rotating object, following a human’s gaze and using tools to solve problems.

A subsequent statistical analysis revealed a correlation between the animals' performance on these tests and their relatedness to other chimpanzees participating in the study. About half of the difference in performance between individual apes was genetic, the researchers found.

In humans, about 30% of intelligence in children can be explained by genetics; for adults, who are less vulnerable to environmental influences, that figure rises to 70%. Those numbers are comparable to the new estimate of the heritability of intelligence across a wide age range of chimps, says Danielle Posthuma, a behavioural geneticist at VU University in Amsterdam, who was not involved in the research.

“This study is much overdue,” says Rasmus Nielsen, a computational biologist at the University of California, Berkeley. “There has been enormous focus on understanding heritability of intelligence in humans, but very little on our closest relatives.”

Tuesday, July 08, 2014

James Simons: Mathematics, Common Sense, and Good Luck

A great MIT colloquium by Jim Simons (intro by I. Singer). Interesting discussion @28 min about how Simons (after leaving mathematics at 38) became an investor. Initially, he relied on both fundamental / event-driven analysis (reading the newspaper ;-) and computer models. But Simons eventually decided on a completely model-driven approach, and the rest is history.

@38 min: on RenTech's secret: We start with first-rate scientists ... Great infrastructure ... New ideas shared and discussed as soon as possible in an open environment ... Compensation based on overall firm performance ...

@44 min: Be guided by beauty ... Try to do it RIGHT ... Don't give up and hope for some good luck!

@48 min: a defense of HFT ... the cost of liquidity?

@55 min: world's greatest investor is a Keynesian :-)

@58 min: brief precis of financial crisis ... See also here.

See also Jim Simons is my hero.
