Master 2 / PhD thesis in Genetic Epidemiology

Title: Studying the genetic architecture of common diseases in a population isolate

Project: Our project aims to understand the genetic architecture of common diseases by exploiting the specific features of population isolates. It is carried out in a population isolate from Southern Italy (Cilento), in collaboration with Marina Ciullo’s lab (http://www.igb.cnr.it/cilentoisolates) at the Consiglio Nazionale delle Ricerche (CNR) in Naples.

Common multifactorial diseases result from a combination of multiple genetic and environmental factors. They are the most frequent human diseases and include cardiovascular and neurodegenerative diseases. Because their etiology is complex, identifying genetic risk factors for such diseases is often difficult. Population isolates are populations that originate from a limited number of founder individuals and have since grown demographically while remaining isolated from other populations. Because of this founder effect, fewer genes are expected to be involved in a given disease. In addition, living habits and environment are often quite similar across individuals, which reduces the variability of environmental factors and in turn makes the effects of genetic variability easier to observe. These features make population isolates very attractive for studying complex traits in which both genetic and environmental risk factors are involved.

The Cilento hill villages have all the features of genetic isolates. They remained isolated until very recently and went through a bottleneck due to the plague in the 17th century, which reduced these populations to very few people. Genealogical records have been collected. The demographic characteristics of Cilento are intermediate between those of the Icelandic population (low level of inbreeding and relatedness) and those of the Hutterite religious isolate in the USA (high level of inbreeding and relatedness).

The study sample is composed of 2,137 deeply phenotyped individuals. The health status of the populations was assessed through structured standardized questionnaires, clinical records, physical examination, instrumental examinations (blood pressure measurements, ECG, echocardiography) and blood tests. In addition, information on environmental factors such as physical activity and nutritional intake has been collected. For 1,625 individuals, dense marker genotypes (deCODE microsatellites and Illumina SNP array) are available. Deep exome data for 250 of these individuals have also been generated at the Sanger Institute (UK), and shallow whole-genome sequence data are expected shortly.

The project aims to evaluate several aspects of these data. In particular, though not exclusively, it will assess how accurately inbreeding and relatedness can be estimated from sequence data. This is key, as most methods used in isolated populations rely on this information. The exact contribution of the student to the project is open to discussion and will depend on his/her background and on the time frame of the internship. Depending on the student’s interests and on funding opportunities, this Master project could be extended into a Ph.D. project.

References:

Ciullo M, Nutile T, Dalmasso C, Sorice R, Bellenguez C, Colonna V, Persico MG, Bourgain C. Identification and replication of a novel obesity locus on chromosome 1q24 in isolated populations of Cilento. Diabetes. 2008 Mar;57(3):783-90.
Gazal S, Sahbatou M, Babron MC, Génin E, Leutenegger AL. FSuite: exploiting inbreeding in dense SNP chip and exome data. Bioinformatics. 2014 Jul 1;30(13):1940-1.
Sorice R, Ruggiero D, Nutile T, Aversano M, Husemoen L, Linneberg A, Bourgain C, Leutenegger AL, Ciullo M. Genetic and environmental factors influencing the Placental Growth Factor (PGF) variation in two populations. PLoS One. 2012;7(8):e42537.
Zeggini E. Next-generation association studies for complex traits. Nat Genet. 2011 Mar 29;43(4):287-8.

Prerequisites: exposure to, or experience with, bioinformatics or statistical analysis of genetic data; programming (R, Perl, C, etc.)

Subjects and methodologies related to the project

  1. Population genetics issues (consanguinity, population structure, etc.)
  2. Bioinformatics for high-throughput genetics and genomics
  3. Genetic epidemiology of neurodegenerative, cardiovascular or cancer-related traits

Tools and methodologies related to the project

  1. Mathematical & statistical modeling
  2. Genetic data analysis (R or ad-hoc software)
  3. Monte Carlo simulations
  4. Perl, Python, C/C++ or Java programming

Lab environment: We are a lab of genetic epidemiologists mainly interested in identifying the genetic factors involved in human diseases, understanding their mode of action, and characterizing other factors (environment, lifestyle, etc.) that may modulate their effect on disease. Our work includes data collection and data production, as well as data analysis and modeling. We currently run projects on asthma, cancer, and neurodegenerative or cardiovascular-related traits, in large outbred populations and in population isolates. These projects are part of international collaborations. Lab website: http://genestat.cephb.fr

Application: Interested candidates should send a CV, the names of three references and a letter of motivation to anne-louise.leutenegger@inserm.fr