Software

Professor Loïc Yengo has developed a range of software tools in connection with the research studies in which he has participated. These tools are listed below and on his GitHub page.

Yengo et al. (2025) [in prep.]. Within-family heritability estimates from 500,000 sibling pairs of diverse ancestries.

ConvertFSIR. This is an R-based function that converts within-family sibling regression (Visscher et al., Plos Genet. 2006) estimates of heritability and shared-environmental variance for binary traits (e.g., disease) from the observed 0-1 scale to an underlying continuous liability.
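
For orientation, the idea of moving from the observed 0-1 scale to the liability scale can be sketched with the classical threshold-model transformation for a population sample without ascertainment. This is only an illustration: the function name `observed_to_liability` is hypothetical, the formula is the textbook one, and ConvertFSIR itself may use a different or more elaborate conversion (for example, to handle the shared-environmental variance).

```python
from statistics import NormalDist


def observed_to_liability(h2_observed: float, prevalence: float) -> float:
    """Classical observed-scale to liability-scale conversion.

    Assumes a liability threshold model and a population sample
    (no case-control ascertainment). Not the ConvertFSIR code.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Liability threshold above which individuals are affected.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at the threshold.
    density = std_normal.pdf(threshold)
    return h2_observed * prevalence * (1.0 - prevalence) / density ** 2
```

Under these assumptions, the conversion factor depends only on the trait prevalence and grows as the disease becomes rarer.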

Wainschtein et al. (2025) [under review]. Estimation and mapping of the missing heritability of human phenotypes.

rseREML. This is an R-based function that estimates the heritability of quantitative traits using pedigree data available in large biobanks. The code is optimized for large sample sizes and relatively small family sizes.
fastGREML. This is a C++ program that implements computational improvements to the GREML methodology initially implemented in GCTA. It notably relies on stochastic approximation.

Sidorenko et al. (2024) Nature Genetics. Genetic architecture reconciles linkage and association studies of complex traits.

predLink. This C++ program predicts genetic linkage from summary statistics of genome-wide association studies.
Related theory and code can be found here.

Zhang et al. (2025) Nature Genetics. The contribution of gametic phase disequilibrium to the heritability of complex traits.

simAM. This C++ code simulates samples of genotypes and phenotypes from a population undergoing assortative mating.
makeDGRM. This C++ code generates a Disequilibrium Genomic Relationship Matrix from individual-level genotype data in PLINK format.
makeBCDGRM. This C++ code generates a Disequilibrium Genomic Relationship Matrix from a set of pre-calculated, standard per-chromosome Disequilibrium Genomic Relationship Matrices (in GCTA format).

Data

Professor Loïc Yengo has compiled a number of datasets associated with various research studies he has led. These datasets are listed below.

Yengo et al. (2024). A saturated map of common genetic variants associated with human height.

GWAS Summary

GWAS summary statistics associated with the publication (excluding data from 23andMe) are provided for 5 ancestry/ethnicity groups and for the groups combined:

AFR: African (mostly African American)
EAS: East-Asian
SAS: South-Asian
HIS: Hispanic
EUR: European
ALL: Fixed-effect meta-analysis of all ancestry groups
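
As a reminder of what a fixed-effect meta-analysis does for a single variant, here is a minimal sketch of the standard inverse-variance weighted estimator. The function name and the input values in the usage note are hypothetical; the exact pipeline used to build the ALL file is not described on this page.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis of one variant.

    betas and standard_errors hold one entry per ancestry group.
    Returns the pooled effect, its standard error and a two-sided p-value
    (normal approximation).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value
```

For example, `fixed_effect_meta([0.1, 0.3], [0.1, 0.1])` pools two equally precise estimates into their simple average, with a standard error smaller than either input.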

Column descriptions:

SNPID: represented as CHR:POS:REF:ALT
RSID: rs number, when available
CHR: chromosome
POS: genomic position (base pair), hg19/GRCh37 build
EFFECT_ALLELE
OTHER_ALLELE
EFFECT_ALLELE_FREQ (3 significant figures)
BETA (6 significant figures)
SE (3 significant figures)
P: p-value of the marginal effect
N: sample size
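
The column layout above can be read with a few lines of Python. The sketch below is only illustrative: the tab delimiter is an assumption (check the README for the actual file format), the function name is hypothetical, and the single data row is made up for the example rather than taken from the released files.

```python
import csv
import io
import math

# Made-up row for illustration only; not taken from the released files.
TOY_FILE = (
    "SNPID\tRSID\tCHR\tPOS\tEFFECT_ALLELE\tOTHER_ALLELE\t"
    "EFFECT_ALLELE_FREQ\tBETA\tSE\tP\tN\n"
    "1:1000:A:G\trs0000\t1\t1000\tG\tA\t0.25\t0.02\t0.005\t6.33e-05\t100000\n"
)


def read_summary_rows(handle, delimiter="\t"):
    """Yield one dict per variant with a few derived sanity-check fields."""
    for row in csv.DictReader(handle, delimiter=delimiter):
        chrom, pos, ref, alt = row["SNPID"].split(":")
        z = float(row["BETA"]) / float(row["SE"])
        yield {
            "snpid": row["SNPID"],
            "chrom": chrom,
            "pos": int(pos),
            # The two alleles in SNPID should be the effect and other alleles.
            "alleles_match": {ref, alt}
            == {row["EFFECT_ALLELE"], row["OTHER_ALLELE"]},
            "z": z,
            # Two-sided p-value recomputed from BETA / SE (normal approximation).
            "p_from_z": math.erfc(abs(z) / math.sqrt(2.0)),
        }


rows = list(read_summary_rows(io.StringIO(TOY_FILE)))
```

Recomputing the p-value from BETA and SE is a cheap way to confirm that columns were parsed in the expected order.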

Polygenic Score Weights

Polygenic score (PGS) weights derived from GWAS summary statistics from the study (including data from 23andMe) are provided for 5 ancestry/ethnicity groups, together with an overall score.

Column descriptions:

SNPID: represented as CHR:POS(hg19):REF:ALT
RSID: rs number; if missing, a lookup in dbSNP was performed; if still missing, chr:pos is used
CHR: chromosome
POS: genomic position (base pair), hg19/GRCh37 build
PGS_EFFECT_ALLELE
PGS_WEIGHT: posterior joint SNP effect
PGS_OTHER_ALLELE
PGS_POSTERIOR_STANDARD_ERROR: posterior standard deviation of the joint SNP effect

PGS Method: PGS weights were generated using the SBayesC method implemented in GCTB (v. 2.0) from GWAS summary statistics including data from 23andMe (i.e., based on the largest samples). SBayesC was run with an ancestry-matched linkage disequilibrium (LD) matrix, except for the cross-ancestry meta-analysis (i.e., the *ALL* file), which is based on an LD matrix estimated in a European-ancestry sample.
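
To show how such weights are typically applied, the sketch below computes a score for one individual as the weighted sum of effect-allele counts. It is a simplified illustration: the function name is hypothetical, the dictionaries are assumed to be keyed by SNPID, the dosages are assumed to already count PGS_EFFECT_ALLELE (allele alignment is not shown), and the values in the usage note are made up.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages for one individual.

    weights: dict mapping SNPID -> PGS_WEIGHT.
    dosages: dict mapping SNPID -> number of copies (0 to 2) of
             PGS_EFFECT_ALLELE carried by the individual.
    Returns the score and the number of variants that contributed.
    """
    shared = weights.keys() & dosages.keys()
    score = sum(weights[snp] * dosages[snp] for snp in shared)
    return score, len(shared)
```

For instance, with toy weights `{"1:1000:A:G": 0.5, "2:2000:C:T": -0.25}` and dosages `{"1:1000:A:G": 2, "2:2000:C:T": 1}`, the function sums 0.5 × 2 and -0.25 × 1. Reporting the number of matched variants helps detect poor overlap between the weight file and the genotype data.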

A README file is available here.

Yengo et al. (2018). Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry.

The supplementary data can be downloaded here.
Summary statistics of genome-wide association of body mass index. Columns are chromosome (CHR), genomic position (POS) on human genome build 37 (hg19), SNP rs identifier (SNP), allele tested for association (Tested_Allele), alternative allele (Other_Allele), frequency of the tested allele in the Health and Retirement Study (Freq_Tested_Allele_in_HRS), estimated effect of the tested allele (BETA), estimated standard error of the effect size of the tested allele (SE), association p-value (P) and sample size (N).
Summary statistics of genome-wide association of height. Columns are chromosome (CHR), genomic position (POS) on human genome build 37 (hg19), SNP rs identifier (SNP), allele tested for association (Tested_Allele), alternative allele (Other_Allele), frequency of the tested allele in the Health and Retirement Study (Freq_Tested_Allele_in_HRS), estimated effect of the tested allele (BETA), estimated standard error of the effect size of the tested allele (SE), association p-value (P) and sample size (N).
Results from Summary-data-based Mendelian Randomisation (SMR) analyses aiming to prioritise genes whose local genetic control correlates with that of the focus traits in genome-wide association studies of height and body mass index (BMI; Yengo et al. 2018; HMG). These analyses prioritised 110 and 610 genes associated with BMI and height, respectively. Each file contains tables with 21 columns, as described here (http://cnsgenomics.com/software/smr/#SMR&HEIDIanalysis); the last column (“tissue”) indicates the tissue in which gene expression was measured. SMR analyses were performed using expression QTL (eQTL) data from the GTEx project.
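
For readers unfamiliar with SMR, the core single-variant test can be sketched as follows, using the approximate test statistic described in the SMR method paper (Zhu et al. 2016). This is an illustration only: the function and argument names are hypothetical, the HEIDI heterogeneity test is not covered, and the SMR software should be used for real analyses.

```python
import math


def smr_test(b_gwas, se_gwas, b_eqtl, se_eqtl):
    """Approximate SMR test for one gene using its top cis-eQTL.

    b_gwas, se_gwas: effect of the variant on the trait (GWAS).
    b_eqtl, se_eqtl: effect of the same variant on gene expression (eQTL).
    Returns the SMR effect estimate, the test statistic and its p-value.
    """
    if b_eqtl == 0:
        raise ValueError("eQTL effect must be non-zero")
    z_gwas = b_gwas / se_gwas
    z_eqtl = b_eqtl / se_eqtl
    # Estimated effect of gene expression on the trait.
    b_smr = b_gwas / b_eqtl
    # Approximately chi-square distributed with 1 degree of freedom under the null.
    t_smr = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    # Upper tail of a 1-df chi-square: P(|Z| > sqrt(t)).
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_smr, t_smr, p_smr
```

Because the statistic combines both z-scores, a gene is only prioritised when the variant is strongly associated with both the trait and the expression level.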
