GWAS data

Filenames: LASA_Affy, LASA_Illumina

Contact: Najada Stringa 

Genotyping array data (often referred to as GWAS data) are currently used to identify single nucleotide polymorphisms (SNPs) associated with various traits in Genome-Wide Association Studies (GWAS). SNP array data cover variants located in protein-coding regions as well as in non-coding regions of the DNA.

GWAS data can be used for association studies (genome-wide or candidate gene) and for analyses developed specifically for these data, such as polygenic risk scoring and the estimation of SNP-based heritability and genetic correlations. In LASA, genotyping data are available for the first, second and third cohorts.
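As an illustration of polygenic risk scoring, the sketch below computes a score as a weighted sum of effect-allele dosages. It is a minimal example under stated assumptions, not a LASA procedure: the SNP identifiers, weights and the function name `polygenic_score` are hypothetical, and in practice the weights would come from an external GWAS.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping SNP id -> effect-allele dosage (0 to 2), or None if missing
    weights: dict mapping SNP id -> per-allele effect size (hypothetical values here)
    Returns the score and the number of SNPs that contributed to it.
    """
    score = 0.0
    n_used = 0
    for snp, beta in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            # Skip SNPs that are missing for this individual.
            continue
        score += beta * dosage
        n_used += 1
    return score, n_used


# Hypothetical SNP ids and effect sizes, for illustration only.
weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}
person = {"rs_a": 2, "rs_b": 1, "rs_c": None}

score, n_used = polygenic_score(person, weights)
print(round(score, 4), n_used)
```

Returning the number of contributing SNPs alongside the score makes it easier to spot individuals whose score rests on few variants.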

Measurements in LASA 

Blood collection
Blood samples were drawn from respondents participating in the LASA medical interview in the C-cycle (1995-1996), 2B-cycle (2002-2003), G-cycle (2008-2009) and 3B-cycle (2012-2013). In the first and second cohorts, DNA was isolated from buffy coats (C- and 2B-cycle) or from full blood samples (G-cycle). For participants who had both full blood samples and buffy coats available, full blood samples were used to extract the DNA. In the third cohort, full blood samples drawn at baseline (3B-cycle) were used for DNA isolation. In all samples, DNA was extracted using standard procedures.

Measurement procedure & quality control (QC)
Genotyping for the first cohort was done using two arrays: the Axiom-NL Array (Affymetrix Inc, Santa Clara, CA, USA) at the Avera Institute for Human Genetics, Sioux Falls, SD, USA, and the Infinium Global Screening Array (GSA) (Illumina Inc, San Diego, CA, USA), as part of the EU GSA consortium at the Human Genomics Facility (HuGe-F), Department of Internal Medicine, Erasmus MC, Rotterdam, the Netherlands. At first, we were able to genotype 623 participants with the Axiom-NL Array. For this, we selected participants who had data from the C-cycle and D-cycle. Later, we were able to genotype the remaining persons with blood samples available using the GSA.

Genotyping for the second and third cohort was done using Infinium Global Screening Array-24-v1.0 (GSA) (Illumina Inc, San Diego, CA., USA), as part of the EU GSA consortium at Human Genomics Facility (HuGe-F), Department of Internal Medicine, Erasmus MC, Rotterdam, the Netherlands.

The Axiom-NL Array [1] targets around 610 000 SNPs. It includes SNPs commonly found on other genotyping platforms, as well as specific markers from previous GWAS results and SNPs associated with psychiatric disorders, fertility and twinning.

The GSA-24 v1.0 targets around 690 000 SNPs and includes common SNPs as well as other SNPs important in clinical research and precision medicine research.

Due to technical differences, quality control (QC) and imputation were done separately for each array. For both arrays, QC was performed using Ricopili (Rapid Imputation and Computational Pipeline for GWAS), an established tool developed by the Psychiatric Genomics Consortium [2, 3]. Samples with a sex mismatch (genetic sex does not match reported sex), duplicate samples, and samples with excess heterozygosity or a call rate < 0.98 were removed during QC. SNPs with a call rate < 0.98 or a minor allele frequency (MAF) < 0.01 were also excluded.
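The SNP-level thresholds above can be illustrated with a short sketch. It is not the Ricopili implementation. It only shows how a call rate and a MAF are computed from genotype calls and compared with the 0.98 and 0.01 cut-offs. The function names and the encoding (0, 1 or 2 copies of one allele, `None` for a missing call) are assumptions made for this example.

```python
def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from allele counts coded 0/1/2, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(freq, 1 - freq)


def snp_passes_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Keep a SNP only if it meets both thresholds quoted in the text."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Three made-up SNPs, each genotyped in 100 samples.
common = [0] * 60 + [1] * 30 + [2] * 10   # fully called, MAF 0.25
rare = [0] * 100                          # monomorphic, MAF 0
poorly_called = [1] * 95 + [None] * 5     # call rate 0.95

for name, snp in [("common", common), ("rare", rare), ("poorly_called", poorly_called)]:
    print(name, snp_passes_qc(snp))
```

The same `call_rate` function applies to the sample-level filter when it is given all genotype calls of one individual instead of all calls for one SNP.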

A summary of the total number of individuals and SNPs available per array can be found in Table 1.

Ancestry and relatedness
Principal components (PCs) were calculated for each array and plotted together with the 1000 Genomes dataset (see Fig. 1 and 4). Samples of non-European ancestry were identified and then removed, using the 1000 Genomes data as reference. Then, 10 PCs were calculated for each array (see Fig. 2 and 5). These PCs are available, and we recommend using them as covariates in all analyses to adjust for population stratification. To check whether the generated principal components properly correct for population stratification, we ran a GWAS on height (see Fig. 3 and 6).
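The text does not state how the height GWAS was evaluated. One common summary of residual stratification is the genomic inflation factor (lambda), the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The sketch below is an assumption-based illustration of that summary, not a description of the LASA check, and `genomic_inflation` is a hypothetical helper name.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided GWAS p-values in (0, 1]."""
    std_normal = NormalDist()
    # A two-sided p-value p corresponds to z = inv_cdf(p / 2); z squared is
    # the matching 1-df chi-square statistic.
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spread p-values mimic a well-calibrated GWAS: lambda close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
# Systematically smaller p-values mimic inflation: lambda above 1.
inflated_p = [p ** 2 for p in null_p]

lambda_null = genomic_inflation(null_p)
lambda_inflated = genomic_inflation(inflated_p)
print(round(lambda_null, 3), lambda_inflated > 1)
```

A value near 1 after including the PCs as covariates suggests that stratification is adequately controlled, while values clearly above 1 point to remaining confounding or a highly polygenic trait.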

The data were further checked for relatedness between participants, and a list of related individuals is available.

Both datasets were imputed using the Haplotype Reference Consortium (HRC) reference panel, version 1.1. Imputation was performed on the Michigan Imputation Server for the autosomal chromosomes (1-22).

Table 1: Number of samples and SNPs available per array

Array        Samples (SNPs)         Europeans (excl. related samples)
Affymetrix   623 (631 028 SNPs)     620 (600 950 SNPs)
Illumina     1880 (686 082 SNPs)    1791 (471 977 SNPs)






Variable information
The raw genotypes, QC-ed genotypes and imputed data files are available. Specific SNPs can be extracted upon request.

The data also include the 10 principal components per array and the list of related individuals described above.

Array        B    C     D    E    2B*   F    G     H    3B*   MB*   I
Affymetrix   -    Me+   -    -    -     -    Me+   -    -     -     -
Illumina     -    Me+   -    -    Me+   -    Me+   -    Me+   -     -

1 More information about the LASA data collection waves is available here.

 * 2B=baseline second cohort;
   3B=baseline third cohort;
   MB=migrants: baseline first cohort

Me=data collected in medical interview
+ = see Blood collection description

Previous use in LASA
Several publications are in preparation.


  1. Ehli, E.A., et al., A method to customize population-specific arrays for genome-wide association testing. European Journal of Human Genetics, 2017. 25(2): p. 267-270.
  2. Schizophrenia Working Group of the Psychiatric Genomics Consortium, Biological insights from 108 schizophrenia-associated genetic loci. Nature, 2014. 511(7510): p. 421-427.
  3. Wray, N.R., et al., Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet, 2018. 50(5): p. 668-681.