Wednesday, July 31, 2013

Extracting phenotype by genotype using GenABEL

GenABEL is an excellent R package for GWAS. It uses a special data structure to store data efficiently. Although this data structure is quite useful and saves a remarkable amount of time when running a GWAS, it does have some limitations. For example, in GWAS one often needs to know the phenotype distribution across the genotypes of some variant, but I couldn't find a straightforward way of looking at it (there may be a better way of doing it, but I couldn't find it).

To get phenotypic information across genotypes I used the following approach (assuming that the data is in an object called 'data'):

1. Extract phenotypic information
pheno <- phdata(data)
The returned object is a data frame, which can be confirmed with the class(pheno) command.

2. Extract SNP data
snps <- as.character(data[, c("SNP1", "SNP2", "SNP3")])
You can replace as.character with as.numeric in the line above if you want the genotype information in 0, 1, 2 format.
The returned object is a matrix with the subject IDs as row names. Thus we need to do two things with this matrix: first convert it into a data frame, and then turn the row names into an 'id' column.

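3. Convert the matrix 'snps' into a data frame with the row names as an additional column, then merge it with the phenotypes
# as.numeric() assumes the subject IDs are numeric; non-numeric IDs would become NA
snps.df <- data.frame(as.numeric(rownames(snps)), snps)
colnames(snps.df)[1] <- "id"   ### Change the column name to 'id'
# Join phenotypes and genotypes on the shared 'id' column
snp.data <- merge(pheno, snps.df, by = "id")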

Now you have a data frame with both the phenotype data and the SNP genotype data.
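
Once the merged data frame exists, the phenotype distribution across genotypes can be summarised with base R. The following is only a minimal sketch: it does not use GenABEL at all, the small 'snp.data' built here is a made-up stand-in for the real merged object, and the 'height' column and the genotype values are hypothetical.

```r
# Mock stand-in for the merged 'snp.data' data frame (all values are hypothetical)
snp.data <- data.frame(
  id     = 1:8,
  height = c(170, 165, 180, 175, 160, 172, 168, 178),  # hypothetical phenotype
  SNP1   = c("A/A", "A/G", "G/G", "A/G", "A/A", "A/G", "G/G", "A/A"),
  stringsAsFactors = FALSE
)

# Number of subjects carrying each genotype
geno.counts <- table(snp.data$SNP1)

# Mean and standard deviation of the phenotype within each genotype
geno.means <- tapply(snp.data$height, snp.data$SNP1, mean)
geno.sds   <- tapply(snp.data$height, snp.data$SNP1, sd)

print(geno.counts)
print(round(cbind(mean = geno.means, sd = geno.sds), 2))
```

For a quick visual check, boxplot(height ~ SNP1, data = snp.data) draws the same comparison as one box per genotype.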