A Piece of My Mind: December 2014

Sunday, December 28, 2014

Installing Packages in R

R has thousands of packages that extend its use to almost every area of research. You will probably never need most of them, but you will likely need several (perhaps 5 to 20), depending on what you plan to do. Most of these packages are available on the CRAN website.

http://www.cran.r-project.org/web/packages/available_packages_by_name.html

Bioconductor is another source of R packages, aimed at bioinformatics-related research. Its packages can be downloaded from its website.

http://www.bioconductor.org/packages/release/bioc/
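Bioconductor packages can also be installed from the R console. The sketch below assumes the BiocManager package, which is the installer Bioconductor currently recommends (it may differ from what was in use when this post was written). The wrapper name `install_from_bioconductor` is hypothetical, and "limma" is only an example package.

```r
# Hypothetical wrapper around BiocManager::install().
install_from_bioconductor <- function(pkgs) {
  # BiocManager itself lives on CRAN; install it first if it is missing.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# Example call (needs an internet connection, so it is left commented out):
# install_from_bioconductor("limma")
```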

To use a package, there are two steps:

First Step: Download and Install a Package – You can download a package to a local directory and install it from there using the drop-down menu, or you can install it directly from the CRAN repository. Depending on the R GUI that you use, the exact steps may differ slightly. Often, you first have to specify which mirror you want to use; choose a mirror that is geographically close to you for faster downloads. Then you can choose a package from the package list.

I use the RStudio GUI. In RStudio, click on the tab labeled ‘Packages’. If this tab is not visible, press Ctrl+7 and the tab will become visible (usually in the lower right quadrant of the window). From there, choose Install, then type in the name of the package (for multiple packages, enter the names separated by a space or a comma). Make sure that the ‘install dependencies’ box is checked and that the correct repository and installation location are selected. Then click Install. You can also run the installation command directly from the console; the command below will install the ggplot2 package:

install.packages("ggplot2")

Note that you need to install a package only once.
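Since installation is a one-time step, a script can check what is already installed and install only what is missing. The following is a minimal sketch of that idea, not the only way to do it. The helper names `missing_packages` and `install_if_missing` and the example package list are assumptions made for illustration; the repository URL is the general CRAN cloud mirror.

```r
# Hypothetical helper: which of the wanted packages are not installed yet?
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

# Hypothetical helper: install only the packages that are missing.
install_if_missing <- function(wanted) {
  to_install <- missing_packages(wanted)
  if (length(to_install) > 0) {
    # dependencies = TRUE also installs the packages these depend on
    # (and the ones they suggest).
    install.packages(to_install,
                     dependencies = TRUE,
                     repos = "https://cloud.r-project.org")
  }
  invisible(to_install)
}

# Example call (needs an internet connection, so it is left commented out):
# install_if_missing(c("ggplot2", "dplyr"))
```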

Second Step: Loading a Package – Installing a package makes it available for later use, but packages are not automatically loaded during a session. Once you have installed a package, you need to load it in each session in which you use it. To load a package, use the function ‘library()’. The following command will load the package ggplot2:

library(ggplot2)

Some may like to use the ‘require()’ function instead of ‘library()’. However, see this post for the differences between the two and why one should prefer ‘library()’ over ‘require()’.
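One practical difference is how the two functions behave when a package is not installed: library() stops with an error, while require() only warns and returns FALSE, so a script may carry on and fail later in a less obvious place. The sketch below shows this with a made-up package name that is assumed not to exist on your system.

```r
pkg <- "aPackageThatIsNotInstalled"   # made-up name, assumed not to be installed

# require() returns FALSE (and issues a warning) when the package is missing.
require_result <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

# library() stops with an error instead; tryCatch() captures it here.
library_result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)

print(require_result)   # FALSE
print(library_result)   # "error"
```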

Personally, I try to load all needed packages at the beginning of a script. However, this strategy can cause trouble if two packages contain functions with the same name: the package loaded later masks the function from the one loaded earlier, and you may see a message such as “The following objects are masked from ‘package:xyz’:”.
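When names clash, the `::` operator lets you say exactly which package's function you mean. The sketch below imitates masking with base R only: it defines a function called `sd` in the workspace, which hides `stats::sd`, and then calls the original explicitly. The replacement function is purely illustrative; masking between two loaded packages is resolved in the same way.

```r
# Defining our own sd() masks the sd() that comes from the stats package.
sd <- function(x) "my own sd"

masked_result <- sd(c(1, 2, 3))            # calls the masking definition
explicit_result <- stats::sd(c(1, 2, 3))   # calls the stats version: 1

print(masked_result)
print(explicit_result)

rm(sd)   # remove our definition so that sd() refers to stats::sd again
```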

Some other useful commands to know

.libPaths() # – will give you the location of the library for packages
library() # – will show you all installed packages
search() # – will show you currently loaded packages
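A few related calls can be used to inspect a single package. The short sketch below uses the base package as the example, simply because it is present in every R installation; substitute any package name you have installed.

```r
pkg_version <- packageVersion("base")          # version of an installed package
pkg_location <- find.package("base")           # directory where it is installed
is_attached <- "package:base" %in% search()    # TRUE if it is on the search path

print(pkg_version)
print(pkg_location)
print(is_attached)
```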

Saturday, December 27, 2014

Some Thoughts About Clinical Research

Clinical research requires a wide range of skills. These skills include the ability to work with a wide range of people, to lead teams of people from vastly different backgrounds, to design appropriate studies, to ask the right questions, to understand research methods specific to the study question, to develop in-depth content expertise in the area of research focus, to get funding for research projects, to present study results at national meetings, to write manuscripts for publication in peer-reviewed journals, and so on.

A fundamental skill for a researcher is the ability to knit together the conceptual framework for a study (theory) with appropriate measurement, with the result either supporting or opposing the conceptual framework. The theory should be based on the most current state of knowledge, the data collected should have the ability to test the theory, the statistical models should reflect both the conceptual structure hypothesized to have given rise to the data and the nature of the collected data, and the inferences should be based on the data and the tested statistical models. This process is not linear; rather, it is a loop in which theoretical aspects inform the collection of data, and the results of the data analyses help refine the theory, which generates more testable hypotheses, additional data collection, and so on.

Most research is probabilistic, as opposed to deterministic. In other words, the results we obtain are not always certain; we have to include uncertainty in our analyses and expect some uncertainty in our results and inferences. Thus, we have to accept that our results are unlikely to be laws governing the system we plan to study and more likely to be an approximation of what we expect to find in the real world, with some uncertainty. There are many reasons for, and sources of, this uncertainty, some of which can be addressed while others may remain despite our best attempts.

A researcher should determine whether the aim of the research is to draw inferences at the population level or at the level of individual units (often a patient in clinical research). The study design, data collection and analysis, and the inferences may be quite different depending on which is the object of interest. While we study individuals, our results usually address inferences at the population level. In general, it is much easier to predict a response at the population level; for example, that on average, individuals with a higher body mass index (say, BMI >30) will have a blood glucose level above (say) 126 mg/dL. However, it is much more difficult to predict with certainty how likely a particular individual with a BMI >30 is to have a blood glucose level above 126 mg/dL. For a predictor to work well at the individual level, among other things, the effect size needs to be quite large.
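A small simulation can make the gap between population-level and individual-level prediction concrete. Everything in the sketch below is invented for illustration: the distribution of BMI, the strength of its relationship with glucose, and the amount of noise are assumptions, not clinical estimates.

```r
set.seed(1)
n <- 10000

# Simulated (not real) data: glucose rises with BMI, plus a lot of noise.
bmi <- rnorm(n, mean = 28, sd = 5)
glucose <- 110 + 4 * (bmi - 28) + rnorm(n, mean = 0, sd = 30)

high_bmi <- bmi > 30

# Population-level statement: the average glucose in the high-BMI group.
mean_glucose_high <- mean(glucose[high_bmi])

# Individual-level statement: the share of high-BMI individuals above 126 mg/dL.
share_above_126 <- mean(glucose[high_bmi] > 126)

print(mean_glucose_high)
print(share_above_126)
```

With these made-up settings, the group average lies above 126 mg/dL, yet a substantial share of the individuals in that group fall below it. Knowing that someone has a BMI >30 therefore says much less about that person than about the group.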

Another important concept is that of causality. While we often have a conceptual model in our mind that A is caused by B, it may be quite difficult to prove except perhaps in a clinical trial setting. There are several factors that can increase the likelihood that the direction of cause and effect in our conceptual model is correct, such as temporality and biological plausibility. However, often there remains a possibility that B is in fact caused by A or that some other unknown (or unmeasured) factor, C, may be responsible for both A and B. Hence, we often claim an association or correlation between A and B and not causality.
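The role of an unmeasured factor C can be shown with another small simulation. In the sketch below, A and B have no effect on each other at all; both depend only on C. The variable names and effect sizes are assumptions chosen for illustration.

```r
set.seed(2)
n <- 5000

c_factor <- rnorm(n)        # the common cause C
a <- c_factor + rnorm(n)    # A depends on C only
b <- c_factor + rnorm(n)    # B depends on C only

# A and B are clearly correlated even though neither causes the other.
cor_ab <- cor(a, b)

# Regression slope of B on A, without and with adjustment for C.
slope_unadjusted <- coef(lm(b ~ a))[["a"]]
slope_adjusted <- coef(lm(b ~ a + c_factor))[["a"]]

print(cor_ab)
print(slope_unadjusted)
print(slope_adjusted)
```

The apparent association between A and B largely disappears once C is included in the model. In real studies C is often unknown or unmeasured, which is why such an adjustment may not be possible.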

Friday, December 19, 2014

Starting to work with R

There may be some who have just started working with R after someone convinced them that R is the way to go. For those souls, it may be difficult to get started quickly. Below are the steps to start working with R.

Step 1: Go to the CRAN webpage and download the version of R that is appropriate for your operating system - http://cran.r-project.org/

Step 2: Install R

Step 3: Download a GUI for R – While R comes with a GUI, other GUIs are much better. My favorite is RStudio. To download RStudio, go to the RStudio website and download the version that is appropriate for your operating system - http://www.rstudio.com/products/rstudio/download/

Step 4: Install RStudio (or a GUI of your choice).

Step 5: Start using RStudio (or a GUI of your choice).
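To confirm that everything is installed and working, you can type a few lines into the console. The numbers below are arbitrary; this is only a quick check.

```r
R.version.string        # shows which version of R is running

x <- c(2, 4, 6, 8)      # a small numeric vector
mean_x <- mean(x)       # its average: 5
print(mean_x)
```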

That’s it – Good luck!