Biostatistics Series Module 1: Basics of Biostatistics

Indian J Dermatol. 2016 Jan-Feb;61(1):10-20. doi: 10.4103/0019-5154.173988.

Abstract

Although application of statistical methods to biomedical research began only some 150 years ago, statistics is now an integral part of medical research. A knowledge of statistics is also becoming mandatory to understand most medical literature. Data constitute the raw material for statistical work. They are records of measurements, observations, or simply counts. A variable refers to a particular characteristic on which a set of data is recorded. Data are thus the values of a variable. It is important to understand the different types of data and how one type can be converted into another. Biostatistics begins with descriptive statistics, which involves summarizing a collection of data from a sample or population. Categorical data are described in terms of percentages or proportions. With numerical data, individual observations within a sample or population tend to cluster about a central location, with more extreme observations being less frequent. The central location around which observations cluster is summarized by measures of central tendency, while the spread is described by measures of dispersion. The confidence interval (CI) is an increasingly important measure of precision. When we observe only samples, we cannot determine the true population parameters directly. We can, however, obtain a standard error and use it to define a range in which the true population value is likely to lie with a certain acceptable level of uncertainty. This range is the CI, while its two terminal values are the confidence limits. Conventionally, the 95% CI is used. Patterns in data sets, or data distributions, are an important, albeit not so obvious, component of descriptive statistics. The most common distribution is the normal distribution, which is depicted as the well-known symmetrical bell-shaped Gaussian curve. Familiarity with other distributions such as the binomial and Poisson distributions is also helpful. Various graphs and plots have been devised to summarize data and trends visually. Some plots, such as the box-and-whiskers plot and the stem-and-leaf plot, are used less often but provide useful summaries in select situations.
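
The relationship between the sample mean, the standard error, and the 95% CI described above can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions: the sample values are hypothetical, the function name `mean_confidence_interval` is introduced here and is not from the article, and the interval uses the normal approximation (mean ± z × SE, with z ≈ 1.96 for 95%). For small samples, a multiplier from the t distribution would usually be more appropriate.

```python
import math
import statistics


def mean_confidence_interval(values, level=0.95):
    """Return (mean, standard error, (lower limit, upper limit)).

    Assumption: the interval is mean +/- z * SE, where z comes from the
    standard normal distribution (a large-sample approximation).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")

    mean = statistics.mean(values)    # measure of central tendency
    sd = statistics.stdev(values)     # sample standard deviation (dispersion)
    se = sd / math.sqrt(n)            # standard error of the mean

    # Two-sided critical value, e.g. about 1.96 when level = 0.95.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)

    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical measurements, used purely for illustration.
sample = [4.0, 5.0, 6.0, 7.0, 8.0]
mean, se, (lower, upper) = mean_confidence_interval(sample)
print(f"mean = {mean:.2f}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The two values `lower` and `upper` correspond to the confidence limits. Raising `level` (for example to 0.99) widens the interval, and a larger sample narrows it because the standard error shrinks with the square root of the sample size.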

Keywords: Boxplot; confidence interval; data; descriptive statistics; measures of central tendency; measures of dispersion; normal distribution; stem-and-leaf plot; variable.