Radiation therapy (RT) is a frontline approach to treating cancer. Although the radiation dose is targeted at the tumor, some dose inevitably spills into nearby normal organs and causes complications, a phenomenon known as radiotherapy toxicity. To predict toxicity outcomes, statistical models known as Normal Tissue Complication Probability (NTCP) models can be built from dosimetric variables that describe the dose received by the normal organ at risk (OAR). Because the dosimetric variables are high-dimensional and clinical sample sizes are limited, statistical models with variable selection techniques are viable choices. However, existing variable selection techniques are purely data-driven and do not integrate medical domain knowledge into the model formulation. We propose a knowledge-constrained generalized linear model (KC-GLM). KC-GLM includes a new mathematical formulation that translates three pieces of domain knowledge into non-negativity, monotonicity, and adjacent similarity constraints on the model coefficients. We further propose an equivalent transformation of the KC-GLM formulation, which makes it possible to solve for the model coefficients using existing optimization solvers. We compare KC-GLM with several well-known variable selection techniques in a simulation study and on two real datasets, one of prostate cancer and one of lung cancer. These experiments show that, relative to the compared techniques, KC-GLM selects variables with better interpretability, avoids producing counter-intuitive and misleading results, and achieves better prediction accuracy.
Keywords: Statistical modeling; generalized linear models; radiation toxicity prediction; variable selection techniques.