The goal of this study is to derive a methodology for modeling the biological activity of non-nucleoside HIV Reverse Transcriptase (RT) inhibitors. The difficulties encountered during the modeling attempts are discussed, together with their origins and solutions. The selected multivariate techniques (robust principal component analysis, partial least squares, robust partial least squares, and uninformative variable elimination partial least squares) make it possible to explore and model the contaminated data satisfactorily. These techniques are shown to be versatile and valuable tools for modeling and exploring biochemical data.