Research Group in Biostatistics (RGB)
Peter Song of the University of Michigan will give a talk on high-dimensional classification analysis on Friday, April 20, 2012. A summary of the talk follows:
Recent advances in high-throughput technology have created opportunities to conduct innovative clinical and epidemiological studies using millions of proteomic or genetic biomarkers. A primary goal is to derive classifiers that partition patients into clinical groups. Before any existing classifier is applied in an actual study, the first design task is to determine a sample size adequate to reach the desired classification precision. In this talk, we will present a systematic analysis of sample size calculation in high-dimensional classification analysis. Unlike designs based on classical hypothesis testing, our design for classification uses the probability of correct classification (PCC) as the criterion for calibrating the sample size. Using a classifier constructed by the higher criticism threshold (HCT) approach, we develop a novel procedure that accurately determines the sample size in large-p, small-n settings. We further extend this procedure so that studies can incorporate patients’ clinical characteristics. We derive a theoretical bound on the maximum improvement in classification precision achievable when both molecular and clinical predictors are used. We illustrate our new methods with a study design that uses proteomic markers to classify post-kidney-transplantation patients into stable and rejection classes. This is joint work with Meihua Wu and Brisa Sanchez.
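The abstract does not spell out the speakers' procedure, so the following is only a minimal Monte Carlo sketch of the general idea: estimate the PCC of an HCT-thresholded linear classifier for several candidate sample sizes and pick the smallest one that reaches a target PCC. It assumes a hypothetical two-class Gaussian model (unit variance, class means +mu and -mu on k of p features, equal class sizes, known grand mean of zero), uses higher criticism thresholding in the style of Donoho and Jin with an assumed cap alpha0 = 0.1, and relies on the fact that for this model a fixed weight vector w has exact PCC Phi(w·mu / ||w||). The function names `hc_threshold`, `simulate_pcc` and `smallest_n` and all parameter values are illustrative, not the speakers' analytic method.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hc_threshold(z, alpha0=0.1):
    """Higher criticism threshold for feature z-scores (Donoho-Jin style sketch).

    Returns a |z| cutoff; features with |z| >= cutoff are kept.
    Returns math.inf if no admissible index exists (no feature selected).
    """
    p = len(z)
    abs_sorted = sorted((abs(v) for v in z), reverse=True)  # largest |z| first
    best_hc, best_t = -math.inf, math.inf
    # The i-th largest |z| gives the i-th smallest two-sided p-value.
    for i, a in enumerate(abs_sorted[: max(1, int(alpha0 * p))], start=1):
        pval = math.erfc(a / math.sqrt(2.0))  # two-sided normal p-value
        if pval <= 1.0 / p or pval >= 1.0:
            continue  # skip the unstable extreme tail (and |z| == 0)
        hc = math.sqrt(p) * (i / p - pval) / math.sqrt(pval * (1.0 - pval))
        if hc > best_hc:
            best_hc, best_t = hc, a
    return best_t


def simulate_pcc(n_per_class, p, k, mu, reps=200, seed=0):
    """Monte Carlo average PCC of the HCT classifier under the hypothetical model."""
    rng = random.Random(seed)
    # With known unit variance, the two-sample z-score of feature j is
    # N(mu_j * sqrt(2 * n_per_class), 1), so z can be drawn directly.
    shift = mu * math.sqrt(2.0 * n_per_class)
    total = 0.0
    for _ in range(reps):
        z = [(shift if j < k else 0.0) + rng.gauss(0.0, 1.0) for j in range(p)]
        t = hc_threshold(z)
        # Weights: sign of z for selected features, zero otherwise.
        w = [math.copysign(1.0, v) if abs(v) >= t else 0.0 for v in z]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            total += 0.5  # nothing selected: equivalent to a coin flip
            continue
        # Rule: predict class 1 iff w.x > 0; exact PCC given w is Phi(w.mu / ||w||).
        total += norm_cdf(mu * sum(w[:k]) / norm)
    return total / reps


def smallest_n(target, p, k, mu, n_grid=range(5, 101, 5), **kw):
    """Smallest per-class n on the grid whose simulated PCC reaches target."""
    for n in n_grid:
        if simulate_pcc(n, p, k, mu, **kw) >= target:
            return n
    return None  # target not reached on this grid
```

For example, `smallest_n(0.85, p=1000, k=20, mu=0.4, reps=100)` scans per-class sample sizes 5, 10, ..., 100 under the assumed model. Simulating the z-scores directly, rather than full training matrices, keeps the sketch fast; a real design would also need to handle unknown variances, unequal class sizes, and the clinical covariates the talk mentions.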