UWEE Tech Report Series

Filtering tandem mass spectra for quality


Sergey Feldman, Barbara Frewen, Michael J. MacCoss, and Maya R. Gupta

random forests, mass spectrometry, feature selection


Accurate protein and peptide identification by database search depends on the quality of the tandem mass spectra. Large numbers of low-quality spectra consume valuable computing resources and can decrease the overall accuracy of peptide and protein identifications. We present a fast spectrum quality filter called French Press that removes low-quality spectra without database searching. The filter's speed is the result of a tuned random forest classifier and a greedily optimized subset of classification features, culled from features appearing in prior research on spectrum filtering and modeling. Results on diverse data sets of mass spectrometer runs show that the filter can remove roughly $50\%$ of low-quality spectra while retaining $99\%$ of identifiable spectra.
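
As an illustration of what a greedily optimized feature subset can look like, the sketch below implements generic greedy forward selection in Python. It is not the procedure used by French Press, whose details are not given in this abstract: the choice of forward selection, the function name greedy_forward_selection, the min_gain stopping rule, and the toy scoring function are all assumptions made for the example. In a setting like the one described, the score would plausibly be the cross-validated accuracy of a random forest trained on the candidate features.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    min_gain: float = 0.0,
) -> Tuple[List[int], float]:
    """Grow a feature subset one feature at a time (hypothetical helper).

    `score` maps a list of feature indices to a quality measure where
    higher is better. Selection stops once no remaining feature improves
    the score by more than `min_gain`.
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Score every remaining feature when added to the current subset.
        candidate_scores = {f: score(selected + [f]) for f in sorted(remaining)}
        best_feature = max(candidate_scores, key=candidate_scores.get)
        if candidate_scores[best_feature] - best_score <= min_gain:
            break  # no candidate gives a sufficient improvement
        selected.append(best_feature)
        best_score = candidate_scores[best_feature]
        remaining.remove(best_feature)
    return selected, best_score


def toy_score(subset: Sequence[int]) -> float:
    """Made-up score: features 0 and 2 help, and every feature has a small cost."""
    useful = {0: 0.5, 2: 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


# Four candidate features; the search keeps features 0 and 2 and then stops.
selected, best = greedy_forward_selection(4, toy_score)
print(selected, best)
```

The per-feature cost in the toy score stands in for the speed concern raised in the abstract: a feature is kept only if its benefit outweighs the cost of computing it.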
