Mission
Translating Cancer Data into Biological and Clinical Insights
We are an interdisciplinary research lab focusing on computational and translational oncology. Our team uses high-throughput sequencing, mass spectrometry, and informatics to advance cancer research and treatment. Our central aim is simple: to unlock the potential hidden within large cancer omics datasets. Our approach is data-driven, and our ultimate goal is to accelerate progress in the fight against cancer.
We explore three key domains that contribute to our central mission: computational proteomics, cancer proteogenomics, and data democratization.
Our work in these three domains has yielded data-driven hypotheses that shed light on new biological mechanisms, biomarkers, and therapeutic targets. These insights are especially relevant to targeted therapies and immunotherapies. Through close partnerships with cancer biologists and clinicians, we aim to translate these discoveries into tangible benefits for cancer patients.
Computational Proteomics
Proteins are the fundamental building blocks of cells and the primary targets for therapeutic intervention. Mass spectrometry (MS)-based shotgun proteomics enables comprehensive identification and quantification of proteins and their modifications in biological samples. However, the analysis of shotgun proteomics data poses a range of computational challenges.
Peptide identification is a pivotal step in the analysis of MS proteomics data. To improve the sensitivity of peptide identification, we have developed deep learning-based algorithms 1-2. These algorithms are tailored for immunopeptidomic and post-translational modification (PTM) profiling studies, where peptide identification is most challenging.
To identify novel disease-specific peptides for potential use as biomarkers and therapeutic targets, we have pioneered a customized protein database approach 3-4. By integrating this approach with immunopeptidomics or computational human leukocyte antigen (HLA) binding prediction in a computational workflow named NeoFlow, we streamline proteogenomics-based neoantigen discovery 5.
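The idea behind a customized protein database can be illustrated with a minimal sketch. It assumes a single amino-acid substitution and a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages). The function names, the toy sequence, and the length cutoff are hypothetical and chosen for illustration only; this is not the published implementation.

```python
import re

def apply_substitution(protein, position, ref, alt):
    """Return the protein with one amino-acid substitution (1-based position)."""
    if protein[position - 1] != ref:
        raise ValueError("reference residue does not match the sequence")
    return protein[:position - 1] + alt + protein[position:]

def tryptic_peptides(protein):
    """Simplified trypsin digestion: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for match in re.finditer(r"[KR](?!P)", protein):
        peptides.append(protein[start:match.end()])
        start = match.end()
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides

def variant_peptides(protein, position, ref, alt, min_len=6):
    """Peptides produced by the variant protein but not by the reference."""
    reference = set(tryptic_peptides(protein))
    mutated = apply_substitution(protein, position, ref, alt)
    return [p for p in tryptic_peptides(mutated)
            if len(p) >= min_len and p not in reference]

# Toy example (hypothetical sequence and variant)
toy_protein = "MKTAYIAKQRQISFVKSHFSR"
novel = variant_peptides(toy_protein, position=5, ref="Y", alt="C")
```

Comparing against the reference digest, rather than only checking whether a peptide spans the variant position, also captures substitutions that create or remove a cleavage site.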
To assemble identified peptides into proteins, we have introduced a bipartite graph model that represents the intricate relationships between peptides and proteins 6. Building on this model, we have created SEPepQuant 7, a tool designed to improve isoform characterization. It enables the detection of protein isoform regulation, which plays important roles in both normal and disease processes.
To facilitate the interpretation of genes and proteins identified from proteomics and other omics studies, we integrate these findings with existing knowledge about pathways and biological networks to gain a systematic understanding 8-9. Because knowledge at the level of PTM sites is often poorly curated, we leverage recent advances in deep learning-based natural language processing 10. This approach allows us to extract information on PTM sites from the published literature, which helps in interpreting findings from PTM studies.
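A peptide-protein bipartite graph can be sketched in a few lines. The example below is a generic illustration, not the published model: it assumes peptide-to-protein mappings are already available, merges proteins that share exactly the same peptides, and then uses a greedy set cover (a simplification chosen here for brevity) to pick a small set of protein groups that explains all peptides. All names and data are hypothetical.

```python
from collections import defaultdict

def build_bipartite(peptide_to_proteins):
    """Invert peptide -> proteins edges into protein -> set of peptides."""
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)
    return dict(protein_to_peptides)

def group_indistinguishable(protein_to_peptides):
    """Merge proteins connected to exactly the same set of peptides."""
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return {tuple(sorted(members)): set(peptides)
            for peptides, members in groups.items()}

def minimal_protein_list(groups):
    """Greedy set cover: pick the group explaining the most unexplained peptides."""
    unexplained = set().union(*groups.values()) if groups else set()
    selected = []
    while unexplained:
        best = max(sorted(groups), key=lambda g: len(groups[g] & unexplained))
        selected.append(best)
        unexplained -= groups[best]
    return selected

# Toy identifications (hypothetical peptides and protein accessions)
edges = {
    "PEPA": ["P1", "P2"],
    "PEPB": ["P1", "P2"],
    "PEPC": ["P3"],
    "PEPD": ["P3", "P4"],
}
protein_groups = group_indistinguishable(build_bipartite(edges))
reported = minimal_protein_list(protein_groups)
```

In this toy case P1 and P2 cannot be distinguished by their peptides and are reported as one group, while P4 is not needed because its only peptide is already explained by P3.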
References
Cancer Proteogenomics
Cancer is a disease of genetic aberrations, but many processes downstream of the genome may influence cancer phenotypes. Cancer proteogenomics integrates next-generation sequencing-based genomics and transcriptomics with MS-based proteomics to gain a comprehensive understanding of cancer, with the ultimate goal of improving cancer diagnosis and treatment 1.
Our work in this field started with a colon cancer study published in 2014 2, which introduced the concept of cancer proteogenomics. Since then, this approach has been applied to over a dozen cancer types, with more on the horizon. Our team has contributed to more than ten of these studies and has led investigations of colon cancer, uterine cancer, breast cancer, head and neck cancer, pancreatic cancer, and lung cancer 3-9.
Together, these studies have demonstrated that integrated proteogenomic analysis provides functional context to interpret genomic abnormalities, and that proteogenomics holds great potential to enable new advances in cancer biology, diagnostics, and therapeutics.
References
Data Democratization
Thousands of proteomics datasets with billions of MS spectra have been deposited into public data repositories. However, their use is largely restricted to computational proteomics researchers because MS data are difficult to understand, retrieve, analyze, and interpret.
Inspired by the BLAST algorithm, we have developed PepQuery 1-2, a peptide-centric algorithm that enables users to query a peptide sequence of interest, such as a mutant peptide, against a collection of MS/MS spectra to identify statistically significant peptide-spectrum matches. PepQuery has a wide range of applications, such as detecting proteomic evidence for genomically predicted novel peptides, validating novel or known peptides identified using traditional spectrum-centric database searching, prioritizing tumor-specific antigens, identifying missing proteins, and selecting proteotypic peptides for targeted proteomics experiments 2.
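A peptide-centric query can be pictured as starting from the peptide rather than from the spectra. The sketch below shows only an assumed first step, retrieving candidate spectra whose precursor mass matches the query peptide within a tolerance; it does not reproduce PepQuery's scoring or statistical validation. The spectrum records, field names, and 10 ppm tolerance are hypothetical, and modifications are ignored.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def precursor_neutral_mass(mz, charge):
    """Convert an observed precursor m/z and charge to a neutral mass."""
    return (mz - PROTON) * charge

def candidate_spectra(sequence, spectra, tol_ppm=10.0):
    """Return spectra whose precursor mass is within tol_ppm of the peptide mass."""
    target = peptide_mass(sequence)
    hits = []
    for spectrum in spectra:
        observed = precursor_neutral_mass(spectrum["precursor_mz"], spectrum["charge"])
        if abs(observed - target) / target * 1e6 <= tol_ppm:
            hits.append(spectrum)
    return hits

# Toy spectrum index (hypothetical records)
spectra = [
    {"id": "s1", "precursor_mz": 400.6872, "charge": 2},
    {"id": "s2", "precursor_mz": 401.2000, "charge": 2},
]
hits = candidate_spectra("PEPTIDE", spectra)
```

In a full peptide-centric search, each candidate spectrum would then be scored against the query peptide and the match assessed for statistical significance, steps this sketch deliberately leaves out.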
In addition to promoting the reuse of public MS data, we also make processed cancer multi-omics data from large consortium studies readily available and useful to the broad research community 3. The LinkedOmics web portal 4 provides a unique platform for biologists and clinicians to access and analyze the vast amount of cancer multi-omics data generated by TCGA and CPTAC.
LinkedOmics allows users to analyze and visualize associations between billions of molecular and clinical feature pairs for each tumor cohort, to compare the association results across omics modalities and cancer types, and to interpret association results using WebGestalt, a user-friendly pathway and network analysis tool that we developed. LinkedOmicsKB 5 further streamlines pan-cancer and multi-omic analysis by introducing innovative visualization techniques.
Collectively, these tools give scientists easy access to complex omics data through user-friendly web interfaces, broadening the potential impact of these valuable cancer datasets.
References