
I am the principal investigator (PI) of the Big Data Biology Lab at the Centre for Microbiome Research at the Queensland University of Technology (Brisbane, Australia), where we study the global microbiome.
Previously, my lab was hosted at Fudan University in Shanghai (2018–2023).
Before becoming a group leader, I worked at the European Molecular Biology Laboratory (EMBL) in Peer Bork's group. I have a PhD from Carnegie Mellon University (2011), where I worked on bioimage informatics for subcellular location analysis with Bob Murphy.
May 18, 2026 Talk at the 55th Annual Meeting of SBBq in Águas de Lindóia, SP, Brazil. "Big data and small genes. The small proteins of the global microbiome"
Email me if you want to set up meetings at any of these opportunities (or to invite me for other opportunities).
If you want to chat with me about science and such, you can use my cal.com link.
More technical version: I am interested in microbiomes. I wish to answer basic questions on what determines the structure of a microbial community in a given environment and what are the differences/similarities between different environments. Towards this, I pursue both method development (e.g. SemiBin or NGLess) and biologically-driven projects. Solving problems related to small proteins (those with fewer than 100 amino acids) is a particular interest of mine.
Not-so-technical version: Microbes are all around us (including inside of us), forming what we call a microbiome. With modern technology, it is possible to sequence the DNA of all the microbes in a sample (be it from a soil sample or from a human gut sample). This allows us to answer questions such as: what are the microbes present in a given sample? How do they differ between samples? What are the functions of these microbes?
However, to achieve this, we need to develop computational methods to analyse these data. My work is focused on both developing these methods and applying them to answer biological questions.
Here is what o3 had to say about my research interests:
I work at the nexus of microbiology, big data, and machine learning: my group mines massively scaled metagenomic datasets—tens of thousands of environmental and host-associated samples—to build global gene catalogs such as GMGC v1, capturing both canonical proteins and the overlooked “small gene” universe of short open-reading frames. Leveraging these resources, we uncover new biology, most recently assembling AMPSphere, the world’s largest collection of candidate antimicrobial peptides, and experimentally validating their therapeutic promise. In parallel, we create open-source, high-performance tools like SemiBin and SemiBin 2, which use self-supervised neural networks to reconstruct high-quality metagenome-assembled genomes from short- and long-read data with minimal computational cost. Together, these efforts aim to chart the diversity, ecology, and functional capacity of the global microbiome—and translate that knowledge into novel antibiotics and other bioactive molecules.
Recent(ish) publications (since 2024; see full list):
16. Long-read metagenomic sequencing reveals novel lineages and functional diversity in urban soil microbiome by Yiqian Duan, Anna Cuscó, Yaozhong Zhang, ..., Gaofei Jiang, Xing-Ming Zhao, Luis Pedro Coelho in bioRxiv (PREPRINT) (2026).
15. A gut microbiome-kidney-heart axis predictive of future cardiovascular diseases by Kanta Chechi, Rima Chakaroun, Antonis Myridakis, ..., Luis Pedro Coelho, ..., S. Dusko Ehrlich, Karine Clément, Marc-Emmanuel Dumas in Nature Communications (2026).
14. proGenomes4: providing 2 million accurately and consistently annotated high-quality prokaryotic genomes by Anthony Fullam, Ivica Letunic, Oleksandr M Maistrenko, ..., Luis Pedro Coelho, ..., Thomas S B Schmidt, Peer Bork, Daniel R Mende in Nucleic Acids Research (2025).
13. Capturing global pet dog gut microbial diversity and hundreds of near-finished bacterial genomes by using long-read metagenomics in a Shanghai cohort by Anna Cuscó, Yiqian Duan, Fernando Gil, ..., Ulrike Löber, Xing-Ming Zhao, Luis Pedro Coelho in bioRxiv (PREPRINT) (2025).
12. AEMB: a computationally efficient abundance estimation method for metagenomic binning by Shaojun Pan, Ivan Tolstoganov, Kristoffer Sahlin, Marcel Martin, Xing-Ming Zhao, Luis Pedro Coelho in bioRxiv (PREPRINT) (2025).
11. Persistence of High-Risk Antimicrobial Resistance Genes in Extracellular DNA Along an Urban Wastewater-River Continuum by John P. Makumbi, Samuel K. Leareng, Oliver K. Bezuidt, Luis Pedro Coelho, Thulani P. Makhalanyane in Sneak Peek (SSRN Preprints) (2025).
10. A census of hidden and discoverable microbial diversity beyond genome-centric approaches by Vishnu Prasoodanan P K, Oleksandr M Maistrenko, Anthony Fullam, ..., Luis Pedro Coelho, ..., Anja Spang, Peer Bork, Thomas S B Schmidt in bioRxiv (PREPRINT) (2025).
9. AI-Driven Antimicrobial Peptide Discovery: Mining and Generation by Paulina Szymczak, Wojciech Zarzecki, Jiejing Wang, ..., Luis Pedro Coelho, Cesar de la Fuente-Nunez, Ewa Szczurek in Accounts of Chemical Research (2025).
8. argNorm: normalization of antibiotic resistance gene annotations to the Antibiotic Resistance Ontology (ARO) by Svetlana Ugarcina Perovic, Vedanth Ramji, Hui Chong, Yiqian Duan, Finlay Maguire, Luis Pedro Coelho in Bioinformatics (2025).
7. Quest for Orthologs in the Era of Biodiversity Genomics by Felix Langschied, Nicola Bordin, Salvatore Cosentino, ..., Luis Pedro Coelho, ..., Paul D Thomas, Christophe Dessimoz, Ingo Ebersberger in Genome Biology and Evolution (2024).
6. A catalog of small proteins from the global microbiome by Yiqian Duan, Célio Dias Santos-Júnior, Thomas Sebastian Schmidt, ..., Xing-Ming Zhao, Peer Bork, Luis Pedro Coelho in Nature Communications (2024).
5. Discovery of antimicrobial peptides in the global microbiome with machine learning by Célio Dias Santos-Júnior, Marcelo D.T. Torres, Yiqian Duan, ..., Jaime Huerta-Cepas, Cesar de la Fuente-Nunez, Luis Pedro Coelho in Cell (2024).
4. For long-term sustainable software in bioinformatics by Luis Pedro Coelho in PLOS Computational Biology (2024).
3. Challenges in computational discovery of bioactive peptides in ’omics data by Luis Pedro Coelho, Célio Dias Santos‐Júnior, Cesar de la Fuente‐Nunez in PROTEOMICS (2024).
2. A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer by Marija Dmitrijeva, Janko Tackmann, João Frederico Matias Rodrigues, Jaime Huerta-Cepas, Luis Pedro Coelho, Christian von Mering in Nature Ecology & Evolution (2024).
1. Ubiquity of inverted ’gelatinous’ ecosystem pyramids in the global ocean by Lombard Fabien, Guidi Lionel, Manoela C. Brandão, ..., Luis Pedro Coelho, ..., Karsenti Eric, Gorsky Gabriel, Tara Oceans Coordinators in bioRxiv (PREPRINT) (2024).
All publications... (Google Scholar profile)
Feb 25 Talk at BRISJAMS in Brisbane, Australia. "AI and big data in microbiology: the hype, the promise, and the disappointments"
Dec 15-18 Keynote at the 19th International Conference on Data and Text Mining in Biomedical Informatics (DTMBIO 2025) in Muju, Republic of Korea.
Oct 23-24 I attended the Queensland Immunology Networking Symposium
Oct 13-15 I was in Houston for the SMBE Satellite Meeting: Evolutionary Biochemistry of Insect Antimicrobial Peptides.
Sep 16-19 I will be at the EMBL Human Microbiome Symposium.
Aug 17-22, 2025 Decoding Microproteins Across Evolution and Disease GRC in Barcelona, Spain
Apr 15 argNorm published in Bioinformatics
Apr 15-18 I will be at the Pakistan Society for Microbiology Conference.
Jul 16-24 I was in London July 15-20 and then in Liverpool for the ISMB/ECCB 2025 – Intelligent Systems for Molecular Biology & European Conference on Computational Biology
Dec 10 & 12: Open office hours. Two sessions at different times of the day, so it works for all timezones. Attendance is free, but registration is required:
Nov 11-15 I co-taught a one-week course on state-of-the-art bioinformatic approaches to analyze metagenomic data. Click here to learn more and register.
Copyright (c) 2009-2026. Luis Pedro Coelho. All rights reserved.