Degree Name

Master of Science (MS)



First Advisor

Harrison, Scott H.


Flagella have been studied for more than a hundred years, yet a simple and powerful question remains: how can the recent wave of genomic data be used to find differences between flagellar genes across diverse bacterial strains? Drawing on the comparative power of bacterial genomic data, we propose that flagellar gene sequences may serve as genetic markers that indicate the evolutionary adaptation of different bacteria to different hosts. Specifically, we set out to identify and use genetic sequence differences that may diagnose the presence of bacterial pathogens that infect different hosts, namely animal versus plant hosts. Four flagellar genes were chosen for this study because they are consistently found across diverse types of bacteria that have both flagellar phenotypes and fully sequenced genomes. fliC and fliD encode proteins that are located on the outside of the cell and are potentially antigenic in animal hosts. fliJ and flgG encode intracellular protein products. We collected sequences for these flagellar genes from 18 animal host bacteria and 18 plant host bacteria and analyzed them for homologous regions and for dN/dS measures of selective pressure. As expected, fliC showed the greatest diversification for animal host bacteria versus plant host bacteria. We developed a quantitative trait loci approach for identifying differentiating regions within the fliC gene: a 51-nucleotide region encoding part of the conserved N-terminal domain of flagellin (FliC) and a 21-nucleotide region encoding part of the more variable middle domain of flagellin. We then developed an algorithm based on these regions and tested it against our database of 36 bacterial strains; the algorithm performed better at inferring the presence of plant host bacteria than at inferring the presence of animal host bacteria.
We implemented an extraction and PCR amplification protocol to screen agricultural produce for bacterial fliC DNA and, for multiple replicates from a single specimen of lettuce, found sequence patterns corresponding to both the N-terminal domain and the middle domain that were consistent with plant host bacteria. We anticipate that further development of the algorithm and the use of next-generation sequencing methods will help extend this workflow to broader contexts of use.
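The general idea of screening a sequence against host-associated marker regions, and then summarizing how well such a screen performs, can be illustrated with a minimal sketch. This is not the algorithm developed in this work: the Hamming-distance window scan, the tie rule, the function names, and the short marker strings are all assumptions chosen for illustration, and the marker strings are placeholders rather than real fliC sequences.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_window_distance(sequence: str, marker: str) -> int:
    """Smallest Hamming distance between the marker and any same-length window."""
    k = len(marker)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the marker")
    return min(hamming(sequence[i:i + k], marker)
               for i in range(len(sequence) - k + 1))


def classify_host(sequence: str, markers: Dict[str, str]) -> str:
    """Return the host label whose marker matches best (assumed rule).

    A tie between the two best labels is reported as 'undetermined'.
    """
    scores = {host: best_window_distance(sequence, m)
              for host, m in markers.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undetermined"
    return ranked[0][0]


def sensitivity_specificity(truth: List[str], predicted: List[str],
                            positive: str) -> Tuple[float, float]:
    """Sensitivity and specificity, treating `positive` as the target class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in `truth`")
    return tp / (tp + fn), tn / (tn + fp)


# Placeholder markers (hypothetical, NOT real fliC regions).
TOY_MARKERS = {"plant": "ATGGCA", "animal": "ATGTTC"}

if __name__ == "__main__":
    print(classify_host("CCATGGCACC", TOY_MARKERS))
    truth = ["plant", "plant", "animal", "animal"]
    predicted = ["plant", "animal", "animal", "animal"]
    print(sensitivity_specificity(truth, predicted, positive="plant"))
```

Computing sensitivity and specificity once per host class is one way to express the kind of asymmetry reported above, where a screen performs better for one host class than for the other.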