This dataset was used to survey the taxonomic composition of the gastrointestinal microbiome during the first two years of life, with the aim of better understanding the effects of the invasive pathogen Shigella. Data were collected through a two-year longitudinal study of a mother-infant cohort in Malawi, using 16S rRNA gene amplicon sequencing to characterize the gastrointestinal microbiota. Rectal swab samples were collected every six months during well-child visits, or when an infant visiting the clinic presented with diarrhea, to diagnose Shigella infection. The 16S rRNA gene amplicon libraries yielded a total of 8,729,027 reads, with an average of 23,720 reads per sample, and 325 taxa were identified after quality filtering.