Research Interests

Data Science
Linear Algebra
Language Evolution
Natural Language Processing

Statistical Modeling of Word Rank Evolution

Collaborators: Rick Dale & Suzanne Sindi


The goal is to model word rank evolution using a Wright-Fisher-inspired model. Google Ngram data for eight languages is analyzed and compared with the model, which simulates evolution by drift. The time-series dynamics of word ranks are investigated by adjusting the model parameters and comparing the simulated results with the language data.
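As a rough illustration of what a Wright-Fisher-style drift model of word ranks can look like, here is a minimal sketch. It is not the model from the study: the fixed vocabulary, the fixed corpus size, the Zipf-like starting counts, and the function names (`wright_fisher_step`, `ranks`, `simulate_ranks`) are all assumptions made for this example.

```python
import random
from collections import Counter


def wright_fisher_step(counts, rng):
    """One generation of neutral drift: every token in the new corpus
    copies a word drawn in proportion to its current count."""
    words = list(counts)
    weights = [counts[w] for w in words]
    n_tokens = sum(weights)  # corpus size is held fixed (assumption)
    sample = rng.choices(words, weights=weights, k=n_tokens)
    new_counts = Counter(sample)
    # Keep extinct words in the table with a count of zero.
    return {w: new_counts.get(w, 0) for w in words}


def ranks(counts):
    """Rank 1 is the most frequent word; ties are broken alphabetically."""
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(ordered)}


def simulate_ranks(initial_counts, generations, seed=0):
    """Return the rank table for generation 0 and each later generation."""
    rng = random.Random(seed)
    counts = dict(initial_counts)
    history = [ranks(counts)]
    for _ in range(generations):
        counts = wright_fisher_step(counts, rng)
        history.append(ranks(counts))
    return history


# Hypothetical Zipf-like starting corpus of ten words.
initial = {f"w{i}": 1000 // i for i in range(1, 11)}
history = simulate_ranks(initial, generations=5)
for generation, table in enumerate(history):
    print(generation, table["w1"], table["w10"])
```

Each entry of `history` maps every word to its rank in that generation, so the rank trajectory of a single word can be read off across generations and compared with an observed time series.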

Preprint (submitted to PLOS ONE and under peer review):


Analysis of Twitter Texts

Collaborators: Maia Powell, Ayme Tomson, Suzanne Sindi, & Arnold Kim

The goal is to uncover the discourse and evolution behind certain hashtag social movements using NLP methods, machine learning algorithms, and language models. Twitter data was collected and processed using NLTK and several Python tools.


Performance Analysis on Question Answering

Collaborators: Sam Nguyen & Juanita Ordonez


The objective was to fine-tune and evaluate three language models (BERT, ALBERT, and Longformer) on DuoRC, a question answering dataset built from movie plots with narrative structure. Because narrative texts are long and complex, a model cannot simply extract an answer; it must perform complex reasoning and reading comprehension to infer the answers to questions.



Machine Learning Application on Opacity

Collaborators: Robert C. Blake & Ben C. Yee

The objective was to encode opacity, a material property describing how much radiation can pass through a material, into a neural network that serves as a surrogate model for an existing atomic physics code.


Twitter Network Analysis of the California Camp Fire

Collaborators: Maia Powell & Matthew Mondares

The goal was to explore the spread of information generated by Twitter bots during the 2018 California Camp Fire disaster using user-user and hashtag co-occurrence networks. Twitter bots are accounts that automate repetitive, straightforward tweets. Most of them post, repost, or like other tweets to spread information faster than human users can, in service of an unknown large-scale goal.
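To make the idea of a hashtag co-occurrence network concrete, here is a minimal sketch using only the Python standard library. The toy tweets, the hashtag names, and the function names (`hashtag_cooccurrence`, `weighted_degree`) are hypothetical and are not taken from the project's data or code. An edge connects two hashtags that appear in the same tweet, and its weight counts how many tweets contain both.

```python
from collections import Counter
from itertools import combinations


def hashtag_cooccurrence(tweets):
    """Build weighted edges: (tag_a, tag_b) -> number of tweets using both."""
    edges = Counter()
    for tags in tweets:
        # Lowercase and de-duplicate so each pair counts once per tweet.
        unique = sorted({t.lower() for t in tags})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights attached to each hashtag."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical tweets, each reduced to its list of hashtags.
tweets = [
    ["CampFire", "Paradise"],
    ["campfire", "ButteCounty", "Paradise"],
    ["CampFire"],
]
edges = hashtag_cooccurrence(tweets)
print(edges[("campfire", "paradise")])      # 2
print(weighted_degree(edges).most_common())
```

A user-user network can be sketched the same way by replacing the hashtag list of each tweet with the pair of accounts involved in a retweet, reply, or mention.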


Predictive Modeling of Flood Susceptibility

Collaborators: Madeline Brown, Ritesh Sharma, & Umesh Krishnamurthy

This project modeled flood risk from multiple factors, including scale, demographics, risk perceptions, topology, soil moisture, and precipitation. The overall goal was to develop a model that makes real-time predictions to alert and inform communities of flood risks.


Modeling Spider Predation

Collaborators: Michele Lynn Joyner, Edith Seier, Chelsea Ross, J Colton Watts, Nathaniel Hancock, Michael Largent, & Thomas C. Jones

The goal of this study was to model the predation movements of the spider species Anelosimus studiosus using stochastic differential equations.
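For readers unfamiliar with stochastic differential equations, the sketch below simulates one with the Euler-Maruyama method. The specific equation is an assumption chosen for illustration: a one-dimensional mean-reverting (Ornstein-Uhlenbeck) process, which could loosely describe a position pulled toward a target with random jitter. It is not the model used in the study, and the function and parameter names are hypothetical.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, n_steps, seed=0):
    """Simulate dX = drift(X) dt + diffusion(X) dW on a fixed time grid."""
    rng = random.Random(seed)
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt: Normal(0, sqrt(dt)).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


# Illustrative parameters: pull toward `target` at rate `theta`, noise `sigma`.
theta, target, sigma = 1.5, 0.0, 0.3
path = euler_maruyama(
    drift=lambda x: theta * (target - x),
    diffusion=lambda x: sigma,
    x0=1.0,
    dt=0.01,
    n_steps=500,
)
print(len(path), path[-1])
```

Passing the drift and diffusion terms as functions keeps the integrator generic, so a different movement model can be tried by swapping the two lambdas.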