Date of Award

2019

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Dr. Xiaohong Yuan

Abstract

Twitter has rapidly emerged as an ideal platform for news updates, especially during natural disasters. During a disaster, people exchange vast amounts of information on Twitter, including warnings, evacuation orders, and status updates. Making sense of this information is challenging because the available tools are limited in their ability to analyze high-volume data. Prior studies have made use of Twitter data because it contains valuable information that could help improve the efficiency of disaster response. This research presents a framework that extracts, automatically labels, and classifies tweets from two recent disaster events in order to make sense of the data, identify disaster-related tweets, and evaluate their credibility. Using learning-based methods, the framework classifies tweets as disaster-related or not disaster-related, and as credible or not credible. Many risk factors associated with panic disorder occur among the public during natural disasters. This research therefore also presents a panic trigger identification framework that detects triggers forming cyber disruption threats in hurricane disaster data and reports them to emergency responders so that these threats can be mitigated. The results show that automated labeling can be sufficient for labeling tweets by disaster relevance and by credibility. For disaster relevance classification, features produced by the CountVectorizer word vectorizer led to higher accuracies (98% on average), especially with Decision Tree and Random Forest models. For credibility classification, Random Forest and Decision Tree models gave the best predictions (96% average accuracy). For panic trigger classification, Random Forest and Decision Tree models gave the best predictions (95% average accuracy) when using CountVectorizer features.
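The abstract credits CountVectorizer features with the highest accuracies. The following is a minimal standard-library sketch of what such features look like. It assumes scikit-learn's default CountVectorizer behavior: text is lowercased, tokens are runs of two or more word characters, and the vocabulary is ordered alphabetically. The helper names and example tweets are hypothetical, and this is not the dissertation's code.

```python
import re
from collections import Counter

# Same pattern as scikit-learn's CountVectorizer default token_pattern:
# tokens of two or more word characters (single characters are dropped).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def fit_vocabulary(docs):
    """Map every distinct term to a column index, in alphabetical order."""
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def transform(docs, vocabulary):
    """Return one row of raw term counts per document; unseen terms are ignored."""
    rows = []
    for doc in docs:
        counts = Counter(tok for tok in tokenize(doc) if tok in vocabulary)
        row = [0] * len(vocabulary)
        for term, n in counts.items():
            row[vocabulary[term]] = n
        rows.append(row)
    return rows


# Hypothetical example tweets.
tweets = [
    "Hurricane Harvey flooding: evacuate now!",
    "Flooding on I-45, evacuate if you can",
    "Great game tonight",
]
vocab = fit_vocabulary(tweets)
X = transform(tweets, vocab)
```

Each row of `X` is a fixed-length count vector. Rows like these, paired with relevance, credibility, or panic-trigger labels, are the kind of input that a Decision Tree or Random Forest (for example, scikit-learn's `DecisionTreeClassifier` or `RandomForestClassifier`) would be trained on.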
The contributions of this research include: (1) two datasets of hurricane tweets, which will be made available to future researchers; (2) an automated labeling framework that labels disaster tweets as disaster-related or not disaster-related using a dictionary-based technique, and as credible or not credible using user-based and content-based features; and (3) a panic trigger detection framework to improve emergency response.
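Dictionary-based relevance labeling, as named in contribution (2), can be illustrated with a short sketch. A tweet is labeled disaster-related when it contains at least a threshold number of terms from a keyword dictionary. The dictionary `DISASTER_TERMS`, the `min_hits` threshold, and the function name are hypothetical, because the abstract does not give the actual dictionary or matching rule. The credibility labeling based on user-based and content-based features is not shown.

```python
import re

# Hypothetical keyword dictionary; the dissertation's actual dictionary
# is not given in the abstract.
DISASTER_TERMS = {
    "hurricane", "flood", "flooding", "evacuate", "evacuation",
    "shelter", "storm", "rescue",
}

# Lowercase words; '#' and punctuation are treated as separators,
# so "#storm" yields "storm".
WORD = re.compile(r"[a-z0-9']+")


def label_relevance(tweet, terms=DISASTER_TERMS, min_hits=1):
    """Label a tweet by counting distinct dictionary terms it contains.

    min_hits is a hypothetical threshold on the number of distinct matches.
    """
    words = set(WORD.findall(tweet.lower()))
    hits = len(words & terms)
    return "disaster-related" if hits >= min_hits else "not-disaster-related"


labels = [
    label_relevance(t)
    for t in ["Mandatory evacuation ordered, find a shelter", "Great game tonight"]
]
```

Raising `min_hits` trades recall for precision. A stricter threshold leaves fewer incidental mentions (for example, a sports team named after a storm) labeled as disaster-related.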
