MPF Alumni Experience Analysis

DS 4200: Information Presentation & Visualization

💥 Motivation

This project is a data visualization solution for the Massachusetts Promise Fellowship (MPF). It focuses on two primary factors: the number of classes and service sites that MPF Fellows served, and their evaluation of their experience as MPF Fellows along multiple dimensions (such as professional development and self-growth). The visualization aims to highlight the relationship between these two factors. The information is extracted from survey data, provided by MPF, that its alumni filled out.

The goal of this project is to use feedback from alumni to provide insights for MPF that can assist with its recruitment efforts and its plans for improving the program. These insights can potentially be extrapolated to the parent organization, AmeriCorps, to further inform its overall programming.

🗂️ Data

The primary dataset used in this project was the De-Identified Survey Responses from Alumnus for 2020. This dataset provided a detailed view of the alumni's MPF experiences and feedback for the classes/years they served at MPF.

Cleaning this dataset presented a wide variety of challenges. Here is a subset of the steps taken to improve it. First, we needed to parse the time data into a consistent format that could be plotted as datetime objects rather than strings. The next step was to convert some of the categorical data into ordinal data. For example, the agreement levels from Strongly Agree to Strongly Disagree were presented as unordered categories, so we encoded them as an ordered scale in which Neutral lies between the two extremes.
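The two steps above can be sketched in Python as follows. This is a minimal illustration rather than the project's actual cleaning code: the five-point label list, the timestamp formats, and the function names `parse_timestamp` and `likert_to_rank` are all assumptions.

```python
from datetime import datetime

# Assumed five-point scale, ordered from most negative to most positive.
LIKERT_ORDER = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
LIKERT_RANK = {label: rank for rank, label in enumerate(LIKERT_ORDER)}

# Hypothetical timestamp formats; the real survey export may use others.
TIMESTAMP_FORMATS = ("%m/%d/%Y %H:%M:%S", "%Y-%m-%d %H:%M:%S", "%m/%d/%Y")


def parse_timestamp(raw):
    """Return a datetime for the first matching format, or None if none match."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None


def likert_to_rank(label):
    """Map a Likert label to its ordinal rank (0 to 4), or None if unknown."""
    return LIKERT_RANK.get(label.strip().title())
```

With this encoding, Neutral (rank 2) sorts between Strongly Disagree (rank 0) and Strongly Agree (rank 4), which is what the stacked bar chart needs to order its segments.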

The next type of data encountered in bulk was text: the written feedback from alumni. To make meaningful use of this feedback, each response was mapped to a corresponding sentiment score. Traditional cleaning steps for text data, such as tokenization and stop-word removal, were applied first.
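A rough sketch of that text pipeline is shown below. The stop-word list, the tiny lexicon, and its score range are illustrative assumptions; the project may have used a different sentiment method or library.

```python
import re

# Tiny illustrative stop-word list and sentiment lexicon; both are assumptions.
STOP_WORDS = {"a", "an", "and", "the", "i", "my", "of", "to", "was", "in", "it", "with"}
LEXICON = {"great": 1.0, "supportive": 1.0, "growth": 0.5, "stressful": -1.0}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def sentiment_score(text):
    """Average lexicon score of the scored tokens; 0.0 when none are scored."""
    scores = [LEXICON[t] for t in tokenize(text) if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0
```

Responses with no scored words fall back to 0.0 here, which treats them as neutral; that fallback is a design choice of this sketch.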

Other issues encountered in the data cleaning process included handling a significant quantity of NaN values. In a machine learning setting, it would make sense to simply impute these values. However, imputed values could be misleading in a data visualization setting, so the missing values were relegated to a separate category. Additionally, some alumni survey responses were duplicates and had to be removed. The sites served were entered manually by the alumni filling out the survey, so they had to be parsed (when an alum served at more than one location) and cleaned so that different names referring to the same site were grouped together. Similarly, the classes/years served by each alum were recorded together in one field and had to be separated before they could be used in the visualization.
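The cleaning steps described above might look roughly like the sketch below. The placeholder label "No Response", the delimiters, the alias table, and the "Site A"/"Site B" names are hypothetical; the real alias mapping was built by hand from the survey data.

```python
import math
import re

# Hypothetical alias table mapping spelling variants to one canonical site name.
SITE_ALIASES = {"site a": "Site A", "site-a": "Site A"}


def is_missing(value):
    """True for None, float NaN, or a blank string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def label_missing(value, label="No Response"):
    """Put missing values into their own category instead of imputing them."""
    return label if is_missing(value) else value


def split_multi(raw):
    """Split a free-text cell listing several sites or classes on ',' or ';'."""
    if is_missing(raw):
        return []
    return [part.strip() for part in re.split(r"[;,]", raw) if part.strip()]


def normalize_site(name):
    """Group different spellings of the same site under one canonical name."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    return SITE_ALIASES.get(key, name.strip())


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row (rows are dicts)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted((k, str(v)) for k, v in row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

For example, `[normalize_site(s) for s in split_multi("Site A; site-a, Site B")]` yields one entry per site served, with both spellings of the first site mapped to the same name.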

📝 Task Analysis

| Index (ID #) | "Domain" Task | Analytic Task (Low-level, “Query”) | Search Task (Mid-level) | Analyze Task (High-level) |
| --- | --- | --- | --- | --- |
| 1 | What is the relationship between specific MPF experiences (survey questions that start with My MPF Experience…) and the class/classes served? | Correlate | Lookup | Present |
| 2 | What is the relationship between a specific class and MPF experiences? | Correlate | Browse | Present |
| 3 | What is the relationship between key words mentioned in experience (written response) to classes served? | Correlate | Explore | Discover |
| 4 | Which MPF experience responses had the lowest agreement level? | Find Extremum | Browse | Discover |
| 5 | Relationship between a specific class and a specific MPF Experience. | Correlate | Lookup | Present |
| 6 | What is the median/mean number of classes/service sites related to MPF Experiences? | Characterize Distribution | Browse | Discover |

🎨 Design Process

For our final visualization, we decided to create a word cloud linked to a stacked bar chart that can be filtered by various MPF experiences. The terms used in the word cloud originated from the written feedback given by the alumni in their survey responses. The size of each word reflects how often the word was mentioned across all alumni. The binned colors used in the word cloud represent ranges of sentiment scores, with darker colors for words that have a more positive sentiment. The stacked bar chart below the word cloud represents the alumni's survey responses about their MPF experience across the various classes/years served.

The design process started with rough sketches by each teammate. A finalized digital sketch was then created to mimic the final result of the visualization. One design choice was to use binned colors for the word cloud, which makes it easier to identify words with different sentiment scores. Another was to have the stacked bar chart show the proportion of alumni who gave each rating for an experience, rather than actual counts, which makes it easier to compare the different classes the alumni served.
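The switch from counts to proportions can be illustrated with a short Python sketch. The function name `rating_proportions`, the rating labels, and the input shape (pairs of class/year and rating) are assumptions made for illustration.

```python
from collections import Counter

# Assumed rating labels, in the order the bar segments would be stacked.
RATINGS = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]


def rating_proportions(responses):
    """Turn (class_year, rating) pairs into {class_year: {rating: share}}."""
    counts = {}
    for class_year, rating in responses:
        counts.setdefault(class_year, Counter())[rating] += 1

    result = {}
    for class_year, counter in counts.items():
        total = sum(counter.values())
        # Known ratings first, then any extra labels (e.g. a missing-value category).
        order = RATINGS + sorted(set(counter) - set(RATINGS))
        result[class_year] = {r: counter[r] / total for r in order}
    return result
```

Because every class's shares sum to 1, a class with few respondents gets a bar of the same height as a class with many, so the segments can be compared directly.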

📊 Final Visualizations

The word cloud uses the size channel and the color channel to encode the number of occurrences of each term and its sentiment value, which allows the user to immediately see which terms were mentioned the most. The sentiment value of a term (a number between 0 and 1) is the tendency of that term to appear in sentences with a positive connotation. The higher the number, the more positive the term.
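One way to read the two encodings above is sketched below in Python. It interprets "tendency to appear in positive sentences" as the share of sentences containing the term that are labeled positive; that interpretation, the function names, and the choice of four color bins are assumptions, not the project's confirmed method.

```python
from collections import Counter


def term_stats(sentences):
    """sentences: list of (tokens, is_positive) -> {term: (count, sentiment_value)}."""
    counts = Counter()      # total occurrences, drives word size
    containing = Counter()  # number of sentences that contain the term
    positive = Counter()    # number of those sentences labeled positive
    for tokens, is_positive in sentences:
        counts.update(tokens)
        for term in set(tokens):
            containing[term] += 1
            if is_positive:
                positive[term] += 1
    return {t: (counts[t], positive[t] / containing[t]) for t in counts}


def sentiment_bin(value, n_bins=4):
    """Map a value in [0, 1] to a bin index; a higher index means a darker color."""
    return min(int(value * n_bins), n_bins - 1)
```

The bin index would then select one of `n_bins` color shades, so words with similar sentiment values share a color.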

As mentioned previously, a stacked bar chart is used to allow a more direct comparison between the classes/years served by alumni. This lets the user more easily compare the various MPF experiences year over year. The stacked bar chart also uses the color channel to clearly differentiate the percentages of the various ratings for each MPF experience. The visualization is coded mainly with the D3.js library.

How to Interact with the Visualization

The stacked bar chart can be filtered by experience with the dropdown menu. It can be filtered further by clicking on words in the word cloud above. Similarly, clicking a bar shows the corresponding words for that class/year in the word cloud.

🔎 Data Analysis

MPF Experience Feedback Received

To evaluate the overall sentiment towards MPF, a sentiment analysis was conducted and a word cloud of terms was constructed. Overall, the feedback for MPF appeared to be overwhelmingly positive. The vast majority of terms used had either positive or neutral sentiment. A couple of terms that initially appeared negative (e.g. “Traumatic”) were filtered out because they were actually used in a positive light (i.e. "starting a career in helping those with Traumatic brain injuries"). The words with neutral sentiment were also filtered out, resulting in a word cloud of the most popular terms.

MPF Experience

The stacked bar chart shows that alumni tend to respond more positively (with a higher agreement level) when asked about MPF experiences that are more directly related to the work they performed at MPF, such as Mentor Young People and Professional Development. The experiences with the lowest agreement levels tend to be those that are less likely to be directly impacted by the work performed, such as Self-care and Self-worth. There also does not seem to be a clear trend between the classes/years served and the MPF experiences.

📄 Conclusion

In conclusion, most alumni reported a positive experience by indicating that they “Strongly Agree” or “Agree” with the MPF experience survey questions. Additionally, responses were most positive for experiences directly related to the work the alumni performed, and more dispersed for experiences related to the individual development of the fellows. Future work would include further refining the word cloud and adding an option to show not only the positive feedback terms but also terms with a more negative sentiment. This would allow further analysis to possibly identify areas for improvement.