Pro-Ana Communities

code
analysis
Author

Sydney Bednar

Published

March 27, 2023

Introduction

TRIGGER WARNING

For my undergraduate thesis, I have been using computational methods to analyze topical trends in online eating disorder communities. More than 9% of the global population will have an eating disorder at some point over their lifetime. About 26% of this population will attempt suicide, and only 10% will receive treatment. Many who are struggling with an eating disorder have turned to these communities for support, but the main challenge with these spaces is that a subculture of pro-ana (pro-anorexia) thinking has infiltrated them, and users comfortably trade harmful dieting tips and medical misinformation.

The dataset I will be using consists of three data frames that I scraped using Selenium and the Reddit API. Each data frame includes metadata and topic model results (L1-normalized) as well as the actual post text from discussions within that community. The communities are MyProAna (MPA), Eating Disorder Central (EDC), and the Reddit community EDAnonymous.
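To make "L1-normalized" concrete, here is a minimal sketch of what that step means for one post's topic weights. The function name and the numbers are made up for illustration and are not taken from my actual pipeline; the only point is that each weight is divided by the sum of all weights so the scores for a post add up to 1.

```python
def l1_normalize(weights):
    """Scale topic weights so their absolute values sum to 1."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        # Nothing to scale; return zeros rather than dividing by zero.
        return [0.0 for _ in weights]
    return [w / total for w in weights]


# Hypothetical raw topic weights for a single post (made-up numbers).
raw_weights = [2.0, 1.0, 1.0]
normalized = l1_normalize(raw_weights)
print(normalized)  # [0.5, 0.25, 0.25]
```

Because every post's scores sum to 1, averaging them within a community gives values that are comparable across communities with different post lengths.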

Question: How do topics of conversation vary across the three communities and what are the most important words within each community?

I am interested in understanding the differences between each community in terms of topics and individual words. As a student-athlete, I have had my own personal experiences with disordered eating and have also witnessed many of my closest friends struggle throughout their time in college. This topic resonates deeply with me because of this, and I think that looking into eating disorder communities can help inform the gaps that exist in eating disorder treatment.

Approach

Bar/Lollipop Chart:

I will first create a lollipop chart faceted by topic to show the average score of that topic within each community. This score indicates how salient each topic is within a community. I will also highlight the longest line in each facet by changing the shape of the point on its end so that the viewer can easily determine which community has the highest score for that topic. This plot is an efficient way to compare values that differ only slightly, as small differences are easier to see here than in a classic bar chart.

Wordcloud:

Along with looking at topical trends across the three communities, I will make a wordcloud to explore differences at the word level, which addresses the second part of my question. The wordcloud will map word frequencies to both size and a sequential hue so that it is clear to the viewer which words are most important. This is a very natural representation of word frequencies and will be easy for the viewer to understand.
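The mapping from word frequency to size can be sketched as below. This is a simplified illustration rather than my actual code: the example posts, the tiny stopword list, and the size range are all assumptions chosen for the demo. It counts words after dropping stopwords and then scales each count linearly into a font size, so the most frequent word gets the largest size.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"i", "the", "a", "to", "and", "my", "of", "is", "it"}


def word_counts(posts):
    """Count lowercase word tokens across posts, skipping stopwords."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


def scale_sizes(counts, min_size=10, max_size=60):
    """Linearly map each word's count onto a font size range."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - lo) / span * (max_size - min_size)
        for word, count in counts.items()
    }


# Hypothetical posts, written for this example only.
posts = [
    "I feel tired today",
    "I feel better after talking to someone",
    "Recovery is hard but I feel hopeful",
]
counts = word_counts(posts)
sizes = scale_sizes(counts)
print(counts.most_common(1))  # [('feel', 3)]
```

The same scaled value could also drive the sequential hue, so that size and color encode frequency together.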

Analysis

Discussion

In the first chart, we can see that, on average, some topics have similar saliency across all three communities, while a few are dominated by one community. Most notably, the Reddit community appears to focus more on living with an ED and on triggers, whereas EDC discusses calorie restriction and feeling hungry, and MPA offers more empathy and discussion around recovery. This makes sense because EDC is the least moderated site and contains a lot of pro-ana content, whereas MPA has actually been cited as a recovery site, and Reddit’s AI moderation system removes many harmful posts. Based on these topical trends, I would expect each community to differ in word usage as well.

In the second chart, we can see that a few words are frequent in all of the communities. These words, including weight, eat, and food, are unsurprising given the nature of these communities. We can see that feel is one of the most common words in the Reddit community, which supports the finding that the topic of living with an ED is dominated by this community, as many members are most likely describing how it feels to live with the disorder. Overall, however, it is difficult to make causal claims from these words alone, as we are unable to see their context. For example, the word calories is more prominent in EDC, which is what we would expect based on the topic saliency described above, but calories is also a prominent word on MPA.

We do not know, however, whether the word calories is used in a positive or negative way. Additionally, to properly analyze these words, we would need to compare them with the important words associated with each topic. Ultimately, I think that the topical differences (which are themselves informed by word frequencies) are much more informative and should be my focus as I continue to analyze the language in these communities, with word-level differences as a supplement.