Exploratory Data Analysis

Any data science project should begin by exploring the data to gain descriptive, preliminary insights before moving on to more complex analyses. To understand our customer base, we investigated which subreddits Glossier customers interact with the most. We found that customers interact heavily with sports, current events, and younger-generation subreddits, and we surmise that the customer base is dominated by younger individuals. We also found that Glossier is discussed most during the holiday months (November-December) and least around March; this aligns with expectations, as retail-related discussions and purchases peak around the holidays.

From preliminary research, the Glossier subreddit covers a wide variety of topics and sentiments around our beauty products. The most commonly mentioned products include blush, skincare, and foundation, suggesting that demand for these products is highest. Future analysis will classify products by sentiment, identify prevalent topics, and contrast key findings with those of our competitors. From a competitor standpoint, Sephora and Ulta appear to be the most popular across makeup subreddits, while Glossier and Fenty lag behind. This suggests that Sephora and Ulta dominate market share. Future analysis will examine this hypothesis and determine whether the pattern holds across other online platforms.

The analysis here uses descriptive statistics to understand the data and shape our downstream analysis. It is organized around descriptive questions, each associated with one or more of the business goals above; the business goals that each chart and question relate to are noted in the text. All code for the following analysis and visualizations can be found here. Finally, note that not all business goals have been addressed.

Question #1: What subreddits do Glossier users interact with the most?

Figure 1 shows the top subreddits by total activity (comments and submissions) for the unique users who have posted in r/Glossier. It reveals popular interests among these "Glossier authors," our market base. These individuals are also interested in investing, as shown by their interactions with r/CryptoCurrency, r/wallstreetbets, r/Superstonk, and r/amcstock. They keep up with current events as well: r/news, r/politics, and r/worldnews rank sixth, eighth, and twelfth by activity. They are also interested in sports, particularly basketball and soccer. Finally, the Glossier market appears to skew toward younger age groups, with r/teenagers ranking seventh by activity.

Table 1 further breaks down the activity on the top subreddits by submissions and comments. Interestingly, these users make many more comments than submissions. That, together with their activity in r/FreeKarma4U and r/FreeKarma4You, suggests that these individuals are avid Reddit users. Overall, we hypothesize that the average customer is from a younger generation, which aids in identifying Glossier persona characteristics (Business Question 6).

Figure 1

Table 1: Activity Of Glossier Users By Subreddit
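The counting behind Figure 1 and Table 1 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record layout `(author, subreddit, kind)`, the sample rows, and the function name `top_subreddits` are all hypothetical, and excluding r/Glossier itself from the ranking is an assumption.

```python
from collections import Counter

# Hypothetical sample records: (author, subreddit, kind)
records = [
    ("ana", "Glossier", "submission"),
    ("ana", "news", "comment"),
    ("ana", "news", "comment"),
    ("ben", "Glossier", "comment"),
    ("ben", "wallstreetbets", "submission"),
    ("ben", "news", "comment"),
    ("cy", "news", "comment"),  # never posted in r/Glossier, so excluded
]

def top_subreddits(records, home="Glossier", n=10):
    """Rank subreddits by the activity of users who have posted in `home`."""
    # Unique authors with at least one comment or submission in the home subreddit
    authors = {author for author, subreddit, _ in records if subreddit == home}
    comments, submissions = Counter(), Counter()
    for author, subreddit, kind in records:
        # Assumption: the home subreddit itself is left out of the ranking
        if author not in authors or subreddit == home:
            continue
        if kind == "comment":
            comments[subreddit] += 1
        else:
            submissions[subreddit] += 1
    totals = comments + submissions
    # Each row: (subreddit, submissions, comments, total activity)
    return [(s, submissions[s], comments[s], t) for s, t in totals.most_common(n)]

for row in top_subreddits(records):
    print(row)
```

Keeping comments and submissions in separate counters makes it straightforward to produce both the combined ranking used in a chart and the per-type breakdown used in a table.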

Question #2: How does user activity in the Glossier subreddit fluctuate over time?

Figure 2 depicts overall activity (comments plus submissions) in the Glossier subreddit from January 2021 to August 2022. The holiday months (November-January) show significantly more activity than other months. There also appears to be some seasonality: activity tends to increase in the spring, decrease in the summer, and increase again in the winter. We surmise that the winter increase reflects more purchases and retail-related discussion around the holidays, and we hypothesize that the spring increase reflects preparation for summer, when new beauty products are desired. These patterns suggest that demand may stay constant for the next month, providing a valuable hypothesis for Business Question 4 and Business Question 5.

Table 2 shows the total number of comments and submissions in the Glossier subreddit by month. This summary table adds granularity to the chart above and helps put the total activity into context. As expected, there are more comments than submissions in every month. Whenever overall activity changes from one month to the next, submission and comment volumes move in the same direction (i.e., both increase or both decrease).

Figure 2

Table 2: Glossier Subreddit Activity By Month
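A monthly roll-up like the one in Table 2 could be computed along these lines. The sketch assumes each post carries an ISO-formatted date string and a type label; the sample rows and the name `monthly_activity` are hypothetical and do not reflect the project's actual data or code.

```python
from collections import defaultdict

# Hypothetical sample: (ISO date, kind)
posts = [
    ("2021-11-03", "submission"),
    ("2021-11-09", "comment"),
    ("2021-11-21", "comment"),
    ("2021-12-02", "comment"),
    ("2022-03-15", "submission"),
]

def monthly_activity(posts):
    """Count submissions and comments per calendar month (YYYY-MM)."""
    counts = defaultdict(lambda: {"submission": 0, "comment": 0})
    for date, kind in posts:
        month = date[:7]  # "2021-11-03" -> "2021-11"
        counts[month][kind] += 1
    # Each row: (month, submissions, comments, total), in chronological order
    return [
        (month, c["submission"], c["comment"], c["submission"] + c["comment"])
        for month, c in sorted(counts.items())
    ]

for row in monthly_activity(posts):
    print(row)
```

Because the month keys are zero-padded `YYYY-MM` strings, a plain lexicographic sort already yields chronological order.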

Question #3: What are the top words mentioned across all Glossier comments?

Figure 3 depicts the most frequently used words in the r/Glossier subreddit from January 2021 to August 2022. The size of each word indicates its frequency. Posters and commenters mostly write about products, and words such as brow, shades, concealer, lip, and skin suggest that the conversation covers a diverse range of them. Common words also include don't and love; we plan to explore this further with NLP to understand whether, as these words suggest, there is a wide range of sentiment about Glossier's products and business (Business Question 2, Business Question 3).

Figure 3
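The word frequencies that drive a word cloud such as Figure 3 can be approximated with a simple tokenizer and counter. This is only a sketch under stated assumptions: the stop-word list, the sample comments, and the name `word_counts` are hypothetical, and the tokenization used in the actual analysis is not known.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list
STOP_WORDS = {"the", "a", "i", "is", "it", "this", "and", "my", "but", "on", "are"}

# Hypothetical sample comments
comments = [
    "I love this concealer, the shades are great",
    "Don't love the brow gel but the lip balm is great",
]

def word_counts(texts, stop_words=STOP_WORDS):
    """Count lower-cased word tokens, skipping stop words."""
    counts = Counter()
    for text in texts:
        # Keep apostrophes so contractions such as "don't" stay one token
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

print(word_counts(comments).most_common(5))
```

Whether contractions such as "don't" are kept or filtered out depends on the stop-word list, which in turn changes which words appear prominent in the cloud.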

Brand Mentions

Question #4: How often are cosmetic providers mentioned in general makeup channels and how does Glossier compare?

Figure 4 depicts the percentage of brand mentions in the Makeup and MakeupAddiction subreddits from January 2021 to August 2022. Sephora, Ulta, and Fenty are each mentioned more often in these subreddits than Glossier: 34%, 25%, and 28%, respectively, compared with 13%. Table 3 lists all combinations of Sephora, Ulta, Fenty, and Glossier mentions in the Makeup and MakeupAddiction subreddits. While brands are most often mentioned on their own in a post or comment, there are some interesting combinations, such as Sephora and Ulta appearing in the same post or comment, or Fenty and Glossier being mentioned together. Overall, we hypothesize that Glossier is not as popular as its competitors, and we will validate this through text and predictive analyses (Business Question 9). The analysis will be expanded further by linking external Google Trends data and determining whether Reddit mentions correlate with Google searches.

Figure 4


Table 3: Brand Mentions In General Makeup Channels
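One way to tally brand mentions and their co-occurrences, as summarized in Figure 4 and Table 3, is sketched below. The sample texts and function names are hypothetical, and computing each brand's share as a fraction of all brand mentions is an assumption about how the percentages were derived.

```python
import re
from collections import Counter

BRANDS = ("Sephora", "Ulta", "Fenty", "Glossier")

# Hypothetical sample posts and comments
texts = [
    "Sephora has a sale this week",
    "I prefer Ulta over Sephora",
    "fenty and glossier both make a nice blush",
    "Picked this up at Sephora",
    "No brand mentioned here",
]

def brands_in(text):
    """Return the brands mentioned in `text`, in the fixed BRANDS order."""
    return tuple(
        b for b in BRANDS
        if re.search(rf"\b{re.escape(b)}\b", text, re.IGNORECASE)
    )

def mention_summary(texts):
    """Count brand combinations per text and each brand's share of mentions."""
    combos, singles = Counter(), Counter()
    for text in texts:
        found = brands_in(text)
        if found:
            combos[found] += 1     # e.g. ("Sephora", "Ulta") mentioned together
            singles.update(found)  # one mention per brand per text
    total = sum(singles.values())
    # Assumption: share = brand mentions / all brand mentions, as a percentage
    shares = {b: round(100 * singles[b] / total, 1) for b in BRANDS} if total else {}
    return combos, shares

combos, shares = mention_summary(texts)
print(combos)
print(shares)
```

Matching on word boundaries avoids counting a brand name that merely appears inside a longer word, and returning brands in a fixed order means the same combination always maps to the same key.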

Question #5: How do the top products in the Glossier subreddit compare with those in competitor subreddits?

Figure 5 shows the five most-mentioned products in the r/Glossier subreddit from January 2021 to November 2022 relative to all products. There are some standout products, namely blush, skincare products, foundation, bronzer, and tinted chapstick, but the largest area belongs to the "other" category; each product in this category was mentioned far less often than the top five. Using mentions as a proxy for popularity, we may infer that these five products are in high demand. However, further exploration of the sentiment around these products is critical to understanding whether customers have a positive or negative experience with them. This supplemental analysis will allow us to determine which products should be sold together in a kit, which products may not be worth carrying, and which products warrant more shade or scent options (Business Question 2).

Figure 5
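Grouping everything outside the top five into an "other" bucket, as Figure 5 does, can be sketched as follows. The mention counts are made-up illustration values, not results from the analysis, and `top_with_other` is a hypothetical helper name.

```python
from collections import Counter

# Made-up mention counts, for illustration only
mentions = {
    "blush": 40, "skincare": 35, "foundation": 30, "bronzer": 20,
    "tinted chapstick": 15, "mascara": 9, "perfume": 8, "serum": 7,
}

def top_with_other(counts, n=5):
    """Keep the n most-mentioned items and sum the rest into 'other'."""
    ranked = Counter(counts).most_common()
    top = ranked[:n]
    other = sum(count for _, count in ranked[n:])
    return top + [("other", other)]

for product, count in top_with_other(mentions):
    print(product, count)
```

With a long tail of rarely mentioned products, the "other" bucket can outweigh any single top product even though each item in it is individually minor.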
