Data Set

The data for this project were scraped from IGN, a video game review website:

  • Basic game information (Name, Platform, Genre, etc.) on over 6500 games.
  • Game ratings from the IGN reviewer and the user community on nearly 6000 games.
  • Full review text on approximately 900 games.
  • List of top-level commenters and comment text on 850 games.


Pulled data from IGN, a video game review website:

  • Game Information
  • Full Review Text
  • Commenter Usernames


Using this information, several recommendation systems can be implemented:

  • Jaccard Similarity: Using the number of overlapping commenters, a Jaccard Similarity measure can be obtained between games. This similarity measure is used for the bulk of video game recommendations.
  • TF-IDF: For games without commenter information, a Term Frequency-Inverse Document Frequency analysis of the review text identifies other games whose reviews discuss similar topics.
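
To make the commenter-overlap idea concrete, here is a minimal sketch of Jaccard similarity between games, computed as the number of shared commenters divided by the number of distinct commenters across both games. The function names, the toy usernames, and the tie-breaking rule are assumptions for illustration, not the project's actual code.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def recommend(game: str,
              commenters: Dict[str, Set[str]],
              top_n: int = 12) -> List[Tuple[str, float]]:
    """Rank every other game by commenter overlap with `game`."""
    target = commenters[game]
    scores = [(other, jaccard(target, users))
              for other, users in commenters.items()
              if other != game]
    # Highest similarity first; ties broken alphabetically (an arbitrary choice).
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical toy data: game title -> set of commenter usernames.
commenters = {
    "Game A": {"ann", "bob", "cat"},
    "Game B": {"bob", "cat", "dan"},
    "Game C": {"eve"},
}

print(recommend("Game A", commenters))
```

In this toy data, Game A and Game B share two of four distinct commenters, so their similarity is 0.5, while Game C shares none.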


Two algorithms to make recommendations:

  • Jaccard Similarity
  • TF-IDF on game reviews
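
The TF-IDF approach can be sketched in the same spirit. The write-up does not state which weighting or similarity measure was used, so this example assumes length-normalized term frequency, idf = log(N / df), and cosine similarity between review vectors. The tokenizer, function names, and review snippets are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Map each document name to a sparse {term: tf-idf weight} vector."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    vectors = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values()) or 1
        # A term that appears in every document gets weight log(1) = 0.
        vectors[name] = {term: (n / total) * math.log(n_docs / doc_freq[term])
                         for term, n in term_counts.items()}
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(name: str,
                 vectors: Dict[str, Dict[str, float]],
                 top_n: int = 12) -> List[Tuple[str, float]]:
    """Rank every other review by cosine similarity to the review for `name`."""
    scores = [(other, cosine(vectors[name], vec))
              for other, vec in vectors.items() if other != name]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical review snippets.
reviews = {
    "Racer X": "fast cars and tight racing tracks",
    "Drift King": "racing cars drift around tracks",
    "Farm Life": "plant crops and raise animals",
}
vectors = tfidf_vectors(reviews)
print(most_similar("Racer X", vectors))
```

Because the two racing reviews share several uncommon terms, "Drift King" ranks above "Farm Life" for "Racer X".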

Recommended Game Genres

Hover over an edge segment to see all recommended genres for a specific game genre. Hover over a connection to see the percentage of all recommendations in each direction (chosen genre to recommended genre).

When showing recommendations for a particular game, how many are from the same genre?

The most "insular" category, by a large margin, is Wrestling (pink segment near the top), where 14% of all recommendations for wrestling games are also in the wrestling genre (a 35x increase over random chance). There appears to be a strong niche group that plays most wrestling games that are released.

On the flip side, the least insular category is Board (smaller pink segment on the right), where no other board games were ever recommended in the top 12.
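
The exact formula behind these figures is not given here. One plausible reading, sketched below as an assumption, divides the share of a genre's recommendations that stay within the genre by the share expected if recommendations were drawn at random from all other games. Under that reading, 14% at 35x over chance implies a baseline near 0.4%. The function names and toy data are hypothetical.

```python
from typing import Dict, List


def same_genre_share(recs: Dict[str, List[str]],
                     genre_of: Dict[str, str],
                     genre: str) -> float:
    """Fraction of recommendations for games in `genre` that are also in `genre`."""
    same = total = 0
    for game, recommended in recs.items():
        if genre_of[game] != genre:
            continue
        for rec in recommended:
            total += 1
            same += genre_of[rec] == genre
    return same / total if total else 0.0


def random_chance_share(genre_of: Dict[str, str], genre: str) -> float:
    """Chance that a randomly chosen *other* game shares the genre (assumed baseline)."""
    n_games = len(genre_of)
    n_in_genre = sum(1 for g in genre_of.values() if g == genre)
    if n_games < 2 or n_in_genre == 0:
        return 0.0
    return (n_in_genre - 1) / (n_games - 1)


def insular_factor(recs: Dict[str, List[str]],
                   genre_of: Dict[str, str],
                   genre: str) -> float:
    """Ratio of the observed same-genre share to the random-chance share."""
    baseline = random_chance_share(genre_of, genre)
    return same_genre_share(recs, genre_of, genre) / baseline if baseline else 0.0


# Hypothetical toy data: 2 wrestling games and 4 racing games.
genre_of = {"W1": "Wrestling", "W2": "Wrestling",
            "R1": "Racing", "R2": "Racing", "R3": "Racing", "R4": "Racing"}
recs = {"W1": ["W2", "R1"], "W2": ["W1", "R2"]}

print(insular_factor(recs, genre_of, "Wrestling"))
```

In the toy data, half of the wrestling recommendations stay in the genre against a 20% baseline, giving a factor of about 2.5.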

Because the recommendation algorithm is based entirely on user comments and interest, these trends indicate whether users are passionate about a particular game genre. Although RespawnInto is a user-facing product, insights like these may help game companies decide which markets and game genres to target.

[Table: Insular factor by genre. Columns: Genre, Insular Factor]


Prediction success for users from Gamespot using the RespawnInto recommendations vs best-in-genre vs random chance.

To validate the recommendations, commenter names were scraped from over 300 game review pages on Gamespot (another game review website), where 1400 users had commented on 2+ games.

One of the games was fed into the recommendation algorithm, and the Gamespot user was classified as "correctly predicted" if the other game appeared among the top N recommended games (N = 6, 12, or 18).

The successful prediction rate by the recommendation algorithm was compared to two other methods:

  • Random: The recommended games were chosen randomly from the set of 855.
  • Best-In-Genre: Only games in the same genre as the selected game were returned, ordered by IGN score from highest to lowest.

The final version of RespawnInto displays 12 recommended games in order to make the interface as simple as possible. In this scenario, the RespawnInto algorithm is 75% better than the best-in-genre selection, and 30 times better than random.
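
The validation procedure can be sketched as a top-k hit rate: for each (seed game, held-out game) pair taken from a user's comment history, check whether the held-out game appears in the top k recommendations for the seed. The two baselines below follow the descriptions above. How users with more than two games were turned into pairs is not stated, so the pair list, the function names, and the toy scores are assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A recommender takes a seed game and k, and returns up to k game titles.
Recommender = Callable[[str, int], List[str]]


def hit_rate(pairs: Sequence[Tuple[str, str]],
             recommender: Recommender,
             k: int) -> float:
    """Fraction of (seed, held_out) pairs where held_out is in the top-k list."""
    if not pairs:
        return 0.0
    hits = sum(1 for seed, held_out in pairs if held_out in recommender(seed, k))
    return hits / len(pairs)


def make_random_recommender(games: Sequence[str], rng: random.Random) -> Recommender:
    """Baseline: k games drawn uniformly at random from all other games."""
    def recommend(seed: str, k: int) -> List[str]:
        pool = [g for g in games if g != seed]
        return rng.sample(pool, min(k, len(pool)))
    return recommend


def make_best_in_genre_recommender(genre_of: Dict[str, str],
                                   score_of: Dict[str, float]) -> Recommender:
    """Baseline: same-genre games only, highest review score first."""
    def recommend(seed: str, k: int) -> List[str]:
        pool = [g for g in genre_of
                if g != seed and genre_of[g] == genre_of[seed]]
        pool.sort(key=lambda g: -score_of[g])
        return pool[:k]
    return recommend


# Hypothetical toy data.
genre_of = {"A": "Racing", "B": "Racing", "C": "Racing", "D": "Puzzle"}
score_of = {"A": 9.0, "B": 7.5, "C": 8.8, "D": 8.0}
pairs = [("A", "B"), ("A", "D"), ("B", "C")]

best_in_genre = make_best_in_genre_recommender(genre_of, score_of)
random_rec = make_random_recommender(list(genre_of), random.Random(0))

for k in (1, 2):
    print(k, hit_rate(pairs, best_in_genre, k), hit_rate(pairs, random_rec, k))
```

Any recommender with the same call signature, such as one built on commenter overlap, can be passed to `hit_rate` and compared against these baselines at k = 6, 12, and 18.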

  • Pulled commenters from Gamespot with 2+ comments
  • Given 1 game, can my algorithm predict the other in top 6, 12, 18?
  • Compared to: Random, Best-In-Genre

Ben Thompson

Texas Christian University