Improving the Video Game Recommender

In my last post, we took the 380+ video game attributes that we extracted from the Steam video game dataset and wrote an algorithm to cluster them into 24 groups. In this post, we will use the clusters to build a new and improved video game recommender. If you haven’t read my first and second posts on the video game recommender, please read them before continuing.

Step 1: Reconstructing the game features table

The game features table that we built in the first post contained hundreds of attributes. We will construct a much smaller table using the game attribute clusters.

Game Features Table

Each column in the table indicates how strongly one attribute cluster is present in each game. The numbers are computed for each game using the attribute-cluster assignments we obtained from K-means: for each game, we count the number of its attributes that belong to each cluster. Attributes that belong to none of the 24 clusters are counted in an extra 25th column. The construction of the new game features table is described in the code snippet below.

import pandas as pd
from sklearn.cluster import KMeans

# Fit the K-means clustering algorithm and get the cluster assignment for each attribute
km = KMeans(n_clusters=24, random_state=25)
km.fit(feature_set)
labels = km.predict(feature_set)
attribute_assignments = pd.Series(labels, index=game_categories)

# Group attributes into a list of clusters
attribute_clusters = []
for i in range(24):
    cluster = attribute_assignments[attribute_assignments == i]
    attribute_clusters.append(cluster.index.tolist())

# clust_0 to clust_23 are the K-means clusters; clust_24 counts attributes found in no cluster
feature_columns = ['clust_'+str(i) for i in range(25)]
game_features = []
for idx in range(steam_games_df.shape[0]):
    # Obtain list of genres, tags, and specs
    game_genre = steam_games_df.iloc[idx]['genres']
    game_tags = steam_games_df.iloc[idx]['tags']
    game_specs = steam_games_df.iloc[idx]['specs']
    attributes = []
    data_row = {k:0 for k in feature_columns}
    data_row['id'] = steam_games_df.iloc[idx]['id']
    # Collect the entries of each list into a single set of attributes
    if game_genre:
        attributes.extend(game_genre.split(','))
    if game_tags:
        attributes.extend(game_tags.split(','))
    if game_specs:
        attributes.extend(game_specs.split(','))
    attributes = set(attributes)
    # Count each attribute once, in the cluster that contains it
    for attr in attributes:
        for i in range(len(attribute_clusters)):
            if attr in attribute_clusters[i]:
                data_row['clust_'+str(i)] += 1
                break
        else:
            # No cluster contains this attribute, so count it in the catch-all column
            data_row['clust_24'] += 1
    game_features.append(data_row)

game_features_df = pd.DataFrame(game_features)
game_features_df = game_features_df.set_index('id')
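
To make the counting step concrete, here is a minimal sketch on made-up data. The attribute names, the three toy clusters, and the helper count_cluster_features are hypothetical and not taken from the Steam dataset. The sketch also looks attributes up in a dictionary instead of scanning each cluster list, which is just one possible way to do the lookup.

```python
# Toy illustration only: attribute names and cluster contents are made up.
toy_clusters = [
    ["Action", "Shooter"],      # cluster 0
    ["Puzzle", "Casual"],       # cluster 1
    ["Multi-player", "Co-op"],  # cluster 2
]
OTHER = len(toy_clusters)  # index of the catch-all column for unclustered attributes

# Map each attribute to its cluster index for constant-time lookup
attr_to_cluster = {attr: i for i, members in enumerate(toy_clusters) for attr in members}

def count_cluster_features(attributes):
    """Count how many of a game's distinct attributes fall in each cluster."""
    row = {"clust_" + str(i): 0 for i in range(OTHER + 1)}
    for attr in set(attributes):
        row["clust_" + str(attr_to_cluster.get(attr, OTHER))] += 1
    return row

toy_row = count_cluster_features(["Action", "Shooter", "Co-op", "Indie"])
print(toy_row)
```

Here the toy game gets a count of 2 for cluster 0, 0 for cluster 1, 1 for cluster 2, and 1 in the catch-all column, because "Indie" is in none of the toy clusters.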

Step 2: Reconstructing the user features table

Using the game features table we built in step 1, we will rebuild a much smaller user features table.

User Features Table

Similar to the game features table, each column in the user features table indicates how strongly each user prefers games with that attribute cluster. The numbers are computed for each user by retrieving the features of every game the user has played from the game features table and adding them up.

import ast

game_feat_dict = game_features_df.to_dict()

# Read user items data file and build features table
with open('/content/drive/MyDrive/VideoGameRecFiles/australian_users_items.json','r',encoding='utf8') as f:
    data = f.read()
data = data.strip().split("\n")

user_features = []
# We need to keep track of all the games each user played so we can avoid recommending games that they have already played.
user_play_list = {}
for user_data in data:
    # The dataset is not a properly formatted JSON file. Because of this, we need to iterate through each individual record and use
    # the ast module to parse it.
    record = ast.literal_eval(user_data)
    data_row = {k:0 for k in feature_columns}
    data_row['user_id'] = record['user_id']
    play_list = []
    for item in record['items']:
        item_id = item['item_id']
        play_list.append(item_id)
        # Add this game's cluster counts to the user's totals
        for col in feature_columns:
            if item_id in game_feat_dict[col]:
                data_row[col] += game_feat_dict[col][item_id]
    user_play_list[record['user_id']] = play_list
    user_features.append(data_row)

user_features_df = pd.DataFrame(user_features)
user_features_df = user_features_df.set_index("user_id").drop_duplicates()
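
The adding-up step can also be sketched on a tiny made-up example. The game ids, user names, and counts below are hypothetical. This version sums whole rows with pandas rather than looping over columns, which is an alternative to the loop above.

```python
import pandas as pd

# Toy game features table (made-up ids and counts), indexed by game id
toy_game_features = pd.DataFrame(
    {"clust_0": [2, 0, 1], "clust_1": [0, 3, 1]},
    index=["g1", "g2", "g3"],
)

# Hypothetical play histories; "g9" is not in the game features table
toy_play_lists = {"alice": ["g1", "g3"], "bob": ["g2", "g9"]}

toy_rows = {}
for user, games in toy_play_lists.items():
    # Skip games we have no features for, as the loop above does
    known = [g for g in games if g in toy_game_features.index]
    # Sum the feature rows of every known game the user has played
    toy_rows[user] = toy_game_features.loc[known].sum()

toy_user_features = pd.DataFrame(toy_rows).T
print(toy_user_features)
```

In this toy table, alice ends up with 3 in clust_0 and 1 in clust_1, while bob gets 0 and 3, because his second game is unknown and contributes nothing.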

Step 3: Defining a new recommender function

With our new game and user feature tables in place, a new method for measuring similarity between users is in order. In our first recommender, we used the matching dissimilarity score. In our new recommender, we’re going to use cosine similarity. Cosine similarity measures how similar two numerical vectors are: it is the dot product of the two vectors divided by the product of their lengths (Euclidean norms). Because our feature vectors contain only non-negative counts, the score ranges from 0 (no clusters in common) to 1 (identical proportions).
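
As a quick sanity check of that definition, here is a small sketch with NumPy. The three feature vectors and the helper name cosine_sim are made up for illustration, and the sketch assumes neither vector is all zeros (the score is undefined in that case).

```python
import numpy as np

def cosine_sim(a, b):
    """Dot product of a and b divided by the product of their Euclidean lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

player_a = [3, 1, 0]
player_b = [6, 2, 0]  # same proportions as player_a, twice the magnitude
player_c = [0, 0, 5]  # no clusters in common with player_a

sim_ab = cosine_sim(player_a, player_b)
sim_ac = cosine_sim(player_a, player_c)
print(sim_ab, sim_ac)
```

The first pair scores 1 (up to floating-point rounding) even though player_b has played twice as much, and the second pair scores 0. Cosine similarity compares the mix of a user's preferences, not how many games they own.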

Suppose we want to generate recommendations for a user. We’ll call that user u and the number of recommendations we’d like to generate x. We will use cosine similarity to find the user v whose preferences are most similar to u’s. We will then select x games from v’s play history that u has not played and recommend them. If v has fewer than x such games, we move on to the next most similar user, and so on.

Here’s the new recommendation procedure in code form.

from sklearn.metrics.pairwise import cosine_similarity

def cosine_score(user1, user2):
    score = cosine_similarity(user1.values.reshape(1, -1), user2.values.reshape(1, -1))[0][0]
    return score

def recommend_games(user_id, n=10):
    '''
    Given a user id, recommend games to that user. By default, 10 games are recommended.
    '''
    # Get user features
    user = user_features_df.loc[user_id]
    # Get games played by the user
    play_list = set(user_play_list[user_id])
    # Score every other user by cosine similarity, most similar first
    other_users = user_features_df[user_features_df.index != user_id]
    scores = other_users.apply(lambda user2: cosine_score(user, user2), axis=1).sort_values(ascending=False)
    recommended_games = []
    # Walk down the ranking until we have n unplayed games or run out of users
    for other_id in scores.index:
        for gid in user_play_list[other_id]:
            # Skip games the user has played and games we have already picked
            if gid not in play_list and gid not in recommended_games:
                recommended_games.append(gid)
        if len(recommended_games) >= n:
            break
    return recommended_games[:n]
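
Scoring users one row at a time with apply can be slow on a large table. As a rough sketch of an alternative, the ranking step can be done with a single matrix-vector product. The toy table and the helper rank_similar_users below are hypothetical, and the sketch assumes no user has an all-zero feature row (which would cause a division by zero).

```python
import numpy as np
import pandas as pd

# Made-up user features table, indexed by user id
toy_users = pd.DataFrame(
    {"clust_0": [3, 6, 0, 1], "clust_1": [1, 2, 0, 3], "clust_2": [0, 0, 5, 0]},
    index=["alice", "bob", "carol", "dave"],
)

def rank_similar_users(features, user_id):
    """Return the other users sorted by cosine similarity to user_id, most similar first."""
    matrix = features.to_numpy(dtype=float)
    norms = np.linalg.norm(matrix, axis=1)
    target = features.loc[user_id].to_numpy(dtype=float)
    # One matrix-vector product scores every user against the target at once
    sims = matrix @ target / (norms * np.linalg.norm(target))
    scores = pd.Series(sims, index=features.index).drop(user_id)
    return scores.sort_values(ascending=False)

ranking = rank_similar_users(toy_users, "alice")
print(ranking.index.tolist())
```

For this toy table, the users are ranked bob, dave, carol for alice. The resulting Series could stand in for scores inside recommend_games.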

That’s all folks!

You can find the code for this post here. Until next time!