Improving the Video Game Recommender

In my last post, we took the 380+ video game attributes that we extracted from the Steam video game dataset and wrote an algorithm to cluster them into 24 groups. In this post, we will use those clusters to build a new and improved video game recommender. If you haven’t read my first and second posts on the video game recommender, please read them before continuing.

Step 1: Reconstructing the game features table

The game features table that we built in the first post contained hundreds of attributes. We will construct a much smaller table using the game attribute clusters.

Each column in the table indicates how strongly the corresponding attribute cluster is represented in each game. The numbers are computed for each game using the attribute-cluster assignments we obtained from K-means: for each game, we count the number of its attributes that belong to each cluster. The construction of the new game features table is described in the code snippet below.

	import pandas as pd
	from sklearn.cluster import KMeans

	# Fit the K-means clustering algorithm and get the cluster assignment for each attribute
	km = KMeans(n_clusters=24, random_state=25)
	km.fit(feature_set)
	labels = km.predict(feature_set)
	attribute_assignments = pd.Series(labels, index=game_categories)

	# Group attributes into a list of clusters
	attribute_clusters = []
	for i in range(24):
	    cluster = attribute_assignments[attribute_assignments == i]
	    attribute_clusters.append(cluster.index.tolist())

	# 24 cluster columns plus one extra column, clust_24
	feature_columns = ['clust_'+str(i) for i in range(25)]

	game_features = []
	for idx in range(steam_games_df.shape[0]):
	    # Obtain list of genres, tags, and specs
	    game_genre = steam_games_df.iloc[idx]['genres']
	    game_tags = steam_games_df.iloc[idx]['tags']
	    game_specs = steam_games_df.iloc[idx]['specs']
	    attributes = []

	    data_row = {k:0 for k in feature_columns}
	    data_row['id'] = steam_games_df.iloc[idx]['id']

	    # Iterate through each entry in the lists and create the features
	    if game_genre:
	        attributes.extend(game_genre.split(','))
	    if game_tags:
	        attributes.extend(game_tags.split(','))
	    if game_specs:
	        attributes.extend(game_specs.split(','))

	    # Remove attributes that appear in more than one list
	    attributes = set(attributes)

	    if len(attributes) > 0:
	        # Count how many of the game's attributes fall into each cluster
	        for attr in attributes:
	            for i in range(len(attribute_clusters)):
	                if attr in attribute_clusters[i]:
	                    data_row['clust_'+str(i)] += 1
	    else:
	        # Assumed pairing of this else: games with no attributes are counted in clust_24
	        data_row['clust_24'] += 1
	    game_features.append(data_row)

	game_features_df = pd.DataFrame(game_features)
	game_features_df = game_features_df.set_index('id')
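
To make the counting step concrete, here is a minimal sketch on toy data. The three clusters and their attribute names are made up for illustration (the real clusters come from the K-means assignments), and the helper `count_cluster_attributes` is a hypothetical name, not part of the code above. It also swaps the nested loop for a dictionary lookup, which is one possible way to do the same count.

```python
from collections import Counter

# Hypothetical toy clusters; the real ones come from the K-means assignments.
toy_clusters = [
    ["Action", "Shooter"],        # cluster 0
    ["Strategy", "Turn-Based"],   # cluster 1
    ["Multi-player", "Co-op"],    # cluster 2
]

# Invert the cluster lists into an attribute -> cluster index lookup.
attr_to_cluster = {
    attr: idx for idx, members in enumerate(toy_clusters) for attr in members
}

def count_cluster_attributes(attributes, n_clusters):
    """Count how many of a game's distinct attributes fall in each cluster."""
    counts = Counter(
        attr_to_cluster[attr] for attr in set(attributes) if attr in attr_to_cluster
    )
    return [counts.get(i, 0) for i in range(n_clusters)]

# "Action" appears twice but is only counted once.
toy_row = count_cluster_attributes(
    ["Action", "Shooter", "Co-op", "Action"], n_clusters=len(toy_clusters)
)
print(toy_row)  # [2, 0, 1]
```

The toy game has two attributes in cluster 0, none in cluster 1, and one in cluster 2, so its row in the game features table would be 2, 0, 1.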


Step 2: Reconstructing the user features table

Using the game features table we built in step 1, we will rebuild a much smaller user features table.

Similar to the game features table, each column in the user features table indicates how strongly each user prefers games with attributes from that cluster. The numbers are computed for each user by retrieving the features of every game the user has played from the game features table and adding them up.

	import ast
	import pandas as pd

	# Nested dict of the form {column: {game id: value}}
	game_feat_dict = game_features_df.to_dict()

	# Read user items data file and build features table
	with open('/content/drive/MyDrive/VideoGameRecFiles/australian_users_items.json','r',encoding='utf8') as f:
	    data = f.read()
	data = data.strip().split("\n")
	user_features = []
	# We need to keep track of all the games each user played so we can avoid recommending games that they have already played.
	user_play_list = {}
	for user_data in data:
	    # The dataset is not a properly formatted JSON file. Because of this we need to iterate through each individual JSON object and use
	    # the ast module to parse the object.
	    record = ast.literal_eval(user_data)
	    data_row = {k:0 for k in feature_columns}
	    data_row['user_id'] = record['user_id']
	    play_list = []

	    for item in record['items']:
	        item_id = item['item_id']
	        play_list.append(item_id)
	        # Add the game's cluster counts to the user's totals (games without features are skipped)
	        for col in feature_columns:
	            if item_id in game_feat_dict[col]:
	                data_row[col] += game_feat_dict[col][item_id]

	    user_play_list[record['user_id']] = play_list
	    user_features.append(data_row)

	user_features_df = pd.DataFrame(user_features)
	# drop_duplicates() removes rows whose feature values repeat an earlier row
	user_features_df = user_features_df.set_index("user_id").drop_duplicates()
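
As a small illustration of the adding-up step, here is a sketch that sums toy game vectors for one user. The game ids, the three-cluster vectors, and the helper `build_user_vector` are all hypothetical; the real code works on the 25 `clust_` columns.

```python
# Hypothetical game feature vectors keyed by game id (three clusters).
toy_game_features = {
    "10": [2, 0, 1],
    "20": [0, 3, 0],
    "30": [1, 1, 1],
}

def build_user_vector(played_ids, game_features, n_clusters):
    """Add up the cluster counts of every played game that has features."""
    totals = [0] * n_clusters
    for game_id in played_ids:
        # Games missing from the features table contribute nothing.
        for i, value in enumerate(game_features.get(game_id, [])):
            totals[i] += value
    return totals

# Game "999" is not in the features table, so it is ignored.
toy_user = build_user_vector(["10", "30", "999"], toy_game_features, n_clusters=3)
print(toy_user)  # [3, 1, 2]
```

A user who has played many games from one cluster ends up with a large value in that column, which is what lets us compare users by taste later on.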


Step 3: Defining a new recommender function

With our new game and user feature tables in place, a new method for measuring similarity between users is in order. In our first recommender, we used the matching dissimilarity score. In our new recommender, we’re going to use cosine similarity. Cosine similarity is a measure of similarity between two numerical vectors: the dot product of the two vectors divided by the product of their lengths.
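
Here is a minimal sketch of that definition in plain Python, using made-up user vectors. The function name `cosine` and the choice to return 0.0 when one vector is all zeros are assumptions made for this example.

```python
import math

def cosine(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: a user with no features matches nobody.
        return 0.0
    return dot / (norm_a * norm_b)

u = [3, 1, 2]
v = [6, 2, 4]   # same direction as u, twice the magnitude
w = [0, 5, 0]   # mostly different tastes

print(round(cosine(u, v), 6))  # 1.0
print(round(cosine(u, w), 3))  # 0.267
```

Note that `u` and `v` score 1.0 even though `v` has played twice as much. Cosine similarity only looks at the direction of the vectors, so it compares the mix of game types a user plays rather than how many games they own.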

Let’s suppose that we have a user that we want to generate recommendations for. We’ll call this user u and the number of recommendations we’d like to generate x. We will use cosine similarity to find another user, v, whose preferences are the most similar to u’s. We will then select x games from v’s play history that have not been played by u and recommend them. If v does not have x such games, we move on to the next most similar user.

Here’s the new recommendation procedure in code form.

	from sklearn.metrics.pairwise import cosine_similarity

	def cosine_score(user1, user2):
	    score = cosine_similarity(user1.values.reshape(1, -1), user2.values.reshape(1,-1))[0][0]
	    return score

	def recommend_games(user_id, n=10):
	    '''
	    Given a user id, recommend games to that user. By default 10 games are recommended
	    '''

	    # Get user features
	    user = user_features_df.loc[user_id]
	    # Get games played by the user
	    play_list = user_play_list[user_id]
	    other_users = user_features_df[user_features_df.index != user_id]

	    # Rank the other users from most to least similar
	    scores = other_users.apply(lambda user2: cosine_score(user, user2), axis=1).sort_values(ascending=False)

	    # Start with the unplayed games of the most similar user
	    rec_idx = 0
	    recommended_games = user_play_list[scores.index[rec_idx]]
	    recommended_games = list(filter((lambda gid: gid not in play_list), recommended_games))

	    # Move on to the next most similar users until we have n games or run out of users
	    while len(recommended_games) < n and rec_idx + 1 < len(scores):
	        rec_idx += 1
	        additional_games = user_play_list[scores.index[rec_idx]]
	        # Skip games the user has played and games that are already recommended
	        recommended_games.extend([gid for gid in additional_games if gid not in play_list and gid not in recommended_games])

	    return recommended_games[:n]
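
To see the whole procedure run end to end, here is a self-contained toy version in plain Python. The three users, their vectors, and their play lists are invented for illustration, and `recommend` is a simplified stand-in for the function above rather than a replacement for it.

```python
import math

# Hypothetical user feature vectors and play histories.
toy_user_vectors = {"u": [3, 1, 2], "v": [6, 2, 5], "w": [0, 5, 0]}
toy_play_lists = {"u": ["10", "30"], "v": ["10", "40", "50"], "w": ["20", "60"]}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def recommend(user_id, n=2):
    played = set(toy_play_lists[user_id])
    # Rank every other user from most to least similar.
    ranked = sorted(
        (other for other in toy_user_vectors if other != user_id),
        key=lambda other: cosine(toy_user_vectors[user_id], toy_user_vectors[other]),
        reverse=True,
    )
    picks = []
    for other in ranked:
        # Take the games this neighbour played that the user has not.
        for gid in toy_play_lists[other]:
            if gid not in played and gid not in picks:
                picks.append(gid)
        if len(picks) >= n:
            break
    return picks[:n]

print(recommend("u", n=3))  # ['40', '50', '20']
```

User v is the closest match to u, so v’s unplayed games come first. Since v only supplies two of them, the third recommendation is borrowed from w, the next most similar user.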


That’s all folks!

You can find the code for this post here. Until next time!
