Figure 1 — Scatter plot of Health vs Pokemon Rank: The Pokemon’s evolution stage
is shown by the marker style and color. The Pokemon’s speed
determines the size of the marker: Pokemon with higher speed have larger markers.
Hover your mouse over a point to see the corresponding Pokemon.
Things I do on a Friday night
Friday 5:00PM: I have nothing better to do and my new blog needs
content so I guess I will revisit a Pokemon dataset I got hold of last year.
Actually, it was with this dataset that I wrote my first blog post related
to data visualization. I spent like 3 days collecting the data, processing it,
making graphs, and writing the whole damn thing. It’s been over a year and
the Pokemon post I wrote on Medium has barely gained any views. However,
I’ve learned so many tricks since then, so I figured why not create a cool
graph with this data.
Two hours later: Well, I could not find the original Pokemon dataset I used
before, so I had to find three different datasets and compile them into one.
One dataset contained all the Pokemon’s stats but was missing the rank and
evolutionary stage, so I had to dig around the web to find the missing information.
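For anyone curious, compiling datasets like this mostly comes down to joining them on the Pokemon name. Here is a minimal sketch with toy rows. The column names mirror the ones in the source code further down, but the rank values are placeholders, and this merge-based approach is an alternative I am showing for illustration, not the exact method used in that code.

```python
import pandas as pd

# Toy stand-ins for the three datasets (rank values are placeholders).
stats = pd.DataFrame({"name": ["Bulbasaur", "Charmander", "Squirtle"],
                      "hp": [45, 39, 44]})
rank = pd.DataFrame({"Pokemon": ["Squirtle", "Bulbasaur", "Charmander"],
                     "Rank": [1, 2, 3]})
stage = pd.DataFrame({"name": [" Bulbasaur", "Charmander ", "Squirtle"],
                      "stage": ["Basic", "Basic", "Basic"]})

# Strip stray whitespace so the names match across files.
stage["name"] = stage["name"].str.strip()

# Start from the rank table so the result follows the rank order,
# then pull in the stats and the stage by name.
combined = (
    rank.rename(columns={"Pokemon": "name", "Rank": "rank"})
        .merge(stats, on="name", how="left")
        .merge(stage, on="name", how="left")
)
print(combined)
```

A left merge keeps the row order of the left table, so starting from the rank table gives you the rows already sorted by rank.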
Friday 9:00PM: After I compiled all the information I took a break. Now
I am realizing that seaborn and mpld3 don’t get along very well.
The whole point of this article is to create a scatter plot that is interactive
and shows a Pokemon image as you hover over the points. I want to avoid using
matplotlib because it will take too much work to make a nice-looking
plot where I can control the color, marker size, and marker type based on
a Pokemon’s stats. There has to be a way to do it.
Friday 11:00PM: I have been digging deeper and deeper into the net on how
to make this work. I have found multiple examples showing that mpld3 and
matplotlib are compatible and offer the feature I want when used together.
As a matter of fact, you can use matplotlib alone to create a figure that
shows an image as you hover over a point. That’s great, but since
I want to embed the scatter plot on my webpage, I need to save the matplotlib
figure as an HTML file. It turns out that when you save a matplotlib figure
as HTML, it loses some of its functionality. Because of that, I looked
into plotly, the obvious choice for making interactive figures; however,
plotly does not natively allow you to have images pop up as you hover
over a point. This is a feature that has been requested since 2016 but has
yet to be added.
Friday 11:55PM: Well that took much longer than I expected but here we go.
Go hover your mouse over a point in the figure above to see a cool trick.
To make this graph, I had to do a bunch of maneuvers. The main packages
that I used are pandas, seaborn, matplotlib, and mpld3. Getting seaborn and
mpld3 to work together correctly was a bit of a challenge. This is the
line that did the trick for me:
`plugins.connect(fig, mpld3.plugins.PointHTMLTooltip(ax.get_children()[0], labels))`
I will just paste the Python source code here and you can figure out the rest.
###############################################################################
# 1. Importing Libraries #
###############################################################################
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from mpld3 import fig_to_html, plugins
import mpld3
import io
import requests
###############################################################################
# 2. Helper Functions #
###############################################################################
def reorder_data(data, rank, stage):
    """Sorts the data df based on the rank df"""
    ordered_pokemon = list(rank['Pokemon'].values)
    unordered_pokemon = list(data['name'].values)
    # Find order
    ordered_indices = [unordered_pokemon.index(pokemon) for pokemon in ordered_pokemon]
    # Reorder data
    df = data.reindex(ordered_indices)
    df['rank'] = rank['Rank'].values
    # Get stages
    name = [name.strip() for name in stage['name'].values]
    stage['name'] = name
    keep = [line in ordered_pokemon for line in stage['name']]
    stage = stage[keep]
    stage = stage.drop_duplicates()
    stage = stage.reset_index(drop = True)
    unordered_pokemon_stage = list(stage['name'].values)
    # Get stages sorted
    # Find order
    ordered_indices = [unordered_pokemon_stage.index(pokemon) for pokemon in ordered_pokemon]
    # Reset index
    stage = stage.reindex(ordered_indices)
    # Add stage
    df['stage'] = stage['stage'].values
    return df


def get_labels(pokemons, data):
    """Connects labels to images"""
    temp_data = pokemons[['name', 'rank']]
    ordered_pokemon = list(temp_data['name'].values)
    unordered_pokemon = list(data['name'].values)
    # Find order (image files are numbered starting at 1)
    ordered_indices = [unordered_pokemon.index(pokemon)+1 for pokemon in ordered_pokemon]
    # Create empty list
    labels = []
    # Create tags
    for ii in ordered_indices:
        raw_path = f"https://raw.githubusercontent.com/frank-ceballos/frank-blog/master/content/posts/04PokemonGraph/images/main_sprites/{ii}.png"
        temp_tag = f'<img src="{raw_path}" alt="image name">'
        labels.append(temp_tag)
    return labels
###############################################################################
# 3. Create Dataset #
###############################################################################
# Define urls
url_data = "https://raw.githubusercontent.com/frank-ceballos/frank-blog/master/content/posts/04PokemonGraph/data/pokemon_data.csv"
url_rank = "https://raw.githubusercontent.com/frank-ceballos/frank-blog/master/content/posts/04PokemonGraph/data/pokemon_rank.csv"
url_stage = "https://raw.githubusercontent.com/frank-ceballos/frank-blog/master/content/posts/04PokemonGraph/data/stages.csv"
# Get pokemon data
s = requests.get(url_data).content
data = pd.read_csv(io.StringIO(s.decode('utf-8')))
# Get rank data
s = requests.get(url_rank).content
rank = pd.read_csv(io.StringIO(s.decode('utf-8')))
# Get stage data
s = requests.get(url_stage).content
stage = pd.read_csv(io.StringIO(s.decode('utf-8')))
# Reorder data based on rank
pokemons = reorder_data(data, rank, stage)
# Process data
pokemons = pokemons.loc[pokemons['generation'] == 1]
# Drop features
features_to_remove = ['japanese_name', 'percentage_male', 'height_m', 'weight_kg', 'classfication', 'abilities', 'type2']
pokemons = pokemons.drop(features_to_remove, axis = 1)
###############################################################################
# 4. Create Graph #
###############################################################################
# Set seaborn environment and font size
sns.set(font_scale = 1.5)
sns.set_style({"axes.facecolor": "0.95", "axes.edgecolor": "1", "grid.color": "1",
               "grid.linestyle": "-", 'axes.labelcolor': '0', "xtick.color": "1",
               'ytick.color': '1', 'axes.spines.left': True,
               'axes.spines.bottom': True,
               'axes.spines.right': True,
               'axes.spines.top': True})
# Define color palette
color_palette = ['#e53d00', '#00cc66', '#ffb400']
# Create figure
fig, ax = plt.subplots(figsize=(12,9))
# Create scatterplot
sns.scatterplot(x = 'rank', y = 'hp' , hue = 'stage', style = 'stage',
                label = None, size = 'speed', sizes=(50, 400),
                palette = color_palette, legend = False, data = pokemons,
                ax = ax)
# Change axis labels
plt.xlabel('Pokemon Rank')
plt.ylabel('Health (HP)')
# Change the x-axis range
plt.xlim(-1, 155)
# Manually create a legend since mpld3 can't render the sns.scatterplot legend
ax.plot([], [], "o", color=color_palette[0] , label="Basic")
ax.plot([], [], "x", color=color_palette[2] , label="Stage 1")
ax.plot([], [], "o", color=color_palette[1] , label="Stage 2")
ax.legend(title="Evolutionary Stage", loc="best", framealpha=1, fontsize = 'medium',
          markerscale = 2, facecolor = 'white')
# Tight layout
plt.tight_layout()
# Create labels for points
labels = get_labels(pokemons, data)
# Connect sns and mpld3
plugins.connect(fig, mpld3.plugins.PointHTMLTooltip(ax.get_children()[0], labels))
# Save figure
file_name = 'pokemon_graph.html'
mpld3.save_html(fig, file_name)
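One small thing you could tweak: `get_labels` gives every image the same alt text. Here is a rough sketch of a variant that uses the Pokemon's name instead. The function name `make_image_label` and the base URL are hypothetical, made up for this example, so swap in your own.

```python
from html import escape

# Placeholder base URL; point this at wherever your sprite images live.
BASE_URL = "https://example.com/sprites"


def make_image_label(index, name):
    """Return an <img> tag for one tooltip, using the name as alt text."""
    # Escape the name so characters like quotes cannot break the attribute.
    return f'<img src="{BASE_URL}/{index}.png" alt="{escape(name, quote=True)}">'


labels = [make_image_label(i, n) for i, n in [(1, "Bulbasaur"), (83, "Farfetch'd")]]
print(labels[1])
```

Escaping matters for names like Farfetch'd, where an apostrophe could otherwise cause trouble inside an HTML attribute.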
Until next time, take care, and code every day!