Choose the correct graph or chart style for the task you want your audience to accomplish
Published in · 13 min read · Sep 4, 2020
According to the World Economic Forum, the world produces 2.5 quintillion bytes of data every day. With so much data, it’s become increasingly difficult to manage and make sense of it all. It would be impossible for any person to wade through data line-by-line and see distinct patterns and make observations.
Data visualization is one step in the data science process, a framework for approaching data science tasks. After data is collected, processed, and modeled, the relationships need to be visualized so that conclusions can be drawn.
We use data visualization as a technique to communicate insights from data through visual representation. Our main goal is to distill large datasets into visual graphics to allow for a straightforward understanding of complex relationships within the data.
So now we know that data visualization can provide insight that traditional descriptive statistics cannot. Our big question is: how do we choose the right chart for the data?
This note will give us an overview of the different chart types. For each type of chart, we will introduce a short description. We then discuss when to use it and when we should avoid using it. Next, we will look at some Python code for implementation. I only present the primary principle; the full version will be provided at the end of this article.
I hope this note is interesting enough to pick up the slack. Let’s hop to it.
Before making a chart, it’s essential to understand why we need one. Graphs, plots, maps, and diagrams help people understand complex data, find patterns, identify trends, and tell stories. Think about the message we want to share with our audience. Here, I group the charts by their data visualization functions, that is, what we want our charts to communicate with our audience. While each chart’s allocation into specific functions isn’t a perfect system, it still works as a useful guide for selecting a chart based on our analysis or communication needs.
The first part of this note introduces charts that display the connection between variables, the trend over time, and the relative order of variables within one or more categories.
Relationship
1. Scatter plot using Matplotlib
2. Marginal Histogram
3. Scatter plot using Seaborn
4. Pair Plot in Seaborn
5. Heat Map
Data over Time
6. Line Chart
7. Area Chart
8. Stack Area Chart
9. Area Chart Unstacked
Ranking
10. Vertical Bar Chart
11. Horizontal Bar Chart
12. Multi-set Bar Chart
13. Stack Bar Chart
14. Lollipop Chart
The second part of this note introduces chart types used to compare variables and show their distributions.
Distribution
15. Histogram
16. Density Curve with Histogram
17. Density Plot
18. Box Plot
19. Strip Plot
20. Violin Plot
21. Population Pyramid
Comparisons
22. Bubble Chart
23. Bullet Chart
24. Pie Chart
25. Net Pie Chart
26. Donut Chart
27. TreeMap
28. Diverging Bar
29. Choropleth Map
30. Bubble Map
We use a relationship method to display a connection or correlation between two or more variables.
When assessing a relationship between data sets, we are trying to understand how two or more data sets combine and interact with each other.
This relationship is called correlation, and it can be positive or negative, meaning that the variables considered might be supportive or working against each other.
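To make positive and negative correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient with NumPy. The three arrays are made-up numbers chosen for illustration; they are not data from this article.

```python
import numpy as np

# Hypothetical observations for six students (illustrative values only)
hours_studied = np.array([1, 2, 3, 4, 5, 6])
exam_score = np.array([52, 55, 61, 68, 70, 79])  # tends to rise with study time
hours_tv = np.array([6, 5, 5, 3, 2, 1])          # tends to fall with study time

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r_positive = np.corrcoef(hours_studied, exam_score)[0, 1]
r_negative = np.corrcoef(hours_studied, hours_tv)[0, 1]

print(f"studied vs. score: r = {r_positive:.2f}")
print(f"studied vs. TV:    r = {r_negative:.2f}")
```

A coefficient near +1 means the two variables move together, and a coefficient near -1 means they move in opposite directions.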
A scatter plot is a type of chart that is often used in statistics and data science. It consists of multiple data points plotted across two axes, with each point representing one observation of the two variables. It is an advantageous chart type whenever we want to see whether a relationship exists between two data sets.
We use a scatter plot to identify the relationship between variables (i.e., correlation or trend patterns). It also helps in detecting outliers.
In machine learning, scatter plots are often used in regression, where x and y are continuous variables. They are also used to inspect clusters and to detect outliers.
Scatter plots are not suitable if we are interested in observing time patterns.
A scatter plot is used with numerical data or numbers. So, if we have categories such as three divisions, five products, etc., a scatter plot would not reveal much.
Python Implementation
We use the Iris dataset for visualization.
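The snippets in this section rely on a data frame named `iris_df`, which isn't built here. The column names they use match scikit-learn's copy of the Iris dataset, so one plausible way to create it (an assumption on my part, since other sources of the data would work too) is sketched below. The `species` column is derived from the numeric `target` column.

```python
from sklearn.datasets import load_iris

# Load the Iris data as a pandas DataFrame (assumes scikit-learn >= 0.23)
iris = load_iris(as_frame=True)
iris_df = iris.frame.copy()  # four feature columns plus a numeric 'target' column

# Map the numeric target (0, 1, 2) to the species names
iris_df["species"] = iris_df["target"].map(dict(enumerate(iris.target_names)))

print(iris_df.head())
```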
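import matplotlib.pyplot as plt
# iris_df is assumed to be a pandas DataFrame holding the Iris measurements
plt.scatter(iris_df['sepal length (cm)'], iris_df['sepal width (cm)'])
plt.title('Scatterplot of Distribution of Sepal Length and Sepal Width', fontsize=15)
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.show()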
Marginal histograms are histograms added to the margin of each axis of a scatter plot for analyzing the distribution of each measure.
We use a marginal histogram to assess the relationship between two variables and examine their distributions. Putting marginal histograms in scatter plots or adding marginal bars on highlighted tables makes the visualization interactive, informative, and impressive.
Python Implementation
import seaborn as sns
# A seaborn jointplot shows a bivariate scatterplot and univariate histograms in the same figure
# (recent seaborn versions require x and y to be passed as keywords)
p = sns.jointplot(data=iris_df, x='sepal length (cm)', y='sepal width (cm)', height=10)
Our goal here is to produce a legend to understand the differences between groups. We will use seaborn’s FacetGrid to color the scatterplot by species.
sns.FacetGrid(iris_df, hue='species', height=10) \
   .map(plt.scatter, 'sepal length (cm)', 'sepal width (cm)') \
   .add_legend()
plt.title('Scatterplot with Seaborn', fontsize=15)
Another useful seaborn plot is pairplot, which shows the bivariate relationship between each pair of features. From the pair plot, we’ll see that the Iris-setosa species is separated from the other two across all feature combinations.
sns.pairplot(iris_df.drop("target", axis=1), hue="species", height=3)
A heatmap is a graphical representation of data that uses a system of color-coding to represent different values. Heatmaps are useful for cross-examining multivariate data, through placing variables in rows and columns and coloring cells within the table.
All the rows are one category (labels displayed on the left side), and all the columns are another category (labels displayed on the bottom). The individual rows and columns are divided into the subcategories, which all match each other in a matrix. The cells within the table either contain color-coded categorical data or numerical data based on a color scale. Data in a cell demonstrates the relationship between two variables in the connecting row and column.
Heatmaps are useful for showing variance across multiple variables, revealing any patterns, displaying whether any variables are similar, and detecting any correlations between them.
Heatmap can be super useful when we want to see which intersections of the categorical values have a higher concentration of the data than others.
Heatmaps are better suited to displaying a more generalized view of numerical data. It is harder to accurately tell the differences between color shades and extract specific data points (unless we include the cells’ raw data).
Heatmaps can also show changes in data over time if one of the rows or columns is set to time intervals. An example would be a heatmap of a city’s temperature across the year, used to spot the hottest and coldest periods: the rows contain the months, the columns indicate the hours of the day, and the cells hold the temperature values.
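As a rough sketch of the month-by-hour layout described here, the code below reshapes long-form records into a grid with pandas. The temperatures are synthetic values generated for illustration, and the column names (`month`, `hour`, `temp_c`) are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# One record per (month, hour) pair: 12 months x 24 hours
records = pd.DataFrame({
    "month": np.repeat(np.arange(1, 13), 24),
    "hour": np.tile(np.arange(24), 12),
})

# Synthetic temperatures: warmer mid-year and mid-afternoon, plus a little noise
records["temp_c"] = (
    15
    - 10 * np.cos(2 * np.pi * (records["month"] - 1) / 12)
    - 5 * np.cos(2 * np.pi * (records["hour"] - 3) / 24)
    + rng.normal(0, 0.5, len(records))
)

# Rows become months, columns become hours, cells hold the temperature
grid = records.pivot(index="month", columns="hour", values="temp_c")
print(grid.shape)
```

A grid in this shape can be passed directly to `sns.heatmap(grid)`.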
Python Implementation
We use the World Happiness Report dataset from Kaggle. I cleaned the data and combined all files into the happiness_rank.csv file. You can download and clean the data yourself or download the final result here. I recommend you check out my data cleaning code on Github.
# happy, usecols, and mask come from the full code (not shown here)
sns.heatmap(happy[usecols].corr(), linewidths=0.25,
            vmax=0.7, square=True, cmap="Blues",
            linecolor='w', annot=True, annot_kws={"size": 8},
            mask=mask, cbar_kws={"shrink": .9})
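The call above passes a `mask` that this snippet doesn't define. One common choice, assumed here rather than taken from my full code, is to hide the upper triangle of the correlation matrix, because it mirrors the lower one. The small data frame below is random stand-in data with hypothetical column names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in data: 50 rows, 4 numeric columns (names are placeholders)
demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])

corr = demo.corr()

# True marks the cells to hide: the upper triangle, including the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool))

# Preview which correlations would stay visible in the heatmap
visible = corr.mask(mask)
print(visible.round(2))
```

Passing `mask=mask` to `sns.heatmap` then draws only the lower triangle.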
Sometimes it isn’t enough to know that a relationship exists between variables; in some cases, better analysis is possible if we can also visualize when the relationship took place. Because relationships are denoted with links between variables, the date/time appears as a link property. This visualization method shows data over a period of time to reveal trends or changes.
Line charts are used to display quantitative values over a continuous interval or period.
Line charts are drawn by first plotting data points on a cartesian coordinate grid and then connecting them. Typically, the y-axis has a quantitative value, while the x-axis is a timescale or a sequence of intervals. The direction of the lines on the graph works as an excellent metaphor for the data: an upward slope indicates increasing values, and a downward slope indicates where values have decreased. The line’s journey across the graph can create patterns that reveal trends in a dataset.
Line charts are most frequently used to show trends and analyze how the data has changed over time.
Line charts are best for continuous data, as they connect many data points that all belong to the same category.
When grouped with other lines or other data series, individual lines can be compared. However, we should avoid using more than four lines per graph, as this makes the chart more cluttered and harder to read. A solution to this is to split our chart into multiple subplots.
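Splitting a crowded line chart into subplots could look like the sketch below. The six series are randomly generated placeholders, not the Medium data used elsewhere in this note.

```python
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
rng = np.random.default_rng(1)

# Hypothetical monthly series; names and values are made up
series = {f"Series {i + 1}": rng.integers(50, 150, size=12).cumsum() for i in range(6)}

# One small panel per series; shared axes keep the panels comparable
fig, axes = plt.subplots(2, 3, figsize=(12, 6), sharex=True, sharey=True)
for ax, (name, values) in zip(axes.flat, series.items()):
    ax.plot(months, values, marker="o")
    ax.set_title(name)
fig.tight_layout()
```

Sharing the x and y axes is a design choice: it trades some detail in each panel for honest comparisons across panels. Call `plt.show()` to display the figure.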
Python Implementation
Suppose we have a dataset containing information about Medium members. We want to see the trend of articles that have been read in 2019.
plt.plot(data['Month'], data['All Views'], color='#4870a0', marker='o')
The idea of an area chart is based on the line chart. The colored region shows us the development of a variable over time.
Area charts are ideal for clearly illustrating the magnitude of change between two or more data points. For example, the happiness score has six generating divisions; we would like to see each of these divisions’ contributions.
Moreover, if we are more interested in the share contributed by each division than in the total amount itself, we can use a 100% stacked area chart. This will show each division’s percentage contribution over time.
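A 100% stacked area chart only needs one extra step: dividing each row by its total before plotting. The sketch below uses three hypothetical divisions (`A`, `B`, `C`) with made-up monthly values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values for three divisions over four months
df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [30, 30, 30, 30], "C": [10, 10, 20, 30]},
    index=[1, 2, 3, 4],  # month number
)

# Divide each row by its total so every month sums to 100%
pct = df.div(df.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots()
ax.stackplot(pct.index, pct["A"], pct["B"], pct["C"], labels=list(pct.columns))
ax.set_xlabel("Month")
ax.set_ylabel("Share of total (%)")
ax.legend(loc="upper left")
```

Because every column of the chart now reaches 100, the totals are no longer visible; only the shares are.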
Area charts are not the best choice if we want to present fluctuating values, like the stock market or price changes.
Python Implementation
Here, we want to present an accumulative number of external views over time.
# colors expects a sequence of colors, so a single color goes in a list
plt.stackplot(data['Month'], data['External Views'], colors=['#7289da'], alpha=0.8)
The stacked area chart builds on the simple area chart. It displays the values of several groups on the same graphic, with each group’s values stacked on top of the previous one. The entire graph represents the total of all data plotted over time.
The stacked area chart type is a powerful chart as it allows grouping of data and seeing trends over a selected date range.
Stacked area charts use the areas to convey whole numbers, so they do not work for negative values.
Stacked area charts are colorful and fun, but we should use them with caution because they can quickly become a mess. We shouldn’t stack together more than five categories.
Python Implementation
plt.stackplot(data['Month'], data['Internal Views'], data['External Views'],
alpha=0.75,
colors=['#7289da','#f29fa9'],
labels=['Internal Views', 'External Views'])
Unlike a stacked area chart, an unstacked area chart shows the overlap of several groups on the same graphic.
fig, ax = plt.subplots()
x = data['Internal Views']
y = data['External Views']
# plot the data
ax.plot(x, color='#49a7c3', alpha=0.3, label='Internal Views')
ax.plot(y, color='#f04747', alpha=0.3, label='External Views')
# fill the areas between the plots and the x axis
# this can create overlapping areas between lines
ax.fill_between(x.index, 0, x, color='blue', alpha=0.2)
ax.fill_between(x.index, 0, y, color='red', alpha=0.2)
Ranking is a visualization method that displays the relative order of data values.
Bar charts are among the most frequently used chart types. As the name suggests, a bar chart is composed of a series of bars illustrating a variable’s development.
There are four types of bar charts: horizontal bar chart, vertical bar chart, grouped bar chart, and stacked bar chart.
Bar charts are great when we want to track the development of one or two variables over time. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value.
A simple bar chart isn’t suitable when we have a single period breakdown of a variable. For example, if I want to portray the main business lines that contributed to a company’s revenues, I wouldn’t use a bar chart. Instead, I would create a pie chart or one of its variations.
Vertical bar charts (column charts) are distinguished from histograms, as they do not display continuous developments over an interval. A vertical bar chart’s discrete data is categorical and therefore answers the question of “how many?” in each category.
Vertical bar charts are typically used to compare several items in a specific range of values. So it is ideal for comparing a single category of data between individual sub-items, for example, corresponding revenue between regions.
Python Implementation
We use the mpg_ggplot2 data frame. It is a rectangular collection of variables (in the columns) and observations (in the rows). mpg contains observations collected by the US Environmental Protection Agency on 38 popular models of car.
Here, we want to compare car models.
plt.bar(value_count.index, value_count.values, color='#49a7c3')
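The bar chart call above relies on `value_count`, which isn't defined in this snippet. A plausible way to build it, assumed here rather than copied from my full code, is to count how often each category appears with pandas. The tiny data frame and its `manufacturer` column are stand-ins for the real mpg data.

```python
import pandas as pd

# Tiny stand-in for the mpg data; the real data frame has many more rows and columns
mpg = pd.DataFrame({"manufacturer": ["audi", "ford", "audi", "toyota", "ford", "audi"]})

# Count how often each manufacturer appears, most frequent first
value_count = mpg["manufacturer"].value_counts()
print(value_count)
```

The index of `value_count` holds the category labels and its values hold the counts, which is the shape that `plt.bar` and `plt.barh` expect.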
Horizontal bar charts represent the data horizontally. The data categories are shown on the y-axis, and the data values are shown on the x-axis. The length of each bar is equal to the value corresponding to the data category, and all bars go across from left to right.
Python Implementation
plt.barh(value_count.index, value_count.values, color='#b28eb2')
Also known as a Grouped Bar Chart or Clustered Bar Chart.
This variation of a bar chart is used when two or more data series are plotted side-by-side and grouped under categories, all on the same axis.
We use multi-set bar charts to compare grouped variables or categories to other groups with those same variables or category types.
The downside of group bar charts is that they become harder to read the more bars we have in one group.
Python Implementation
ax = views.plot.bar(rot=0,color='#E6E9ED',width=1, figsize=(14,8))
ax = df.plot.bar(rot=0, ax=ax, color=['#7289da', '#dd546e', '#99aab5', '#f3c366'],
width=0.8, figsize=(14,8))
Unlike a multi-set bar chart, which displays its bars side-by-side, a stacked bar chart segments its bars. Stacked bar charts are used to show how a broader category is divided into smaller categories and how each part relates to the total amount.
Stacked bar charts place each value for the segment after the previous one. The total value of the bar is all the segment values added together. It is ideal for comparing the total amounts across each group/segmented bar.
One major flaw of Stacked bar charts is that they become harder to read the more segments each bar has. Also, comparing each component to each other is difficult, as they are not aligned on a common baseline.
Python Implementation
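# draw the first series from the baseline
rect1 = plt.bar(data['Month'], data['Internal Views'],
                width=0.5, color='lightblue')
# stack the second series on top of the first via `bottom`
rect2 = plt.bar(data['Month'], data['External Views'],
                bottom=data['Internal Views'],
                width=0.5, color='#1f77b4')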
A lollipop chart serves a similar purpose to an ordered bar chart, in a visually pleasing way. We use lollipop charts to show the relationship between a numerical variable and another numerical or categorical variable.
The lollipop chart is often claimed to be more useful than a standard bar chart when we are dealing with a large number of values and when the values are all high, such as in the 80–90% range (out of 100%). In that case, a broad set of tall columns can be visually aggressive.
We should avoid lollipop charts when our data has unsorted bars of very similar length: it is harder to compare the sizes of two nearly identical lollipops than of two standard bars.
Python Implementation
(markerline, stemlines, baseline) = plt.stem(value_count.index, value_count.values)
That’s it for the first part. The code is available on Github. We will continue with distributions and comparisons in part two.
So far, we know that data visualization is a quick, easy way to convey concepts universally, and we can experiment with different scenarios by making slight adjustments.
There are dozens of tools for data visualization and data analysis. These range from simple tools that require no coding (Tableau) to complex ones that require coding (JavaScript). Not every tool is right for every person looking to learn visualization techniques, and not every tool can scale to industry or enterprise purposes.
My favorite professor told me that “Good data visualization theory and skills will transcend specific tools and products.” When we learn this skill, we should focus on best practices and explore our own style when it comes to visualizations and dashboards. Data visualization isn’t going away anytime soon, so it’s essential to build a foundation of analysis, storytelling, and exploration that you can carry with you regardless of the tools or software you end up using.
If you want to dig deeper into this particular topic, here are some excellent places to start.