Rei Sanchez-Arias, Ph.D.
Using colors to distinguish, represent data, and highlight
Color is an important tool in designing visualizations. It allows you to encode another variable or split the data into groups.
However, there are a lot of issues you need to consider when choosing which colors to use and how to apply them.
Used appropriately, color has a large impact on how well a visualization communicates.
Consider the following plot of GDP vs. life expectancy:
GDP per capita vs life expectancy across countries in 2010.
You see a general upward trend.
GDP vs life expectancy, now colored by region. You can easily see where countries in different regions group together.
Adding color provides much more information:
By coloring each country according to its region, we can see how the regions are distributed across the plot.
The low-GDP countries are almost all in sub-Saharan Africa, and on the high end you see Europe and North America.
The orange in between is the Commonwealth of Independent States (former Soviet states).
A color palette is the range of colors used to encode the data values.
You will have different sorts of palettes for quantitative and qualitative data.
Choosing the correct palette is extremely important.
A refreshed color palette for charts in R 4.0.0
https://blog.revolutionanalytics.com/2020/04/r-400-is-released.html
The default color palette in visualization software such as MATLAB and Python's matplotlib library used to be jet (fortunately, both have since switched to new default palettes).
You might also know this as the rainbow palette. The jet palette goes from a dark blue to a dark red, shifting to green and yellow along the way.
A Better Default Colormap for Matplotlib | SciPy 2015 | Nathaniel Smith and Stéfan van der Walt introducing viridis
We show the palette below both in color and converted to gray-scale to show the luminance along the spectrum.
Check also: https://www.youtube.com/watch?v=XjHzLUnHeM0
The jet/rainbow palette is flawed because the luminance does not transition smoothly from one end to the other. The yellow is much brighter than the rest of the colors which can make some data seem more important than it really is.
not a smooth gradient
Reddit post from 7 years ago
https://www.reddit.com/r/matlab/comments/1jqk8t/you_should_never_use_the_default_colors_in_matlab/
You can see that the short cyan and yellow regions are much stronger perceptually than the other regions. Data that falls into these regions will be overemphasized.
Looking at the gray-scale versions, it's apparent that the cyan and yellow regions pop out due to their greater luminance compared to the rest of the palette. This happens due to the way our brains perceive color.
Our eyes are more sensitive to green than red, and more sensitive to red than blue.
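The uneven luminance of a rainbow palette can be checked numerically. The sketch below is an illustration only: the jet-like anchor colors are an assumption (not MATLAB's exact table), approx_luminance is a hypothetical helper, and the Rec. 709 channel weights are applied directly to gamma-encoded values, so the result is a rough indicator rather than a true perceptual measure.

```r
# Approximate luminance as a weighted sum of the RGB channels.
# Green gets the largest weight and blue the smallest, mirroring eye sensitivity.
approx_luminance <- function(colors) {
  rgb <- grDevices::col2rgb(colors) / 255   # 3 x n matrix: rows are red, green, blue
  as.numeric(c(0.2126, 0.7152, 0.0722) %*% rgb)
}

# A jet-like palette built from commonly used anchor colors (assumed, for illustration).
jet_like <- grDevices::colorRampPalette(
  c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000")
)(9)

lum <- approx_luminance(jet_like)
names(lum) <- jet_like
round(lum, 2)

# The brightest entry sits in the middle of the palette (yellow), not at either end.
jet_like[which.max(lum)]
```

Under these assumptions, the values rise toward cyan and yellow and then fall again toward red, which is the non-monotonic luminance profile described above.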
You can see how jet distorts perceptual importance in the example below.
The yellow and cyan regions are much brighter and more eye-catching than the red and blue regions, which are actually the interesting (extreme) parts.
Our brains are interpreting higher luminance as more important. You can see this in the gray-scale versions where there are luminance spikes at the edges where it's colored yellow and cyan.
Instead, we could use a palette whose luminance varies linearly, with distinct colors at the extremes and a smooth transition between them. Here we use a diverging palette going from red to light yellow to green.
This palette has smooth transitions between the positive and negative regions.
With this color palette, the transitions between the bands are smooth and the red and green regions have equal luminance.
There are two basic types of linear luminance palettes, sequential and diverging.
Sequential palettes have a smooth transition from light to dark or dark to light. These are great for continuous data that is all positive so low values are light and high values are dark (or the other way around if you prefer).
Here's an example of a sequential palette going from light to dark red
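A light-to-dark sequential palette like this one can be sketched in base R. The two endpoint colors are illustrative choices, not the ones used in the figure, and the channel sum is only a crude stand-in for luminance.

```r
# Interpolate between a light red and a dark red (endpoint colors are assumptions).
light_to_dark_red <- grDevices::colorRampPalette(c("#FEE0D2", "#67000D"))(7)
light_to_dark_red

# Sum of the RGB channels as a crude brightness proxy.
brightness <- colSums(grDevices::col2rgb(light_to_dark_red))
brightness

# For a sequential palette the brightness should change in one direction only.
all(diff(brightness) < 0)
```

If the final check fails for a palette you build yourself, the palette has a brightness reversal somewhere and is probably a poor fit for continuous, all-positive data.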
You can also use palettes that shift hues as well as luminance:
Sequential palettes applied to two Gaussian blobs
When you work with data that has some breakpoint, values that go from negative to positive for example, it is typically best to use a diverging palette.
Diverging palettes transition from one color to another, passing through a light (or dark) color with the luminance shifting linearly through the palette.
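As a rough sketch of how a diverging palette lines up with a breakpoint, the code below builds a red to light yellow to green palette and assigns colors so that the value 0 lands on the light midpoint. The anchor colors, the example values, and the binning arithmetic are all illustrative assumptions.

```r
# Eleven colors: red end, light yellow midpoint, green end (anchor colors are assumptions).
diverging <- grDevices::colorRampPalette(c("#D73027", "#FFFFBF", "#1A9850"))(11)

# Example data centered on a breakpoint at zero.
values <- c(-2, -1, 0, 1, 2)

# Use symmetric limits so the breakpoint maps to the middle of the palette.
limit <- max(abs(values))
n <- length(diverging)
bins <- pmin(n, floor((values + limit) / (2 * limit) * n) + 1)

value_colors <- diverging[bins]
data.frame(value = values, bin = bins, color = value_colors)
```

Using symmetric limits matters: if the range were taken from the data's own minimum and maximum, the light midpoint color would drift away from zero whenever the data are not balanced around it.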
Here is what it looks like applied to the same Gaussians as before, but one is negative. The jet palette is also included so you can see how the cyan and yellow make rings around the blobs, while in the other palettes it is a smooth transition between them.
For qualitative data, you are often comparing data from different groups or categories. For this you need to choose colors that are as visually separate as possible.
I want hue is a great tool for building optimally distinct palettes.
We show a scatter plot of the sepal lengths and petal widths of a sample of irises.
By coloring each species differently, you can easily see that the data fall into three fairly distinct clusters.
From Andrew Heiss
Built using the albersusa, tidycensus, and sf packages.
Humans perceive color through signals produced by cells in the retina called cones.
Light comes into the eye, hits the cones, and the cones send off electrical signals to the brain. There are (typically) three types of cones: short (S), medium (M), and long (L). They are sensitive to different frequencies (colors) of light.
Short cones prefer blue, medium prefer green, and long prefer red.
Roughly 8% of men and 0.5% of women have mutations that affect these cones and produce what is known as color blindness.
The most common form is red-green color blindness, typically caused by the medium cones shifting sensitivity towards red light, a mutation called deuteranomaly.
People with deuteranomaly have difficulty distinguishing between red and green, as shown in this image of a red and a green apple.
The bottom row is how you would see these two apples if you had red-green color blindness.
Materials from: "Fundamentals of Data Visualization" by Claus O. Wilke
We use color to distinguish discrete items or groups that do not have an intrinsic order, such as different countries on a map or different manufacturers of a certain product.
In this case, we use a qualitative color scale. Such a scale contains a finite set of specific colors that are chosen to look clearly distinct from each other while also being equivalent to each other. The second condition requires that no one color should stand out relative to the others.
The colors should not create the impression of an order, as would be the case with a sequence of colors that get successively lighter.
Color can also be used to represent data values, such as income, temperature, or speed. In this case, we use a sequential color scale.
Such a scale contains a sequence of colors that clearly indicates (1) which values are larger or smaller than others, and (2) how distant two specific values are from each other (the color scale needs to vary uniformly across its entire range).
Sequential scales can be based on a single hue (e.g., from dark blue to light blue) or on multiple hues (e.g., from dark red to light yellow).
The ColorBrewer Blues scale is a monochromatic scale that varies from dark to light blue. The Heat and Viridis scales are multi-hue scales that vary from dark red to light yellow and from dark blue via green to light yellow, respectively.
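Palettes with these names can be generated in base R with grDevices::hcl.colors() (available in R 3.6.0 and later). This is a minimal sketch; the base R versions are approximations and are not guaranteed to match the exact colors shown in the figure.

```r
# Seven colors from each named palette, ordered from dark to light.
blues   <- grDevices::hcl.colors(7, palette = "Blues")    # single hue
heat    <- grDevices::hcl.colors(7, palette = "Heat")     # dark red to light yellow
viridis <- grDevices::hcl.colors(7, palette = "Viridis")  # dark blue via green to yellow

blues
heat
viridis

# hcl.pals() lists every palette name that hcl.colors() accepts.
head(grDevices::hcl.pals("sequential"))
```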
Representing data values as colors is particularly useful when we want to show how the data values vary across geographic regions. We can draw a map of the geographic regions and color them by the data values. Such maps are called choropleths.
Diverging scales can be thought of as two sequential scales stitched together at a common midpoint color. Common color choices for diverging scales include brown to greenish blue, pink to yellow-green, and blue to red.
Color can also be an effective tool to highlight specific elements in the data. There may be specific categories or values in the dataset that carry key information about the story we want to tell, and we can strengthen the story by emphasizing the relevant figure elements to the reader.
An easy way to achieve this emphasis is to color these figure elements in a color or set of colors that vividly stand out against the rest of the figure.
This effect can be achieved with accent color scales, which contain both a set of subdued colors and a matching set of stronger, darker, and/or more saturated colors.
When working with accent colors, it is critical that the baseline colors do not compete for attention.
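One way to sketch this idea in ggplot2 is to give the baseline groups the same subdued grey and reserve a single saturated color for the group of interest. The choice of dataset, highlighted species, and colors below is purely illustrative, and accent_colors and p_accent are hypothetical names.

```r
library(ggplot2)

# Subdued grey for the baseline groups, one saturated accent for the group of interest.
accent_colors <- c(setosa = "grey75", versicolor = "grey75", virginica = "#D55E00")

p_accent <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  scale_color_manual(values = accent_colors) +  # named vector: one color per level
  theme(legend.position = "top")

p_accent
```

Because the two baseline species share one grey, they no longer compete with the highlighted group for attention, at the cost of no longer being distinguishable from each other.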
Stephen Few has a good article, "Practical Rules for Using Color in Charts", outlining practical rules for using color correctly:
If you want different objects of the same color in a table or graph to look the same, make sure that the background is consistent.
If you want objects in a table or graph to be easily seen, use a background color that contrasts sufficiently with the object.
Use color only when needed to serve a particular communication goal.
Use different colors only when they correspond to differences of meaning in the data.
Use soft, natural colors to display most information and bright and/or dark colors to highlight information that requires greater attention.
When using color to encode a sequential range of quantitative values, stick with a single hue (or a small set of closely related hues) and vary intensity from pale colors for low values to increasingly darker and brighter colors for high values.
Non-data components of tables and graphs should be displayed just visibly enough to perform their role, but no more so, for excessive salience could cause them to distract attention from the data.
To guarantee that most people who are colorblind can distinguish groups of data that are color coded, avoid using a combination of red and green in the same display.
From: "Data Visualization" by Kieran Healy
Choose a color palette based on its ability to express the data you are plotting.
An unordered categorical variable like Country or Sex requires distinct colors that will not be easily confused with one another.
An ordered categorical variable like Level of Education requires a graded color scheme of some kind running from less to more or earlier to later.
If your variable is ordered, is your scale centered on a neutral midpoint with departures to extremes in each direction, as in a Likert scale?
In general, the default color palettes that ggplot makes available are well-chosen for their perceptual properties and aesthetic qualities. We can also use color and color layers as a device for emphasis, to highlight particular data points or parts of the plot, perhaps in conjunction with other features.
We choose color palettes for mappings through one of the scale_ functions for color or fill.
You can use the RColorBrewer package to make a wide range of named color palettes available to you, and choose from those. When used in conjunction with ggplot, you access these colors by specifying the scale_color_brewer() or scale_fill_brewer() functions, depending on the aesthetic you are mapping.
RColorBrewer
Diverging palettes
Sequential palettes
Qualitative palettes
RColorBrewer
Example
library(tidyverse)
p <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Sepal.Width, color = Species))
RColorBrewer
Example (2)
p + geom_point(size = 2) + scale_color_brewer(palette = "Set2") + theme(legend.position = "top")
Available qualitative palettes:
Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3
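The same list can be retrieved programmatically from the brewer.pal.info data frame that ships with RColorBrewer. This is a small sketch that assumes the package is installed; the variable name qualitative is illustrative.

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette; category is "div", "qual", or "seq".
qualitative <- subset(brewer.pal.info, category == "qual")

# Palette names are stored as row names.
rownames(qualitative)

# maxcolors is the largest palette size; colorblind flags colorblind-friendly palettes.
qualitative[, c("maxcolors", "colorblind")]
```

The colorblind column is a convenient first filter when the plot needs to remain readable for viewers with color vision deficiencies.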
RColorBrewer
Example (3)
p + geom_point(size = 2) + scale_color_brewer(palette = "Pastel2") + theme(legend.position = "top")
RColorBrewer
Example (4)
p + geom_point(size = 2) + scale_color_brewer(palette = "Dark2") + theme(legend.position = "top")
You can also specify colors manually, via scale_color_manual() or scale_fill_manual(). These functions take a values argument that can be specified as a vector of color names or color values that R knows about.
R knows many color names (like red, green, and cornflowerblue). Try running demo('colors') in the console for an overview.
Alternatively, color values can be specified via their hexadecimal RGB value.
This is a way of encoding color values in the RGB color space, where each channel can take a value from 0 to 255. A color hex value begins with a hash or pound character, #, followed by three pairs of hexadecimal or "hex" digits.
Hex values are in Base 16, with the first six letters of the alphabet standing for the numbers 10 to 15. This allows a two-character hex number to range from 0 to 255.
You read them as #rrggbb, where rr is the two-digit hex code for the red channel, gg for the green channel, and bb for the blue channel.
So #CC55DD translates in decimal to CC = 204 (red), 55 = 85 (green), and DD = 221 (blue).
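The conversion can be reproduced in base R. The sketch below splits the hex string by hand to show the arithmetic, then uses the built-in helpers col2rgb() and rgb() to go in both directions; the variable names are illustrative.

```r
hex <- "#CC55DD"

# Split into the three two-character pairs: rr, gg, bb.
pairs <- substring(hex, c(2, 4, 6), c(3, 5, 7))

# Convert each pair from base 16 to decimal, e.g. CC = 12 * 16 + 12 = 204.
channels <- strtoi(pairs, base = 16L)
names(channels) <- c("red", "green", "blue")
channels
# red = 204, green = 85, blue = 221

# The built-in helper gives the same numbers.
grDevices::col2rgb(hex)[, 1]

# And rgb() converts decimal channel values back to a hex string.
grDevices::rgb(204, 85, 221, maxColorValue = 255)
# "#CC55DD"
```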
Introduce a palette that is friendly to color-blind viewers:
cb_palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
p + geom_point(size = 2) + scale_color_manual(values = cb_palette) + theme(legend.position = "top")
5 tips on designing colorblind-friendly visualizations
https://www.tableau.com/about/blog/2016/4/examining-data-viz-rules-dont-use-red-green-together-53463
R Graphics Cookbook, 2nd edition, by Winston Chang
Robert Simmon has a great series of blog posts on Subtleties of Color.
Paletton's Colorpedia is an encyclopedia of colors.
I want hue to generate and refine palettes of optimally distinct colors.
viridis: Perceptually uniform color scales.
ColorBrewer: Sequential, diverging, and qualitative color palettes that take accessibility into account.
Colorgorical: Create color palettes based on mathematical rules for perceptual distance.
Photochrome: Word-based color palettes.
Colours Cafe: Instagram account for colours inspiration
Vischeck: Simulate how your images look for people with different forms of colorblindness (web-based)
Coolors.co: hundreds of color palettes and beautiful color schemes.
scales::show_col(c("#532D8E", "#B1B3B5", "#5CB8B2", "#AF95D3", "#202C61", "#FFE9E9", "#EBC1F4", "#AC62D2"), ncol = 8)