You’ll begin with an introduction to data visualization and its importance. Then, you’ll learn about statistics by computing the mean, median, and variance of some numbers and observing how their values differ. You’ll also learn key NumPy and pandas techniques, such as indexing, slicing, iterating, filtering, and grouping. You’ll study different types of visualizations, compare them, and use this comparison to select the right type of visualization for your data. You’ll explore different plots, including custom creations. Once you get the hang of the various visualization libraries, you’ll learn to work with Matplotlib and Seaborn to simplify the process of creating visualizations. You’ll also be introduced to advanced visualization techniques, such as geoplots and interactive plots. You’ll learn how to make sense of geospatial data, create interactive visualizations that can be integrated into any webpage, and take any dataset to build beautiful and insightful visualizations. You’ll study how to plot geospatial data on a map using choropleth plots, and learn the basics of Bokeh, extending plots by adding widgets and animating the display of information.

**About the Authors**

**Mario Döbler** is a graduate student with a focus on deep learning and AI. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of deep learning, using state-of-the-art algorithms to develop cutting-edge products. Currently, he dedicates himself to applying deep learning to medical data to make health care accessible to everyone.

**Tim Großmann** is a CS student with an interest in diverse topics ranging from AI to IoT. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of big data engineering. He’s highly involved in different open source projects and actively speaks at meetups and conferences about his projects and experiences.

**Erik Sevre** is a doctoral student in Computational Science and Technology and a researcher at Seoul National University.

### Importance of Data Visualization and Data Exploration

Python has recently emerged as a programming language that is well suited to data analysis. It is used across the data science pipeline: converting data into a usable format, analyzing it, drawing useful conclusions, and presenting those conclusions well. It also provides data visualization libraries that help you assemble graphical representations quickly.

In this course, you will learn how to use Python in combination with various libraries, such as NumPy, pandas, Matplotlib, seaborn, and geoplotlib, to create impactful data visualizations using real-world data. The GitHub repository for this course is https://github.com/TrainingByPackt/Data-Visualization-with-Python-eLearning

Before you start this course, you'll need to install Python 3.6, pip, and the other libraries used throughout this course.

Unlike machines, people are usually not equipped to extract a lot of information from a seemingly random set of numbers and messages in a given piece of data. While they may know what the data basically consists of, they might need help to understand it completely. Of all our cognitive capabilities, we understand things best by processing visual information. When data is represented visually, the probability of understanding complex structures and numbers increases.

Statistics is the discipline concerned with the collection, analysis, interpretation, and presentation of numerical data. Probability is a measure of the likelihood that an event will occur, quantified as a number between 0 and 1. The higher the probability, the more likely the event. Let us learn about this in further detail.
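To make these statistics concrete, here is a minimal NumPy sketch that computes the mean, median, and variance of a small, made-up sample. The single outlier (100) is an illustrative choice; it shows how much more strongly the mean reacts to extreme values than the median does. Passing `ddof=1` switches from the population variance to the sample variance.

```python
import numpy as np

# Made-up sample: nine small values and one large outlier
values = np.array([2, 3, 3, 4, 5, 5, 6, 7, 8, 100])

mean = np.mean(values)                    # arithmetic average, sensitive to outliers
median = np.median(values)                # middle of the sorted data, robust to outliers
variance = np.var(values)                 # population variance (divides by n)
sample_variance = np.var(values, ddof=1)  # sample variance (divides by n - 1)

print(f"mean={mean}, median={median}")
print(f"variance={variance:.2f}, sample variance={sample_variance:.2f}")
```

For this sample, the mean (14.3) lies far above the median (5.0), purely because of the outlier.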

When handling data, we often need a way to work with multidimensional arrays. As discussed previously, we also need to apply some basic mathematical and statistical operations to that data. This is exactly where NumPy positions itself: it provides support for large n-dimensional arrays and has built-in support for many high-level mathematical and statistical operations.
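The following sketch illustrates the NumPy techniques named in the introduction (indexing, slicing, iterating, and filtering) on a small 3x4 array. The array contents are arbitrary.

```python
import numpy as np

# A 3x4 array holding the numbers 0..11
matrix = np.arange(12).reshape(3, 4)

first_row = matrix[0]             # indexing: the first row -> [0, 1, 2, 3]
element = matrix[1, 2]            # indexing a single element (row 1, column 2) -> 6
top_left = matrix[:2, :2]         # slicing: first two rows and first two columns
every_other_col = matrix[:, ::2]  # slicing with a step: columns 0 and 2

# Filtering with a boolean mask: keep only the even values
evens = matrix[matrix % 2 == 0]

# Iterating over the rows of the array
row_sums = [int(row.sum()) for row in matrix]

# Vectorized statistics along an axis (axis=0 -> one value per column)
column_means = matrix.mean(axis=0)
```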

The pandas Python library offers data structures and methods for manipulating different types of data, such as numerical and temporal data. These operations are easy to use and highly optimized for performance. Let us learn about the different built-in solutions provided by pandas.
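A comparable pandas sketch, using a tiny hypothetical table of movie ratings, shows position- and label-based selection, boolean filtering, and grouping with an aggregate.

```python
import pandas as pd

# Hypothetical example data: ratings for a few movies
df = pd.DataFrame({
    "movie": ["A", "B", "C", "D", "E"],
    "genre": ["drama", "comedy", "drama", "action", "comedy"],
    "rating": [8.1, 6.5, 7.4, 5.9, 7.0],
})

# Slicing by position with .iloc and selecting by label with .loc
first_two = df.iloc[:2]
ratings = df.loc[:, "rating"]

# Filtering: movies rated 7.0 or higher
well_rated = df[df["rating"] >= 7.0]

# Grouping: mean rating per genre
mean_by_genre = df.groupby("genre")["rating"].mean()
print(mean_by_genre)
```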

Let us quickly recap our learning from this lesson.

### All You Need to Know About Plots

In this lesson, we will focus on various visualizations and identify which visualization is best to show certain information for a given dataset. We will describe every visualization in detail and give practical examples, such as comparing different stocks over time or comparing the ratings for different movies.

Comparison plots are well suited to comparing multiple variables, or variables over time. Depending on the data, you can use line charts, bar charts (horizontal or vertical), radar charts, and so on. We will learn about each of these types in detail with examples.
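As a minimal sketch of comparison plots, the code below draws the same hypothetical monthly closing prices for two stocks once as a line chart (trend over time) and once as a grouped bar chart (side-by-side values). All prices are made up.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical closing prices for two stocks over six months
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
stock_a = np.array([100, 104, 101, 110, 115, 118])
stock_b = np.array([90, 95, 99, 97, 105, 112])

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: how the two variables evolve over time
ax_line.plot(months, stock_a, label="Stock A")
ax_line.plot(months, stock_b, label="Stock B")
ax_line.set_title("Closing price over time")
ax_line.legend()

# Grouped bar chart: values side by side for each month
x = np.arange(len(months))
width = 0.4
ax_bar.bar(x - width / 2, stock_a, width, label="Stock A")
ax_bar.bar(x + width / 2, stock_b, width, label="Stock B")
ax_bar.set_xticks(x)
ax_bar.set_xticklabels(months)
ax_bar.set_title("Closing price per month")
ax_bar.legend()

fig.tight_layout()
```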

Relation plots are perfectly suited to show relationships among variables. A scatter plot visualizes the correlation between two variables for one or multiple groups. Bubble plots can be used to show relationships between three variables; the additional third variable is represented by the dot size. Heatmaps are great for revealing patterns or correlations between two qualitative variables. A correlogram is a perfect visualization to show the correlation among multiple variables.
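The sketch below contrasts a scatter plot with a bubble plot in Matplotlib. The data is randomly generated for illustration; in the bubble plot, the third variable is mapped to the marker area through the `s` parameter of `scatter`. The correlation coefficient is computed with `np.corrcoef` to quantify what the scatter plot shows visually.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)

# Randomly generated data: two correlated variables plus a third for bubble size
x = rng.uniform(0, 10, size=30)
y = 2 * x + rng.normal(0, 2, size=30)
size = rng.uniform(20, 300, size=30)

fig, (ax_scatter, ax_bubble) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: relationship between two variables
ax_scatter.scatter(x, y)
ax_scatter.set_title("Scatter plot")

# Bubble plot: the third variable sets the marker area (s, in points squared)
ax_bubble.scatter(x, y, s=size, alpha=0.5)
ax_bubble.set_title("Bubble plot")

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}")
```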

Composition plots are ideal if you think about something as a part of a whole. For static data, you can use pie charts, stacked bar charts, or Venn diagrams.
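For composition, here is a sketch with hypothetical market-share figures: a pie chart for a single whole, and a stacked bar chart in which each bar is a whole split into its parts. The stacked bars are built by passing a running `bottom` offset to `ax.bar`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical market share (%) of three products in two regions
products = ["Product X", "Product Y", "Product Z"]
region_north = np.array([50, 30, 20])
region_south = np.array([35, 40, 25])

fig, (ax_pie, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: the parts of one whole
ax_pie.pie(region_north, labels=products, autopct="%1.0f%%")
ax_pie.set_title("North")

# Stacked bar chart: one bar per region, one segment per product
regions = ["North", "South"]
shares = np.vstack([region_north, region_south])  # shape: (regions, products)
bottom = np.zeros(len(regions))
for i, product in enumerate(products):
    ax_stacked.bar(regions, shares[:, i], bottom=bottom, label=product)
    bottom += shares[:, i]  # the next segment starts on top of this one
ax_stacked.set_ylabel("Share (%)")
ax_stacked.legend()
```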

Distribution plots give a deep insight into how your data is distributed. For a single variable, a histogram is well-suited. For multiple variables, you can either use a box plot or a violin plot. The violin plot visualizes the densities of your variables, whereas the box plot just visualizes the median, the interquartile range, and the range for each variable.
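The following sketch draws a histogram, a box plot, and a violin plot for three randomly generated samples (normal, skewed, and bimodal; these shapes are chosen only for illustration). The bimodal sample shows the difference discussed above: its two peaks are visible in the violin plot but not in the box plot.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)

# Randomly generated samples with three different shapes
normal = rng.normal(loc=0, scale=1, size=500)
skewed = rng.exponential(scale=1, size=500)
bimodal = np.concatenate([rng.normal(-2, 0.5, 250), rng.normal(2, 0.5, 250)])
samples = [normal, skewed, bimodal]
labels = ["normal", "skewed", "bimodal"]

fig, (ax_hist, ax_box, ax_violin) = plt.subplots(1, 3, figsize=(12, 4))

# Histogram: distribution of a single variable
ax_hist.hist(normal, bins=30)
ax_hist.set_title("Histogram")

# Box plot: median, interquartile range, and whiskers (by default up to 1.5 x IQR)
ax_box.boxplot(samples)
ax_box.set_xticks([1, 2, 3])
ax_box.set_xticklabels(labels)
ax_box.set_title("Box plot")

# Violin plot: estimated density per variable, with the median marked
ax_violin.violinplot(samples, showmedians=True)
ax_violin.set_xticks([1, 2, 3])
ax_violin.set_xticklabels(labels)
ax_violin.set_title("Violin plot")

fig.tight_layout()
```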

Geo plots are a great way to visualize geospatial data. Choropleth maps can be used to compare quantitative values for different countries, states, and so on. If you want to show connections between different locations, connection maps are the way to go.

In this video, we will learn about the various factors that make a good visualization.


### A Deep Dive into Matplotlib

Matplotlib is the most popular plotting library for Python and is widely used for data science and machine learning visualizations. Several features of MATLAB, such as its global plotting style, were adopted in Matplotlib to make the transition easier for MATLAB users. Matplotlib is a large project, with several levels of abstraction designed to make its usage intuitive and convenient. Let us try to understand the concepts behind the plots.

Plots in Matplotlib have a hierarchical structure that nests Python objects to create a tree-like structure. Each plot is encapsulated in a Figure object. This Figure is the top-level container of the visualization. It can contain multiple Axes, which are basically individual plots inside this top-level container. Let us learn about this in more detail.
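A short sketch of this hierarchy: one Figure containing two Axes, each holding its own artists. The titles and data are placeholders.

```python
import matplotlib.pyplot as plt

# The Figure is the top-level container
fig = plt.figure(figsize=(8, 3))

# Each Axes is an individual plot living inside the Figure
ax_left = fig.add_subplot(1, 2, 1)
ax_right = fig.add_subplot(1, 2, 2)

ax_left.plot([0, 1, 2], [0, 1, 4])
ax_left.set_title("Left Axes")
ax_right.bar(["a", "b"], [3, 5])
ax_right.set_title("Right Axes")

# Walking down the tree: Figure -> Axes -> artists (lines, patches, text, ...)
for ax in fig.axes:
    print(ax.get_title(), len(ax.get_children()))
```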

All of Matplotlib's text functions (for titles, axis labels, free text, and annotations), except for the legend, create and return a matplotlib.text.Text instance. We mention this here so that you know that all the properties discussed can be used with the other text functions as well. Let us learn about all the text functions in this video.
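To check the claim about return types, the sketch below calls several text functions on one Axes and styles all of them through the shared Text interface. Note that `annotate` returns an `Annotation`, which is a subclass of `Text`, while `legend` returns a `Legend`.

```python
import matplotlib.pyplot as plt
from matplotlib.legend import Legend
from matplotlib.text import Text

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3], label="series")

title = ax.set_title("Title")            # Text instance
xlabel = ax.set_xlabel("x")              # Text instance
note = ax.text(1.5, 3.5, "free text")    # Text instance placed in data coordinates
arrow = ax.annotate("peak", xy=(2, 4), xytext=(2.5, 3.8),
                    arrowprops={"arrowstyle": "->"})  # Annotation (a Text subclass)
legend = ax.legend()                     # the exception: a Legend, not a Text

# Because they share the Text class, the same setters work on all of them
for text in (title, xlabel, note, arrow):
    text.set_fontsize(12)
    text.set_color("tab:blue")
```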

Let us look at the different types of basic plots in this video.

There are multiple ways to define a visualization layout in Matplotlib. We will start with subplots and how to use the tight layout to create visually appealing plots, and then cover GridSpec, which offers a more flexible way to create multi-plots. Let's see how!
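Here is a minimal sketch of both layout approaches: a regular 2x2 grid created with `plt.subplots` and tidied with `tight_layout()`, and a `GridSpec` layout in which Axes span several cells. The plotted functions are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

x = np.linspace(0, 2 * np.pi, 100)

# Regular grid of subplots; tight_layout() adjusts spacing to avoid overlapping labels
fig1, axes = plt.subplots(2, 2, figsize=(6, 5))
for ax, k in zip(axes.flat, range(1, 5)):
    ax.plot(x, np.sin(k * x))
    ax.set_title(f"sin({k}x)")
fig1.tight_layout()

# GridSpec: a 3x3 grid in which a single Axes can span several cells
fig2 = plt.figure(figsize=(6, 6))
gs = GridSpec(3, 3, figure=fig2)
ax_top = fig2.add_subplot(gs[0, :])     # the whole first row
ax_left = fig2.add_subplot(gs[1:, 0])   # first column, rows 2 and 3
ax_main = fig2.add_subplot(gs[1:, 1:])  # the remaining 2x2 block
ax_top.plot(x, np.cos(x))
ax_left.plot(np.sin(x), x)
ax_main.scatter(np.cos(x), np.sin(x))
```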

In case you want to include images in your visualizations or in case you are working with image data, Matplotlib offers several functions to deal with images. In this section, we will learn to load, save, and plot images with Matplotlib.
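A minimal sketch of the load, save, and plot cycle, assuming a writable temporary directory. Instead of a real photo, it uses a synthetic gradient array so the example is self-contained; `plt.imsave`, `plt.imread`, and `imshow` are the Matplotlib functions involved.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

# Synthetic grayscale gradient image with values in [0, 1], shape (32, 64)
image = np.tile(np.linspace(0, 1, 64), (32, 1))

# Save it as a PNG file; the colormap is applied to the 2-D data on save
path = os.path.join(tempfile.mkdtemp(), "gradient.png")
plt.imsave(path, image, cmap="gray")

# Load it back: PNG files are read as float arrays with values in [0, 1]
loaded = plt.imread(path)

# Plot the original array next to the loaded image
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(image, cmap="gray")
ax1.set_title("Array")
ax2.imshow(loaded)
ax2.set_title("Loaded PNG")
for ax in (ax1, ax2):
    ax.axis("off")
```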

If you need to write mathematical expressions in the text of a plot, Matplotlib supports TeX syntax. You can use it in any text by placing your mathematical expression between a pair of dollar signs. There is no need to have TeX installed, since Matplotlib comes with its own parser (mathtext).
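For example, the sketch below uses TeX-style expressions in a legend label, a title, and axis labels. Raw strings (`r"..."`) are used so that Python does not interpret the backslashes; the Gaussian curve is just sample content.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 200)
sigma = 1.0

fig, ax = plt.subplots()
# Expressions between $...$ are rendered by Matplotlib's built-in mathtext parser
ax.plot(x, np.exp(-x**2 / (2 * sigma**2)), label=r"$e^{-x^2 / 2\sigma^2}$")
ax.set_title(r"Gaussian with $\sigma = 1$")
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$f(x)$")
ax.legend()

# Drawing the canvas forces the expressions to be parsed and rendered
fig.canvas.draw()
```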


### Simplifying Visualizations Using seaborn

Unlike Matplotlib, Seaborn is not a standalone Python library. It is built on top of Matplotlib and provides a higher-level abstraction for making visually appealing statistical visualizations. A neat feature of Seaborn is its integration with DataFrames from the pandas library. With Seaborn, we attempt to make visualization a central part of data exploration and understanding. Internally, Seaborn operates on DataFrames and arrays that contain the complete dataset. This enables it to perform the semantic mappings and statistical aggregations that are essential for displaying informative visualizations. Seaborn can also be used solely to change the style and appearance of Matplotlib visualizations.

Matplotlib is highly customizable, but this also makes it difficult to know which settings to tweak to achieve a visually appealing plot. In contrast, Seaborn provides several customized themes and a high-level interface for controlling the appearance of Matplotlib figures.
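A minimal sketch of seaborn's theme controls applied to plain Matplotlib plots: `set_style` and `set_context` change the global look, `despine` removes the top and right spines, and `axes_style` can be used as a context manager for a one-off style. The plotted data is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.linspace(0, 10, 100)

# Apply a seaborn theme globally
sns.set_style("whitegrid")  # built-in styles: darkgrid, whitegrid, dark, white, ticks
sns.set_context("talk")     # scales fonts and lines: paper, notebook, talk, poster

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
sns.despine(ax=ax)          # remove the top and right spines

# Use a different style temporarily, for this figure only
with sns.axes_style("dark"):
    fig2, ax2 = plt.subplots()
    ax2.plot(x, np.cos(x))
```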

Color is a very important factor in your visualization. Color can reveal patterns in the data if used effectively, or hide patterns if used poorly. Seaborn makes it easy to select and use color palettes that are suited to your task. The color_palette() function provides an interface to many of the possible ways to generate colors: seaborn.color_palette(palette=None, n_colors=None, desat=None) returns a list of colors, thus defining a color palette.
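The sketch below requests a qualitative, a sequential, and a diverging palette, plus a desaturated variant. The palette names (`husl`, `Blues`, `RdBu`) are common choices, not ones prescribed by the course.

```python
import seaborn as sns

# Qualitative palette: distinct hues for categorical data
categorical = sns.color_palette("husl", 8)

# Sequential palette: light to dark, for ordered data
sequential = sns.color_palette("Blues", 5)

# Diverging palette: two hues meeting at a neutral midpoint
diverging = sns.color_palette("RdBu", 7)

# desat < 1 reduces the saturation of every color
muted = sns.color_palette("husl", 8, desat=0.5)

# Each palette is a list of RGB tuples that Matplotlib accepts directly
print(categorical[0])
sns.palplot(sequential)  # quick visual preview of a palette
```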

Seaborn offers a very convenient way to create various bar plots. In Seaborn, bar plots can also represent an estimate of central tendency with the height of each rectangle and indicate the uncertainty around that estimate using error bars. Let us learn this through the following example, which plots the salary based on the employee qualification in various districts.
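The course's salary dataset is not reproduced here, so the sketch below uses a tiny hypothetical DataFrame with the same kind of columns (district, qualification, salary). By default, `sns.barplot` plots the mean of each group and draws error bars from a bootstrapped confidence interval.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical employee records (illustrative values, not the course dataset)
df = pd.DataFrame({
    "district": ["North"] * 4 + ["South"] * 4,
    "qualification": ["Bachelor", "Bachelor", "Master", "Master"] * 2,
    "salary": [52, 55, 63, 67, 48, 50, 58, 61],  # in thousands
})

fig, ax = plt.subplots()
# Bar height = mean salary per group; error bars show the uncertainty of that mean
sns.barplot(data=df, x="district", y="salary", hue="qualification", ax=ax)
ax.set_ylabel("Mean salary (thousands)")
```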

In the previous section, we introduced a multi-plot, namely the pair plot. In this video, let us learn about a different way to create flexible multi-plots.

Many datasets contain multiple quantitative variables, and the goal is to find a relationship among those variables. We previously mentioned a few functions that show the joint distribution of two variables. It can be helpful to estimate relationships between two variables. We will only cover linear regression in this topic; however, Seaborn provides a wider range of regression functionality if needed. To visualize linear relationships determined through linear regression, use the function regplot(). Let us look at an example of a seaborn regression plot.
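A minimal `regplot` sketch on synthetic, roughly linear data. `np.polyfit` is used alongside it (our addition) to print the fitted slope and intercept, because `regplot` draws the fit but returns only the Axes, not the fit parameters.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(seed=3)

# Synthetic data with a roughly linear relationship (true slope 3, intercept 5)
x = rng.uniform(0, 10, size=50)
df = pd.DataFrame({"x": x, "y": 3 * x + 5 + rng.normal(0, 3, size=50)})

fig, ax = plt.subplots()
# Scatter plot + fitted regression line + shaded confidence band
sns.regplot(data=df, x="x", y="y", ax=ax)

# The same least-squares line computed directly, to report its parameters
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
```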

Neither Matplotlib nor seaborn offers tree maps, so we use the Squarify library, which is built on top of Matplotlib, to create them. seaborn is still a great addition for creating the color palettes. Let us look at the following example of a tree map using the Squarify library.


### Plotting Geospatial Data

Geoplotlib is an open-source Python library for geospatial data visualizations. It offers a wide range of geographical visualizations through a simple interface.

Some of its features include:

- Support for hardware acceleration
- High-performance rendering for large datasets with millions of data points
- Map tiles for interactivity and simple animations

Geoplotlib supports different tile providers, which means that any OpenStreetMap tile server can be used as the backdrop of our visualization. Some popular free tile providers are Stamen Watercolor, Stamen Toner, Stamen Toner Lite, and DarkMatter.

Custom layers allow you to create more complex data visualizations and to add more interactivity and animation to them. Creating a custom layer starts by defining a new class that extends the `BaseLayer` class provided by Geoplotlib. Besides the `__init__` method, which initializes the layer's variables, we must at least override the `draw` method of the `BaseLayer` class.


### Making Things Interactive with Bokeh

Bokeh has been around since 2013, with version 1.0 released in 2018. It targets modern web browsers to present interactive visualizations to users rather than static images. In this lesson, we will design interactive plots using the Bokeh library.

Let us look at some of the features of Bokeh. Since we are using Jupyter Notebook throughout this courseware, it's worth mentioning that Bokeh, including its interactivity, is natively supported in Notebook.
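A minimal sketch of an interactive Bokeh figure with built-in tools and a hover tooltip. The plotting code is plain Bokeh; the helper `display_in_notebook` is a hypothetical name that wraps `output_notebook()` and `show()` for inline rendering in Jupyter.

```python
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.models import HoverTool
from bokeh.plotting import figure

x = np.linspace(0, 4 * np.pi, 100)
y = np.sin(x)

# Interactive tools are part of the figure: pan, zoom, reset, save
p = figure(title="Interactive sine wave",
           x_axis_label="x", y_axis_label="sin(x)",
           tools="pan,wheel_zoom,box_zoom,reset,save")
p.line(x, y, line_width=2)
p.scatter(x, y, size=4)

# Hover tooltips read the "x" and "y" columns of the data behind each glyph
p.add_tools(HoverTool(tooltips=[("x", "@x{0.00}"), ("y", "@y{0.00}")]))


def display_in_notebook(plot):
    """Render the plot inline in a Jupyter Notebook instead of a new browser tab."""
    output_notebook()
    show(plot)
```

In a notebook cell, `display_in_notebook(p)` renders the plot with all of its tools working.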

One of the most powerful features of Bokeh is its ability to use widgets to interactively change the data that's displayed in the visualization. Bokeh widgets work best when used in combination with the Bokeh server. However, the Bokeh server approach is beyond the scope of this courseware, since it requires working with plain Python files and can't leverage the power of Jupyter Notebook. Instead, we will use a hybrid approach that only works with the older Jupyter Notebook.
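The hybrid approach is not spelled out here; to our understanding, it combines an ipywidgets control with Bokeh's `push_notebook`, which updates an already displayed plot in place. The sketch below assumes that approach and that ipywidgets is installed; `set_frequency` and `run_in_notebook` are hypothetical helper names.

```python
import numpy as np
from bokeh.io import output_notebook, push_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

x = np.linspace(0, 2 * np.pi, 200)
source = ColumnDataSource(data={"x": x, "y": np.sin(x)})

p = figure(title="sin(k * x)", y_range=(-1.1, 1.1))
p.line("x", "y", source=source, line_width=2)


def set_frequency(k):
    """Recompute the data; the plot follows because it shares the source."""
    source.data = {"x": x, "y": np.sin(k * x)}


def run_in_notebook():
    """Wire an ipywidgets slider to the Bokeh plot (Jupyter Notebook only)."""
    from ipywidgets import interact

    output_notebook()
    handle = show(p, notebook_handle=True)

    def on_change(k=1.0):
        set_frequency(k)
        push_notebook(handle=handle)  # send the updated data to the displayed plot

    interact(on_change, k=(0.5, 5.0, 0.5))  # (min, max, step) creates a slider
```

In a notebook cell, calling `run_in_notebook()` displays the plot with a slider; moving the slider recomputes the data and pushes it to the plot.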
