Data Visualization with Python
You’ll begin with an introduction to data visualization and its importance. Then, you’ll learn about statistics by computing the mean, median, and variance for some numbers and observing the difference in their values. You’ll also learn about key NumPy and pandas techniques, such as indexing, slicing, iterating, filtering, and grouping. You’ll study different types of visualizations, compare them, and find out how to select a particular type of visualization using this comparison. You’ll explore different plots, including custom creations. Once you get the hang of the various visualization libraries, you’ll learn to work with Matplotlib and Seaborn to simplify the process of creating visualizations. You’ll also be introduced to advanced visualization techniques, such as geoplots and interactive plots. You’ll learn how to make sense of geospatial data, create interactive visualizations that can be integrated into any webpage, and take any dataset to build beautiful and insightful visualizations. You’ll study how to plot geospatial data on a map using a choropleth plot, and study the basics of Bokeh, extending plots by adding widgets and animating the display of information.
About the Authors
Mario Döbler is a graduate student with a focus in deep learning and AI. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of deep learning, using state-of-the-art algorithms to develop cutting-edge products. Currently, he dedicates himself to applying deep learning to medical data to make health care accessible to everyone.
Tim Großmann is a CS student with an interest in diverse topics ranging from AI to IoT. He previously worked at the Bosch Center for Artificial Intelligence in Silicon Valley in the field of big data engineering. He’s highly involved in different open source projects and actively speaks at meetups and conferences about his projects and experiences.
Erik Sevre is a doctoral student and researcher in Computational Science and Technology at Seoul National University.
-
1Course Overview
Python has recently emerged as a programming language that performs well for data analysis. Python has applications across data science pipelines that convert data into a usable format, analyze it, and extract useful conclusions from the data to represent it well. It provides data visualization libraries that can help you assemble graphical representations quickly.
In this course, you will learn how to use Python in combination with various libraries, such as NumPy, pandas, Matplotlib, seaborn, and geoplotlib, to create impactful data visualizations using real-world data. The GitHub link for this course is https://github.com/TrainingByPackt/Data-Visualization-with-Python-eLearning
-
2Installation and Setup
Before you start this course, we'll install Python 3.6, pip, and the other libraries used throughout this course.
-
3Introduction
Unlike machines, people are not usually equipped to interpret large amounts of information from a raw set of numbers and messages in a dataset. While they may know what the data basically consists of, they might need help to understand it completely. Of all our cognitive capabilities, we understand things best by processing visual information. When data is represented visually, complex structures and numbers become easier to understand.
-
4Overview of Statistics
Statistics combines the collection, analysis, interpretation, and representation of numerical data. Probability is a measure of the likelihood that an event will occur, quantified as a number between 0 and 1. The higher the probability, the more likely the event. Let us learn about this in further detail.
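As a minimal sketch of the descriptive statistics this course mentions (mean, median, and variance), the following uses Python's standard statistics module. The sample values are made up for illustration only.

```python
import statistics

# Hypothetical sample values, chosen only for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)           # arithmetic average
median = statistics.median(values)       # middle value of the sorted data
variance = statistics.pvariance(values)  # population variance

print("mean:", mean, "median:", median, "variance:", variance)
```

The mean and the median differ here because the larger values pull the average upward while the median only depends on the middle of the sorted data.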
-
5NumPy
When handling data, we often need a way to work with multidimensional arrays. As discussed previously, we also need to apply some basic mathematical and statistical operations to that data. This is exactly where NumPy positions itself. It provides support for large n-dimensional arrays and has built-in support for many high-level mathematical and statistical operations.
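The indexing, slicing, and filtering techniques named in the course description might look like the following on a small array. This is only a sketch; the array contents are arbitrary.

```python
import numpy as np

# A small 3x3 array with arbitrary illustrative values
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

first_row = data[0]             # indexing: the first row
second_col = data[:, 1]         # slicing: every row, second column
top_right = data[:2, 1:]        # slicing: first two rows, last two columns
large = data[data > 5]          # filtering with a boolean mask
col_means = data.mean(axis=0)   # a statistical operation per column

print(second_col, large, col_means)
```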
-
6pandas
The pandas Python library offers data structures and methods to manipulate different types of data, such as numerical and temporal. These operations are easy to use and highly optimized for performance. Let us learn about the different built-in solutions provided by pandas.
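To illustrate the filtering and grouping operations mentioned in the course description, here is a short sketch with a made-up DataFrame. The column names and values are assumptions, not course data.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only
df = pd.DataFrame({
    "city": ["Berlin", "Berlin", "Seoul", "Seoul"],
    "year": [2017, 2018, 2017, 2018],
    "sales": [120, 150, 90, 110],
})

# Filtering: keep only the rows whose sales exceed 100
high_sales = df[df["sales"] > 100]

# Grouping: total sales per city
total_by_city = df.groupby("city")["sales"].sum()

print(high_sales)
print(total_by_city)
```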
-
7Lesson Summary
Let us quickly recap our learning from this lesson.
-
8Test your knowledge
-
9Lesson Overview
In this lesson, we will focus on various visualizations and identify which visualization is best to show certain information for a given dataset. We will describe every visualization in detail and give practical examples, such as comparing different stocks over time or comparing the ratings for different movies.
-
10Comparison Plots
Comparison plots are charts well suited to comparing multiple variables or a variable over time. Different plots are used depending on the data, such as line charts, bar charts, and radar charts. We will learn each of these types in detail with examples.
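A line chart comparing two series over time could be sketched with Matplotlib as follows. The "stock" prices are invented for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical prices for two stocks over five days
days = [1, 2, 3, 4, 5]
stock_a = [10, 11, 13, 12, 15]
stock_b = [8, 9, 9, 11, 10]

fig, ax = plt.subplots()
ax.plot(days, stock_a, label="Stock A")  # one line per variable
ax.plot(days, stock_b, label="Stock B")
ax.set_xlabel("Day")
ax.set_ylabel("Price")
ax.legend()
```

In an interactive session, calling plt.show() afterwards would display the figure.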
-
11Relation Plots
Relation plots are well suited to showing relationships among variables. A scatter plot visualizes the correlation between two variables for one or multiple groups. Bubble plots can be used to show relationships between three variables; the third variable is represented by the dot size. Heatmaps are great for revealing patterns or correlations between two qualitative variables. A correlogram is a good way to show the correlation among multiple variables.
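As a rough sketch of a bubble plot, Matplotlib's scatter function can map a third variable to the dot size through its s parameter. All values below are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data: two plotted variables plus a third shown as dot size
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
third = [10, 40, 20, 80, 60]

fig, ax = plt.subplots()
# s sets the marker area in points squared; alpha keeps overlaps visible
bubbles = ax.scatter(x, y, s=[5 * v for v in third], alpha=0.5)
ax.set_xlabel("Variable 1")
ax.set_ylabel("Variable 2")
```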
-
12Composition Plots
Composition plots are ideal if you think about something as a part of a whole. For static data, you can use pie charts, stacked bar charts, or Venn diagrams.
-
13Distribution Plots
Distribution plots give a deep insight into how your data is distributed. For a single variable, a histogram is well-suited. For multiple variables, you can either use a box plot or a violin plot. The violin plot visualizes the densities of your variables, whereas the box plot just visualizes the median, the interquartile range, and the range for each variable.
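The three plot types described above can be placed side by side with Matplotlib, as in this sketch. The data is randomly generated for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Two synthetic variables drawn from normal distributions
rng = np.random.default_rng(0)
a = rng.normal(0, 1, 200)
b = rng.normal(1, 2, 200)

fig, (ax_hist, ax_box, ax_violin) = plt.subplots(1, 3, figsize=(9, 3))
ax_hist.hist(a, bins=20)                         # single variable
ax_box.boxplot([a, b])                           # median, IQR, and range
ax_violin.violinplot([a, b], showmedians=True)   # adds the density shape
ax_hist.set_title("Histogram")
ax_box.set_title("Box plot")
ax_violin.set_title("Violin plot")
```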
-
14Geo Plots
Geo plots are a great way to visualize geospatial data. Choropleth maps can be used to compare quantitative values for different countries, states, and so on. If you want to show connections between different locations, connection maps are the way to go.
-
15What Makes a Good Visualization?
In this video, we will learn about the various factors that make a good visualization.
-
16Lesson Summary
Let us quickly recap our learning from this lesson.
-
17Test your knowledge
-
18Lesson Overview
Matplotlib is the most popular plotting library for Python, used for data science and machine learning visualizations. Several features, such as the global style of MATLAB, were introduced into Matplotlib to make the transition from MATLAB easier. Matplotlib is a huge project, and a lot of abstraction has been worked into it to make its usage intuitive and convenient. Let us try to understand the concepts behind the plots.
-
19Overview of Plots in Matplotlib
Plots in Matplotlib have a hierarchical structure that nests Python objects to create a tree-like structure. Each plot is encapsulated in a Figure object. This Figure is the top-level container of the visualization. It can have multiple Axes, which are basically individual plots inside this top-level container. Let us learn about this in more detail.
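The Figure and Axes hierarchy can be seen in a short sketch like the one below, where one Figure contains two Axes. The plotted values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 4))     # top-level container
ax_left = fig.add_subplot(1, 2, 1)   # first plot inside the Figure
ax_right = fig.add_subplot(1, 2, 2)  # second plot inside the Figure

ax_left.plot([1, 2, 3], [1, 4, 9])
ax_right.bar(["a", "b", "c"], [3, 1, 2])

# The Figure keeps track of the Axes it contains
print(fig.axes)
```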
-
20Basic Text and Legend Functions
All the functions, except for the legend, create and return a matplotlib.text.Text() instance. We are mentioning it here so that you know that all the discussed properties can be used for the other functions as well. Let us learn about all the text functions in this video.
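The return types can be checked directly. The sketch below calls a few common text functions and the legend function on an Axes; which functions the video covers is an assumption here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib.legend import Legend
from matplotlib.text import Text

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], label="squares")

title = ax.set_title("Squares")          # returns a Text instance
xlabel = ax.set_xlabel("x")              # returns a Text instance
note = ax.text(1.5, 6, "a free-form note")
arrow = ax.annotate("last point", xy=(3, 9), xytext=(2, 8),
                    arrowprops={"arrowstyle": "->"})  # Annotation, a Text subclass
legend = ax.legend()                     # returns a Legend, not a Text

# Text properties such as the font size work on any of the Text objects
title.set_fontsize(14)
print(type(title).__name__, type(legend).__name__)
```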
-
21Basic Plots
Let us look at the different types of basic plots in this video.
-
22Layouts
There are multiple ways to define a visualization layout in Matplotlib. We will start with subplots and how to use the tight layout to create visually appealing plots and then cover GridSpec, which offers a more flexible way to create multi-plots. Let's see how!
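A minimal sketch of both approaches: a regular grid of subplots with tight layout, and a GridSpec in which one plot spans the full top row. The grid sizes are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Regular 2x2 grid; tight_layout adjusts spacing so labels do not overlap
fig_grid, axes = plt.subplots(2, 2)
fig_grid.tight_layout()

# GridSpec: cells of the grid can be merged by slicing
fig_spec = plt.figure()
gs = GridSpec(2, 2, figure=fig_spec)
ax_top = fig_spec.add_subplot(gs[0, :])          # spans the whole top row
ax_bottom_left = fig_spec.add_subplot(gs[1, 0])
ax_bottom_right = fig_spec.add_subplot(gs[1, 1])
```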
-
23Images
In case you want to include images in your visualizations or in case you are working with image data, Matplotlib offers several functions to deal with images. In this section, we will learn to load, save, and plot images with Matplotlib.
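Saving, loading, and plotting an image might look like the sketch below. The image is a synthetic gradient and the file is written to a temporary directory; both are assumptions made so the example is self-contained.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 8x8 grayscale gradient standing in for real image data
image = np.linspace(0, 1, 64).reshape(8, 8)

# Save the array as a PNG file, then load it back
path = os.path.join(tempfile.mkdtemp(), "gradient.png")
plt.imsave(path, image, cmap="gray")
loaded = plt.imread(path)

# Plot the loaded image without axis ticks
fig, ax = plt.subplots()
ax.imshow(loaded)
ax.axis("off")
```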
-
24Writing Mathematical Expressions
In case you need to write mathematical expressions within the code, Matplotlib supports TeX. You can use it in any text by placing your mathematical expression in a pair of dollar signs. There is no need to have TeX installed since Matplotlib comes with its own parser.
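For instance, a title and an axis label containing mathematical expressions could be written as follows. Raw strings are used so that the backslashes reach Matplotlib's parser unchanged; the expressions themselves are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Anything between the dollar signs is rendered as math
ax.set_title(r"$y = x^2$")
ax.set_xlabel(r"$\alpha_i$")

fig.canvas.draw()  # rendering forces the expressions to be parsed
```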
-
25Lesson Summary
Let us quickly recap our learning from this lesson.
-
26Test your knowledge
-
27Lesson Overview
Unlike Matplotlib, Seaborn is not a standalone Python library. It is built on top of Matplotlib and provides a higher-level abstraction to make visually appealing statistical visualizations. A neat feature of Seaborn is the ability to integrate with DataFrames from the pandas library. With Seaborn, we attempt to make visualization a central part of data exploration and understanding. Internally, Seaborn operates on DataFrames and arrays that contain the complete dataset. This enables it to perform semantic mappings and statistical aggregations that are essential for displaying informative visualizations. Seaborn can also be solely used to change the style and appearance of Matplotlib visualizations.
-
28Controlling Figure Aesthetics
Matplotlib is highly customizable. But this also has the effect that it is difficult to know what settings to tweak to achieve a visually appealing plot. In contrast, Seaborn provides several customized themes and a high-level interface for controlling the appearance of Matplotlib figures.
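As a small sketch of this interface, set_style switches the theme globally, while axes_style can be used as a context manager to change it temporarily. The theme names used are two of seaborn's built-in styles.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

# Apply a built-in theme to all following Matplotlib figures
sns.set_style("whitegrid")
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])

# Temporarily use a different theme for a single figure
with sns.axes_style("dark"):
    fig_dark, ax_dark = plt.subplots()
    ax_dark.plot([1, 2, 3], [3, 1, 2])
```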
-
29Color Palettes
Color is a very important factor for your visualization. Color can reveal patterns in the data if used effectively or hide patterns if used poorly. Seaborn makes it easy to select and use color palettes that are suited to your task. The color_palette() function provides an interface for many of the possible ways to generate colors. seaborn.color_palette([palette], [n_colors], [desat]) returns a list of colors, thus defining a color palette.
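A brief sketch of the function in use: the first call requests five colors from a built-in palette, and the second additionally desaturates them. The palette name and the desaturation value are illustrative choices.

```python
import seaborn as sns

# Five colors from the built-in "deep" palette, as RGB tuples
palette = sns.color_palette("deep", n_colors=5)

# The same palette with the saturation reduced by half
desaturated = sns.color_palette("deep", n_colors=5, desat=0.5)

print(palette[0])
print(desaturated[0])
```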
-
30Interesting Plots in seaborn
Seaborn offers a very convenient way to create various bar plots. In Seaborn, bar plots can also represent an estimate of central tendency with the height of each rectangle and indicate the uncertainty around that estimate using error bars. Let us learn this through the following example, which plots the salary based on the employee qualification in various districts.
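A sketch of such a bar plot is given here. The salary figures, qualification levels, and district names are invented, since the course dataset is not part of this page.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Hypothetical salary data; all names and numbers are illustrative
df = pd.DataFrame({
    "education": ["Bachelor", "Bachelor", "Master", "Master"] * 2,
    "district": ["North"] * 4 + ["South"] * 4,
    "salary": [40, 44, 55, 59, 38, 41, 50, 56],
})

# Bar height is the mean salary per group; error bars show its uncertainty
ax = sns.barplot(x="education", y="salary", hue="district", data=df)
```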
-
31Multi-plots in seaborn
In the previous section, we introduced a multi-plot, namely the pair plot. In this video, let us learn about a different way to create flexible multi-plots.
-
32Regression Plots
Many datasets contain multiple quantitative variables, and the goal is to find a relationship among those variables. We previously mentioned a few functions that show the joint distribution of two variables. It can be helpful to estimate relationships between two variables. We will only cover linear regression in this topic; however, Seaborn provides a wider range of regression functionality if needed. To visualize linear relationships determined through linear regression, use the function regplot(). Let us look at an example of a seaborn regression plot.
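A minimal regplot sketch on synthetic data: the points follow a noisy linear trend, and seaborn draws the scatter plot together with the fitted regression line. The slope and noise level are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with a linear relationship plus random noise
rng = np.random.default_rng(0)
x = np.arange(50)
y = 2 * x + rng.normal(0, 5, 50)
df = pd.DataFrame({"x": x, "y": y})

# Scatter plot with a fitted linear regression line
ax = sns.regplot(x="x", y="y", data=df)
```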
-
33Squarify
Matplotlib and seaborn do not offer tree maps; therefore, the Squarify library, which is built on Matplotlib, is used to create them. Seaborn is a great addition for creating color palettes. Let us look at the following example of a tree map using the Squarify library.
-
34Lesson Summary
Let us quickly recap our learning from this lesson.
-
35Test your Knowledge
-
36Lesson Overview
Geoplotlib is an open source Python library for geospatial data visualizations. It has a wide range of geographical visualizations and supports hardware acceleration. It also provides performance rendering for large datasets with millions of data points.
-
37Geoplotlib Basics
Geoplotlib is an open-source Python library for geospatial data visualizations which contains a wide range of geographical visualizations. It has a simple interface.
Some of its features include:
Supports hardware acceleration
Provides performance rendering for large datasets
Provides map tiles for interactivity and simple animations
-
38Tile Providers
Geoplotlib supports the usage of different tile providers. This means that any OpenStreetMap tile server can be used as a backdrop to our visualization. Some of the popular free tile providers are Stamen Watercolor, Stamen Toner, Stamen Toner Lite, and DarkMatter.
-
39Custom Layers
Custom layers allow you to create more complex data visualizations. They also help with adding more interactivity and animation to them. Creating a custom layer starts by defining a new class that extends the BaseLayer class that's provided by Geoplotlib. Besides the __init__ method that initializes the class level variables, we must at least extend the draw method of the already provided BaseLayer class.
-
40Lesson Summary
Let us quickly recap our learning from this lesson.
-
41Test your knowledge