Big Data Analytics with PySpark + Power BI + MongoDB
Welcome to the Big Data Analytics with PySpark + Power BI + MongoDB course. In this course, we will build a big data analytics pipeline using technologies such as PySpark, MLlib, Power BI and MongoDB.
We will work with earthquake data, which we will transform into summary tables. We will use these tables to train predictive models and predict future earthquakes. We will then analyze the data by building reports and dashboards in Power BI Desktop.
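To give a feel for what "transforming data into summary tables" means, here is a minimal sketch in plain Python rather than PySpark, so it runs without a Spark installation. The sample records and the field names (`year`, `magnitude`) are assumptions made for illustration and are not the course's actual dataset or schema.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions,
# not the schema of the course's earthquake dataset.
quakes = [
    {"year": 2015, "magnitude": 6.1},
    {"year": 2015, "magnitude": 7.8},
    {"year": 2016, "magnitude": 5.9},
]


def summarize_by_year(records):
    """Return {year: {"count": n, "max_magnitude": m}} for the given records."""
    summary = defaultdict(lambda: {"count": 0, "max_magnitude": float("-inf")})
    for rec in records:
        row = summary[rec["year"]]
        row["count"] += 1
        row["max_magnitude"] = max(row["max_magnitude"], rec["magnitude"])
    return dict(summary)


summary = summarize_by_year(quakes)

# Print one summary row per year: year, number of quakes, largest magnitude.
for year in sorted(summary):
    print(year, summary[year]["count"], summary[year]["max_magnitude"])
```

In PySpark the same idea would typically be expressed as a group-by aggregation on a dataframe; the plain-Python version above only illustrates the shape of the result.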
Power BI Desktop is a data visualization tool that lets you build advanced queries, models and reports. With Power BI Desktop, you can connect to multiple data sources and combine them into a data model. This data model lets you build visuals and dashboards that you can share as reports with other people in your organization.
MongoDB is a document-oriented NoSQL database used for high-volume data storage. It stores data in a JSON-like format called documents and does not use row/column tables. The document model maps to the objects in your application code, making the data easy to work with.
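As a rough illustration of how a document maps to an object in application code, the sketch below represents one earthquake record as a Python dictionary and round-trips it through JSON using only the standard library. The record and all of its field names are hypothetical, and no database connection is involved.

```python
import json

# Hypothetical earthquake record; the nested fields and names are
# illustrative assumptions, not the course's actual schema.
quake_doc = {
    "place": "Example Region",
    "magnitude": 6.1,
    "location": {"lat": 35.0, "lon": 139.0},
    "tags": ["sample"],
}

# Serialize the nested structure to JSON text and read it back.
serialized = json.dumps(quake_doc)
restored = json.loads(serialized)

# Nested fields are reached the same way as in the original object.
print(restored["location"]["lat"])
```

Note that the nested `location` field and the `tags` list live inside the record itself, with no separate tables or joins. With a driver such as PyMongo, a dictionary of this shape is what you would typically pass to a collection's `insert_one` method.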
- You will learn how to create data processing pipelines using PySpark
- You will learn machine learning with geospatial data using the Spark MLlib library
- You will learn data analysis using PySpark, MongoDB and Power BI
- You will learn how to manipulate, clean and transform data using PySpark dataframes
- You will learn how to create geo maps using ArcGIS Maps for Power BI
- You will also learn how to create dashboards in Power BI