Convert Scanned Documents to Text. JPG to Excel

How to extract data from scanned documents. From PDF to images, tables, and text.
Instructor
Boiko Artem
23 Students enrolled
4.17
3 reviews

“Level Up” for Beginners who are interested in Big Data and Machine Learning (using Python).

This step-by-step course is an introduction to #BigData and #MachineLearning with #Python for absolute beginners who have no background in programming.

In this course, we will go step by step through the main processes related to the topic “Big data and machine learning”, using real data as an example.
Since the material turned out to be voluminous, I divided the course into five parts.

The second part is devoted to collecting and extracting data from scanned documents and images. In this course, you will learn how to extract data from scanned documents and images: invoices, receipts, contracts, and any other documents in PDF or image format.

⇉ We will work on real data. We will have two data sets consisting of PDF files, which we will convert to text and to tabular form. We will visualize the resulting data on the Kaggle platform using Python libraries, which will help us present it graphically.

⇉ During the training, we will install Python and libraries such as Pandas, seaborn, matplotlib, and others. We will upload the resulting data to the Kaggle platform, visualize it there in a Jupyter Notebook, and at the end upload our data to the GitHub platform.

Topics covered in this course:

Lecture 2. Python. Choosing a Python IDE. Anaconda. Install Python.

  • How to convert a scanned PDF to text?

  • Python or Anaconda?

  • Choosing a Python IDE for beginners.

  • How to install Visual Studio Code on Windows?

  • How to install Python?

  • How to run Python in VS Code?

Lecture 3. Scanned PDF files. Convert a PDF document to images using Python.

  • How to convert scanned PDF to JPEG?

  • How to Install Tesseract OCR?

  • What is Tesseract?

  • Google OCR in Python with Tesseract.

  • Extract a page from a pdf as a jpeg

  • How to convert a pdf document to images using python?

  • Convert PDF to Image using Python.

  • Install Poppler, Pillow (PIL) module.
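
The PDF-to-image step in this lecture could look roughly like the sketch below. It assumes the pdf2image package (a wrapper around Poppler) and Pillow are installed; the function names `page_image_path` and `pdf_to_jpegs`, the 300 dpi setting, and the file naming scheme are illustrative choices, not taken from the course.

```python
from pathlib import Path


def page_image_path(pdf_path, page_number, out_dir):
    """Build the output path for one page, e.g. out/scan_page_1.jpg."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page_{page_number}.jpg"


def pdf_to_jpegs(pdf_path, out_dir, dpi=300):
    """Render every page of a PDF to a JPEG file and return the saved paths."""
    # Imported here because pdf2image needs the Poppler binaries installed.
    from pdf2image import convert_from_path

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    saved = []
    for number, page in enumerate(pages, start=1):
        target = page_image_path(pdf_path, number, out_dir)
        page.save(target, "JPEG")
        saved.append(target)
    return saved
```

A call such as `pdf_to_jpegs("scan.pdf", "images")` (hypothetical file names) would write one JPEG per page into the `images` folder.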

Lecture 4. Installing Tesseract. User-defined functions in Python.

  • Installing Tesseract for Windows

  • Install PyTesseract OCR.

  • Iterate over files in a given directory.

  • How is try/except used in Python?

  • Writing user-defined functions in Python
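
The topics of this lecture (iterating over a directory, try/except, and a user-defined function) can be combined in one small sketch. It assumes pytesseract, Pillow, and the Tesseract program are installed; the names `tesseract_ocr` and `ocr_folder` and the `*.jpg` pattern are hypothetical.

```python
from pathlib import Path


def tesseract_ocr(image_path):
    """Run Tesseract on one image and return the recognized text."""
    # Imported here because pytesseract needs the Tesseract program installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def ocr_folder(folder, ocr=tesseract_ocr, pattern="*.jpg"):
    """Return {file name: text} for every matching image in a folder."""
    results = {}
    for image_path in sorted(Path(folder).glob(pattern)):
        try:
            results[image_path.name] = ocr(image_path)
        except Exception as error:  # keep going if one file is unreadable
            print(f"Skipping {image_path.name}: {error}")
    return results
```

Passing the OCR function as a parameter is a design choice made here so the loop can be tried with a stand-in function before Tesseract is set up.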

Lecture 5. Regular Expression in Python. Pattern matching in Python.

  • What is regular expression?

  • How do you match in regex?

  • Online RegEx tester and debugger.

  • How to use findall in Python?

  • Using Regex for Text Manipulation in Python.
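
As a small illustration of pattern matching with Python's built-in `re` module, the sketch below pulls dates, an invoice number, and a total out of a piece of text. The sample text, field names, and formats are invented for the example and are not the course data.

```python
import re

# Hypothetical OCR output; the field names and formats are assumptions.
text = """Invoice No: 10234
Date: 12.03.2021
Total: 1,250.00 EUR
Date: 15.03.2021
"""

# findall returns every non-overlapping match as a list of strings.
dates = re.findall(r"\d{2}\.\d{2}\.\d{4}", text)

# search returns the first match; group(1) is the part in parentheses.
invoice = re.search(r"Invoice No:\s*(\d+)", text)

# With one capturing group, findall returns only the captured part.
totals = re.findall(r"Total:\s*([\d.,]+)", text)

print(dates)             # ['12.03.2021', '15.03.2021']
print(invoice.group(1))  # 10234
print(totals)            # ['1,250.00']
```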

Lecture 6. Array and Function in Python. Add data to Array. Create function in Python.

  • Add a string to an array.

  • How to declare and add items to an array in Python?

  • Write a function in Python.

  • Save data to Pandas Dataframe.
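
One possible way to collect extracted values and save them to a Pandas DataFrame is sketched below. In Python the "array" is usually a list; the function names `add_record` and `to_dataframe` and the column names are hypothetical.

```python
def add_record(records, file_name, invoice_no, total):
    """Append one extracted document to the list as a dictionary."""
    records.append({"file": file_name, "invoice_no": invoice_no, "total": total})


def to_dataframe(records):
    """Turn the list of dictionaries into a table, one row per document."""
    # Imported here so the list helper above works before pandas is installed.
    import pandas as pd

    return pd.DataFrame(records)


records = []  # a Python list plays the role of the "array"
add_record(records, "scan_page_1.jpg", "10234", "1,250.00")  # made-up values
add_record(records, "scan_page_2.jpg", "10235", "310.40")
```

Calling `to_dataframe(records)` would then give a table with the columns `file`, `invoice_no`, and `total`.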

Lecture 7. GeoPy – an easy way to locate coordinates. Get the latitude and longitude of a location.

  • How do I convert address to coordinates?

  • How do you geocode data?

  • Locate the coordinates.

  • How do I find the geocode of an address?

  • Install GeoPy module.

  • Install GDAL, Fiona module.
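
Converting addresses to coordinates with GeoPy might be sketched as follows. The example assumes the Nominatim geocoder (OpenStreetMap), which GeoPy provides; whether the course uses this particular service is not stated, and the names `nominatim_geocoder`, `addresses_to_coordinates`, and the `user_agent` value are hypothetical.

```python
def nominatim_geocoder(user_agent="course-example"):
    """Return a rate-limited geocode function backed by Nominatim."""
    # Imported here so the helper below can be tried without GeoPy installed.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Wait at least one second between requests to respect the public service.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def addresses_to_coordinates(addresses, geocode):
    """Map each address to (latitude, longitude), or None if it is not found."""
    coordinates = {}
    for address in addresses:
        location = geocode(address)
        if location is None:
            coordinates[address] = None
        else:
            coordinates[address] = (location.latitude, location.longitude)
    return coordinates
```

With an internet connection, `addresses_to_coordinates(["Berlin, Germany"], nominatim_geocoder())` would look up the address online.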

Lecture 8. Kaggle. Jupyter Notebook. Plot data with matplotlib, seaborn, squarify.

  • Visualize a dataset.

  • Run Jupyter notebook using Kaggle.

  • Python Treemaps with Squarify and Matplotlib.

  • How do you create a TreeMap chart?

  • How to Convert Strings to Floats in Pandas DataFrame.

  • Replacing strings with numbers in Python

  • Plot a DataFrame with matplotlib and seaborn.
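
Two of the steps above, turning strings into numbers and drawing a treemap, could be sketched like this. The sketch assumes amounts use a comma as thousands separator and a dot as decimal point; `to_number` and `plot_treemap` are hypothetical names, and matplotlib and squarify must be installed for the plotting part.

```python
def to_number(value):
    """Convert text such as '1,250.00 EUR' to a float, or None if it fails."""
    # Keep only digits, the decimal point, and the minus sign.
    cleaned = "".join(ch for ch in str(value) if ch.isdigit() or ch in ".-")
    try:
        return float(cleaned)
    except ValueError:
        return None


def plot_treemap(labels, sizes, out_file="treemap.png"):
    """Draw a treemap where each rectangle's area follows its size."""
    # Imported here so to_number works without the plotting libraries.
    import matplotlib.pyplot as plt
    import squarify

    fig, ax = plt.subplots(figsize=(8, 5))
    squarify.plot(sizes=sizes, label=labels, alpha=0.8, ax=ax)
    ax.axis("off")
    fig.savefig(out_file)
    plt.close(fig)
```

In a DataFrame, the same conversion could be applied to a whole column with something like `df["total"].apply(to_number)`, where `df` and `total` are placeholder names.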

Lecture 9. Folium. Mapping in Python. Plot Geographic Data on a Map.

  • Plot Geographic Data on a Map.

  • How to use folium with Jupyter notebook?

  • Placing coordinates on a map.

  • How to plot data on maps in Jupyter.

  • Efficiently display a map with CircleMarker().

  • Mapping in Python with geopandas.

  • Black & White map with Folium.
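
Placing coordinates on a Folium map with `CircleMarker` might look like the sketch below. The tile name "CartoDB positron" (a light gray style) is only an assumption standing in for the black and white map; `map_center` and `build_map` are hypothetical names, and folium must be installed.

```python
def map_center(points):
    """Average latitude and longitude, used as the initial map center."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [sum(lats) / len(lats), sum(lons) / len(lons)]


def build_map(points, out_file="map.html"):
    """Draw one circle per (latitude, longitude) pair and save the map."""
    # Imported here so map_center can be tried without folium installed.
    import folium

    fmap = folium.Map(location=map_center(points), zoom_start=6,
                      tiles="CartoDB positron")
    for lat, lon in points:
        folium.CircleMarker(location=[lat, lon], radius=5, fill=True).add_to(fmap)
    fmap.save(out_file)
    return fmap
```

In a Jupyter notebook, leaving the returned map object as the last expression of a cell displays it inline; the saved HTML file can be opened in any browser.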

Lecture 10. GitHub. GitHub Desktop. Store and manage code.

  • What is GitHub and how do you use it?

  • Upload files to GitHub.

  • Install GitHub Desktop.

  • Sync with a remote Git repository.

  • Adding a repository from your local computer to GitHub.

The course is best suited for learners who are interested in Big Data and Machine Learning (using Python). Learners who already have Python programming skills but want to practice with a hands-on, real-world data project can also benefit from it.

How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don't have an internet connection, some instructors also let their students download course lectures. That's up to the instructor though, so make sure you get on their good side!
4.17 average rating, 3 reviews
5 stars: 2
4 stars: 0
3 stars: 0
2 stars: 1
1 star: 0