In this module, we explore the pandas and scikit-learn packages for machine learning tasks, using real-world geoscience data as examples. Students will get a practical overview of how to explore large datasets and solve problems with state-of-the-art data science tools.

We will answer questions like:

  • What problem are you trying to solve, and how can machine learning help?
  • What's the difference between supervised and unsupervised methods?
  • What's the difference between classification and regression?
  • How can we choose a model, and tell when we have a good one?
  • How can we improve the performance of our model?

We recommend taking Foundations in geocomputing and/or Next steps in geocomputing before this course.

Duration: We recommend allowing 2 days for this course. If only 1 day is available, we can cover supervised classification only.

Next: Follow this course with an in-house hackathon.

Instructor-to-student ratio: One instructor per 10 students.

Pricing: See the training page. There are discounts for large groups or repeat runs of the course; tell us what you need.

Ready to find out more? Get in touch!

Topics

  • Recognizing tasks
  • Loading data to Pandas
  • Data exploration
  • Data cleaning
  • Feature engineering
  • Supervised learning
  • Classification
  • Regression
  • Performance metrics
  • Model selection
  • Choosing hyperparameters
  • Unsupervised learning

See Module 5 in the Geocomputing curriculum for more details.