Introduction to programming with Python
Participants become familiar with the interactive learning platform
– StackFuel's Data Lab – and the Python programming language.
Chapter 1 – Python Basics:
Participants navigate through the Data Lab for the first time and
become familiar with the basics of programming. They learn to
store numbers and text as variables in Python and to bundle them
as groups in lists. They round off this basic knowledge by learning
how to read error messages properly.
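A minimal sketch of what these basics can look like in practice. All variable names and values are illustrative assumptions and are not taken from the course material:

```python
# Illustrative values only; the names and numbers are assumptions.
participant_count = 12          # a number stored in a variable
course_name = "Python Basics"   # text stored in a variable

# A list bundles several values into one group.
chapter_titles = ["Python Basics", "Programming Basics", "Loops and Functions"]

first_chapter = chapter_titles[0]    # list positions start at 0
chapter_total = len(chapter_titles)  # number of items in the list

# Reading an error message: asking for a position that does not exist
# raises an IndexError whose text names the problem.
try:
    chapter_titles[5]
except IndexError as error:
    error_message = str(error)

print(course_name, participant_count, first_chapter, chapter_total)
print("Error message:", error_message)
```

The last line of a Python error message usually states the error type and a short description, which is a good place to start reading.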
Chapter 2 – Programming Basics:
Participants continue to build on their programming basics. This
chapter focuses on the use of functions and methods as well as
flow control using conditions.
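As a hedged illustration of these concepts, the sketch below defines a function, branches on conditions, and calls a string method. The function name `classify_score` and its thresholds are hypothetical:

```python
def classify_score(score):
    """Return a label for a numeric score (thresholds are illustrative)."""
    if score >= 90:
        return "excellent"
    elif score >= 50:
        return "passed"
    else:
        return "failed"


label = classify_score(72)  # calling a function with an argument
shout = label.upper()       # a method is a function attached to an object

print(label, shout)
```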
Chapter 3 – Loops and Functions:
The last chapter of the Basics module is dedicated to flow control
using loops. Participants extend Python's functionality by importing
additional Python packages and gain insight into code versioning
with Git. By the end of the chapter, participants will know
the programming concepts that matter most for
working as a data analyst.
Independent collection, analysis and visualization of data
Participants learn how to access, filter and merge new data sources.
They will practice making company data accessible interactively
in dynamic dashboards and independently perform classic data
processing operations (importing, filtering, cleaning, and
Chapter 1 – Data Pipelines (Pandas):
This chapter teaches participants how to efficiently use Pandas – the
standard tool of a data analyst in Python. Participants learn to use it
to read, clean, and aggregate data in CSV files.
As the second module begins, participants will receive assistance in
optimizing their online presence as a data analyst.
Chapter 2 – Data Exploration (Matplotlib):
Participants practice visualizing different types of data using
marketing data. Numerical data is represented as histograms and
scatter plots, while categorical data is represented as column
and pie charts.
Chapter 3 – Predictions (Statistics):
Participants learn statistical concepts such as medians and
quartiles using product ratings. They identify outliers and create
simple predictions using linear and logistic regression. In addition,
participants work on creating their own data analytics portfolio and
receive practical tips for doing so.
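The descriptive part of this chapter can be sketched with Python's standard library alone. The ratings below are made-up sample data, and the 1.5 × IQR rule is one common outlier convention, assumed here for illustration. Regression is not covered in this sketch:

```python
import statistics

# Hypothetical product ratings on a 1-10 scale (made-up sample data).
ratings = [7, 8, 7, 9, 8, 7, 8, 1, 8, 7, 9, 8]

median_rating = statistics.median(ratings)

# quantiles(..., n=4) returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(ratings, n=4)
iqr = q3 - q1  # interquartile range

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [r for r in ratings if r < lower_fence or r > upper_fence]

print("median:", median_rating)
print("quartiles:", q1, q2, q3)
print("outliers:", outliers)
```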
Chapter 4 – Internal Data (SQL):
Participants learn to read from databases, using an employee
database as an example, and to formulate standard SQL queries.
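A minimal sketch of a standard SQL query against a small employee table, using Python's built-in `sqlite3` module with an in-memory database. The table layout, names, and salaries are assumptions made for illustration and do not reflect the database used in the course:

```python
import sqlite3

# In-memory database with a hypothetical employees table.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, salary REAL)"
)
connection.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ada", "Engineering", 5200.0),
        ("Grace", "Engineering", 4800.0),
        ("Linus", "Sales", 3900.0),
    ],
)

# Standard SQL: filter-free aggregation per department.
query = """
    SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    ORDER BY department
"""
rows = connection.execute(query).fetchall()
connection.close()

for department, headcount, avg_salary in rows:
    print(department, headcount, avg_salary)
```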
Chapter 5 – External Data (API):
Participants use Python to access information on the Internet, such
as web pages and APIs designed by StackFuel.
Chapter 6 – Advanced Jupyter:
Participants learn Jupyter functionalities and solve advanced
visualization problems such as live updates and interactivity in
the context of a stock market scenario.
Chapter 7 – Exercise Project:
Participants analyze a New York taxi data set with over one million
trips and use their Python skills as independently as possible to
answer given questions.
Chapter 8 – Final Project:
Participants analyze customer churn for a telecommunications
company. They work through the entire data pipeline
independently and answer typical questions. They present their
project in a 1-on-1 feedback session with StackFuel's
Solving supervised and unsupervised machine learning
problems with sklearn.
Participants create data science workflows with sklearn, evaluate
their model performance using appropriate metrics, and become
aware of the problem of overfitting.
Chapter 1 – Supervised Learning: Regression:
Using linear regression, participants learn how to use the Python
package sklearn. Furthermore, they deal with the assumptions of the
regression model and the evaluation of the generated predictions.
Participants learn about the bias-variance trade-off, regularization,
and various metrics of model quality.
Chapter 2 – Supervised Learning: Classification:
Participants are introduced to classification algorithms using the
k-Nearest-Neighbors algorithm and learn to evaluate the algorithm
and assess classification performance. They optimize the parameters
of their model and pay attention to dividing the data into training
and evaluation sets.
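To make the idea concrete, here is a from-scratch sketch of k-Nearest-Neighbors classification with separate training and evaluation data. It deliberately does not use sklearn, so it is not the course's implementation. The helper `knn_predict` and the toy data are hypothetical:

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data, already divided into training and evaluation sets.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["low", "low", "low", "high", "high", "high"]
eval_points = [(1.1, 1.0), (4.0, 4.0)]
eval_labels = ["low", "high"]

# Evaluate only on points the model has not seen during training.
predictions = [knn_predict(train_points, train_labels, p, k=3)
               for p in eval_points]
accuracy = sum(p == t for p, t in zip(predictions, eval_labels)) / len(eval_labels)

print(predictions, accuracy)
```

The parameter `k` is the kind of setting that gets tuned: trying several values and comparing accuracy on the evaluation set shows which one generalizes best.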
Chapter 3 – Unsupervised Learning: Clustering:
Participants learn about the k-Means algorithm as an example of an
unsupervised learning algorithm. The assumptions and performance
metrics of the algorithm are critically examined and a brief look is
taken at an alternative to k-Means clustering.
Chapter 4 – Unsupervised Learning: Dimensionality Reduction:
Participants learn how to reduce the
dimension of the data using Principal Component Analysis (PCA)
and use PCA to generate uncorrelated features from the original
data. In this context, the topic of feature engineering is explored
in more detail and new features are generated from the old ones.
Chapter 5 – Outlier Detection:
Participants learn about different approaches to identifying
outliers and understand how to deal with these unusual data
points. They use robust measures and models to minimize the
impact of outliers.
Expanding the data science toolkit.
Participants deepen their knowledge of data classification models. In
doing so, they expand their skills in collecting and preparing data.
Chapter 1 – Data Gathering:
Participants learn to gather data by mining web pages and PDF
documents. They structure collected text data using regular
expressions so that they can use it in conjunction with familiar
algorithms. As they begin this module, participants will receive help
in optimizing their online presence as a data scientist.
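Structuring collected text with regular expressions could look like the following sketch. The invoice-style text and the pattern are invented for illustration. Real scraped pages or PDF extracts will need their own patterns:

```python
import re

# Hypothetical raw text, as it might come out of a web page or PDF.
raw_text = """
Invoice 1001 dated 2023-04-17 total 249.90 EUR
Invoice 1002 dated 2023-05-02 total 1200.00 EUR
"""

# Named groups turn each matching line into labeled fields.
pattern = re.compile(
    r"Invoice (?P<number>\d+) dated (?P<date>\d{4}-\d{2}-\d{2}) "
    r"total (?P<total>\d+\.\d{2}) EUR"
)

records = [
    {
        "number": int(match["number"]),
        "date": match["date"],
        "total": float(match["total"]),
    }
    for match in pattern.finditer(raw_text)
]

for record in records:
    print(record)
```

Once the text is in this tabular shape, it can be passed on to the algorithms participants already know.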
Chapter 2 – Logistic Regression:
Participants learn a second classification algorithm: logistic
regression. They use new performance metrics to evaluate results
and learn how to prepare non-numeric data for their models.
Chapter 3 – Decision Trees and Random Forests:
Participants learn about the decision tree as an easy-to-interpret
model. They combine multiple models in an ensemble to improve
the predictions of their model. They also learn methods to deal
with unbalanced categories. In addition, participants will focus on
creating their own data science portfolio, and receive practical
tips for this.
Chapter 4 – Support Vector Machines:
Participants learn about a final classification algorithm, Support
Vector Machines (SVM), and examine the behavior of different
SVM kernels. They also learn the typical steps of Natural
Language Processing (NLP) and work through an NLP scenario
using bag-of-words models.
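A bag-of-words model represents each text as a vector of word counts over a shared vocabulary. The sketch below builds one with the standard library only. The example sentences and the helper names `tokenize` and `to_vector` are assumptions for illustration:

```python
import re
from collections import Counter

# Hypothetical example documents.
documents = [
    "The service was great",
    "Great product, great price",
    "The price was too high",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# The vocabulary is the sorted set of all tokens across all documents.
vocabulary = sorted({token for doc in documents for token in tokenize(doc)})


def to_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[token] for token in vocabulary]


vectors = [to_vector(doc) for doc in documents]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Word order is discarded in this representation. Only the counts remain, and those counts are what a classifier such as an SVM receives as numeric features.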
Chapter 5 – Neural Networks:
Participants are introduced to artificial neural networks and
learn more about deep learning. They create a multilayer artificial
neural network and apply it to a data set.
Independent application of simple and complex modeling.
Participants gain confidence in solving data science problems and
learn to communicate results competently.
Chapter 1 – Visualization and Model Interpretation:
Participants learn important methods for interpreting and visualizing
machine learning models. By using model-agnostic methods for
interpretation, they learn to derive and communicate insights into
how their models work.
Chapter 2 – Spark:
Participants learn why working with distributed memory systems
is relevant. Using the Python package PySpark, they learn how to
read distributed databases, perform big data analysis, and use
well-known machine learning algorithms on distributed systems.
Chapter 3 – Exercise Project:
Participants work on a prediction problem using a larger data set and
independently apply their data science skills from cleaning the data
set to interpreting the model. Participants receive feedback on their
approach to solving the problem in an individual project consultation
with StackFuel's mentoring team.
Chapter 4 – Final Project:
Participants are given another larger data set to analyze
independently, with less assistance than they received for
the exercise project. Participants receive feedback on their solution
approach in an individual project consultation with the StackFuel