Data science, human resources development and IT terms and definitions 




Algorithm 

An algorithm is a formal, step-by-step instruction for achieving a certain result. It must be formulated so precisely that it can be carried out without further interpretation and repeated according to a clear pattern. For a computer, an algorithm works like a recipe to be followed. 
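
As a small illustration of the recipe idea, the following sketch uses Euclid's algorithm for the greatest common divisor of two numbers. The function name is chosen here for illustration only; the point is that the same step is repeated until a clear stopping condition is met.

```python
def greatest_common_divisor(a: int, b: int) -> int:
    """Euclid's algorithm: repeat the same step until the remainder is zero."""
    while b != 0:
        # Replace the pair (a, b) with (b, remainder of a divided by b).
        a, b = b, a % b
    return abs(a)


print(greatest_common_divisor(48, 18))
```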


Artificial Intelligence 

The field of artificial intelligence or AI deals with the automation of intelligent behavior and is closely related to machine learning. Artificial Intelligence is an interdisciplinary field and is divided into weak artificial intelligence, which has already found its way into our everyday lives, and strong artificial intelligence, which is currently still in the realm of science fiction. 


AI Literacy 

AI literacy is the ability to understand what artificial intelligence is and what it takes to make it work. These skills make it possible to critically evaluate AI technologies, communicate effectively with an AI system, work with it, and use it as a practical tool for one’s own work.

Big Data 

Big data is described as data sets that have the following characteristics:  

– a high volume 

– generated at high speed 

– highly varied 

– high quality 

Big data therefore also offers great added business value and helps to make trends measurable and to derive predictions and recommendations for action from them. 


Business analytics 

Business analytics describes the specific capabilities, technologies and processes used to continuously analyze business developments. The insights gained in this way help to improve business decisions. 


Business Intelligence (BI) 

Business intelligence is the process of turning collected data into meaningful insights. It is a discipline of business informatics and describes the systematic analysis of a company. 


Business Intelligence Analyst (BI Analyst) 

A business intelligence (BI) analyst extracts insights from company data and derives recommendations for action from this data. To do this, they use specialized software and IT systems such as Power BI, Tableau or Microsoft Excel. 


Clustering 

Clustering (also cluster analysis) refers to a method for discovering similarities and patterns in data sets. The resulting groups of similar or identical data elements are called clusters. 
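
The grouping idea can be sketched with k-means, one common clustering method among many (the glossary itself does not name a specific method). The example below is a deliberately simplified one-dimensional version; the function name, the sample ages and the starting centers are assumptions made for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Assign each number to its nearest center, then move each center to its group's mean."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster received no values.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


ages = [21, 23, 22, 64, 67, 65]  # hypothetical customer ages
centers, clusters = k_means_1d(ages, centers=[20.0, 70.0])
print(clusters)
```

In practice a library implementation would normally be used; the sketch only shows how similar values end up in the same cluster.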


Convolutional Neural Networks (CNNs) 

CNN stands for Convolutional Neural Network. CNNs are a special type of artificial neural network. They are used primarily for unstructured data, especially image and video material.  


Customer analytics 

Customer analytics is the data-driven assessment of a specific customer base. This enables the identification of profitable customer relationships and buyer interests in order to create suitable, individualized offers.


Data 

Data contains information. In computer science, data consists of digital (binary) values that are collected through observation, measurement, and statistical surveys, as well as findings that can be expressed in a formal way. 


Data Analysis 

Data analysis involves using statistical methods to gain insights from collected raw data. These methods include collecting, examining, cleaning, processing, and modeling data. 
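
A minimal sketch of the cleaning and examining steps, using only Python's standard library. The raw readings and the `clean` helper are hypothetical and only meant to show how unusable entries are dropped before any statistics are calculated.

```python
from statistics import mean, median

# Hypothetical raw data as it might arrive from a form or export.
raw_readings = ["12.5", "13.1", "", "n/a", "12.9", "250.0"]


def clean(values):
    """Convert text to numbers and drop entries that cannot be parsed."""
    cleaned = []
    for v in values:
        try:
            cleaned.append(float(v))
        except ValueError:
            continue
    return cleaned


numbers = clean(raw_readings)
print(len(numbers), mean(numbers), median(numbers))
```

Comparing the mean with the median is a quick way to notice that a single extreme value, here 250.0, may need a closer look.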


Data Analyst 

A data analyst collects and processes data and information of all types that add value to an organization. A data analyst visualizes and presents the insights gained from data analysis so that decision makers can act on them.  


Data Awareness 

See Data Literacy 



Database 

A database is a system for electronic data management in which data is collected and stored. 


Data Driven Management  

Data-driven management means making business decisions based on data and facts. 


Data Engineer 

A data engineer is an important job role in the data environment. Data engineers are primarily tasked with installing and maintaining data pipelines. Data pipelines can be understood here as the automated transfer and processing of data. Data engineers ensure that data quality remains consistently high and that data processing is performant. 



Datafication 

Datafication describes the digitization of everyday activities and areas of life and the associated generation of data. 


Data lake 

A data lake is a repository where data can be stored in its original form. Unlike in a data warehouse, the data stored in a data lake is unsorted and not yet prepared for analysis. On the one hand, this allows greater flexibility when handling the data, but on the other hand, the hurdle is higher for users. This makes the data lake particularly useful for the requirements of data scientists and data analysts.  


Data Literacy  

Data literacy can be translated as basic data competence. This describes the ability to assess and deal with data. This includes understanding why and how data is collected, what value it adds, and how its analysis works. 


Data management 

Data management (also known as data administration) encompasses all aspects that form the basis for successful data strategies. The goal of data management is to optimally prepare and make data available for processing. 


Data Science 

Data science combines scientific and statistical applications for extracting knowledge from data. Statistics, higher mathematics, programming and artificial intelligence methods are used. 


Data Scientist 

Data scientist is one of the most sought-after professions and job roles of the decade. Data scientists use scientific methods, as well as machine learning and artificial intelligence methods, to work on complex problems involving data or to develop data products. In addition to programming, they also use higher mathematics and statistics methods. 


Data, structured and unstructured  

Digital information (data) can have different structures. A distinction is made between structured, semi-structured and unstructured data. Depending on the form in which the data is available, preparing it for processing is more or less complex. 
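
The three forms can be illustrated with a short sketch. All sample values below are invented for illustration: a CSV table stands in for structured data, a JSON list for semi-structured data, and a free-text note for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = io.StringIO("customer_id,city\n1,Berlin\n2,Hamburg\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but fields may vary per record.
semi_structured = json.loads('[{"id": 1, "tags": ["new"]}, {"id": 2}]')

# Unstructured: free text with no predefined fields.
unstructured = "Customer called on Monday and asked about delivery to Berlin."

cities = [row["city"] for row in rows]
records_with_tags = [r["id"] for r in semi_structured if "tags" in r]
mentions_berlin = "Berlin" in unstructured

print(cities, records_with_tags, mentions_berlin)
```

The structured rows can be queried by column name right away, while the free text only allows a crude keyword search without further preparation.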


Data Thinking 

Data thinking combines data science and design thinking methods to successfully develop data products and data strategies. Data Thinking considers and solves problems by taking the user’s point of view. 


Data Warehouse 

A data warehouse is a central database optimized for data analysis. The data is prepared and follows a defined structure. The structure of the data warehouse simplifies standardized queries, e.g. for creating reports. Due to the high degree of preparation and standardization of the data, there is a lower threshold for use, but the possible applications are less flexible than those of a data lake.  


Data visualization 

Data visualization is the graphical representation of data with the aim of getting a quick insight into the information resulting from the data and communicating it. 


Deep Learning 

Deep learning is one of the most advanced and complex approaches to machine learning. It relies on so-called neural networks, which consist of algorithms inspired by the structure and function of the brain and which can outperform humans on some narrowly defined tasks. Once trained, deep learning models can make predictions without human intervention. To do this, they analyze data following a logical structure in order to arrive at the most likely prediction. 


Design thinking 

Design thinking is used to create solutions to problems and approaches to new ideas. It is an approach that aims to create solutions for user-friendly applications. 





Digitalization 

Digitalization is the transformation of analog processes into digital ones. An added benefit is that digital data is created in the process, which facilitates its collection and processing. 



Digitization 

Digitization is the transformation of analog content into digital content. 


Keras 

Keras is an open-source library used for the rapid implementation of neural networks. Accordingly, Keras is often an important building block in deep learning applications. 


Key Performance Indicator (KPI) 

KPI is an abbreviation for key performance indicator and refers to key figures that are defined to measure the success of an activity or the performance of a person or a department. They are used in corporate processes to make performance measurable and evaluate it. 
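
As one possible example of a KPI, the sketch below calculates a conversion rate and compares it with a target. The figures, the target of 2.5 percent and the function name are assumptions made for illustration.

```python
def conversion_rate(orders: int, visitors: int) -> float:
    """Share of visitors who placed an order, as a percentage."""
    if visitors == 0:
        return 0.0
    return 100.0 * orders / visitors


# Hypothetical monthly figures for an online shop.
rate = conversion_rate(orders=45, visitors=1500)
target_met = rate >= 2.5  # assumed target value

print(rate, target_met)
```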

Machine Learning 

Machine learning is a field of artificial intelligence that aims to enable computers to automatically draw data from their environment and make assessments in order to make better decisions. In doing so, the algorithm learns rules from data on its own. How well an artificial intelligence learns depends on the quality and quantity of the data it is provided with to learn from (the principle of “garbage in, garbage out”). 



Matplotlib 

Matplotlib is a library of the Python programming language. It is particularly suitable for visualizing data. 

OOP – Object-oriented programming

OOP is a programming style that is well suited, e.g., for the production of software that is both complex and needs to be continuously updated. The code is organized around or by so-called objects, which are put in relation to each other. Everything that exists within a program is its own object, containing its unique attributes and behavior. Object-oriented programming is therefore a particularly efficient approach, as the program code can be reused and scaled.
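
A minimal sketch of objects with their own attributes and behavior, and of code reuse through inheritance. The `Employee` and `Manager` classes are hypothetical examples.

```python
class Employee:
    """An object bundling attributes (name, salary) with behavior (describe)."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def describe(self):
        return f"{self.name} earns {self.salary}"


class Manager(Employee):
    """Reuses the Employee code and extends it with a team."""

    def __init__(self, name, salary, team):
        super().__init__(name, salary)
        self.team = team

    def describe(self):
        return f"{super().describe()} and leads {len(self.team)} people"


ada = Employee("Ada", 5000)
grace = Manager("Grace", 7000, team=[ada])
print(ada.describe())
print(grace.describe())
```

`Manager` does not repeat the logic of `Employee`; it builds on it, which is what makes object-oriented code reusable.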



Pandas 

Pandas is a library of the Python programming language that enables data to be cleaned, managed and analysed. DataFrames, which represent data in a table made up of columns and rows, are the main feature of Pandas.  


Predictive Maintenance 

Predictive maintenance is a maintenance process to minimize the “downtime” of machines and manufacturing equipment. It aims to optimize maintenance cycles by calculating the probability of system failures. Maintenance can be better planned and predictable failures can be prevented. For this purpose, predictive maintenance collects data about the machine being monitored in real time in order to detect deviations at an early stage.  
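
Detecting deviations at an early stage can be sketched very roughly as follows. This is a simplified threshold check, not a failure-probability model: readings are flagged if they lie more than a chosen number of standard deviations from a baseline. The temperature values, the tolerance and the function name are assumptions made for illustration.

```python
from statistics import mean, stdev


def find_deviations(baseline, readings, tolerance=3.0):
    """Return readings lying more than `tolerance` standard deviations from the baseline mean."""
    center = mean(baseline)
    spread = stdev(baseline)
    return [r for r in readings if abs(r - center) > tolerance * spread]


# Hypothetical machine temperatures in degrees Celsius.
baseline_temps = [70.1, 69.8, 70.3, 70.0, 69.9]
new_temps = [70.2, 70.0, 74.5, 69.7]

alerts = find_deviations(baseline_temps, new_temps)
print(alerts)
```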



Python 

Python is an open-source programming language that can be used to develop software applications. Python is easy to learn compared to other programming languages and is therefore very popular. Python provides access to many libraries that are especially popular in the data environment, such as Pandas, Matplotlib, and Keras.  



PySpark 

PySpark is the Python interface to Apache Spark. It makes the functions of Apache Spark available and usable in Python. Spark and PySpark are used in what are known as distributed systems. In this way, data and applications can be processed that would exceed the capacity of a single, normal computer. 

Regression analysis 

Regression analysis is the name of a statistical analysis method that is often used in machine learning. It involves predicting a continuous target variable from one or more so-called explanatory variables. In the course of a regression analysis, cause-effect relationships can be investigated in addition to predictions.  
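
The prediction of a continuous target variable can be sketched with a simple linear regression, fitted here with the ordinary least squares formulas. The data (hours of study and test scores) and the function name are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4]       # explanatory variable (hypothetical)
scores = [52, 54, 56, 58]  # continuous target variable (hypothetical)

slope, intercept = fit_line(hours, scores)
predicted = slope * 5 + intercept  # prediction for 5 hours
print(slope, intercept, predicted)
```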



R 

R is a programming language that is used in particular for statistical calculations and graphics. 

Structured Query Language (SQL)  

SQL is a language used to communicate with relational databases. It is used to query, edit or remove data in databases.  
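
Querying, editing and removing data can be tried out with Python's built-in `sqlite3` module and an in-memory database. The `customers` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Berlin"), ("Grace", "Hamburg"), ("Alan", "Berlin")],
)

# Edit: change one customer's city.
conn.execute("UPDATE customers SET city = 'Munich' WHERE name = 'Alan'")

# Remove: delete one customer.
conn.execute("DELETE FROM customers WHERE name = 'Grace'")

# Query: read what is left.
remaining = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
conn.close()

print(remaining)
```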


TensorFlow 

Similar to Keras, TensorFlow is a framework for implementing neural networks. Keras acts as an interface and makes it easier to use TensorFlow.  


Trend analysis 

Trend analysis allows us to observe and explore trends and their causes. Trend analysis can be used to predict the influence of a trend on a company and relevant markets. 
