Data Engineering and Artificial Intelligence

Data Analytics | Data Science | Big Data | Artificial Intelligence | Software


The Department of Data Engineering and Artificial Intelligence trains versatile engineers in information processing, in both its statistical and computational forms, for use across a range of business professions. Because IT and statistics are cross-disciplinary, they are applied in many areas of activity, especially the tertiary sector and the IT side of the secondary sector.

Beyond teaching scientific concepts and mastery of the tools used, the courses place particular importance on learning methods (i.e. rigor, curiosity and inventiveness in scientific approaches) and on the student's personal development (communication and listening skills, teamwork, autonomy…).

During the 3rd year (semesters 5 and 6), courses focus mainly on fundamental subjects, based on the department's three main areas of specialization – IT, statistics and humanities. The specific classes offered to each student engineer depend on his/her previous field of study.

During semesters 7 and 8, the student engineer deepens his/her knowledge of statistics (statistical modelling, exploratory statistics…), IT (object-oriented programming, software architecture, databases, advanced systems…) and project management (information system projects, project management…). During this year, greater emphasis is placed on group projects.

During the 9th and 10th semesters, the student chooses a specialization through his/her optional courses. These include software engineering, data mining, statistical methods for marketing, the mathematics of new financial products, biostatistics… The specialization is also shaped by the choice of final-year project and placement.

Classes are taught in French.

Activity sectors

  • SSII (Sociétés de Services en Ingénierie Informatique – computer engineering and maintenance companies)
  • Banking and Insurance
  • Major retail outlets
  • Public administration
  • Research

Program

UE 5-1 - Fundamentals of Mathematics

90h (code 111100, 7 ECTS)
111110 - Mathematical Support 1
  • Goals to be achieved :
    * Review the fundamental concepts of mathematics (part 1)
  • Course details :
    - Reasoning and logic
    - Limits, continuity, differentiation, integration (on a segment)
    - Exponential and logarithm functions
    - Study of functions
    - Trigonometry and complex numbers
    - Numerical sequences
16 h Tutorial
111120 - Mathematical Support 2
  • Prerequisites :
    111110
  • Goals to be achieved :
    * Review the fundamental concepts of mathematics (part 2)
  • Course details :
    Analysis
    - Multiple integrals (bounded region)
    - Polar coordinates
    - Series expansions

    Algebra
    - Determinant of a matrix
    - Inversion of matrices
    - Solving systems of linear equations
16 h Tutorial
111130 - Mathematical Basics (3 ECTS, 40 h)
  • Prerequisites :
    111110|111120
  • Goals to be achieved :
    * Review the fundamental concepts of mathematics (part 3)
  • Course details :
    Analysis
    - Improper integrals
    - Numerical series and power series
    - Sequences and series of functions

    Algebra
    - Normed vector spaces
    - Scalar product, Euclidean norm, orthogonality
    - Gram-Schmidt orthonormalisation
    - Orthogonal projection
12 h Lecture
28 h Tutorial
3 h DS (devoir surveillé, supervised exam)
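As an illustration of the algebra topics listed for Mathematical Basics (scalar product, Gram-Schmidt orthonormalisation, orthogonal projection), here is a minimal Python sketch. It is not course material: the function names `dot`, `gram_schmidt` and `project` and the example vectors are assumptions chosen for illustration, and the Euclidean scalar product on R^n is assumed throughout.

```python
import math


def dot(u, v):
    """Euclidean scalar product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis of the span of `vectors`.

    Each vector has its components along the already-built basis removed,
    then is normalised. Vectors that are (numerically) linearly dependent
    on the previous ones are skipped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coeff = dot(w, e)  # component of the running remainder along e
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project(x, basis):
    """Orthogonal projection of x onto the span of an orthonormal basis."""
    p = [0.0] * len(x)
    for e in basis:
        c = dot(x, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p


# Hypothetical example: a plane in R^3 spanned by two non-orthogonal vectors.
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
proj = project([1.0, 2.0, 3.0], basis)
```

The residual `x - proj` is orthogonal to every basis vector, which is the defining property of the orthogonal projection.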
111140 - Probability 1 (2 ECTS, 20 h)
  • Goals to be achieved :
    * Be familiar with tools for counting (arrangements, permutations and combinations) of finite sets, especially for computing probabilities in the equiprobable case.
    * Understand the notion of a probability space and its elementary properties. Be able to use the vocabulary of set theory to describe events.
    * Understand the notion of independence of events and be able to calculate (conditional) probabilities with additional information
  • Course details :
    1. Enumeration
    1.1 Vocabulary for sets
    1.2 Finite sets and enumeration
    1.3 Countable sets
    2. Probability
    2.1 Notion of random experiment
    2.2 Events
    2.3 Probability as a function of sets
    3. Conditioning and independence
    3.1 Conditioning
    3.2 Independence of events
10 h Lecture
10 h Tutorial
1 h DS
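The counting tools and the equiprobable case named in the Probability 1 goals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-dice experiment, the helper `prob` and all variable names are hypothetical and not taken from the course.

```python
from fractions import Fraction
from math import comb, factorial, perm

# Counting on a finite set with n = 5 elements, choosing k = 2 of them.
n, k = 5, 2
arrangements = perm(n, k)    # ordered selections without repetition: n! / (n - k)!
combinations = comb(n, k)    # unordered selections: n! / (k! (n - k)!)
permutations = factorial(n)  # orderings of the whole set: n!

# Equiprobable sample space (assumed example): two fair six-sided dice.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]


def prob(event, given=lambda o: True):
    """P(event | given) in the equiprobable case: favourable / possible."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))


def first_even(o):
    return o[0] % 2 == 0


def second_even(o):
    return o[1] % 2 == 0


def sum_is_8(o):
    return o[0] + o[1] == 8


p_sum8 = prob(sum_is_8)
p_sum8_given_first_even = prob(sum_is_8, given=first_even)

# Independence of events: P(A and B) == P(A) * P(B).
p_both_even = prob(lambda o: first_even(o) and second_even(o))
independent = p_both_even == prob(first_even) * prob(second_even)
```

Using `Fraction` keeps probabilities exact, so the independence check is a true equality rather than a floating-point comparison.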
111150 - Probability 2 (2 ECTS, 30 h)
  • Prerequisites :
    111140|111130|111250
  • Goals to be achieved :
    * Understand the concepts of real random variable and probability distribution
    * Be able to describe and interpret the distribution function of a variable
    * Be able to describe and interpret the distributions of discrete variables through the mass function and those of continuous variables through the density
    * Know the standard distributions
    * Be able to calculate the expectation and moments of a variable
    * Understand the notion of characteristic function of a variable
  • Course details :
    1. Definition of a random variable (introducing the notions of sigma-algebra and probability).
    2. Types of random variables: quantitative, qualitative.
    3. Discrete and continuous real random variables.
    4. Probability distribution of a discrete random variable. Discrete random variable models (Uniform, Binomial, Poisson, Geometric).
    5. Probability distribution of a continuous random variable (density). Continuous random variable models (Uniform, Exponential, Normal).
    6. Distribution function. Properties (quantiles, link with density, transformation of a random variable).
    7. Characterization of a random variable: moments of a random variable.
    8. Properties of expectation and variance. Markov inequality. Illustration of the main models.
    9. Characteristic function
14 h Lecture
12 h Tutorial
4 h Practical
1 h DS

UE 5-2 Fundamentals of Computer Science

114h

UE 5-3 Databases

64h

UE 5-4 Economics, Soft Skills

72h

UE 5-5 Languages

48h

UE 6-1 Probability and Statistics

112h (code 112100, 7 ECTS)
112110 - Probability 3 (2 ECTS, 34 h)
  • Prerequisites :
    111150
  • Goals to be achieved :
    * Have a unified view of discrete and continuous random variables through basic notions of measure theory
    * Understand the notion of random vector and the related notions
    * Understand the usual modes of convergence of random variables
    * Be able to use the classical theorems: the law of large numbers (LLN) and the central limit theorem (CLT)
    * Be able to calculate conditional distributions, expectations and variances
  • Course details :
    - Chapter 1: Measure theory in a nutshell
    - Chapter 2: Random vectors
    - Chapter 3: Independence of real random variables
    - Chapter 4: Moment generating function
    - Chapter 5: Convergence of sequences of real random variables and limit theorems
    - Chapter 6: Conditional distributions and expectations
14 h Lecture
14 h Tutorial
6 h Practical
1 h DS
112120 - Statistical Inference (4 ECTS, 58 h)
  • Prerequisites :
    112110
  • Goals to be achieved :
    * Understand the main concepts of inferential statistics: sampling, estimation, statistical testing
    * Be able to use the first decision support tools : estimators, parametric and non-parametric tests
    * Be able to implement applications in R
  • Course details :
    Sampling, descriptive statistics
    Estimation
    Minimum-variance unbiased estimation
    Method of moments, maximum likelihood and Bayesian estimators
    Confidence intervals
    Statistical hypothesis testing
    Tests for a single population: tests on parameters, goodness-of-fit, independence
    Tests for comparison of two populations
    Analysis of variance
  • Reading list :
    G. Saporta, "Probabilités, analyse des données et statistique", Editions Technip, 1990
    A.A. Borovkov, "Statistique mathématique", Editions Mir, 1987
28 h Lecture
18 h Tutorial
12 h Practical
1 h DS
112130 - Linear regression (1 ECTS, 20 h)
  • Prerequisites :
    112120
  • Goals to be achieved :
    * Be able to estimate and interpret a linear regression model
    * Be able to choose variables and set up a linear model
    * Be able to implement linear models in R
  • Course details :
    - The simple linear regression
    - The least squares criterion
    - The Gaussian case and the Gauss-Markov theorem
    - Multiple linear regression
    - Quality of a linear model
    - Selection of variables in linear regression
    - Applications in R
  • Reading list :
    Gilbert Saporta, Probabilités, analyse des données et statistique, Ed. Technip
10 h Lecture
10 h Practical
1 h DS

UE 6-2 Decision Making

94h

UE 6-3 Computing

158h

UE 6-4 Project IS and Soft Skills

46h

UE 6-5 Languages

48h

UE 7-1 Statistical Modelling

100h (code 113100, 7 ECTS)
113110 - Exploratory statistics (2.5 ECTS, 38 h)
  • Prerequisites :
    112120
  • Goals to be achieved :
    * Be able to perform multivariate statistical analysis
    * Be able to interpret the results of a multivariate analysis
    * Be aware of the need for multivariate statistics in high dimension
    * Be able to implement applications in R and SPAD
  • Course details :
    1. Principal component analysis
    2. Correspondence analysis
    3. Multiple correspondence analysis
    4. Clustering methods (partitioning, hierarchical clustering)
    5. Linear discriminant analysis
  • Reading list :
    Gilbert Saporta: Probabilités, analyse des données et statistique (ed. Technip)
20 h Lecture
18 h Practical
1 h DS
113120 - Supervised classification (1.5 ECTS, 20 h)
  • Prerequisites :
    112130
  • Goals to be achieved :
    * Be able to build and evaluate a supervised classification statistical model
    * Be able to implement applications in R
  • Course details :
    1. Introduction and reminders on classical tests (chi-squared, Cramér, MANOVA)
    2. Factorial discriminant analysis
    3. Probabilistic discriminant analysis (homoscedastic and heteroscedastic)
    4. Rule evaluation (Bayes rule, ROC curve)
    5. Logistic regression
    (6. Random forest)
  • Reading list :
    Gilbert Saporta, Probabilités, analyse des données et statistique (ed. Technip)
8 h Lecture
2 h Tutorial
10 h Practical
1 h DS
113130 - Markovian random models (3 ECTS, 42 h)
  • Prerequisites :
    112110
  • Goals to be achieved :
    * Be able to model random Markovian dynamical systems in discrete and continuous time
    * Be able to predict their long-time behavior
    * Be able to implement applications
  • Course details :
    Markov chains: definition, examples and properties
    Classification of states: irreducible classes, recurrence and transience, periodicity
    Stationary measures and limit theorems
    Properties of the exponential distribution
    Poisson process: definition and properties
    Markov jump processes: definition, transition rate and generator, stationary measures and limit theorems
    Queuing systems: Kendall's notation, M/M/1 queue example
    Project: study of the long-time behavior of Markovian systems, illustrated by numerical simulations and statistical studies
  • Reading list :
    B. Ycart, "Modèles et algorithmes markoviens", Springer, 2002
22 h Lecture
20 h Tutorial
3 h DS
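The long-time behavior studied in Markovian random models can be illustrated with a short Python sketch. It is an assumption-laden example, not the course's project: the two-state transition matrix `P`, the helpers `step`, `stationary` and `mm1_mean_customers`, and the arrival and service rates are all hypothetical. The first part iterates the distribution of an irreducible, aperiodic chain until it stabilises at its stationary measure; the second applies the standard M/M/1 result that, for arrival rate λ below service rate μ, the mean number of customers in the system is ρ / (1 − ρ) with ρ = λ / μ.

```python
def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, tol=1e-12, max_iter=100_000):
    """Approximate the stationary distribution by iterating from the uniform law.

    Convergence is only guaranteed for an irreducible, aperiodic finite chain.
    """
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        new = step(dist, P)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


def mm1_mean_customers(lam, mu):
    """Mean number of customers in a stable M/M/1 queue (requires lam < mu)."""
    if lam >= mu:
        raise ValueError("the queue is unstable when lam >= mu")
    rho = lam / mu
    return rho / (1.0 - rho)


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)

# Hypothetical rates: 2 arrivals and 3 services per unit of time.
mean_customers = mm1_mean_customers(2.0, 3.0)
```

For this matrix the balance equation 0.1·π₀ = 0.5·π₁ gives π = (5/6, 1/6), which the iteration recovers numerically.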

UE 7-2 Decision making

52h

UE 7-3 Software Engineering and Information Systems

58h

UE 7-4 Soft Skills

142h

UE 7-5 Languages

48h

UE 8-1 - Statistics and Decision Making

82h (code 114100, 7 ECTS)
114110 - Advanced Statistical Modelling (2 ECTS, 24 h)
  • Prerequisites :
    113110|113120
  • Goals to be achieved :
    * Understand the penalized least squares methods for linear regression
    * Understand the contribution of non-parametric modeling in regression and classification
    * Be able to use random and fixed effects in an analysis of variance model
    * Be able to implement applications in R and Python
  • Course details :
    Ridge regression
    Lasso regression
    Quantile regression
    Additive models
    Repeated-measures analysis of variance
    R and SAS applications
12 h Lecture
12 h Practical
1 h DS
114120 - Time Series (2 ECTS, 24 h)
10 h Lecture
14 h Practical
1 h DS
114130 - Machine learning (2 ECTS, 22 h)
  • Prerequisites :
    113110|113120|113210
  • Goals to be achieved :
    * Know the fundamental principles of machine learning
    * Be able to model machine learning problems
    * Be able to adopt a rigorous and scientific approach
    * Be able to use a library for machine learning
  • Course details :
    Lecture part :
    - Review the overall data processing chain
    - See or review different supervised learning methods
    - Understand the role of hyperparameters and how their tuning can be automated
    - Take a look at neural networks and deep learning

    Practical part:
    - Become familiar with the scikit-learn library (in Python)
    - Use supervised classification methods: decision trees and ensemble methods
    - Compare different experimental protocols
    - Learn about classical hyperparameter tuning methods
    - Approach neural networks and deep learning using the NNI platform
10 h Lecture
12 h Practical
1 h DS
114140 - Data Privacy (1 ECTS, 12 h)
  • Goals to be achieved :
    * Know the principles and key players of data protection
    * Know the rights of individuals
    * Know the regulations on Internet trackers (such as cookies) and profiling
  • Course details :
    Outline :
    1. Introduction to data (introduces students to the world of data, with a focus on personal data law and cybersecurity; covers the history of data protection and the challenges that data raises).
    2. Concepts and players (defines data protection terms: personal data, sensitive data, main principles of the GDPR).
    3. People's rights (shows that everyone has rights over their data: protecting data matters for a data project manager, for example, but also for any individual, who can obtain their own data on request).
    4. Cookies and other trackers (explains the regulations on tracking visitors across websites, which change frequently and strongly affect future data scientists, data project managers, etc. Cookies and trackers must be configured correctly to avoid exposing the company to fines).
    5. Profiling (like cookies and trackers, profiling is an essential topic, and non-compliance is subject to numerous sanctions, for example when required consent is missing or a legitimate interest cannot be justified. The course defines profiling in accordance with European regulations and covers its impact on a future data project manager, e.g. in a company).

    "Quizzes" in Kahoot format are created to make the course dynamic and enable better learning.

    The exam is in MCQ format: 40 questions.
12 h Lecture
1 h DS

UE 8-2 - Software Systems and Databases

70h

UE 8-3 Projects IS

64h

UE 8-4 - Assistant Engineer Placement : Starting in mid-May (10-13 weeks)

120h

UE 8-5 Languages

48h

UE 9-1 - Specialisation courses

60h (code 115100, 5 ECTS)
115110 - Introduction to Deep Learning (2 ECTS, 22 h)
  • Prerequisites :
    112120|113110|113130
  • Goals to be achieved :
    * Be able to model and analyze a time series
    * Be able to predict its behavior
    * Be able to implement applications
  • Course details :
    Description of a time series
    Exponential smoothing methods
    Trends and seasonality
    ARMA stationary series modeling
    Non-stationary series modeling SARIMA
    Introduction to ARCH - GARCH processes
  • Reading list :
    * Gourieroux C. and Monfort A.: Cours de Séries Temporelles, Economica.
    * Applications are carried out using R.
10 h Practical
1 h DS
115120 - Ontologies (1 ECTS, 16 h)
  • Prerequisites :
    113340
  • Goals to be achieved :
    * Be able to create an ontology to model the concepts and relationships of a knowledge domain.
    * Know how to use an ontology to create a knowledge base.
    * Know how to manipulate and query data in a graph database.
  • Course details :
    - Introduction to the concept of ontology in computer science and knowledge bases
    - Modeling and creation of an ontology in OWL
    - Graph databases, knowledge graphs
8 h Practical
1 h DS
115130 - Data Engineering (2 ECTS, 22 h)
  • Prerequisites :
    113340|114130
  • Goals to be achieved :
    * Know the methods for ingesting data from several different types of source
    * Know how to manipulate data and prepare it for its final use
    * Know the different data storage areas
  • Course details :
    Although data engineers do not directly create value from data, they are essential to every data team: they implement the tools and technical processes that other team members (data scientists, data analysts, etc.) need to prepare and manipulate data. This introductory module covers the major concepts and tools of the profession, both in theory and in practice, focusing on the following points:
    - Data retrieval: How do you ingest data from several different types of source? Which ingestion method suits which type of data (batch ingestion, real-time, etc.)?
    - Data transformation: How do you manipulate data to standardise it and prepare it for different business uses?
    - Data storage: How can data be stored in data lakes and data warehouses so that other data team members can use it?
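The three stages described for the Data Engineering module (retrieval, transformation, storage) can be sketched as a tiny batch pipeline. This is a minimal illustration under several assumptions, not the tooling taught in the course: the two in-memory sources, the weather fields, the function names `ingest`, `transform` and `store`, and the use of an in-memory SQLite database as a stand-in for a data warehouse are all hypothetical. Real-time ingestion is not covered here.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources with different formats and units.
CSV_SOURCE = "id,city,temp_f\n1,Lille,50\n2,Paris,68\n"
JSON_SOURCE = '[{"id": 3, "city": "lyon", "temp_c": 25.0}]'


def ingest():
    """Batch ingestion: read every record from both sources as dictionaries."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += json.loads(JSON_SOURCE)
    return rows


def transform(rows):
    """Standardise records: integer id, title-cased city, temperature in Celsius."""
    clean = []
    for r in rows:
        if "temp_c" in r:
            temp_c = float(r["temp_c"])
        else:
            temp_c = (float(r["temp_f"]) - 32) * 5 / 9  # Fahrenheit to Celsius
        clean.append((int(r["id"]), r["city"].strip().title(), round(temp_c, 1)))
    return clean


def store(records, conn):
    """Load the standardised records into a table other team members can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather "
        "(id INTEGER PRIMARY KEY, city TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
store(transform(ingest()), conn)
result = conn.execute("SELECT city, temp_c FROM weather ORDER BY id").fetchall()
```

Keeping the three stages as separate functions mirrors the separation of concerns in the module outline: each stage can be swapped (another source, another storage target) without touching the others.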

UE 9-2 - Elective Modules : 4 courses to be chosen from the following list

88h

UE 9-3 - Accounting, Management, Communication 3

75h

UE 9-4 - Final year Project

100h

Formative assessment

20h

UE 9-5 Languages

40h

Unit 10-1 Engineer placement

400h (code 926100, 30 ECTS)
926110 - Engineer Placement (30 ECTS, 400 h)
400 h Project

Formative assessment

51h

Enterprising Challenge

35h

@ Polytech-Lille