Data Engineering and Artificial Intelligence
Data Analytics | Data Science | Big Data | Artificial Intelligence | Software
The department of Data Engineering and Artificial Intelligence trains multiskilled engineers in the processing of information, in both its statistical and computational forms, for use in a wide range of business professions. Because IT and statistics are cross-disciplinary, they appear in many areas of activity, especially in the tertiary sector and in the IT side of the secondary sector.
Apart from the teaching of scientific concepts and the mastery of the tools used, the courses place particular importance on the learning of methods (i.e. rigor, curiosity and inventiveness in scientific approaches) and on the personal development of the student (communication and listening skills, teamwork, autonomy…).
During the 3rd year (semesters 5 and 6), courses focus mainly on the teaching of fundamental subjects, based on the three main areas of specialization of the department – IT, statistics and humanities. The specific classes offered to student engineers depend on the particular areas of previous study.
During semesters 7 and 8, the student engineer deepens his/her knowledge of statistics (statistical modelling, exploratory statistics…), IT (object-oriented programming, software architecture, databases, advanced systems…) and project management (information system projects, project management…). During this year, greater emphasis is placed on projects carried out in groups.
During the 9th and 10th semesters, the student chooses a specialization through optional courses. These include software engineering, data mining, statistical methods for marketing, the mathematics of new financial products, biostatistics… The specialization is also determined by the choice of final-year project and placement.
Classes are taught in French.
Activity sectors
- SSII (Sociétés de Services en Ingénierie Informatique – computer engineering and maintenance companies)
- Banking and Insurance
- Major retail outlets
- Public administration
- Research
Links with research units
Associated research units:
- CRIStAL - (computer science, signal and automatic control: covers many digital science themes, from the most theoretical to the most applied... artificial intelligence, cybersecurity, digital health, robotics)
- Laboratoire Paul Painlevé - (probability and statistics, associated with the CNRS)
In most cases, the lecturers in this speciality carry out their research activities within these laboratories. The student engineers can thus undertake projects or work placements in areas related to the specialities of these research laboratories.
Program
UE 5-1 - Fundamentals of Mathematics
- Goals to be achieved :
* Review the fundamental concepts of mathematics (part 1) - Course details :
- Reasoning and logic
- Limits, continuity, differentiation, integration (on a closed interval)
- Exponential and logarithm functions
- Study of functions
- Trigonometry and complex numbers
- Numerical sequences - Reading list :
- Prerequisites :
111110
- Goals to be achieved :
* Review the fundamental concepts of mathematics (part 2) - Course details :
Analysis
- Multiple integrals (bounded region)
- Polar coordinates
- Series expansions
Algebra
- Determinant of a matrix
- Inversion of matrices
- Solving systems of linear equations - Reading list :
- Prerequisites :
111110|111120
- Goals to be achieved :
* Review the fundamental concepts of mathematics (part 3) - Course details :
Analysis
- Improper integrals
- Numerical series and power series
- Sequences and series of functions
Algebra
- Normed vector spaces
- Scalar product, Euclidean norm, orthogonality
- Gram-Schmidt orthonormalisation
- Orthogonal projection - Reading list :
28 h Tutorial
3 h DS
- Goals to be achieved :
* Be familiar with tools for counting (arrangements, permutations and combinations) of finite sets, especially for computing probabilities in the equiprobable case.
* Understand the notion of a probability space and its elementary properties. Be able to use the vocabulary of set theory to describe events.
* Understand the notion of independence of events and be able to calculate (conditional) probabilities with additional information - Course details :
1. Enumeration
1.1 Vocabulary for sets
1.2 Finite sets and enumeration
1.3 Countable sets
2. Probability
2.1 Notion of random experiment
2.2 Events
2.3 Probability as a function of sets
3. Conditioning and independence
3.1 Conditioning
3.2 Independence of events - Reading list :
10 h Tutorial
1 h DS
- Prerequisites :
111140|111130|111250
- Goals to be achieved :
* Understand the concepts of real random variable and probability distribution
* Be able to describe and interpret the distribution function of a variable
* Be able to describe and interpret the distributions of discrete variables through the mass function and those of continuous variables through the density
* Know the standard distributions
* Be able to calculate the expectation and moments of a variable
* Understand the notion of characteristic function of a variable - Course details :
1. Definition of a random variable (we will consider the notion of sigma algebra and probability).
2. Types of random variables: quantitative, qualitative.
3. Discrete and continuous real random variables.
4. Probability distribution of a discrete random variable. Discrete random variable models (Uniform, Binomial, Poisson, Geometric).
5. Probability distribution of a continuous random variable (density). Continuous random variable models (Uniform, Exponential, Normal).
6. Distribution function. Properties (quantiles, link with density, transformation of a random variable).
7. Characterization of a random variable: moments of a random variable.
8. Properties of expectation and variance. Markov inequality. Illustration of the main models.
9. Characteristic function - Reading list :
12 h Tutorial
4 h Practical
1 h DS
UE 5-2 Fundamentals of Computer Science
UE 5-3 Databases
UE 5-4 Economics, Soft Skills
UE 5-5 Languages
UE 6-1 Probability and Statistics
- Prerequisites :
111150
- Goals to be achieved :
* Have a unified view of discrete and continuous random variables through basic notions of measure theory
* Understand the notion of random vector and the related notions
* Understand the usual convergences of random variables
* Be able to use the classical theorems: the law of large numbers (LGN) and the central limit theorem (TCL)
* Be able to calculate conditional distributions, expectations and variances - Course details :
- Chapter 1: Measure theory in a nutshell
- Chapter 2: Random vectors
- Chapter 3: Independence of real random variables
- Chapter 4: Moment generating function
- Chapter 5: Convergence of sequences of real random variables and limit theorems
- Chapter 6: Conditional distributions and conditional expectations - Reading list :
14 h Tutorial
6 h Practical
1 h DS
- Prerequisites :
112110
- Goals to be achieved :
* Understand the main concepts of inferential statistics: sampling, estimation, statistical testing
* Be able to use the first decision support tools : estimators, parametric and non-parametric tests
* Be able to implement applications in R - Course details :
Sampling, descriptive statistics
Estimation
Minimum-variance unbiased estimation
Method of moments, maximum likelihood and Bayesian estimators
Confidence intervals
Statistical hypothesis testing
Tests for a single population: tests on parameters, goodness-of-fit, independence
Tests for comparison of two populations
Analysis of variance - Reading list :
G. Saporta, "Probabilités, analyse des données et statistique", Editions Technip, 1990
A.A. Borovkov, "Statistique mathématique", Editions Mir, 1987
18 h Tutorial
12 h Practical
1 h DS
- Prerequisites :
112120
- Goals to be achieved :
* Be able to estimate and interpret a linear regression model
* Be able to choose variables and set up a linear model
* Be able to implement linear models in R - Course details :
- Simple linear regression
- The least squares criterion
- The Gaussian case and the Gauss-Markov theorem
- Multiple linear regression
- Quality of a linear model
- Selection of variables in linear regression
- Applications in R - Reading list :
Gilbert Saporta, Probabilités, analyse des données et statistique, Ed. Technip
10 h Practical
1 h DS
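As an illustration of the least squares criterion listed in this course, here is a minimal sketch of simple linear regression. It is written in Python rather than R (the language used in the course) purely for illustration; the function names `fit_simple_linear_regression` and `r_squared` and the sample data are assumptions, not course material.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Centred sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: share of variance explained by the fit."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical sample lying exactly on the line y = 1 + 2x.
    xs = [0, 1, 2, 3]
    ys = [1, 3, 5, 7]
    b0, b1 = fit_simple_linear_regression(xs, ys)
    print(b0, b1, r_squared(xs, ys, b0, b1))
```

The closed-form estimates used here are the standard ones for a single explanatory variable; multiple regression and variable selection, also covered in the course, need a matrix formulation that this sketch does not attempt.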
UE 6-2 Decision Making
UE 6-3 Computing
UE 6-4 Project IS and Soft Skills
UE 6-5 Languages
UE 7-1 Statistical Modelling
- Prerequisites :
112120
- Goals to be achieved :
* Be able to perform multivariate statistical analysis
* Be able to interpret the results of a multivariate analysis
* Be aware of the need for multivariate statistics in high dimension
* Be able to implement applications in R and SPAD - Course details :
1 Principal component analysis
2 Factorial correspondence analysis
3 Multiple correspondence analysis
4 Clustering methods (partitioning, hierarchical clustering)
5 Linear discriminant analysis - Reading list :
Gilbert Saporta: Probabilités, analyse des données et statistique (ed. Technip)
18 h Practical
1 h DS
- Prerequisites :
112130
- Goals to be achieved :
* Be able to build and evaluate a supervised classification statistical model
* Be able to implement applications in R - Course details :
1-Introduction + Reminders on classical tests (Chi2, Cramer, MANOVA)
2-Factorial discriminant analysis
3-Probabilistic discriminant analysis (homoscedastic and heteroscedastic)
4-Rule evaluation (Bayes rule, ROC curve)
5-Logistic regression
(6-Random forest) - Reading list :
Gilbert Saporta, Probabilités, analyse des données et statistique (ed. Technip)
2 h Tutorial
10 h Practical
1 h DS
- Prerequisites :
112110
- Goals to be achieved :
* Be able to model random Markovian dynamical systems in discrete and continuous time
* Be able to predict their long-time behavior
* Be able to implement applications - Course details :
Markov chains: definition, examples and properties
Classification of states: irreducible classes, recurrence and transience, periodicity
Stationary measures and limit theorems
Properties of the exponential distribution
Poisson process: definition and properties
Markov jump processes: definition, transition rate and generator, stationary measures and limit theorems
Queuing systems: Kendall's notation, M/M/1 queue example
Project: study of the long-time behavior of Markovian systems, illustrated by numerical simulations and statistical studies - Reading list :
B. Ycart, "Modèles et algorithmes markoviens", Springer, 2002
20 h Tutorial
3 h DS
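The project described in this course compares the theoretical long-time behavior of a Markov chain with what simulations show. A minimal sketch of that idea, assuming a small irreducible and aperiodic chain: the stationary distribution is approximated by repeatedly applying the transition matrix, then compared with the visit frequencies of a simulated trajectory. The function names and the two-state matrix are hypothetical and chosen only for illustration.

```python
import random


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi such that pi = pi P by power iteration."""
    n = len(P)
    for row in P:
        if len(row) != n or any(p < 0 for p in row) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("P must be a square row-stochastic matrix")
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("no convergence: the chain may be periodic or reducible")


def simulate_frequencies(P, steps, start=0, seed=0):
    """Simulate one trajectory and return the fraction of time spent in each state."""
    rng = random.Random(seed)
    states = range(len(P))
    counts = [0] * len(P)
    state = start
    for _ in range(steps):
        state = rng.choices(states, weights=P[state])[0]
        counts[state] += 1
    return [c / steps for c in counts]


if __name__ == "__main__":
    # Hypothetical two-state chain; its stationary distribution is (5/6, 1/6).
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    print(stationary_distribution(P))
    print(simulate_frequencies(P, steps=20_000))
```

Power iteration is only one way to obtain the stationary distribution; solving the linear system pi = pi P directly is an equally valid choice for small chains.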
UE 7-2 Decision Making
UE 7-3 Software Engineering and Information Systems
UE 7-4 Soft Skills
UE 7-5 Languages
UE 8-1 - Statistics and Decision Making
- Prerequisites :
113110|113120
- Goals to be achieved :
* Understand the penalized least squares methods for linear regression
* Understand the contribution of non-parametric modeling in regression and classification
* Be able to use random and fixed effects in an analysis of variance model
* Be able to implement applications in R and Python - Course details :
Ridge regression
Lasso regression
Quantile regression
Additive models
Repeated-measures analysis of variance
R and SAS applications
12 h Practical
1 h DS
14 h Practical
1 h DS
- Prerequisites :
113110|113120|113210
- Goals to be achieved :
* Know the fundamental principles of machine learning
* Be able to model machine learning problems
* Be able to adopt a rigorous and scientific approach
* Be able to use a library for machine learning - Course details :
Lecture part :
- Review the overall data processing chain
- See or review different supervised learning methods
- Understand the role of hyperparameters and how their tuning can be automated
- Take a look at neural networks and deep learning
Practical part:
- Familiarise yourself with the Scikit-Learn library (in Python)
- Use supervised classification methods: decision trees and ensemble methods
- Compare different experimental protocols
- Learn about classical hyperparameter tuning methods
- Approach neural networks and deep learning using the NNI platform - Reading list :
12 h Practical
1 h DS
- Goals to be achieved :
* Know the principles and key players of data protection
* Know the rights of individuals
* Know the regulations on Internet trackers (such as cookies) and profiling - Course details :
Outline :
1. Introduction to Data (the aim is to introduce students to the world of Data, with a focus on personal data law and cybersecurity: an opportunity to explain the history of this protection and the Data challenges).
2. Concepts and players (the aim is to define data protection terms: personal data, sensitive data and the main principles of the GDPR, known in French as the RGPD).
3. People's rights (the aim is to make students understand that everyone has rights over their data: it is therefore important to protect data as a data project manager, for example, but also as an individual, since any individual can obtain their own data on request).
4. Cookies and other trackers (the aim is to explain the regulations on this subject, which change frequently and are highly relevant to future data scientists, data project managers, etc. The topic is the tracking of users on websites. Cookies and trackers must be configured correctly to avoid exposing the company to fines).
5. Profiling (as with cookies and trackers, profiling is an essential subject, and non-compliance can lead to sanctions, for example when consent is required but not obtained, or when a legitimate interest cannot be justified. It is essential to define profiling in accordance with European regulations and to understand its impact on the work of a future data project manager, e.g. in a company).
"Quizzes" in Kahoot format are created to make the course dynamic and enable better learning.
The exam is in MCQ format: 40 questions.
1 h DS
UE 8-2 - Software Systems and Databases
UE 8-3 Projects IS
UE 8-4 - Assistant Engineer Placement : Starting in mid-May (10-13 weeks)
UE 8-5 Languages
UE 9-1 - Specialisation courses
- Prerequisites :
112120|113110|113130
- Goals to be achieved :
* Be able to model and analyze a time series
* Be able to predict its behavior
* Be able to implement applications - Course details :
Description of a time series
Exponential smoothing methods
Trends and seasonality
Stationary series modeling (ARMA)
Non-stationary series modeling (SARIMA)
Introduction to ARCH - GARCH processes - Reading list :
* Gourieroux C. and Monfort A.: Cours de Séries Temporelles, Economica.
* Applications are carried out using the R software.
1 h DS
- Prerequisites :
113340
- Goals to be achieved :
* Be able to create an ontology to model the concepts and relationships of a knowledge domain.
* Know how to use an ontology to create a knowledge base.
* Know how to manipulate and query data in a graph database. - Course details :
- Introduction to the concept of ontology in computer science and knowledge bases
- Modeling and creation of an ontology in OWL
- Graph databases, knowledge graphs - Reading list :
1 h DS
- Prerequisites :
113340|114130
- Goals to be achieved :
* Know the methods for ingesting data from several different types of source
* Know how to manipulate data and prepare it for its final use
* Know the different data storage areas - Course details :
Although data engineers do not directly create value from data, they are essential to any data team: they implement the tools and technical processes that other team members (data scientists, data analysts, etc.) need to prepare and manipulate data. The introduction to data engineering module covers the major concepts and tools of this profession, both in theory and in practice, through the following points:
- Data retrieval: How do you ingest data from several different types of source? Which ingestion method for which type of data (batch ingestion, real-time, etc.)?
- Data transformation: How do you manipulate data to standardise it and prepare it for different business uses?
- Data storage: How can data be stored in data lakes and data warehouses to make it usable by other data team members? - Reading list :
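The three stages named in this module (retrieval, transformation, storage) can be sketched as a very small batch pipeline. This is only an illustration under stated assumptions: the two in-memory sources, the `payments` table, the field names and every function name are hypothetical, and an in-memory SQLite database stands in for a real data warehouse. Real-time ingestion is not covered.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw sources: one CSV export and one JSON payload.
CSV_SOURCE = "id,name,amount\n1, Alice ,10.5\n2,Bob,3\n"
JSON_SOURCE = '[{"id": 3, "name": "carol", "amount": "7.25"}]'


def ingest_csv(text):
    """Retrieval: read CSV rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def ingest_json(text):
    """Retrieval: read a JSON array of records."""
    return json.loads(text)


def transform(record):
    """Transformation: standardise types and formatting across sources."""
    return {
        "id": int(record["id"]),
        "name": str(record["name"]).strip().title(),
        "amount": float(record["amount"]),
    }


def store(records, conn):
    """Storage: load the cleaned records into a table that others can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO payments VALUES (:id, :name, :amount)", records
    )
    conn.commit()


def run_pipeline(conn):
    """Run the batch pipeline end to end and return the cleaned records."""
    raw = ingest_csv(CSV_SOURCE) + ingest_json(JSON_SOURCE)
    clean = [transform(r) for r in raw]
    store(clean, conn)
    return clean


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(connection)
    for row in connection.execute("SELECT id, name, amount FROM payments ORDER BY id"):
        print(row)
```

Keeping retrieval, transformation and storage as separate functions mirrors the way the module presents them, and makes it easy to replace one stage (for example, a different source type) without touching the others.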
