Institut DataIA Paris-Saclay

dataia.eu

1 Job

15 Employees

About the Company

Created in 2017 under France's National Strategy for Artificial Intelligence, Institut DataIA is Université Paris-Saclay's center of excellence in AI. It brings together 14 higher-education and research institutions, including CentraleSupélec and ENS Paris-Saclay, as well as national research organizations and academic partners. The Institute works to structure the AI ecosystem around research, education, and innovation, with flagship initiatives such as the SaclAI-School project, a winner of the Compétences et Métiers d'Avenir call for proposals.

Listed Jobs

Company Name
Institut DataIA Paris-Saclay
Job Title
Internship - Transfer learning models able to handle MISSing data for the survival analysis of rare cancer from multi-OMICS data
Job Description
**Job Title**

Internship – Transfer Learning Models for Survival Analysis of Rare Cancers with Missing Multi-Omics Data

**Role Summary**

Develop and evaluate machine learning models that leverage transfer learning and joint dimensionality reduction to predict survival outcomes for rare cancers, using heterogeneous multi-omics data with missing values.

**Expectations**

- Participate in end-to-end model development, from data preprocessing to deployment.
- Produce reproducible code and documentation.
- Present experimental results in team meetings.
- Collaborate with statisticians, bioinformaticians, and clinicians.

**Key Responsibilities**

1. Acquire, clean, and preprocess multi-omics datasets (genomics, epigenomics, transcriptomics, proteomics).
2. Design and implement strategies for handling missing data (imputation, model-based methods).
3. Build and train transfer-learning architectures that incorporate clinical covariates and omics features.
4. Apply joint dimension-reduction techniques to reduce high dimensionality before survival analysis.
5. Perform survival analysis using appropriate statistical models (Cox proportional hazards, DeepSurv, random survival forests).
6. Evaluate model performance with metrics such as the concordance index, time-dependent AUC, and calibration plots.
7. Compare new methods against baseline clinical-only models and existing multi-omics methods.
8. Document code, data pipelines, and experimental results for reproducibility.

**Required Skills**

- Programming: Python, R (Java/Scala optional).
- Machine learning frameworks: TensorFlow, PyTorch, or Keras.
- Survival analysis libraries: lifelines, scikit-survival, or equivalent.
- Statistical modeling and inference.
- Experience with high-dimensional data and dimensionality reduction (PCA, NMF, MultiFA, CCA).
- Familiarity with missing-data techniques (multiple imputation, EM, matrix completion).
- Basic knowledge of genomics and bioinformatics pipelines.
- Solid version-control (Git) and documentation practices.
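To illustrate the kind of preprocessing involved in handling missing omics values before dimension reduction, here is a minimal sketch using scikit-learn's `IterativeImputer` (a MICE-style, model-based imputer) chained with PCA. The data is synthetic and the shapes, missingness rate, and component count are assumptions chosen for the example, not values from the project:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic "omics" matrix: 100 samples x 50 features, ~20% missing at random
X = rng.normal(size=(100, 50))
X[rng.random(X.shape) < 0.2] = np.nan

# Impute missing entries, then project onto a low-dimensional space
pipe = Pipeline([
    ("impute", IterativeImputer(max_iter=10, random_state=0)),
    ("reduce", PCA(n_components=10)),
])
Z = pipe.fit_transform(X)
print(Z.shape)  # (100, 10)
```

The same pipeline object could then feed a downstream survival model, keeping imputation inside the cross-validation loop to avoid leakage.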
**Required Education & Certifications**

- Current enrollment in or completion of a Bachelor's or Master's program in Computer Science, Statistics, Bioinformatics, Applied Mathematics, or a related field.
- Coursework or projects demonstrating experience with data science or machine learning.
- No specific certifications required; familiarity with genomics resources (ENCODE, TCGA, GTEx) is a plus.
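For the evaluation metrics named in the responsibilities, the concordance index is the standard ranking measure in survival analysis. This toy implementation of Harrell's C-index (the times, event indicators, and risk scores below are illustrative, not project data) shows the pairwise logic that libraries such as lifelines and scikit-survival implement:

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index: among comparable pairs, the fraction where the
    higher-risk sample experiences the event earlier (risk ties count 0.5).
    A pair (i, j) is comparable only if time[i] < time[j] and i had an event."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    num, den = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

# Risks perfectly anti-ranked with survival time: C-index = 1.0
t = [2.0, 4.0, 6.0, 8.0]
e = [1, 1, 0, 1]
r = [4.0, 3.0, 2.0, 1.0]
print(concordance_index(t, e, r))  # 1.0
```

In practice one would use `lifelines.utils.concordance_index` or scikit-survival's censoring-aware estimators rather than this O(n²) loop; the sketch is only meant to make the pairwise definition concrete.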
Gif-sur-Yvette, France
On site
20-01-2026