1 Topics Related to Data Analytics, Data Science and Machine Learning
The topics in this category involve a comprehensive data analytics or machine learning pipeline to analyze an existing dataset. This process may include the following steps:
- Exploratory Data Analysis
- Data Preprocessing
- Model Building
- Model Evaluation
These topics are ideal for students who are familiar with Python or R and have a strong interest in data science. Advanced programming skills are not required.
1.1 Prompt Engineering for Privacy: Reducing Sensitivity Leakage in LLMs Applied to Vehicular Systems
Background, Motivation, Problems, and Research Gaps
Large Language Models (LLMs)—such as GPT-4, Gemini, and Llama—are increasingly used to analyze complex mobility datasets, including vehicle telematics. These datasets contain time-stamped information on vehicle operations, energy consumption, driving behavior, and diagnostic events, often requiring time series imputation to address missing or irregular data points.
However, applying LLMs to telematics introduces a fundamental risk: sensitivity leakage.
Telematics often exposes personal or confidential information such as location traces, driver identity patterns, behavioral signatures, route preferences, or operational performance linked to individuals or fleets.
While privacy-preserving methods like anonymization, differential privacy, or regulated access control exist, they remain model-external and do not address how LLMs generate outputs based on sensitive prompts.
This leaves a large gap: How can prompt engineering itself be used as a privacy-preserving layer for LLMs?
Current literature does not systematically evaluate how prompt structure, abstraction levels, role assignment, and reasoning strategies influence the amount of sensitive information revealed by an LLM analyzing imputed telematics data.
Similarly, little is known about how different prompting styles interact with telemetry imputation quality, LLM model architectures, and sensitivity detection frameworks.
This research addresses these gaps by exploring whether prompt engineering can serve as a practical, lightweight mechanism to reduce sensitivity leakage while retaining analytical utility.
Research Questions
- How do different prompting strategies affect sensitivity leakage when LLMs analyze imputed vehicle telematics?
- How does time series imputation quality influence prompt behavior and leakage levels?
Tasks
- Prepare and preprocess a vehicle telematics dataset with missing values using LSTM-based time series imputation.
- Implement and evaluate 17 prompting strategies, such as: Zero-Shot, Few-Shot, Chain-of-Thought, ReAct, Self-Critique, Step-Back, Meta-Prompting, Output Priming, Role & Style prompting.
- Measure sensitivity leakage using: automated PII detectors, manual tagging, Sensitivity Leakage Rate (SLR), utility preservation metrics.
- Compare model behaviors across LLM platforms, including GPT-4o, Gemini-1.5, Gemini-2.0, and Llama-3.3 70B.
- Develop a prompt selection framework for privacy-aware analytics in automotive telematics use-cases.
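For the leakage-measurement task, a minimal sketch of one possible metric implementation is shown below: a regex-based PII screen over LLM outputs and a Sensitivity Leakage Rate defined as the fraction of flagged outputs. The patterns, the SLR definition, and all names are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch (assumption: regex patterns and the SLR definition are illustrative only).
import re

# Hypothetical detectors for a few PII-like signals in telematics analyses.
PII_PATTERNS = {
    "gps_coordinate": re.compile(r"\b-?\d{1,2}\.\d{3,},\s*-?\d{1,3}\.\d{3,}\b"),
    "vin": re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_pii(text: str) -> list[str]:
    """Return the names of all PII categories found in one LLM output."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

def sensitivity_leakage_rate(outputs: list[str]) -> float:
    """Fraction of outputs that contain at least one flagged PII category."""
    flagged = sum(1 for text in outputs if detect_pii(text))
    return flagged / len(outputs) if outputs else 0.0

if __name__ == "__main__":
    sample_outputs = [
        "Average energy consumption was 17.4 kWh/100 km across the fleet.",
        "The vehicle parked nightly near 53.1657, 8.6521 before each trip.",
    ]
    print(sensitivity_leakage_rate(sample_outputs))  # -> 0.5 for this toy example
```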
Ideal for:
Students with a strong interest in:
- AI safety and privacy
- Large Language Models
- Prompt engineering
- Time series analytics
- Vehicle telematics and mobility informatics
- Python-based LLM evaluation
Contact: Ega Rudy Graha <egraha@constructor.university>
1.2 Ground Truth Validation for Causal Discovery in Vehicle Telematics Using Dataset-Derived Lifecycle Features
Background, Motivation, Problems, and Research Gaps
Modern mobility systems generate large-scale telematics datasets containing time-stamped information on driving behavior, battery dynamics, charging sessions, environmental conditions, energy consumption, and component states. These datasets provide rich observational evidence of how vehicle subsystems interact over time.
Causal Discovery (CD) methods, such as PC, GES, NOTEARS, LiNGAM, DAG-GNN, and PCMCI+, can infer causal relationships from observational data. Such causal graphs are increasingly used in the following areas:
- predictive maintenance,
- degradation pathway modeling,
- anomaly detection, and
- fleet-level risk analytics.
However, a critical challenge limits the reliability of Causal Discovery in real telematics settings: the absence of Ground Truth causal structures.
Telematics datasets typically lack explicit causal information. Unlike simulated datasets or benchmark nonlinear systems, vehicle manufacturers do not release underlying subsystem dependency graphs. This creates a major methodological gap.
How can we evaluate the correctness of a causal discovery algorithm if the true causal structure is unknown?
Existing validation approaches rely on the following:
- domain experts (subjective, inconsistent),
- simplistic synthetic datasets (not realistic),
- correlation-based heuristics (not causal).
What is missing is a dataset-derived, engineering-consistent, and reproducible Ground Truth mechanism.
This thesis fills this gap by transforming the telematics dataset’s lifecycle features into a Ground Truth Causal Graph based on the following:
- mechanical and electrical engineering principles,
- time-ordered dependencies (temporal causality),
- physical constraints (battery physics, HVAC (Heating, Ventilation, and Air Conditioning) thermodynamics),
- automotive system logic (charging states and trip sequences).
This Ground Truth will serve as a baseline to test and compare the performance of multiple causal discovery algorithms.
Research Questions
- How can a Ground Truth causal graph be systematically constructed from vehicle telematics lifecycle features?
- How accurately do different causal discovery algorithms reconstruct the Ground Truth?
- Which feature groups (battery, driving behavior, environmental conditions, charging events, and trip summaries) have the strongest causal identifiability?
- How does dataset preprocessing influence the causal recovery performance?
Tasks
You will:
- Prepare and preprocess a multivariate telematics dataset containing variables such as SoC, temperature, charging power, speed, distance, trip duration, HVAC data, and operational flags.
- Engineer a Ground Truth Causal Graph using:
- physical battery equations (SoC dynamics),
- system-level logic (plug state → charging state),
- environmental influence modeling (outside temp → HVAC load),
- automotive control dependencies (speed → energy consumption),
- temporal ordering constraints.
- Implement and benchmark causal discovery algorithms, including
- PC, FCI (constraint-based)
- GES, NOTEARS (score-based)
- LiNGAM, CAM (functional)
- DAG-GNN (deep learning-based)
- PCMCI+ (time-series causal method)
- Evaluate the causal reconstruction accuracy using:
- Structural Hamming Distance (SHD)
- Adjacency Precision, Recall, F1
- Orientation Accuracy (arrowhead correctness)
- CPDAG / Markov equivalence class comparison
- Sensitivity to feature grouping and preprocessing
- Develop a benchmarking framework for dataset-derived Ground Truth validation of causal discovery in telematics.
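For the evaluation task, the sketch below illustrates how recovered graphs could be scored against the engineered Ground Truth, assuming both are encoded as directed 0/1 NumPy adjacency matrices. Note that SHD conventions differ across papers (e.g., whether a reversed edge counts as one error or two), so the thesis should state the convention it adopts.

```python
# Minimal sketch of graph-recovery metrics over binary adjacency matrices
# (assumption: both graphs are directed 0/1 numpy arrays with the same variable order).
import numpy as np

def structural_hamming_distance(a_true: np.ndarray, a_est: np.ndarray) -> int:
    """Edge insertions, deletions, or reversals needed to match the truth (reversal = 1)."""
    diff = np.abs(a_true - a_est)
    # A reversed edge (i->j vs j->i) differs in two cells; count it once, not twice.
    flips = np.sum((a_true + a_true.T == 1) & (a_est + a_est.T == 1) & (a_true != a_est)) // 2
    return int(diff.sum() - flips)

def adjacency_scores(a_true: np.ndarray, a_est: np.ndarray) -> dict:
    """Precision/recall/F1 on the undirected skeleton (edge present or absent)."""
    skel_true = ((a_true + a_true.T) > 0).astype(int)
    skel_est = ((a_est + a_est.T) > 0).astype(int)
    iu = np.triu_indices_from(skel_true, k=1)
    tp = int(((skel_true[iu] == 1) & (skel_est[iu] == 1)).sum())
    fp = int(((skel_true[iu] == 0) & (skel_est[iu] == 1)).sum())
    fn = int(((skel_true[iu] == 1) & (skel_est[iu] == 0)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```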
Ideal for:
Students with a strong interest in:
- Causal inference & causal discovery
- Time series analytics
- Automotive telematics & mobility informatics
- Machine learning and deep learning
- Data-driven engineering
- Python-based causal modeling frameworks
- Applied graph theory
Contact: Ega Rudy Graha <egraha@constructor.university>
1.3 Object Detection of Surgical Equipment in Hospitals Using Explainable Deep Learning
Background, Motivation, and Problem
Hospitals rely heavily on surgical equipment to perform critical medical procedures. Proper management and timely maintenance of these instruments are essential to ensure patient safety, reduce infection risks, and optimize operational efficiency. Currently, the tracking and monitoring of surgical equipment usage are often manual or semi-automated, which can lead to errors, misplaced items, or overuse beyond safe limits. Advances in deep learning and computer vision have shown promise in object detection and recognition tasks across various domains. However, applying these technologies to surgical equipment management in hospitals remains underexplored.
The motivation for this research is to leverage explainable deep learning techniques to develop an automated system capable of accurately detecting and recognizing surgical instruments in real time. This system will categorize equipment and issue alerts when instruments exceed their usage time or limits, thereby improving hospital workflow, equipment maintenance, and patient safety.
Research Gaps
- Limited application of deep learning for surgical equipment detection in real hospital environments.
- Lack of explainability in existing deep learning models makes it difficult for medical staff to trust automated decisions.
- Insufficient integration of object detection with equipment usage tracking and alert systems.
- Challenges in handling diverse and visually similar surgical instruments under varying lighting and occlusion conditions.
- Limited research on multi-category grouping and usage limit warnings for surgical tools using AI.
Research Questions
- How can deep learning models be optimized to accurately detect and recognize various surgical instruments in hospital settings?
- What explainability techniques can be integrated to enhance trust and interpretability of the deep learning model’s decisions by medical staff?
- How can the system effectively categorize surgical equipment and track its usage to provide timely warnings for overuse?
- What are the challenges in deploying such a system in real-time hospital environments, and how can they be addressed?
- How can the system be designed to integrate seamlessly with existing hospital management workflows?
Tasks
- Conduct a comprehensive literature review on object detection, explainable AI, and surgical equipment management.
- Collect and annotate a dataset of surgical equipment images/videos from hospital environments.
- Develop and train a deep learning model for real-time object detection and recognition of surgical instruments.
- Implement explainability methods (e.g., Grad-CAM, LIME) to provide insights into the model’s predictions.
- Design a categorization module to group detected equipment by type.
- Develop a usage tracking system that monitors the usage time/limit of each instrument and triggers alerts when thresholds are exceeded.
- Evaluate the system’s performance in terms of accuracy, explainability, and usability in simulated or real hospital scenarios.
- Propose recommendations for integration and deployment in hospital management systems.
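As a complement to the detection model itself, the sketch below illustrates the usage-tracking and alert logic that could sit on top of detector outputs. The class names, usage limits, and (label, timestamp) input format are hypothetical placeholders.

```python
# Minimal sketch of usage tracking and overuse alerts fed by detector outputs.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class InstrumentUsage:
    name: str
    usage_limit: timedelta                    # maximum allowed continuous usage
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None

    def update(self, seen_at: datetime) -> None:
        if self.first_seen is None:
            self.first_seen = seen_at
        self.last_seen = seen_at

    def over_limit(self) -> bool:
        if self.first_seen is None or self.last_seen is None:
            return False
        return (self.last_seen - self.first_seen) > self.usage_limit

def process_detections(tracker: dict[str, InstrumentUsage],
                       detections: list[tuple[str, datetime]]) -> list[str]:
    """Feed (label, timestamp) pairs from the detector and return overuse alerts."""
    alerts = []
    for label, ts in detections:
        if label in tracker:
            tracker[label].update(ts)
            if tracker[label].over_limit():
                alerts.append(f"ALERT: {label} exceeded its usage limit at {ts:%H:%M}")
    return alerts

if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 9, 0)
    tracker = {"scalpel": InstrumentUsage("scalpel", usage_limit=timedelta(minutes=90))}
    detections = [("scalpel", t0), ("scalpel", t0 + timedelta(minutes=120))]
    print(process_detections(tracker, detections))
```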
Contact: Rahmat Hidayat <rhidayat@constructor.university>
1.4 Graph Neural Networks for Knowledge Graph Construction and Completion for Resilient and Sustainable Supply Chain Management
Background, Motivation, and Problem
Knowledge graphs (KGs) are increasingly used to represent complex domains such as supply chains, logistics, regulatory compliance, or ESG reporting by encoding entities (e.g., suppliers, products, facilities) and their relations (e.g., delivers, certified-by, emits-to). However, real-world KGs are rarely complete or clean: they are populated from heterogeneous text sources (reports, regulations, contracts) and structured systems (ERP, logistics, emissions databases, certifications), which leads to missing links, ambiguous entities, duplicated nodes, and outdated information.
Graph Neural Networks (GNNs) have emerged as a powerful class of models that leverage graph structure through message passing and neighborhood aggregation. Recent surveys position GNNs as core tools for several KG tasks: (1) populating graphs from text and structured data via improved relation extraction and entity disambiguation, (2) completing and enriching KGs by predicting missing edges and attributes, (3) aligning and merging KGs from different systems, and (4) maintaining and evolving KGs by detecting anomalies and inconsistent links. At the same time, practitioners still mostly rely on classical KG embedding models (e.g., TransE/DistMult) and ad-hoc rules, with limited understanding of when and how GNN-based approaches provide tangible benefits.
The motivation of this thesis is to systematically study how GNN-based methods for knowledge graph completion compare to classical embedding models and how their insights can support the broader lifecycle of KGs (population, alignment, maintenance). The work will implement a minimal yet reproducible pipeline for GNN-based KG completion, evaluate it on a public benchmark, and relate the findings to practical use-cases such as combining existing resilient supply chain ontologies with new domain knowledge.
Research Gaps
- Limited empirical comparisons of GNN-based KG completion (e.g., R-GCN, CompGCN, GraIL-style subgraph models) versus classical KG embedding methods on standard benchmarks, especially with a focus on practical reproducibility.
- Insufficient understanding of the trade-offs between relation-aware neighborhood message passing and subgraph-based reasoning for different KG structures and sparsity patterns.
- Fragmented treatment of KG tasks: population, completion, alignment, and maintenance are often studied in isolation rather than as interconnected steps in one KG lifecycle.
- Lack of small, well-documented GNN pipelines that students and practitioners can easily adopt, extend, and apply to domain-specific ontologies (e.g., resilient supply chain knowledge graphs).
- Limited analysis of failure modes, limitations, and anomaly patterns in GNN-based KG models, which is needed to make them trustworthy for real-world decision support.
Research Questions
- KG completion (core)
- How do GNN-based KG completion methods exploit neighborhood structure to outperform (or fail to outperform) classical KG embedding models?
- What are the practical trade-offs between relation-aware message passing (e.g., R-GCN/CompGCN) and subgraph reasoning (GraIL-type methods) in terms of accuracy, scalability, and implementation complexity?
- KG construction from text/structured data (optional angle)
- To what extent can GNNs improve relation extraction and entity disambiguation when populating KGs from documents and heterogeneous structured sources, compared to a transformer-only pipeline?
- Entity alignment and maintenance (optional angle)
- How well do GNN-based entity alignment methods handle noisy duplicates across different KGs, and which signals (graph structure, attributes, text embeddings) contribute most?
- Can GNN-based anomaly detection identify outdated or inconsistent links in evolving KGs, and how can such signals support KG maintenance workflows?
Tasks
- Conduct a structured literature review on:
- GNN fundamentals (message passing, oversmoothing, heterophily, attention; inductive vs. transductive learning),
- KG completion problem setup and evaluation metrics (MRR, Hits@K),
- GNN-for-KGC taxonomies, including neighborhood-based and subgraph-based approaches, and their relation to the four KG use-cases (population, completion, alignment, maintenance).
- Produce a concise “concept map” that links GNN concepts to the four KG tasks relevant for real-world applications.
- Implement a baseline KG embedding model (e.g., DistMult or TransE) on a public KG completion dataset, and report standard metrics (MRR, Hits@K).
- Implement at least one GNN-based KG completion model (e.g., R-GCN, optionally CompGCN or a GraIL-family subgraph model) using a modern PyTorch Geometric pipeline, ensuring reproducible training and evaluation.
- Systematically compare GNN-based models with the baseline embedding model in terms of performance, training stability, data efficiency, and computational cost; analyze failure cases and limitations.
- (Optional) Develop a small proof-of-concept extension for one of the following:
- KG population from text (NER + relation extraction + triples → KG, optionally with GNN refinement),
- GNN-based entity alignment for merging two small KGs, or
- Neighborhood-based anomaly detection for KG maintenance.
- Relate the experimental findings to a concrete domain ontology (e.g., resilient/robotic supply chain ontology) by illustrating how improved KG completion or alignment could support more robust decision-making.
- Deliver a final thesis document and a clean GitHub repository with code, configuration files, documentation, and result logs, following the agreed evaluation rubric (literature review, experiment correctness, analysis, and repo hygiene).
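For the baseline task, a minimal DistMult scoring sketch in plain PyTorch is given below; a full pipeline would add negative sampling, a training loss, and MRR/Hits@K evaluation before moving on to R-GCN/CompGCN in PyTorch Geometric. The embedding dimension and the FB15k-237-sized entity/relation counts are illustrative.

```python
# Minimal DistMult sketch in plain PyTorch (dimensions and counts are illustrative).
import torch
import torch.nn as nn

class DistMult(nn.Module):
    def __init__(self, num_entities: int, num_relations: int, dim: int = 200):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        nn.init.xavier_uniform_(self.ent.weight)
        nn.init.xavier_uniform_(self.rel.weight)

    def forward(self, heads: torch.Tensor, rels: torch.Tensor, tails: torch.Tensor) -> torch.Tensor:
        # Triple score: <h, r, t> = sum_k h_k * r_k * t_k (higher = more plausible)
        return (self.ent(heads) * self.rel(rels) * self.ent(tails)).sum(dim=-1)

model = DistMult(num_entities=14541, num_relations=237)   # e.g. an FB15k-237-sized KG
h = torch.tensor([0, 5])
r = torch.tensor([3, 3])
t = torch.tensor([7, 9])
scores = model(h, r, t)                         # plausibility scores for two candidate triples
loss = nn.functional.softplus(-scores).mean()   # toy positive-only loss term
```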
Contact: Edwin Hartarto <edwin.hartarto91@gmail.com>
1.5 Comparative Study of Causal Discovery Methods: Machine Learning Approaches (Causal Trees/Forests) vs. Functional Models (LiNGAM, RESIT)
Background, Motivation, Problems, and Research Gaps:
Understanding causal relationships is crucial for informed decision-making in fields like healthcare, economics, and social sciences. Causal discovery methods are divided mainly into two categories: machine learning-based methods (e.g., Causal Trees and Causal Forests) and functional models (e.g., Linear Non-Gaussian Acyclic Model (LiNGAM), Regression with Subsequent Independence Test (RESIT)).
Machine learning models like Causal Trees and Forests are popular for their ability to handle complex, non-linear interactions and to estimate heterogeneous treatment effects. Functional models, on the other hand, rely on explicit mathematical formulations to identify causal directions, offering direct interpretability. Despite the individual strengths of these approaches, there is a lack of comprehensive comparative analysis between machine learning-based causal discovery and functional models, particularly in terms of accuracy, robustness, and interpretability. This research aims to fill this gap by evaluating and contrasting these methods in various data scenarios.
Research Questions:
- How do machine learning-based causal discovery methods (e.g., Causal Trees, Causal Forests) compare with functional models (e.g., LiNGAM, RESIT) in terms of accuracy in identifying causal relationships?
- What are the strengths and limitations of each approach in different types of data, including linear, non-linear, and noisy datasets?
- How do machine learning-based and functional causal discovery methods differ in terms of interpretability and computational efficiency?
Tasks:
- Conduct a literature review on causal discovery methods, focusing on machine learning-based approaches (Causal Trees, Causal Forests, NOTEARS) and functional models (LiNGAM, RESIT).
- Implement the selected causal discovery methods using Python libraries such as CausalML and EconML for causal trees/forests, and CausalDiscoveryToolbox (cdt) and causal-learn for functional models.
- Apply these methods to real-world datasets with known causal structures, incorporating varying levels of complexity and noise.
- Compare the performance of these methods based on metrics like Structural Hamming Distance (SHD), True Positive Rate (TPR) / Sensitivity / Recall, False Positive Rate (FPR), Edge Orientation Accuracy, etc.
- Analyze the results to identify scenarios where one approach outperforms the other and provide recommendations for selecting appropriate methods for different types of causal discovery tasks.
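As one concrete starting point for the implementation task, the sketch below runs DirectLiNGAM on synthetic linear non-Gaussian data. It assumes the `lingam` package and its documented `DirectLiNGAM` interface; the data-generating equations are invented for illustration.

```python
# Minimal sketch: DirectLiNGAM on synthetic linear non-Gaussian data (assumes `lingam`).
import numpy as np
import lingam

rng = np.random.default_rng(0)
n = 2000
# Ground-truth structure: x0 -> x1 -> x2, with uniform (non-Gaussian) noise.
x0 = rng.uniform(size=n)
x1 = 2.0 * x0 + rng.uniform(size=n)
x2 = -1.5 * x1 + rng.uniform(size=n)
X = np.column_stack([x0, x1, x2])

model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)       # estimated causal ordering of the variables
print(model.adjacency_matrix_)   # estimated weighted adjacency matrix
```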
Ideal for:
Students with a background in data science and machine learning who are interested in causal inference, model comparison, and the practical application of different causal discovery techniques. Proficiency in Python and familiarity with libraries such as scikit-learn, Causal Learn, CausalDiscoveryToolbox (cdt), CausalML, and lingam will be advantageous.
1.6 Uncovering Causal Relationships in Text Documents with Large Language Models (LLMs)
Background, Motivation, Problems, and Research Gaps:
Causal discovery is crucial for understanding cause-effect relationships in various domains like healthcare, economics, and social sciences. Traditional causal discovery methods rely heavily on structured numerical data, but a vast amount of causal knowledge exists in unstructured text formats, such as research papers, news articles, and policy documents. With recent advancements in natural language processing (NLP), Large Language Models (LLMs) like GPT-3 and GPT-4 have demonstrated impressive abilities in understanding complex language patterns, including the identification of causal relationships embedded in text.
However, using LLMs for causal discovery from text remains underexplored. The challenge lies in extracting meaningful causal relationships while differentiating correlation from true causation. Existing research has yet to fully utilize LLMs’ potential in automatically extracting and interpreting causal information from large corpora of text data. This research aims to bridge this gap by exploring how LLMs can be leveraged for causal discovery in unstructured text documents.
Research Questions:
- How effectively can LLMs extract causal relationships from unstructured text documents?
- What are the limitations of using LLMs in distinguishing between correlation and true causation in textual data?
- How can prompt engineering or fine-tuning improve the accuracy of causal extraction using LLMs?
Tasks:
- Conduct a literature review on causal discovery methods, focusing on techniques using LLMs for text analysis.
- Collect and preprocess a dataset of text documents from a domain of interest (e.g., medical articles, policy papers) that contain causal statements.
- Use prompt engineering and/or fine-tuning to train an LLM (e.g., GPT-3, GPT-4) for extracting causal relationships from the text.
- Extract causal pairs and represent them in a structured format (e.g., causal graphs) using tools like networkx.
- Evaluate the effectiveness of the LLM-based causal discovery by comparing the extracted relationships against known causal knowledge or benchmarks.
- Analyze the results to identify strengths, limitations, and potential improvements in using LLMs for causal discovery from text.
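The sketch below illustrates one possible extraction-plus-graph pipeline: prompt an LLM for "cause -> effect" pairs and load them into a networkx DiGraph. It assumes the OpenAI Python client (v1+); the prompt, output format, and model name are placeholders, and a real study would add validation against benchmark causal pairs.

```python
# Minimal sketch: prompt an LLM for causal pairs and build a causal graph with networkx.
import networkx as nx
from openai import OpenAI

PROMPT = (
    "Extract all explicit causal claims from the text below. "
    "Return one per line in the exact format: cause -> effect.\n\nTEXT:\n{text}"
)

def extract_causal_pairs(text: str, model: str = "gpt-4o-mini") -> list[tuple[str, str]]:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
    )
    pairs = []
    for line in reply.choices[0].message.content.splitlines():
        if "->" in line:
            cause, effect = (part.strip() for part in line.split("->", 1))
            pairs.append((cause, effect))
    return pairs

def build_causal_graph(pairs: list[tuple[str, str]]) -> nx.DiGraph:
    graph = nx.DiGraph()
    graph.add_edges_from(pairs)
    return graph

# Example (requires an API key):
# g = build_causal_graph(extract_causal_pairs("Smoking causes lung cancer."))
# print(list(g.edges()))
```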
Ideal for:
Students interested in natural language processing, causal inference, and the application of large language models to complex data analysis tasks. Basic knowledge of Python and experience with NLP tools (e.g., Hugging Face Transformers, OpenAI API) will be advantageous. This project is ideal for those curious about how advanced AI models can be applied to extract meaningful insights from unstructured textual data.
1.7 Causal AI in Federated Learning for ESG Data Causal Analysis
Background, Motivation, Problems, and Research Gaps
Understanding the causal relationships within ESG data (e.g., the impact of sustainability initiatives on company performance) is more informative than simple correlations. Causal AI models can identify and quantify these relationships, but using centralized data for such analysis raises privacy concerns. By integrating causal AI into federated learning, we can perform causal analysis on distributed ESG data without exposing sensitive information. This study addresses the research gap in combining causal inference techniques with federated learning for privacy-preserving causal analysis of ESG factors.
Research Questions
- How can causal AI be integrated into federated learning for ESG data analysis?
- What challenges arise when applying causal inference in a federated learning context?
- How can federated causal AI models improve the understanding of ESG factors’ impact on sustainability performance?
Tasks
- Implement a federated learning system using PySyft or TensorFlow Federated.
- Integrate causal AI models (e.g., CausalNex, DoWhy) into the federated learning environment.
- Develop a mechanism to aggregate causal analysis results from different clients while preserving privacy.
- Test the integrated system on ESG datasets to identify and interpret causal relationships.
Ideal for: Students with a strong interest in Python programming, causal inference, federated learning, and ESG data analysis.
Contact: Widyasmoro “Dyas” Priatmojo <wpriatmojo@constructor.university>
1.8 Causal Discovery in ESG Data Using Federated Learning
Background, Motivation, Problems, and Research Gaps
Identifying causal relationships in ESG data is essential for understanding how various factors, such as corporate policies or environmental initiatives, influence sustainability outcomes. Traditional methods rely on centralized data, which poses privacy concerns. Federated learning allows decentralized causal discovery, enabling multiple organizations to contribute to causal analysis without exposing sensitive data. The research gap lies in applying causal discovery algorithms within federated learning frameworks to reveal actionable ESG insights.
Research Questions
- How can causal discovery algorithms be integrated into federated learning for ESG data analysis?
- What are the key challenges in performing causal analysis on distributed data?
- How can federated causal discovery enhance our understanding of ESG factors affecting organizational performance?
Tasks
- Implement a federated learning system using Flower or PySyft.
- Integrate causal discovery algorithms (e.g., PC algorithm, GES) within the federated environment using Causal Learn library.
- Develop a method for sharing and aggregating causal structures among clients.
- Apply the system to ESG datasets to identify causal relationships and validate findings.
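For the aggregation task, one simplified option is sketched below: each client shares only the binary adjacency matrix produced by its local discovery run (e.g., PC via causal-learn), and the server keeps edges supported by a majority of clients. The voting threshold and structure-only sharing are illustrative assumptions rather than a full privacy analysis.

```python
# Minimal sketch of server-side aggregation of client causal structures by majority vote.
import numpy as np

def aggregate_causal_structures(client_graphs: list[np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """Edge-wise majority vote over client adjacency matrices (same variable order)."""
    stacked = np.stack(client_graphs)      # shape: (clients, d, d)
    support = stacked.mean(axis=0)         # fraction of clients reporting each edge
    return (support > threshold).astype(int)

# Three clients, three ESG variables (e.g. policy -> emissions -> rating).
g1 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
g2 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
g3 = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]])
print(aggregate_causal_structures([g1, g2, g3]))
# Keeps 0->1 and 1->2 (supported by at least 2/3 clients); drops the 0->2 edge seen by one client.
```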
Ideal for: Students with an interest in Python, causal AI, and federated learning techniques applied to ESG analysis.
Contact: Widyasmoro “Dyas” Priatmojo <wpriatmojo@constructor.university>
1.9 Integrating LLMs into Federated Learning for Automated ESG Report Generation
Background, Motivation, Problems, and Research Gaps
ESG reports provide critical insights into a company’s sustainability performance, but their generation involves complex data processing and a careful balance between data utility and privacy. Traditional methods often involve manual data compilation and risk of data exposure. Integrating Large Language Models (LLMs) like GPT into federated learning frameworks can automate the generation of these reports while preserving data privacy. However, there is a gap in research focusing on how LLMs can be effectively fine-tuned within federated environments for comprehensive and secure ESG reporting.
Research Questions
- How can LLMs be integrated into a federated learning setup to facilitate automated ESG report generation?
- What privacy and data security challenges arise when using federated learning for ESG data?
- How does the quality of ESG reports generated by federated LLMs compare to those generated through traditional methods?
Tasks
- Set up a federated learning environment using a framework like Flower or PySyft.
- Integrate a pre-trained LLM (e.g., BERT, GPT) into this federated setup.
- Develop a pipeline for ESG data collection, processing, and automated report generation.
- Evaluate the generated reports for quality, privacy, and accuracy in comparison with traditional methods.
Ideal for: Students proficient in Python, interested in federated learning, NLP, and data privacy in ESG reporting.
Contact: Widyasmoro “Dyas” Priatmojo <wpriatmojo@constructor.university>
1.10 LLM-Assisted Collection of ESG Data from News Articles and Media Sources
Background, Motivation, Problems, and Research Gaps
ESG performance is not only reflected in company reports but also in media coverage, news articles, and other public sources. Collecting and analyzing this external data is essential for a holistic ESG assessment, but manual data collection is resource-intensive. Large Language Models (LLMs) can assist in processing vast amounts of text data, extracting relevant ESG-related information from diverse media sources. However, applying LLMs for this specific purpose remains relatively unexplored, representing a research gap this study seeks to address.
Research Questions
- How can LLMs be used to automatically collect and extract ESG-related information from news articles and media sources?
- What are the limitations and challenges in using LLMs for identifying ESG topics in diverse media content?
- How effective is LLM-based data collection compared to manual methods?
Tasks
- Conduct a literature review on the use of LLMs in text analysis and media monitoring.
- Utilize pre-trained LLMs to collect ESG-related information from a set of news articles and media sources.
- Analyze the results to identify trends, key themes, and the overall effectiveness of LLM-assisted data collection.
- Compare the LLM-generated data with manually collected data to evaluate accuracy and comprehensiveness.
Ideal for: Students interested in ESG assessments, media analysis, and applying natural language processing tools, requiring minimal programming skills.
Contact: Widyasmoro “Dyas” Priatmojo <wpriatmojo@constructor.university>
1.11 LLMs for Extracting Key ESG Indicators from Public Reports
Background, Motivation, Problems, and Research Gaps
Publicly available ESG reports contain crucial information on various indicators like carbon footprint, diversity metrics, and waste management. However, manually extracting specific indicators from these extensive documents is time-intensive. Large Language Models (LLMs) have the potential to automate this extraction process, but their application in efficiently identifying and collecting key ESG indicators is underexplored. This study aims to address this gap by utilizing LLMs to streamline ESG data collection.
Research Questions
- How can LLMs be applied to identify and extract key ESG indicators from publicly available reports?
- What specific challenges do LLMs face when extracting diverse indicators from various reporting formats?
- How effective are LLMs in extracting ESG indicators compared to traditional manual methods?
Tasks
- Review literature on LLM usage for text extraction and analysis, particularly in ESG contexts.
- Apply pre-trained LLMs to a sample set of ESG reports to identify and extract key indicators.
- Compare the extraction results from LLMs with those obtained through manual methods.
- Analyze the effectiveness and accuracy of LLMs in automating ESG data collection.
Ideal for: Students interested in sustainability data analysis and text processing, with a focus on using LLMs and minimal programming.
Contact: Widyasmoro “Dyas” Priatmojo <wpriatmojo@constructor.university>
1.12 Exploring Causal Reinforcement Learning to Enhance Sustainability in Multimodal City Logistics
Background, Motivation, Problems, and Research Gaps:
City logistics is increasingly becoming a complex challenge as urban areas expand, leading to higher traffic congestion, pollution, and inefficiencies in goods delivery. Traditional logistics models often struggle to adapt to the dynamic nature of cities, particularly when multiple transportation modes (e.g., trucks, bicycles, electric vehicles) are involved. Ensuring sustainability in multimodal city logistics requires understanding the causal factors affecting transportation efficiency, environmental impact, and resource allocation.
Causal reinforcement learning (RL) offers a promising approach to address this challenge. By incorporating causal knowledge into decision-making processes, RL can adapt to changing urban environments and make informed, sustainable logistics decisions. However, research on applying causal reinforcement learning specifically to multimodal city logistics is limited, leaving a gap in exploring how this advanced technique can optimize logistics systems for better sustainability outcomes.
Research Questions:
- How can causal reinforcement learning be applied to improve the sustainability of multimodal city logistics?
- What are the key causal factors affecting the efficiency and environmental impact of different transportation modes in urban logistics?
- How does incorporating causal relationships in reinforcement learning models influence decision-making and system performance in city logistics?
Tasks:
- Conduct a literature review on multimodal city logistics and the use of causal reinforcement learning in transportation and logistics systems.
- Identify key sustainability factors in city logistics, such as emissions, energy consumption, and delivery times, and explore their causal relationships.
- Develop a causal reinforcement learning model that incorporates these relationships to optimize decisions in a simulated multimodal city logistics environment.
- Evaluate the model’s performance in terms of sustainability outcomes (e.g., reduced emissions, improved efficiency) and compare it to traditional logistics models.
- Analyze the results to identify the strengths, limitations, and potential areas for improvement in using causal reinforcement learning for sustainable city logistics.
Ideal for:
Students interested in sustainability, urban logistics, and the application of advanced AI techniques. This topic is particularly suitable for those with a background in data science, machine learning, and an understanding of reinforcement learning concepts. Basic programming skills (e.g., Python) and experience with simulation environments or logistics models would be advantageous.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
1.13 Building Feature Graphs from Logistics Data for Causal Analysis
Background, Motivation, Problems, and Research Gaps:
Modern logistics operations generate large amounts of data (orders, shipments, vehicles, routes, time stamps, sensor values). To discover causal relationships (e.g. which factors drive delays or CO₂ emissions), these data need to be transformed into feature graphs that capture entities and their connections over time. Today, this transformation is often done informally in scripts, without clear documentation or semantic grounding. There is a need for simple guidelines and examples on how to build meaningful feature graphs from logistics data.
Research Questions:
- Which entities (orders, vehicles, hubs, time windows, routes) and relations are essential when constructing feature graphs for logistics?
- How can we design a graph schema that is both faithful to the underlying data and suitable for causal discovery algorithms?
- Which simple statistics and visualisations help judge whether a feature graph is “good enough” for downstream analysis?
Tasks:
- Review basic concepts of graphs and causal discovery (at an intuitive level) and explore one logistics dataset (synthetic or anonymised).
- Propose a feature graph schema for one use case (e.g. last-mile deliveries with time windows and temperatures).
- Implement a small ETL process (e.g. Python + pandas + NetworkX) that transforms tabular data into the proposed graph.
- Compute basic graph statistics (e.g. degree, connected components) and generate visualisations to inspect the structure.
- Derive simple practical guidelines for other students or practitioners who want to construct feature graphs from logistics data.
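A minimal version of the ETL step is sketched below: a toy delivery table is turned into a NetworkX graph with typed nodes and relations, and basic statistics are computed. The column names and schema are illustrative assumptions for one possible use case.

```python
# Minimal sketch: tabular delivery log -> feature graph -> basic structure checks.
import pandas as pd
import networkx as nx

deliveries = pd.DataFrame({
    "order_id":   ["o1", "o2", "o3"],
    "vehicle_id": ["v1", "v1", "v2"],
    "hub_id":     ["h1", "h2", "h2"],
    "delay_min":  [5, 22, 0],
})

g = nx.Graph()
for _, row in deliveries.iterrows():
    g.add_node(row.order_id, kind="order", delay_min=row.delay_min)  # order node with feature
    g.add_node(row.vehicle_id, kind="vehicle")
    g.add_node(row.hub_id, kind="hub")
    g.add_edge(row.order_id, row.vehicle_id, relation="delivered_by")
    g.add_edge(row.order_id, row.hub_id, relation="dispatched_from")

print(g.number_of_nodes(), g.number_of_edges())
print(dict(g.degree()))                       # simple degree statistics
print(nx.number_connected_components(g))      # structure check before downstream analysis
```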
Ideal for:
BSc/MSc DSSB, DE, CS, SCM, or IEM students with basic Python skills and interest in graphs and data modelling; causal inference knowledge is a plus but not mandatory.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
1.14 Ontology-Guided Hybrid Causal Discovery in Logistics Data
Background, Motivation, Problems, and Research Gaps:
Causal discovery algorithms (e.g. LiNGAM, IGCI, DECI) attempt to infer cause–effect structures from data. In logistics, however, there is rich expert knowledge: some causal directions are known (e.g. departure time influences arrival time, not vice versa), and certain causal links are impossible (e.g. sensor noise cannot cause road weather). Purely data-driven algorithms often ignore such domain knowledge, leading to implausible or unstable graphs. There is a research gap in systematically injecting domain and ontology knowledge into causal discovery for logistics processes.
Research Questions:
- How can prior knowledge from a logistics ontology (e.g. allowed/forbidden edges, temporal order) be turned into constraints or priors for causal discovery algorithms?
- Does combining ontology knowledge with data-driven methods improve the accuracy and stability of discovered causal graphs?
- What is the impact on computational effort and scalability when adding such constraints in realistic logistics scenarios?
Tasks:
- Review selected causal discovery methods and their existing support for constraints or priors.
- Extract domain knowledge from a logistics ontology and convert it into formal constraints (e.g. “A cannot cause B”, “time must flow forward”).
- Implement a prototype hybrid causal discovery pipeline (e.g. existing libraries + custom constraint handling).
- Evaluate the approach on at least one synthetic benchmark dataset and one logistics-like dataset (real or semi-synthetic).
- Compare constrained vs. unconstrained methods in terms of accuracy, stability, runtime, and qualitative plausibility of the resulting graphs.
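One simplified way to inject ontology knowledge is sketched below: forbidden edges and a temporal order are encoded explicitly and used to prune a discovered adjacency matrix. Post-hoc pruning is only a stand-in for illustration; passing such constraints directly into constraint-aware algorithms (where supported) would be the preferred approach in the thesis.

```python
# Minimal sketch: encode ontology knowledge as forbidden edges plus a temporal order,
# then prune a discovered adjacency matrix accordingly (simplified post-hoc variant).
import numpy as np

variables = ["departure_time", "arrival_time", "road_weather", "sensor_noise"]
idx = {name: i for i, name in enumerate(variables)}

# Illustrative domain knowledge: noise cannot cause weather; effects cannot precede causes.
forbidden = {("sensor_noise", "road_weather")}
temporal_rank = {"departure_time": 0, "road_weather": 0, "arrival_time": 1, "sensor_noise": 1}

def apply_constraints(adjacency: np.ndarray) -> np.ndarray:
    pruned = adjacency.copy()
    for cause, effect in forbidden:
        pruned[idx[cause], idx[effect]] = 0
    for cause in variables:                      # enforce "time must flow forward"
        for effect in variables:
            if temporal_rank[cause] > temporal_rank[effect]:
                pruned[idx[cause], idx[effect]] = 0
    return pruned

discovered = np.array([[0, 1, 0, 0],
                       [0, 0, 0, 1],
                       [0, 1, 0, 0],
                       [0, 0, 1, 0]])            # raw output of some discovery run
print(apply_constraints(discovered))             # sensor_noise -> road_weather is removed
```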
Ideal for:
BSc/MSc CS, DE, DSSB, RIS, IEM, SCM students with solid Python and ML basics; ideal for those who enjoy algorithmic work and may want to publish their thesis.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
1.15 Prototype Anomaly Detection in Aluminium Casting Process
Background, Motivation, Problems, and Research Gaps:
Sensor drift, outliers and faulty readings can severely degrade data quality and mislead downstream AI models. Automated anomaly detection for selected High-Pressure Die Casting (HPDC) signals would help clean the data and trigger maintenance actions, but a lightweight prototype is still needed.
Research Questions:
- Which classical and simple ML-based outlier detection methods are suitable for typical HPDC sensor signals?
- How well can they detect abnormal cycles or sensor faults compared to expert judgement?
Tasks:
- Select 1–2 relevant time-series sensors (e.g. temperature, pressure) from provided data.
- Implement basic anomaly detection methods (Z-Score, IQR, moving window statistics; optionally Isolation Forest) in Python.
- Evaluate results with simple metrics and, if possible, expert feedback or labelled data.
- Provide recommendations for integrating anomaly checks into the preprocessing pipeline.
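A minimal sketch of the two method families on a synthetic HPDC-like signal is shown below; the window size, z-score threshold, and contamination rate are illustrative choices to be tuned against expert feedback or labels.

```python
# Minimal sketch: rolling z-score rule plus Isolation Forest on one synthetic signal.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
signal = pd.Series(650 + rng.normal(0, 2, 500))   # e.g. die temperature in °C
signal.iloc[120] += 25                            # injected sensor spike
signal.iloc[300:310] -= 15                        # injected drift segment

# Rule-based check: rolling z-score over a moving window.
window = 50
rolling_mean = signal.rolling(window, min_periods=10).mean()
rolling_std = signal.rolling(window, min_periods=10).std()
z_score = (signal - rolling_mean) / rolling_std
rule_flags = z_score.abs() > 3

# ML-based check: Isolation Forest on the raw value and its first difference.
features = pd.DataFrame({"value": signal, "diff": signal.diff().fillna(0)})
iso_flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(features) == -1

print("rolling z-score flags:", signal.index[rule_flags].tolist())
print("isolation forest flags:", signal.index[iso_flags].tolist())
```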
Ideal for:
BSc/MSc IEM, CS, DE, DSSB, SCM with beginner-to-intermediate Python & statistics skills.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
1.16 Comparative Evaluation of Causal Discovery Algorithms for Aluminium Production Data
Background, Motivation, Problems, and Research Gaps:
Aluminium High-Pressure Die Casting (HPDC) processes are complex, with many interacting variables and time dependencies. Various Causal Discovery algorithms (e.g. NOTEARS, DAG-GNN, FCI, PCMCI) exist, but their relative performance under noisy, high-dimensional, time-dependent industrial data conditions is unclear.
Research Questions:
- How do different causal structure learning algorithms perform on simulated and real HPDC datasets?
- Which algorithmic choices (regularisation, temporal modelling, prior constraints) lead to practically useful causal graphs?
Tasks:
- Implement or configure several Causal Discovery methods in Python using existing libraries.
- Create synthetic benchmark datasets that mimic HPDC characteristics (high sampling rates, latent variables).
- Apply algorithms to synthetic and real datasets, evaluate with structural and predictive metrics, and compare against expert expectations.
- Provide guidelines on algorithm selection and parameterisation for HPDC use cases.
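For the synthetic benchmark task, the sketch below samples data from a small known structural model so that recovered graphs can later be scored against the true DAG. The variable names, equations, and noise levels are illustrative stand-ins, not a validated HPDC process model.

```python
# Minimal sketch of a synthetic benchmark generator with a known causal structure.
import numpy as np
import pandas as pd

def simulate_hpdc_like(n: int = 5000, seed: int = 0) -> tuple[pd.DataFrame, np.ndarray]:
    rng = np.random.default_rng(seed)
    die_temp = 640 + rng.normal(0, 5, n)
    inj_speed = 3.0 + rng.normal(0, 0.3, n)
    # Causal structure: die_temp -> porosity <- inj_speed; porosity -> scrap
    porosity = 0.5 - 0.004 * (die_temp - 640) + 0.3 * (inj_speed - 3.0) + rng.normal(0, 0.05, n)
    scrap = (porosity + rng.normal(0, 0.05, n) > 0.6).astype(float)
    data = pd.DataFrame({"die_temp": die_temp, "inj_speed": inj_speed,
                         "porosity": porosity, "scrap": scrap})
    true_dag = np.zeros((4, 4), dtype=int)       # rows = causes, columns = effects
    order = {"die_temp": 0, "inj_speed": 1, "porosity": 2, "scrap": 3}
    for cause, effect in [("die_temp", "porosity"), ("inj_speed", "porosity"), ("porosity", "scrap")]:
        true_dag[order[cause], order[effect]] = 1
    return data, true_dag

data, true_dag = simulate_hpdc_like()
print(data.describe().round(2))
print(true_dag)
```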
Ideal for:
MSc/BSc DE, CS, SCM, DSSB, IEM, Math, RIS with some Python, ML and statistics background; interest in causal inference.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
1.17 Causal Treatment Effect Estimation for Continuous Process Parameters in the Aluminium Casting Process
Background, Motivation, Problems, and Research Gaps:
Process parameters such as injection speed, pressure or die temperature are continuous “treatments” that influence scrap probability, cycle time and quality. Estimating their causal effects under real-world confounding is challenging but necessary for robust optimisation.
Research Questions:
- How can causal effects of continuous High-Pressure Die Casting (HPDC) process parameters on scrap be estimated under realistic confounding?
- Which estimators yield the most robust and interpretable results for practitioners?
Tasks:
- Select a concrete use case (e.g. one part type and a small set of key parameters).
- Implement several causal estimators (e.g. DML, Causal Forests, TMLE) in Python/R using state-of-the-art libraries.
- Conduct robustness checks (unobserved confounding sensitivity, domain shift) and visualise heterogeneous treatment effects.
- Translate findings into interpretable recommendations for process settings.
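To make the estimation idea concrete, the sketch below implements cross-fitted double/debiased ML (DML) for a continuous treatment directly with scikit-learn (residual-on-residual regression); in practice, libraries such as EconML or DoubleML would be used. The data, variable names, and true effect size are synthetic.

```python
# Minimal sketch of cross-fitted DML for a continuous treatment, written with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))                              # confounders (e.g. alloy, ambient temp)
injection_speed = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)               # continuous "treatment"
scrap_rate = 0.8 * injection_speed + X[:, 0] - X[:, 2] + rng.normal(size=n)  # true effect: 0.8

# Cross-fitted nuisance predictions: E[Y|X] and E[T|X].
y_hat = np.zeros(n)
t_hat = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    y_hat[test] = RandomForestRegressor(random_state=0).fit(X[train], scrap_rate[train]).predict(X[test])
    t_hat[test] = RandomForestRegressor(random_state=0).fit(X[train], injection_speed[train]).predict(X[test])

# Residual-on-residual regression gives the debiased treatment effect.
effect = LinearRegression(fit_intercept=False).fit(
    (injection_speed - t_hat).reshape(-1, 1), scrap_rate - y_hat
)
print("estimated causal effect of injection speed on scrap:", effect.coef_[0])  # close to 0.8
```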
Ideal for:
MSc/BSc SCM, DSSB, DE, IEM, CS with some causal inference, econometrics and programming skills.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
1.18 RAG-Based xLM Assistant for Aluminium Manufacturing Process and Quality Engineers
Background, Motivation, Problems, and Research Gaps:
Engineers frequently search for parameter limits, troubleshooting guides and best practices scattered across documents and systems. A domain-specific assistant using Retrieval-Augmented Generation (RAG) and an expert-tuned language model could provide consistent, context-aware answers. A focused prototype for High-Pressure Die Casting (HPDC) is still missing.
Research Questions:
- How can HPDC process knowledge, causal graphs and documentation be modelled and indexed for effective retrieval?
- How well can a RAG-based xLM answer typical engineering questions and generate consistent explanations?
Tasks:
- Collect and preprocess a small corpus of HPDC documentation, variable catalogues and (simplified) causal graph exports.
- Implement a minimal RAG pipeline (document store, retriever, LLM interface) using open-source frameworks.
- Design representative user queries and evaluate answer quality, consistency and latency.
- Suggest improvements for prompt design, retrieval strategy and safety mechanisms.
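A minimal sketch of the retrieval half of such a pipeline is shown below, using a TF-IDF retriever over an invented three-document corpus; `generate_answer` only assembles the prompt and stands in for the LLM interface, and a real prototype would use a vector store and an open-source RAG framework.

```python
# Minimal sketch of a RAG retriever with TF-IDF; the generation step is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Recommended die temperature for alloy A is 620-660 °C; deviations increase porosity.",
    "Troubleshooting cold shuts: check injection speed profile and venting.",
    "Plunger lubrication intervals and acceptable pressure drop ranges.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def generate_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # placeholder: pass `prompt` to the chosen LLM interface here

print(generate_answer("What die temperature should I use for alloy A?"))
```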
Ideal for:
MSc/BSc CS, DE, DSSB, IEM, SCM with experience in NLP, LLMs and Python.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
1.19 Causal AI for Infrastructure System Interventions with Sparse Data: Transport Modal Shift Application
Background, Motivation, Problems, and Research Gaps:
Infrastructure systems worldwide face critical challenges in planning optimal interventions under uncertainty, limited data, and complex causal interactions. Traditional policy evaluation relies on correlational analysis or expert judgment, which cannot reliably distinguish true causal drivers from spurious associations. This leads to misallocated resources, failed interventions, and inability to predict policy outcomes.
Recent advances in causal artificial intelligence, particularly causal discovery algorithms, quasi-experimental methods, and counterfactual simulation, offer promising tools to address these challenges. However, these methods are primarily developed for data-rich domains (tech, healthcare, e-commerce) with hundreds of thousands of observations. Infrastructure systems typically have sparse panel data (<200 temporal observations), high-dimensional state spaces, unmeasured confounders, and non-stationary dynamics, making direct application of existing causal AI methods problematic.
The motivation for this research is to develop, validate, and deploy causal AI frameworks specifically optimized for sparse, complex infrastructure data. We focus on three methodological innovations: (1) expert-augmented causal discovery that combines algorithmic learning with domain knowledge, (2) hybrid causal inference that ensembles multiple quasi-experimental estimators with DAG-guided covariate selection, and (3) real-time validation systems with Bayesian updating for deployed causal models.
These methods are demonstrated and validated on South Africa’s rail freight modal shift crisis, where 30 billion tonne-km of freight shifted from rail to road despite rail’s economic and environmental advantages, imposing an annual economic burden of more than R18 billion. However, the methodological contributions are generalizable to any infrastructure domain (power grids, water systems, transportation networks, telecommunications) facing intervention design under data scarcity.
Research Questions:
- How can constraint-based and score-based causal discovery algorithms be optimized for sparse panel data (<200 observations) with high dimensionality?
- What hybrid framework best combines algorithmic causal discovery with expert domain knowledge to improve DAG reliability while quantifying epistemic uncertainty?
- How sensitive are discovered causal structures to missing data, measurement error, and non-stationarity in real-world infrastructure time series?
- Can Bayesian model averaging across multiple causal discovery algorithms reduce false edge detection and improve structural stability compared to single-algorithm approaches?
Tasks:
- Benchmarking Causal Discovery Algorithms on Synthetic Data. Objective: compare six causal discovery algorithms on synthetic datasets with known ground truth.
- Expert Knowledge Elicitation for Causal Prior Construction. Objective: design a protocol to elicit causal DAGs from domain experts and compare them to data-driven discovery.
- Time-Varying Causal Structure Detection in Non-Stationary Systems. Objective: develop methods to detect when causal structure changes over time (structural breaks).
- Bayesian Model Averaging for Causal Discovery Ensembles. Objective: combine multiple causal discovery algorithms using Bayesian model averaging to improve reliability.
Ideal for:
Any student with an interest in Python/statistics, human-AI collaboration, time series analysis, and Bayesian statistics
Contact: Precious Sephooko, M.Sc. <psephooko@constructor.university>
2 Topics Related to Survey and Interview and Data Analysis
These topics involve several key steps:
- Conducting a literature review to identify essential variables and factors.
- Developing a hypothesis model to establish relationships between the identified variables.
- Collecting data through public surveys or expert interviews/surveys.
- Performing quantitative analysis of the survey/interview results using statistical methods, Multi-Criteria Decision Making (MCDM) techniques, or Large Language Models (LLMs).
These topics are well-suited for students interested in the complete quantitative research process, from hypothesis development and data collection to hypothesis testing and data analysis. Depending on the chosen data analysis method, programming skills may not be required.
2.1 Causal Analysis of Digital Product Passports on Consumer Purchasing Behavior
Background, Motivation, Problems, and Research Gaps:
Digital Product Passports (DPPs) are emerging as a vital tool for promoting transparency and sustainability in consumer products. These passports provide detailed information on a product’s origin, materials, manufacturing processes, and environmental impact, enabling consumers to make more informed purchasing decisions. While DPPs have gained attention for their potential to influence consumer behavior towards more sustainable choices, the actual causal impact of DPPs on purchasing decisions remains unclear.
Existing research primarily focuses on the descriptive and predictive aspects of DPPs without exploring the causal relationships between the availability of product information and changes in consumer behavior. This thesis aims to fill this gap by applying causal analysis to understand how DPPs affect purchasing patterns, revealing the key factors that drive consumer decision-making in the context of sustainability.
Research Questions:
- What specific information in DPPs (e.g., product origin, environmental impact) influences consumer decisions?
- What external factors influence consumer decisions?
- How does the presence of DPPs affect consumers’ willingness to pay for sustainable products?
Tasks:
- Conduct a literature review on Digital Product Passports and their role in sustainable consumer behavior.
- Identify key variables in DPPs (e.g., product lifecycle information, environmental footprint) that may influence purchasing decisions.
- Develop hypotheses.
- Collect and preprocess data through public surveys.
- Apply causal analysis methods (e.g., Structural Equation Modeling, Causal Inference) to identify and quantify the causal impact of DPPs on consumer purchasing behavior.
- Analyze the results to identify which aspects of DPPs are most influential in driving sustainable consumer choices and discuss potential implications for manufacturers and policymakers.
Ideal for:
Students interested in sustainability, consumer behavior, and data analysis. This topic is particularly suitable for those with a background in industrial engineering and management, or data science and a keen interest in applying causal analysis techniques. Basic knowledge of statistical analysis, causal inference methods, and experience with data analysis tools (e.g., SmartPLS, Python) will be advantageous.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
2.2 Assessing the Critical Factors for Digital Product Passport Adoption in the Circular Economy Using Multicriteria Decision Making (MCDM)
Background, Motivation, Problems, and Research Gaps
Digital product passports are critical for implementing circular economy practices, yet their adoption is influenced by various factors, such as technological capability, regulatory compliance, and market demand. There is a research gap in systematically evaluating these factors to guide stakeholders in making informed adoption decisions.
Research Questions
- What are the critical factors influencing the adoption of digital product passports within the circular economy?
- How can the TOPSIS method be used to rank these factors to facilitate decision-making?
Tasks
- Perform a literature review to identify factors affecting DPP adoption in the circular economy.
- Develop a TOPSIS model to rank these factors.
- Conduct expert survey/interview.
- Analyze the results to identify the most influential factors driving DPP adoption.
- Provide recommendations for policymakers and industry practitioners based on the TOPSIS ranking.
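A minimal TOPSIS sketch over invented expert scores is shown below; the criteria, weights, and scores are illustrative assumptions only, and in the thesis they would come from the expert survey/interview.

```python
# Minimal TOPSIS sketch (rows = adoption factors, columns = criteria; all numbers invented).
import numpy as np

factors = ["technological capability", "regulatory compliance", "market demand"]
# Criteria: importance, feasibility, cost (cost is a "negative" criterion).
scores = np.array([[8.0, 6.0, 5.0],
                   [9.0, 5.0, 7.0],
                   [7.0, 8.0, 4.0]])
weights = np.array([0.5, 0.3, 0.2])
benefit = np.array([True, True, False])          # higher is better, except for cost

norm = scores / np.sqrt((scores ** 2).sum(axis=0))        # vector normalisation
weighted = norm * weights
ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
anti_ideal = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))
d_pos = np.linalg.norm(weighted - ideal, axis=1)
d_neg = np.linalg.norm(weighted - anti_ideal, axis=1)
closeness = d_neg / (d_pos + d_neg)               # closeness coefficient per factor

for factor, c in sorted(zip(factors, closeness), key=lambda x: -x[1]):
    print(f"{factor}: {c:.3f}")                   # ranked list of factors
```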
Ideal for: Students interested in sustainability, decision-making models, and the implementation of digital product solutions using MCDM methods such as TOPSIS, DEMATEL, and AHP.
These topics emphasize analytical approaches using decision-making models, causal analysis, and systematic literature reviews, making them suitable for students with limited programming skills.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3 Topics Related to Mathematical Modelling, Operation Research, and Simulation
The topics in this category involve capturing real-world problems and representing them using mathematical or formal models, such as Mixed-Integer Linear Programming (MILP), Linear Programming (LP), dynamic programming, reinforcement learning, and others. This process includes simulating the models with real data and conducting sensitivity analysis. These topics are well-suited for students with a strong interest in applying mathematical modeling to address real-world challenges. Depending on the technique applied, beginner-level programming skills may be required.
3.1 Enhancing Sustainability in Multimodal City Logistics through Reinforcement Learning
Background, Motivation, Problems, and Research Gaps:
Urban areas are facing increasing challenges in managing city logistics due to rapid population growth, rising e-commerce demand, and the need for sustainable transportation. Multimodal logistics, which involves the use of various transportation modes (e.g., trucks, bicycles, electric vehicles), presents a viable solution. However, optimizing these systems for sustainability is complex due to the dynamic nature of urban environments, fluctuating demand, and the interplay between different transportation options.
Traditional logistics models often lack the adaptability required to respond to these changing conditions effectively. Reinforcement learning (RL) offers a promising approach by allowing logistics systems to learn and adapt to real-time changes, improving decision-making, efficiency, and sustainability. Despite its potential, research on applying RL specifically to the sustainability challenges in multimodal city logistics is limited, highlighting a gap in understanding how RL can be effectively used to optimize these complex systems.
Research Questions:
- How can reinforcement learning be applied to optimize multimodal city logistics for improved sustainability?
- What are the key factors that influence sustainability in multimodal logistics, and how can RL be used to address them?
- How does the performance of an RL-based logistics model compare to traditional logistics optimization methods in urban settings?
Tasks:
- Conduct a literature review on multimodal city logistics and the application of reinforcement learning in logistics and transportation systems.
- Identify key sustainability factors in city logistics, such as emissions, energy consumption, delivery times, and their interaction in a multimodal context.
- Develop a reinforcement learning model tailored to optimize decision-making in a simulated multimodal city logistics environment.
- Evaluate the model’s performance, focusing on its impact on sustainability outcomes (e.g., reducing emissions, improving delivery efficiency) compared to traditional logistics models.
- Analyze the results to identify the strengths, limitations, and potential areas for future research in applying RL to city logistics.
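As a toy illustration of the modelling task, the sketch below trains a tabular Q-learning agent on an invented mode-choice problem where rewards trade off delivery time against emissions; the environment, states, and reward numbers are placeholders for a proper simulation.

```python
# Minimal tabular Q-learning sketch for a toy multimodal mode-choice problem.
import numpy as np

rng = np.random.default_rng(0)
states = ["low_demand", "peak_demand"]
actions = ["truck", "cargo_bike", "electric_van"]
# reward[state, action] mean: higher = faster delivery with lower emissions (toy numbers)
mean_reward = np.array([[2.0, 5.0, 4.0],      # low demand: bikes/EVs shine
                        [4.0, 1.0, 3.0]])     # peak demand: trucks consolidate better

q = np.zeros((len(states), len(actions)))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

state = 0
for step in range(5000):
    action = rng.integers(len(actions)) if rng.random() < epsilon else int(q[state].argmax())
    reward = mean_reward[state, action] + rng.normal(0, 0.5)
    next_state = rng.integers(len(states))    # demand evolves randomly in this toy model
    q[state, action] += alpha * (reward + gamma * q[next_state].max() - q[state, action])
    state = next_state

for s, name in enumerate(states):
    print(name, "->", actions[int(q[s].argmax())])   # learned mode per demand state
```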
Ideal for:
Students passionate about sustainability, urban logistics, and the application of AI and machine learning. This thesis is particularly suited for those with a background in data science, machine learning, and an understanding of reinforcement learning concepts. Basic programming skills (e.g., Python) and experience with simulation tools or logistics models would be beneficial.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3.2 Optimizing Production Planning for Circular Economy Integration: A Mathematical Modeling Approach
Background, Motivation, Problems, and Research Gaps:
The circular economy is a sustainable approach that focuses on minimizing waste, promoting recycling, and extending the lifecycle of products and materials. In the manufacturing sector, integrating circular economy principles into production planning can significantly reduce resource consumption, enhance environmental sustainability, and improve cost-efficiency.
Traditional production planning models primarily focus on linear processes, from material procurement to product disposal. However, they often overlook opportunities for recycling, reuse, remanufacturing, and waste minimization. This creates a research gap in developing production plans that account for the closed-loop nature of material flows inherent in a circular economy.
This thesis aims to bridge this gap by developing a mathematical optimization model that integrates circular economy concepts into production planning. The goal is to create a model that optimizes production while incorporating aspects like material reuse, recycling, remanufacturing, and waste reduction.
Research Questions:
- How can circular economy principles be effectively integrated into production planning using mathematical optimization models?
- What are the key factors and constraints that need to be considered when developing a production plan aligned with circular economy objectives?
- How does incorporating circular economy concepts into production planning impact resource utilization, cost-efficiency, and waste reduction?
Tasks:
- Conduct a literature review on circular economy principles and existing production planning models.
- Identify key circular economy practices (e.g., recycling, reuse, remanufacturing) and incorporate them into the model’s design.
- Develop a mathematical optimization model for production planning that includes variables and constraints related to circular economy activities.
- Implement the model using optimization tools (e.g., Python with PuLP or Gurobi), and test it with different scenarios to evaluate its impact on resource efficiency and sustainability.
- Analyze the results to identify how integrating circular economy practices affects production outcomes and sustainability goals.
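A minimal sketch of such a model is given below, using PuLP to choose between virgin and recycled material under demand, recycled-supply, and emission-budget constraints; all coefficients are illustrative placeholders rather than calibrated data.

```python
# Minimal PuLP sketch of a circular-economy-aware production plan (toy coefficients).
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpStatus, value

demand = 100                      # units of product to produce
recycled_available = 60           # units of recycled feedstock on hand
cost = {"virgin": 5.0, "recycled": 3.0}
emissions = {"virgin": 2.0, "recycled": 0.5}
emission_budget = 150             # circular-economy constraint on total emissions

prob = LpProblem("circular_production_plan", LpMinimize)
use = {m: LpVariable(f"use_{m}", lowBound=0) for m in cost}

prob += lpSum(cost[m] * use[m] for m in cost)                    # minimise material cost
prob += lpSum(use[m] for m in cost) == demand                    # meet demand
prob += use["recycled"] <= recycled_available                    # recycling supply limit
prob += lpSum(emissions[m] * use[m] for m in cost) <= emission_budget

prob.solve()
print(LpStatus[prob.status], {m: value(v) for m, v in use.items()}, "cost:", value(prob.objective))
```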
Ideal for:
Students with an interest in sustainable manufacturing, production planning, and operations research. This thesis is particularly suited for those who enjoy mathematical modeling and optimization and want to explore its application in promoting circular economy practices. Basic knowledge of optimization methods and familiarity with programming tools (e.g., Python, MATLAB) will be advantageous.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3.3 Integrating Circular Economy in Production Planning: A Reinforcement Learning Approach
Background, Motivation, Problems, and Research Gaps:
The circular economy emphasizes resource efficiency, waste minimization, and the reuse and recycling of materials to create a more sustainable production process. In manufacturing, incorporating circular economy principles into production planning can significantly reduce environmental impact while improving operational efficiency. However, traditional production planning methods often focus on linear processes and lack the flexibility to adapt to dynamic, closed-loop material flows inherent in a circular economy.
Reinforcement Learning (RL) offers a promising solution to this challenge. By enabling a system to learn and adapt based on real-time feedback, RL can optimize production decisions while considering complex factors like material reuse, recycling rates, and fluctuating demand. Despite its potential, research on applying RL to integrate circular economy concepts into production planning is limited, leaving a gap in developing adaptive, data-driven planning strategies that align with sustainability goals.
Research Questions:
- How can reinforcement learning be used to integrate circular economy principles into production planning?
- What key factors (e.g., recycling rates, material availability) should be considered in an RL-based production planning model for a circular economy?
- How does the performance of an RL-based production planning approach compare to traditional planning methods in terms of sustainability and efficiency?
Tasks:
- Conduct a literature review on circular economy practices and the application of reinforcement learning in production planning.
- Identify key elements of circular economy integration, such as recycling, remanufacturing, and waste reduction, to be included in the RL model.
- Develop a reinforcement learning model for production planning that dynamically adapts to changes in material availability, demand, and recycling rates.
- Implement the model in a simulated production environment using tools like Python (e.g., TensorFlow, PyTorch) to train the RL agent (a simplified tabular sketch follows this list).
- Evaluate the model’s performance by comparing it with traditional production planning methods, focusing on sustainability outcomes (e.g., reduced waste, improved resource efficiency).
- Analyze the results to identify the strengths, limitations, and potential improvements of using RL for circular economy-driven production planning.
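As a starting point, the sketch below shows tabular Q-learning on a deliberately tiny, invented production environment; a thesis implementation would replace it with a richer simulation and, where needed, deep RL libraries such as PyTorch or TensorFlow:
```python
# Illustrative toy environment and tabular Q-learning agent; all numbers are invented.
import random

N_STOCK = 5                # discretised recycled-material stock levels 0..4
ACTIONS = [0, 1]           # 0 = produce from virgin material, 1 = produce from recycled
Q = {(s, a): 0.0 for s in range(N_STOCK) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(stock, action):
    """One production period: returns (next_stock, reward) under toy dynamics."""
    if action == 1 and stock > 0:
        reward = -7.0 - 0.5        # cheaper and less waste, consumes recycled stock
        stock -= 1
    else:
        reward = -10.0 - 2.0       # virgin material: higher cost and more waste
    stock = min(N_STOCK - 1, stock + random.choice([0, 1]))  # returned material arrives
    return stock, reward

stock = 2
for _ in range(20000):             # continuing task: one long trajectory
    a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda x: Q[(stock, x)])
    nxt, r = step(stock, a)
    Q[(stock, a)] += alpha * (r + gamma * max(Q[(nxt, x)] for x in ACTIONS) - Q[(stock, a)])
    stock = nxt

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STOCK)})  # greedy policy per stock level
```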
Ideal for:
Students interested in sustainable manufacturing, operations research, and the application of artificial intelligence in production planning. This thesis is particularly suited for those with a background in machine learning, data science, or reinforcement learning, and who are eager to explore advanced, adaptive decision-making techniques. Basic programming skills (preferably in Python) and familiarity with RL concepts and libraries (e.g., TensorFlow, PyTorch) will be advantageous.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3.4 Modelling Safety and Emission Constraints in Logistics as Constrained Markov Decision Processes (MDPs)
Background, Motivation, Problems, and Research Gaps:
Reinforcement Learning (RL) is increasingly used for routing, dispatching, and resource allocation in logistics. However, many real-world constraints must be strictly respected: temperature thresholds in cooled transport, separation of dangerous goods, emission budgets, delivery time windows, etc. These can be modelled using Constrained Markov Decision Processes (CMDPs), but there is little pedagogical work showing how typical logistics constraints can be translated into CMDPs in a clear and reusable way.
Research Questions:
- How can common logistics constraints (temperature, European Agreement concerning the International Carriage of Dangerous Goods by Road (ADR) rules, CO₂ budgets, service levels) be translated into CMDP components (state, action, constraints, penalties)?
- Which simple toy environments are suitable to illustrate the effect of different constraint formulations on RL policies?
- How can resulting CMDP models be documented in a way that logistics practitioners can understand and validate?
Tasks:
- Identify typical constraints in cooled and dangerous-goods logistics from literature and practice reports.
- Formulate 1–2 small CMDP models (state space, actions, reward, constraint functions) for selected scenarios.
- Implement these models in a simple simulation environment (e.g. gridworld, queueing model) in Python (an example formulation is sketched after this list).
- Experiment with different constraint settings (hard vs. soft, different penalty weights) and observe changes in learned policies.
- Summarise the CMDP models and results in a practitioner-oriented technical note.
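A minimal sketch of one possible CMDP-style formulation, using an invented 3×3 gridworld in which one cell represents a constraint violation (e.g. a route segment that breaks the cooling chain) and a penalty weight turns the constraint cost into a soft or near-hard restriction:
```python
# Illustrative gridworld with a separate constraint cost; all numbers are invented.
import random

ROWS, COLS = 3, 3
START, GOAL, HOT = (0, 0), (0, 2), {(0, 1)}       # shortest path crosses the "hot" cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # up, down, left, right

def step(s, a):
    s2 = (max(0, min(ROWS - 1, s[0] + a[0])), max(0, min(COLS - 1, s[1] + a[1])))
    reward = 10.0 if s2 == GOAL else -1.0          # task reward: reach the goal quickly
    cost = 1.0 if s2 in HOT else 0.0               # constraint cost (violation indicator)
    return s2, reward, cost

def train(penalty_weight, episodes=4000, alpha=0.2, gamma=0.95, eps=0.1):
    states = [(i, j) for i in range(ROWS) for j in range(COLS)]
    Q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        s = START
        while s != GOAL:
            a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, rew, cost = step(s, a)
            shaped = rew - penalty_weight * cost   # Lagrangian-style penalised reward
            Q[(s, a)] += alpha * (shaped + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
            s = s2
    return Q

for w in (0.5, 50.0):                              # soft vs. near-hard constraint
    Q = train(w)
    print(f"penalty={w}: first move from depot ->", max(ACTIONS, key=lambda a: Q[(START, a)]))
```
With a small penalty the learned policy cuts through the violating cell; with a large penalty it takes the longer, compliant detour, which is exactly the effect different constraint formulations should make visible.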
Ideal for:
BSc/MSc IEM, Math, CS, SCM, DE, DSSB students with basic probability/operations research knowledge and beginner–intermediate Python skills.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3.5 Safety-Constrained Causal Reinforcement Learning for CO₂-Efficient Fleet Dispatching
Background, Motivation, Problems, and Research Gaps:
Many fleet operators want to reduce CO₂ emissions while maintaining service levels and complying with safety and regulatory constraints. Standard RL methods can optimise dispatching or routing, but they are typically black-box, may violate constraints during training, and ignore known causal relations between actions, environment, and emissions. There is a need for RL approaches that (1) respect safety and regulation, (2) leverage causal knowledge, and (3) remain interpretable to decision-makers.
Research Questions:
- How can causal knowledge (e.g. from a causal graph relating speed, load, distance, and emissions) be integrated into RL algorithms for fleet dispatching?
- To what extent does a safety-constrained causal RL approach improve CO₂ efficiency and constraint satisfaction compared to standard RL baselines?
- How robust are these policies when the environment changes (e.g. demand peaks, new emission limits, different traffic patterns)?
Tasks:
- Review literature on CMDPs, safe RL, and causal RL, focusing on applications to operations and logistics.
- Design a simplified but realistic fleet dispatching environment (e.g. small city with depots, customers, time windows, vehicle types, emissions).
- Implement at least one safety-constrained causal RL method (e.g. constraint-aware policy optimisation with causal structure) and one standard RL baseline (a sketch showing how causal knowledge can enter the constraint signal follows this list).
- Run simulation experiments to compare performance (CO₂, cost, on-time delivery, constraint violations) and robustness under scenario variations.
- Analyse how interpretable the resulting policies are (e.g. via policy visualisations or causal attributions) and discuss deployment implications.
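One way causal knowledge could enter such a setup is sketched below: an explicit, hand-specified emission mechanism (emissions driven by distance, load, and speed) provides both the CO₂ constraint signal for a constrained learner and a simple counterfactual attribution of emissions to dispatch decisions. All coefficients and names are illustrative assumptions, not part of the topic description:
```python
# Illustrative causal emission mechanism used as a constraint signal; coefficients are invented.
from dataclasses import dataclass

@dataclass
class Trip:
    distance_km: float
    load_t: float
    speed_kmh: float

def emissions_kg(trip: Trip) -> float:
    """Assumed causal mechanism: CO2 is driven by distance, load, and speed."""
    base = 0.8 * trip.distance_km
    load_effect = 0.05 * trip.distance_km * trip.load_t
    speed_effect = 0.0004 * trip.distance_km * (trip.speed_kmh - 60) ** 2
    return base + load_effect + speed_effect

def constraint_cost(trips, co2_budget_kg=250.0):
    """Constraint signal for a CMDP-style learner: CO2 budget overshoot in kg."""
    return max(0.0, sum(emissions_kg(t) for t in trips) - co2_budget_kg)

def speed_attribution(trip: Trip) -> float:
    """Counterfactual-style attribution: CO2 saved if this trip were driven at 60 km/h."""
    return emissions_kg(trip) - emissions_kg(Trip(trip.distance_km, trip.load_t, 60.0))

plan = [Trip(120, 8, 90), Trip(80, 4, 70)]
print("total CO2:", round(sum(emissions_kg(t) for t in plan), 1),
      "| budget overshoot:", round(constraint_cost(plan), 1),
      "| saving if trip 1 slowed to 60 km/h:", round(speed_attribution(plan[0]), 1))
```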
Ideal for:
MSc/BSc CS, DSSB, DE, IEM, SCM, Math, RIS students with a strong Python and RL background; an excellent topic for a research-oriented, publishable master's thesis.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3.6 Designing Counterfactual “What-If” Scenarios for Logistics Digital Twin Replays
Background, Motivation, Problems, and Research Gaps:
Logistics companies often ask “What if we had loaded differently?” or “What if the truck had left earlier?” Digital twins that can replay historical operations and simulate alternative decisions can answer such questions. However, SMEs rarely have a catalogue of well-defined counterfactual test cases, nor clear metrics to evaluate them. There is a need for simple, understandable templates for “what-if” scenarios that can be reused in logistics decision support.
Research Questions:
- Which types of “what-if” questions are most common and relevant for logistics managers (e.g. timing changes, alternative routes, different vehicle types)?
- How can these questions be translated into reusable counterfactual test templates?
- Which metrics (CO₂, delay, utilisation, costs) best communicate the impact of counterfactual scenarios to non-experts?
Tasks:
- Gather typical decision scenarios from logistics practice (literature, interviews, or project documentation).
- Define 5–8 counterfactual test templates that describe how to change inputs or decisions in a replay (e.g. earlier departure, alternative route, consolidation rules).
- Use a simple dataset or toy simulation to illustrate each template with before/after comparisons on selected metrics (one template is illustrated in the sketch after this list).
- Sketch dashboard mock-ups that show the results in a clear and intuitive way.
- Summarise which templates are most useful for SME decision-making and why.
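A minimal sketch of one such template (“what if the tour had departed 30 minutes earlier?”) applied to an invented replay table; the column names and the simple delay/CO₂ recomputation rules are assumptions for illustration only:
```python
# Illustrative counterfactual template on a toy replay table; data and rules are invented.
import pandas as pd

tours = pd.DataFrame({
    "tour_id": [1, 2, 3],
    "departure_min": [480, 510, 540],     # minutes after midnight
    "delay_min": [12, 0, 25],
    "co2_kg": [38.0, 22.5, 41.0],
})

def what_if_earlier_departure(df, shift_min=30):
    """Template: shift departure earlier; assume delay shrinks 1:1 (floored at 0)."""
    cf = df.copy()
    cf["departure_min"] -= shift_min
    cf["delay_min"] = (cf["delay_min"] - shift_min).clip(lower=0)
    cf["co2_kg"] *= 0.97                  # assumed mild saving from less congestion
    return cf

baseline, counterfactual = tours, what_if_earlier_departure(tours)
comparison = pd.DataFrame({
    "avg_delay_before": [baseline["delay_min"].mean()],
    "avg_delay_after": [counterfactual["delay_min"].mean()],
    "co2_before": [baseline["co2_kg"].sum()],
    "co2_after": [counterfactual["co2_kg"].sum()],
})
print(comparison.round(1))
```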
Ideal for:
BSc/MSc IEM, SCM, IBA, Math students who enjoy conceptual work with light data analysis and visualisation; only basic Excel/R/Python skills required.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3.7 Stress Testing Logistics Policies with Counterfactual Digital Twins and Off-Policy Evaluation
Background, Motivation, Problems, and Research Gaps:
Once dispatching or routing policies (possibly learned by RL) are deployed, decision-makers want to know: how do these policies behave under disruptions (strikes, extreme weather, demand peaks)? Running real-world experiments is risky and expensive. A counterfactual digital twin can replay historical data under alternative policies and simulated disruptions, but there is still a lack of concrete methods and tools for systematic stress-testing in logistics.
Research Questions:
- How can counterfactual digital twins be used to stress-test logistics policies under rare but realistic disruption scenarios?
- Which off-policy evaluation methods and robustness metrics (e.g. regret, violation probability, recovery time) are most informative for logistics stakeholders?
- How transferable are stress-test insights across different logistics scenarios (small vs. medium operators, different networks)?
Tasks:
- Review methods for off-policy evaluation (OPE) and robustness analysis in RL and operations management.
- Define a set of disruption scenarios (e.g. blocked routes, capacity reduction, sudden demand spikes) for a logistics simulation environment.
- Implement a basic digital twin/simulation wrapper that can replay historical or simulated trajectories under different policies and interventions.
- Integrate one or more OPE methods (e.g. importance sampling, doubly robust estimators) to assess policy performance under stress scenarios (a basic importance-sampling estimator is sketched after this list).
- Analyse robustness across scenarios and provide design recommendations for a stress-testing dashboard aimed at logistics managers.
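For illustration, the sketch below shows ordinary (per-trajectory) importance sampling, one of the simplest OPE estimators, applied to invented logged trajectories; a thesis implementation would add weighted or doubly robust variants and real simulation data:
```python
# Illustrative off-policy evaluation via ordinary importance sampling; data are invented.
import numpy as np

# Each logged trajectory: list of (behaviour_prob, target_prob, reward) per step
trajectories = [
    [(0.8, 0.6, -1.0), (0.7, 0.9, -1.0), (0.9, 0.9, 10.0)],
    [(0.5, 0.2, -1.0), (0.6, 0.4, -2.0), (0.8, 0.7, 8.0)],
    [(0.9, 0.9, -1.0), (0.4, 0.6, -1.0), (0.7, 0.8, 9.0)],
]

def is_estimate(trajs, gamma=1.0):
    """Estimate the target policy's return as the mean of (importance weight * return)."""
    values = []
    for traj in trajs:
        weight = np.prod([pi_t / pi_b for pi_b, pi_t, _ in traj])     # likelihood ratio
        ret = sum(gamma**t * r for t, (_, _, r) in enumerate(traj))   # discounted return
        values.append(weight * ret)
    return float(np.mean(values)), float(np.std(values))              # estimate + spread

mean, spread = is_estimate(trajectories)
print(f"IS estimate of target-policy return: {mean:.2f} (std across trajectories: {spread:.2f})")
```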
Ideal for:
MSc/BSc SCM, IEM, DE, DSSB, CS students interested in simulation, RL, and decision support; basic Python and statistics skills required.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3.8 Methodology for Integrating Expert Knowledge into Causal Models for HPDC
Background, Motivation, Problems, and Research Gaps:
Combining data-driven Causal Discovery with domain-expert knowledge (rules, heuristics, physical constraints) is essential to obtain realistic and trusted models. However, practical methods for eliciting and encoding this knowledge for die-casting are not yet standardised.
Research Questions:
- Which methods are suitable to elicit and formalise expert knowledge about HPDC process–quality relationships?
- How can this knowledge be mapped to constraints, priors or structure hints for causal graphs?
- What are best-practice guidelines for hybrid (data + expert) causal models in HPDC?
Tasks:
- Review literature on knowledge elicitation, expert systems, hybrid causal models and constraint-based causal discovery.
- Design templates/workshop formats for capturing HPDC expert knowledge (causal maps, decision trees, rules).
- Propose and document a step-by-step methodology to translate this into constraints/prior knowledge for causal graphs.
- Validate the methodology with a small expert case study (interview/workshop) and reflect on strengths and limitations.
Ideal for:
MSc/BSc IEM, SCM; strong conceptual/methodological interest; only basic analytics skills.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
3.9 Causal Graph-Based “What-If” Simulation Module for HPDC Decision Support
Background, Motivation, Problems, and Research Gaps:
Decision-makers want to explore “what-if” scenarios, e.g. “What happens to scrap and cycle time if we increase die temperature by 5 °C?”. Using causal graphs and effect estimates to power an interactive simulation module would make such questions answerable, but concrete design patterns for HPDC are still lacking.
Research Questions:
- How can estimated causal graphs and treatment effects be used to compute intuitive “what-if” scenarios for HPDC process parameters?
- How should an interactive simulation interface be designed for engineers to explore trade-offs between scrap, energy and cycle time?
Tasks:
- Select a small causal model (or build a simplified one) with a few key parameters and outcomes.
- Implement a backend that propagates interventions (do-operations) through the model and computes expected outcome changes (a minimal sketch follows this list).
- Develop a simple interactive UI (e.g. web app with sliders) to explore scenarios.
- Validate plausibility with domain experts or against held-out data if available.
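A minimal sketch of such a backend, using a hand-specified linear structural causal model whose variables, coefficients, and noise terms are purely illustrative; in the thesis these would come from an estimated causal graph or expert knowledge:
```python
# Illustrative do-operation on a toy linear SCM for HPDC; all numbers are invented.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=10000, do_die_temp=None):
    """Sample the SCM; do_die_temp overrides the die-temperature mechanism (intervention)."""
    melt_temp = rng.normal(680, 5, n)                                    # exogenous
    die_temp = rng.normal(220, 8, n) if do_die_temp is None else np.full(n, do_die_temp)
    scrap_rate = (0.08 - 0.0006 * (die_temp - 220)
                  + 0.0002 * (melt_temp - 680) + rng.normal(0, 0.005, n))
    cycle_time = 60.0 + 0.05 * (die_temp - 220) + rng.normal(0, 1.0, n)  # seconds
    return scrap_rate.mean(), cycle_time.mean()

scrap_base, cycle_base = simulate()
scrap_int, cycle_int = simulate(do_die_temp=225)   # "what if die temperature were 5 C higher?"
print(f"scrap rate: {scrap_base:.3f} -> {scrap_int:.3f}, "
      f"cycle time: {cycle_base:.1f}s -> {cycle_int:.1f}s")
```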
Ideal for:
MSc/BSc CS, DSSB, DE, SCM students with good skills in Python (or a similar language), basic causal reasoning, and web/app development.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
4 Topics Related to Systematic Literature Review and Theoretical Model Development
The following topics focus on conducting systematic literature reviews, extracting key insights from research, and developing theoretical models. The literature review and information extraction will be enhanced using various AI tools for literature management and analysis. These thesis topics do not require programming skills but do require an interest in reading and analyzing scientific papers.
4.1 Design of KPI and Constraint Ontology Modules for Sustainable Urban Logistics
Background, Motivation, Problems, and Research Gaps:
Urban logistics companies (e-commerce, parcel, food delivery) must track KPIs such as delivery punctuality, vehicle utilisation, and CO₂ emissions while also complying with constraints like temperature limits and dangerous-goods (ADR) rules. In practice, these aspects are usually stored in Excel sheets, ERP/TMS systems, or even in people’s heads, making analysis and automation difficult. A logistics ontology (a formal semantic model) can provide a shared vocabulary for KPIs and constraints, but existing models are either too generic or not tailored to SME logistics. There is a gap for a compact, understandable ontology module that captures the most important KPIs and constraints for sustainable urban logistics.
Research Questions:
- Which KPIs and constraints are most critical for typical SME logistics scenarios (e.g. last-mile, cooled transport, parcel delivery)?
- How can these KPIs and constraints be modelled in a lightweight ontology that domain experts can still understand?
- Does such an ontology make querying, validation, and documentation of logistics performance easier than today’s spreadsheet-/system-specific solutions?
Tasks:
- Conduct a short literature and practice review on key logistics KPIs and operational/regulatory constraints (CO₂, punctuality, load factor, temperature, ADR).
- Select a small, representative set (e.g. 8–12) of KPIs and constraints for urban logistics scenarios.
- Model these concepts in OWL/RDFS using a tool like Protégé (classes, properties, simple constraints).
- Create small example datasets (instances) and write a few SPARQL queries answering typical management questions (a small illustrative example follows this list).
- Document the ontology in a way that logistics practitioners (non-IT experts) can follow.
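A minimal sketch of what such instances and a query could look like using rdflib in Python, with an invented example namespace and property names (the actual vocabulary would be designed in the thesis):
```python
# Illustrative KPI instances and one SPARQL query; namespace and terms are invented.
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

LOG = Namespace("http://example.org/urbanlogistics#")
g = Graph()
g.bind("log", LOG)

for tour, co2, target in [("tour1", 42.5, 40.0), ("tour2", 31.0, 40.0)]:
    t = LOG[tour]
    g.add((t, RDF.type, LOG.DeliveryTour))
    g.add((t, LOG.co2EmissionsKg, Literal(co2, datatype=XSD.decimal)))
    g.add((t, LOG.co2TargetKg, Literal(target, datatype=XSD.decimal)))

query = """
PREFIX log: <http://example.org/urbanlogistics#>
SELECT ?tour ?co2 WHERE {
  ?tour a log:DeliveryTour ;
        log:co2EmissionsKg ?co2 ;
        log:co2TargetKg ?target .
  FILTER (?co2 > ?target)
}
"""
for row in g.query(query):       # management question: which tours exceeded their CO2 target?
    print(f"{row.tour} exceeded its CO2 target with {row.co2} kg")
```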
Ideal for:
BSc Industrial Engineering and Management, MSc Supply Chain Management, and BSc International Business Administration students.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
4.2 Exploring Federated Learning for ESG (Sustainability) Data Sharing in Supply Chains
Background, Motivation, Problems, and Research Gaps
ESG assessments in supply chains often require sharing sensitive data, raising privacy concerns. Federated learning enables collaborative model-building without sharing raw data, making it an attractive solution. However, its application in ESG data sharing within supply chains has not been thoroughly explored, creating a research gap. This study aims to investigate the potential of federated learning for secure ESG data collaboration in multi-tier supply chains.
Research Questions
- How can federated learning be applied to facilitate ESG data sharing in supply chains?
- What are the benefits and challenges of using federated learning for ESG assessments in supply chains?
- How does federated learning impact data privacy and collaboration among supply chain partners?
Tasks
- Review literature on federated learning and its applications in data privacy and collaborative modeling.
- Explore case studies of supply chains that would benefit from secure ESG data sharing.
- Propose a conceptual framework for using federated learning in ESG data sharing.
- Analyze the potential challenges and benefits of implementing federated learning in a supply chain context.
Ideal for: Students interested in supply chain management, data privacy, and collaborative modeling, with a focus on conceptual understanding rather than programming.
Contact: Widyasmoro “Dyas” Priatmojo <wpriatmojo@constructor.university>
4.3 Developing a Framework for Federated Learning-Enhanced LLMs in ESG Data Assessment
Background, Motivation, Problems, and Research Gaps
Assessing ESG data accurately requires comprehensive analysis of diverse and sensitive information. LLMs have demonstrated the ability to process large volumes of textual data, but privacy concerns restrict sharing ESG data across organizations. Federated learning provides a solution by enabling collaborative model training without exposing raw data. There is a research gap in creating a standardized framework that combines federated learning and LLMs for multi-organizational ESG assessment while ensuring data privacy and quality.
Research Questions
- What are the key components of a framework that integrates federated learning and LLMs for ESG data assessment?
- How can federated learning enhance LLM-based ESG assessments in a multi-organizational environment?
- What challenges need to be addressed when developing such an integrated framework?
Tasks
- Conduct a literature review on federated learning and LLM applications in ESG data assessment.
- Identify the key components required for integrating federated learning with LLMs.
- Develop a conceptual framework for using federated learning-enhanced LLMs for ESG assessment.
- Discuss the benefits, potential challenges, and limitations of the proposed framework.
Ideal for: Students interested in data analysis, sustainability assessments, and conceptual framework development, with a focus on integrating advanced technologies without extensive programming.
Contact: Widyasmoro “Dyas” Priatmojo <wpriatmojo@constructor.university>
4.4 Developing a Conceptual Framework for Using Large Language Models (LLMs) in Sustainability/ESG Data Collection and Report Generation: A Systematic Literature Review
Background, Motivation, Problems, and Research Gaps:
The importance of sustainability and Environmental, Social, and Governance (ESG) reporting has been growing rapidly as stakeholders demand greater transparency and accountability from organizations. However, creating comprehensive and accurate ESG reports is challenging due to the vast amount of unstructured data that must be collected, processed, and analyzed. Large Language Models (LLMs) like GPT-4 have the potential to revolutionize this process by automating data collection, extracting relevant information, and generating well-structured sustainability/ESG reports.
Despite the potential of LLMs in this context, there is a lack of research exploring a standardized approach for their application in sustainability/ESG report generation. This thesis aims to bridge this gap by conducting a systematic literature review (SLR) to develop a conceptual framework for utilizing LLMs in the data collection and report generation processes for sustainability/ESG reporting.
Research Questions:
- What current methods and practices exist for using LLMs in sustainability/ESG data collection and report generation?
- What are the key challenges and limitations of applying LLMs in the context of ESG reporting?
- How can a conceptual framework guide the use of LLMs to improve the efficiency and accuracy of sustainability/ESG data collection and report generation?
Tasks:
- Conduct a systematic literature review (SLR) to gather research on the use of LLMs for sustainability/ESG data collection and report generation.
- Analyze the findings to identify key trends, methodologies, challenges, and gaps in using LLMs for ESG reporting.
- Develop a conceptual framework outlining the processes, tools, and best practices for leveraging LLMs in collecting data and generating sustainability/ESG reports.
- Validate the conceptual framework through expert feedback or illustrative case studies (if applicable).
- Provide recommendations for future research and practical implementation of LLMs in ESG reporting.
Ideal for:
Students interested in sustainability, ESG reporting, natural language processing (NLP), and the application of AI models in data analysis and report generation. This thesis is ideal for those who are curious about exploring how advanced AI techniques can enhance sustainability practices. Familiarity with LLMs (e.g., GPT-3, GPT-4), systematic literature reviews, and report generation processes will be advantageous.
Contact: Widyasmoro “Dyas” Priatmojo <wpriatmojo@constructor.university>
4.5 A Conceptual Framework for Integrating Domain Expertise with Data-Driven Methods in Generating Causal Models: A Systematic Literature Review
Background, Motivation, Problems, and Research Gaps:
Causal modeling is essential for understanding complex systems in fields like supply chain, economics, healthcare, and environmental science. Traditionally, causal models are developed using either domain expertise or data-driven methods such as statistical analysis and machine learning. While data-driven approaches can uncover patterns in large datasets, they often lack the nuanced insights provided by domain experts. Conversely, expert-driven models might overlook data complexities due to human limitations in processing vast information.
There is a growing need for frameworks that effectively integrate domain expertise with data-driven methods to generate more robust and accurate causal models. However, the research on combining these two approaches remains fragmented, with no standardized conceptual framework to guide the integration process. This thesis aims to address this gap by conducting a systematic literature review (SLR) to develop a comprehensive framework for combining domain knowledge with data-driven causal discovery techniques.
Research Questions:
- What existing methods and practices are used to integrate domain expertise with data-driven approaches for generating causal models?
- What challenges and limitations arise in combining expert knowledge with data-driven methods in causal discovery?
- How can a conceptual framework be developed to guide the effective integration of domain expertise and data-driven techniques in generating causal models?
Tasks:
- Conduct a systematic literature review (SLR) to identify studies and approaches that integrate domain expertise with data-driven methods for causal modeling.
- Analyze the literature to identify key trends, methodologies, challenges, and gaps in existing integration practices.
- Develop a conceptual framework that outlines the processes, tools, and best practices for combining domain knowledge with data-driven methods to generate causal models.
- Validate the proposed framework through feedback from domain experts or illustrative case studies (if applicable).
- Provide recommendations for future research and implementation strategies for integrating domain expertise in data-driven causal modeling.
Ideal for:
Students interested in causal inference, data science, and interdisciplinary approaches to model building. This thesis is particularly suitable for those who have a basic understanding of data-driven methods (e.g., machine learning, statistical analysis) and are keen on exploring how to enhance these methods using domain-specific expertise. Familiarity with systematic literature review methodologies and causal discovery techniques will be beneficial.
4.6 Linking Logistics Ontologies to Causal and Decision Models
Background, Motivation, Problems, and Research Gaps:
In data-driven decision support, we often need both a semantic view (what are orders, vehicles, emissions?) and a causal/decision view (what influences what, and which actions can we take?). For logistics planning and optimisation, this means connecting an ontology (formal domain model) with structural causal models (SCMs) and Markov decision processes (MDPs/CMDPs). Currently, this mapping is usually done informally and ad hoc: data scientists design their own variables, while domain experts think in processes and KPIs. There is a lack of systematic patterns and tools for mapping ontological concepts to causal graphs and decision models in logistics.
Research Questions:
- How can entities and relationships in a logistics ontology (orders, vehicles, time windows, emissions, constraints) be mapped systematically to SCM nodes/edges and MDP state–action–reward structures?
- Does ontology-guided mapping improve consistency, reusability, and explainability of causal and decision models compared to ad-hoc modelling?
- Which generic mapping patterns can be reused across different logistics scenarios (e.g. cooled transport vs. parcel delivery)?
Tasks:
- Analyse an existing logistics ontology (classes, relationships, KPIs, constraints) and identify elements relevant for causal and decision models.
- Propose a mapping framework: rules or patterns that translate ontology elements into SCM and MDP/CMDP components.
- Implement a prototype mapping layer (e.g. in Python using RDFLib or OWLready2) for one or two logistics scenarios (one mapping pattern is sketched after this list).
- Demonstrate the mapping by building at least one causal graph and one small decision model for a logistics use case.
- Evaluate the approach regarding correctness, reusability, and ease of explanation to domain experts (e.g. through expert interviews or qualitative feedback).
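A minimal sketch of one mapping pattern, with a small dictionary standing in for ontology elements that would in practice be read via RDFLib or OWLready2; the rules and names are illustrative assumptions only:
```python
# Illustrative rule-based mapping from ontology elements to SCM and MDP components.
ontology_extract = {
    "variables": {
        "DepartureTime": {"kind": "decision"},
        "Route":         {"kind": "decision"},
        "TrafficLevel":  {"kind": "exogenous"},
        "DeliveryDelay": {"kind": "outcome"},
        "CO2Emissions":  {"kind": "outcome", "constraint": "budget"},
    },
    "influences": [            # instances of an ontology object property "influences"
        ("DepartureTime", "DeliveryDelay"),
        ("TrafficLevel", "DeliveryDelay"),
        ("Route", "CO2Emissions"),
        ("Route", "DeliveryDelay"),
    ],
}

def map_to_models(onto):
    scm_edges = list(onto["influences"])    # pattern 1: "influences" relation -> causal edge
    mdp = {                                 # pattern 2: variable kinds -> MDP/CMDP roles
        "actions": [v for v, m in onto["variables"].items() if m["kind"] == "decision"],
        "state":   [v for v, m in onto["variables"].items() if m["kind"] == "exogenous"],
        "reward_terms": [v for v, m in onto["variables"].items() if m["kind"] == "outcome"],
        "constraints": [v for v, m in onto["variables"].items() if m.get("constraint")],
    }
    return scm_edges, mdp

edges, mdp = map_to_models(ontology_extract)
print("SCM edges:", edges)
print("MDP components:", mdp)
```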
Ideal for:
BSc/MSc IEM, DSSB, CS, or DE students with basic Python skills and an interest in AI, semantics/ontologies, and decision modelling.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
4.7 Process Mapping and Critical Variable Identification for HPDC Scrap Reduction
Background, Motivation, Problems, and Research Gaps:
Engineers know that scrap in HPDC is driven by subtle interactions between many process steps and parameters, but this knowledge is often implicit. A formal end-to-end process map with clearly identified critical variables and measurement points is rarely documented in a structured way.
Research Questions:
- How can the aluminium HPDC line be modelled as an end-to-end process including all relevant sub-steps and information flows?
- Which process variables are most likely to influence scrap formation from a process and quality management perspective?
- Where are the gaps between current measurement practice and the ideal set of critical variables?
Tasks:
- Conduct interviews / workshops with process engineers, quality staff and operators.
- Create detailed process flow diagrams (e.g. BPMN, VSM) for the HPDC line.
- Identify and classify critical process and quality variables (temperature, pressure, cycle time, etc.).
- Analyse gaps between existing and desired measurements and propose a prioritised list for additional sensors or measurements.
Ideal for:
BSc/MSc IEM, SCM; strong interest in process modelling and quality; no programming required.
Contact: Prof. Hendro Wicaksono <hwicaksono@constructor.university>
5 Topics Related to Software Development
The topics in this category are ideal for computer science students or those in related fields who are interested in gaining practical experience in software development and programming.