This study explores the integration of generative artificial intelligence (AI) into numerical analysis workflows in geotechnical engineering to address the challenges of generating synthetic datasets. This study aims to create a framework that allows practitioners with limited programming skills to automate complex simulations, enabling the development of extensive data sets for AI and machine learning applications.
The study proposes a seven-step methodology using the finite element method and Python programming to automate numerical modelling. Generative AI, specifically ChatGPT, is used as a virtual assistant to guide practitioners through automation. The methodology is validated through a pilot study predicting excavation-induced ground displacement in Sydney’s Hawkesbury Sandstone.
Integrating generative AI into numerical analysis workflows accelerates data generation and improves the quality of synthetic datasets. The pilot study indicates that the generated datasets closely align with real-world measurements, confirming the robustness and reliability of the proposed framework.
The study’s accuracy may be affected by assumptions in numerical analysis and input parameter quality. Future research should explore more complex geotechnical conditions, such as 3D effects, to further validate and enhance the methodology.
This framework provides an efficient solution for geotechnical practitioners to generate extensive datasets for AI training, reducing reliance on experienced programmers. It streamlines workflows and enhances data-driven decision-making in geotechnical engineering.
The paper introduces a novel integration of generative AI into numerical analysis workflows, offering an innovative approach to automate synthetic data generation. It serves as a valuable tool for advancing AI applications in geotechnical engineering, particularly for those with limited programming experience.
1. Introduction
Artificial intelligence (AI) and machine learning (ML) are increasingly being applied in geotechnical engineering to enhance decision-making, optimise designs, and improve process efficiency (Baghbani et al., 2022; Pirnia et al., 2018; Zhang et al., 2022). However, ML in this field faces challenges due to limited, high-cost data, resulting in small datasets that may be insufficient for training robust models. Additionally, data heterogeneity across sites complicates the development of standardised, reliable ML models. The use of synthetic datasets is recommended as a solution to expand datasets, standardise inputs, simulate various scenarios and build ML models that mimic the physics of geoengineering systems (Pacheco et al., 2022; Zhang et al., 2022). However, their effectiveness hinges on accurately representing real-world conditions and validating against actual data to prevent bias (Phoon and Shuku, 2024).
Numerical analysis has been used as a powerful tool for generating synthetic databases in geotechnical engineering by simulating diverse ground conditions and scenarios (Alvarado-Gutierrez et al., 2024; Liu et al., 2024; Mitelman et al., 2023; Nie et al., 2018; Papadopoulos and Benardos, 2023; Xu et al., 2022). This approach enhances ML model training and validation by providing broader, more varied data than typically available from field data alone, thereby improving model robustness and applicability. However, the simulation of numerous scenarios is time-consuming and resource-intensive, posing challenges in synthetic database generation. Automating numerical analysis can address these challenges by increasing efficiency, optimising resource usage, reducing human error, and enabling scalability, making the process more practical and cost-effective. Despite its potential, automating numerical analysis can be a challenging procedure for geotechnical practitioners due to their time constraints and limited programming skills. Generative AI (GEN AI) platforms can serve as reliable virtual assistants to help practitioners overcome these challenges (Parsa-Pajouh et al., 2024).
This study explores the potential of integrating GEN AI into geotechnical numerical analysis workflows to streamline the generation of synthetic datasets. A practical methodology is proposed where GEN AI acts as a virtual assistant, guiding geotechnical practitioners through the automation of numerical simulations and the generation of diverse synthetic data. This integration allows for the rapid production of datasets tailored to specific geotechnical problems. The proposed methodology was validated through a pilot study predicting ground displacement due to excavation into the Hawkesbury Sandstone in Sydney. The results indicate that GEN AI not only accelerates the data generation process but also enhances the overall quality and applicability of synthetic datasets in geotechnical engineering. The paper also explores the challenges associated with using GEN AI platforms.
2. Automation of numerical analysis
The automation of numerical analysis in geotechnical design involves using software and programming to streamline model creation, execution, and analysis, reducing manual effort. This automation offers significant advantages, including time and cost savings by handling repetitive tasks more efficiently, allowing engineers to focus on more technical and complex aspects of the projects. It also enhances accuracy by minimising human error and facilitates rapid scenario comparisons (Parsa-Pajouh et al., 2024). However, successful implementation requires careful validation and verification to ensure the accuracy of automated results. Engineers must have proper training and understanding of the underlying methods, as automation should complement, not replace, expert judgement. Risk management is also crucial, with attention to the reliability of tools and strategies to mitigate the risks of over-reliance on automation.
Automating numerical models typically involves using a combination of tools and software. The primary component is numerical software with an embedded scripting language, allowing users to automate tasks and customise workflows. In this study, PLAXIS 2D was used for numerical modelling, with Python as the scripting language. The PLAXIS Python API facilitated programmatic interaction with the software for automation tasks.
3. Application of generative artificial intelligence for synthetic data generation
The built-in programming languages in numerical packages, like Python, can automate various tasks in numerical modelling, such as model creation, parameter adjustment, execution of the model, and results extraction. These automation scripts can also be modified to conduct extensive parametric studies and generate large synthetic datasets for training ML models. Geotechnical practitioners with a strong understanding of geotechnical problems and access to field data can make significant contributions to this process by validating the outputs and generating more reliable synthetic datasets. However, geotechnical practitioners often struggle with time constraints and lack of programming skills to develop these automation scripts.
To address this, recent advancements in AI, particularly GEN AI, offer new learning opportunities. Research indicates that GEN AI platforms, like ChatGPT, can assist in identifying coding errors, interpreting code, and optimising execution, making them valuable tools for engineers with limited coding experience (Fatah et al., 2023). This paper outlines a workflow developed by the author that can be used by geotechnical practitioners with little to no programming experience to develop practical automation scripts and generate synthetic datasets. It demonstrates how GEN AI can act as a virtual assistant to accelerate the coding procedure and enhance the productivity of geotechnical engineers.
4. Methodology and workflow
A systematic approach is proposed for integrating GEN AI into geotechnical numerical analysis to generate synthetic datasets. The aim of the proposed methodology is to enable geotechnical practitioners with limited coding experience to develop practical scripts to automate numerical analysis, eliminating the dependency on experienced programmers. This methodology includes seven steps, which are explained in the following sections.
4.1 Step 1: define the problem and objectives
The geotechnical problem that requires synthetic datasets and the objectives should be clearly defined. In this study, the geotechnical problem is the prediction of maximum lateral ground displacement induced by excavation in the Sydney sandstone. The objective is to conduct extensive numerical parametric studies changing the excavation geometry in different classes of sandstone and generate large datasets.
4.2 Step 2: select tools for numerical analysis and automation
Reliable numerical analysis software that supports scripting or automation through a programming language should be chosen. Additionally, the chosen software and automation tools must be well-supported and dependable. In this study, the PLAXIS 2D program and its embedded programming language (i.e. Python) were used to conduct numerical analysis and develop automation scripts, respectively.
4.3 Step 3: in-house training
The in-house training is a crucial stage of this methodology and workflow, forming the core knowledge of numerical automation for geotechnical practitioners. During this stage, practitioners should gain the following knowledge and skills: basics of Python programming (Hamedani, 2020), basics of numerical automation with Python (Seequent, 2024a), knowledge of converting simple numerical commands to Python scripts (Seequent, 2024b), and the ability to run Python scripts in a numerical program’s API (Seequent, 2024c). The author recommends using GEN AI (e.g., ChatGPT) as an assistant to facilitate the learning process. ChatGPT can serve as a personalised tutor, providing real-time explanations and examples, guiding learners through Python code step-by-step, and offering interactive exercises with feedback to enhance understanding.
4.4 Step 4: development of base automation script
The following procedure is recommended for generating a script to automate the base numerical analysis for a defined geotechnical problem: first, create a simple numerical model of the geotechnical problem, run it, and extract the results; next, extract the required commands for generating the base numerical analysis and results extraction; finally, convert these commands into Python scripts and run them using the Python editor embedded in the numerical software. ChatGPT can assist in translating numerical program commands into Python equivalents, converting their syntax and structure into appropriate Python code formats. In cases where the translation is unsuccessful, engineers can refer to available Python script examples for automating numerical analysis (Bentley, 2024) or seek assistance from the technical support team of the numerical software provider to resolve the issue.
4.5 Step 5: use generative artificial intelligence as a virtual assistant to expand base automation script
After generating the base automation code in Step 4, engineers can use ChatGPT as a virtual assistant to expand the base script and enhance its functionality. The workflow for Step 5 is summarised as follows: provide the base code generated in Step 4 to the GEN AI and write a detailed prompt outlining the objective of expanding the code; run the expanded code in the Python editor; debug the code by communicating with GEN AI; and finally, validate the automation code by comparing its results with outputs obtained through the traditional methods.
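The final check in Step 5 (comparing the scripted output with the result obtained through the traditional, manual route) can be expressed as a simple tolerance test. The sketch below is illustrative only: the function name `agrees_with_manual`, the 1% tolerance, and the numerical values are assumptions made for this example and are not taken from the study.

```python
import math


def agrees_with_manual(scripted_mm: float, manual_mm: float, rel_tol: float = 0.01) -> bool:
    """Return True when the scripted result matches the manual result within rel_tol."""
    return math.isclose(scripted_mm, manual_mm, rel_tol=rel_tol)


# Hypothetical values for illustration only (not results from the pilot study).
manual_result_mm = 12.40    # read from the software output by hand
scripted_result_mm = 12.43  # returned by the automation script
check_passed = agrees_with_manual(scripted_result_mm, manual_result_mm)
```

The acceptable tolerance is a matter of engineering judgement; a tighter value may be appropriate when both results come from the same model and mesh.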
4.6 Step 6: alter automation script to generate synthetic data sets
In this step, the automation Python script developed in Step 5 should be modified with the assistance of GEN AI (ChatGPT) to conduct extensive parametric studies and generate synthetic datasets. To achieve this, a well-crafted prompt (or series of prompts) detailing the requirements of the parametric study should be provided to ChatGPT. This ensures that the generated code accurately reflects the desired outcomes and effectively automates the creation of synthetic data sets.
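A parametric study of this kind usually reduces to looping over every combination of input values and recording one output per case. The following is a minimal sketch under stated assumptions: `run_case` stands in for whatever function wraps the validated automation script, `placeholder_solver` is a made-up linear relation used only so the example runs without numerical software, and the column names are hypothetical.

```python
import csv
import io
from itertools import product
from typing import Callable, Dict, List, Sequence


def run_parametric_study(
    run_case: Callable[[str, float], float],
    rock_classes: Sequence[str],
    depths_m: Sequence[float],
) -> List[Dict[str, object]]:
    """Run one analysis per (rock class, depth) combination and collect the results."""
    rows = []
    for rock_class, depth_m in product(rock_classes, depths_m):
        rows.append({
            "rock_class": rock_class,
            "depth_m": depth_m,
            "max_disp_mm": run_case(rock_class, depth_m),
        })
    return rows


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialise the collected rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["rock_class", "depth_m", "max_disp_mm"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def placeholder_solver(rock_class: str, depth_m: float) -> float:
    """Stand-in for the numerical analysis; the linear relation is invented for the demo."""
    return 1.5 * depth_m


rows = run_parametric_study(placeholder_solver, ["Class I", "Class II"], [5.0, 10.0])
csv_text = to_csv(rows)
```

Passing the solver in as a callable keeps the sweep logic independent of the numerical package, so the same loop can be tested with a stand-in before it is connected to the real model.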
4.7 Step 7: validation of generated synthetic data sets
As the final stage, the generated synthetic datasets must be rigorously validated to ensure their reliability and accuracy. This process involves comparing the synthetic datasets with real-world data to assess how well they represent actual geotechnical conditions. Various metrics and statistical methods can be employed to quantify the degree of correlation between the synthetic and real-world data. For instance, the generated data can be evaluated against field measurements, laboratory test results, or historical data records to determine the extent of their consistency.
In addition to direct comparison with empirical data, expert judgement plays a critical role in this validation process. Experienced geotechnical practitioners should review the datasets to verify that they conform to known behaviours and expected patterns within the geotechnical domain. This expert oversight is crucial because it helps identify any anomalies, biases, or unrealistic trends in the synthetic datasets that might have arisen during the automation process.
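Part of this expert screening can be supported by simple automated checks that flag records for manual review. The sketch below assumes one illustrative rule, namely that for a single material the maximum displacement should be non-negative and should not decrease as the excavation deepens. This rule, the function name `flag_anomalies`, and the sample values are assumptions for demonstration; the actual screening criteria would be set by the reviewing practitioner.

```python
from typing import List, Sequence, Tuple


def flag_anomalies(records: Sequence[Tuple[float, float]]) -> List[float]:
    """Return the depths whose displacement is negative or lower than at the previous depth.

    records holds (depth_m, max_disp_mm) pairs for one material, in any order.
    """
    flagged = []
    previous_disp = None
    for depth_m, disp_mm in sorted(records):
        if disp_mm < 0 or (previous_disp is not None and disp_mm < previous_disp):
            flagged.append(depth_m)
        previous_disp = disp_mm
    return flagged


# Hypothetical records for illustration only; the value at 7 m breaks the assumed trend.
example_records = [(5, 6.0), (6, 7.1), (7, 6.5), (8, 9.0)]
flagged_depths = flag_anomalies(example_records)
```

A flagged record is a prompt for review rather than proof of an error, since a real change in ground behaviour could also produce it.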
The workflow of the adopted methodology described above is presented in Figure 1.
Figure 1. Adopted workflow to automate numerical analysis and generate synthetic datasets with the assistance of generative AI
5. Pilot study: prediction of excavation-induced displacement in Sydney area
5.1 Introduction
Excavations alter the stress state in the ground surrounding the excavated area, leading to excavation-induced lateral displacement. Ground movements resulting from deep basement excavations can pose significant risks to nearby infrastructure and utilities. It is essential to evaluate the displacements caused by these excavations to determine their impact on surrounding assets and to implement necessary measures to prevent adverse outcomes (Alvarado-Gutierrez et al., 2024; Hewitt and Kitson, 2022).
The impact of the locked-in in situ stress field on open excavations is well recognised in the Sydney region (McQueen, 2004; Pells, 2004). In Sydney, tectonic activity has resulted in shallow horizontal stresses that exceed vertical stress, with in situ stress typically considered using established relationships that describe it as a function of depth (Bertuzzi, 2014; Chan et al., 2005; Hewitt and Kitson, 2022; Oliveira and Parker, 2014; Pells, 2004). Numerical analysis has been widely used as a reliable tool to predict maximum lateral excavation-induced displacement in the Sydney region (Alvarado-Gutierrez et al., 2024; Egan et al., 2023; Hewitt and Kitson, 2022; Oliveira and Chan, 2017).
The estimation of maximum lateral displacement due to excavation can be influenced by factors such as excavation depth and rock strength parameters, including elastic modulus and in situ locked-in horizontal stress. Machine learning (ML) models can be employed to predict maximum lateral excavation-induced displacement by considering a variety of factors and parameters. However, these ML models require training on large and comprehensive data sets, which is challenging due to the limited published data available, as presented in Figures 2 and 3 (Hewitt and Kitson, 2022; Oliveira and Wong, 2012).
The generation of synthetic datasets using numerical models could be a potential solution to this challenge. However, conducting numerical analysis on this scale is inefficient due to the time, skilled resources, and cost requirements involved. Automation can overcome these limitations, making numerical analysis a highly efficient solution for generating synthetic data sets. This pilot study presents the procedure adopted to automate the numerical analysis of excavations with the assistance of GEN AI, estimating the maximum lateral displacement by conducting a comprehensive parameter study.
5.2 Assumptions and considerations for numerical analysis
The pilot study was conducted based on several assumptions and considerations: the two-dimensional finite element (2DFE) method was used to simulate the excavation in rock and estimate the maximum lateral ground displacement due to the excavation; the major horizontal stress was used to simulate the locked-in in situ stress; it was assumed that the excavation is dry; the rock was modelled as a continuum medium without bedding planes or discontinuities; and the Mohr–Coulomb model was used to simulate the sandstone behaviour, adopting parameters in Table 1 proposed by Bertuzzi (2014) based on Pells’ classification system (Pells et al., 2019).
Table 1. Adopted parameters to simulate the sandstone behaviour in numerical analysis
| Material | γ (kN/m³) | C (kPa) | φ (°) | σt (kPa) | E (MPa) | υ | ε (%) | ko |
|---|---|---|---|---|---|---|---|---|
| Class I sandstone | 24 | 1,000 | 55 | 300 | 3,000 | 0.25 | 0.05 | 4.7 |
| Class II sandstone | 24 | 500 | 50 | 100 | 2,000 | 0.25 | 0.06 | 3.8 |
| Class III sandstone | 24 | 300 | 50 | 40 | 1,000 | 0.25 | 0.07 | 2.4 |
| Class IV sandstone | 24 | 200 | 40 | 10 | 500 | 0.3 | 0.08 | 1.5 |
Equations (1)–(4) proposed by Oliveira and Parker (2014) were used to generate the major horizontal stress (σH) in the numerical model and simulate the in situ locked-in stresses of Sydney Sandstone. The volumetric strain and the ratio of horizontal to vertical stress (ko) were used in the numerical analysis to simulate the constant value and the second component of the in situ locked-in stress, respectively. The adopted volumetric strains and ko values in the numerical analysis are presented in Table 1.
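For scripting purposes, the parameters in Table 1 can be held in a small data structure so that each analysis reads its inputs from a single source. The sketch below is one possible layout; the class name `SandstoneParameters`, the field names, and the dictionary `SANDSTONE` are hypothetical, while the numerical values are those listed in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandstoneParameters:
    """Mohr-Coulomb inputs for one sandstone class (values from Table 1)."""
    unit_weight_kn_m3: float
    cohesion_kpa: float
    friction_angle_deg: float
    tensile_strength_kpa: float
    youngs_modulus_mpa: float
    poisson_ratio: float
    volumetric_strain_pct: float
    k0: float


# One entry per row of Table 1, keyed by sandstone class.
SANDSTONE = {
    "Class I": SandstoneParameters(24, 1000, 55, 300, 3000, 0.25, 0.05, 4.7),
    "Class II": SandstoneParameters(24, 500, 50, 100, 2000, 0.25, 0.06, 3.8),
    "Class III": SandstoneParameters(24, 300, 50, 40, 1000, 0.25, 0.07, 2.4),
    "Class IV": SandstoneParameters(24, 200, 40, 10, 500, 0.3, 0.08, 1.5),
}

class_iii_modulus_mpa = SANDSTONE["Class III"].youngs_modulus_mpa
```

Carrying the unit in each field name is a deliberate choice here, since kPa and MPa are mixed within the same table.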
5.3 Automation procedure with assistance of generative artificial intelligence
The proposed workflow in Section 4 and Figure 1 was used to automate the numerical analysis explained in Section 5.2. The PLAXIS 2D program and the Python programming language were used to carry out the 2DFE analysis and develop scripts for automation of the numerical analysis, respectively. The base numerical model for the excavation was developed based on the assumptions detailed in Section 5.2. Then, the PLAXIS 2D commands for generating the base numerical model of an excavation with a specific depth were extracted and converted to Python scripts using available resources and the assistance of ChatGPT as the GEN AI platform. The output of the Python script is the maximum lateral ground displacement due to the excavation in sandstone. The generated Python script for the base numerical model was then validated by comparing its outputs against the results extracted from the numerical model using the traditional methods.
5.4 Generation of synthetic data sets
The validated base Python script was modified with the assistance of GEN AI to automate the numerical analysis and conduct the parametric studies. A detailed prompt was entered into ChatGPT to expand the code to repeat the numerical analysis for different excavation depths, ranging from 5 m to 25 m in 1 m intervals for all types of rock presented in Table 1. Finally, the altered Python script was executed, and 80 numerical analyses were completed in less than 30 min. The maximum excavation-induced lateral displacements generated by the automation Python script were plotted against excavation depth in Figure 4.
Figure 4. Maximum excavation-induced lateral displacements generated by the automated Python script
5.5 Validation of generated synthetic data sets
To ensure the reliability and robustness of the generated synthetic data sets, they were validated by comparing them with real-world measurements. The available field measurements of excavations in Sydney sandstone are presented in Figure 2 (Hewitt and Kitson, 2022) and Figure 3 (Oliveira and Wong, 2012). The synthetic data sets generated in this study, as shown in Figure 4, indicate that the maximum lateral displacements range from 1.1 mm to 2.2 mm per metre of excavation depth. This range aligns well with the field data reported in Figures 2 and 3, suggesting a good level of accuracy in the synthetic data set generation process.
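The comparison above rests on normalising each maximum displacement by its excavation depth (mm per metre). A minimal sketch of that arithmetic follows; the function names and the sample inputs are assumptions for illustration, and only the 1.1 to 2.2 mm/m bounds are taken from the text.

```python
def normalised_displacement(max_disp_mm: float, depth_m: float) -> float:
    """Return the maximum lateral displacement per metre of excavation depth (mm/m)."""
    if depth_m <= 0:
        raise ValueError("Excavation depth must be positive.")
    return max_disp_mm / depth_m


def within_reported_range(ratio_mm_per_m: float, lower: float = 1.1, upper: float = 2.2) -> bool:
    """Check a normalised displacement against the range reported for the synthetic data."""
    return lower <= ratio_mm_per_m <= upper


# Hypothetical case for illustration only: 30 mm of movement for a 20 m deep excavation.
ratio = normalised_displacement(30.0, 20.0)
in_range = within_reported_range(ratio)
```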
It should be noted that this pilot study was conducted based on the assumptions outlined in Section 5.2 to demonstrate the capabilities of GEN AI as a virtual assistant for geotechnical practitioners. The accuracy of the results can be further enhanced by considering more realistic geotechnical conditions, such as 3D effects and the inclusion of minor horizontal stress in the model.
6. Discussion
The integration of GEN AI into geotechnical engineering workflows has demonstrated a strong capability to overcome challenges in synthetic data generation. This study has shown how GEN AI can serve as a virtual assistant, enabling practitioners with limited programming skills to automate complex numerical analyses efficiently. By implementing the proposed methodology, repetitive and complex tasks are automated, accelerating the creation of diverse synthetic datasets and allowing engineers to focus on higher-level analyses and decision-making. The pilot study on predicting ground displacement in Sydney sandstone validated the effectiveness of this approach, with the generated synthetic datasets showing good agreement with available field data. This methodology not only speeds up data generation but also enhances the accuracy and reliability of datasets for AI and ML training, reducing dependency on experienced programmers and opening new avenues for geotechnical practitioners to engage in data-driven model development.
This study also acknowledges certain limitations and challenges in integrating GEN AI into geotechnical engineering workflows, which are discussed below.
6.1 Inherent uncertainties and variabilities in geotechnical parameters
Uncertainties and variabilities in geotechnical parameters within the proposed workflow are addressed through extensive parametric studies, systematically varying key parameters within realistic ranges to ensure comprehensive coverage of potential conditions. The generated datasets undergo rigorous validation against field data, laboratory results, and historical records, with expert oversight ensuring consistency with geotechnical principles and expected behaviours.
Established numerical models, such as the Mohr–Coulomb model, provide a robust foundation for simulations, while GEN AI enhances automation scripts and identifies potential errors. The iterative process enables continuous refinement based on validation outcomes, accommodating variability in input parameters and adapting models to new insights.
While the pilot study results align well with field measurements, further refinements are recommended to address more complex conditions, such as 3D effects, for improved accuracy and robustness.
6.2 Rigorous validation and expert oversight to avoid over-reliance on generative artificial intelligence
Rigorous validation and expert oversight are essential in the proposed workflow to ensure that the generated scripts adhere to engineering principles, accurately reflect realistic geotechnical behaviours, and mitigate potential biases from numerical analysis assumptions. Real-world data validation, iterative model refinement, and comprehensive parametric studies strengthen the robustness, accuracy, and applicability of synthetic datasets to diverse geotechnical conditions.
Expert judgement plays a critical role in reviewing outputs, verifying alignment with realistic behaviours, and refining AI-generated scripts. The pilot study uses established numerical models, such as Mohr–Coulomb, as a reliable foundation, with transparent documentation of assumptions to facilitate continuous improvement. AI-generated scripts are manually validated, debugged iteratively, and cross-referenced with numerical software documentation and domain expertise to ensure reliability.
Scripts are developed incrementally, with each segment carefully reviewed for accuracy before expansion. Outputs are cross-checked using trusted tools like PLAXIS 2D, while transparency in the scripting process creates an audit trail to address any inaccuracies. This ensures the generated code remains robust, reliable, and suitable for geotechnical applications.
Balancing automation with expert validation is critical to maximising the benefits of GEN AI while minimising the risks of over-reliance on automated processes.
6.3 Scalability and adaptability
The proposed methodology demonstrates scalability and adaptability for larger and more complex geotechnical projects, including 3D modelling. Tools such as PLAXIS 3D, Python APIs, and GEN AI platforms like ChatGPT are leveraged to refine and expand automation scripts, accommodating added complexity. Comprehensive parametric studies ensure that synthetic data sets represent a wide range of geotechnical conditions, while scalable automation scripts facilitate adaptation to diverse scenarios.
The modular and stepwise workflow supports incremental scaling, minimising risks and errors. Computational demands are addressed through cloud-based platforms or high-performance computing resources. Iterative validation against real-world data, combined with practitioner input, ensures accuracy and applicability beyond the pilot study, enabling customisation to specific geotechnical environments.
While the pilot study validates the approach for 2D scenarios, scaling to 3D presents challenges such as increased data requirements, longer computation times, and higher resource demands. Addressing these challenges involves efficient automation, robust data management strategies, and further testing on 3D projects to refine the methodology, enhance computational efficiency, and ensure robustness for broader engineering and ML applications. Tailored parametric studies, supported by advanced validation using multi-site field data and laboratory experiments, ensure the methodology’s reliability and scalability for diverse geotechnical applications.
The pilot study was conducted based on simplified assumptions, such as dry excavation and modelling Sydney’s Hawkesbury sandstone as a continuum without considering bedding planes or discontinuities. These assumptions leveraged the sandstone’s well-documented properties and alignment with field data to demonstrate feasibility. While these simplifications were sufficient for the pilot study, the methodology is designed to accommodate more complex and heterogeneous geological settings. Features such as bedding planes, discontinuities, variable material properties, and 3D modelling can be incorporated to capture three-dimensional effects and extend the methodology’s applicability to more intricate geotechnical conditions.
6.4 Suitability for machine learning models
To ensure that synthetic datasets are suitable for training ML models, a diverse parametric study can be conducted using the proposed methodology to capture a wide range of geotechnical scenarios, generating large-scale data sets through automation to reduce overfitting risks, and validating outputs against real-world data to ensure accuracy and relevance. Domain experts review the datasets to align them with expected geotechnical behaviours, while iterative refinement improves robustness based on validation feedback.
6.5 Quantitative metrics
While the paper validates synthetic data primarily through visual alignment with field measurements, it recognises the importance of incorporating quantitative metrics for rigorous assessment. Although specific measures like root mean square error (RMSE) or R-squared are not explicitly used, these metrics could enhance validation by quantifying the accuracy and variance explained by the synthetic data. Including these metrics in future work would provide objective, quantifiable evidence of accuracy, addressing current limitations and strengthening the methodology’s robustness and applicability.
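For reference, both metrics can be computed in a few lines once paired field and synthetic values are available. The sketch below uses the standard definitions of RMSE and the coefficient of determination; the function names and the sample values are assumptions for illustration and do not come from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("Inputs must be non-empty and of equal length.")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("Inputs must be non-empty and of equal length.")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observed values are equal.")
    return 1 - ss_res / ss_tot


# Hypothetical displacements in mm, for illustration only.
field_mm = [10.0, 20.0, 30.0]
synthetic_mm = [12.0, 18.0, 30.0]
error_mm = rmse(field_mm, synthetic_mm)
fit = r_squared(field_mm, synthetic_mm)
```

RMSE is reported in the same unit as the displacements, which makes it easy to judge against measurement accuracy, whereas R-squared is dimensionless.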
6.6 Accessibility for practitioners with limited programming experience
The paper emphasises accessibility for practitioners with limited programming experience by leveraging GEN AI tools like ChatGPT for real-time guidance in script generation, debugging, and optimisation, reducing technical barriers. However, practitioners must undergo foundational training in Python, numerical analysis, and automation to build baseline proficiency. While the framework simplifies the process with structured workflows and modular steps, troubleshooting complex errors or adapting scripts may still require external support from experienced programmers or numerical software providers. The scalability of the framework allows users to progress from simple to complex tasks gradually, but the need for iterative refinement and occasional challenges with AI-generated suggestions highlights the importance of supplemental resources. Future refinements, such as pre-built templates or interactive tutorials, could further enhance accessibility, but initial support remains critical for non-programmers to effectively use the framework.
7. Conclusion
This study presents a methodology that leverages generative AI to aid geotechnical practitioners in automating numerical analysis and generating synthetic datasets. The workflow is designed to empower those with limited programming skills to automate complex simulations efficiently. The pilot study on excavation-induced displacements in Sydney sandstone demonstrated the effectiveness of this approach, showing good agreement between the generated synthetic datasets and field measurements. This confirms that the integration of generative AI into numerical analysis workflows in geotechnical engineering can be a powerful tool for accelerating data generation, improving efficiency, and minimising errors.
The study highlights the methodology’s adaptability for complex geotechnical scenarios, including 3D modelling, and its scalability through advanced parametric studies and automation tools. It also emphasises the importance of rigorous validation and expert oversight to ensure the reliability of AI-generated scripts and data sets. While the current study addresses inherent uncertainties and parameter variabilities through extensive validation and refinement, future research should explore the integration of additional complexity, such as 3D effects and heterogeneous geological conditions, to further enhance applicability and robustness.
By combining automation with expert validation, this methodology opens new opportunities for geotechnical practitioners to engage in data-driven modelling, bridging the gap between traditional workflows and AI-powered engineering applications.
The author sincerely thanks JK Geotechnics for their continued support and encouragement throughout this study. Their commitment to innovation and advancement in geotechnical engineering has been instrumental in the successful development and application of this framework.





