Expert Coaches Examine ChatGPT’s Ability To Create Running Training Plans

Can we replace expert running coaches with technology?

Last Updated: Mar 27, 2024 3:16 pm

Ben Gibbons

Qualified Personal Trainer, Sports Massage Therapist, UESCA Running Coach

Reviewed by Katelyn Tocci

Katelyn Tocci is our Head Coach and Training Editor; 100-mile ultrarunner, RRCA + UESCA Certified Running Coach

ChatGPT, the artificial intelligence (AI) marvel that has swept the world in the past year, has emerged as a potential tool for runners seeking to optimize their training plans. Can this digital coach go the extra mile, or will it just hit the wall?

The growing potential of AI goes beyond just having interesting conversations; researchers have already been studying how well ChatGPT can respond to medical inquiries.

Studies¹ examining ChatGPT’s performance in healthcare settings show that it can provide responses of similar quality to those given by medical professionals, although there is still room for improvement.

Similarly, in running, novice athletes without professional coaching may turn to ChatGPT for guidance on training plans.

With its ability to process vast amounts of data, ChatGPT holds immense potential to revolutionize the creation of training plans for runners.

However, while ChatGPT draws on both academic and non-academic sources, it does not reliably distinguish strong evidence from weak evidence, so it may incorporate misconceptions from unreliable sources.

For example, in a training program for runners, ChatGPT regularly emphasizes stretching. However, as we covered recently, the scientific evidence for stretching and its effects on performance and injury is poor.

So, the critical question looms: How reliable is ChatGPT in generating training plans, and can they be improved based on additional inputs?

Against this backdrop of technological promise, a new study published in the Journal of Sports Science and Medicine² seeks to unravel the mysteries surrounding ChatGPT-generated training plans.

The objective of this study was clear: to evaluate the quality of ChatGPT-generated six-week training plans for runners, and how that quality changes with the level of detail in the input provided.

In this study, three distinct training plans were generated by ChatGPT.

Drawing upon 22 quality criteria derived from existing literature on training plan design for novice runners, coaching experts were tasked with rating the quality of these plans on a 1-5 Likert scale.

A 1-5 Likert scale is a commonly used psychometric scale in which respondents are asked to indicate their level of agreement or disagreement with a statement by selecting a number on the scale.

In this study, 1 represents “Very Poor,” 2 represents “Poor,” 3 represents “Average,” 4 represents “Good,” and 5 represents “Very Good.”

These criteria covered a range of primary and secondary considerations, such as:

Screening for individuals at increased risk for adverse exercise-related events.
Definition of a clear goal for the training plan.
Use of reliable and valid testing procedures to assess initial performance status and define training variables.
Implementation of a monitoring strategy, including internal load, external load, and contextual factors.
Definition of training type and specific variables like frequency, intensity, and volume.
Consideration of strategic variation in volume, intensity, and frequency (periodization).
Progression of training over time.
Nutritional aspects, including carbohydrate intake and hydration.
Incorporation of recovery procedures, such as sleep.
Attention to psychological skills, like motivation and fatigue management.
Consideration of skill acquisition aspects, such as running technique.

The initial prompts for ChatGPT to generate each training plan looked something like this:

“Please provide a running training plan for the next six weeks.”
“I’m a 20-year-old male running twice a week, 8 kilometers each session, aiming to improve performance using a smartwatch. Can you design a 6-week training plan?”
“I’m a 20-year-old male who has been running twice a week for a year, 8 kilometers each run, with a mean heart rate of 155-170 bpm. My goal is to boost performance by 3-5% in 6 weeks. I exclusively engage in long runs, have no health issues, and have access to a breathing gas analyzer and treadmill. Can you create a 6-week training plan?”
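
The three prompts above differ only in how much detail the runner shares. As a rough sketch of that idea, the Python snippet below assembles a prompt from whatever details are available. The `RunnerProfile` class, its fields, and the `build_prompt` function are hypothetical names made up for illustration; they are not part of the study or of any ChatGPT interface, and the wording of the resulting prompt is only an approximation of the examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunnerProfile:
    """Hypothetical container for details a runner might choose to share."""
    age: Optional[int] = None
    sex: Optional[str] = None
    runs_per_week: Optional[int] = None
    km_per_run: Optional[float] = None
    goal: Optional[str] = None
    equipment: Optional[str] = None


def build_prompt(profile: RunnerProfile, weeks: int = 6) -> str:
    """Build a plain-text prompt; more filled-in fields mean a more detailed prompt."""
    details = []
    if profile.age is not None and profile.sex:
        details.append(f"I'm a {profile.age}-year-old {profile.sex}.")
    if profile.runs_per_week is not None and profile.km_per_run is not None:
        details.append(
            f"I run {profile.runs_per_week} times a week, "
            f"{profile.km_per_run:g} kilometers each session."
        )
    if profile.goal:
        details.append(f"My goal is to {profile.goal}.")
    if profile.equipment:
        details.append(f"I have access to {profile.equipment}.")
    request = f"Please provide a running training plan for the next {weeks} weeks."
    return " ".join(details + [request])


# No details at all: comparable to the first, bare prompt.
minimal_prompt = build_prompt(RunnerProfile())

# Some details: comparable in spirit to the more detailed prompts.
detailed_prompt = build_prompt(
    RunnerProfile(age=20, sex="male", runs_per_week=2, km_per_run=8,
                  goal="improve performance", equipment="a smartwatch")
)
```

Leaving every field empty reproduces the bare request, while each extra field adds one sentence of context, which mirrors the way the study increased input detail from one plan to the next.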

For each training plan, the median Likert Scale ratings were analyzed to determine the overall quality assessment.

The assessment was made by ten coaching experts, each with at least a master’s degree in sports science and five years of coaching experience.
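
To make the idea of a median rating concrete, here is a minimal Python sketch. The ratings below are invented purely for illustration and are not data from the study; the only details taken from the article are the 1-5 scale and the panel of ten raters. The median is the middle value once the ratings are sorted, which makes it a common summary for ordered categories such as “Poor” or “Good.”

```python
from statistics import median

# Hypothetical ratings (1 = Very Poor ... 5 = Very Good) from ten raters.
# These numbers are made up for illustration and are NOT the study's data.
ratings_by_plan = {
    "Plan A": [1, 2, 2, 2, 3, 2, 1, 2, 3, 2],
    "Plan B": [4, 4, 3, 5, 4, 4, 3, 4, 5, 4],
}


def median_rating(ratings):
    """Return the median of a list of 1-5 Likert ratings."""
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("Likert ratings must be between 1 and 5")
    return median(ratings)


medians = {plan: median_rating(r) for plan, r in ratings_by_plan.items()}

for plan, value in medians.items():
    print(plan, value)
```

With these made-up numbers, a single unusually high or low rating barely moves the result, which is one reason a median can be preferable to an average for this kind of scale.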

Training Plan 1 received the lowest median rating, with a score of 2, indicating suboptimal quality.

Training Plan 2 received a median rating of 3, indicating moderate quality. It was rated lower than Training Plan 3 for several criteria but showed improvement compared to Training Plan 1.

Training Plan 3 received the highest median rating, with a score of 4, indicating relatively higher quality compared to the other plans. It outperformed Training Plan 1 significantly and showed improvement over Training Plan 2 in certain aspects.

As the input information became more detailed, the quality of the training plans increased, with plan 3 outshining its predecessors in several key criteria.

However, despite the incremental improvement observed with additional input information, the overarching verdict remains clear: ChatGPT-generated training plans, while promising, fall short of optimal quality standards as assessed by coaching experts.

The disparity in ratings underscores the importance of understanding the intricacies of programming distance running training and highlights the pitfalls of relying solely on AI-generated plans without expert guidance.

Thus, the authors urged caution against using ChatGPT-generated training plans exclusively and advocated for the invaluable oversight of expert coaches in guiding runners.

ChatGPT’s Shortcomings

Although training plan 3 received the most favorable reviews, several criteria still received a rating lower than 3 (average), pointing to significant areas for improvement.

Areas for improvement in training plan 3 included health screening, testing procedures, monitoring of contextual factors, progression of training frequency, and training of psychological skills and skill acquisition.

Additionally, there were suggestions for improvement in the progression of volume.

#1: The Human Side Of Coaching

While AI chatbots can provide valuable information and guidance for various aspects of running, the fundamental elements of the human-to-human relationship in coaching cannot be replicated or replaced. At least not yet.

One crucial aspect is the emotional support and understanding that a human coach offers.

For instance, when a runner faces a debilitating injury, it’s not just about the physical rehabilitation but also the mental and emotional strain that goes with it.

A coach can provide empathy, encouragement, and personalized advice to help the runner get through this challenging time.

Establishing a rapport with a runner by offering emotional support and encouragement throughout their training journey cannot currently be replicated by an AI.

A coach can hold you accountable for your goals, celebrate your achievements, and provide constructive feedback. This personalized attention helps to build trust and commitment, fostering a strong coach-runner relationship based on mutual respect and collaboration.

Knowing that someone is invested in your progress and success can be a powerful motivator.

#2: Lack Of Personalization

ChatGPT currently lacks individualization in training prescriptions beyond user-provided information.

How we run matters, and a coach may deem it risky to increase volume before they have worked with you to refine certain running techniques. ChatGPT can’t help there.

However, with the increasing availability of wearable technologies collecting physiological data, such as smart insoles, integrating such information into ChatGPT algorithms could lead to more personalized and effective training plans.

#3: Health Screening And Testing

The plan lacked comprehensive health screening protocols, which are crucial for identifying individuals at increased risk for exercise-related adverse events such as cardiovascular or pulmonary issues.

Usually, a human coach would conduct thorough assessments and screenings to ensure the safety of the runner, taking into account various health factors that may affect their training.

The testing procedures in the plan were also not clearly defined or comprehensive enough.

A human coach would typically employ reliable and valid testing protocols to assess the runner’s initial performance status and determine individual training variables accurately.

These tests might include assessments of aerobic capacity, lactate threshold, or biomechanical analyses to tailor the training plan effectively.

Is ChatGPT The Future Of Running Training Plans?

In the ever-evolving landscape of running technology, ChatGPT emerges as a formidable contender, offering a glimpse into the future of personalized training guidance.

Yet, as the study underscores, the path to excellence is fraught with challenges, uncertainties, and, frankly, poor advice.

While ChatGPT-generated training plans show promise, they fall short of optimal quality standards as assessed by coaching experts.

Aspiring runners are urged to tread cautiously, recognizing the invaluable role of expert coaches in navigating the complexities of training optimization.

As the journey unfolds, one thing remains clear: the intersection of technology and athleticism holds boundless potential, awaiting exploration and innovation.

While AI holds immense potential to revolutionize the coaching landscape, its limitations must be acknowledged and addressed. Informed decision-making, grounded in empirical evidence and expert insight, remains paramount.