AI-Assisted Coding

Last updated on 2026-04-30

Overview

Questions

Objectives

The public release of ChatGPT in late 2022 was a watershed moment for using natural language to communicate with computers. ChatGPT and rival models such as Claude and Gemini provide human-like answers to increasingly complex prompts. These large language models (LLMs) have come to dominate conversations about the future of knowledge, art, and technology.

What is a large language model?

A full explanation of how LLMs work is beyond the scope of this lesson, but a simplified description may be useful. In short, ChatGPT, Claude, and Gemini are all transformer models. When a user submits a prompt, these models split the prompt into tokens, which are then converted into numerical representations called vectors.

  • A token is a fragment of data representing part of a word, image, or other data object
  • A vector is a numerical representation of the token

The vectorized prompt is then passed through a series of transformers, each of which examines and transfers information between elements of the prompt. As the prompt makes its way through the transformers, the model refines its interpretation, ultimately using this output to predict the tokens that make up the response.

The process of generating an output from a prompt is called inference.
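
The token-to-vector step above can be sketched in a few lines of Python. Note that this is a toy illustration: real tokenizers split text into subword fragments and use learned embeddings with thousands of dimensions, whereas the vocabulary and three-dimensional vectors below are invented for demonstration.

```python
# Invented vocabulary mapping tokens to integer ids (real LLM
# vocabularies contain tens of thousands of subword tokens).
vocab = {"write": 0, "a": 1, "python": 2, "script": 3}

# Invented 3-dimensional embedding vectors, one per token id.
embeddings = {
    0: [0.12, -0.40, 0.93],
    1: [0.05, 0.22, -0.10],
    2: [-0.77, 0.31, 0.48],
    3: [0.60, -0.15, 0.02],
}

prompt = "write a python script"
tokens = prompt.split()                       # split the prompt into tokens
token_ids = [vocab[t] for t in tokens]        # map each token to its id
vectors = [embeddings[i] for i in token_ids]  # convert ids to vectors

print(token_ids)   # [0, 1, 2, 3]
print(vectors[0])  # [0.12, -0.4, 0.93]
```

It is these vectors, not the raw text, that the transformer layers operate on.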

Challenges using LLMs

LLMs can be a useful tool but have significant limitations. Some risks associated with using LLMs include:

  • Responses may be inaccurate. LLMs famously hallucinate, that is, provide plausible but incorrect responses. Hallucinations are believed to be intrinsic to how these models are built and trained. Models can also inherit biases in their training data and do not reliably return exact text. Certain hallucinations in coding pose security risks, which we will discuss later.
  • Responses are non-deterministic. The same prompt may give different responses at different times, posing a challenge for reproducibility.
  • Information submitted to an LLM may be disclosed to other entities. Prompts may be added to the model and shared with other users. Paid accounts may protect submitted data, but LLMs can still leak information in other ways, for example, by revealing chat histories in response to an adversarial prompt. Disclosure of information to an LLM may also have intellectual property implications. As a rule, do not submit sensitive or confidential data to an LLM unless your organization has explicitly approved it for such use.
  • Over-reliance on LLMs may degrade associated cognitive skills. Using an LLM to learn a new skill may reduce independent performance and persistence.
  • Recent changes to billing practices may increase cost for complex workflows. Technology companies have subsidized the cost of using LLMs, particularly for heavy users, and a shift to usage-based billing is intended to bring user payments in line with compute costs.
  • The process of training LLMs raises serious ethical concerns. Training and using an LLM requires significant resources, and legal questions about how these models were trained persist. Large models require vast quantities of data and have frequently been found to have been trained on unlicensed, copyrighted materials. They may return copyrighted material without disclosing it.
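
The non-determinism noted above arises because models typically sample each output token from a probability distribution rather than always choosing the most likely one. A toy sketch, using an invented next-token distribution rather than a real model:

```python
import random

# Invented probability distribution over possible next tokens.
next_token_probs = {"sorted": 0.5, "sort": 0.3, "ordered": 0.2}

def sample_token(rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(next_token_probs)
    weights = list(next_token_probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Two unseeded runs may return different tokens for the same "prompt"...
print(sample_token(random.Random()))

# ...but a fixed seed makes the draw reproducible.
assert sample_token(random.Random(42)) == sample_token(random.Random(42))
```

Some LLM interfaces expose sampling controls (often called temperature or seed parameters), but even then, exact reproducibility across model versions is not guaranteed.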
Callout

Follow organizational policies

Many organizations have formal AI use policies. Familiarize yourself with expectations around AI in your organization and discipline before using it in research.

Knowing about these risks does not make you immune to them. As one example, the primary AI reporter for Ars Technica, a prominent technology blog, was recently fired for publishing hallucinated quotes generated by an AI tool despite being intimately familiar with these issues. Likewise, an ever-increasing number of lawyers are being disciplined for citing invented cases. Like journalists and lawyers, researchers are responsible for the accuracy of their work and must be vigilant in how they use AI and document its contributions to their research.

General strategies

When incorporating an LLM into your workflow:

  • Verify information provided by the LLM
  • Generate code, not information
  • Test results against validated data
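
As an example of testing against validated data, suppose an LLM generated the function below to compute a standard deviation (the function here is hypothetical, written for illustration). Checking it against a dataset with a known answer reveals that it computes the population standard deviation, while many analyses expect the sample standard deviation:

```python
import statistics

# Hypothetical LLM-generated function: plausible, but subtly
# different from what a researcher may expect.
def std_dev(values):
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

# Validated data with a hand-checked answer.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(std_dev(data))           # 2.0 (population standard deviation)
print(statistics.stdev(data))  # ~2.138 (sample standard deviation)
```

Neither value is wrong in isolation, but only testing against data you have already validated exposes the mismatch before it contaminates your results.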

Best practices for open science apply whenever you publish:

  • Retain and archive copies of your original data
  • Test code against multiple datasets, not just your exact data
  • Publish associated datasets in suitable repositories
  • Provide instructions for installation, testing, and use
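
Testing against multiple datasets can be as simple as a table of inputs and expected outputs that covers edge cases your own data may not. A minimal sketch, using a hypothetical analysis function:

```python
# Hypothetical analysis function standing in for your real code.
def count_positive(values):
    return sum(1 for v in values if v > 0)

# Datasets beyond your own: empty input, all-negative values,
# a zero at the boundary, and floating-point values.
test_cases = [
    ([], 0),
    ([-1, -2, -3], 0),
    ([-1, 0, 1, 2], 2),
    ([0.5, -0.5, 1.5], 2),
]

for data, expected in test_cases:
    assert count_positive(data) == expected, (data, expected)
print("all datasets passed")
```

In practice you would place checks like these in a test suite (for example, with pytest) so they run automatically whenever the code changes.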
Key Points