LLMs: An (im)perfectly human approach to secure coding?


A version of this article appeared in Dark Reading. It has been updated and syndicated here.

From the first rumbles of hype for the latest, culture-shattering AI tools, developers and the coding-curious alike have been using them to generate code at the touch of a button. Security experts quickly pointed out that, in many cases, the code being produced was poor-quality and vulnerable and, in the hands of those with little security awareness, could unleash an avalanche of insecure apps and websites on unsuspecting consumers.

And then there are those who have enough security knowledge to use it for, well, evil. For every mind-blowing AI feat, it seems there is a counter-punch of the same technology being used for nefarious purposes. Phishing, deepfake scam videos, malware creation, general script kiddie shenanigans… these disruptive activities are now achievable much faster, with lower barriers to entry.

There is certainly a lot of clickbait touting this tooling as revolutionary, or at least as coming out on top when matched against “average” human skill. While it looks inevitable that LLM-style AI technology will change the way we approach many aspects of work - not just software development - we must take a step back and consider the risks beyond the headlines.

And as a coding companion, its flaws are perhaps its most “human” attribute.

Poor coding patterns dominate its go-to solutions

With ChatGPT trained on decades of existing code and knowledge bases, it's no surprise that, for all its marvel and mystery, it suffers from the same common pitfalls people face when navigating code. Poor coding patterns are the go-to, and it still takes a security-aware driver to generate secure coding examples by asking the right questions and applying the right prompt engineering.

Even then, there is no guarantee that the code snippets given are accurate and functional from a security perspective; the technology is prone to hallucination, even making up non-existent libraries when asked to perform some specific JSON operations, as discovered by Mike Shema. This could lead to “hallucination squatting” by threat actors, who would be all too happy to spin up some malware disguised as the fabricated library recommended with full confidence by ChatGPT.
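One way to blunt the risk of installing a fabricated or squatted library is to treat every AI-suggested dependency as unvetted until it appears on a list your team has reviewed. The following is a minimal sketch of that idea in Python, not a complete supply-chain control. The allowlist contents and the name `json-fastpath-utils` are illustrative assumptions, and the name normalization follows the rule described in PEP 503.

```python
import re

# Illustrative allowlist (an assumption): packages your team has already vetted.
APPROVED_PACKAGES = {"requests", "orjson", "typing-extensions"}


def normalize(name: str) -> str:
    """Normalize a package name per PEP 503: lowercase, runs of -, _ and . become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted_packages(suggested, approved=APPROVED_PACKAGES):
    """Return the suggested package names that are not on the approved list."""
    approved_normalized = {normalize(name) for name in approved}
    return [name for name in suggested if normalize(name) not in approved_normalized]


if __name__ == "__main__":
    # "json-fastpath-utils" is a made-up name standing in for a hallucinated library.
    suggestions = ["requests", "ORJSON", "json-fastpath-utils"]
    for name in unvetted_packages(suggestions):
        print(f"Review before installing: {name}")
```

A check like this only flags names for human review. It does not prove that an approved package is safe, and it does not replace pinning versions or verifying what a registry entry actually contains.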

Ultimately, we have to face the reality that, in general, we have not expected developers to be sufficiently security-aware, nor have we as an industry adequately prepared them to write secure code as a default state. This will be evident in the enormous amount of training data fed into ChatGPT, and we can expect similar lackluster security results from its output, at least initially. Developers would have to be able to identify the security bugs, and either fix them themselves or design better prompts for a more robust outcome.

The first large-scale user study examining how users interact with an AI coding assistant to solve a variety of security-related tasks - conducted by researchers at Stanford University - supports this notion, with one observation concluding:

“We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet also more likely to rate their insecure answers as secure compared to those in our control group.”

This points to a default level of trust in AI coding tools, an assumption that the code they produce is inherently secure, when in fact it is not.

Between this and the inevitable AI-borne threats that will permeate our future, now more than ever, developers must hone their security skills and raise the bar for code quality no matter its origin.

The road to a data breach disaster is paved with good intentions

It should come as no surprise that AI coding companions are popular, especially as developers are faced with increasing responsibility, tighter deadlines, and the ambitions of a company’s innovation resting on their shoulders. However, even with the best intentions, a lack of actionable security awareness when using AI for coding will inevitably lead to glaring security problems. All developers with AI/ML tooling will generate more code, and how much security risk that code carries will depend on their skill level. Organizations need to be acutely aware that untrained people will certainly generate code faster, but they will also accumulate technical security debt faster.

Even our preliminary test with ChatGPT (April 2023) revealed that it will make very basic mistakes that could have devastating consequences. When we asked it to build a login routine in PHP using a MySQL database, it generated functional code quickly. However, it defaulted to storing passwords in plaintext in the database, storing database connection credentials in code, and using a coding pattern that could result in SQL injection (although it did apply some level of filtering to the input parameters and spit out database errors). All rookie errors by any measure.

ChatGPT's recommendations are not necessarily secure, and can be dangerous in some instances.

Further prompting ensured the mistakes were amended, but it takes significant security knowledge to course-correct. Unchecked and widespread use of these tools is no better than unleashing junior developers onto your projects, and if this code is building sensitive infrastructure or processing personal data, then we’re looking at a ticking time bomb.
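To make the three defaults described above concrete, here is a minimal sketch of their safer counterparts. It is not the code from our test: it is written in Python rather than PHP, and it uses the standard library's `sqlite3` module as a stand-in for MySQL so that it runs on its own. The environment variable `APP_DB_PATH`, the table layout, and the iteration count are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import sqlite3

# Assumption: a work factor chosen for illustration; tune it to your own policy.
PBKDF2_ITERATIONS = 600_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so the plaintext password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS)


def open_database() -> sqlite3.Connection:
    """Read connection details from the environment instead of hard-coding them."""
    # APP_DB_PATH is a hypothetical variable name; the in-memory default is for the demo.
    conn = sqlite3.connect(os.environ.get("APP_DB_PATH", ":memory:"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, salt BLOB NOT NULL, password_hash BLOB NOT NULL)"
    )
    return conn


def register(conn: sqlite3.Connection, username: str, password: str) -> None:
    """Store a per-user random salt and the derived hash, using bound parameters."""
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO users (username, salt, password_hash) VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)),
    )
    conn.commit()


def login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """Look the user up with a parameterized query and compare hashes in constant time."""
    row = conn.execute(
        "SELECT salt, password_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored_hash = row
    return hmac.compare_digest(stored_hash, hash_password(password, salt))
```

Because the username is passed as a bound parameter, input such as `alice' OR '1'='1` is treated as a literal string rather than as SQL. In PHP, the comparable building blocks are prepared statements (for example via PDO) and the built-in `password_hash()` and `password_verify()` functions. Production code would normally also favor a dedicated password-hashing scheme such as bcrypt or Argon2, and would log database errors rather than showing them to the user.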

Of course, just as junior developers undoubtedly increase their skills over time, we expect AI/ML capabilities to improve. A year from now, it may not make such obvious and simple security mistakes. However, that will dramatically increase the security skill required to track down the more serious, hidden, non-trivial security errors it is still in danger of producing.

We remain ill-prepared to find and fix security vulnerabilities, and AI widens the gap

While there has been much talk of “shifting left” for many years at this point, the fact remains that, for most organizations, there is a significant lack of practical security knowledge among the development cohort, and we must work harder to provide the right-fit tools and education to help them on their way.

As it stands, we’re not prepared for the security bugs we are accustomed to encountering, let alone new AI-borne issues like prompt injection and hallucination squatting, which represent entirely new attack vectors and are set to take off like wildfire. AI coding tools do represent the future of a developer’s coding arsenal, but the education to wield these productivity weapons safely must come now.


Author

Pieter Danhieux

Pieter Danhieux is a globally recognized security expert, with over 12 years of experience as a security consultant and 8 years as a Principal Instructor for SANS, teaching offensive techniques on how to target and assess organizations, systems, and individuals for security weaknesses. In 2016, he was recognized as one of the Coolest Tech people in Australia (Business Insider) and awarded Cyber Security Professional of the Year (AISA - Australian Information Security Association). He holds GSE, CISSP, GCIH, GCFA, GSEC, GPEN, GWAPT, and GCIA certifications.
