What is Trojan Source and how does it sneak into your source code?
In early November 2021, researchers at the University of Cambridge published their Trojan Source research. It showed how backdoors can be hidden in source code and comments using directional formatting characters. These characters can be used to craft code whose logic the compiler interprets differently from how a human code reviewer reads it.
The vulnerability is new, although Unicode has been used nefariously before, for example to hide a file's true extension by reversing the direction of the last part of its name. The research revealed that many compilers silently ignore these Unicode control characters in source code, whereas text editors, including code editors, may reorder lines containing comments and code according to them. As a result, an editor may display code and comments differently, and in a different order, from how the compiler parses them, even swapping code and comments.
Read on to find out more. Or, if you would like to roll up your sleeves and try a simulated Trojan Source attack, jump into our free public mission to experience it for yourself.
Bidirectional text
One of the Trojan Source attacks exploits the Unicode Bidirectional (Bidi) algorithm, which determines how to display text that mixes scripts with different reading directions, such as English (left to right) and Arabic (right to left). Directional formatting characters can override the algorithm's defaults to regroup characters and change the order in which they are displayed.
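For reference, the nine Unicode directional formatting characters can be listed with Python's standard `unicodedata` module, which reports each one's Bidi class abbreviation and official name. The helper name `describe_bidi_controls` is our own, used only for illustration.

```python
import unicodedata

# The Unicode directional formatting characters: embeddings, overrides,
# isolates, and the two "pop" characters that terminate them.
BIDI_CONTROLS = [
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
]


def describe_bidi_controls():
    """Return (code point, Bidi class, character name) for each control."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))
        for ch in BIDI_CONTROLS
    ]


if __name__ == "__main__":
    for code_point, bidi_class, name in describe_bidi_controls():
        print(f"{code_point}  {bidi_class:<3}  {name}")
```

The Bidi class column gives the familiar abbreviations (LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI) used throughout this article.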
The table above contains some of the Bidi override characters relevant to the attack. Take, for example, the sequence:
RLI e d o c PDI
RLI stands for Right-to-Left Isolate. It isolates the text that follows from its surrounding context, up to the matching PDI (Pop Directional Isolate), and displays it from right to left, resulting in:
c o d e
Compilers and interpreters, however, do not typically process formatting control characters, including Bidi overrides, prior to parsing source code. If they simply ignore the directional formatting characters, they’ll parse:
e d o c
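The gap between what a reviewer sees and what a compiler reads can be sketched in a few lines of Python. This is a deliberately simplified model, not the full Unicode Bidirectional Algorithm: it only handles a Right-to-Left Override (RLO, U+202E) terminated by Pop Directional Formatting (PDF, U+202C) or by the end of the line. It reverses the overridden span and mirrors paired brackets, as a Bidi-aware renderer does for right-to-left text. An override is used here rather than an isolate because an override forces even Latin letters to be laid out right to left. The function names `displayed` and `logical` are hypothetical.

```python
RLO, PDF = "\u202e", "\u202c"
# All directional formatting characters; they are invisible when rendered
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

# Bracket pairs whose glyphs are mirrored when displayed right to left
MIRROR = str.maketrans("()[]{}<>", ")(][}{><")


def displayed(line: str) -> str:
    """Approximate what an editor shows: each RLO span is reversed and its
    brackets mirrored, up to the next PDF or the end of the line."""
    out = []
    i = 0
    while i < len(line):
        if line[i] == RLO:
            end = line.find(PDF, i + 1)
            if end == -1:
                end = len(line)  # an unterminated override runs to end of line
            span = "".join(c for c in line[i + 1:end] if c not in BIDI_CONTROLS)
            out.append(span[::-1].translate(MIRROR))
            i = end + 1
        else:
            if line[i] not in BIDI_CONTROLS:
                out.append(line[i])  # other controls are simply not drawn
            i += 1
    return "".join(out)


def logical(line: str) -> str:
    """The stored order a compiler reads, with the invisible controls removed."""
    return "".join(c for c in line if c not in BIDI_CONTROLS)


if __name__ == "__main__":
    stored = RLO + "edoc" + PDF
    print(displayed(stored))  # code
    print(logical(stored))    # edoc
```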
Old wine in new bottles?
Of course, the underlying trick is nothing new under the sun. In the past, directional formatting characters have been inserted into file names to disguise their malicious nature. An email attachment displayed as 'myspecialexe.doc' might look innocent enough, but an RLO (Right-to-Left Override) character hidden in the name means its real name is 'myspecialcod.exe'.
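The file name trick is easy to reproduce. In this sketch (standard library only; the helper `looks_spoofed` is hypothetical), the stored name contains an RLO before "cod.exe", so a Bidi-aware file browser would display it as "myspecialexe.doc", yet `os.path.splitext` reports the real extension, `.exe`.

```python
import os

# Directional formatting characters (embeddings, overrides, isolates, pops)
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

# Stored (logical) name: "myspecial" + RLO + "cod.exe".
# Rendered right to left after the RLO, it displays as "myspecialexe.doc".
attachment = "myspecial\u202ecod.exe"


def looks_spoofed(filename: str) -> bool:
    """Flag file names that contain any directional formatting character."""
    return any(ch in BIDI_CONTROLS for ch in filename)


if __name__ == "__main__":
    print(os.path.splitext(attachment)[1])  # .exe
    print(looks_spoofed(attachment))        # True
    print(looks_spoofed("report.doc"))      # False
```

Checking the extension on the stored name, rather than trusting what is displayed, is what exposes the executable.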
The Trojan Source attack inserts directional formatting characters into comments and string literals, where they do not cause any syntax or compilation errors. These control characters change the display order of the code's logic, so the compiler reads something entirely different from what a human sees.
For example, a file containing the following bytes in this order:

will be reordered as follows by the directional formatting characters:

causing the code to be rendered like this if directional formatting characters are not explicitly called out:

The RLO makes the closing brace display as an opening brace, and vice versa on the last line. Executing this code prints “You are an admin”: the admin check has been commented out, but the control characters give the impression that it is still present.
(Source: https://github.com/nickboucher/trojan-source/blob/main/C%23/commenting-out.csx)
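Attackers hide the controls in comments and strings because most languages accept arbitrary characters there. For Python, this can be checked with the built-in `compile`: an RLO inside a comment or string literal compiles cleanly, while the same character in ordinary code is rejected with a `SyntaxError`. The helper `compiles` is hypothetical, and the behavior shown is specific to CPython; other compilers may differ.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def compiles(source: str) -> bool:
    """Return True if CPython accepts the source text."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    in_comment = f"is_admin = False  # {RLO} check admin\n"
    in_string = f"greeting = 'hello {RLO} world'\n"
    in_code = f"is_admin = False {RLO}\n"
    print(compiles(in_comment))  # True
    print(compiles(in_string))   # True
    print(compiles(in_code))     # False
```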
How could this affect you?
Many languages are vulnerable to the attack, including C, C++, C#, JavaScript, Java, Rust, Go, and Python, and others are likely affected as well. Now, the average developer might frown upon seeing directional formatting characters in source code, but a novice might just as well shrug and think nothing of it. Moreover, whether these characters are made visible at all depends heavily on the IDE, so there is no guarantee they will be spotted.
But how could such code sneak into a code base in the first place? First and foremost, through source code from untrustworthy sources, where malicious contributions have gone unnoticed. Second, through a simple copy-paste of code found on the internet, something most of us developers have done before. On top of that, most organizations rely on software components from multiple vendors. This raises the question: to what extent can we fully trust and rely on this code, and how can we screen it for hidden backdoors?
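One low-cost answer to the screening question is to scan code for directional formatting characters before it is merged. A minimal sketch, assuming sources are UTF-8 text files and that the extensions in the hypothetical `SOURCE_SUFFIXES` set are the ones you care about; `scan_text` and `find_bidi` are illustrative names, not an existing tool.

```python
import sys
from pathlib import Path

BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
# Hypothetical set of file types to inspect; adjust to your code base
SOURCE_SUFFIXES = {".c", ".cpp", ".cs", ".go", ".java", ".js", ".py", ".rs"}


def scan_text(text: str):
    """Yield (line number, column, code point) for each directional control."""
    # Split on "\n" only, so line numbers match what most editors show
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                yield lineno, col, f"U+{ord(ch):04X}"


def find_bidi(root: Path):
    """Yield (path, line, column, code point) for source files under root."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, col, code_point in scan_text(text):
                yield path, lineno, col, code_point


if __name__ == "__main__":
    # Scan every directory passed on the command line
    for root in sys.argv[1:]:
        for path, lineno, col, code_point in find_bidi(Path(root)):
            print(f"{path}:{lineno}:{col}: {code_point}")
```

In a build pipeline, any output from such a scan could be treated as a failure that requires a reviewer to confirm the characters are legitimate.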
Whose problem is it?
Compilers and build pipelines should disallow source code lines that mix more than one direction, unless one direction is strictly limited to strings and comments. Note that a directional formatting character in a string or comment can, if it is not popped, extend a direction change until the end of the line. More generally, code editors should explicitly render and highlight suspicious Unicode characters, such as homoglyphs and directional formatting characters. Since November 2021, GitHub has added a warning sign and message to every line of code containing bidirectional Unicode text, although it does not highlight where in the line these characters are. This may still allow malicious direction changes to slip in alongside benign ones.
Awareness among developers and code reviewers is essential, which is why we have created a walkthrough illustrating the vulnerability. It is currently available for Java, C#, Python, Go, and PHP.
So if you want to know more, try our Trojan Source simulation (public missions) and read the Trojan Source research.
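The point about controls that are not popped can be made concrete. The sketch below (hypothetical helpers `open_controls` and `unbalanced_lines`) treats each line on its own, pairs embeddings and overrides (LRE, RLE, LRO, RLO) with PDF and isolates (LRI, RLI, FSI) with PDI, and reports lines on which a direction change is still open at the end of the line. It is a heuristic loosely modeled on the pairing rules of the Unicode Bidirectional Algorithm, not a complete implementation of them.

```python
EMBEDDINGS = set("\u202a\u202b\u202d\u202e")  # LRE, RLE, LRO, RLO
ISOLATES = set("\u2066\u2067\u2068")        # LRI, RLI, FSI
PDF, PDI = "\u202c", "\u2069"


def open_controls(line: str) -> list:
    """Return the directional controls still open at the end of a line."""
    stack = []
    for ch in line:
        if ch in EMBEDDINGS or ch in ISOLATES:
            stack.append(ch)
        elif ch == PDF:
            # PDF closes the latest embedding/override, but not across an isolate
            if stack and stack[-1] in EMBEDDINGS:
                stack.pop()
        elif ch == PDI:
            # PDI closes the latest isolate and anything opened inside it
            if any(c in ISOLATES for c in stack):
                while stack.pop() not in ISOLATES:
                    pass
    return stack


def unbalanced_lines(text: str):
    """Yield (line number, open controls as code points) for lines that leave
    a direction change open, which then spills over the rest of the line."""
    for lineno, line in enumerate(text.split("\n"), start=1):
        still_open = open_controls(line)
        if still_open:
            yield lineno, [f"U+{ord(c):04X}" for c in still_open]


if __name__ == "__main__":
    sample = "ok line\n/* \u202e } */\nx = '\u2067abc\u2069'\n"
    for lineno, codes in unbalanced_lines(sample):
        print(lineno, codes)  # 2 ['U+202E']
```

Reporting the exact code points per line also addresses the limitation of a line-level warning: it tells the reviewer which control is left open, not just that one exists.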

Secure Code Warrior is here for your organization to help you secure code across the entire software development lifecycle and create a culture in which cybersecurity is top of mind. Whether you’re an AppSec Manager, Developer, CISO, or anyone involved in security, we can help your organization reduce risks associated with insecure code.
Laura Verheyde is a software developer at Secure Code Warrior focused on researching vulnerabilities and creating content for Missions and Coding labs.