Understand the path traversal bug in Python’s tarfile module
Recently, a team of security researchers announced their finding of a fifteen year old bug in Python’s tar file extraction functionality. The vulnerability was first disclosed in 2007 and tracked as CVE-2007-4559. A note was added to the official Python documentation, but the bug itself was left unpatched.
This vulnerability could impact thousands of software projects yet many people are unfamiliar with the situation or how to handle it. That’s why, here at Secure Code Warrior, we’re giving you the opportunity to simulate exploiting this vulnerability yourself to see the impact first-hand and get some hands-on experience in the mechanics of this persistent bug, so you can better protect your application!
Try the simulated Mission now.
The vulnerability: path traversal during tar file extraction
Path or directory traversal happens when unsanitized user input is used to construct a file path, allowing an attacker to gain access to and overwrite files, and even execute arbitrary code.
The vulnerability exists in Python’s tarfile module. A tar (tape archive) file is a single file, called an archive. It packages together multiple files along with their metadata, and is usually recognized by having the .tar.gz or .tgz extension. Each member in the archive can be represented by a TarInfo object, which contains metadata, such as the file name, modification time, ownership, and more.
The risk arrises from the archives ability to be extracted again.
When being extracted, every member needs a path to be written to. This location is created by joining the base path with the file name:

Once this path is created, it’s passed on to the tarfile.extract or tarfile.extractall functions to perform the extraction:

The issue here is the lack of sanitization of the filename. An attacker could rename files to include path traversal characters, such as dot dot slash (../), which would cause the file to traverse out of the directory it was meant to be in and overwrite arbitrary files. This could eventually lead to remote code execution, which is ripe for exploitation.
The vulnerability appears throughout other scenarios, if you know how to identify it. In addition to Python’s handling of tar files, the vulnerability exists in the extraction of zip files. You may be familiar with this under another name, such as the zip slip vulnerability, which has manifested itself in languages other than Python!
How can you mitigate risk?
Despite the vulnerability being known for years, the Python maintainers consider the extraction functionality to be doing what it’s supposed to do. In this case, some may say “it’s a feature, not a bug.” Unfortunately, developers can’t always avoid extracting tar or zip files from an unknown source. It’s up to them to sanitize the untrusted input to prevent path traversal vulnerabilities as part of secure development practices.
Want to learn more about how to write secure code and mitigate risk with Python?
Try out our Python challenge for free.
If you’re interested in getting more free coding guidelines, check out Secure Code Coach to help you stay on top of secure coding practices.


Recently, a team of security researchers announced their finding of a fifteen year old bug in Python’s tar file extraction functionality. The vulnerability was first disclosed in 2007 and tracked as CVE-2007-4559. A note was added to the official Python documentation, but the bug itself was left unpatched.

Secure Code Warrior is here for your organization to help you secure code across the entire software development lifecycle and create a culture in which cybersecurity is top of mind. Whether you’re an AppSec Manager, Developer, CISO, or anyone involved in security, we can help your organization reduce risks associated with insecure code.
Book a demoLaura Verheyde is a software developer at Secure Code Warrior focused on researching vulnerabilities and creating content for Missions and Coding labs.


Recently, a team of security researchers announced their finding of a fifteen year old bug in Python’s tar file extraction functionality. The vulnerability was first disclosed in 2007 and tracked as CVE-2007-4559. A note was added to the official Python documentation, but the bug itself was left unpatched.
This vulnerability could impact thousands of software projects yet many people are unfamiliar with the situation or how to handle it. That’s why, here at Secure Code Warrior, we’re giving you the opportunity to simulate exploiting this vulnerability yourself to see the impact first-hand and get some hands-on experience in the mechanics of this persistent bug, so you can better protect your application!
Try the simulated Mission now.
The vulnerability: path traversal during tar file extraction
Path or directory traversal happens when unsanitized user input is used to construct a file path, allowing an attacker to gain access to and overwrite files, and even execute arbitrary code.
The vulnerability exists in Python’s tarfile module. A tar (tape archive) file is a single file, called an archive. It packages together multiple files along with their metadata, and is usually recognized by having the .tar.gz or .tgz extension. Each member in the archive can be represented by a TarInfo object, which contains metadata, such as the file name, modification time, ownership, and more.
The risk arrises from the archives ability to be extracted again.
When being extracted, every member needs a path to be written to. This location is created by joining the base path with the file name:

Once this path is created, it’s passed on to the tarfile.extract or tarfile.extractall functions to perform the extraction:

The issue here is the lack of sanitization of the filename. An attacker could rename files to include path traversal characters, such as dot dot slash (../), which would cause the file to traverse out of the directory it was meant to be in and overwrite arbitrary files. This could eventually lead to remote code execution, which is ripe for exploitation.
The vulnerability appears throughout other scenarios, if you know how to identify it. In addition to Python’s handling of tar files, the vulnerability exists in the extraction of zip files. You may be familiar with this under another name, such as the zip slip vulnerability, which has manifested itself in languages other than Python!
How can you mitigate risk?
Despite the vulnerability being known for years, the Python maintainers consider the extraction functionality to be doing what it’s supposed to do. In this case, some may say “it’s a feature, not a bug.” Unfortunately, developers can’t always avoid extracting tar or zip files from an unknown source. It’s up to them to sanitize the untrusted input to prevent path traversal vulnerabilities as part of secure development practices.
Want to learn more about how to write secure code and mitigate risk with Python?
Try out our Python challenge for free.
If you’re interested in getting more free coding guidelines, check out Secure Code Coach to help you stay on top of secure coding practices.

Recently, a team of security researchers announced their finding of a fifteen year old bug in Python’s tar file extraction functionality. The vulnerability was first disclosed in 2007 and tracked as CVE-2007-4559. A note was added to the official Python documentation, but the bug itself was left unpatched.
This vulnerability could impact thousands of software projects yet many people are unfamiliar with the situation or how to handle it. That’s why, here at Secure Code Warrior, we’re giving you the opportunity to simulate exploiting this vulnerability yourself to see the impact first-hand and get some hands-on experience in the mechanics of this persistent bug, so you can better protect your application!
Try the simulated Mission now.
The vulnerability: path traversal during tar file extraction
Path or directory traversal happens when unsanitized user input is used to construct a file path, allowing an attacker to gain access to and overwrite files, and even execute arbitrary code.
The vulnerability exists in Python’s tarfile module. A tar (tape archive) file is a single file, called an archive. It packages together multiple files along with their metadata, and is usually recognized by having the .tar.gz or .tgz extension. Each member in the archive can be represented by a TarInfo object, which contains metadata, such as the file name, modification time, ownership, and more.
The risk arrises from the archives ability to be extracted again.
When being extracted, every member needs a path to be written to. This location is created by joining the base path with the file name:

Once this path is created, it’s passed on to the tarfile.extract or tarfile.extractall functions to perform the extraction:

The issue here is the lack of sanitization of the filename. An attacker could rename files to include path traversal characters, such as dot dot slash (../), which would cause the file to traverse out of the directory it was meant to be in and overwrite arbitrary files. This could eventually lead to remote code execution, which is ripe for exploitation.
The vulnerability appears throughout other scenarios, if you know how to identify it. In addition to Python’s handling of tar files, the vulnerability exists in the extraction of zip files. You may be familiar with this under another name, such as the zip slip vulnerability, which has manifested itself in languages other than Python!
How can you mitigate risk?
Despite the vulnerability being known for years, the Python maintainers consider the extraction functionality to be doing what it’s supposed to do. In this case, some may say “it’s a feature, not a bug.” Unfortunately, developers can’t always avoid extracting tar or zip files from an unknown source. It’s up to them to sanitize the untrusted input to prevent path traversal vulnerabilities as part of secure development practices.
Want to learn more about how to write secure code and mitigate risk with Python?
Try out our Python challenge for free.
If you’re interested in getting more free coding guidelines, check out Secure Code Coach to help you stay on top of secure coding practices.

Click on the link below and download the PDF of this resource.
Secure Code Warrior is here for your organization to help you secure code across the entire software development lifecycle and create a culture in which cybersecurity is top of mind. Whether you’re an AppSec Manager, Developer, CISO, or anyone involved in security, we can help your organization reduce risks associated with insecure code.
View reportBook a demoLaura Verheyde is a software developer at Secure Code Warrior focused on researching vulnerabilities and creating content for Missions and Coding labs.
Recently, a team of security researchers announced their finding of a fifteen year old bug in Python’s tar file extraction functionality. The vulnerability was first disclosed in 2007 and tracked as CVE-2007-4559. A note was added to the official Python documentation, but the bug itself was left unpatched.
This vulnerability could impact thousands of software projects yet many people are unfamiliar with the situation or how to handle it. That’s why, here at Secure Code Warrior, we’re giving you the opportunity to simulate exploiting this vulnerability yourself to see the impact first-hand and get some hands-on experience in the mechanics of this persistent bug, so you can better protect your application!
Try the simulated Mission now.
The vulnerability: path traversal during tar file extraction
Path or directory traversal happens when unsanitized user input is used to construct a file path, allowing an attacker to gain access to and overwrite files, and even execute arbitrary code.
The vulnerability exists in Python’s tarfile module. A tar (tape archive) file is a single file, called an archive. It packages together multiple files along with their metadata, and is usually recognized by having the .tar.gz or .tgz extension. Each member in the archive can be represented by a TarInfo object, which contains metadata, such as the file name, modification time, ownership, and more.
The risk arrises from the archives ability to be extracted again.
When being extracted, every member needs a path to be written to. This location is created by joining the base path with the file name:

Once this path is created, it’s passed on to the tarfile.extract or tarfile.extractall functions to perform the extraction:

The issue here is the lack of sanitization of the filename. An attacker could rename files to include path traversal characters, such as dot dot slash (../), which would cause the file to traverse out of the directory it was meant to be in and overwrite arbitrary files. This could eventually lead to remote code execution, which is ripe for exploitation.
The vulnerability appears throughout other scenarios, if you know how to identify it. In addition to Python’s handling of tar files, the vulnerability exists in the extraction of zip files. You may be familiar with this under another name, such as the zip slip vulnerability, which has manifested itself in languages other than Python!
How can you mitigate risk?
Despite the vulnerability being known for years, the Python maintainers consider the extraction functionality to be doing what it’s supposed to do. In this case, some may say “it’s a feature, not a bug.” Unfortunately, developers can’t always avoid extracting tar or zip files from an unknown source. It’s up to them to sanitize the untrusted input to prevent path traversal vulnerabilities as part of secure development practices.
Want to learn more about how to write secure code and mitigate risk with Python?
Try out our Python challenge for free.
If you’re interested in getting more free coding guidelines, check out Secure Code Coach to help you stay on top of secure coding practices.
Table of contents

Secure Code Warrior is here for your organization to help you secure code across the entire software development lifecycle and create a culture in which cybersecurity is top of mind. Whether you’re an AppSec Manager, Developer, CISO, or anyone involved in security, we can help your organization reduce risks associated with insecure code.
Book a demoDownloadResources to get you started
Secure by Design: Defining Best Practices, Enabling Developers and Benchmarking Preventative Security Outcomes
In this research paper, Secure Code Warrior co-founders, Pieter Danhieux and Dr. Matias Madou, Ph.D., along with expert contributors, Chris Inglis, Former US National Cyber Director (now Strategic Advisor to Paladin Capital Group), and Devin Lynch, Senior Director, Paladin Global Institute, will reveal key findings from over twenty in-depth interviews with enterprise security leaders including CISOs, a VP of Application Security, and software security professionals.
Benchmarking Security Skills: Streamlining Secure-by-Design in the Enterprise
Finding meaningful data on the success of Secure-by-Design initiatives is notoriously difficult. CISOs are often challenged when attempting to prove the return on investment (ROI) and business value of security program activities at both the people and company levels. Not to mention, it’s particularly difficult for enterprises to gain insights into how their organizations are benchmarked against current industry standards. The President’s National Cybersecurity Strategy challenged stakeholders to “embrace security and resilience by design.” The key to making Secure-by-Design initiatives work is not only giving developers the skills to ensure secure code, but also assuring the regulators that those skills are in place. In this presentation, we share a myriad of qualitative and quantitative data, derived from multiple primary sources, including internal data points collected from over 250,000 developers, data-driven customer insights, and public studies. Leveraging this aggregation of data points, we aim to communicate a vision of the current state of Secure-by-Design initiatives across multiple verticals. The report details why this space is currently underutilized, the significant impact a successful upskilling program can have on cybersecurity risk mitigation, and the potential to eliminate categories of vulnerabilities from a codebase.
Secure code training topics & content
Our industry-leading content is always evolving to fit the ever changing software development landscape with your role in mind. Topics covering everything from AI to XQuery Injection, offered for a variety of roles from Architects and Engineers to Product Managers and QA. Get a sneak peak of what our content catalog has to offer by topic and role.
Resources to get you started
Revealed: How the Cyber Industry Defines Secure by Design
In our latest white paper, our Co-Founders, Pieter Danhieux and Dr. Matias Madou, Ph.D., sat down with over twenty enterprise security leaders, including CISOs, AppSec leaders and security professionals, to figure out the key pieces of this puzzle and uncover the reality behind the Secure by Design movement. It’s a shared ambition across the security teams, but no shared playbook.
Is Vibe Coding Going to Turn Your Codebase Into a Frat Party?
Vibe coding is like a college frat party, and AI is the centerpiece of all the festivities, the keg. It’s a lot of fun to let loose, get creative, and see where your imagination can take you, but after a few keg stands, drinking (or, using AI) in moderation is undoubtedly the safer long-term solution.