Understand the path traversal bug in Python’s tarfile module
![](https://cdn.prod.website-files.com/5fec9210c1841a6c20c6ce81/63e39c465243fe3477bdaf36_633adf8720b9cf1d74d34970_Python_Traversal_bug.webp)
Recently, a team of security researchers announced that they had rediscovered a fifteen-year-old bug in Python’s tar file extraction functionality. The vulnerability was first disclosed in 2007 and is tracked as CVE-2007-4559. A warning was added to the official Python documentation, but the bug itself was left unpatched.
This vulnerability could affect thousands of software projects, yet many developers are unfamiliar with it or with how to handle it. That’s why we at Secure Code Warrior are giving you the chance to simulate exploiting the vulnerability yourself, so you can see its impact first-hand, get hands-on experience with the mechanics of this persistent bug, and better protect your applications.
Try the simulated Mission now.
The vulnerability: path traversal during tar file extraction
Path or directory traversal happens when unsanitized user input is used to construct a file path, allowing an attacker to gain access to and overwrite files, and even execute arbitrary code.
The vulnerability exists in Python’s tarfile module. A tar (tape archive) file is a single file, called an archive, that packages together multiple files along with their metadata. It is usually recognized by the .tar extension, or by .tar.gz or .tgz when the archive is also gzip-compressed. Each member of the archive is represented by a TarInfo object, which holds metadata such as the file name, modification time, ownership, and more.
The risk arises from the archive’s ability to be extracted again.
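As a rough illustration of the metadata a TarInfo object carries, the sketch below builds a one-file archive in memory and reads its members back. The helper names (`build_sample_archive`, `describe_members`), the member name, and the owner value are made up for the example; only the standard library `tarfile` and `io` modules are used.

```python
import io
import tarfile


def build_sample_archive() -> bytes:
    """Create a small gzip-compressed tar archive in memory (illustrative data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello"
        info = tarfile.TarInfo(name="notes/hello.txt")
        info.size = len(data)          # size must match the payload length
        info.mtime = 1_600_000_000     # modification time (seconds since epoch)
        info.uname = "alice"           # hypothetical owner name
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def describe_members(archive: bytes) -> list:
    """Return a few metadata fields for every member of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        return [
            {"name": m.name, "size": m.size, "mtime": m.mtime, "owner": m.uname}
            for m in tar.getmembers()
        ]


if __name__ == "__main__":
    for entry in describe_members(build_sample_archive()):
        print(entry)
```

Note that the member name is just another metadata field chosen by whoever created the archive, which is why it has to be treated as untrusted input.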
When being extracted, every member needs a path to be written to. This location is created by joining the base path with the file name:
![](https://cdn.prod.website-files.com/5fec9210c1841a6c20c6ce81/633ade2fd9b979589aef0d58_python%20tarfile.png)
Once this path is created, it’s used by the extract or extractall methods of the TarFile object to perform the extraction:
![](https://cdn.prod.website-files.com/5fec9210c1841a6c20c6ce81/633addfd7eee8a289c573b9d_Tarfile.png)
The issue here is that the file name is never sanitized. An attacker can name archive members with path traversal sequences, such as dot dot slash (../), so that the extracted file lands outside the directory it was meant to be in and overwrites arbitrary files. Overwriting the right file, such as a script or configuration file that is later executed, can eventually lead to remote code execution.
The same pattern appears in other scenarios, if you know how to identify it. Besides the handling of tar files in Python, it also affects the extraction of zip files. You may know it under another name, the Zip Slip vulnerability, which has shown up in languages other than Python.
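To make the effect concrete, here is a minimal sketch that builds a tiny archive in memory with a traversal sequence in the member name, then computes where a naive join would place that member. Nothing is extracted or written to disk. The function names and paths are hypothetical, and `posixpath` is used instead of `os.path` only so the result looks the same on every platform.

```python
import io
import posixpath
import tarfile


def build_archive(member_name: str, payload: bytes) -> bytes:
    """Create an in-memory tar archive holding one member with the given name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def resolved_targets(archive: bytes, base_dir: str) -> list:
    """Show where each member would land if base_dir and its name are simply joined."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return [
            # normpath collapses the "../" segments, revealing the real destination
            posixpath.normpath(posixpath.join(base_dir, member.name))
            for member in tar.getmembers()
        ]


if __name__ == "__main__":
    archive = build_archive("../../tmp/evil.txt", b"demo")
    # The printed path is outside /srv/app/uploads
    print(resolved_targets(archive, "/srv/app/uploads"))
```

The archive itself is perfectly valid. The problem only appears once the member name is trusted as part of a destination path.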
How can you mitigate risk?
Despite the vulnerability being known for years, the Python maintainers consider the extraction functionality to be doing what it’s supposed to do. In this case, some may say “it’s a feature, not a bug.” Unfortunately, developers can’t always avoid extracting tar or zip files from an unknown source. It’s up to them to sanitize the untrusted input to prevent path traversal vulnerabilities as part of secure development practices.
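One common way to do that sanitization is to check, before extracting anything, that every member resolves to a location inside the destination directory. The sketch below is one possible approach and not an official fix. `is_within_directory`, `safe_extractall`, and `demo_archive` are hypothetical helper names, and refusing all link members is a deliberately strict assumption. Newer Python releases also offer extraction filters through the `filter` argument of `extractall` (added with Python 3.12), which the sketch uses when `tarfile.data_filter` is available.

```python
import io
import os
import tarfile
import tempfile


def is_within_directory(base_dir: str, target: str) -> bool:
    """Return True if target resolves to a location inside base_dir."""
    base = os.path.realpath(base_dir)
    resolved = os.path.realpath(target)
    return os.path.commonpath([base, resolved]) == base


def safe_extractall(tar: tarfile.TarFile, dest: str) -> None:
    """Validate every member first, then extract (hypothetical helper)."""
    for member in tar.getmembers():
        target = os.path.join(dest, member.name)
        if not is_within_directory(dest, target):
            raise ValueError(f"blocked path traversal: {member.name!r}")
        if member.issym() or member.islnk():
            # Links can point outside dest; this sketch simply refuses them.
            raise ValueError(f"blocked link member: {member.name!r}")
    # Use the built-in "data" extraction filter where the interpreter provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    tar.extractall(dest, **kwargs)


def demo_archive(member_name: str) -> io.BytesIO:
    """Build a one-member archive in memory, for demonstration only."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = 4
        tar.addfile(info, io.BytesIO(b"demo"))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        with tarfile.open(fileobj=demo_archive("docs/ok.txt")) as tar:
            safe_extractall(tar, dest)  # harmless member is extracted
        with tarfile.open(fileobj=demo_archive("../evil.txt")) as tar:
            try:
                safe_extractall(tar, dest)
            except ValueError as exc:
                print(exc)  # traversal attempt is rejected before extraction
```

Validating all members before calling `extractall` means a malicious entry late in the archive cannot leave a half-extracted directory behind.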
Want to learn more about how to write secure code and mitigate risk with Python?
Try out our Python challenge for free.
If you’re interested in getting more free coding guidelines, check out Secure Code Coach to help you stay on top of secure coding practices.
Resources to get you started
Trust Agent by Secure Code Warrior
Discover SCW Trust Agent, an innovative solution designed to enhance security by aligning developer secure code knowledge and skills with the work they commit. It provides comprehensive visibility and controls across an organization's entire code repository, analyzing each commit against developers' secure code profiles. With SCW Trust Agent, organizations can strengthen their security posture, optimize development lifecycles, and scale developer-driven security.
Women in Security are Winning: How the AWSN is Setting Up a New Generation of Security Superwomen
Secure-by-Design is the latest initiative on everyone’s lips, and the Australian government, collaborating with CISA at the highest levels of global governance, is guiding a higher standard of software quality and security from vendors.
SCW Trust Agent - Visibility and Control to Scale Developer Driven Security
SCW Trust Agent, introduced by Secure Code Warrior, offers security leaders the visibility and control needed to scale developer-driven security within organizations. By connecting to code repositories, it assesses code commit metadata, inspects developers, programming languages used, and shipment timestamps to determine developers' security knowledge.