Tame the snake: Snyk shines a spotlight on Python security
The programming language Python has become increasingly popular, partly due to its ecosystem of machine-learning software libraries like NumPy, Pandas, Google's TensorFlow, and Facebook's PyTorch. According to the TIOBE Index, Python is now the most popular programming language in the world, outranking both Java and C.
As Snyk reviewed the threats and vulnerabilities relating to Python, it found that, "on average, 60 new Python vulnerabilities are added to Snyk's vulnerability database each month", with almost a third of them classed as critical or high severity (5% are critical severity issues, 27% are high severity, 56% are medium severity and 12% are low severity). Fortunately, it also found that most of these vulnerabilities can be remediated quickly. Some 87% of vulnerabilities can be fixed by moving to updated packages. Similarly, most container vulnerabilities can be closed by using slimmer packages.
These corrective actions are needed in many projects. Code-related issues from the OWASP Top 10 2021 list can be found in over 60% of Python projects. They include cross-site scripting (XSS), in which attackers inject client-side scripts into websites; SQL injection, which becomes possible when user-supplied strings are used to construct SQL queries; and disabled certificate verification, which makes man-in-the-middle attacks possible.
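As an illustration of the SQL injection issue, here is a minimal sketch using Python's standard-library sqlite3 module. The users table, the function names and the payload string are hypothetical and are not taken from the Snyk report; the point is only to contrast a query built from a user-supplied string with one that binds the value as a parameter.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list:
    # The "?" placeholder makes the driver bind `name` as data,
    # so it is never interpreted as part of the SQL statement.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Anti-pattern, shown for comparison only: the user-supplied
    # string is pasted directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection payload: harmless when bound as a parameter,
# but it rewrites the WHERE clause when interpolated into the query.
payload = "x' OR '1'='1"
safe_rows = find_user(conn, payload)
unsafe_rows = find_user_unsafe(conn, payload)
```

With the parameterized version, the payload is treated as an unusual user name and matches nothing, whereas the interpolated version matches every row in the table.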
"When looking at some of the security issues found in Python projects, issues relating to interactions with external resources such as file or network streams are just some of the types regularly identified," says Daniel Berman, Product Marketing Director at Snyk. "Python developers seem a little less disciplined in calling the close functions to flush the content of the system as well as to free any handles."
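One common way to avoid forgotten close calls in Python is the with statement, which closes the resource when the block exits. The sketch below assumes a hypothetical write_report helper and a temporary file; it is an illustration, not code from the report.

```python
import os
import tempfile


def write_report(path: str, lines: list) -> None:
    # The with statement closes the file when the block exits, even if
    # an exception is raised, which flushes buffered data to disk and
    # releases the operating-system handle.
    with open(path, "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")


# Hypothetical usage: write to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    report_path = os.path.join(tmp, "report.txt")
    write_report(report_path, ["first", "second"])
    with open(report_path, encoding="utf-8") as fh:
        contents = fh.read()
    # The handle is already closed once the inner with block has exited.
    handle_closed = fh.closed
```

The same pattern applies to sockets, database connections and other objects that implement the context manager protocol.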
Dependencies are widening the attack surface
Another area to highlight is the content of a typical Python repository, which today consists of much more than just the Python code written by the developer. Commonly found elements are open-source packages and container images, as well as configuration files used for provisioning the infrastructure required to run them. From a security perspective, this increasing volume means that the attack surface of projects is becoming wider over time.
Snyk found that an average Python project has around 35 dependencies. Of these, 17 are direct dependencies and 18 are indirect dependencies. In 47% of these projects, the dependencies introduce vulnerabilities. An average vulnerable project contains 33 known vulnerabilities, of which 10% are critical severity, 26% are high severity, 26% are medium severity and 28% are low severity.
"As applications get more complex, so does the task of securing them," says Daniel Berman. "Malicious actors have a wide variety of attack vectors to use when attacking a Python app, whether via known vulnerabilities introduced via direct or indirect dependencies, security issues in the app's proprietary code, or container vulnerabilities."
Reviewing the top security issues found in application code has led Snyk to outline six general points of advice for Python developers:
1. Use modern static code analysis: Linters like Pylint and scanners like Bandit are a good start. But the nasty problems are inter-file (i.e. the issues occur as the application execution flows between various source files). Finding these kinds of issues manually is nearly impossible.
2. Sanitize data: Try to sanitize incoming data from any external source (including databases) at its entry point into the application.
3. ORM: Use modern Object Relational Mapping (ORM) tools to abstract database interactions and prevent SQL injection opportunities. If you are using frameworks like Django or Flask, use well-vetted libraries like Django ORM or SQLAlchemy.
4. Unicode: If possible, standardize all strings to a single Unicode encoding – we recommend UTF-8. Be careful when converting Unicode strings into ASCII.
5. Close APIs: Make sure to close your files and network connections (e.g., external reads and writes). This ensures that data written to their buffers is actually stored, that the state is saved correctly, and that handles in your system are freed.
6. Guard your secrets: This is not Python specific, but it's common to see secrets like usernames, passwords, API access tokens, and also file paths or file names leak into the source code. It is a good practice to keep them in separate files, or better yet, secret stores like HashiCorp Vault, AWS Key Management Service, etc.
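As a minimal sketch of the last point, secrets can be read from the process environment at runtime instead of being written into the source code. The get_secret helper and the EXAMPLE_API_TOKEN variable name are hypothetical; a dedicated secret store such as those named above would expose its own client API instead.

```python
import os


def get_secret(name: str) -> str:
    # Read the secret from the environment at runtime so that it
    # never has to be committed to the repository.
    value = os.environ.get(name)
    if not value:
        # Fail early and loudly rather than continuing with an empty
        # credential; only the variable name is reported, not a value.
        raise RuntimeError(f"missing required secret: {name}")
    return value
```

An application would then call, for example, get_secret("EXAMPLE_API_TOKEN") at startup, with the variable set by the deployment environment rather than by the code itself.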
For more information, visit https://snyk.io/python-security-insights/