Where Have You Installed Your Python Packages?

  sonic0002        2023-12-17 01:03:45       12,160        0         

Preface

I am writing this article because I recently noticed in the Python community that there are several frequently asked questions:

  1. Why does running the command after installing pip result in a "executable not found" error?
  2. Why does importing a module result in a "ModuleNotFound" error?
  3. Why can I run my code in PyCharm, but it doesn't work in the command prompt?

Rather than just providing solutions, it is better to teach people how to fish. To address these types of issues, you need to understand how Python locates packages. I hope that after reading this article, you will find it helpful.

How Python Finds Packages

It's quite possible that your computer has more than one Python installation, along with multiple virtual environments. This can lead to overlooking the installation path when installing packages. Let's first address the issue of package location. The answer to this question is simple, but many people may not be aware of the underlying principles.

Suppose the path to your Python interpreter is $path_prefix/bin/python. When you launch the Python interactive environment or run a script using this interpreter, it will default to searching the following locations:

  1. $path_prefix/lib (standard library path)
  2. $path_prefix/lib/pythonX.Y/site-packages (third-party library path, where X.Y corresponds to the major and minor version of Python, such as 3.7 or 2.6)
  3. Current working directory (result of the pwd command)

If you are using the default Python on Linux, $path_prefix is likely to be /usr. If you have compiled Python with default options yourself, $path_prefix would be /usr/local. From the second point above, you can see that the paths for third-party libraries differ for different Python versions. If you upgrade Python from version 3.6 to 3.7, the third-party libraries installed for the previous version may no longer be usable. Of course, you can copy the entire folder, and in most cases, it won't cause any issues.

Several Useful Functions

  • sys.executable: Path to the currently used Python interpreter.
  • sys.path: List of paths searched for the current package.
  • sys.prefix: Current value of $path_prefix.

Additionally, running python -m site in the command line will print out some information about the current Python environment, including the search path list.

Example

>>> import sys
>>> sys.executable
'/home/frostming/.pyenv/versions/3.7.2/bin/python'
>>> sys.path
['', '/home/frostming/.pyenv/versions/3.7.2/lib/python37.zip', '/home/frostming/.pyenv/versions/3.7.2/lib/python3.7', '/home/frostming/.pyenv/versions/3.7.2/lib/python3.7/lib-dynload', '/home/frostming/.local/lib/python3.7/site-packages', '/mnt/d/Workspace/pipenv', '/home/frostming/.pyenv/versions/3.7.2/lib/python3.7/site-packages']
>>> sys.prefix
'/home/frostming/.pyenv/versions/3.7.2'

Adding Search Paths Using Environment Variables

If the path to your package is not included in the search path list mentioned above, you can add the path to the PYTHONPATH environment variable. Multiple paths should be separated by colons(semicolon on Windows).

However, be cautious not to include paths for packages of different Python versions in PYTHONPATH, such as PYTHONPATH=/home/frostming/.local/lib/python2.7/site-packages. This is because the paths in PYTHONPATH take precedence over the default search paths, and using Python 3 may lead to compatibility issues. In fact, it's best not to include any paths with site-packages in PYTHONPATH.

By the way, PATH is used to find the search path for executable programs. If you run the command my_cmd in the terminal, the system will scan the paths in PATH one by one to check if my_cmd exists in any of those paths. So, if you get an error saying a program cannot be found or the command is not recognized, check if the path is added to PATH.

How Python Installs Packages

Currently, the most common method for installing Python packages is using pip. Even if you use tools like pipenv or poetry, they ultimately rely on pip. The following instructions are universally applicable. If you haven't installed pip, please refer to this link. If you have pip installed but are unable to use the pip command, please refer to the previous section.

There are two ways to run pip:

  • pip ...
  • python -m pip ...

The first and second methods are quite similar, with the difference being that the first method uses the Python interpreter specified in the shebang line of the pip script. In general, if your pip path is $path_prefix/bin/pip, then the corresponding Python path is $path_prefix/bin/python. If you are using a Unix system, you can find the Python interpreter's path by running cat $(which pip), and the first line will contain the path. The second method explicitly specifies the location of Python. This rule applies to all executable Python programs. 

So, without any custom configurations, using pip to install packages will automatically install them to $path_prefix/lib/pythonX.Y/site-packages (where $path_prefix is obtained from the previous section), and executable programs will be installed to $path_prefix/bin. If you want to run my_cmd directly from the command line, remember to add it to the PATH.

Options in pip for Changing Installation Locations:

  • --prefix PATH: Replaces $path_prefix with the given value.
  • --root ROOT_PATH: Prepends ROOT_PATH before $path_prefix. For example, with --root /home/frostming, $path_prefix changes from /usr to /home/frostming/usr.
  • --target TARGET: Directly specifies the installation location to TARGET.

Virtual Environments

A virtual environment is created to isolate the dependencies of different projects, allowing them to be installed in separate paths to prevent dependency conflicts. Once you understand how Python installs packages, understanding the principles behind virtual environments(virtualenv, venv module) becomes straightforward.

In essence, running virtualenv myenv will copy a new Python interpreter to myenv/bin and create directories such as myenv/lib and myenv/lib/pythonX.Y/site-packages. When you execute source myenv/bin/activate, it adds myenv/bin to the front of PATH, ensuring that this copied Python interpreter is prioritized in searches. Consequently, when installing packages, $path_prefix becomes myenv, achieving isolation in the installation paths.

Impact of Script Execution on Search Paths

As explained earlier, the primary factor influencing whether Python can find a package is sys.path, and more fundamentally, sys.executable's path. After writing a program, you inevitably need to run it. However, different methods of running a script may affect sys.path, leading to different behaviors. Let's discuss this issue.

Assuming your package structure is as follows:

.
├── main.py
└── my_package
    ├── __init__.py
    ├── a.py
    └── b.py

Content of main.py

import my_package.b

Content of b.py

import sys
print("I'm b")
print(sys.path)

Executing main.py 

$ python main.py
I'm b
['/home/frostming/test_path', ...]  # sommit some common paths which are not related
$ python my_package/b.py
I'm b
['/home/frostming/test_path/my_package', ...]

The running method of python xxx.py is called direct execution. In this case, the value of __name__ in the file is set to __main__. This approach is utilized by the Run File options in an IDE. It can be observed that in this scenario, the first value of sys.path is the directory where the script file is located, varying with the script's path. Remember, our test execution is always in the directory /home/frostming/test_path.

Now, if we need to import a.py into b.py, where a.py contains a simple line print("I'm a"), how should b.py be written?

Easy! Just write import a. Alright, let's run the test again as described above.

$ python main.py
ModuleNotFoundError: No module named 'a'
$ python my_package/b.py
I'm a
I'm b
['/home/frostming/test_path/my_package', ...]

The first test failed. If you've read the previous content, this error is expected—sys.path doesn't include the directory of a.py, which is /home/frostming/test_path/my_package. Naturally, it cannot find a.

Changing it to from my_package import a, we won't run the test again, as based on the same analysis, we can predict that the first run will be fine, but the second one will result in an error, unable to find my_package. Note that since b is inside the my_package package, relative imports can be used. Writing from . import a and from my_package import a have the same effect.

Is there a way to make both runs not producing errors? Yes. We need to understand that in a project, there are limited entry points. In reality, executable code won't exist both at the top level and in subdirectories. We should place the main logic in main.py (it doesn't have to be this name; for example, in a Django project, it could be manage.py). If there is a need to run code from a script in a subdirectory, it should be done using python -m [module_name]. The import statement in b.py to import a should be from my_package import a. Let's take a look at the results:

$ python main.py  # same as python -m main
I'm a
I'm b
['/home/frostming/test_path', ...]
$ python -m my_package.b
I'm a
I'm b
['/home/frostming/test_path', ...]

We can see that the contents of sys.path are consistent in both runs. Its first value is the directory where the current run is located. This running method is called running as a module. The parameter after python -m is the module name (separated by dots), not a path name. Due to this uniformity, you can use the same import format for all imports in your project, regardless of the script's location. This is also why the Django official documentation recommends using names like myapp.models.users for imports.

In addition, when running as a module, each level of the parent module (or package) specified is executed as a module. This means you can use relative imports in the module (which is not allowed when running directly), and the value of __name__ in the passed module is set to __main__, allowing you to apply the if __name__ == "__main__": check. If the module passed in python -m [module_name] is a package, the __main__.py script in the package directory will be executed (if it exists), and the __name__ value of this script will be __main__.

Conclusion

Regarding the search for package paths, the most crucial element is the $path_prefix prefix, which is derived from the path of the used Python interpreter. Therefore, to find the package path, you only need to know the interpreter's path. If you encounter a situation where the package's path needs to be changed, simply set the correct PATH, specifying the Python interpreter you desire.

Now, returning to the three questions at the beginning, have you figured them out? 

From Chinese version: 你的 Python 包都装到哪了?

PYTHON  PATH  PATH_PREFIX  PACKAGE LOCATION 

       

  RELATED


  0 COMMENT


No comment for this article.



  RANDOM FUN

Happy birthday Linux