Conda packages
What is a conda package?
A conda package is a compressed tarball file (.tar.bz2) or .conda file that contains:
system-level libraries.
Python or other modules.
executable programs and other components.
metadata under the
info/
directory.a collection of files that are installed directly into an
install
prefix.
Conda keeps track of the dependencies between packages and platforms. The conda package format is identical across platforms and operating systems.
Only files, including symbolic links, are part of a conda package. Directories are not included. Directories are created and removed as needed, but you cannot create an empty directory from the tar archive directly.
.conda file format
The .conda file format was introduced in conda 4.7 as a more compact, and thus faster, alternative to a tarball.
The .conda file format consists of an outer, uncompressed ZIP-format container, with 2 inner compressed .tar files.
For the .conda format's initial internal compression format support, we chose Zstandard (zstd). The actual compression format used does not matter, as long as the format is supported by libarchive. The compression format may change in the future as more advanced compression algorithms are developed and no change to the .conda format is necessary. Only an updated libarchive would be required to add a new compression format to .conda files.
These compressed files can be significantly smaller than their bzip2 equivalents. In addition, they decompress much more quickly. .conda is the preferred file format to use where available, although we continue to provide .tar.bz2 files in tandem.
Read more about the introduction of the .conda file format.
Note
In conda 4.7 and later, you cannot use package names that end in “.conda” as they conflict with the .conda file format for packages.
Using packages
You may search for packages
conda search scipy
You may install a package
conda install scipy
You may build a package after installing conda-build
conda build my_fun_package
Package structure
.
├── bin
│ └── pyflakes
├── info
│ ├── LICENSE.txt
│ ├── files
│ ├── index.json
│ ├── paths.json
│ └── recipe
└── lib
└── python3.5
bin contains relevant binaries for the package.
lib contains the relevant library files (eg. the .py files).
info contains package metadata.
Metapackages
When a conda package is used for metadata alone and does not contain any files, it is referred to as a metapackage. The metapackage may contain dependencies to several core, low-level libraries and can contain links to software files that are automatically downloaded when executed. Metapackages are used to capture metadata and make complicated package specifications simpler.
An example of a metapackage is "anaconda," which
collects together all the packages in the Anaconda installer.
The command conda create -n envname anaconda
creates an
environment that exactly matches what would be created from the
Anaconda installer. You can create metapackages with the
conda metapackage
command. Include the name and version
in the command.
Anaconda metapackage
The Anaconda metapackage is used in the creation of the Anaconda Distribution installers so that they have a set of packages associated with them. Each installer release has a version number, which corresponds to a particular collection of packages at specific versions. That collection of packages at specific versions is encapsulated in the Anaconda metapackage.
The Anaconda metapackage contains several core, low-level libraries, including compression, encryption, linear algebra, and some GUI libraries.
Read more about the Anaconda metapackage and Anaconda Distribution.
Mutex metapackages
A mutex metapackage is a very simple package that has a name. It need not have any dependencies or build steps. Mutex metapackages are frequently an "output" in a recipe that builds some variant of another package. Mutex metapackages function as a tool to help achieve mutual exclusivity among packages with different names.
Let's look at some examples for how to use mutex metapackages to build NumPy against different BLAS implementations.
Building NumPy with BLAS variants
If you build NumPy with MKL, you also need to build SciPy, scikit-learn, and anything else using BLAS also with MKL. It is important to ensure that these “variants” (packages built with a particular set of options) are installed together and never with an alternate BLAS implementation. This is to avoid crashes, slowness, or numerical problems. Lining up these libraries is both a build-time and an install-time concern. We’ll show how to use metapackages to achieve this need.
Let's start with the metapackage blas=1.0=mkl
:
https://github.com/AnacondaRecipes/intel_repack-feedstock/blob/e699b12/recipe/meta.yaml#L108-L112
Note that mkl
is a string of blas
.
That metapackage is automatically added as a dependency
using run_exports
when someone uses the mkl-devel
package as a build-time dependency:
https://github.com/AnacondaRecipes/intel_repack-feedstock/blob/e699b12/recipe/meta.yaml#L124
By the same token, here’s the metapackage for OpenBLAS: https://github.com/AnacondaRecipes/openblas-feedstock/blob/ae5af5e/recipe/meta.yaml#L127-L131
And the run_exports
for OpenBLAS, as part of
openblas-devel:
https://github.com/AnacondaRecipes/openblas-feedstock/blob/ae5af5e/recipe/meta.yaml#L100
Fundamentally, conda’s model of mutual exclusivity relies on the package name.
OpenBLAS and MKL are obviously not the same package name, and thus are not
mutually exclusive. There’s nothing stopping conda from installing both at
once. There’s nothing stopping conda from installing NumPy with MKL and SciPy
with OpenBLAS. The metapackage is what allows us to achieve the mutual
exclusivity. It unifies the options on a single package name,
but with a different build string. Automating the addition of the
metapackage with run_exports
helps ensure the library consumers
(package builders who depend on libraries) will have correct dependency
information to achieve the unified runtime library collection.
Installing NumPy with BLAS variants
To specify which variant of NumPy that you want, you could potentially specify the BLAS library you want:
conda install numpy mkl
However, that doesn’t actually preclude OpenBLAS from being chosen. Neither MKL nor its dependencies are mutually exclusive (meaning they do not have similar names and different version/build-string).
This pathway may lead to some ambiguity and solutions with mixed BLAS, so using the metapackage is recommended. To specify MKL-powered NumPy in a non-ambiguous way, you can specify the mutex package (either directly or indirectly):
conda install numpy “blas=*=mkl”
There is a simpler way to address this, however. For example, you may want to try another package that has the desired mutex package as a dependency.
OpenBLAS has this with its “nomkl” package: https://github.com/AnacondaRecipes/openblas-feedstock/blob/ae5af5e/recipe/meta.yaml#L133-L147
Nothing should use “nomkl” as a dependency. It is strictly a utility for users to facilitate switching from MKL (which is the default) to OpenBLAS.
How did MKL become the default? The solver needs a way to prioritize some packages over others. We achieve that with an older conda feature called track_features that originally served a different purpose.
Track_features
One of conda’s optimization goals is to minimize the number of track_features needed to specify the desired specs. By adding track_features to one or more of the options, conda will de-prioritize it or “weigh it down.” The lowest priority package is the one that would cause the most track_features to be activated in the environment. The default package among many variants is the one that would cause the least track_features to be activated.
There is a catch, though: any track_features must be unique. No two packages can provide the same track_feature. For this reason, our standard practice is to attach track_features to the metapackage associated with what we want to be non-default.
Take another look at the OpenBLAS recipe: https://github.com/AnacondaRecipes/openblas-feedstock/blob/ae5af5e/recipe/meta.yaml#L127-L137
This attached track_features entry is why MKL is chosen over OpenBLAS. MKL does not have any track_features associated with it. If there are 3 options, you would attach 0 track_features to the default, then 1 track_features for the next preferred option, and finally 2 for the least preferred option. However, since you generally only care about the one default, it is usually sufficient to add 1 track_feature to all options other than the default option.
More info
For reference, the Visual Studio version alignment on Windows also uses mutex metapackages. https://github.com/AnacondaRecipes/aggregate/blob/9635228/vs2017/meta.yaml#L24
Noarch packages
Noarch packages are packages that are not architecture specific and therefore only have to be built once. Noarch packages are either generic or Python. Noarch generic packages allow users to distribute docs, datasets, and source code in conda packages. Noarch Python packages are described below.
Declaring these packages as noarch
in the build
section of
the meta.yaml
reduces shared CI resources. Therefore, all packages
that qualify to be noarch packages should be declared as such.
Noarch Python
The noarch: python
directive in the build section
makes pure-Python packages that only need to be built once.
Noarch Python packages cut down on the overhead of building multiple different pure Python packages on different architectures and Python versions by sorting out platform and Python version-specific differences at install time.
In order to qualify as a noarch Python package, all of the following criteria must be fulfilled:
No compiled extensions.
No post-link, pre-link, or pre-unlink scripts.
No OS-specific build scripts.
No Python version-specific requirements.
No skips except for Python version. If the recipe is py3 only, remove skip statement and add version constraint on Python in host and run section.
2to3 is not used.
Scripts argument in setup.py is not used.
If
console_script
entrypoints are in setup.py, they are listed inmeta.yaml
.No activate scripts.
Not a dependency of conda.
Note
While noarch: python
does not work with selectors, it does
work with version constraints. skip: True # [py2k]
can sometimes
be replaced with a constrained Python version in the host and run
subsections, for example: python >=3
instead of just python
.
Note
Only console_script
entry points have to be listed in meta.yaml
.
Other entry points do not conflict with noarch
and therefore do
not require extra treatment.
Read more about conda's noarch packages.
Link and unlink scripts
You may optionally execute scripts before and after the link and unlink steps. For more information, see Adding pre-link, post-link, and pre-unlink scripts.
More information
Go deeper on how to manage packages. Learn more about package metadata, repository structure and index, and package match specifications at Package specifications.