Anaconda compiler tools
Anaconda 5.0 switched from OS-provided compiler tools to our own toolsets. This allows improved compiler capabilities, including better security and performance. This page describes how to use these tools and enable these benefits.
Compiler packages
Before Anaconda 5.0, compilers were installed using system tools such as Xcode or yum install gcc. Now there are conda packages for Linux and macOS compilers. The previous GCC 4.8.5 packages included GCC, g++, and gfortran all in the same package. These conda packages are instead split into separate compilers:
macOS:
clang_osx-64.
clangxx_osx-64.
gfortran_osx-64.
Linux:
gcc_linux-64.
gxx_linux-64.
gfortran_linux-64.
A compiler's "build platform" is the platform where the compiler runs and builds the code.
A compiler's "host platform" is the platform where the built code will finally be hosted and run.
Notice that all of these package names end in a platform identifier which specifies the host platform. All compiler packages are specific to both the build platform and the host platform.
Using the compiler packages
The compiler packages can be installed with conda. Because they are designed with (pseudo) cross-compiling in mind, all of the executables in a compiler package are "prefixed." Instead of gcc, the executable name of the compiler you use will be something like x86_64-conda_cos6-linux-gnu-gcc. These full compiler names are shown in the build logs, which records the host platform and helps prevent the common mistake of using the wrong compiler.
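Because the names are prefixed, a script that needs the full compiler name can look it up instead of hard-coding the triplet. The following is a minimal bash sketch. It assumes the compilers sit in one bin directory (such as $CONDA_PREFIX/bin) and follow the <host-triplet>-<tool> naming shown above. The helper name find_prefixed_tool is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the prefixed executable for TOOL (e.g. gcc)
# found in BIN_DIR, e.g. x86_64-conda_cos6-linux-gnu-gcc.
# Assumes executable names of the form <host-triplet>-<tool>.
find_prefixed_tool() {
  local bin_dir=$1 tool=$2 candidate
  for candidate in "$bin_dir"/*-"$tool"; do
    # If the glob matched nothing, bash leaves the pattern unexpanded,
    # and the -x test below rejects it.
    if [ -x "$candidate" ]; then
      basename "$candidate"
      return 0
    fi
  done
  echo "no prefixed $tool found in $bin_dir" >&2
  return 1
}
```

For example, in an environment that contains that compiler, find_prefixed_tool "$CONDA_PREFIX/bin" gcc would print x86_64-conda_cos6-linux-gnu-gcc. The glob requires the name to end in -gcc, so related tools such as ...-gcc-ar are not matched.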
Many build tools, such as make and CMake, search by default for a compiler named simply gcc. We therefore set environment variables that point these tools to the correct compiler.
We set these variables in conda activate.d scripts, so any environment in which you use the compilers must be activated first so that the scripts run. Conda-build handles this activation for you through the activation hooks that the compiler packages install in CONDA_PREFIX/etc/conda/activate.d, so no additional effort is necessary.
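An activation hook is an ordinary shell script that conda sources when the environment is activated, so anything it exports is visible to the build tools. Below is a hypothetical, heavily simplified sketch of such a script. The file name, the variables shown, and the hard-coded triplet (taken from the example above) are all assumptions. The scripts shipped in the real compiler packages set many more variables.

```shell
#!/usr/bin/env bash
# Hypothetical $CONDA_PREFIX/etc/conda/activate.d/activate-example-gcc.sh.
# conda sources this file on activation, so these exports land in the
# activated shell. HOST is the prefix shared by all the compiler executables.
HOST=x86_64-conda_cos6-linux-gnu
export HOST
export CC="${CONDA_PREFIX}/bin/${HOST}-gcc"
export CXX="${CONDA_PREFIX}/bin/${HOST}-g++"
# Build tools such as make read $CC/$CXX instead of looking for a bare "gcc".
```

GNU make's built-in rules call $(CC), and environment variables take precedence over make's built-in defaults. As a result, running make in the activated environment uses the prefixed compiler rather than a bare gcc.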
You can activate the root environment with the command conda activate root. In current versions of conda, the root environment is named base.
macOS SDK
The macOS compilers require the macOS 10.9 SDK or later. The SDK license prevents it from being bundled in the conda package. We know of two current sources for the macOS SDKs:
We usually install the 10.10 SDK at /opt/MacOSX10.10.sdk, but you may install it anywhere. Edit your conda_build_config.yaml file to point to it, like this:
CONDA_BUILD_SYSROOT:
  - /opt/MacOSX10.10.sdk  # [osx]
At Anaconda, we keep this configuration setting in a centralized conda_build_config.yaml at the root of our recipe repository. Because we run build commands from that location, the file and the setting apply to all recipes. The conda_build_config.yaml search order is described further at Creating conda-build variant config files.
Build scripts for macOS should make use of the variables MACOSX_DEPLOYMENT_TARGET and CONDA_BUILD_SYSROOT, which are set by conda-build (see Environment variables). These variables should be translated into the correct compiler arguments. For Clang, this would be:
clang .. -isysroot ${CONDA_BUILD_SYSROOT} -mmacosx-version-min=${MACOSX_DEPLOYMENT_TARGET} ..
Most build tools, e.g. CMake and distutils (setuptools), automatically pick up MACOSX_DEPLOYMENT_TARGET, but you need to pass CONDA_BUILD_SYSROOT explicitly. For CMake, you can do this with the option -DCMAKE_OSX_SYSROOT=${CONDA_BUILD_SYSROOT}. When building Python extensions with distutils, always extend CFLAGS before calling setup.py:
export CFLAGS="${CFLAGS} -isysroot ${CONDA_BUILD_SYSROOT}"
When building C++ extensions with Cython, CXXFLAGS must be modified in the same way.
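The macOS part of a build.sh can be expressed as two small helpers. This is a minimal sketch that assumes conda-build has already exported CONDA_BUILD_SYSROOT and MACOSX_DEPLOYMENT_TARGET. The function names macos_sysroot_flags and append_macos_flags are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Clang flags that select the SDK (sysroot)
# and the minimum macOS version. Fails if conda-build did not set them.
macos_sysroot_flags() {
  if [ -z "${CONDA_BUILD_SYSROOT:-}" ] || [ -z "${MACOSX_DEPLOYMENT_TARGET:-}" ]; then
    echo "CONDA_BUILD_SYSROOT and MACOSX_DEPLOYMENT_TARGET must be set" >&2
    return 1
  fi
  printf '%s\n' "-isysroot ${CONDA_BUILD_SYSROOT} -mmacosx-version-min=${MACOSX_DEPLOYMENT_TARGET}"
}

# Hypothetical helper: extend CFLAGS and CXXFLAGS in the current shell, which
# is what setup.py builds (including Cython C++ extensions) read.
append_macos_flags() {
  local extra
  extra=$(macos_sysroot_flags) || return 1
  CFLAGS="${CFLAGS:+${CFLAGS} }${extra}"
  CXXFLAGS="${CXXFLAGS:+${CXXFLAGS} }${extra}"
  export CFLAGS CXXFLAGS
}
```

In a recipe, you would call append_macos_flags before running setup.py. For CMake builds, you would still pass -DCMAKE_OSX_SYSROOT="${CONDA_BUILD_SYSROOT}" to cmake separately. append_macos_flags is called directly, not inside $( ), so that its exports reach the calling shell.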
Backward compatibility
Some users want to use the latest Anaconda packages but do not yet want to use the Anaconda compilers. To support this, the latest Python package builds include a default _sysconfigdata file. This file sets the compilers provided by the system, such as gcc and g++, as the default compilers. As a result, legacy recipes keep working.
Python packages also include an alternative _sysconfigdata file that sets the Anaconda compilers as the default compilers. The Anaconda Python executable itself is built with these Anaconda compilers.
The compiler packages set the environment variable _PYTHON_SYSCONFIGDATA_NAME, which tells Python which _sysconfigdata file to use. This variable is set at activation time by the activation hooks described above.
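You can check which compiler a particular Python will hand to distutils-style builds by querying its standard sysconfig module. The diagnostic sketch below uses a hypothetical function name. Run it once with the compiler packages' environment activated and once without, then compare the results:

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: show which _sysconfigdata module is requested
# through the environment, and the C compiler Python's build config reports.
report_python_cc() {
  local python_exe=${1:-python}
  echo "_PYTHON_SYSCONFIGDATA_NAME=${_PYTHON_SYSCONFIGDATA_NAME:-<unset>}"
  "$python_exe" -c 'import sysconfig; print("CC =", sysconfig.get_config_var("CC"))'
}
```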
The new _sysconfigdata customization system is only present in recent versions of the Python package. Conda-build automatically tries to use the latest Python version available in the currently configured channels, which normally means the latest version from the default channel. If you use something other than conda-build while working with the new compilers, conda does not update Python automatically. In that case, make sure you have the correct _sysconfigdata files by updating your Python package manually.
Anaconda compilers and conda-build 3
The Anaconda 5.0 compilers and conda-build 3 are designed to work together.
Conda-build 3 defines a special Jinja2 function, compiler(), which makes it easy to specify compiler packages dynamically on many platforms. The compiler function takes at least one argument, the language of the compiler to use:
requirements:
  build:
    - {{ compiler('c') }}
"Cross-capable" recipes can build packages whose host platform differs from the build platform where conda-build runs. To write cross-capable recipes, you may also need a host section in the requirements. In this example, we list zlib under host. This tells conda-build to use the zlib in the conda host environment rather than the system zlib, which ensures that conda-build links against the zlib for the host platform and not the one for the build platform.
requirements:
  build:
    - {{ compiler('c') }}
  host:
    - zlib
Generally, the build section should include compilers and other build tools and the host section should include everything else, including shared libraries, Python, and Python libraries.
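In build.sh, this split appears as two prefixes. Conda-build installs the build requirements into the environment named by BUILD_PREFIX and the host requirements into the one named by PREFIX. Below is a minimal sketch for a recipe with zlib under host. The helper name host_zlib_flags is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flags that point the compiler and linker at the zlib
# installed in the host environment ($PREFIX), not the build environment.
host_zlib_flags() {
  if [ -z "${PREFIX:-}" ]; then
    echo "PREFIX is not set; run this inside conda-build" >&2
    return 1
  fi
  printf '%s\n' "-I${PREFIX}/include -L${PREFIX}/lib -lz"
}

# Example use in build.sh: $CC comes from the activated compiler package in
# the build environment, while zlib's headers and library come from the host one.
#   ${CC} ${CFLAGS} hello.c -o "${PREFIX}/bin/hello" $(host_zlib_flags)
```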
An aside on CMake and sysroots
Anaconda's compilers for Linux are built with something called crosstool-ng. They include not only GCC, but also a "sysroot" with glibc, as well as the rest of the toolchain (binutils). Ordinarily, the sysroot is something that your system provides, and it is what establishes the libc compatibility bound for your compiled code. Any compilation that uses a sysroot other than the system sysroot is said to be "cross-compiling." When the target OS and the build OS are the same, it is called a "pseudo-cross-compiler." This is the case for normal builds with Anaconda's compilers on Linux.
Unfortunately, some software tools do not handle sysroots in intuitive ways. CMake is especially bad for this. Even though the compiler itself understands its own sysroot, CMake insists on ignoring that. We've filed issues at:
This Stack Overflow question has more information: https://stackoverflow.com/questions/36195791/cmake-missing-sysroot-when-cross-compiling
To teach CMake about the sysroot, you must do additional work. For an example, see our recipe for libnetcdf at AnacondaRecipes/libnetcdf-feedstock.
In particular, you'll need to copy the cross-linux.cmake file from there and reference it in your build.sh file:
CMAKE_PLATFORM_FLAGS+=(-DCMAKE_TOOLCHAIN_FILE="${RECIPE_DIR}/cross-linux.cmake")
cmake -DCMAKE_INSTALL_PREFIX="${PREFIX}" \
      "${CMAKE_PLATFORM_FLAGS[@]}" \
      "${SRC_DIR}"
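The cross-linux.cmake file in the libnetcdf recipe is the one to copy. For orientation only, here is a hypothetical bash sketch that writes a minimal toolchain file using standard CMake variables (CMAKE_SYSTEM_NAME, CMAKE_C_COMPILER, CMAKE_FIND_ROOT_PATH, and the CMAKE_FIND_ROOT_PATH_MODE_* settings). It is not the contents of the libnetcdf file. It also assumes that CC, CXX, HOST, PREFIX, and BUILD_PREFIX are set when CMake runs, and that the compiler's sysroot sits at ${BUILD_PREFIX}/${HOST}/sysroot. Those locations may not match your compiler version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal CMake toolchain file to the path in $1.
write_cross_linux_cmake() {
  local out=$1
  # Quoted 'EOF': the shell leaves $ENV{...} alone; CMake resolves it later
  # from the environment of the cmake process.
  cat > "$out" <<'EOF'
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_C_COMPILER $ENV{CC})
set(CMAKE_CXX_COMPILER $ENV{CXX})
# Search the host env and the compiler's sysroot for headers and libraries.
set(CMAKE_FIND_ROOT_PATH $ENV{PREFIX} $ENV{BUILD_PREFIX}/$ENV{HOST}/sysroot)
# Programs (build tools) are taken from the build machine as usual.
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
EOF
}
```

The generated file would then be passed through -DCMAKE_TOOLCHAIN_FILE, just like the copied cross-linux.cmake.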
Customizing the compilers
The compiler packages listed above are small packages that only include the activation scripts and list most of the software they provide as runtime dependencies.
This design is intended to make it easy for you to customize your own compiler packages by copying these recipes and changing the flags. You can then edit the conda_build_config.yaml file to specify your own packages.
We have been careful to select good, general-purpose, secure, and fast flags. We have also used them for all packages in Anaconda Distribution 5.0.0, except for some minor customizations in a few recipes. When changing these flags, remember that choosing the wrong flags can reduce security, reduce performance, and cause incompatibilities.
With that warning in mind, let's look at good ways to customize Clang.
Download or fork the code from AnacondaRecipes/aggregate. The Clang package recipe is in the clang folder. The main material is in the llvm-compilers-feedstock folder.
Edit clang/recipe/meta.yaml:
package:
  name: clang_{{ target_platform }}
  version: {{ version }}
The name here does not matter, but the output names below do. Conda-build expects every compiler to follow the BASENAME_PLATFORMNAME pattern, so it is important to keep the {{ target_platform }} part of the name. {{ version }} is intentionally left as an undefined Jinja2 variable. It is set later in conda_build_config.yaml.
Before any packaging is done, conda-build runs the build.sh script: AnacondaRecipes/aggregate
In this recipe, that script is where the compiler flag values are set. Those values are inserted into the activate scripts that are installed later.
#!/bin/bash

CHOST=${macos_machine}

FINAL_CPPFLAGS="-D_FORTIFY_SOURCE=2 -mmacosx-version-min=${macos_min_version}"
FINAL_CFLAGS="-march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe"
FINAL_CXXFLAGS="-march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -std=c++14 -fmessage-length=0"
# These are the LDFLAGS for when the linker is being called directly, without "-Wl,"
FINAL_LDFLAGS="-pie -headerpad_max_install_names"
# These are the LDFLAGS for when the linker is being driven by a compiler, with "-Wl,"
FINAL_LDFLAGS_CC="-Wl,-pie -Wl,-headerpad_max_install_names"
FINAL_DEBUG_CFLAGS="-Og -g -Wall -Wextra -fcheck=all -fbacktrace -fimplicit-none -fvar-tracking-assignments"
FINAL_DEBUG_CXXFLAGS="-Og -g -Wall -Wextra -fcheck=all -fbacktrace -fimplicit-none -fvar-tracking-assignments"
FINAL_DEBUG_FFLAGS="-Og -g -Wall -Wextra -fcheck=all -fbacktrace -fimplicit-none -fvar-tracking-assignments"

# Copy the activate script templates here, then replace each @PLACEHOLDER@.
# "-i.bak" works with both BSD sed (macOS) and GNU sed; backups are removed at the end.
find "${RECIPE_DIR}" -name "*activate*.sh" -exec cp {} . \;
find . -name "*activate*.sh" -exec sed -i.bak "s|@CHOST@|${CHOST}|g" "{}" \;
find . -name "*activate*.sh" -exec sed -i.bak "s|@CPPFLAGS@|${FINAL_CPPFLAGS}|g" "{}" \;
find . -name "*activate*.sh" -exec sed -i.bak "s|@CFLAGS@|${FINAL_CFLAGS}|g" "{}" \;
find . -name "*activate*.sh" -exec sed -i.bak "s|@DEBUG_CFLAGS@|${FINAL_DEBUG_CFLAGS}|g" "{}" \;
find . -name "*activate*.sh" -exec sed -i.bak "s|@CXXFLAGS@|${FINAL_CXXFLAGS}|g" "{}" \;
find . -name "*activate*.sh" -exec sed -i.bak "s|@DEBUG_CXXFLAGS@|${FINAL_DEBUG_CXXFLAGS}|g" "{}" \;
# find . -name "*activate*.sh" -exec sed -i.bak "s|@FFLAGS@|${FINAL_FFLAGS}|g" "{}" \;
# find . -name "*activate*.sh" -exec sed -i.bak "s|@DEBUG_FFLAGS@|${FINAL_DEBUG_FFLAGS}|g" "{}" \;
find . -name "*activate*.sh" -exec sed -i.bak "s|@LDFLAGS@|${FINAL_LDFLAGS}|g" "{}" \;
find . -name "*activate*.sh" -exec sed -i.bak "s|@LDFLAGS_CC@|${FINAL_LDFLAGS_CC}|g" "{}" \;
find . -name "*activate*.sh.bak" -exec rm "{}" \;
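The placeholder step in this build.sh is plain text substitution with find and sed. The sketch below applies the same sed expression to a single file, so you can try it on a throwaway template. The helper render_template and the template lines used to exercise it are hypothetical, not the real activate-clang.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace @NAME@ placeholders in FILE in place, using the
# same sed expression as build.sh. Arguments after FILE are NAME=VALUE pairs.
# Values must not contain '|' or '&', which are special in this sed expression.
render_template() {
  local file=$1 pair name value
  shift
  for pair in "$@"; do
    name=${pair%%=*}
    value=${pair#*=}
    # -i.bak works with both BSD sed (macOS) and GNU sed.
    sed -i.bak "s|@${name}@|${value}|g" "$file"
  done
  rm -f "${file}.bak"
}
```

For example, render_template activate-clang.sh "CFLAGS=-O2 -pipe" turns a template line export CFLAGS="@CFLAGS@" into export CFLAGS="-O2 -pipe".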
With those changes to the activate scripts in place, it's time to move on to installing things. Look back at the clang folder's meta.yaml. This is where we change the package name. Notice what comes before the {{ target_platform }}:
outputs:
  - name: super_duper_clang_{{ target_platform }}
    script: install-clang.sh
    requirements:
      - clang {{ version }}
The script reference here is another place you might add customization. You'll either change the contents of those install scripts or change the scripts that those install scripts are installing.
Note that we pin the clang package from the main material to the same version as our output. This is implicitly the version of the top-level recipe. The clang package sets no environment variables at all, so it may be difficult to use directly.
Let's examine the script install-clang.sh:
#!/bin/bash

set -e -x

CHOST=${macos_machine}

mkdir -p "${PREFIX}"/etc/conda/{de,}activate.d/
cp "${SRC_DIR}"/activate-clang.sh "${PREFIX}"/etc/conda/activate.d/activate_"${PKG_NAME}".sh
cp "${SRC_DIR}"/deactivate-clang.sh "${PREFIX}"/etc/conda/deactivate.d/deactivate_"${PKG_NAME}".sh
pushd "${PREFIX}"/bin
  ln -s clang "${CHOST}-clang"
popd
Nothing here is too unusual.
Activate scripts are named after our package so they won't conflict with other activate scripts.
The symlink for Clang is a Clang implementation detail that sets the host platform.
We define macos_machine in aggregate's conda_build_config.yaml: AnacondaRecipes/aggregate
The activate scripts being installed are where we actually set the environment variables. Remember that these scripts have been modified by build.sh.
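The deactivate script installed alongside it should undo whatever the activate script did. One common pattern is to stash any previous value in a backup variable on activation and restore it on deactivation. It is shown here as a hypothetical pair of functions rather than the real activate-clang.sh and deactivate-clang.sh, and the backup variable name CONDA_BACKUP_CC is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical activate/deactivate pair for a single variable (CC).
example_activate() {
  local chost=$1
  # Remember the caller's CC, if it is set, before overriding it.
  if [ -n "${CC+set}" ]; then
    export CONDA_BACKUP_CC="$CC"
  fi
  export CC="${CONDA_PREFIX}/bin/${chost}-clang"
}

example_deactivate() {
  if [ -n "${CONDA_BACKUP_CC+set}" ]; then
    # Restore the value that was in place before activation.
    export CC="$CONDA_BACKUP_CC"
    unset CONDA_BACKUP_CC
  else
    # CC was not set before activation, so remove it entirely.
    unset CC
  fi
}
```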
With any of your desired changes in place, go ahead and build the recipe.
You should end up with a super_duper_clang_osx-64 package. Or, if you're not on macOS and are modifying a different recipe, you should end up with an equivalent package for your platform.
Using your customized compiler package with conda-build 3
Remember the Jinja2 function {{ compiler('c') }}? This is where it comes in. Specific keys in conda_build_config.yaml are named after the language argument to that Jinja2 function. In your conda_build_config.yaml, add this:
c_compiler:
  - super_duper_clang
Note that we're not adding the target_platform part, which is separate. You can define that key, too:
c_compiler:
  - super_duper_clang
target_platform:
  - win-64
With those two keys defined, conda-build will try to use a compiler package named super_duper_clang_win-64. That package needs to exist for your native platform. For example, if you're on macOS, your native platform is osx-64.
The package subdirectory for your native platform is the build platform. The build platform and the target_platform can be the same, and they are the same by default, but they can also differ. When they differ, you're cross-compiling.
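The naming rule and the cross-compiling condition described above are simple enough to state as code. This bash sketch is illustrative only, because conda-build performs this resolution internally. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical: the compiler package conda-build looks for is the c_compiler
# value joined to target_platform with an underscore.
compiler_package_name() {
  local compiler=$1 target_platform=$2
  printf '%s_%s\n' "$compiler" "$target_platform"
}

# Hypothetical: the build cross-compiles when the native (build) platform
# differs from target_platform.
is_cross_compiling() {
  local build_platform=$1 target_platform=$2
  [ "$build_platform" != "$target_platform" ]
}
```

With the configuration above on macOS, compiler_package_name super_duper_clang win-64 prints super_duper_clang_win-64, and is_cross_compiling osx-64 win-64 succeeds.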
If you ever need a different compiler key for the same language, remember that the language key is arbitrary. For example, we might want different compilers for Python and for R within one ecosystem. On Windows, the Python ecosystem uses the Microsoft Visual C++ compilers, while the R ecosystem uses the MinGW compilers.
Let's start in conda_build_config.yaml:
python_c_compiler:
  - vs2015
r_c_compiler:
  - m2w64-gcc
target_platform:
  - win-64
In Python recipes, you'd have:
requirements:
  build:
    - {{ compiler('python_c') }}
In R recipes, you'd have:
requirements:
  build:
    - {{ compiler('r_c') }}
This example is a little contrived, because the m2w64-gcc_win-64 package is not available. You'd need to create a metapackage m2w64-gcc_win-64 that points at the m2w64-gcc package, which does exist on the msys2 channel on repo.anaconda.com.
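A metapackage of this kind needs little more than a name and a run requirement. The hypothetical bash sketch below writes such a recipe. The version is a parameter because it should match whichever m2w64-gcc version you target. The helper name and output layout are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical: write a meta.yaml for an m2w64-gcc_win-64 metapackage whose
# only content is a run requirement on m2w64-gcc.
# $1 = output recipe directory, $2 = m2w64-gcc version to pin.
write_m2w64_metapackage() {
  local out_dir=$1 version=$2
  mkdir -p "$out_dir"
  cat > "$out_dir/meta.yaml" <<EOF
package:
  name: m2w64-gcc_win-64
  version: "${version}"

build:
  number: 0

requirements:
  run:
    - m2w64-gcc ${version}
EOF
}
```

You would then build the recipe on Windows with the msys2 channel enabled (for example, conda build -c msys2 followed by the recipe directory) so that the run requirement can be resolved.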
Expressing the relation between compiler and its standard library
For most languages, and certainly for "c" and "cxx", compiling any given program may create a run-time dependence on symbols from the respective standard library. For example, the standard library for C on Linux is generally glibc, which is a core component of your operating system. Conda cannot change or supersede this library, because doing so would be too risky. A similar situation exists on macOS and on Windows.
Compiler packages usually handle this dependence in one of two ways:
Assume the package must already be there (like glibc on Linux).
Always add a run-time requirement on the respective stdlib (e.g. libcxx on macOS).
However, even if we assume the package must be there, the glibc version is still highly relevant, which is also why it is reflected in the __glibc virtual package.
For example, newer packages may raise the lowest glibc version they support over time. We therefore need a way to express this dependence that conda can understand, so that (in conjunction with the __glibc virtual package) the environment resolver will not consider those packages on machines whose glibc version is too old.
The way to do this is to use the Jinja2 function {{ stdlib('c') }}, which mirrors {{ compiler('c') }} as closely as possible. Let's start again with the conda_build_config.yaml:
c_stdlib:
  - sysroot                   # [linux]
  - macosx_deployment_target  # [osx]
c_stdlib_version:
  - 2.17                      # [linux]
  - 10.13                     # [osx]
In the recipe, we would then use:
requirements:
  build:
    - {{ compiler('c') }}
    - {{ stdlib('c') }}
This expresses that building the package requires sysroot ==2.17 (corresponding to glibc) in the build environment on Linux, and macosx_deployment_target ==10.13 on macOS. How this translates into a run-time dependence can be defined in the metadata of the respective conda (meta-)package that represents the standard library (i.e. the packages listed under c_stdlib above).
In this example, sysroot 2.17 would generate a run-export on __glibc >=2.17, and macosx_deployment_target 10.13 would similarly generate __osx >=10.13. This lets packages define their own expectations about the standard library in a unified way, without implicitly depending on a global assumption about what the lower bound on a given platform must be.
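The mapping from a stdlib entry to its run-time constraint can be written as a small lookup. The bash sketch below covers only the two cases named above and is illustrative. In practice, the constraint comes from the run-exports declared by the sysroot and macosx_deployment_target packages themselves. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: the run-time constraint implied by a c_stdlib entry
# and its c_stdlib_version, following the two examples in the text.
stdlib_run_constraint() {
  local stdlib=$1 version=$2
  case "$stdlib" in
    sysroot) printf '__glibc >=%s\n' "$version" ;;
    macosx_deployment_target) printf '__osx >=%s\n' "$version" ;;
    *)
      echo "unknown stdlib: $stdlib" >&2
      return 1
      ;;
  esac
}
```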
In principle, this facility would also make it possible to express the dependence on alternative stdlib implementations (like musl instead of glibc). It could also remove the need to assume that a C++ compiler always has to add a run-export on the C++ stdlib; packages could decide for themselves whether they need {{ stdlib('cxx') }} or not.
Anaconda compilers implicitly add RPATH pointing to the conda environment
You might want to use the Anaconda compilers outside of conda-build, so that you get the same versions, flags, and configuration and therefore maximum compatibility with Anaconda packages. A typical case is producing plain tarballs instead of conda packages. In this case, there is a gotcha.
Even when the Anaconda compilers are used outside of conda-build, their GCC specs are customized so that, when linking an executable or a shared library, an RPATH pointing to lib/ inside the current environment prefix directory ($CONDA_PREFIX/lib) is added. This is done by changing the link_libgcc: section of the GCC specs file, so that LD_LIBRARY_PATH isn't required for basic libraries.
conda-build knows how to make this RPATH relocatable automatically, so that conda changes it at installation time to point to the environment where the package is installed. But if you only pack the binary in a tarball, it will still contain this hardcoded RPATH to an environment on your machine. In this case, we recommend removing the RPATH manually.