misc#

Miscellaneous utility functions for sharded repodata processing.

This module contains utility functions that don't fit cleanly into other modules:

- URL handling
- Package name parsing
- Data transformation helpers
- Threading utilities

Functions#

_shards_connections(→ int)

If context.repodata_threads is not set, find the size of the connection pool in a typical https:// session.

_safe_urljoin_with_slash(→ str)

Join base_url with relative_url, ensuring proper handling of all URL schemes.

_is_http_error_most_400_codes(→ bool)

Determine whether status_code is an HTTP 4xx client error code other than 416 (Range Not Satisfiable).

ensure_hex_hash(record)

Convert bytes checksums to hex; leave unchanged if already str.

spec_to_package_name(→ str | None)

Given a dependency spec, return the package name, or None if the spec is not parseable.

filter_redundant_packages(...)

Given repodata or a single shard, remove any .tar.bz2 packages that have a .conda counterpart.

combine_batches_until_none(...)

Combine lists from in_queue until we see None. Yield combined lists.

exception_to_queue(func)

Decorator to send unhandled exceptions to the second argument out_queue.

Attributes#

_T#

_URLJOIN_SAFE_SCHEMES#

SHARDS_CONNECTIONS_DEFAULT = 10#

_shards_connections() → int#

If context.repodata_threads is not set, find the size of the connection pool in a typical https:// session. Sizing the worker count to the pool should significantly reduce dropped connections. The default matches requests' default pool size of 10.

Is this shared between all sessions? Or do we get a different pool for a different get_session(url)?

Other adapters (file://, s3://) used in conda would have different concurrency behavior; we are not prepared to have separate threadpools per connection type.
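As a rough illustration of the fallback described above, the sketch below picks a worker count from an explicit setting when one is given and otherwise falls back to a pool size. The function name `pick_connections` and both of its parameters are hypothetical; this is not the module's implementation, and how the real function discovers the pool size is not shown in this page.

```python
from typing import Optional

# Mirrors the documented attribute value; requests' adapters also default to 10.
SHARDS_CONNECTIONS_DEFAULT = 10


def pick_connections(
    repodata_threads: Optional[int],
    pool_maxsize: Optional[int] = None,
) -> int:
    """Hypothetical helper: choose how many concurrent fetches to run.

    Assumption: a value of None or 0 means "not set".
    """
    if repodata_threads:
        # An explicit user setting always wins.
        return repodata_threads
    # Otherwise use the session's pool size when known, else the default.
    return pool_maxsize or SHARDS_CONNECTIONS_DEFAULT
```

A single number is returned regardless of URL scheme, which is in line with the note that separate thread pools per connection type are not supported.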

_safe_urljoin_with_slash(base_url: str, relative_url: str = '') → str#

Join base_url with relative_url, ensuring proper handling of all URL schemes.

Python's urllib.parse.urljoin only resolves relative URLs for schemes registered in urllib.parse.uses_relative. For unregistered schemes like s3://, it returns the relative URL unchanged (for example, just ".") instead of the resolved URL. This function falls back to a scheme-swap workaround for those cases.

The result always ends with "/" to enable proper string concatenation with filenames.

See: conda/conda-libmamba-solver#866
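The scheme-swap idea can be sketched as follows: when the base URL's scheme is not in `urllib.parse.uses_relative`, temporarily substitute `http`, let `urljoin` resolve the path, then restore the original scheme. The name `join_with_slash` is hypothetical, and this is a minimal sketch of the technique under those assumptions rather than the module's actual code.

```python
from urllib.parse import urljoin, uses_relative


def join_with_slash(base_url: str, relative_url: str = "") -> str:
    """Hypothetical sketch of a scheme-swap join that always ends in "/"."""
    scheme, sep, rest = base_url.partition("://")
    if sep and scheme.lower() not in uses_relative:
        # urljoin would return relative_url as-is for this scheme, so
        # resolve against an http:// stand-in and swap the scheme back.
        joined = urljoin("http://" + rest, relative_url)
        joined = scheme + joined[len("http"):]
    else:
        joined = urljoin(base_url, relative_url)
    # A trailing slash lets callers append filenames by concatenation.
    return joined if joined.endswith("/") else joined + "/"
```

For example, joining `"s3://bucket/channel/linux-64/repodata_shards.msgpack.zst"` with `"."` yields the containing directory, which a filename can then be appended to directly.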

_is_http_error_most_400_codes(status_code: str | int) → bool#

Determine whether status_code is an HTTP 4xx client error code other than 416 (Range Not Satisfiable).
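A check of this kind might look like the sketch below. It assumes that "most 400 codes" means the whole 4xx range, and that a status code arriving as a string should be coerced to an integer; the function name `is_client_error_except_416` is hypothetical.

```python
from typing import Union


def is_client_error_except_416(status_code: Union[str, int]) -> bool:
    """Hypothetical sketch: True for 4xx codes other than 416."""
    try:
        code = int(status_code)
    except (TypeError, ValueError):
        # Anything that is not a number cannot be a 4xx status code.
        return False
    return 400 <= code < 500 and code != 416
```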

ensure_hex_hash(record: conda._private.shards.typing.PackageRecordDict)#

Convert bytes checksums to hex; leave unchanged if already str.
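The conversion can be illustrated with a small sketch. It assumes the checksums live under the `sha256` and `md5` keys and that the record is updated in place and returned; both points are assumptions, as is the name `hex_checksums`.

```python
def hex_checksums(record: dict) -> dict:
    """Hypothetical sketch: turn bytes checksums into hex strings in place."""
    for key in ("sha256", "md5"):  # assumed checksum field names
        value = record.get(key)
        if isinstance(value, bytes):
            record[key] = value.hex()
        # str values (already hex) and missing keys are left untouched.
    return record
```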

spec_to_package_name(spec: str) → str | None#

Given a dependency spec, return the package name, or None if the spec is not parseable.

Uses conda's MatchSpec rather than libmambapy to avoid a hard dependency on a solver backend. With @functools.cache the performance is equivalent (benchmarked at ~10ms for 5000 unique specs either way).
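The caching pattern mentioned above can be sketched without conda installed. The parser here is a deliberately simplified stand-in (a regular expression that takes the leading name token), not MatchSpec, and it ignores channel prefixes and other spec syntax; the name `cached_package_name` is hypothetical.

```python
import functools
import re
from typing import Optional

# Simplified stand-in for a real spec parser: leading name characters only.
_NAME = re.compile(r"[A-Za-z0-9_.\-]+")


@functools.cache
def cached_package_name(spec: str) -> Optional[str]:
    """Hypothetical sketch: return the leading package name, or None."""
    match = _NAME.match(spec.strip())
    return match.group(0) if match else None
```

Because dependency specs repeat heavily across records, `functools.cache` means each distinct spec string is parsed once; `cached_package_name.cache_info()` reports the hit count.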

filter_redundant_packages(repodata: conda._private.shards.typing.ShardDict, use_only_tar_bz2=False) → conda._private.shards.typing.ShardDict#

Given repodata or a single shard, remove any .tar.bz2 packages that have a .conda counterpart.

Returns a shallow copy when use_only_tar_bz2 is False; otherwise returns the input unmodified.
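A minimal sketch of this filtering follows, assuming the usual repodata layout with a `packages` mapping for .tar.bz2 files and a `packages.conda` mapping for .conda files. The name `drop_redundant_tar_bz2` is hypothetical and the real function may differ in detail.

```python
def drop_redundant_tar_bz2(repodata: dict, use_only_tar_bz2: bool = False) -> dict:
    """Hypothetical sketch: drop .tar.bz2 entries that also exist as .conda."""
    if use_only_tar_bz2:
        # Caller wants the .tar.bz2 packages, so nothing is redundant.
        return repodata
    conda_stems = {
        name.removesuffix(".conda") for name in repodata.get("packages.conda", {})
    }
    kept = {
        name: record
        for name, record in repodata.get("packages", {}).items()
        if name.removesuffix(".tar.bz2") not in conda_stems
    }
    # Shallow copy: only the "packages" mapping is replaced.
    return {**repodata, "packages": kept}
```

Only the top-level dictionary and the `packages` mapping are new objects; the individual records are shared with the input.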

combine_batches_until_none(in_queue: queue.SimpleQueue[collections.abc.Sequence[_T] | None]) → collections.abc.Iterator[collections.abc.Sequence[_T]]#

Combine lists from in_queue until we see None. Yield combined lists.
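One way such a generator could work is sketched below: block for the first batch, drain whatever else is already queued without blocking, yield the merged list, and stop once the None sentinel is seen. The draining strategy and the name `combine_until_none` are assumptions for illustration.

```python
import queue
from collections.abc import Iterator


def combine_until_none(in_queue: queue.SimpleQueue) -> Iterator[list]:
    """Hypothetical sketch: merge queued batches until a None sentinel."""
    running = True
    while running:
        batch = in_queue.get()  # block until a producer sends something
        if batch is None:
            return
        combined = list(batch)
        while True:
            try:
                more = in_queue.get_nowait()  # take only what is ready now
            except queue.Empty:
                break
            if more is None:
                running = False  # sentinel: finish after this yield
                break
            combined.extend(more)
        yield combined
```

Merging everything that is already waiting lets a slower consumer process fewer, larger batches instead of many small ones.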

exception_to_queue(func)#

Decorator to send unhandled exceptions to the second argument out_queue.
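A decorator of this shape might be written as in the sketch below. It assumes the wrapped function takes `(in_queue, out_queue, ...)` and that the exception is put on `out_queue` and not re-raised; whether the real decorator re-raises is not stated here. The name `forward_exceptions` is hypothetical.

```python
import functools
import queue


def forward_exceptions(func):
    """Hypothetical sketch: report a worker's unhandled exception via out_queue."""

    @functools.wraps(func)
    def wrapper(in_queue, out_queue, *args, **kwargs):
        try:
            return func(in_queue, out_queue, *args, **kwargs)
        except Exception as exc:
            # Hand the failure to the consumer so it is not lost in a thread.
            out_queue.put(exc)

    return wrapper
```

The consumer reading `out_queue` can then check each item with `isinstance(item, Exception)` and raise it in the main thread.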