`misc`

Miscellaneous utility functions for sharded repodata processing.

This module contains utility functions that don't fit cleanly into other modules:

- URL handling
- Package name parsing
- Data transformation helpers
- Threading utilities

Functions

`_shards_connections`(→ int)	If context.repodata_threads is not set, find the size of the connection pool in a typical https:// session.
`_safe_urljoin_with_slash`(→ str)	Join base_url with relative_url, ensuring proper handling of all URL schemes.
`_is_http_error_most_400_codes`(→ bool)	Determine whether status_code is an HTTP 4xx client error code other than 416.
`ensure_hex_hash`(record)	Convert bytes checksums to hex; leave unchanged if already str.
`spec_to_package_name`(→ str | None)	Given a dependency spec, return the package name, or None if the spec is not parseable.
`filter_redundant_packages`(...)	Given repodata or a single shard, remove any .tar.bz2 packages that have a .conda counterpart.
`combine_batches_until_none`(...)	Combine lists from in_queue until we see None. Yield combined lists.
`exception_to_queue`(func)	Decorator that sends unhandled exceptions to out_queue, the wrapped function's second argument.

Attributes

`_T`
`_URLJOIN_SAFE_SCHEMES`
`SHARDS_CONNECTIONS_DEFAULT`

_T

_URLJOIN_SAFE_SCHEMES

SHARDS_CONNECTIONS_DEFAULT = 10

_shards_connections() → int

If context.repodata_threads is not set, find the size of the connection pool in a typical https:// session. Sizing the thread pool to match should significantly reduce dropped connections. We match requests' default pool size of 10.

Is this shared between all sessions? Or do we get a different pool for a different get_session(url)?

Other adapters (file://, s3://) used in conda would have different concurrency behavior; we are not prepared to have separate threadpools per connection type.
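A minimal sketch of the fallback, assuming the configured context.repodata_threads value is passed in as an argument; the hypothetical shards_connections below takes it as a parameter rather than reading conda's context. The default of 10 corresponds to the pool_maxsize that requests' HTTPAdapter uses when none is given.

```python
from __future__ import annotations

# Mirrors requests' default HTTPAdapter connection pool size.
SHARDS_CONNECTIONS_DEFAULT = 10


def shards_connections(repodata_threads: int | None) -> int:
    """Hypothetical helper: prefer an explicit thread count, else the pool-size default."""
    if repodata_threads is not None:
        # The user configured a thread count explicitly; honor it.
        return repodata_threads
    # Not set: size the worker pool to match the HTTP connection pool, so
    # concurrent requests do not outnumber the pooled connections.
    return SHARDS_CONNECTIONS_DEFAULT
```

The result would typically size a worker pool, for example `concurrent.futures.ThreadPoolExecutor(max_workers=shards_connections(None))`.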

_safe_urljoin_with_slash(base_url: str, relative_url: str = '') → str

Join base_url with relative_url, ensuring proper handling of all URL schemes.

Python's urllib.parse.urljoin only resolves relative URLs for schemes listed in urllib.parse.uses_relative. For unregistered schemes such as s3://, it returns the relative URL unchanged (for example, just ".") instead of the resolved URL. This function falls back to a scheme-swap workaround for those cases.

The result always ends with "/" to enable proper string concatenation with filenames.

See: conda/conda-libmamba-solver#866
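The failure mode and one way to work around it can be sketched as follows. The stand-in scheme (https) and the helper name safe_urljoin_with_slash are assumptions for illustration; the module's actual scheme-swap logic and the contents of _URLJOIN_SAFE_SCHEMES may differ.

```python
from urllib.parse import urljoin, urlsplit, uses_relative

# Plain urljoin gives up on schemes it does not know to be hierarchical:
# urljoin("s3://bucket/channel/noarch/", ".") returns "." unchanged.


def safe_urljoin_with_slash(base_url: str, relative_url: str = "") -> str:
    """Hypothetical sketch: join URLs for any scheme and end the result with '/'."""
    scheme = urlsplit(base_url).scheme
    if scheme in uses_relative or urlsplit(relative_url).scheme:
        # Known scheme, or relative_url is already absolute: urljoin is fine.
        joined = urljoin(base_url, relative_url)
    else:
        # Temporarily swap in a scheme urljoin understands, resolve, then swap back.
        stand_in = urlsplit(base_url)._replace(scheme="https").geturl()
        joined = scheme + urljoin(stand_in, relative_url)[len("https"):]
    # Guarantee a trailing slash so callers can append filenames directly.
    return joined if joined.endswith("/") else joined + "/"
```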

_is_http_error_most_400_codes(status_code: str | int) → bool

Determine whether status_code is an HTTP 4xx client error code other than 416 (Range Not Satisfiable).
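A minimal sketch of the check, assuming numeric strings such as "404" are treated like integers and that non-numeric input is not an error code. The function below is a hypothetical re-implementation, not the module's code.

```python
from __future__ import annotations


def is_http_error_most_400_codes(status_code: str | int) -> bool:
    """Hypothetical sketch: True for 4xx client errors, except 416."""
    try:
        code = int(status_code)
    except (TypeError, ValueError):
        # Assumption: anything that is not a number is not a 4xx error.
        return False
    # 416 is Range Not Satisfiable; it is deliberately excluded.
    return 400 <= code < 500 and code != 416
```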

ensure_hex_hash(record: conda._private.shards.typing.PackageRecordDict)

Convert bytes checksums to hex; leave unchanged if already str.
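A sketch of the conversion, assuming the checksums live under the md5 and sha256 keys, may arrive as raw bytes (as the description implies), and that the record is updated in place and returned. The key names and the in-place behavior are assumptions.

```python
def ensure_hex_hash(record: dict) -> dict:
    """Hypothetical sketch: replace bytes checksums with hex strings, in place."""
    for key in ("md5", "sha256"):  # assumed checksum keys
        value = record.get(key)
        if isinstance(value, bytes):
            record[key] = value.hex()
        # str values (already hex) and missing keys are left untouched.
    return record
```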

spec_to_package_name(spec: str) → str | None

Given a dependency spec, return the package name, or None if the spec is not parseable.

Uses conda's MatchSpec rather than libmambapy to avoid a hard dependency on a solver backend. With @functools.cache the performance is equivalent (benchmarked at ~10ms for 5000 unique specs either way).
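The caching and None-on-failure contract can be illustrated without conda installed. This sketch swaps in a deliberately simplified regular expression for MatchSpec, so it only handles common forms such as "numpy >=1.20", "numpy=1.20" and "conda-forge::numpy"; it is not how MatchSpec parses specs.

```python
from __future__ import annotations

import functools
import re

# Simplified stand-in for MatchSpec: a name is a leading run of letters,
# digits, "_", ".", or "-" characters.
_NAME_RE = re.compile(r"\s*([A-Za-z0-9_][A-Za-z0-9_.-]*)")


@functools.cache  # repeated specs are answered from the cache
def spec_to_package_name(spec: str) -> str | None:
    """Hypothetical sketch: return the package name in spec, or None."""
    # Drop an optional channel prefix such as "conda-forge::".
    spec = spec.rsplit("::", 1)[-1]
    match = _NAME_RE.match(spec)
    return match.group(1) if match else None
```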

filter_redundant_packages(repodata: conda._private.shards.typing.ShardDict, use_only_tar_bz2=False) → conda._private.shards.typing.ShardDict

Given repodata or a single shard, remove any .tar.bz2 packages that have a .conda counterpart.

If use_only_tar_bz2 is False, return a filtered shallow copy; otherwise, return the input unmodified.
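A sketch of the filtering, assuming the usual repodata layout where .tar.bz2 records sit under "packages" and .conda records under "packages.conda", both keyed by filename, and that a "counterpart" means the same filename stem. Only the top-level dict is copied; the records themselves are shared with the input.

```python
def filter_redundant_packages(repodata: dict, use_only_tar_bz2: bool = False) -> dict:
    """Hypothetical sketch: drop .tar.bz2 records that have a .conda twin."""
    if use_only_tar_bz2:
        # .conda files will not be used, so every .tar.bz2 record is needed.
        return repodata
    conda_stems = {
        fn.removesuffix(".conda") for fn in repodata.get("packages.conda", {})
    }
    kept = {
        fn: record
        for fn, record in repodata.get("packages", {}).items()
        if fn.removesuffix(".tar.bz2") not in conda_stems
    }
    # Shallow copy: a new top-level dict, but nested values are shared.
    return {**repodata, "packages": kept}
```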

combine_batches_until_none(in_queue: queue.SimpleQueue[collections.abc.Sequence[_T] | None]) → collections.abc.Iterator[collections.abc.Sequence[_T]]

Combine lists from in_queue until we see None. Yield combined lists.
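One plausible reading of this contract, sketched under the assumption that the generator blocks for the first batch and then drains whatever is already queued without blocking, so consumers receive larger batches when producers are ahead. The drain strategy is an assumption, not confirmed by the description.

```python
from __future__ import annotations

import queue
from collections.abc import Iterator, Sequence
from typing import TypeVar

_T = TypeVar("_T")


def combine_batches_until_none(
    in_queue: queue.SimpleQueue[Sequence[_T] | None],
) -> Iterator[Sequence[_T]]:
    """Hypothetical sketch: yield merged batches until a None sentinel arrives."""
    while True:
        first = in_queue.get()  # block until at least one batch is available
        if first is None:
            return
        combined = list(first)
        saw_none = False
        # Merge every batch that is already waiting, without blocking.
        while True:
            try:
                batch = in_queue.get_nowait()
            except queue.Empty:
                break
            if batch is None:
                saw_none = True
                break
            combined.extend(batch)
        yield combined
        if saw_none:
            return
```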

exception_to_queue(func)

Decorator to send unhandled exceptions to the second argument out_queue.
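A sketch of such a decorator, assuming the wrapped function is a worker called as func(in_queue, out_queue, ...) and that the consumer treats any exception object read from out_queue as a failure signal. The worker name and the choice to catch Exception rather than BaseException are assumptions.

```python
import functools
import queue


def exception_to_queue(func):
    """Hypothetical sketch: forward unhandled exceptions to the out_queue argument."""

    @functools.wraps(func)
    def wrapper(in_queue, out_queue, *args, **kwargs):
        try:
            return func(in_queue, out_queue, *args, **kwargs)
        except Exception as exc:
            # An exception raised in a worker thread would otherwise not reach
            # the consumer; hand it over through out_queue instead.
            out_queue.put(exc)

    return wrapper


@exception_to_queue
def failing_worker(in_queue, out_queue):  # hypothetical worker
    raise ValueError("shard fetch failed")


if __name__ == "__main__":
    out_q = queue.SimpleQueue()
    failing_worker(queue.SimpleQueue(), out_q)
    print(repr(out_q.get_nowait()))  # ValueError('shard fetch failed')
```

On the reading side, a consumer can check each item with `isinstance(item, Exception)` and re-raise it in the main thread.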
