shards

Models for sharded repodata, and to make monolithic repodata look like sharded repodata.

Classes

ShardFetch

Wrapper class that encapsulates fetching and caching of individual shards.

ShardBase

Abstract base class for shard-like objects.

ShardLike

Present a "classic" repodata.json as per-package shards.

Shards

Handle repodata_shards.msgpack.zst and individual per-package shards.

Functions

shard_mentioned_packages([spec_to_package_name])

Return all dependency names mentioned in a shard, not including the shard's own package name.

_shards_base_url(→ str)

Return shards_base_url joined with base_url and url.

_repodata_shards(→ bytes)

Fetch shards index with cache.

fetch_shards_index(→ Shards | None)

Check a SubdirData's URL for shards.

batch_retrieve_from_cache(→ list[ShardFetch])

Given a list of ShardBase objects and a list of package names, fetch all URLs from a shared local cache and return ShardFetch objects for the packages not found there.

batch_retrieve_from_network(wanted)

Fetch all shards in the wanted list from the network.

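fetch_channels(→ dict[str, ShardBase] | None)

Attempt to fetch the sharded index first and then fall back to retrieving a monolithic repodata.json file.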

Attributes

ZSTD_MAX_SHARD_SIZE = 16777216

16 MiB (2**24 bytes).

class ShardFetch(shardbase: ShardBase, package: str, shard_cache: conda._private.shards.cache.ShardCache | None = None)

Wrapper class that encapsulates fetching and caching of individual shards.

Handles deferred fetching: shards can be requested through this class but are only retrieved from the network when fetch() is called. This allows shard retrieval to be batched and coordinated across multiple channels.

Initialize a ShardFetch wrapper.

Parameters:
  • shardbase -- The ShardBase (Shards or ShardLike) instance

  • package -- The package name to fetch

  • shard_cache -- Optional cache to use for storage (required for Shards)
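The following is a minimal sketch of the deferred-fetch pattern described for ShardFetch. It assumes a plain callable can stand in for the network. DeferredShardFetch and fake_fetcher are hypothetical names and are not part of conda. The sketch only shows that creating a request is cheap and that the download happens once, on the first fetch() call.

```python
from typing import Callable, Optional


class DeferredShardFetch:
    """Hypothetical stand-in for ShardFetch: records a request, fetches on demand."""

    def __init__(self, package: str, fetcher: Callable[[str], dict]) -> None:
        self.package = package
        self._fetcher = fetcher  # performs the (fake) network request
        self._shard: Optional[dict] = None
        self._fetched = False

    def fetch(self) -> dict:
        # Only the first call invokes the fetcher; later calls reuse the result.
        if not self._fetched:
            self._shard = self._fetcher(self.package)
            self._fetched = True
        return self._shard


calls: list[str] = []


def fake_fetcher(package: str) -> dict:
    calls.append(package)  # record each simulated download
    return {"packages": {}, "packages.conda": {}}


request = DeferredShardFetch("numpy", fake_fetcher)
# Creating the request downloads nothing; calls is still empty at this point.
request.fetch()
request.fetch()
# calls == ["numpy"]: the shard was fetched exactly once.
```

Keeping creation and fetching separate lets a caller collect many requests first and then hand them to a batch fetcher.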

shardbase
package
url
shard_cache = None
_shard: conda._private.shards.typing.ShardDict | None = None
_fetched = False
fetch() → conda._private.shards.typing.ShardDict

Fetch the shard from the network or return cached result.

For Shards, performs the actual network fetch. For ShardLike, returns the shard immediately since it's in memory.

_fetch_from_shards() → conda._private.shards.typing.ShardDict

Fetch a single shard from a Shards instance.

_fetch_shards_impl(packages: collections.abc.Iterable[str]) → dict[str, conda._private.shards.typing.ShardDict]

Fetch multiple shards for a Shards instance.

Implements the core fetching logic for Shards, handling network requests, caching, and decompression.
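The sketch below shows one plausible way to download shards concurrently with concurrent.futures. It submits one download per URL and then handles each future as it completes, which is roughly the job _process_fetch_result is documented to do. Everything here is an illustrative assumption: the URLs are fake, download() reads from an in-memory dict instead of the network, and zlib/json stand in for the zstd/msgpack encoding of real shards.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical "server" mapping fake URLs to compressed shard bytes.
# zlib and json stand in for the zstd and msgpack encodings of real shards.
FAKE_SERVER = {
    f"https://example.invalid/shards/{name}.msgpack.zst": zlib.compress(
        json.dumps(
            {"packages": {}, "packages.conda": {f"{name}-1.0-0.conda": {"name": name}}}
        ).encode()
    )
    for name in ("numpy", "python")
}


def download(url: str) -> bytes:
    # A real implementation would issue an HTTP request here.
    return FAKE_SERVER[url]


def fetch_many(urls_by_package: dict[str, str], cache: dict[str, dict]) -> dict[str, dict]:
    """Download, decompress and cache several shards concurrently."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            pool.submit(download, url): (package, url)
            for package, url in urls_by_package.items()
        }
        for future in as_completed(futures):
            package, url = futures[future]
            shard = json.loads(zlib.decompress(future.result()))
            cache[url] = shard  # store for later lookups
            results[package] = shard
    return results
```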

_process_fetch_result(future, url, package, results, shards)

Process a single fetched shard result.

static fetch_batch(shard_fetches: collections.abc.Iterable[ShardFetch]) → None

Batch fetch multiple ShardFetch objects, grouping by ShardBase.

Requests are grouped by their ShardBase instance so that shards from multiple sources are fetched with coordinated network calls.
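Grouping by ShardBase can be sketched with a dict keyed on the source object itself. Source, Request and group_by_source are hypothetical names. Setting eq=False on the dataclass keeps Python's default identity-based hashing, so each source object forms its own group regardless of its field values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based hashing: one group per source object
class Source:
    url: str


@dataclass
class Request:
    source: Source
    package: str


def group_by_source(requests: list[Request]) -> dict[Source, list[str]]:
    """Group requested package names by the source object they come from."""
    groups: dict[Source, list[str]] = defaultdict(list)
    for request in requests:
        groups[request.source].append(request.package)
    return dict(groups)


a = Source("https://example.invalid/channel-a/linux-64/")
b = Source("https://example.invalid/channel-b/linux-64/")
groups = group_by_source([Request(a, "numpy"), Request(b, "numpy"), Request(a, "python")])
# groups[a] == ["numpy", "python"], groups[b] == ["numpy"]
```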

shard_mentioned_packages(shard: conda._private.shards.typing.ShardDict, extra: collections.abc.Iterable[str] = (), spec_to_package_name=spec_to_package_name, repodata_version: int = 1) → collections.abc.Iterable[str]

Return all dependency names mentioned in a shard, not including the shard's own package name.
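The following is a simplified sketch of collecting dependency names from a shard. It assumes each record is a dict with a "name" key and an optional "depends" list of match-spec strings. simple_spec_to_name is a hypothetical stand-in for the spec_to_package_name parameter. Its regex only takes the leading name token and does not handle channel prefixes or other MatchSpec forms. The extra and repodata_version parameters are not modelled.

```python
import re
from typing import Callable

# Leading run of characters that cannot start a version constraint or bracket.
_NAME = re.compile(r"\s*([^\s\[<>=!~,]+)")


def simple_spec_to_name(spec: str) -> str:
    """Hypothetical simplification: take the leading name token of a match spec."""
    match = _NAME.match(spec)
    if match is None:
        raise ValueError(f"cannot parse spec: {spec!r}")
    return match.group(1)


def mentioned_packages(
    shard: dict, spec_to_name: Callable[[str], str] = simple_spec_to_name
) -> set[str]:
    own_names: set[str] = set()
    mentioned: set[str] = set()
    for section in ("packages", "packages.conda"):
        for record in shard.get(section, {}).values():
            own_names.add(record["name"])
            for spec in record.get("depends", ()):
                mentioned.add(spec_to_name(spec))
    return mentioned - own_names  # exclude the shard's own package name
```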

class ShardBase

Bases: abc.ABC

Abstract base class for shard-like objects.

Defines the common interface for both sharded repodata (Shards) and monolithic repodata presented as shards (ShardLike).

url: str
repodata_no_packages: conda._private.shards.typing.RepodataDict
visited: dict[str, conda._private.shards.typing.ShardDict | None]
_base_url: str
abstractmethod property package_names: collections.abc.KeysView[str]

Return the names of all packages available in this shard collection.

property base_url: str

Return self.url joined with base_url from repodata, or self.url if no base_url was present. Packages are found here.

Note that base_url can be a relative or an absolute URL. Uses _safe_urljoin_with_slash to handle non-HTTP schemes (s3://, etc.).
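The standard library shows why a special join helper is needed. urllib.parse.urljoin only resolves relative references for schemes listed in urllib.parse.uses_relative, so urljoin("s3://bucket/a/", "shards/") simply returns "shards/". The sketch below is a hypothetical approximation of a slash-preserving join. It works around the limitation by joining under https and then restoring the original scheme. It is not the actual _safe_urljoin_with_slash.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_with_slash(base: str, url: str) -> str:
    """Hypothetical sketch: resolve url against base, treating base as a directory."""
    if urlsplit(url).scheme:
        return url  # already absolute
    if not base.endswith("/"):
        base += "/"  # keep the last path segment of base
    scheme, netloc, path, query, fragment = urlsplit(base)
    # urljoin() only resolves relative references for schemes listed in
    # urllib.parse.uses_relative; s3 is not one of them, so join under "https".
    joined = urljoin(urlunsplit(("https", netloc, path, query, fragment)), url)
    return urlunsplit((scheme,) + tuple(urlsplit(joined))[1:])
```

For example, join_with_slash("s3://bucket/conda-forge/linux-64", "shards/") gives "s3://bucket/conda-forge/linux-64/shards/". An absolute url argument is returned unchanged.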

__contains__(package: str) → bool

Check if a package is available in this shard collection.

abstractmethod shard_url(package: str) → str

Return the shard URL for a given package. For monolithic repodata, the URL should not be fetched; it serves only as a unique identifier.

Raise KeyError if package is not in the index.

abstractmethod shard_loaded(package: str) → bool

Return True if the given package's shard is in memory.

visit_package(package: str) → conda._private.shards.typing.ShardDict

Return a shard that is already loaded in memory and mark as visited.

visit_shard(package: str, shard: conda._private.shards.typing.ShardDict)

Store new shard data in the visited dict.

build_repodata() → conda._private.shards.typing.RepodataDict

Return monolithic repodata including all visited shards.

Does not return "v3" repodata.

Prefer iter_records_v3() over this method.
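Conceptually, building monolithic repodata means copying the shared metadata and merging in the package sections of every visited shard. The following is a minimal sketch. It assumes shards and repodata are plain dicts with "packages" and "packages.conda" sections, and that a None value in visited means the package was looked up but has no shard. build_repodata here is a hypothetical function, not the method itself.

```python
import copy
from typing import Optional


def build_repodata(repodata_no_packages: dict, visited: dict[str, Optional[dict]]) -> dict:
    """Merge visited shards back into a classic repodata dict (hypothetical sketch)."""
    repodata = copy.deepcopy(repodata_no_packages)  # don't mutate the shared metadata
    for section in ("packages", "packages.conda"):
        repodata.setdefault(section, {})
    for shard in visited.values():
        if shard is None:  # assumed to mean "visited, but no shard available"
            continue
        for section in ("packages", "packages.conda"):
            repodata[section].update(shard.get(section, {}))
    return repodata
```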

iter_records() → collections.abc.Iterable[tuple[str, dict]]

Yield (filename, record) tuples for all packages in visited shards.

iter_records_v3() → collections.abc.Iterable[tuple[tuple[str, str], dict]]

Yield ((key, section), record) tuples for all packages in visited shards.

Section can be "packages" for .tar.bz2 packages, "packages.conda" for .conda packages, or "v3.whl", "v3.conda", "v3.tar.bz2" for v3 packages.

For "packages" and "packages.conda", key is the same as the filename; for v3 packages, key differs from the filename.
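For the two classic sections, the ((key, section), record) shape can be sketched as below. This is a hypothetical illustration over plain dicts. v3 sections are deliberately left out because their layout inside a shard is not described here.

```python
from typing import Iterator


def iter_classic_records(visited: dict) -> Iterator[tuple[tuple[str, str], dict]]:
    """Yield ((key, section), record) for the two classic sections only (sketch)."""
    for shard in visited.values():
        if shard is None:
            continue
        for section in ("packages", "packages.conda"):
            for key, record in shard.get(section, {}).items():
                # For these sections the key is simply the package filename.
                yield (key, section), record
```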

class ShardLike(repodata: conda._private.shards.typing.RepodataDict, url: str = '')

Bases: ShardBase

Present a "classic" repodata.json as per-package shards.

url -- must be unique across all ShardLike instances used together.
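The following sketch shows how classic repodata could be presented as per-package shards. It groups records by their "name" field and keeps everything else as shared metadata. split_into_shards and pseudo_shard_url are hypothetical, and the "url#package" identifier format is an assumption. In this sketch, two sources that share a url would produce colliding identifiers, which illustrates why url has to be unique.

```python
from collections import defaultdict

SECTIONS = ("packages", "packages.conda")


def split_into_shards(repodata: dict) -> tuple[dict, dict[str, dict]]:
    """Split classic repodata into shared metadata plus one shard per package name."""
    no_packages = {k: v for k, v in repodata.items() if k not in SECTIONS}
    shards: dict[str, dict] = defaultdict(lambda: {s: {} for s in SECTIONS})
    for section in SECTIONS:
        for filename, record in repodata.get(section, {}).items():
            shards[record["name"]][section][filename] = record
    return no_packages, dict(shards)


def pseudo_shard_url(url: str, shards: dict[str, dict], package: str) -> str:
    """Hypothetical identifier: never fetched, only needs to be unique."""
    if package not in shards:
        raise KeyError(package)
    return f"{url}#{package}"
```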

repodata_no_packages: conda._private.shards.typing.RepodataDict
url = ''
shards: dict[str, conda._private.shards.typing.ShardDict]
visited: dict[str, conda._private.shards.typing.ShardDict | None]
__repr__()
property package_names: collections.abc.KeysView[str]

Return the names of all packages available in this shard collection.

shard_url(package: str) → str

Return shard URL for a given package.

Raise KeyError if package is not in the index.

shard_loaded(package: str) → bool

Return True if the given package's shard is in memory.

visit_package(package: str) → conda._private.shards.typing.ShardDict

Return a shard that is already in memory and mark as visited.

_shards_base_url(url, shards_base_url) → str

Return shards_base_url joined with base_url and url. Note that shards_base_url can be a relative or an absolute URL. Uses _safe_urljoin_with_slash to handle non-HTTP schemes (s3://, etc.).

class Shards(shards_index: conda._private.shards.typing.ShardsIndexDict, url: str)

Bases: ShardBase

Handle repodata_shards.msgpack.zst and individual per-package shards.

Parameters:
  • shards_index -- raw parsed msgpack dict. Don't change it, or base_url and shards_base_url will be wrong.

  • url -- URL of repodata_shards.msgpack.zst

_shards_base_url: str
shards_index
url
_base_url
session
repodata_no_packages
visited: dict[str, conda._private.shards.typing.ShardDict | None]
_shard_url_cache: dict[str, str]
property package_names

Return the names of all packages available in this shard collection.

property packages_index
property shards_base_url: str

Return self.url joined with shards_base_url. Note that shards_base_url can be a relative or an absolute URL.

shard_url(package: str) → str

Return shard URL for a given package.

Raise KeyError if package is not in the index.
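The following sketches how shard URLs might be constructed. It assumes the layout from the conda sharded repodata CEP: the index maps each package name to the raw SHA-256 digest of its shard, and the shard is stored at shards_base_url followed by the hex digest and ".msgpack.zst". ShardIndexSketch is hypothetical. Its _shard_url_cache only mirrors the documented attribute name, and memoizing URLs there is an assumption about how that attribute is used.

```python
import hashlib


class ShardIndexSketch:
    """Hypothetical model of a shards index: package name -> sha256 digest of its shard."""

    def __init__(self, shards: dict[str, bytes], shards_base_url: str) -> None:
        self.shards = shards
        self.shards_base_url = shards_base_url  # assumed to end with "/"
        self._shard_url_cache: dict[str, str] = {}

    def shard_url(self, package: str) -> str:
        # Compute each URL once, then serve it from the memo dict.
        if package not in self._shard_url_cache:
            digest = self.shards[package]  # KeyError if the package is not indexed
            self._shard_url_cache[package] = f"{self.shards_base_url}{digest.hex()}.msgpack.zst"
        return self._shard_url_cache[package]


index = ShardIndexSketch(
    {"numpy": hashlib.sha256(b"example shard bytes").digest()},
    "https://example.invalid/conda-forge/linux-64/shards/",
)
```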

shard_loaded(package: str) → bool

Return True if the given package's shard is in memory.

visit_package(package: str) → conda._private.shards.typing.ShardDict

Return a shard that is already in memory and mark as visited.

_repodata_shards(url, cache: conda.gateways.repodata.RepodataCache) → bytes

Fetch shards index with cache.

Update cache state.

Return shards data, either newly fetched or from cache.

In offline mode, returns cached data even if expired. If no cache exists in offline mode, raises RepodataIsEmpty to signal unavailability.
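The cache policy described above can be sketched as follows: serve a fresh cached copy, refetch when the copy is stale, accept stale data offline, and fail when offline with nothing cached. RepodataIsEmptySketch, CacheEntry, the dict-based cache and the 300-second max_age are all hypothetical. The real function works with conda's RepodataCache.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class RepodataIsEmptySketch(Exception):
    """Hypothetical stand-in for conda's RepodataIsEmpty."""


@dataclass
class CacheEntry:
    data: bytes
    fetched_at: float  # seconds since the epoch


def shards_index_bytes(
    cache: dict[str, CacheEntry],
    url: str,
    download: Callable[[str], bytes],
    *,
    offline: bool,
    max_age: float = 300.0,
    now: Optional[float] = None,
) -> bytes:
    now = time.time() if now is None else now
    entry = cache.get(url)
    if offline:
        if entry is None:
            raise RepodataIsEmptySketch(url)  # nothing cached and no network allowed
        return entry.data  # expired data is still acceptable offline
    if entry is not None and now - entry.fetched_at < max_age:
        return entry.data  # cached copy is still fresh
    data = download(url)
    cache[url] = CacheEntry(data, now)  # update cache state
    return data
```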

fetch_shards_index(sd: conda.core.subdir_data.SubdirData) → Shards | None

Check a SubdirData's URL for shards.

Return a Shards object built from the shards index, read from cache or network. Return None if no shards index is found; the caller should then fetch normal repodata.

TODO: If this function fails to retrieve the sharded repodata index file, it will mark the channel as not supporting this feature in the cache. This can be problematic because transient server errors can lead it to wrongly assume that the channel doesn't support sharding. We need to rethink our logic for determining shard support.

batch_retrieve_from_cache(shardlikes: collections.abc.Sequence[ShardBase], packages: list[str], shard_cache: conda._private.shards.cache.ShardCache) → list[ShardFetch]

Given a list of ShardBase objects and a list of package names, fetch all URLs from a shared local cache and update shardlikes with those per-package shards. Return ShardFetch objects for items not found in the cache (to be fetched from the network).
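The cache-first step amounts to partitioning the requested (channel, package) pairs into cache hits and misses. The following is a minimal sketch. It assumes a hypothetical indexes mapping from channel URL to {package: shard URL} and uses a dict as the shared cache. The misses correspond to the items that would become ShardFetch objects for the network step.

```python
from typing import Optional


def split_cache_hits(
    indexes: dict[str, dict[str, str]],
    packages: list[str],
    cache: dict[str, dict],
) -> tuple[dict[tuple[str, str], dict], list[tuple[str, str, str]]]:
    """Partition (channel, package) pairs into cache hits and network work.

    indexes is a hypothetical mapping: channel URL -> {package name: shard URL}.
    """
    hits: dict[tuple[str, str], dict] = {}
    misses: list[tuple[str, str, str]] = []
    for channel, index in indexes.items():
        for package in packages:
            url = index.get(package)
            if url is None:
                continue  # this channel does not provide the package
            shard: Optional[dict] = cache.get(url)
            if shard is not None:
                hits[(channel, package)] = shard
            else:
                misses.append((channel, package, url))
    return hits, misses
```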

batch_retrieve_from_network(wanted: list[ShardFetch])

Fetch all shards in the wanted list from the network.

Coordinate batch fetching across multiple ShardBase instances.

fetch_channels(url_to_channel: dict[str, conda.models.channel.Channel]) → dict[str, ShardBase] | None

Attempt to fetch the sharded index first and then fall back to retrieving a monolithic repodata.json file.

Parameters:

url_to_channel -- not modified; must already be expanded to subdirs.

Returns:

A dict mapping channel URLs to Shards or ShardLike objects, or None if no channels have shards. The dict preserves the key order of the input url_to_channel.
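The fallback and ordering rules can be sketched with plain callables. fetch_all_channels, try_shards and load_monolithic are hypothetical names, and the ("sharded", ...) and ("monolithic", ...) tuples stand in for Shards and ShardLike objects. The output keeps the caller's key order because the loop iterates over the input list, not over whichever results arrive first.

```python
from typing import Callable, Optional


def fetch_all_channels(
    urls: list[str],
    try_shards: Callable[[str], Optional[dict]],
    load_monolithic: Callable[[str], dict],
) -> Optional[dict[str, tuple[str, dict]]]:
    """Prefer sharded indexes; fall back to monolithic repodata per channel (sketch)."""
    found = {url: try_shards(url) for url in urls}
    if all(index is None for index in found.values()):
        return None  # no channel is sharded: let the caller use the classic path
    result: dict[str, tuple[str, dict]] = {}
    for url in urls:  # iterate the input so the output keeps its key order
        index = found[url]
        if index is not None:
            result[url] = ("sharded", index)
        else:
            result[url] = ("monolithic", load_monolithic(url))
    return result
```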