API Reference

The following sections document the public Python API, grouped by module. Docstrings are pulled directly from the source code with Sphinx autodoc.

Core Module 

easydecon.easydecon.add_df_to_spatialdata(sdata, df, bin_size=8)

easydecon.easydecon.assign_clusters_from_df(sdata, df, bin_size=8, results_column='easydecon', method='max', allow_multiple=False, diagnostic=None, fold_change_threshold=2.0, add_to_obs=True)

Assigns cell clusters to spatial spots based on deconvolution results.

This function takes deconvolution results and assigns cell type clusters to each spatial spot using different methods (max, zmax, or hybrid). The results are stored in the spatial data table.

Parameters:

sdata (SpatialData object) – The spatial data object containing the spatial transcriptomics data.
df (pandas.DataFrame) – DataFrame containing deconvolution results with cell type proportions. Rows should correspond to spatial spots, columns to cell types.
bin_size (int, optional) – Size of the spatial bin in micrometers. Default: 8
results_column (str, optional) – Name of the column in table.obs where results will be stored. Default: “easydecon”
method (str, optional) – Method to use for cluster assignment. Default: “max”
- “max”: Assigns the cell type with the highest proportion.
- “zmax”: Uses z-score normalization before finding the maximum.
- “hybrid”: Combines similarity scores and adaptive probabilities.
allow_multiple (bool, optional) – Whether to allow multiple cell type assignments per spot. Default: False
diagnostic (str, optional) – Path to save diagnostic information (if needed). Default: None
fold_change_threshold (float, optional) – Threshold for fold change filtering. Default: 2.0

Returns:

None: The function modifies the input sdata object in place by adding cluster assignments to the table.obs DataFrame under the specified results_column.

Raises:

ValueError: If an invalid method is specified.

Notes:

The function automatically handles missing values and ensures proper indexing.
Results are stored as categorical variables in the table.obs DataFrame.
The function supports three different methods for cluster assignment, each with its own characteristics and use cases.
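
The difference between the “max” and “zmax” methods can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the helper names `assign_max` and `assign_zmax` are hypothetical, and the sketch assumes that the z-score is computed per cell type across spots, which the docstring does not state.

```python
from statistics import mean, pstdev

# Toy deconvolution scores: one entry per spot, one value per cell type.
scores = {
    "spot_1": {"T cell": 0.9, "B cell": 0.5},
    "spot_2": {"T cell": 0.8, "B cell": 0.6},
    "spot_3": {"T cell": 0.7, "B cell": 0.1},
}


def assign_max(scores):
    """Pick the cell type with the highest raw score in each spot."""
    return {spot: max(row, key=row.get) for spot, row in scores.items()}


def assign_zmax(scores):
    """Z-score each cell type across spots, then pick the per-spot maximum.

    Assumption: normalization runs per cell type (column-wise).
    """
    cell_types = list(next(iter(scores.values())))
    stats = {}
    for ct in cell_types:
        values = [row[ct] for row in scores.values()]
        # Fall back to 1.0 when a cell type is constant, to avoid dividing by zero.
        stats[ct] = (mean(values), pstdev(values) or 1.0)
    z = {
        spot: {ct: (row[ct] - stats[ct][0]) / stats[ct][1] for ct in cell_types}
        for spot, row in scores.items()
    }
    return assign_max(z)


raw_assignment = assign_max(scores)
z_assignment = assign_zmax(scores)
```

In this toy data the raw maximum labels every spot as “T cell”, while the z-scored variant labels spot_2 as “B cell”, because its B cell score is high relative to the other spots.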

easydecon.easydecon.common_markers_gene_expression_and_filter(sdata: object, marker_genes, common_group_name: str = 'MarkerGroup', celltype: str = 'group', gene_id_column: str = 'names', exclude_group_names: list[str] = [], bin_size: int = 8, aggregation_method: str = 'sum', add_to_obs: bool = True, filtering_algorithm: str = 'permutation', num_permutations: int = 5000, alpha: float = 0.01, subsample_size: int = 25000, subsample_signal_quantile: float = 0.1, permutation_gene_pool_fraction: float = 0.3, parametric: bool = True, n_subs: int = 5, quantile: float = 0.7, **kwargs) → fireducks.pandas.DataFrame

Extended version allowing marker_genes as a list, dict, or DataFrame, with customizable column names for the DataFrame.

If marker_genes is:

list[str]: Single group of markers -> create one column named common_group_name.
dict[str, list[str]]: Multiple groups -> each dict key becomes a column in table.obs.
pd.DataFrame: Must contain columns for groups and gene names (by default ‘group’ and ‘names’), but these can be overridden by celltype and gene_id_column.
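
The three accepted input shapes can all be reduced to one mapping from group name to gene list. The sketch below shows that idea in plain Python. `normalize_marker_genes` is a hypothetical helper, not part of easydecon, and a list of row dictionaries stands in for the DataFrame case so the example runs without pandas.

```python
def normalize_marker_genes(marker_genes, common_group_name="MarkerGroup",
                           celltype="group", gene_id_column="names"):
    """Return {group name: [gene, ...]} for each accepted input shape."""
    if isinstance(marker_genes, dict):
        # dict[str, list[str]]: already one entry per group.
        return {group: list(genes) for group, genes in marker_genes.items()}
    rows = list(marker_genes)
    if rows and isinstance(rows[0], dict):
        # Stand-in for the DataFrame case: one record per (group, gene) row.
        groups = {}
        for row in rows:
            groups.setdefault(row[celltype], []).append(row[gene_id_column])
        return groups
    # list[str]: a single group named by common_group_name.
    return {common_group_name: rows}


single = normalize_marker_genes(["CD3E", "CD2"])
multiple = normalize_marker_genes({"T cell": ["CD3E"], "B cell": ["MS4A1"]})
from_rows = normalize_marker_genes([
    {"group": "T cell", "names": "CD3E"},
    {"group": "B cell", "names": "MS4A1"},
    {"group": "T cell", "names": "CD2"},
])
```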

Steps (for each group):

Compute aggregator (sum, mean, median, cs) for all bins over that group’s marker genes.
If filtering_algorithm="permutation", subsample bins (subsample_size) and build a null distribution by randomly picking genes of size=len(marker_genes).
If filtering_algorithm="quantile", compute threshold from (1 - quantile).
Apply cutoff to all bins (values below threshold become 0).
Merge results back into table.obs if add_to_obs=True.
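
The permutation branch of the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: it uses a plain sum as the aggregator, an empirical (1 - alpha) quantile rather than a parametric fit, and an explicit `background_genes` list in place of the variable-gene pool. All function names here are hypothetical.

```python
import random


def marker_score(row, genes):
    """Aggregate one bin by summing the counts of the given genes."""
    return sum(row.get(g, 0) for g in genes)


def permutation_threshold(expr, n_markers, background_genes,
                          num_permutations=1000, alpha=0.01, seed=0):
    """Score random gene sets of the marker-set size in random bins."""
    rng = random.Random(seed)
    bins = list(expr)
    null_scores = sorted(
        marker_score(expr[rng.choice(bins)], rng.sample(background_genes, n_markers))
        for _ in range(num_permutations)
    )
    # Empirical (1 - alpha) quantile of the null distribution.
    idx = min(len(null_scores) - 1, int((1 - alpha) * len(null_scores)))
    return null_scores[idx]


def aggregate_and_filter(expr, marker_genes, background_genes, **kwargs):
    """Aggregate marker expression per bin and zero out values below the cutoff."""
    threshold = permutation_threshold(expr, len(marker_genes), background_genes, **kwargs)
    scores = {b: marker_score(row, marker_genes) for b, row in expr.items()}
    return {b: (s if s >= threshold else 0) for b, s in scores.items()}, threshold


# Toy data: expr maps bin id -> {gene: count}.
background = [f"bg{i}" for i in range(10)]
expr = {
    "bin_a": {"CD3E": 10, "CD2": 10, **{g: i % 2 for i, g in enumerate(background)}},
    "bin_b": {g: 1 for g in background},
    "bin_c": {"CD3E": 0, "CD2": 0, **{g: 0 for g in background}},
}
filtered, threshold = aggregate_and_filter(expr, ["CD3E", "CD2"], background)
```

Only bin_a, where the marker genes are strongly expressed, keeps a nonzero value. The other bins are set to 0.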

Parameters:

sdata (object) – Spatial data container or AnnData object. Expected to have sdata.tables[f"square_{bin_size:03}um"].
marker_genes (list, dict, or pd.DataFrame) – Marker genes to use:
- list[str]: Single group of marker genes (assigned to common_group_name).
- dict[str, list[str]]: Mapping of group names to marker gene lists.
- pd.DataFrame: Must have columns for group and gene names (default: ‘group’ and ‘names’), customizable via celltype and gene_id_column.
common_group_name (str, optional) – Column name assigned if marker_genes is a list. Default is “MarkerGroup”.
celltype (str, optional) – Column name in marker_genes DataFrame for group identifier. Default is “group”.
gene_id_column (str, optional) – Column name in marker_genes DataFrame for gene names. Default is “names”.
exclude_group_names (list[str], optional) – Names of groups to exclude bins where those groups are nonzero. Default is empty.
bin_size (int, optional) – Spatial bin size in microns (e.g., 8 means “square_008um” table). Default is 8.
aggregation_method (str, optional) – How to aggregate gene expression across marker genes. One of “sum”, “mean”, “median”, or “cs” (composite score). Default is “sum”.
add_to_obs (bool, optional) – Whether to add the results into the obs of the spatial data table. Default is True.
filtering_algorithm (str, optional) – Method to determine expression cutoff. Options: “permutation” or “quantile”. Default is “permutation”.
num_permutations (int, optional) – Number of permutations for null distribution (used if filtering_algorithm="permutation"). Default is 5000.
alpha (float, optional) – Significance level (1 - alpha quantile) for thresholding using permutation. Default is 0.01.
subsample_size (int, optional) – Number of bins used in permutation subsampling. Default is 25000.
subsample_signal_quantile (float, optional) – Quantile range for selecting moderate-expression bins before permutation. Default is 0.1.
permutation_gene_pool_fraction (float, optional) – Fraction of most variable genes used as the background gene pool for permutation. Default is 0.3.
parametric (bool, optional) – If True, fit a parametric distribution (Gamma or Exponential) to null scores for thresholding. Otherwise use empirical quantile. Default is True.
n_subs (int, optional) – Number of smaller subsets to split the permutation into. Default is 5.
quantile (float, optional) – Quantile cutoff if filtering_algorithm="quantile". Default is 0.7.
**kwargs – Additional keyword arguments (currently unused but reserved for future extensions).

Returns:

The final DataFrame with aggregated + thresholded expression for each group. Columns = one per group, indexed by bin.

Return type:

pd.DataFrame

easydecon.easydecon.composite_score(row)

easydecon.easydecon.function_row_cosine(row, markers_df, **kwargs)

easydecon.easydecon.function_row_diagnostic(row, markers_df, **kwargs)

easydecon.easydecon.function_row_euclidean(row, markers_df, **kwargs)

easydecon.easydecon.function_row_jaccard(row, markers_df, **kwargs)

easydecon.easydecon.function_row_mean(row, markers_df, **kwargs)

easydecon.easydecon.function_row_median(row, markers_df, **kwargs)

easydecon.easydecon.function_row_overlap(row, markers_df, **kwargs)

easydecon.easydecon.function_row_spearman(row, markers_df, **kwargs)

easydecon.easydecon.function_row_sum(row, markers_df, **kwargs)

easydecon.easydecon.function_row_weighted_jaccard(row, markers_df, **kwargs)

easydecon.easydecon.get_clusters_by_similarity_on_tissue(sdata, markers_df, common_group_name=None, bin_size=8, gene_id_column='names', method='wjaccard', add_to_obs=False, **kwargs)

Compute cluster assignments based on a chosen similarity method.

Parameters:

sdata (AnnData-like object) – Spatial (or single-cell) data containing expression matrices. It is expected to have ‘tables’ attribute with keys like “square_00Xum”, or simply be treated as a table if the key doesn’t exist.
markers_df (pd.DataFrame) – DataFrame containing marker genes for each cluster. Rows typically represent clusters, columns represent information about each gene (e.g., logfoldchanges, names, etc.).
common_group_name (str, optional) – Name of a column in table.obs specifying spots to process. If found, only spots where common_group_name != 0 are processed. Otherwise, all spots are processed. Default is None.
bin_size (int, optional) – Determines the bin size (like “square_008um”) for looking up the table in sdata.tables. Default is 8.
gene_id_column (str, optional) – Name of the column in markers_df that contains gene IDs. Default is “names”.
similarity_by_column (str, optional) – Column in markers_df used to measure similarity or weight. Default is “logfoldchanges”.
method (str, optional) – Method to use for computing similarity. Supported methods include: “correlation”, “cosine”, “jaccard”, “overlap”, “wjaccard”, “diagnostic”, “sum”, “mean”, “median”. Default is “wjaccard”.
add_to_obs (bool, optional) – If True, adds the resulting assignment columns to table.obs. Default is False.
**kwargs – Additional, method-specific parameters. For example: - For method=”wjaccard”: supply lambda_param, etc.

Returns:

A DataFrame whose index matches table.obs.index with cluster assignment columns (or other metrics) computed by the specified method.

Return type:

pd.DataFrame
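
To make the default “wjaccard” method more concrete, here is a minimal sketch of the textbook weighted Jaccard similarity (sum of element-wise minima over sum of element-wise maxima) between one bin's expression and each cluster's marker weights. This is an assumption about the general technique, not a copy of easydecon's implementation. In particular, lambda_param is not modelled, and the function names are hypothetical. The formula assumes non-negative values.

```python
def weighted_jaccard(expression, weights):
    """Textbook weighted Jaccard: sum of minima divided by sum of maxima."""
    genes = set(expression) | set(weights)
    num = sum(min(expression.get(g, 0.0), weights.get(g, 0.0)) for g in genes)
    den = sum(max(expression.get(g, 0.0), weights.get(g, 0.0)) for g in genes)
    return num / den if den else 0.0


def best_cluster(expression, markers):
    """Score one bin against every cluster and return the best match."""
    sims = {cluster: weighted_jaccard(expression, w) for cluster, w in markers.items()}
    return max(sims, key=sims.get), sims


# Marker weights per cluster (for example, log fold changes) and one bin's expression.
markers = {
    "T cell": {"CD3E": 2.0, "CD2": 1.0},
    "B cell": {"MS4A1": 2.0, "CD79A": 1.0},
}
bin_expression = {"CD3E": 1.0, "CD2": 1.0, "MS4A1": 0.5}

label, similarities = best_cluster(bin_expression, markers)
```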

easydecon.easydecon.get_proportions_on_tissue(sdata, markers_df, common_group_name=None, bin_size=8, gene_id_column='names', similarity_by_column='logfoldchanges', method='nnls', normalization_method='unit', add_to_obs=True, alpha=0.01, l1_ratio=0.7, verbose=True)

Compute cell-type proportions per spatial bin using NNLS-based deconvolution.

Parameters:

sdata (AnnData-like object) – Spatial data containing expression matrices.
markers_df (pd.DataFrame) – DataFrame containing marker genes with fold-change or scores for each cell type. The index of markers_df should represent cell-type groups.
common_group_name (str, optional) – Column in table.obs to specify spots to process. If None, all spots are processed.
bin_size (int, optional) – Bin size for spatial data lookup, default is 8.
gene_id_column (str, optional) – Column in markers_df with gene identifiers, default is “names”.
similarity_by_column (str, optional) – Column in markers_df containing fold-change values, default is “logfoldchanges”.
normalization_method (str, optional) – Method for normalizing reference matrix, options are “unit”, “l1” or “zscore”, default is “unit”.
add_to_obs (bool, optional) – If True, add proportions to table.obs, default is True.
verbose (bool, optional) – Show progress and info, default is True.

Returns:

Cell-type proportions per spatial bin.

Return type:

pd.DataFrame
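
The idea behind NNLS-based deconvolution can be sketched without any dependencies: find non-negative coefficients x that minimise the squared error between a reference matrix A (genes by cell types) and one bin's expression vector b, then rescale x to sum to 1. The solver below is a simple projected gradient descent written for illustration. It is not the solver easydecon uses, the function names are hypothetical, and rescaling the coefficients into proportions is an assumption.

```python
def nnls_projected_gradient(A, b, n_iter=500):
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    n_rows, n_cols = len(A), len(A[0])
    # Step size 1 / trace(A^T A); the trace bounds the largest eigenvalue of A^T A.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n_cols)]
    return x


def to_proportions(x):
    """Rescale non-negative coefficients so they sum to 1."""
    total = sum(x)
    return [v / total for v in x] if total else list(x)


# Reference matrix: 3 genes (rows) by 2 cell types (columns).
A = [[2.0, 0.0],
     [1.0, 1.0],
     [0.0, 2.0]]
# One bin's expression, built as 3 parts of cell type 0 plus 1 part of cell type 1.
b = [6.0, 4.0, 2.0]

coefficients = nnls_projected_gradient(A, b)
proportions = to_proportions(coefficients)
```

For real data, an established solver such as `scipy.optimize.nnls` would be the more robust choice.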

easydecon.easydecon.min_max_scale(series)

easydecon.easydecon.napari_region_assignment(sdata, key='Shapes', bin_size=8, column='napari', target_coordinate_system='global')

easydecon.easydecon.plot_assigned_clusters_from_dataframe(sdata, dataframe, sample_id, bin_size=8, title='Assigned Clusters', cmap='tab20', legend_fontsize=8, figsize=(5, 5), dpi=200, method='matplotlib', scale=1)

easydecon.easydecon.process_row(row, func, **kwargs)

easydecon.easydecon.process_row_with_suppression(row, func, **kwargs)

Wrapper around process_row to suppress warnings in each worker.

easydecon.easydecon.read_markers_dataframe(sdata, filename=None, adata=None, exclude_celltype=[], bin_size=8, top_n_genes=60, sort_by_column='scores', ascending=False, gene_id_column='names', celltype='group', key='rank_genes_groups', log2fc_min=0.25, pval_cutoff=0.05)

Reads and processes marker genes data for spatial transcriptomics analysis.

This function can read marker genes data either from a file or from an AnnData object, and processes it to create a filtered and sorted DataFrame of marker genes.

Parameters:

sdata (SpatialData object) – The spatial data object containing the spatial transcriptomics data.
filename (str, optional) – Path to the input file containing marker genes data (CSV or Excel format). Required if adata is not provided.
adata (AnnData object, optional) – AnnData object containing the marker genes data. Required if filename is not provided.
exclude_celltype (list, optional) – List of cell types to exclude from the analysis. Default: []
bin_size (int, optional) – Size of the spatial bin in micrometers. Default: 8
top_n_genes (int, optional) – Number of top genes to keep per cell type. Default: 60
sort_by_column (str, optional) – Column name to sort the genes by. Default: “scores”
ascending (bool, optional) – Whether to sort in ascending order. Default: False
gene_id_column (str, optional) – Column name containing gene IDs. Default: “names”
celltype (str, optional) – Column name containing cell type information. Default: “group”
key (str, optional) – Key in adata.uns where marker genes are stored. Default: “rank_genes_groups”
log2fc_min (float, optional) – Minimum log2 fold change threshold for gene selection. Default: 0.25
pval_cutoff (float, optional) – Maximum adjusted p-value threshold for gene selection. Default: 0.05

Returns:

pandas.DataFrame: Processed DataFrame containing filtered and sorted marker genes data. The DataFrame includes columns for cell types, gene IDs, and scores.

Raises:

ValueError: If neither filename nor adata is provided, or if an invalid adata object is provided.

Notes:

The function automatically handles both CSV and Excel file formats.
Genes are filtered based on log2 fold change and adjusted p-value thresholds.
The resulting DataFrame is sorted by the specified column and limited to the top N genes per cell type.
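
The filter, sort, and top-N behaviour described in the notes can be sketched in plain Python over a list of row dictionaries. `filter_markers` is a hypothetical stand-in, not the library function. The column name “pvals_adj” and the inclusive comparisons (>= and <=) are assumptions, since the docstring names neither.

```python
def filter_markers(rows, top_n_genes=60, sort_by_column="scores", ascending=False,
                   celltype="group", log2fc_min=0.25, pval_cutoff=0.05,
                   exclude_celltype=()):
    """Filter marker rows by thresholds, sort them, and keep the top N per cell type."""
    kept = [
        r for r in rows
        if r[celltype] not in exclude_celltype
        and r["logfoldchanges"] >= log2fc_min   # assumed inclusive threshold
        and r["pvals_adj"] <= pval_cutoff       # assumed column name
    ]
    kept.sort(key=lambda r: r[sort_by_column], reverse=not ascending)
    counts, result = {}, []
    for r in kept:
        seen = counts.get(r[celltype], 0)
        if seen < top_n_genes:
            result.append(r)
            counts[r[celltype]] = seen + 1
    return result


rows = [
    {"group": "T cell", "names": "CD3E", "scores": 10.0, "logfoldchanges": 2.0, "pvals_adj": 0.001},
    {"group": "T cell", "names": "CD2", "scores": 8.0, "logfoldchanges": 1.5, "pvals_adj": 0.01},
    {"group": "T cell", "names": "IL7R", "scores": 6.0, "logfoldchanges": 0.1, "pvals_adj": 0.01},
    {"group": "B cell", "names": "MS4A1", "scores": 9.0, "logfoldchanges": 3.0, "pvals_adj": 0.2},
    {"group": "B cell", "names": "CD79A", "scores": 7.0, "logfoldchanges": 2.0, "pvals_adj": 0.001},
]

top_markers = filter_markers(rows, top_n_genes=1)
top_names = [r["names"] for r in top_markers]
```

In this toy table IL7R fails the fold-change threshold and MS4A1 fails the p-value cutoff, so with top_n_genes=1 one marker per cell type remains.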

easydecon.easydecon.sparse_var(matrix, axis=0)

easydecon.easydecon.test_function()

easydecon.easydecon.visualize_only_selected_clusters(sdata, clusters, bin_size=8, results_column='easydecon', temp_column='tmp')

easydecon.extra.easydecon_workflow(sdata, markers_df, marker_genes=None, celltype: str = 'group', gene_id_column: str = 'names', exclude_group_names: list[str] | None = None, bin_size: int = 8, aggregation_method: str = 'sum', filtering_algorithm: str = 'permutation', num_permutations: int = 5000, parametric: bool = True, alpha: float = 0.01, subsample_size: int = 25000, subsample_signal_quantile: float = 0.1, permutation_gene_pool_fraction: float = 0.3, n_subs: int = 5, quantile: float = 0.7, method: str = 'wjaccard', similarity_by_column: str = 'logfoldchanges', lambda_param: float = 0.25, weight_column: str = 'logfoldchanges', proportion_method: str = 'nnls', normalization_method: str = 'unit', regularization_alpha: float = 0.01, l1_ratio: float = 0.7, evidence_to_likelihood: str = 'softmax', softmax_tau: float = 1.0, epsilon: float = 1e-12, results_column: str = 'easydecon', assign_method: str = 'max', allow_multiple: bool = False, diagnostic=None, fold_change_threshold: float = 2.0)

easydecon.extra.get_clusters_expression_on_tissue(sdata, markers_df, common_group_name=None, bin_size=8, gene_id_column='names', aggregation_method='mean', add_to_obs=True)

Configuration 

class easydecon.config.Config

Bases: object

batch_size = 1000

n_jobs = 5

easydecon.config.set_batch_size(n)

easydecon.config.set_n_jobs(n)
