API Reference
The following sections document the public Python API. Objects are grouped by
module for clarity. Docstrings are pulled directly from the source code with
Sphinx autodoc.
Core Module
- easydecon.easydecon.assign_clusters_from_df(sdata, df, bin_size=8, results_column='easydecon', method='max', allow_multiple=False, diagnostic=None, fold_change_threshold=2.0, add_to_obs=True)
Assigns cell clusters to spatial spots based on deconvolution results.
This function takes deconvolution results and assigns cell type clusters to each spatial spot using different methods (max, zmax, or hybrid). The results are stored in the spatial data table.
Parameters:
- sdata (SpatialData object)
The spatial data object containing the spatial transcriptomics data.
- df (pandas.DataFrame)
DataFrame containing deconvolution results with cell type proportions. Rows should correspond to spatial spots, columns to cell types.
- bin_size (int, optional)
Size of the spatial bin in micrometers. Default: 8
- results_column (str, optional)
Name of the column in table.obs where results will be stored. Default: “easydecon”
- method (str, optional)
Method to use for cluster assignment. Default: “max”
“max”: Assigns the cell type with the highest proportion.
“zmax”: Uses z-score normalization before finding the maximum.
“hybrid”: Combines similarity scores and adaptive probabilities.
- allow_multiple (bool, optional)
Whether to allow multiple cell type assignments per spot. Default: False
- diagnostic (str, optional)
Path to save diagnostic information (if needed). Default: None
- fold_change_threshold (float, optional)
Threshold for fold change filtering. Default: 2.0
Returns:
- None
The function modifies the input sdata object in place by adding cluster assignments to the table.obs DataFrame under the specified results_column.
Raises:
- ValueError
If an invalid method is specified.
Notes:
The function automatically handles missing values and ensures proper indexing.
Results are stored as categorical variables in the table.obs DataFrame.
The function supports three different methods for cluster assignment, each with its own characteristics and use cases.
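The difference between the “max” and “zmax” methods can be illustrated with a small standard-library sketch. It is not the library's implementation: the helper names are hypothetical, and z-scoring each cell type across spots is an assumption about the axis used.

```python
from statistics import mean, pstdev


def assign_max(table):
    """Return, for each spot, the cell type with the largest value."""
    return {spot: max(row, key=row.get) for spot, row in table.items()}


def zscore_columns(table):
    """Z-score every cell type across spots (assumed normalization axis)."""
    cell_types = list(next(iter(table.values())))
    stats = {}
    for ct in cell_types:
        values = [row[ct] for row in table.values()]
        stats[ct] = (mean(values), pstdev(values))
    scaled = {}
    for spot, row in table.items():
        scaled[spot] = {
            ct: (row[ct] - stats[ct][0]) / stats[ct][1] if stats[ct][1] > 0 else 0.0
            for ct in cell_types
        }
    return scaled


# Toy deconvolution result: rows are spots, columns are cell types.
proportions = {
    "spot_1": {"T cell": 0.6, "B cell": 0.3, "Myeloid": 0.1},
    "spot_2": {"T cell": 0.5, "B cell": 0.1, "Myeloid": 0.4},
    "spot_3": {"T cell": 0.7, "B cell": 0.2, "Myeloid": 0.1},
}

max_labels = assign_max(proportions)
zmax_labels = assign_max(zscore_columns(proportions))
print(max_labels)   # every spot is labelled "T cell"
print(zmax_labels)  # spot_1 -> "B cell", spot_2 -> "Myeloid", spot_3 -> "T cell"
```

In this toy table the plain maximum always picks the globally abundant type, while the z-scored variant picks the type that is unusually high for that spot.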
- easydecon.easydecon.common_markers_gene_expression_and_filter(sdata: object, marker_genes, common_group_name: str = 'MarkerGroup', celltype: str = 'group', gene_id_column: str = 'names', exclude_group_names: list[str] = [], bin_size: int = 8, aggregation_method: str = 'sum', add_to_obs: bool = True, filtering_algorithm: str = 'permutation', num_permutations: int = 5000, alpha: float = 0.01, subsample_size: int = 25000, subsample_signal_quantile: float = 0.1, permutation_gene_pool_fraction: float = 0.3, parametric: bool = True, n_subs: int = 5, quantile: float = 0.7, **kwargs) -> fireducks.pandas.DataFrame
Aggregates marker gene expression per bin and filters out bins below a cutoff. Extended version allowing marker_genes as a list, dict, or DataFrame, with customizable column names for the DataFrame.
- If marker_genes is:
list[str]: Single group of markers -> create one column named common_group_name.
dict[str, list[str]]: Multiple groups -> each dict key becomes a column in table.obs.
pd.DataFrame: Must contain columns for groups and gene names (by default ‘group’ and ‘names’), but these can be overridden by celltype and gene_id_column.
- Steps (for each group):
Compute the aggregate (sum, mean, median, or cs) for all bins over that group’s marker genes.
If filtering_algorithm="permutation", subsample bins (subsample_size) and build a null distribution by randomly picking gene sets of size len(marker_genes).
If filtering_algorithm="quantile", compute the threshold from (1 - quantile).
Apply cutoff to all bins (values below threshold become 0).
Merge results back into table.obs if add_to_obs=True.
- Parameters:
sdata (object) – Spatial data container or AnnData object. Expected to have sdata.tables[f"square_{bin_size:03}um"].
marker_genes (list, dict, or pd.DataFrame) – Marker genes to use:
list[str]: Single group of marker genes (assigned to common_group_name).
dict[str, list[str]]: Mapping of group names to marker gene lists.
pd.DataFrame: Must have columns for group and gene names (default: ‘group’ and ‘names’), customizable via celltype and gene_id_column.
common_group_name (str, optional) – Column name assigned if marker_genes is a list. Default is “MarkerGroup”.
celltype (str, optional) – Column name in marker_genes DataFrame for group identifier. Default is “group”.
gene_id_column (str, optional) – Column name in marker_genes DataFrame for gene names. Default is “names”.
exclude_group_names (list[str], optional) – Names of groups to exclude; bins where any of these groups is nonzero are excluded. Default is an empty list.
bin_size (int, optional) – Spatial bin size in microns (e.g., 8 means “square_008um” table). Default is 8.
aggregation_method (str, optional) – How to aggregate gene expression across marker genes. One of “sum”, “mean”, “median”, or “cs” (composite score). Default is “sum”.
add_to_obs (bool, optional) – Whether to add the results into the obs of the spatial data table. Default is True.
filtering_algorithm (str, optional) – Method to determine expression cutoff. Options: “permutation” or “quantile”. Default is “permutation”.
num_permutations (int, optional) – Number of permutations for the null distribution (used if filtering_algorithm="permutation"). Default is 5000.
alpha (float, optional) – Significance level for permutation thresholding; the cutoff is taken at the (1 - alpha) quantile. Default is 0.01.
subsample_size (int, optional) – Number of bins used in permutation subsampling. Default is 25000.
subsample_signal_quantile (float, optional) – Quantile range for selecting moderate-expression bins before permutation. Default is 0.1.
permutation_gene_pool_fraction (float, optional) – Fraction of most variable genes used as the background gene pool for permutation. Default is 0.3.
parametric (bool, optional) – If True, fit a parametric distribution (Gamma or Exponential) to null scores for thresholding. Otherwise use empirical quantile. Default is True.
n_subs (int, optional) – Number of smaller subsets to split the permutation into. Default is 5.
quantile (float, optional) – Quantile cutoff if filtering_algorithm="quantile". Default is 0.7.
**kwargs – Additional keyword arguments (currently unused but reserved for future extensions).
- Returns:
The final DataFrame of aggregated, thresholded expression, with one column per group, indexed by bin.
- Return type:
pd.DataFrame
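The aggregate-then-threshold steps described for this function can be sketched with the standard library alone. This is a simplified illustration, not the library's code: all helper names are hypothetical, only the "sum" aggregation is shown, the bin subsampling and variable-gene pool are skipped, the null cutoff is empirical rather than parametric, and the quantile level is passed in directly instead of being derived from the quantile parameter.

```python
import random


def aggregate_sum(expression, genes):
    """Sum the expression of `genes` in every bin (missing genes count as 0)."""
    return {b: sum(row.get(g, 0.0) for g in genes) for b, row in expression.items()}


def quantile_cutoff(scores, q):
    """Nearest-rank empirical quantile of the scores, with q in [0, 1]."""
    ordered = sorted(scores)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def permutation_cutoff(expression, gene_pool, n_genes, alpha, num_permutations, seed=0):
    """Cutoff from a null built with random gene sets of the marker-set size."""
    rng = random.Random(seed)
    null_scores = []
    for _ in range(num_permutations):
        random_genes = rng.sample(gene_pool, n_genes)
        null_scores.extend(aggregate_sum(expression, random_genes).values())
    return quantile_cutoff(null_scores, 1.0 - alpha)


def apply_cutoff(scores, cutoff):
    """Set every score below the cutoff to 0."""
    return {b: (s if s >= cutoff else 0.0) for b, s in scores.items()}


# Toy expression table: bins -> gene counts (made-up values).
expression = {
    "bin_1": {"A": 5, "B": 4, "C": 0, "D": 1},
    "bin_2": {"A": 0, "B": 1, "C": 2, "D": 0},
    "bin_3": {"A": 3, "B": 3, "C": 1, "D": 1},
    "bin_4": {"A": 0, "B": 0, "C": 3, "D": 2},
    "bin_5": {"A": 1, "B": 1, "C": 0, "D": 0},
}
markers = ["A", "B"]

scores = aggregate_sum(expression, markers)
cutoff = quantile_cutoff(scores.values(), q=0.5)
filtered = apply_cutoff(scores, cutoff)

null_cutoff = permutation_cutoff(
    expression, gene_pool=["A", "B", "C", "D"], n_genes=len(markers),
    alpha=0.05, num_permutations=200,
)
print(filtered)
print(null_cutoff)
```

Bins whose aggregated score falls below the chosen cutoff are zeroed, which mirrors the "values below threshold become 0" step in the description.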
- easydecon.easydecon.get_clusters_by_similarity_on_tissue(sdata, markers_df, common_group_name=None, bin_size=8, gene_id_column='names', method='wjaccard', add_to_obs=False, **kwargs)
Compute cluster assignments based on a chosen similarity method.
- Parameters:
sdata (AnnData-like object) – Spatial (or single-cell) data containing expression matrices. It is expected to have ‘tables’ attribute with keys like “square_00Xum”, or simply be treated as a table if the key doesn’t exist.
markers_df (pd.DataFrame) – DataFrame containing marker genes for each cluster. Rows typically represent clusters, columns represent information about each gene (e.g., logfoldchanges, names, etc.).
common_group_name (str, optional) – Name of a column in table.obs specifying spots to process. If found, only spots where common_group_name != 0 are processed. Otherwise, all spots are processed. Default is None.
bin_size (int, optional) – Determines the bin size (like “square_008um”) for looking up the table in sdata.tables. Default is 8.
gene_id_column (str, optional) – Name of the column in markers_df that contains gene IDs. Default is “names”.
similarity_by_column (str, optional) – Column in markers_df used to measure similarity or weight. Default is “logfoldchanges”.
method (str, optional) – Method to use for computing similarity. Supported methods include: “correlation”, “cosine”, “jaccard”, “overlap”, “wjaccard”, “diagnostic”, “sum”, “mean”, “median”. Default is “wjaccard”.
add_to_obs (bool, optional) – If True, adds the resulting assignment columns to table.obs. Default is False.
**kwargs – Additional, method-specific parameters. For example, for method="wjaccard", supply lambda_param, etc.
- Returns:
A DataFrame whose index matches table.obs.index with cluster assignment columns (or other metrics) computed by the specified method.
- Return type:
pd.DataFrame
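As background for the default “wjaccard” method, the textbook weighted Jaccard (Ruzicka) similarity is the sum of element-wise minima divided by the sum of element-wise maxima. The sketch below shows only that textbook formula on made-up values; how the library weights genes or applies lambda_param is not reproduced here, and the function name is hypothetical.

```python
def weighted_jaccard(x, y):
    """Weighted Jaccard similarity of two non-negative gene -> weight maps."""
    genes = set(x) | set(y)
    numerator = sum(min(x.get(g, 0.0), y.get(g, 0.0)) for g in genes)
    denominator = sum(max(x.get(g, 0.0), y.get(g, 0.0)) for g in genes)
    return numerator / denominator if denominator > 0 else 0.0


# Expression of one bin and marker weights per cluster (illustrative values).
bin_expression = {"CD3E": 2.0, "CD19": 0.0, "LYZ": 1.0}
cluster_markers = {
    "T cell": {"CD3E": 2.0, "CD2": 1.0},
    "B cell": {"CD19": 3.0, "MS4A1": 1.0},
}

scores = {c: weighted_jaccard(bin_expression, m) for c, m in cluster_markers.items()}
best = max(scores, key=scores.get)
print(scores)  # {'T cell': 0.5, 'B cell': 0.0}
print(best)    # T cell
```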
- easydecon.easydecon.get_proportions_on_tissue(sdata, markers_df, common_group_name=None, bin_size=8, gene_id_column='names', similarity_by_column='logfoldchanges', method='nnls', normalization_method='unit', add_to_obs=True, alpha=0.01, l1_ratio=0.7, verbose=True)
Compute cell-type proportions per spatial bin using NNLS-based deconvolution.
- Parameters:
sdata (AnnData-like object) – Spatial data containing expression matrices.
markers_df (pd.DataFrame) – DataFrame containing marker genes with fold-change or scores for each cell type. The index of markers_df should represent cell-type groups.
common_group_name (str, optional) – Column in table.obs to specify spots to process. If None, all spots are processed.
bin_size (int, optional) – Bin size for spatial data lookup, default is 8.
gene_id_column (str, optional) – Column in markers_df with gene identifiers, default is “names”.
similarity_by_column (str, optional) – Column in markers_df containing fold-change values, default is “logfoldchanges”.
normalization_method (str, optional) – Method for normalizing reference matrix, options are “unit”, “l1” or “zscore”, default is “unit”.
add_to_obs (bool, optional) – If True, add proportions to table.obs, default is True.
verbose (bool, optional) – Show progress and info, default is True.
- Returns:
Cell-type proportions per spatial bin.
- Return type:
pd.DataFrame
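To make the idea of NNLS-based deconvolution concrete, the following sketch solves a tiny non-negative least squares problem with projected gradient descent in plain Python. It is an illustration under stated assumptions, not the library's solver: the function name, the reference matrix, and the final rescaling of the coefficients to sum to 1 are all hypothetical choices made for this example.

```python
def nnls_projected_gradient(A, b, iterations=2000):
    """Minimise ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    n_rows, n_cols = len(A), len(A[0])
    # 1 / ||A||_F^2 is a safe step size: the squared Frobenius norm is an
    # upper bound on the largest eigenvalue of A^T A.
    step = 1.0 / sum(v * v for row in A for v in row)
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [
            sum(A[i][j] * x[j] for j in range(n_cols)) - b[i] for i in range(n_rows)
        ]
        gradient = [
            sum(A[i][j] * residual[i] for i in range(n_rows)) for j in range(n_cols)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n_cols)]
    return x


# Reference matrix: rows are genes, columns are cell types (made-up weights).
reference = [
    [2.0, 0.0],
    [0.0, 3.0],
    [1.0, 1.0],
]
# Observed bin built as 25% of the first type plus 75% of the second.
observed = [0.5, 2.25, 1.0]

coefficients = nnls_projected_gradient(reference, observed)
total = sum(coefficients)
proportions = [c / total for c in coefficients]
print([round(p, 4) for p in proportions])  # [0.25, 0.75]
```

A dedicated solver is preferable for real data; the loop above is only meant to show the non-negativity constraint at work.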
- easydecon.easydecon.napari_region_assignment(sdata, key='Shapes', bin_size=8, column='napari', target_coordinate_system='global')
- easydecon.easydecon.plot_assigned_clusters_from_dataframe(sdata, dataframe, sample_id, bin_size=8, title='Assigned Clusters', cmap='tab20', legend_fontsize=8, figsize=(5, 5), dpi=200, method='matplotlib', scale=1)
- easydecon.easydecon.process_row_with_suppression(row, func, **kwargs)
Wrapper around process_row to suppress warnings in each worker.
- easydecon.easydecon.read_markers_dataframe(sdata, filename=None, adata=None, exclude_celltype=[], bin_size=8, top_n_genes=60, sort_by_column='scores', ascending=False, gene_id_column='names', celltype='group', key='rank_genes_groups', log2fc_min=0.25, pval_cutoff=0.05)
Reads and processes marker genes data for spatial transcriptomics analysis.
This function can read marker genes data either from a file or from an AnnData object, and processes it to create a filtered and sorted DataFrame of marker genes.
Parameters:
- sdata (SpatialData object)
The spatial data object containing the spatial transcriptomics data.
- filename (str, optional)
Path to the input file containing marker genes data (CSV or Excel format). Required if adata is not provided.
- adata (AnnData object, optional)
AnnData object containing the marker genes data. Required if filename is not provided.
- exclude_celltype (list, optional)
List of cell types to exclude from the analysis. Default: []
- bin_size (int, optional)
Size of the spatial bin in micrometers. Default: 8
- top_n_genes (int, optional)
Number of top genes to keep per cell type. Default: 60
- sort_by_column (str, optional)
Column name to sort the genes by. Default: “scores”
- ascending (bool, optional)
Whether to sort in ascending order. Default: False
- gene_id_column (str, optional)
Column name containing gene IDs. Default: “names”
- celltype (str, optional)
Column name containing cell type information. Default: “group”
- key (str, optional)
Key in adata.uns where marker genes are stored. Default: “rank_genes_groups”
- log2fc_min (float, optional)
Minimum log2 fold change threshold for gene selection. Default: 0.25
- pval_cutoff (float, optional)
Maximum adjusted p-value threshold for gene selection. Default: 0.05
Returns:
- pandas.DataFrame
Processed DataFrame containing filtered and sorted marker genes data. The DataFrame includes columns for cell types, gene IDs, and scores.
Raises:
- ValueError
If neither filename nor adata is provided, or if an invalid adata object is provided.
Notes:
The function automatically handles both CSV and Excel file formats.
Genes are filtered based on log2 fold change and adjusted p-value thresholds.
The resulting DataFrame is sorted by the specified column and limited to the top N genes per cell type.
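The filter, sort, and top-N behaviour described in these notes can be sketched on plain records. This is an assumption-laden illustration rather than the library's code: the helper name is hypothetical, the “logfoldchanges” and “pvals_adj” field names follow scanpy-style marker tables and are assumed here, and treating both thresholds as inclusive is also an assumption.

```python
from collections import defaultdict


def top_markers(rows, top_n_genes=60, sort_by_column="scores", ascending=False,
                log2fc_min=0.25, pval_cutoff=0.05):
    """Filter marker records, sort them, and keep the top N genes per group."""
    kept = [
        r for r in rows
        if r["logfoldchanges"] >= log2fc_min and r["pvals_adj"] <= pval_cutoff
    ]
    kept.sort(key=lambda r: r[sort_by_column], reverse=not ascending)
    per_group = defaultdict(list)
    for r in kept:
        if len(per_group[r["group"]]) < top_n_genes:
            per_group[r["group"]].append(r["names"])
    return dict(per_group)


# Made-up marker records for two cell types.
rows = [
    {"group": "T cell", "names": "CD3E", "scores": 9.1, "logfoldchanges": 3.2, "pvals_adj": 1e-8},
    {"group": "T cell", "names": "CD2", "scores": 7.4, "logfoldchanges": 2.1, "pvals_adj": 1e-5},
    # Dropped: fold change below log2fc_min.
    {"group": "T cell", "names": "ACTB", "scores": 8.0, "logfoldchanges": 0.1, "pvals_adj": 1e-6},
    {"group": "B cell", "names": "CD19", "scores": 6.5, "logfoldchanges": 2.8, "pvals_adj": 1e-4},
    {"group": "B cell", "names": "MS4A1", "scores": 8.2, "logfoldchanges": 3.0, "pvals_adj": 1e-7},
    # Dropped: adjusted p-value above pval_cutoff.
    {"group": "B cell", "names": "CD79A", "scores": 5.0, "logfoldchanges": 1.5, "pvals_adj": 0.2},
]

top_one = top_markers(rows, top_n_genes=1)
top_two = top_markers(rows, top_n_genes=2)
print(top_one)  # {'T cell': ['CD3E'], 'B cell': ['MS4A1']}
print(top_two)  # {'T cell': ['CD3E', 'CD2'], 'B cell': ['MS4A1', 'CD19']}
```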
Segmentation Utilities
Modelling Utilities
Extra Utilities
- easydecon.extra.easydecon_workflow(sdata, markers_df, marker_genes=None, celltype: str = 'group', gene_id_column: str = 'names', exclude_group_names: list[str] | None = None, bin_size: int = 8, aggregation_method: str = 'sum', filtering_algorithm: str = 'permutation', num_permutations: int = 5000, parametric: bool = True, alpha: float = 0.01, subsample_size: int = 25000, subsample_signal_quantile: float = 0.1, permutation_gene_pool_fraction: float = 0.3, n_subs: int = 5, quantile: float = 0.7, method: str = 'wjaccard', similarity_by_column: str = 'logfoldchanges', lambda_param: float = 0.25, weight_column: str = 'logfoldchanges', proportion_method: str = 'nnls', normalization_method: str = 'unit', regularization_alpha: float = 0.01, l1_ratio: float = 0.7, evidence_to_likelihood: str = 'softmax', softmax_tau: float = 1.0, epsilon: float = 1e-12, results_column: str = 'easydecon', assign_method: str = 'max', allow_multiple: bool = False, diagnostic=None, fold_change_threshold: float = 2.0)