API Reference

The following sections document the public Python API, grouped by module. Docstrings are pulled directly from the source code with autodoc.

Core Module

easydecon.easydecon.add_df_to_spatialdata(sdata, df, bin_size=8)[source]
easydecon.easydecon.assign_clusters_from_df(sdata, df, bin_size=8, results_column='easydecon', method='max', allow_multiple=False, diagnostic=None, fold_change_threshold=2.0, add_to_obs=True)[source]

Assigns cell clusters to spatial spots based on deconvolution results.

This function takes deconvolution results and assigns cell type clusters to each spatial spot using different methods (max, zmax, or hybrid). The results are stored in the spatial data table.

Parameters:

sdata (SpatialData object)

The spatial data object containing the spatial transcriptomics data.

df (pandas.DataFrame)

DataFrame containing deconvolution results with cell type proportions. Rows should correspond to spatial spots, columns to cell types.

bin_size (int, optional)

Size of the spatial bin in micrometers. Default: 8

results_column (str, optional)

Name of the column in the table.obs where results will be stored. Default: “easydecon”

method (str, optional)

Method to use for cluster assignment. Default: “max”

  • “max”: Assigns the cell type with the highest proportion.

  • “zmax”: Uses z-score normalization before finding the maximum.

  • “hybrid”: Combines similarity scores and adaptive probabilities.

allow_multiple (bool, optional)

Whether to allow multiple cell type assignments per spot. Default: False

diagnostic (str, optional)

Path to save diagnostic information (if needed). Default: None

fold_change_threshold (float, optional)

Threshold for fold change filtering. Default: 2.0

Returns:

None

The function modifies the input sdata object in place by adding cluster assignments to the table.obs DataFrame under the specified results_column.

Raises:

ValueError

If an invalid method is specified.

Notes:

  • The function automatically handles missing values and ensures proper indexing.

  • Results are stored as categorical variables in the table.obs DataFrame.

  • The three assignment methods (max, zmax, hybrid) are described under the method parameter.
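The "max" and "zmax" rules can be illustrated with a small, library-independent sketch. It is not easydecon's implementation: the score table is hypothetical, plain dictionaries stand in for the DataFrame, and it assumes that z-scoring is applied to each cell type across spots. The "hybrid" method is not covered.

```python
from statistics import mean, pstdev

# Hypothetical deconvolution scores: one entry per spot, one value per cell type.
scores = {
    "spot_1": {"T cell": 0.6, "B cell": 0.3, "Tumor": 0.1},
    "spot_2": {"T cell": 0.5, "B cell": 0.1, "Tumor": 0.4},
    "spot_3": {"T cell": 0.7, "B cell": 0.2, "Tumor": 0.1},
}


def assign_max(table):
    """For each spot, pick the cell type with the highest value."""
    return {spot: max(row, key=row.get) for spot, row in table.items()}


def assign_zmax(table):
    """Z-score each cell type across spots (an assumption), then pick the maximum."""
    cell_types = list(next(iter(table.values())))
    stats = {}
    for cell_type in cell_types:
        values = [row[cell_type] for row in table.values()]
        # Fall back to 1.0 when a cell type has no spread, to avoid dividing by zero.
        stats[cell_type] = (mean(values), pstdev(values) or 1.0)
    z_table = {
        spot: {ct: (row[ct] - stats[ct][0]) / stats[ct][1] for ct in cell_types}
        for spot, row in table.items()
    }
    return assign_max(z_table)


max_labels = assign_max(scores)
zmax_labels = assign_zmax(scores)
```

In this toy table every spot is labelled "T cell" by the raw maximum, while z-scoring highlights the cell type that is unusually high for each spot relative to the others.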

easydecon.easydecon.common_markers_gene_expression_and_filter(sdata: object, marker_genes, common_group_name: str = 'MarkerGroup', celltype: str = 'group', gene_id_column: str = 'names', exclude_group_names: list[str] = [], bin_size: int = 8, aggregation_method: str = 'sum', add_to_obs: bool = True, filtering_algorithm: str = 'permutation', num_permutations: int = 5000, alpha: float = 0.01, subsample_size: int = 25000, subsample_signal_quantile: float = 0.1, permutation_gene_pool_fraction: float = 0.3, parametric: bool = True, n_subs: int = 5, quantile: float = 0.7, output_stat: str = 'expression', **kwargs) -> fireducks.pandas.DataFrame[source]

Extended version allowing marker_genes as a list, dict, or DataFrame, with customizable column names for the DataFrame.

If marker_genes is:
  1. list[str]: Single group of markers -> create one column named common_group_name.

  2. dict[str, list[str]]: Multiple groups -> each dict key becomes a column in table.obs.

  3. pd.DataFrame: Must contain columns for groups and gene names (by default ‘group’ and ‘names’), but these can be overridden by celltype and gene_id_column.
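The three accepted input shapes can be reduced to a single mapping of group name to gene list. The helper below is a hypothetical sketch of that normalization, not the function's actual code; the name normalize_marker_groups is made up for illustration, and the DataFrame branch relies on duck typing so the sketch itself does not import pandas.

```python
def normalize_marker_groups(marker_genes, common_group_name="MarkerGroup",
                            celltype="group", gene_id_column="names"):
    """Return {group name: [marker genes]} for a list, dict, or DataFrame input."""
    if isinstance(marker_genes, dict):
        # Multiple groups: each key becomes one group.
        return {group: list(genes) for group, genes in marker_genes.items()}
    if isinstance(marker_genes, (list, tuple)):
        # Single group: all genes go under common_group_name.
        return {common_group_name: list(marker_genes)}
    if hasattr(marker_genes, "groupby"):
        # DataFrame-like input: collect the gene column per group.
        return marker_genes.groupby(celltype)[gene_id_column].apply(list).to_dict()
    raise TypeError("marker_genes must be a list, dict, or DataFrame")


single = normalize_marker_groups(["CD3E", "CD2"])
multiple = normalize_marker_groups({"T cell": ["CD3E"], "B cell": ["MS4A1"]})
```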

Steps (for each group):
  1. Compute aggregator (sum, mean, median, cs) for all bins over that group’s marker genes.

  2. If filtering_algorithm="permutation", subsample bins (subsample_size) and build a null distribution by randomly picking genes of size=len(marker_genes).

  3. If filtering_algorithm="quantile", compute threshold from (1 - quantile).

  4. Apply cutoff to all bins (values below threshold become 0).

  5. Merge results back into table.obs if add_to_obs=True.
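The permutation branch of the steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's code: bins are dictionaries of gene expression, the background gene pool is given explicitly, and subsampling by signal quantile, the parametric fit, and splitting into n_subs are all omitted. The function names and the toy data are hypothetical.

```python
import random


def aggregate_sum(expression, genes):
    """Sum the expression of the given genes in one bin (missing genes count as 0)."""
    return sum(expression.get(gene, 0.0) for gene in genes)


def permutation_threshold(bins, n_markers, gene_pool,
                          num_permutations=1000, alpha=0.01, seed=0):
    """Build a null distribution from random gene sets and return its (1 - alpha) quantile."""
    rng = random.Random(seed)
    bin_ids = list(bins)
    null_scores = []
    for _ in range(num_permutations):
        random_genes = rng.sample(gene_pool, n_markers)  # same size as the marker set
        random_bin = bins[rng.choice(bin_ids)]
        null_scores.append(aggregate_sum(random_bin, random_genes))
    null_scores.sort()
    cutoff_index = min(len(null_scores) - 1, int((1 - alpha) * len(null_scores)))
    return null_scores[cutoff_index]


def apply_cutoff(bins, marker_genes, threshold):
    """Aggregate marker expression per bin and zero out values below the threshold."""
    scores = {b: aggregate_sum(expr, marker_genes) for b, expr in bins.items()}
    return {b: (s if s >= threshold else 0.0) for b, s in scores.items()}


# Toy data: two bins, two marker genes, three background genes with flat expression.
bins = {
    "bin_a": {"CD3E": 5.0, "CD2": 5.0, "G1": 1.0, "G2": 1.0, "G3": 1.0},
    "bin_b": {"CD3E": 0.5, "CD2": 0.0, "G1": 1.0, "G2": 1.0, "G3": 1.0},
}
markers = ["CD3E", "CD2"]
background = ["G1", "G2", "G3"]

threshold = permutation_threshold(bins, len(markers), background)
filtered = apply_cutoff(bins, markers, threshold)
```

Because every background gene has expression 1.0 here, any random pair sums to 2.0, so the threshold is 2.0 regardless of the random seed; bin_a keeps its score and bin_b is set to 0.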

Parameters:
  • sdata (object) – Spatial data container or AnnData object. Expected to have sdata.tables[f"square_{bin_size:03}um"].

  • marker_genes (list, dict, or pd.DataFrame) –

    Marker genes to use:

      - list[str]: Single group of marker genes (assigned to common_group_name).

      - dict[str, list[str]]: Mapping of group names to marker gene lists.

      - pd.DataFrame: Must have columns for group and gene names (default: ‘group’ and ‘names’), customizable via celltype and gene_id_column.

  • common_group_name (str, optional) – Column name assigned if marker_genes is a list. Default is “MarkerGroup”.

  • celltype (str, optional) – Column name in marker_genes DataFrame for group identifier. Default is “group”.

  • gene_id_column (str, optional) – Column name in marker_genes DataFrame for gene names. Default is “names”.

  • exclude_group_names (list[str], optional) – Names of groups used for exclusion: bins where any of these groups is nonzero are excluded. Default is empty.

  • bin_size (int, optional) – Spatial bin size in microns (e.g., 8 means “square_008um” table). Default is 8.

  • aggregation_method (str, optional) – How to aggregate gene expression across marker genes. One of “sum”, “mean”, “median”, or “cs” (composite score). Default is “sum”.

  • add_to_obs (bool, optional) – Whether to add the results into the obs of the spatial data table. Default is True.

  • filtering_algorithm (str, optional) – Method to determine expression cutoff. Options: “permutation” or “quantile”. Default is “permutation”.

  • num_permutations (int, optional) – Number of permutations for null distribution (used if filtering_algorithm="permutation"). Default is 5000.

  • alpha (float, optional) – Significance level for permutation thresholding; the cutoff is taken at the (1 - alpha) quantile of the null distribution. Default is 0.01.

  • subsample_size (int, optional) – Number of bins used in permutation subsampling. Default is 25000.

  • subsample_signal_quantile (float, optional) – Quantile range for selecting moderate-expression bins before permutation. Default is 0.1.

  • permutation_gene_pool_fraction (float, optional) – Fraction of most variable genes used as the background gene pool for permutation. Default is 0.3.

  • parametric (bool, optional) – If True, fit a parametric distribution (Gamma or Exponential) to null scores for thresholding. Otherwise use empirical quantile. Default is True.

  • n_subs (int, optional) – Number of smaller subsets to split the permutation into. Default is 5.

  • quantile (float, optional) – Quantile cutoff if filtering_algorithm="quantile". Default is 0.7.

  • **kwargs – Additional keyword arguments (currently unused but reserved for future extensions).

Returns:

The final DataFrame with aggregated + thresholded expression for each group. Columns = one per group, indexed by bin.

Return type:

pd.DataFrame
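The table key referenced in the sdata and bin_size descriptions is the bin size zero-padded to three digits. A minimal sketch of the lookup follows; the helper name table_key is hypothetical and a plain dict stands in for sdata.tables.

```python
def table_key(bin_size: int) -> str:
    """Format the table name for a given bin size, e.g. 8 -> 'square_008um'."""
    return f"square_{bin_size:03}um"


# A plain dict standing in for sdata.tables, for illustration only.
tables = {"square_002um": "2 um table", "square_008um": "8 um table"}
selected = tables[table_key(8)]
```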

easydecon.easydecon.composite_score(row)[source]
easydecon.easydecon.function_row_auc(row, markers_df, **kwargs)[source]

AUROC-style similarity between a spot (row) and each cluster’s marker set.

For each cluster c:
  • positives = marker genes of c present in row.index

  • negatives = all other genes in row.index

  • score = AUROC that positives have higher expression than negatives

Parameters:
  • row (pandas.Series) – Expression values for one spot. Index must be gene IDs.

  • markers_df (pandas.DataFrame) – DataFrame with markers. Index = cluster/cell type, and a column with gene IDs (e.g. ‘names’).

  • gene_id_column (str, in kwargs) – Name of the column in markers_df that holds gene IDs.

  • min_markers (int, in kwargs, optional) – Minimum number of markers that must be present in the spot to compute a score. Otherwise returns fallback (default 0.5).

  • fallback_auc (float, in kwargs, optional) – Value to use when AUC is undefined (e.g. too few markers or no negatives).

Returns:

{cluster_label: auc_score}

Return type:

dict
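The scoring rule described above can be sketched without any dependencies. This is an illustrative reimplementation, not the library's code: a dict of gene to expression stands in for the pandas Series, a dict of cluster to gene list stands in for markers_df, ties count as half a win, and the default values shown are illustrative.

```python
def auroc(positives, negatives, fallback=0.5):
    """Probability that a random positive value exceeds a random negative value."""
    if not positives or not negatives:
        return fallback  # AUC is undefined without both classes
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


def row_auc(row, markers, min_markers=1, fallback_auc=0.5):
    """Score one spot against each cluster's marker set."""
    scores = {}
    for cluster, genes in markers.items():
        present = [g for g in genes if g in row]
        if len(present) < min_markers:
            scores[cluster] = fallback_auc
            continue
        positives = [row[g] for g in present]
        negatives = [value for gene, value in row.items() if gene not in present]
        scores[cluster] = auroc(positives, negatives, fallback_auc)
    return scores


# Hypothetical spot and marker sets.
row = {"CD3E": 5.0, "CD2": 4.0, "MS4A1": 0.0, "ACTB": 1.0}
markers = {"T cell": ["CD3E", "CD2"], "B cell": ["MS4A1", "CD79A"]}
result = row_auc(row, markers)
```

The pairwise loop is quadratic in the number of genes; a rank-based formulation gives the same value more efficiently and would be preferable for real data.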

easydecon.easydecon.function_row_auc_specific(row, markers_df, **kwargs)[source]

High-specificity AUROC scoring. Filters out noise and focuses on top markers to reduce false positives.

easydecon.easydecon.function_row_cosine(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_diagnostic(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_euclidean(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_jaccard(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_mean(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_median(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_overlap(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_spearman(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_sum(row, markers_df, **kwargs)[source]
easydecon.easydecon.function_row_weighted_jaccard(row, markers_df, **kwargs)[source]
easydecon.easydecon.get_clusters_by_similarity_on_tissue(sdata, markers_df, common_group_name=None, bin_size=8, gene_id_column='names', method='wjaccard', add_to_obs=False, **kwargs)[source]

Compute cluster assignments based on a chosen similarity method.

Parameters:
  • sdata (AnnData-like object) – Spatial (or single-cell) data containing expression matrices. It is expected to have ‘tables’ attribute with keys like “square_00Xum”, or simply be treated as a table if the key doesn’t exist.

  • markers_df (pd.DataFrame) – DataFrame containing marker genes for each cluster. Rows typically represent clusters, columns represent information about each gene (e.g., logfoldchanges, names, etc.).

  • common_group_name (str, optional) – Name of a column in table.obs specifying spots to process. If found, only spots where common_group_name != 0 are processed. Otherwise, all spots are processed. Default is None.

  • bin_size (int, optional) – Determines the bin size (like “square_008um”) for looking up the table in sdata.tables. Default is 8.

  • gene_id_column (str, optional) – Name of the column in markers_df that contains gene IDs. Default is “names”.

  • similarity_by_column (str, optional) – Column in markers_df used to measure similarity or weight. Default is “logfoldchanges”.

  • method (str, optional) – Method to use for computing similarity. Supported methods include: “correlation”, “cosine”, “jaccard”, “overlap”, “wjaccard”, “diagnostic”, “sum”, “mean”, “median”. Default is “wjaccard”.

  • add_to_obs (bool, optional) – If True, adds the resulting assignment columns to table.obs. Default is False.

  • **kwargs – Additional, method-specific parameters. For example, for method="wjaccard", supply lambda_param, etc.

Returns:

A DataFrame whose index matches table.obs.index with cluster assignment columns (or other metrics) computed by the specified method.

Return type:

pd.DataFrame
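To give a feel for set-based similarity methods such as "jaccard" and "overlap", the sketch below uses the textbook definitions of the Jaccard index and the overlap coefficient between a spot's expressed genes and each cluster's marker set. How easydecon computes these internally (including the weighted "wjaccard" variant and lambda_param) is not specified here, so treat this purely as an illustration with hypothetical data.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Overlap coefficient: intersection size divided by the smaller set's size."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def best_cluster(expressed_genes, markers, similarity=jaccard):
    """Score every cluster and return the best label together with all scores."""
    scores = {c: similarity(expressed_genes, genes) for c, genes in markers.items()}
    return max(scores, key=scores.get), scores


# Hypothetical spot: genes with nonzero expression, plus two marker sets.
expressed = {"CD3E", "CD2", "ACTB"}
markers = {"T cell": ["CD3E", "CD2", "CD4"], "B cell": ["MS4A1", "CD79A"]}
label, scores = best_cluster(expressed, markers)
```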

easydecon.easydecon.min_max_scale(series)[source]
easydecon.easydecon.napari_region_assignment(sdata, key='Shapes', bin_size=8, column='napari', target_coordinate_system='global')[source]
easydecon.easydecon.plot_assigned_clusters_from_dataframe(sdata, dataframe, sample_id, bin_size=8, title='Assigned Clusters', cmap='tab20', legend_fontsize=8, figsize=(5, 5), dpi=200, method='matplotlib', scale=1)[source]
easydecon.easydecon.process_row(row, func, **kwargs)[source]
easydecon.easydecon.process_row_with_suppression(row, func, **kwargs)[source]

Wrapper around process_row to suppress warnings in each worker.
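A wrapper of this kind is typically built on the standard warnings module. The sketch below assumes that process_row simply calls func(row, **kwargs), which is not confirmed by the documentation; the _sketch names are stand-ins rather than the library's functions.

```python
import warnings


def process_row_sketch(row, func, **kwargs):
    """Stand-in for process_row: assumed to apply func to a single row."""
    return func(row, **kwargs)


def process_row_with_suppression_sketch(row, func, **kwargs):
    """Run the row function with all warnings silenced for the duration of the call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return process_row_sketch(row, func, **kwargs)


def noisy_sum(row, scale=1):
    """Hypothetical row function that emits a warning before returning."""
    warnings.warn("something looks off")
    return sum(row) * scale


total = process_row_with_suppression_sketch([1, 2, 3], noisy_sum, scale=2)
```

catch_warnings restores the previous filter state on exit, so suppression stays local to the call. Note that the context manager modifies process-wide state and is not thread-safe, which is one reason to apply it inside each worker.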

easydecon.easydecon.read_markers_dataframe(sdata, filename=None, adata=None, exclude_celltype=[], bin_size=8, top_n_genes=60, sort_by_column='scores', ascending=False, gene_id_column='names', celltype='group', key='rank_genes_groups', log2fc_min=0.25, pval_cutoff=0.05, drop_ribosomal=False, drop_mitochondrial=False)[source]

Reads and processes marker genes data for spatial transcriptomics analysis.

This function can read marker genes data either from a file or from an AnnData object, and processes it to create a filtered and sorted DataFrame of marker genes.

Parameters:

sdata (SpatialData object)

The spatial data object containing the spatial transcriptomics data.

filename (str, optional)

Path to the input file containing marker genes data (CSV or Excel format). Required if adata is not provided.

adata (AnnData object, optional)

AnnData object containing the marker genes data. Required if filename is not provided.

exclude_celltype (list, optional)

List of cell types to exclude from the analysis. Default: []

bin_size (int, optional)

Size of the spatial bin in micrometers. Default: 8

top_n_genes (int, optional)

Number of top genes to keep per cell type. Default: 60

sort_by_column (str, optional)

Column name to sort the genes by. Default: “scores”

ascending (bool, optional)

Whether to sort in ascending order. Default: False

gene_id_column (str, optional)

Column name containing gene IDs. Default: “names”

celltype (str, optional)

Column name containing cell type information. Default: “group”

key (str, optional)

Key in adata.uns where marker genes are stored. Default: “rank_genes_groups”

log2fc_min (float, optional)

Minimum log2 fold change threshold for gene selection. Default: 0.25

pval_cutoff (float, optional)

Maximum adjusted p-value threshold for gene selection. Default: 0.05

drop_ribosomal (bool, optional)

Whether to remove ribosomal genes before final selection. Removes genes starting with RPS or RPL (case-insensitive). Default: False

drop_mitochondrial (bool, optional)

Whether to remove mitochondrial genes before final selection. Removes genes starting with MT- or mt-. Default: False

Returns:

pandas.DataFrame

Processed DataFrame containing filtered and sorted marker genes data. The DataFrame includes columns for cell types, gene IDs, and scores.

Raises:

ValueError

If neither filename nor adata is provided, or if an invalid adata object is provided.

Notes:

  • The function automatically handles both CSV and Excel file formats.

  • Genes are filtered based on log2 fold change and adjusted p-value thresholds.

  • The resulting DataFrame is sorted by the specified column and limited to the top N genes per cell type.
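The filtering described in these notes can be sketched on plain records. This is an illustration rather than the function's code: the column names logfoldchanges and pvals_adj are assumptions (they follow the usual scanpy rank_genes_groups output), the thresholds are treated as inclusive, and lists of dicts stand in for the DataFrame.

```python
def filter_markers(records, top_n_genes=60, sort_by_column="scores", ascending=False,
                   log2fc_min=0.25, pval_cutoff=0.05,
                   drop_ribosomal=False, drop_mitochondrial=False):
    """Filter marker records, sort them, and keep the top N gene names per cell type."""
    kept = []
    for rec in records:
        gene = rec["names"]
        if rec["logfoldchanges"] < log2fc_min or rec["pvals_adj"] > pval_cutoff:
            continue
        if drop_ribosomal and gene.upper().startswith(("RPS", "RPL")):
            continue
        if drop_mitochondrial and gene.startswith(("MT-", "mt-")):
            continue
        kept.append(rec)
    kept.sort(key=lambda rec: rec[sort_by_column], reverse=not ascending)
    top = {}
    for rec in kept:
        genes = top.setdefault(rec["group"], [])
        if len(genes) < top_n_genes:
            genes.append(rec["names"])
    return top


# Hypothetical marker table.
records = [
    {"group": "T cell", "names": "CD3E", "logfoldchanges": 2.0, "pvals_adj": 0.001, "scores": 10},
    {"group": "T cell", "names": "CD2", "logfoldchanges": 1.5, "pvals_adj": 0.01, "scores": 8},
    {"group": "T cell", "names": "RPS6", "logfoldchanges": 1.0, "pvals_adj": 0.01, "scores": 9},
    {"group": "T cell", "names": "MT-CO1", "logfoldchanges": 0.1, "pvals_adj": 0.01, "scores": 7},
    {"group": "B cell", "names": "MS4A1", "logfoldchanges": 3.0, "pvals_adj": 0.2, "scores": 12},
    {"group": "B cell", "names": "CD79A", "logfoldchanges": 2.5, "pvals_adj": 0.001, "scores": 11},
]

with_ribosomal = filter_markers(records, top_n_genes=2)
without_ribosomal = filter_markers(records, top_n_genes=2, drop_ribosomal=True)
```

In this toy table MT-CO1 fails the fold-change threshold and MS4A1 fails the p-value cutoff, so neither survives regardless of the drop flags.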

easydecon.easydecon.sparse_var(matrix, axis=0)[source]
easydecon.easydecon.test_function()[source]
easydecon.easydecon.visualize_only_selected_clusters(sdata, clusters, bin_size=8, results_column='easydecon', temp_column='tmp')[source]

Segmentation Utilities

Easydecon bin2cell segmentation helper script with CLI support

easydecon.segmentation.main()[source]
easydecon.segmentation.parse_args()[source]
easydecon.segmentation.run_bin2cell_segmentation(sample_id, binned_002, full_image, spaceranger_image_path, mpp=0.5, model='2D_versatile_he', prob_thresh=0.2, nms_thresh=0.3, min_cells=10, min_counts=5, out_dir='stardist', device='gpu')[source]

Extra Utilities

easydecon.extra.easydecon_workflow(sdata, markers_df, marker_genes=None, mask_col='easydecon_mask', celltype: str = 'group', gene_id_column: str = 'names', exclude_group_names: list[str] | None = None, bin_size: int = 8, aggregation_method: str = 'sum', filtering_algorithm: str = 'permutation', num_permutations: int = 5000, parametric: bool = True, alpha: float = 0.01, subsample_size: int = 25000, subsample_signal_quantile: float = 0, permutation_gene_pool_fraction: float = 0.3, n_subs: int = 5, quantile: float = 0.7, phase1_output_stat: str = 'expression', method: str = 'wjaccard', similarity_by_column: str = 'logfoldchanges', lambda_param: float = 0.25, weight_column: str = 'logfoldchanges', min_markers: int = 3, fallback_auc: float = 0.5, expression_threshold: float = 0.1, evidence_to_likelihood: str = 'softmax', softmax_tau: float = 1.0, epsilon: float = 1e-12, prior_weight: float = 1.0, likelihood_weight: float = 1.0, apply_prior_presence_mask: bool = False, prior_presence_threshold: float = 0.0, results_column: str = 'easydecon', assign_method: str = 'max', allow_multiple: bool = False, diagnostic=None, fold_change_threshold: float = 2.0)[source]

Configuration

class easydecon.config.Config[source]

Bases: object

batch_size = 1000
n_jobs = 5
easydecon.config.set_batch_size(n)[source]
easydecon.config.set_n_jobs(n)[source]
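The configuration object exposes two class-level defaults and two setters. The stand-in below mirrors that shape so the pattern can be tried without the package installed; it assumes the setters simply overwrite the class attributes, which the documentation does not confirm, and it performs no validation.

```python
class Config:
    """Stand-in mirroring the documented defaults of easydecon.config.Config."""
    batch_size = 1000
    n_jobs = 5


def set_batch_size(n):
    """Assumed behavior: overwrite the shared batch size."""
    Config.batch_size = n


def set_n_jobs(n):
    """Assumed behavior: overwrite the shared number of parallel jobs."""
    Config.n_jobs = n


# Lower the batch size and worker count, e.g. on a memory-constrained machine.
set_batch_size(500)
set_n_jobs(2)
```

Because the values live on the class rather than on an instance, a change made through a setter is visible to every piece of code that reads Config afterwards.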