benchmark

compare

The compare facade: evaluate methods on a task battery and report significance.

compare

compare(
    methods,
    data,
    task="classification",
    *,
    num_classes=None,
    predict_fn=None,
    metrics=None,
    prob_metrics=None,
    test="wilcoxon",
    alpha=0.05,
    ignore_index=None,
    device=None,
)

Compare methods on a standard battery and report significance.
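To illustrate what the default test="wilcoxon" option refers to, here is a minimal pure-Python sketch of an exact two-sided Wilcoxon signed-rank test on paired scores. It is not the library's implementation: the function name wilcoxon_signed_rank, the example scores, and the choice to pair scores by seed are all assumptions made for illustration.

```python
from itertools import product


def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test on paired scores.

    Hypothetical helper, not part of the documented API.
    Returns (w_plus, p_value). Zero differences are dropped and
    tied absolute differences share their average rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences (1-based), averaging over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)

    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2**n patterns (only practical for small n).
    observed = abs(w_plus - total / 2)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Illustrative accuracies for two methods over six seeds (made-up numbers).
scores_a = [0.91, 0.89, 0.93, 0.90, 0.92, 0.94]
scores_b = [0.88, 0.87, 0.885, 0.86, 0.865, 0.84]
w_plus, p_value = wilcoxon_signed_rank(scores_a, scores_b)
print(w_plus, p_value, p_value < 0.05)
```

With six pairs that all favor the first method, the exact test reaches its smallest possible two-sided p-value for that sample size, 2/64. With very few seeds the test cannot reach alpha=0.05 at all, which is worth keeping in mind when choosing how many seeds to run.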

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| task | str | "classification" or "segmentation". | 'classification' |
| num_classes | int or None | Required when metrics is not provided. | None |
| ignore_index | int or None | Label to exclude from segmentation metrics (e.g. a void/boundary class). | None |
| prob_metrics | frozenset[str] or None | Names of metrics that need probabilities; defaults to the task's set. | None |
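The effect of ignore_index can be sketched with a toy pixel-accuracy metric. This is an illustration only: pixel_accuracy is a hypothetical helper, the label 255 is just a commonly used void value, and the library's own segmentation metrics may handle masking differently.

```python
def pixel_accuracy(preds, targets, ignore_index=None):
    """Fraction of correctly labeled pixels.

    Hypothetical helper: pixels whose target equals ignore_index
    are skipped and do not count toward the denominator.
    """
    correct = 0
    counted = 0
    for p, t in zip(preds, targets):
        if ignore_index is not None and t == ignore_index:
            continue  # void/boundary pixel: excluded from the metric
        counted += 1
        correct += int(p == t)
    if counted == 0:
        return float("nan")  # nothing left to score
    return correct / counted


# Flattened toy label map; 255 marks void pixels (assumed convention).
targets = [0, 1, 1, 255, 2, 255]
preds = [0, 1, 2, 0, 2, 1]

acc_all = pixel_accuracy(preds, targets)                       # 3 of 6 correct
acc_masked = pixel_accuracy(preds, targets, ignore_index=255)  # 3 of 4 correct
print(acc_all, acc_masked)
```

Without masking, predictions on void pixels are always counted as errors, which lowers the score for reasons unrelated to the method being compared.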

BenchmarkResult dataclass

Holds benchmark results.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| data | Dataset | Dims (method, seed), one data variable per metric. |
| comparisons | DataFrame | Pairwise significance results (see _stats.compare_methods). |
| alpha | float | Significance level used. |

summary

summary(reference=None)

Publication-ready table giving the mean and confidence interval (CI) for each method and metric. A "*" marks a method that differs significantly from reference (default: the first method in data).
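As a rough sketch of the kind of table described above, the following pure-Python snippet formats a mean and CI half-width per method and stars significant differences from the reference. Everything here is assumed for illustration: the summarize helper, the dict-based inputs, the made-up scores and p-value, and the normal-approximation interval. The source does not state how summary computes its CI.

```python
from statistics import NormalDist, mean, stdev


def summarize(scores, p_values, reference=None, alpha=0.05):
    """Format 'mean +/- half-width' per method for a single metric.

    Hypothetical helper, not the library's summary().
    scores:   {method: [score per seed]}
    p_values: {method: p-value versus the reference method}
    """
    if reference is None:
        reference = next(iter(scores))  # default: first method
    # Two-sided normal quantile for a (1 - alpha) interval.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    rows = {}
    for method, vals in scores.items():
        half = z * stdev(vals) / len(vals) ** 0.5
        significant = method != reference and p_values.get(method, 1.0) < alpha
        star = "*" if significant else ""
        rows[method] = f"{mean(vals):.3f} +/- {half:.3f}{star}"
    return rows


# Made-up accuracies over four seeds and a made-up p-value.
scores = {
    "baseline": [0.80, 0.82, 0.81, 0.79],
    "ours": [0.86, 0.88, 0.87, 0.85],
}
p_values = {"ours": 0.01}

table = summarize(scores, p_values)
for method, cell in table.items():
    print(f"{method:10s} {cell}")
```

With only a handful of seeds, a normal approximation understates the interval width; a Student's t interval or a bootstrap would be more conservative choices.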