dacapo.utils.voi

Functions

voi(reconstruction, groundtruth[, ...])

Return the conditional entropies of the variation of information metric. [1]

split_vi(x[, y, ignore_x, ignore_y])

Return the symmetric conditional entropies associated with the VI.

vi_tables(x[, y, ignore_x, ignore_y])

Return probability tables used for calculating VI.

contingency_table(seg, gt[, ignore_seg, ignore_gt, norm])

Return the contingency table for all regions in matched segmentations.

divide_columns(matrix, row[, in_place])

Divide each column of matrix by the corresponding element in row.

divide_rows(matrix, column[, in_place])

Divide each row of matrix by the corresponding element in column.

xlogx(x[, out, in_place])

Compute x * log_2(x).

Module Contents

dacapo.utils.voi.voi(reconstruction, groundtruth, ignore_reconstruction=[], ignore_groundtruth=[0])

Return the conditional entropies of the variation of information metric. [1]

Let X be a reconstruction, and Y a ground truth labelling. The variation of information between the two is the sum of two conditional entropies:

VI(X, Y) = H(X|Y) + H(Y|X).

The first term, H(X|Y), measures oversegmentation; the second, H(Y|X), measures undersegmentation. They are referred to as the variation of information split error and merge error, respectively.
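
To make the two terms concrete, here is a minimal plain-Python sketch of the same quantities for flat label sequences. It is not the dacapo implementation: the name voi_sketch is hypothetical, it handles only the ground-truth ignore list, and it uses the standard library instead of NumPy arrays.

```python
import math
from collections import Counter


def voi_sketch(reconstruction, groundtruth, ignore_groundtruth=(0,)):
    """Return (split, merge) = (H(X|Y), H(Y|X)) in bits.

    Hypothetical helper for illustration; inputs are flat sequences of
    integer labels of equal length.
    """
    if len(reconstruction) != len(groundtruth):
        raise ValueError("reconstruction and groundtruth differ in length")
    # Drop points whose ground-truth label is in the ignore list.
    pairs = [(x, y) for x, y in zip(reconstruction, groundtruth)
             if y not in ignore_groundtruth]
    n = len(pairs)
    joint = Counter(pairs)                   # counts of (x, y) pairs
    count_x = Counter(x for x, _ in pairs)   # counts per reconstruction label
    count_y = Counter(y for _, y in pairs)   # counts per ground-truth label
    # H(X|Y) = sum_{x,y} p(x,y) * log2(p(y) / p(x,y))
    split = sum(c / n * math.log2(count_y[y] / c)
                for (x, y), c in joint.items())
    # H(Y|X) = sum_{x,y} p(x,y) * log2(p(x) / p(x,y))
    merge = sum(c / n * math.log2(count_x[x] / c)
                for (x, y), c in joint.items())
    return split, merge


# One ground-truth object (label 1) cut into two equal halves.
split, merge = voi_sketch([1, 1, 2, 2], [1, 1, 1, 1])
```

In this example the reconstruction splits a single object in two, so the split error is 1 bit and the merge error is 0.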

Parameters:
  • reconstruction (np.ndarray, int type, arbitrary shape) – A candidate segmentation.

  • groundtruth (np.ndarray, int type, same shape as reconstruction) – The ground truth segmentation.

  • ignore_reconstruction (list of int, optional) – Any points having a label in this list in the reconstruction are ignored in the evaluation. By default, no label is ignored.

  • ignore_groundtruth (list of int, optional) – Any points having a label in this list in the ground truth are ignored in the evaluation. By default, only the label 0 is ignored.

Returns:

(split, merge) – The variation of information split and merge error, i.e., H(X|Y) and H(Y|X)

Return type:

tuple of (float, float)

Raises:

ValueError – If reconstruction and groundtruth have different shapes.

References

[1] Meila, M. (2007). Comparing clusterings - an information based distance. Journal of Multivariate Analysis 98, 873-895.

dacapo.utils.voi.split_vi(x, y=None, ignore_x=[0], ignore_y=[0])

Return the symmetric conditional entropies associated with the VI.

The variation of information is defined as VI(X,Y) = H(X|Y) + H(Y|X). If Y is the ground-truth segmentation, then H(Y|X) can be interpreted as the amount of under-segmentation of Y and H(X|Y) is then the amount of over-segmentation. In other words, a perfect over-segmentation will have H(Y|X)=0 and a perfect under-segmentation will have H(X|Y)=0.

If y is None, x is assumed to be a contingency table.
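
The contingency-table form can be illustrated with a small plain-Python sketch. The function name split_vi_from_table is hypothetical and nested lists stand in for the array or sparse matrix the real function expects. The output order follows the documented return value, H(Y|X) first and then H(X|Y).

```python
import math


def split_vi_from_table(pxy):
    """Return [H(Y|X), H(X|Y)] in bits.

    Hypothetical helper: pxy[i][j] is the joint probability of label i in X
    and label j in Y, and the whole table is assumed to sum to 1.
    """
    px = [sum(row) for row in pxy]          # marginal over X labels (rows)
    py = [sum(col) for col in zip(*pxy)]    # marginal over Y labels (columns)
    h_y_given_x = 0.0
    h_x_given_y = 0.0
    for i, row in enumerate(pxy):
        for j, p in enumerate(row):
            if p > 0:                       # 0 * log(0) is taken to be 0
                h_y_given_x += p * math.log2(px[i] / p)
                h_x_given_y += p * math.log2(py[j] / p)
    return [h_y_given_x, h_x_given_y]


# Perfect over-segmentation: two X segments inside one ground-truth object.
over = split_vi_from_table([[0.5], [0.5]])
# Perfect under-segmentation: one X segment covering two ground-truth objects.
under = split_vi_from_table([[0.5, 0.5]])
```

Here over has H(Y|X) = 0 and H(X|Y) = 1 bit, while under has the two values swapped, matching the interpretation given above.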

Parameters:
  • x (np.ndarray) – Label field (int type) or contingency table (float). x is interpreted as a contingency table (summing to 1.0) if and only if y is not provided.

  • y (np.ndarray of int, same shape as x, optional) – A label field to compare to x.

  • ignore_x (list of int, optional) – Any points having a label in this list are ignored in the evaluation. Ignore 0-labeled points by default.

  • ignore_y (list of int, optional) – Any points having a label in this list are ignored in the evaluation. Ignore 0-labeled points by default.

Returns:

sv – The conditional entropies of Y|X and X|Y.

Return type:

np.ndarray of float, shape (2,)

See also

voi

dacapo.utils.voi.vi_tables(x, y=None, ignore_x=[0], ignore_y=[0])

Return probability tables used for calculating VI.

If y is None, x is assumed to be a contingency table.

Parameters:
  • x (np.ndarray) – A label field (int type) if y is provided. If y is not provided, x is a contingency table (sparse.csc_matrix) that may or may not sum to 1.

  • y (np.ndarray, optional) – A label field (int type) with the same shape as x. If omitted, x is treated as a contingency table.

  • ignore_x (list of int, optional) – Rows to ignore in the contingency table. These are labels in x that are not counted when evaluating VI.

  • ignore_y (list of int, optional) – Columns to ignore in the contingency table. These are labels in y that are not counted when evaluating VI.

Returns:

  • pxy (sparse.csc_matrix of float) – The normalized contingency table.

  • px, py, hxgy, hygx, lpygx, lpxgy (np.ndarray of float) – The proportions of each label in x and y (px, py), the per-segment conditional entropies of x given y and vice-versa, the per-segment conditional probability p log p.

Raises:

ValueError – If x and y have different shapes.

dacapo.utils.voi.contingency_table(seg, gt, ignore_seg=[0], ignore_gt=[0], norm=True)

Return the contingency table for all regions in matched segmentations.

Parameters:
  • seg (np.ndarray, int type, arbitrary shape) – A candidate segmentation.

  • gt (np.ndarray, int type, same shape as seg) – The ground truth segmentation.

  • ignore_seg (list of int, optional) – Values to ignore in seg. Voxels in seg having a value in this list will not contribute to the contingency table. (default: [0])

  • ignore_gt (list of int, optional) – Values to ignore in gt. Voxels in gt having a value in this list will not contribute to the contingency table. (default: [0])

  • norm (bool, optional) – Whether to normalize the table so that it sums to 1.

Returns:

cont – A contingency table. cont[i, j] will equal the number of voxels labeled i in seg and j in gt. (Or the proportion of such voxels if norm=True.)

Return type:

scipy.sparse.csc_matrix

Raises:

ValueError – If seg and gt have different shapes.
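
As an illustration of what the table contains, the following plain-Python sketch builds the same counts as a dictionary keyed by (i, j). It is only a stand-in under stated assumptions: the name contingency_sketch is hypothetical, inputs are flat sequences, and the real function returns a scipy.sparse.csc_matrix.

```python
from collections import Counter


def contingency_sketch(seg, gt, ignore_seg=(0,), ignore_gt=(0,), norm=True):
    """Return {(i, j): value} for flat, equal-length label sequences.

    Hypothetical helper; value is a voxel count, or a proportion if norm.
    """
    if len(seg) != len(gt):
        raise ValueError("seg and gt must have the same shape")
    # A voxel is dropped if either of its labels is in an ignore list.
    counts = Counter(
        (s, g) for s, g in zip(seg, gt)
        if s not in ignore_seg and g not in ignore_gt
    )
    if not norm:
        return dict(counts)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


seg = [1, 1, 2, 0]
gt = [1, 1, 1, 1]
table_counts = contingency_sketch(seg, gt, norm=False)
table_norm = contingency_sketch(seg, gt, norm=True)
```

The last voxel carries label 0 in seg and is skipped, so table_counts holds 2 for (1, 1) and 1 for (2, 1), and table_norm holds the same entries divided by 3.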

dacapo.utils.voi.divide_columns(matrix, row, in_place=False)

Divide each column of matrix by the corresponding element in row.

The result is as follows: out[i, j] = matrix[i, j] / row[j]
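
A minimal sketch of this formula on nested lists, assuming dense input; divide_columns_sketch is a hypothetical name and the sparse-matrix and in-place paths of the real function are not covered.

```python
def divide_columns_sketch(matrix, row):
    """Return out with out[i][j] = matrix[i][j] / row[j] (nested lists)."""
    if any(r == 0 for r in row):
        raise ValueError("row contains zeros")
    return [[value / row[j] for j, value in enumerate(line)]
            for line in matrix]


counts = [[2.0, 1.0],
          [2.0, 3.0]]
# Dividing each column by its own sum makes every column sum to 1.
column_sums = [sum(col) for col in zip(*counts)]
conditional = divide_columns_sketch(counts, column_sums)
```

For dense NumPy arrays the same result follows from broadcasting, for example matrix / row[np.newaxis, :].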

Parameters:
  • matrix (np.ndarray, scipy.sparse.csc_matrix or csr_matrix, shape (M, N)) – The input matrix.

  • row (a 1D np.ndarray, shape (N,)) – The row dividing matrix.

  • in_place (bool (optional, default False)) – Do the computation in-place.

Returns:

out – The result of the column-wise division.

Return type:

same type as matrix

Raises:

ValueError – If row contains zeros.

dacapo.utils.voi.divide_rows(matrix, column, in_place=False)

Divide each row of matrix by the corresponding element in column.

The result is as follows: out[i, j] = matrix[i, j] / column[i]
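
The row-wise counterpart can be sketched the same way on nested lists; divide_rows_sketch is a hypothetical name and only the dense, out-of-place case is shown.

```python
def divide_rows_sketch(matrix, column):
    """Return out with out[i][j] = matrix[i][j] / column[i] (nested lists)."""
    if any(c == 0 for c in column):
        raise ValueError("column contains zeros")
    return [[value / column[i] for value in line]
            for i, line in enumerate(matrix)]


counts = [[2.0, 2.0],
          [1.0, 3.0]]
# Dividing each row by its own sum makes every row sum to 1.
row_sums = [sum(line) for line in counts]
conditional = divide_rows_sketch(counts, row_sums)
```

For dense NumPy arrays the equivalent broadcasting expression is matrix / column[:, np.newaxis].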

Parameters:
  • matrix (np.ndarray, scipy.sparse.csc_matrix or csr_matrix, shape (M, N)) – The input matrix.

  • column (a 1D np.ndarray, shape (M,)) – The column dividing matrix.

  • in_place (bool (optional, default False)) – Do the computation in-place.

Returns:

out – The result of the row-wise division.

Return type:

same type as matrix

Raises:

ValueError – If column contains zeros.

dacapo.utils.voi.xlogx(x, out=None, in_place=False)

Compute x * log_2(x).

We define 0 * log_2(0) = 0
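
The convention for zero matters because log_2(0) is undefined. A minimal sketch on a plain list, with the hypothetical name xlogx_sketch and without the out and in_place options:

```python
import math


def xlogx_sketch(values):
    """Return [v * log2(v) for v in values], with 0 * log2(0) taken as 0."""
    if any(v < 0 for v in values):
        raise ValueError("x contains negative values")
    return [v * math.log2(v) if v > 0 else 0.0 for v in values]


terms = xlogx_sketch([0.0, 0.5, 1.0, 2.0])
# The entropy in bits of a distribution p is -sum(p * log2(p)).
entropy = -sum(xlogx_sketch([0.5, 0.5]))
```

Here terms is [0.0, -0.5, 0.0, 2.0] and the entropy of a fair coin comes out as 1 bit.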

Parameters:
  • x (np.ndarray or scipy.sparse.csc_matrix or csr_matrix) – The input array.

  • out (same type as x (optional)) – If provided, use this array/matrix for the result.

  • in_place (bool (optional, default False)) – Operate directly on x.

Returns:

y – Result of x * log_2(x).

Return type:

same type as x

Raises:

ValueError – If x contains negative values.