Metric submodule

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Functions defining the metric used to find the BMU.

How to include a custom metric?

To write a custom metric, the following signature must be used:

def custom_metric(coord1: np.ndarray, coord2: np.ndarray, *args, squared: bool = False, axis: int = 0, **kwargs) -> float:

Note that in the above signature, *args should always correspond to iterable arguments with the same dimension as coord1 (e.g. uncertainties). On the other hand, **kwargs correspond to additional arguments which do not have to have the same shape as coord1.

Additionally, the function must be able to return either the distance (if squared = False) or its squared value (if squared = True).
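To see why the squared and axis arguments matter, here is a minimal sketch of a BMU search using a metric with this signature. How SOMptimised calls the metric internally is not shown on this page, so the calling pattern below (weights as coord1, one sample as coord2, axis=1) is an assumption, and euclidean_sketch, weights and sample are illustrative names only.

```python
import numpy as np

def euclidean_sketch(coord1, coord2, *args, squared=False, axis=0, **kwargs):
    """Illustrative metric following the required signature (not the library's code)."""
    diff = coord1 - coord2
    dist2 = np.sum(diff * diff, axis=axis)
    return dist2 if squared else np.sqrt(dist2)

# Hypothetical SOM weights: 4 neurons with 3 features each
weights = np.array([[0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [2.0, 2.0, 2.0],
                    [5.0, 5.0, 5.0]])
sample = np.array([1.8, 2.1, 2.0])

# Summing over the feature axis (axis=1) gives one distance per neuron
dist2 = euclidean_sketch(weights, sample, squared=True, axis=1)

# The BMU is the neuron with the smallest distance to the sample
bmu = int(np.argmin(dist2))
```

Because the square root is monotonic, the squared distance selects the same BMU as the distance itself, which is presumably why a metric is asked to support squared=True.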

For instance, assume we want to define a new metric which takes into account errors which can be scaled by a given factor. Such a metric could be written as

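import numpy as np

def new_metric(coord1: np.ndarray, coord2: np.ndarray, error: np.ndarray, *args, squared: bool = False, axis: int = 0, factor: float = 1.0, **kwargs) -> float:

    # Difference weighted by the uncertainties, themselves scaled by factor
    diff = (coord1 - coord2)/(error*factor)

    # Return the squared distance or the distance, depending on squared
    if squared:
        return np.sum(diff*diff, axis=axis)
    else:
        return np.sqrt(np.sum(diff*diff, axis=axis))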

Note

Even if not used, it is better to keep the *args and **kwargs parameters in the metric declaration.
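As a usage sketch, the example metric can be called directly on small arrays to check the effect of squared and factor. The metric is redefined here so that the block is self-contained (with the uncertainty argument named error); the input values are arbitrary.

```python
import numpy as np

def new_metric(coord1, coord2, error, *args, squared=False, axis=0, factor=1.0, **kwargs):
    """Example metric: differences weighted by uncertainties scaled by factor."""
    diff = (coord1 - coord2) / (error * factor)
    dist2 = np.sum(diff * diff, axis=axis)
    return dist2 if squared else np.sqrt(dist2)

coord1 = np.array([1.0, 2.0, 3.0])
coord2 = np.array([1.0, 4.0, 7.0])
error = np.array([1.0, 2.0, 4.0])

# Weighted differences are (0, -1, -1), so the squared distance is 2
d = new_metric(coord1, coord2, error)
d2 = new_metric(coord1, coord2, error, squared=True)

# Doubling the uncertainties through factor halves the distance
d_scaled = new_metric(coord1, coord2, error, factor=2.0)
```

Here factor travels through the keyword arguments, while error is a positional argument with the same shape as coord1, which matches the *args / **kwargs convention described above.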

A note on normalisation

Depending on the metric used, the data may need to be normalised beforehand, and the SOM weight vectors may need to be un-normalised.

The following metrics need normalised train and test data to give an optimal result:

The following metrics need normalised train and test data and un-normalised SOM initial weight vectors:

To un-normalise the initial values of the SOM weight vectors, the unnormalise_weights=True argument can be passed to the fit() method of the SOM, for instance:

som = SOM(m, n, dim, lr=lr, sigma=sigma, metric=chi2CigaleMetric, max_iter=max_iter)
som.fit(X, error, epochs=1, shuffle=True, n_jobs=1, unnormalise_weights=True)

API

SOMptimised.metric.chi2CigaleMetric(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], error: numpy.ndarray, *args, squared: bool = False, axis: int = 1, no_error: bool = False, **kwargs) → float


Provide a \(\chi^2\) distance, as defined in Cigale, estimated between coord1 and coord2 using an uncertainty given by error. The \(\chi^2\) distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) with error \(e = (e_i)\) is given by

\[d = \sqrt{\sum_i \left ( \frac{a_i - \alpha b_i}{e_i} \right )^2}\]

where \(\alpha\) is a scale factor which is computed by the function as

\[\alpha = \frac{\sum_i a_i b_i / e_i^2}{\sum_i b_i^2 / e_i^2}\]

Note

If one of the coordinates in error is 0, the \(\chi^2\) distance will diverge.
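The two formulas above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: chi2_cigale_sketch is a hypothetical name, rows are taken as samples (axis=1), and \(\alpha\) is computed once per row.

```python
import numpy as np

def chi2_cigale_sketch(a, b, e, squared=False, axis=1):
    """Illustrative chi2 distance with a free scale factor alpha (assumed layout)."""
    # Bring all inputs to a common shape so that sums along axis are well defined
    a, b, e = np.broadcast_arrays(a, b, e)

    # Scale factor: sum(a*b/e^2) / sum(b^2/e^2), one value per row
    alpha = (np.sum(a * b / e**2, axis=axis, keepdims=True)
             / np.sum(b**2 / e**2, axis=axis, keepdims=True))

    diff = (a - alpha * b) / e
    dist2 = np.sum(diff * diff, axis=axis)
    return dist2 if squared else np.sqrt(dist2)

a = np.array([[2.0, 4.0, 6.0],    # exactly 2 * b
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
e = np.ones(3)

d = chi2_cigale_sketch(a, b, e)
```

The first row is an exact multiple of b, so \(\alpha = 2\) absorbs the difference and the distance vanishes: this metric compares shapes rather than absolute amplitudes. For the second row, \(\alpha = 2/7\) and \(d = \sqrt{6/7}\).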

Parameters
  • coord1 (ndarray) – first array of coordinates

  • coord2 (int, float or ndarray [float]) – second array of coordinates

  • error (ndarray [float]) – array of uncertainties. Must have the same shape as coord1. To provide no error, set no_error to True.

  • squared (bool) – (Optional) whether to return the square of the metric or not

  • axis (int) – (Optional) axis onto which to compute the sum

  • no_error (bool) – (Optional) whether to ignore the uncertainties in the computation (i.e. use a Euclidean distance)

Returns

\(\chi^2\) distance estimated between the two sets of coordinates

Return type

float

Raises

TypeError – if

  • not isinstance(no_error, bool)

  • not isinstance(coord1, np.ndarray)

  • not isinstance(coord2, np.ndarray)

  • not isinstance(error, (np.ndarray, int, float))

  • not isinstance(squared, bool)

  • not isinstance(axis, int)

SOMptimised.metric.chi2CigaleMetricPenalised(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], error: numpy.ndarray, *args, multFac: Union[int, float] = 1, squared: bool = False, axis: int = 1, no_error: bool = False, **kwargs) → float


Provide a penalised \(\chi^2\) distance as defined in Cigale estimated between coord1 and coord2 using an uncertainty given by error. The penalised \(\chi^2\) distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) with error \(e = (e_i)\) is given by

\[d = \sqrt{\sum_{i > 0} \left ( \frac{a_i - \alpha b_i}{e_i} \right )^2 \times \exp \lbrace | m (a_0 - b_0) / e_0 | \rbrace}\]

where \(m\) is the multiplicative factor hyperparameter given by multFac and \(\alpha\) is a scale factor which is computed by the function as

\[\alpha = \frac{\sum_i a_i b_i / e_i^2}{\sum_i b_i^2 / e_i^2}.\]

Important

This metric therefore includes a penalty function (taken to be exponential) for the first coordinate (e.g. for a redshift). For no penalty, use chi2CigaleMetric() instead.

Note

If one of the coordinates in error is 0, the \(\chi^2\) distance will diverge.
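To get a feel for the penalty term alone, the factor \(\exp \lbrace | m (a_0 - b_0) / e_0 | \rbrace\) can be evaluated on its own. The helper name penalty_factor and the numerical values are illustrative assumptions; the scaled \(\chi^2\) part of the metric is deliberately left out.

```python
import numpy as np

def penalty_factor(a0, b0, e0, mult_fac=1.0):
    """Exponential penalty applied to the first coordinate (illustrative helper)."""
    return np.exp(np.abs(mult_fac * (a0 - b0) / e0))

# Hypothetical first coordinates (e.g. redshifts) differing by 2 sigma
a0, b0, e0 = 1.0, 1.2, 0.1

p_default = penalty_factor(a0, b0, e0)              # exp(2)
p_off = penalty_factor(a0, b0, e0, mult_fac=0.0)    # exp(0) = 1, no penalty
p_strong = penalty_factor(a0, b0, e0, mult_fac=2.0) # exp(4)
```

Since the factor multiplies the sum under the square root, the distance itself grows by the square root of the penalty, i.e. by a factor \(e \approx 2.72\) for a 2 sigma offset with multFac = 1.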

Parameters
  • coord1 (ndarray) – first array of coordinates where the first column is penalised

  • coord2 (int, float or ndarray [float]) – second array of coordinates where the first column is penalised

  • error (ndarray [float]) – array of uncertainties (first column used in the penalty function). Must have the same shape as coord1. To provide no error, set no_error to True.

  • multFac (int or float) – (Optional) multiplicative factor in the penalty function (see definition above)

  • squared (bool) – (Optional) whether to return the square of the metric or not

  • axis (int) – (Optional) axis onto which to compute the sum

  • no_error (bool) – (Optional) whether to ignore the uncertainties in the computation (i.e. use a Euclidean distance)

Returns

penalised \(\chi^2\) distance estimated between the two sets of coordinates

Return type

float

Raises

TypeError – if

  • not isinstance(no_error, bool)

  • not isinstance(coord1, np.ndarray)

  • not isinstance(coord2, np.ndarray)

  • not isinstance(multFac, (int, float))

  • not isinstance(error, (np.ndarray, int, float))

  • not isinstance(squared, bool)

  • not isinstance(axis, int)

SOMptimised.metric.chi2Metric(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], error: numpy.ndarray, *args, squared: bool = False, axis: int = 1, no_error: bool = False, **kwargs) → float


Provide a \(\chi^2\) distance estimated between coord1 and coord2 using an uncertainty given by error. The \(\chi^2\) distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) with error \(e = (e_i)\) is given by

\[d = \sqrt{\sum_i \left ( \frac{a_i - b_i}{e_i} \right )^2}\]

Note

If one of the coordinates in error is 0, the \(\chi^2\) distance will diverge.
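The formula and the divergence mentioned in the note can be reproduced with a few lines of NumPy. This is a sketch only: chi2_sketch is a hypothetical stand-in for the formula, not the library function, and the arrays are arbitrary.

```python
import numpy as np

def chi2_sketch(a, b, e, squared=False, axis=1):
    """Illustrative chi2 distance: differences weighted by the uncertainties."""
    diff = (a - b) / e
    dist2 = np.sum(diff * diff, axis=axis)
    return dist2 if squared else np.sqrt(dist2)

a = np.array([[1.0, 2.0, 3.0]])
b = np.array([[1.5, 2.0, 2.0]])
e = np.array([[0.5, 1.0, 0.5]])

# Weighted differences are (-1, 0, 2), so d = sqrt(5)
d = chi2_sketch(a, b, e)

# A zero uncertainty on a coordinate where a and b differ makes the distance infinite
e_bad = np.array([[0.0, 1.0, 0.5]])
with np.errstate(divide="ignore", invalid="ignore"):
    d_bad = chi2_sketch(a, b, e_bad)

# A simple guard that could be run on the uncertainties before training
has_zero_error = bool(np.any(e_bad == 0))
```

If the two coordinates are equal where the uncertainty is zero, the result is NaN (0/0) rather than infinity, so checking the uncertainties beforehand is the safer option.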

Parameters
  • coord1 (ndarray) – first array of coordinates

  • coord2 (int, float or ndarray [float]) – second array of coordinates

  • error (ndarray [float]) – array of uncertainties. Must have the same shape as coord1. To provide no error, set no_error to True.

  • squared (bool) – (Optional) whether to return the square of the metric or not

  • axis (int) – (Optional) axis onto which to compute the sum

  • no_error (bool) – (Optional) whether to ignore the uncertainties in the computation (i.e. use a Euclidean distance)

Returns

\(\chi^2\) distance estimated between the two sets of coordinates

Return type

float

Raises

TypeError – if

  • not isinstance(no_error, bool)

  • not isinstance(coord1, np.ndarray)

  • not isinstance(coord2, np.ndarray)

  • not isinstance(error, (np.ndarray, int, float))

  • not isinstance(squared, bool)

  • not isinstance(axis, int)

SOMptimised.metric.chi2MetricPenalised(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], error: numpy.ndarray, *args, multFac: Union[int, float] = 1, squared: bool = False, axis: int = 1, no_error: bool = False, **kwargs) → float


Provide a penalised \(\chi^2\) distance estimated between coord1 and coord2 using an uncertainty given by error. The penalised \(\chi^2\) distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) with error \(e = (e_i)\), where \(m\) is the multiplicative factor given by multFac, is given by

\[d = \sqrt{\sum_{i > 0} \left ( \frac{a_i - b_i}{e_i} \right )^2 \times \exp \lbrace | m (a_0 - b_0) / e_0 | \rbrace}\]

Important

This metric includes a penalty function (taken to be exponential) for the first coordinate (e.g. for a redshift). For no penalty, use chi2Metric() instead.

Note

If one of the coordinates in error is 0, the \(\chi^2\) distance will diverge.

Parameters
  • coord1 (ndarray) – first array of coordinates with the first column penalised

  • coord2 (int, float or ndarray [float]) – second array of coordinates with the first column penalised

  • error (ndarray [float]) – array of uncertainties (first column used in the penalty function). Must have the same shape as coord1. To provide no error, set no_error to True.

  • multFac (int or float) – (Optional) multiplicative factor in the penalty function (see definition above)

  • squared (bool) – (Optional) whether to return the square of the metric or not

  • axis (int) – (Optional) axis onto which to compute the sum

  • no_error (bool) – (Optional) whether to ignore the uncertainties in the computation (i.e. use a Euclidean distance)

Returns

penalised \(\chi^2\) distance estimated between the two sets of coordinates

Return type

float

Raises

TypeError – if

  • not isinstance(no_error, bool)

  • not isinstance(coord1, np.ndarray)

  • not isinstance(coord2, np.ndarray)

  • not isinstance(multFac, (int, float))

  • not isinstance(error, (np.ndarray, int, float))

  • not isinstance(squared, bool)

  • not isinstance(axis, int)

SOMptimised.metric.euclidianMetric(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], *args, squared: bool = False, axis: int = 1, **kwargs) → float


Provide the Euclidean distance estimated between coord1 and coord2. The Euclidean distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) is given by

\[d = \sqrt{\sum_i (a_i - b_i)^2}\]

Parameters
  • coord1 (ndarray) – first array of coordinates

  • coord2 (int, float or ndarray [float]) – second array of coordinates

  • squared (bool) – (Optional) whether to return the square of the metric or not

  • axis (int) – (Optional) axis onto which to compute the sum

Returns

Euclidean distance estimated between the two sets of coordinates

Return type

float

Raises

TypeError – if

  • not isinstance(coord1, np.ndarray)

  • not isinstance(coord2, np.ndarray)

  • not isinstance(squared, bool)

  • not isinstance(axis, int)
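The role of the axis argument is easiest to see on a small 2D array. The sketch below uses a hypothetical stand-in, euclidean_sketch, that follows the formula above; it does not call the library function and the points are arbitrary.

```python
import numpy as np

def euclidean_sketch(coord1, coord2, squared=False, axis=1):
    """Illustrative Euclidean distance summed along the given axis."""
    diff = coord1 - coord2
    dist2 = np.sum(diff * diff, axis=axis)
    return dist2 if squared else np.sqrt(dist2)

points = np.array([[3.0, 4.0],
                   [6.0, 8.0]])
origin = np.zeros(2)

# axis=1: one distance per row (per point), here 5 and 10
d_rows = euclidean_sketch(points, origin)

# axis=0: one value per column instead, sqrt(3^2 + 6^2) and sqrt(4^2 + 8^2)
d_cols = euclidean_sketch(points, origin, axis=0)

# squared=True skips the square root
d2_rows = euclidean_sketch(points, origin, squared=True)
```

With one sample per row, axis=1 (the default in the signature above) is the choice that yields one distance per sample.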