Metric submodule
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Functions defining the metric used to find the best matching unit (BMU).
How to include a custom metric?
To write a custom metric, the following signature must be used:
def custom_metric(coord1: np.ndarray, coord2: np.ndarray, *args, squared: bool = False, axis: int = 0, **kwargs) -> float:
Note that in the above signature, *args should always correspond to array-like arguments with the same shape as coord1 (e.g. uncertainties), whereas **kwargs correspond to additional arguments that do not need to have the same shape as coord1.
Additionally, the function must be able to return either the distance (if squared=False) or its squared value (if squared=True).
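The squared/non-squared requirement can be checked quickly on random data. The sketch below uses a hypothetical metric, scaled_euclidean, which reads a scalar keyword argument scale from **kwargs, and a hypothetical helper, check_squared_contract; neither is part of SOMptimised.

```python
import numpy as np

def scaled_euclidean(coord1: np.ndarray, coord2: np.ndarray, *args, squared: bool = False, axis: int = 0, **kwargs) -> float:
    # Hypothetical metric: Euclidean distance divided by a scalar keyword argument "scale"
    scale = kwargs.get("scale", 1.0)
    d2 = np.sum(((coord1 - coord2)/scale)**2, axis=axis)
    return d2 if squared else np.sqrt(d2)

def check_squared_contract(metric, coord1, coord2, *args, axis: int = 0, **kwargs) -> bool:
    # The distance, once squared, must match the value returned with squared=True
    d = metric(coord1, coord2, *args, squared=False, axis=axis, **kwargs)
    d2 = metric(coord1, coord2, *args, squared=True, axis=axis, **kwargs)
    return bool(np.allclose(d**2, d2))

rng = np.random.default_rng(0)
a, b = rng.normal(size=5), rng.normal(size=5)
print(check_squared_contract(scaled_euclidean, a, b, scale=2.0))  # → True
print(scaled_euclidean(np.array([0.0, 0.0]), np.array([3.0, 4.0])))  # → 5.0
```

Keeping *args and **kwargs in the signature lets the same checker forward positional uncertainties or extra keyword arguments to any metric written this way.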
For instance, assume we want to define a new metric that takes into account uncertainties which can be scaled by a given factor. Such a metric could be written as:
import numpy as np

def new_metric(coord1: np.ndarray, coord2: np.ndarray, errors: np.ndarray, *args, squared: bool = False, axis: int = 0, factor: float = 1.0, **kwargs) -> float:
    # Differences normalised by the uncertainties, scaled by factor
    diff = (coord1 - coord2)/(errors*factor)
    if squared:
        return np.sum(diff*diff, axis=axis)
    else:
        return np.sqrt(np.sum(diff*diff, axis=axis))
Note
Even if not used, it is better to keep the *args and **kwargs parameters in the metric declaration.
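To illustrate how such a metric can serve to find the BMU, the sketch below evaluates new_metric (re-declared so the block is self-contained) between one sample and every weight vector at once by summing along axis=1, then takes the argmin. Storing the weights as a (number of neurons, dim) array and relying on NumPy broadcasting are assumptions made for illustration, not a description of SOMptimised's internals.

```python
import numpy as np

def new_metric(coord1, coord2, errors, *args, squared=False, axis=0, factor=1.0, **kwargs):
    # Differences normalised by the uncertainties, scaled by factor
    diff = (coord1 - coord2)/(errors*factor)
    if squared:
        return np.sum(diff*diff, axis=axis)
    return np.sqrt(np.sum(diff*diff, axis=axis))

# Hypothetical SOM with 4 neurons and 3 features
weights = np.array([[0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [2.0, 0.0, 1.0],
                    [5.0, 5.0, 5.0]])
sample = np.array([1.1, 0.9, 1.0])
errors = np.array([0.1, 0.2, 0.1])

# Broadcasting (1, 3) against (4, 3) and summing along axis=1 gives one distance per neuron
dists = new_metric(sample[np.newaxis, :], weights, errors[np.newaxis, :], squared=True, axis=1, factor=2.0)
bmu = int(np.argmin(dists))
print(bmu)  # → 1
```

The squared value is enough here because only the position of the minimum matters.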
A note on normalisation
Depending on the metric used, the data may need to be normalised beforehand, and the SOM weight vectors may need to be un-normalised.
The following metrics need normalised train and test data to give an optimal result:
The following metrics need normalised train and test data and un-normalised SOM initial weight vectors:
To un-normalise the initial values of the SOM weight vectors, the unnormalise_weights=True argument can be passed to the fit() method of the SOM, for instance:
som = SOM(m, n, dim, lr=lr, sigma=sigma, metric=chi2CigaleMetric, max_iter=max_iter)
som.fit(X, error, epochs=1, shuffle=True, n_jobs=1, unnormalise_weights=True)
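The exact normalisation scheme used by SOMptimised is not described in this section. As a hedged illustration only, the sketch below assumes a per-feature z-score normalisation in which the uncertainties are divided by the same per-feature standard deviation as the data, so that the terms \((a_i - b_i)/e_i\) are unchanged; un-normalising is then the inverse transform. The helpers normalise and unnormalise are hypothetical.

```python
import numpy as np

def normalise(X, error):
    # Hypothetical z-score scheme: per-feature mean and standard deviation
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Uncertainties are only rescaled: an additive shift does not apply to them
    return (X - mean)/std, error/std, mean, std

def unnormalise(W, mean, std):
    # Inverse transform, e.g. for weight vectors expressed in normalised units
    return W*std + mean

rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=3.0, size=(100, 4))
error = np.full_like(X, 0.5)

Xn, errn, mean, std = normalise(X, error)
# The chi2 terms (a_i - b_i)/e_i are unchanged by this transformation
orig = (X[0] - X[1])/error[0]
norm = (Xn[0] - Xn[1])/errn[0]
print(np.allclose(orig, norm))                      # → True
print(np.allclose(unnormalise(Xn, mean, std), X))   # → True
```

Note that the mean shift only cancels in plain differences; in a scaled difference such as \(a_i - \alpha b_i\) it does not, so results for metrics of that kind do depend on the normalisation.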
API
- SOMptimised.metric.chi2CigaleMetric(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], error: numpy.ndarray, *args, squared: bool = False, axis: int = 1, no_error: bool = False, **kwargs) → float
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Provide the \(\chi^2\) distance, as defined in Cigale, between coord1 and coord2 using the uncertainties given by error. The \(\chi^2\) distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) with error \(e = (e_i)\) is given by
\[d = \sqrt{\sum_i \left ( \frac{a_i - \alpha b_i}{e_i} \right )^2}\]
where \(\alpha\) is a scale factor computed by the function as
\[\alpha = \frac{\sum_i a_i b_i / e_i^2}{\sum_i b_i^2 / e_i^2}.\]
Note
If any element of error is 0, the \(\chi^2\) distance diverges.
- Parameters
coord1 (ndarray) – first array of coordinates
coord2 (int, float or ndarray[float]) – second array of coordinates
error (ndarray[float]) – array of uncertainties. Must have the same shape as coord1. To provide no error, set no_error to True.
squared (bool) – (Optional) whether to return the square of the metric or not
axis (int) – (Optional) axis along which to compute the sum
no_error (bool) – (Optional) whether to ignore the uncertainties in the computation (i.e. compute the Euclidean distance instead)
- Returns
\(\chi^2\) distance estimated between the two sets of coordinates
- Return type
float
- Raises
TypeError – if
not isinstance(no_error, bool)
not isinstance(coord1, np.ndarray)
not isinstance(coord2, np.ndarray)
not isinstance(error, (np.ndarray, int, float))
not isinstance(squared, bool)
not isinstance(axis, int)
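The two formulas above can be transcribed directly with NumPy. This is an independent sketch for single 1D coordinate vectors, not the SOMptimised source, and the names chi2_cigale_sketch and chi2_at are hypothetical. It also checks numerically why \(\alpha\) takes this form: setting the derivative of \(\sum_i (a_i - \alpha b_i)^2 / e_i^2\) with respect to \(\alpha\) to zero gives \(\sum_i b_i (a_i - \alpha b_i)/e_i^2 = 0\), i.e. \(\alpha\) is the scale factor that minimises the sum.

```python
import numpy as np

def chi2_cigale_sketch(a, b, e, squared=False):
    # Scale factor alpha from the formula above
    alpha = np.sum(a*b/e**2)/np.sum(b**2/e**2)
    # Chi2 sum with coord2 rescaled by alpha
    chi2 = np.sum(((a - alpha*b)/e)**2)
    return chi2 if squared else np.sqrt(chi2)

def chi2_at(a, b, e, alpha):
    # Same sum, but for an arbitrary (not optimised) scale factor
    return np.sum(((a - alpha*b)/e)**2)

# b is exactly a/2, so alpha = 2 absorbs the scale and the distance is 0
a = np.array([2.0, 4.0, 6.0])
b = np.array([1.0, 2.0, 3.0])
e = np.array([0.5, 0.5, 0.5])
print(chi2_cigale_sketch(a, b, e))  # → 0.0

# alpha minimises the sum: nudging it in either direction increases chi2
rng = np.random.default_rng(2)
a2, b2 = rng.uniform(1, 2, size=6), rng.uniform(1, 2, size=6)
e2 = rng.uniform(0.1, 0.3, size=6)
alpha2 = np.sum(a2*b2/e2**2)/np.sum(b2**2/e2**2)
best = chi2_at(a2, b2, e2, alpha2)
print(best < chi2_at(a2, b2, e2, alpha2 + 1e-3), best < chi2_at(a2, b2, e2, alpha2 - 1e-3))  # → True True
```

Because of \(\alpha\), this distance is insensitive to an overall rescaling of coord2, which is what distinguishes it from chi2Metric().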
- SOMptimised.metric.chi2CigaleMetricPenalised(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], error: numpy.ndarray, *args, multFac: Union[int, float] = 1, squared: bool = False, axis: int = 1, no_error: bool = False, **kwargs) → float
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Provide a penalised \(\chi^2\) distance, as defined in Cigale, between coord1 and coord2 using the uncertainties given by error. The penalised \(\chi^2\) distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) with error \(e = (e_i)\) is given by
\[d = \sqrt{\sum_{i > 0} \left ( \frac{a_i - \alpha b_i}{e_i} \right )^2 \times \exp \lbrace | m (a_0 - b_0) / e_0 | \rbrace}\]
where \(m\) is the multiplicative factor hyperparameter given by multFac and \(\alpha\) is a scale factor computed by the function as
\[\alpha = \frac{\sum_i a_i b_i / e_i^2}{\sum_i b_i^2 / e_i^2}.\]
Important
This metric therefore includes an exponential penalty function on the first coordinate (e.g. the redshift). For no penalty, use chi2CigaleMetric() instead.
Note
If any element of error is 0, the \(\chi^2\) distance diverges.
- Parameters
coord1 (ndarray) – first array of coordinates where the first column is penalised
coord2 (int, float or ndarray[float]) – second array of coordinates where the first column is penalised
error (ndarray[float]) – array of uncertainties (the first column is used in the penalty function). Must have the same shape as coord1. To provide no error, set no_error to True.
multFac (int or float) – (Optional) multiplicative factor in the penalty function (see definition above)
squared (bool) – (Optional) whether to return the square of the metric or not
axis (int) – (Optional) axis along which to compute the sum
no_error (bool) – (Optional) whether to ignore the uncertainties in the computation (i.e. compute the Euclidean distance instead)
- Returns
penalised \(\chi^2\) distance estimated between the two sets of coordinates
- Return type
float
- Raises
TypeError – if
not isinstance(no_error, bool)
not isinstance(coord1, np.ndarray)
not isinstance(coord2, np.ndarray)
not isinstance(multFac, (int, float))
not isinstance(error, (np.ndarray, int, float))
not isinstance(squared, bool)
not isinstance(axis, int)
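A direct NumPy transcription of the penalised formula, again for single 1D vectors and not the library source (chi2_cigale_penalised_sketch is a hypothetical name). It computes \(\alpha\) over all coordinates exactly as written above; whether the library excludes the first coordinate from \(\alpha\) is not stated here. The example shows how a mismatch on the first coordinate alone inflates the distance.

```python
import numpy as np

def chi2_cigale_penalised_sketch(a, b, e, multFac=1.0, squared=False):
    # Scale factor alpha, summed over all coordinates as in the formula above
    alpha = np.sum(a*b/e**2)/np.sum(b**2/e**2)
    # Chi2 term restricted to the coordinates i > 0
    chi2 = np.sum(((a[1:] - alpha*b[1:])/e[1:])**2)
    # Exponential penalty on the first coordinate (e.g. a redshift)
    penalty = np.exp(np.abs(multFac*(a[0] - b[0])/e[0]))
    d2 = chi2*penalty
    return d2 if squared else np.sqrt(d2)

a = np.array([0.5, 1.0, 2.0, 2.5])
b = np.array([0.5, 1.2, 1.8, 2.6])
e = np.array([0.1, 0.2, 0.2, 0.2])
base = chi2_cigale_penalised_sketch(a, b, e)

# Shifting only the first coordinate of b changes alpha slightly and multiplies
# the chi2 term by exp(|0.2 / 0.1|)
shifted = b.copy()
shifted[0] = 0.7
print(chi2_cigale_penalised_sketch(a, shifted, e) > base)  # → True
```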
- SOMptimised.metric.chi2Metric(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], error: numpy.ndarray, *args, squared: bool = False, axis: int = 1, no_error: bool = False, **kwargs) → float
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Provide the \(\chi^2\) distance between coord1 and coord2 using the uncertainties given by error. The \(\chi^2\) distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) with error \(e = (e_i)\) is given by
\[d = \sqrt{\sum_i \left ( \frac{a_i - b_i}{e_i} \right )^2}\]
Note
If any element of error is 0, the \(\chi^2\) distance diverges.
- Parameters
coord1 (ndarray) – first array of coordinates
coord2 (int, float or ndarray[float]) – second array of coordinates
error (ndarray[float]) – array of uncertainties. Must have the same shape as coord1. To provide no error, set no_error to True.
squared (bool) – (Optional) whether to return the square of the metric or not
axis (int) – (Optional) axis along which to compute the sum
no_error (bool) – (Optional) whether to ignore the uncertainties in the computation (i.e. compute the Euclidean distance instead)
- Returns
\(\chi^2\) distance estimated between the two sets of coordinates
- Return type
float
- Raises
TypeError – if
not isinstance(no_error, bool)
not isinstance(coord1, np.ndarray)
not isinstance(coord2, np.ndarray)
not isinstance(error, (np.ndarray, int, float))
not isinstance(squared, bool)
not isinstance(axis, int)
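The plain \(\chi^2\) distance can be sketched as below, evaluated against several weight vectors at once with axis=1 as in the signature above. The name chi2_sketch is hypothetical, and the no_error branch is an assumption based on the parameter description (Euclidean distance when the uncertainties are ignored), not taken from the library source.

```python
import numpy as np

def chi2_sketch(coord1, coord2, error, squared=False, axis=1, no_error=False):
    # Assumed behaviour: with no_error=True the uncertainties are ignored (Euclidean distance)
    diff = coord1 - coord2 if no_error else (coord1 - coord2)/error
    d2 = np.sum(diff*diff, axis=axis)
    return d2 if squared else np.sqrt(d2)

sample = np.array([[1.0, 2.0]])                 # one sample, shape (1, 2)
weights = np.array([[1.0, 2.0], [4.0, 6.0]])    # two weight vectors, shape (2, 2)
error = np.array([[0.5, 2.0]])                  # uncertainties of the sample

# Summing along axis=1 gives one distance per weight vector: sqrt(0) and sqrt(36 + 4)
print(chi2_sketch(sample, weights, error))
# Ignoring the uncertainties: sqrt(0) and sqrt(9 + 16) = 5
print(chi2_sketch(sample, weights, error, no_error=True))
```

Dividing by the uncertainties changes which features dominate: here the first feature, with the smaller uncertainty, contributes 36 of the 40 in the squared \(\chi^2\) sum.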
- SOMptimised.metric.chi2MetricPenalised(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], error: numpy.ndarray, *args, multFac: Union[int, float] = 1, squared: bool = False, axis: int = 1, no_error: bool = False, **kwargs) → float
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Provide a penalised \(\chi^2\) distance between coord1 and coord2 using the uncertainties given by error. The penalised \(\chi^2\) distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) with error \(e = (e_i)\) is given by
\[d = \sqrt{\sum_{i > 0} \left ( \frac{a_i - b_i}{e_i} \right )^2 \times \exp \lbrace | m (a_0 - b_0) / e_0 | \rbrace}\]
where \(m\) is the multiplicative factor hyperparameter given by multFac.
Important
This metric includes an exponential penalty function on the first coordinate (e.g. the redshift). For no penalty, use chi2Metric() instead.
Note
If any element of error is 0, the \(\chi^2\) distance diverges.
- Parameters
coord1 (ndarray) – first array of coordinates with the first column penalised
coord2 (int, float or ndarray[float]) – second array of coordinates with the first column penalised
error (ndarray[float]) – array of uncertainties (the first column is used in the penalty function). Must have the same shape as coord1. To provide no error, set no_error to True.
multFac (int or float) – (Optional) multiplicative factor in the penalty function (see definition above)
squared (bool) – (Optional) whether to return the square of the metric or not
axis (int) – (Optional) axis along which to compute the sum
no_error (bool) – (Optional) whether to ignore the uncertainties in the computation (i.e. compute the Euclidean distance instead)
- Returns
penalised \(\chi^2\) distance estimated between the two sets of coordinates
- Return type
float
- Raises
TypeError – if
not isinstance(no_error, bool)
not isinstance(coord1, np.ndarray)
not isinstance(coord2, np.ndarray)
not isinstance(multFac, (int, float))
not isinstance(error, (np.ndarray, int, float))
not isinstance(squared, bool)
not isinstance(axis, int)
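The role of multFac can be seen by transcribing the penalised formula for single 1D vectors (chi2_penalised_sketch is a hypothetical name; this is not the library source). With multFac=0 the penalty factor is 1, but the first coordinate is still excluded from the sum, so the result does not reduce exactly to chi2Metric().

```python
import numpy as np

def chi2_penalised_sketch(a, b, e, multFac=1.0, squared=False):
    # Chi2 sum over the coordinates i > 0 only
    chi2 = np.sum(((a[1:] - b[1:])/e[1:])**2)
    # Exponential penalty on the first coordinate, scaled by multFac
    penalty = np.exp(np.abs(multFac*(a[0] - b[0])/e[0]))
    d2 = chi2*penalty
    return d2 if squared else np.sqrt(d2)

a = np.array([1.0, 3.0, 4.0])
b = np.array([1.1, 0.0, 0.0])
e = np.array([0.1, 1.0, 1.0])

# The chi2 part is 9 + 16 = 25 and |a_0 - b_0| / e_0 is (up to rounding) 1,
# so the squared distance is close to 25 * exp(multFac)
for m in (0, 1, 2):
    print(m, chi2_penalised_sketch(a, b, e, multFac=m, squared=True))
```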
- SOMptimised.metric.euclidianMetric(coord1: numpy.ndarray, coord2: Union[int, float, numpy.ndarray], *args, squared: bool = False, axis: int = 1, **kwargs) → float
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Provide the Euclidean distance between coord1 and coord2. The Euclidean distance \(d\) between coordinates \(a = (a_i)\) and \(b = (b_i)\) is given by
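\[d = \sqrt{\sum_i (a_i - b_i)^2}\]
- Parameters
coord1 (ndarray) – first array of coordinates
coord2 (int, float or ndarray[float]) – second array of coordinates
squared (bool) – (Optional) whether to return the square of the metric or not
axis (int) – (Optional) axis along which to compute the sum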
- Returns
Euclidean distance estimated between the two sets of coordinates
- Return type
float
- Raises
TypeError – if
not isinstance(coord1, np.ndarray)
not isinstance(coord2, np.ndarray)
not isinstance(squared, bool)
not isinstance(axis, int)
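A general property worth noting about the squared option: the square root is monotonic, so the argmin over neurons is the same whether squared is True or False, and the square root can be skipped when only the BMU is needed. The sketch below checks this with a hypothetical NumPy function, euclidean_sketch; it is an illustration, not the library implementation.

```python
import numpy as np

def euclidean_sketch(coord1, coord2, squared=False, axis=1):
    # Sum of squared differences along the given axis
    d2 = np.sum((coord1 - coord2)**2, axis=axis)
    return d2 if squared else np.sqrt(d2)

rng = np.random.default_rng(3)
weights = rng.normal(size=(25, 4))   # e.g. a 5x5 SOM with 4 features
sample = rng.normal(size=(1, 4))

d = euclidean_sketch(sample, weights)
d2 = euclidean_sketch(sample, weights, squared=True)
# sqrt is monotonic, so both give the same best matching unit
print(np.argmin(d) == np.argmin(d2))   # → True
print(euclidean_sketch(np.array([[0.0, 0.0]]), np.array([[3.0, 4.0]])))  # → [5.]
```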