API for the main som module
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
An optimised Self Organising Map which can write and read its values into and from an external file.
Most of the code comes from Riley Smith's implementation found in the sklearn-som Python library. Original code from Riley Smith is always marked with '.. codeauthor:: Riley Smith'.
- class SOMptimised.som.SOM(m: int = 3, n: int = 3, dim: int = 3, lr: SOMptimised.learning_rate.LearningStrategy = <SOMptimised.learning_rate.LinearLearningStrategy object>, sigma: SOMptimised.neighbourhood.NeighbourhoodStrategy = <SOMptimised.neighbourhood.ConstantRadiusStrategy object>, metric: callable = <function euclidianMetric>, max_iter: typing.Union[int, float] = 3000, random_state: typing.Optional[int] = None)
Bases: object
Code author: Riley Smith
Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.
The 2D rectangular-grid self-organizing map class, implemented with NumPy.
- Parameters
m (int) – (Optional) shape along dimension 0 (vertical) of the SOM
n (int) – (Optional) shape along dimension 1 (horizontal) of the SOM
dim (int) – (Optional) dimensionality (number of features) of the input space
lr (LearningStrategy) – (Optional) learning strategy used to update the SOM weights
sigma (NeighbourhoodStrategy) – (Optional) neighbourhood strategy used to compute the step applied to each weight
max_iter (int or float) – (Optional) maximum number of iterations; training stops once this many iterations have been reached
metric (callable) – (Optional) metric used to compute the distance between the training data and the neurons, and between the neurons and the test data
random_state (int) – (Optional) integer seed for the random number generator used for weight initialisation. It is used to create a new instance of NumPy's default random number generator (np.random.seed() is not called). Specify an integer for deterministic results.
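The effect of random_state can be illustrated with a minimal sketch. The helper init_weights below is hypothetical, and both the (m*n, dim) weight layout and the standard-normal draw are assumptions made for illustration. What the sketch does show is that a seeded numpy.random.default_rng Generator gives reproducible weights without touching the global np.random state.

```python
import numpy as np


def init_weights(m: int, n: int, dim: int, random_state=None) -> np.ndarray:
    """Hypothetical helper: one weight vector per neuron, shape (m*n, dim)."""
    # A new Generator is created; the global np.random state is left untouched
    rng = np.random.default_rng(random_state)
    # Assumption: weights drawn from a standard normal distribution
    return rng.normal(size=(m * n, dim))


w1 = init_weights(3, 3, 3, random_state=42)
w2 = init_weights(3, 3, 3, random_state=42)
print(w1.shape)                # (9, 3)
print(np.array_equal(w1, w2))  # True: same seed, same weights
```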
- _compute_points_inertia(X: numpy.ndarray, *args, bmus_indices: Optional[Union[int, list, numpy.ndarray]] = None, metric: Optional[callable] = None, n_jobs: int = 1, **kwargs) → numpy.ndarray
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Compute the inertia for a set of points. The inertia of a point is defined as its squared distance to the closest cluster centre (its BMU).
Note
*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:
*args must always be a collection of ndarray with shapes similar to that of X
**kwargs are keyword arguments which have no constraints on their type or shape
See the metric specific implementation for more details.
- Parameters
X (ndarray) – input matrix (2D)
bmus_indices (int, list[int] or ndarray[int]) – (Optional) indices of the best matching units for all the points. If None, the BMUs are computed.
metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.
*args – additional arguments to pass to the metric. These arguments are looped over in the same way as X, so they should be a collection of ndarray with the same shape. See the metric-specific signature to know which parameters to pass.
**kwargs – additional keyword arguments to pass to the metric. See the metric-specific signature to know which parameters to pass.
- Returns
inertia for all the points
- Return type
ndarray [float]
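A minimal NumPy sketch of the inertia computation, assuming a plain Euclidean metric and a flat (n_neurons, dim) weight matrix. The names points_inertia and weights are hypothetical. When bmus_indices is omitted, the BMUs are computed first, as described above.

```python
import numpy as np


def points_inertia(X, weights, bmus_indices=None):
    """Squared Euclidean distance from each row of X to its BMU (sketch)."""
    X = np.asarray(X, dtype=float)
    # Squared distance from every point to every neuron, shape (n_points, n_neurons)
    d2 = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=-1)
    if bmus_indices is None:
        # BMUs not supplied: take the closest neuron for each point
        bmus_indices = d2.argmin(axis=1)
    # Pick, for each point, the squared distance to its own BMU
    return d2[np.arange(len(X)), np.asarray(bmus_indices)]


weights = np.array([[0.0, 0.0], [1.0, 1.0]])
X = np.array([[0.0, 1.0], [2.0, 2.0]])
print(points_inertia(X, weights))          # [1. 2.]
print(points_inertia(X, weights, [1, 0]))  # [1. 8.]
```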
- _find_bmu(x: numpy.ndarray, *args, metric: Optional[callable] = None, **kwargs) → int
Code author: Riley Smith
Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.
Find the index of the best matching unit for the input vector x.
- Parameters
x (ndarray) – input vector (1D)
metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.
*args – additional arguments to pass to the metric. This must be a tuple or list of 1D ndarray with the same shape as x. See the metric-specific signature to know which parameters to pass.
**kwargs – additional keyword arguments to pass to the metric. See the metric-specific signature to know which parameters to pass.
- Returns
index of the best matching unit
- Return type
int
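The BMU search for a single vector reduces to an argmin over its distances to every neuron. The sketch below is hypothetical: it assumes a metric called as metric(x, weights) that returns one distance per neuron, which may not match the signature expected by SOMptimised metrics.

```python
import numpy as np


def find_bmu(x, weights, metric=None):
    """Index of the neuron closest to the 1D input vector x (sketch)."""
    if metric is None:
        # Assumed default: Euclidean distance from x to every neuron at once
        def metric(x, w):
            return np.linalg.norm(w - x, axis=-1)
    return int(np.argmin(metric(x, weights)))


def manhattan(x, w):
    # Hypothetical alternative metric with the same (x, weights) signature
    return np.abs(w - x).sum(axis=-1)


weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
print(find_bmu(np.array([0.9, 1.2]), weights))                    # 1
print(find_bmu(np.array([4.0, 4.0]), weights, metric=manhattan))  # 2
```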
- _find_bmus(X: numpy.ndarray, *args, metric: Optional[callable] = None, n_jobs: int = 1, **kwargs) → numpy.ndarray
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Find the indices of the best matching unit for the input matrix X.
Note
*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:
*args must always be a collection of ndarray with shapes similar to that of X
**kwargs are keyword arguments which have no constraints on their type or shape
See the metric specific implementation for more details.
- Parameters
X (ndarray) – input matrix (2D)
metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.
n_jobs (int) – (Optional) number of threads used to find the BMUs. This parameter is only used by the _find_bmus_bydata() method.
*args – additional arguments to pass to the metric. These arguments are looped over in the same way as X, so they should be a collection of ndarray with the same shape. See the metric-specific signature to know which parameters to pass.
**kwargs – additional keyword arguments to pass to the metric. See the metric-specific signature to know which parameters to pass.
- Returns
indices of the best matching units
- Return type
ndarray [int]
- _find_bmus_bydata(X: numpy.ndarray, *args, metric: Optional[callable] = None, n_jobs: int = 1, **kwargs) → numpy.ndarray
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Find the indices of the best matching unit for the input matrix X by looping through the data.
Note
*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:
*args must always be a collection of ndarray with shapes similar to that of X
**kwargs are keyword arguments which have no constraints on their type or shape
See the metric specific implementation for more details.
- Parameters
X (ndarray) – input matrix (2D)
metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.
n_jobs (int) – (Optional) number of threads used to find the BMUs
*args – additional arguments to pass to the metric. These arguments are looped over in the same way as X, so they should be a collection of ndarray with the same shape. See the metric-specific signature to know which parameters to pass.
**kwargs – additional keyword arguments to pass to the metric. See the metric-specific signature to know which parameters to pass.
- Returns
indices of the best matching units
- Return type
ndarray [int]
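Looping over the data lends itself to threading, since each point's BMU is independent of the others. Below is a hedged sketch of that strategy, not the library's code: it uses concurrent.futures.ThreadPoolExecutor when n_jobs > 1 and assumes a Euclidean metric.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def find_bmus_bydata(X, weights, n_jobs=1):
    """Loop over the data points; each point's BMU is found independently."""
    X = np.asarray(X, dtype=float)

    def bmu(x):
        # Euclidean distance from one point to all neurons (assumed metric)
        return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

    if n_jobs == 1:
        return np.array([bmu(x) for x in X], dtype=int)
    # Threads share the weights; map() keeps the results in input order
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return np.fromiter(pool.map(bmu, X), dtype=int, count=len(X))


weights = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[1.0, 1.0], [9.0, 8.0], [0.0, 2.0]])
print(find_bmus_bydata(X, weights))            # [0 1 0]
print(find_bmus_bydata(X, weights, n_jobs=2))  # [0 1 0]
```

Threads share the weights without copying them. Whether they give a real speed-up depends on how much of each BMU search NumPy spends outside the GIL.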
- _find_bmus_byweight(X: numpy.ndarray, *args, metric: Optional[callable] = None, **kwargs) → numpy.ndarray
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Find the indices of the best matching unit for the input matrix X by looping through the weights.
Note
*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:
*args must always be a collection of ndarray with shapes similar to that of X
**kwargs are keyword arguments which have no constraints on their type or shape
See the metric specific implementation for more details.
- Parameters
X (ndarray) – input matrix (2D)
metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.
*args – additional arguments to pass to the metric. These arguments are looped over in the same way as X, so they should be a collection of ndarray with the same shape. See the metric-specific signature to know which parameters to pass.
**kwargs – additional keyword arguments to pass to the metric. See the metric-specific signature to know which parameters to pass.
- Returns
indices of the best matching units
- Return type
ndarray [int]
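Looping over the weights instead computes, one neuron at a time, the distance to all points at once and keeps a running minimum. This sketch uses the same assumptions as a plain Euclidean BMU search (hypothetical names, Euclidean metric) and returns the same indices as a full argmin.

```python
import numpy as np


def find_bmus_byweight(X, weights):
    """Loop over the neurons, keeping the closest one found so far per point."""
    X = np.asarray(X, dtype=float)
    best_dist = np.full(len(X), np.inf)
    best_idx = np.zeros(len(X), dtype=int)
    for k, w in enumerate(weights):
        # Euclidean distance from every point to neuron k in one vectorised call
        d = np.linalg.norm(X - w, axis=1)
        closer = d < best_dist  # strict '<' keeps the first neuron on ties
        best_dist[closer] = d[closer]
        best_idx[closer] = k
    return best_idx


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
weights = rng.normal(size=(9, 3))
full = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=-1)
print(np.array_equal(find_bmus_byweight(X, weights), full.argmin(axis=1)))  # True
```

This needs only m*n iterations, which tends to be cheaper than looping over the data when there are many more points than neurons.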
- _get_locations(m: int, n: int) → numpy.ndarray
Code author: Riley Smith
Return the indices of an m by n array. Indices are returned as float to save time.
- Parameters
m (int) – shape along dimension 0 (vertical) of the SOM
n (int) – shape along dimension 1 (horizontal) of the SOM
- Returns
indices of the array
- Return type
ndarray [float]
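The m by n indices can be obtained with np.argwhere on an all-ones array, which lists every index in row-major order. A short sketch; the function name mirrors the method, but the body is an assumption.

```python
import numpy as np


def get_locations(m, n):
    """(row, column) grid position of each of the m*n neurons, as floats."""
    # argwhere on an all-ones array lists every index in row-major order
    return np.argwhere(np.ones((m, n))).astype(float)


locs = get_locations(2, 3)
print(locs.shape)  # (6, 2)
print(locs[:4])
```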
- property cluster_centers_: numpy.ndarray
Code author: Riley Smith
Give the coordinates of each cluster centre as an array of shape (m, n, dim).
- Returns
cluster centres
- Return type
ndarray [float]
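Assuming the weights are stored as a flat (m*n, dim) matrix in the same row-major order as the grid locations, the cluster centres are a plain reshape. The layout is an assumption made for illustration.

```python
import numpy as np

m, n, dim = 2, 3, 4
# Assumed internal layout: a flat weight matrix with one row per neuron
weights = np.arange(m * n * dim, dtype=float).reshape(m * n, dim)
centres = weights.reshape(m, n, dim)

# Neuron k sits at grid position divmod(k, n), matching row-major ordering
k = 5
row, col = divmod(k, n)
print(np.array_equal(centres[row, col], weights[k]))  # True
```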
- fit(X: numpy.ndarray, *args, epochs: int = 1, shuffle: bool = True, n_jobs: int = 1, unnormalise_weights: bool = False, **kwargs) → None
Code author: Riley Smith
Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.
Take data (a tensor of type float64) as input and fit the SOM to that data for the specified number of epochs.
Note
*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:
*args must always be a collection of ndarray with shapes similar to that of X
**kwargs are keyword arguments which have no constraints on their type or shape
See the metric specific implementation for more details.
- Parameters
X (ndarray) – training data. Must have shape (n, self.dim) where n is the number of training samples.
epochs (int) – (Optional) number of times to loop through the training data when fitting
shuffle (bool) – (Optional) whether or not to randomise the order of the training data when fitting. The shuffling can be seeded with np.random.seed() prior to calling the fit() method.
n_jobs (int) – (Optional) number of threads used to find the BMUs at the end of the loop. This parameter is only used by the _find_bmus_bydata() method.
unnormalise_weights (bool) – (Optional) whether or not to unnormalise the weights
*args – additional arguments to pass to the metric. These arguments are looped over in the same way as X, so they should be a collection of ndarray with the same shape. See the metric-specific signature to know which parameters to pass.
**kwargs – additional keyword arguments to pass to the metric. See the metric-specific signature to know which parameters to pass.
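The epochs, shuffle and max_iter options interact as in the sketch below. fit_loop is hypothetical and takes the per-sample update as a callable step(x, counter); the final BMU search controlled by n_jobs is omitted. Shuffling uses the global NumPy generator, consistent with the note that np.random.seed() can seed the order.

```python
import numpy as np


def fit_loop(X, step, epochs=1, shuffle=True, max_iter=3000):
    """Call step(x, counter) on each sample, epoch after epoch (sketch).

    Returns the number of steps performed; stops early at max_iter.
    """
    X = np.asarray(X, dtype=float)
    counter = 0
    for _ in range(epochs):
        # Global NumPy RNG, so np.random.seed() before the call fixes the order
        order = np.random.permutation(len(X)) if shuffle else np.arange(len(X))
        for idx in order:
            if counter >= max_iter:
                return counter
            step(X[idx], counter)
            counter += 1
    return counter


seen = []
X = np.arange(10.0).reshape(5, 2)
n_iter = fit_loop(X, lambda x, c: seen.append(c), epochs=3, max_iter=12)
print(n_iter)  # 12 (3 epochs of 5 samples would be 15 steps)
```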
- fit_predict(X: numpy.ndarray, *args, **kwargs) → numpy.ndarray
Code author: Riley Smith
Convenience method for calling fit() followed by predict().
Warning
This method has not been kept up to date with other changes to the code. It may not work as expected.
- Parameters
- Returns
ndarray of shape (n,). The index of the predicted cluster for each item in X (after fitting the SOM to the data in X).
- Return type
ndarray [int]
- fit_transform(X: numpy.ndarray, *args, **kwargs) → numpy.ndarray
Code author: Riley Smith
Convenience method for calling fit() followed by transform(). Unlike in sklearn, this is not implemented more efficiently (the efficiency is the same as calling fit() directly followed by transform()).
Warning
This method has not been kept up to date with other changes to the code. It may not work as expected.
- Parameters
- Returns
ndarray of shape (n, self.m*self.n). The Euclidean distance from each item in X to each cluster center.
- Return type
ndarray [float]
- get(param: str) → numpy.ndarray
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Return the given physical parameter if it exists.
- Parameters
param (str) – parameter to return
- Returns
array of physical parameter values associated with each node
- Return type
ndarray
- Raises
KeyError – if param is not found
- property inertia_: numpy.ndarray
Code author: Riley Smith
Inertia.
- Returns
computed inertia
- Return type
ndarray [float]
- Raises
AttributeError – if the inertia has not been computed yet
- property n_iter_: int
Code author: Riley Smith
Number of iterations.
- Returns
number of iterations
- Return type
int
- Raises
AttributeError – if the number of iterations is not initialised yet
- predict(X: numpy.ndarray, *args, metric: Optional[callable] = None, n_jobs: int = 1, **kwargs) → numpy.ndarray
Code author: Riley Smith
Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.
Predict cluster for each element in X.
Note
*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:
*args must always be a collection of ndarray with shapes similar to that of X
**kwargs are keyword arguments which have no constraints on their type or shape
See the metric specific implementation for more details.
- Parameters
X (ndarray) – input data. Must have shape (n, self.dim) where n is the number of samples.
metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.
n_jobs (int) – (Optional) number of threads used to find the BMUs. This parameter is only used by the _find_bmus_bydata() method.
*args – additional arguments to pass to the metric. These arguments are looped over in the same way as X, so they should be a collection of ndarray with the same shape. See the metric-specific signature to know which parameters to pass.
**kwargs – additional keyword arguments to pass to the metric. See the metric-specific signature to know which parameters to pass.
- Returns
an ndarray of shape (n,). The predicted cluster index for each item in X.
- Return type
ndarray [int]
- Raises
NotImplementedError – if the fit() method has not been called already
ValueError –
if X is not a 2-dimensional array
if the second dimension of X does not have a length equal to self.dim
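The checks listed under Raises can be sketched with a hypothetical TinySOM stand-in. Only the order of the checks and the exception types come from the documentation above; the Euclidean BMU search is an assumption.

```python
import numpy as np


class TinySOM:
    """Hypothetical stand-in reproducing the checks documented for predict()."""

    def __init__(self, m, n, dim):
        self.m, self.n, self.dim = m, n, dim
        self.weights = None  # would be set by fit(), not shown here

    def predict(self, X):
        if self.weights is None:
            raise NotImplementedError("fit() must be called before predict()")
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError(f"X must be a 2-dimensional array, got ndim={X.ndim}")
        if X.shape[1] != self.dim:
            raise ValueError(f"X must have {self.dim} columns, got {X.shape[1]}")
        # Closest neuron for each row (Euclidean distance assumed)
        d = np.linalg.norm(X[:, None, :] - self.weights[None, :, :], axis=-1)
        return d.argmin(axis=1)


som = TinySOM(1, 2, dim=2)
som.weights = np.array([[0.0, 0.0], [1.0, 1.0]])
print(som.predict([[0.1, 0.0], [0.9, 1.0]]))  # [0 1]
```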
- static read(fname: str, *args, **kwargs)
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Read the result of a SOM written into a binary file with the write() method.
- Parameters
fname (str) – input file
*args – other arguments passed to pickle.load
**kwargs – other keyword arguments passed to pickle.load
- Returns
the loaded SOM object
- Return type
SOM
- Raises
TypeError – if fname is not of type str
- set(param: str, value: numpy.ndarray) → None
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Set the given physical parameter. The value must be an array of shape (self.m*self.n,).
- Parameters
param (str) – parameter to set
value (ndarray) – array with the values of the physical parameter to store
- Raises
ValueError – if value is not a 1-dimensional array of length self.m*self.n
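set() and get() amount to a mapping from a parameter name to one value per node. The sketch below uses a hypothetical PhysicalParams class and a made-up "temperature" parameter. Indexing the stored array with BMU indices is shown as one plausible way to attach a node's value to each input point.

```python
import numpy as np


class PhysicalParams:
    """Hypothetical per-neuron parameter store mirroring set() and get()."""

    def __init__(self, m, n):
        self.m, self.n = m, n
        self._params = {}

    def set(self, param, value):
        value = np.asarray(value)
        if value.ndim != 1 or value.shape[0] != self.m * self.n:
            raise ValueError(f"value must be a 1D array of length {self.m * self.n}")
        self._params[param] = value

    def get(self, param):
        if param not in self._params:
            raise KeyError(f"physical parameter {param!r} not found")
        return self._params[param]


store = PhysicalParams(m=2, n=3)
store.set("temperature", np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0]))
bmus = np.array([5, 0, 2])             # e.g. BMU indices returned by predict()
print(store.get("temperature")[bmus])  # [60. 10. 30.]
```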
- step(x: numpy.ndarray, counter: int, *args, **kwargs) → None
Code author: Riley Smith
Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.
Do one step of training on the given input vector.
- Parameters
x (ndarray) – input vector (1D)
counter (int) – global counter used to compute the neighbourhood radius and the learning rate
*args – additional arguments to pass to the metric. This must be a tuple or list of 1D ndarray with the same shape as x. See the metric-specific signature to know which parameters to pass.
**kwargs – additional keyword arguments to pass to the metric. See the metric-specific signature to know which parameters to pass.
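One training step can be sketched with the classic online SOM update, weights += lr * h * (x - weights), where h weights each neuron by its grid distance to the BMU. The linear learning-rate decay, the constant-radius Gaussian neighbourhood and the parameter names lr0 and sigma are assumptions: in SOMptimised these quantities come from the lr and sigma strategy objects passed at init.

```python
import numpy as np


def som_step(weights, locations, x, counter, lr0=1.0, sigma=1.0, max_iter=3000):
    """One online SOM update (sketch): pull the BMU and its neighbours towards x.

    Assumptions: linear learning-rate decay and a constant-radius Gaussian
    neighbourhood, both driven by the global counter.
    """
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    lr = lr0 * (1.0 - counter / max_iter)  # assumed linear decay
    # Squared distance on the grid (not in feature space) to the BMU
    grid_d2 = ((locations - locations[bmu]) ** 2).sum(axis=1)
    h = np.exp(-grid_d2 / (2.0 * sigma**2))  # Gaussian neighbourhood weights
    weights += lr * h[:, None] * (x - weights)  # in-place update
    return weights


locations = np.argwhere(np.ones((1, 3))).astype(float)  # 1 x 3 grid
weights = np.zeros((3, 2))
som_step(weights, locations, np.array([1.0, 1.0]), counter=0)
print(weights[0])  # [1. 1.]: the BMU moves all the way to x when lr = 1
```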
- property train_bmus_: numpy.ndarray
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Best matching units indices for the train set.
- Returns
BMUs indices for the train set
- Return type
ndarray [int]
- Raises
AttributeError – if the number of iterations is not initialised yet
- transform(X: numpy.ndarray, *args, **kwargs) → numpy.ndarray
Code author: Riley Smith
Transform the data X into cluster distance space.
Warning
This method has not been kept up to date with other changes to the code. It may not work as expected.
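Cluster distance space is the (n, m*n) matrix of distances from each item in X to each cluster centre, as stated for fit_transform(). A sketch assuming the Euclidean metric and a flat weight matrix; the row-wise argmin of this matrix gives the same cluster indices as a BMU search.

```python
import numpy as np


def transform(X, weights):
    """Euclidean distance from each row of X to each neuron, shape (n, m*n)."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=-1)


weights = np.array([[0.0, 0.0], [3.0, 4.0]])
D = transform([[0.0, 0.0], [3.0, 4.0], [3.0, 0.0]], weights)
print(D)
print(D.argmin(axis=1))  # [0 1 0]: the row-wise argmin gives the predicted cluster
```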
- write(fname: str, *args, **kwargs) → None
Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>
Write the result of the SOM into a binary file.
- Parameters
fname (str) – output filename
*args – other arguments passed to pickle.dump
**kwargs – other keyword arguments passed to pickle.dump
- Raises
TypeError – if fname is not of type str
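write() and read() can be sketched with the standard pickle module. The functions below are hypothetical stand-ins that reproduce the documented TypeError check. Note that pickle.load only accepts keyword options after the file argument, so this read() sketch forwards **kwargs only.

```python
import os
import pickle
import tempfile


def write(obj, fname, *args, **kwargs):
    """Pickle obj into fname (sketch of the documented write() behaviour)."""
    if not isinstance(fname, str):
        raise TypeError(f"fname must be a str, got {type(fname).__name__}")
    with open(fname, "wb") as f:
        pickle.dump(obj, f, *args, **kwargs)  # e.g. a protocol number


def read(fname, **kwargs):
    """Load an object pickled by write()."""
    if not isinstance(fname, str):
        raise TypeError(f"fname must be a str, got {type(fname).__name__}")
    with open(fname, "rb") as f:
        # pickle.load only accepts keyword options after the file argument
        return pickle.load(f, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "som.pkl")
    write({"m": 3, "n": 3}, path, pickle.HIGHEST_PROTOCOL)
    print(read(path)["m"])  # 3
```

As with any pickle file, only load files from trusted sources: unpickling can execute arbitrary code.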