API for the main som module

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

An optimised Self Organising Map which can write and read its values into and from an external file.

Most of the code comes from Riley Smith's implementation in the sklearn-som Python library. Original code by Riley Smith is always marked with '.. codeauthor:: Riley Smith'.

class SOMptimised.som.SOM(m: int = 3, n: int = 3, dim: int = 3, lr: SOMptimised.learning_rate.LearningStrategy = <SOMptimised.learning_rate.LinearLearningStrategy object>, sigma: SOMptimised.neighbourhood.NeighbourhoodStrategy = <SOMptimised.neighbourhood.ConstantRadiusStrategy object>, metric: callable = <function euclidianMetric>, max_iter: typing.Union[int, float] = 3000, random_state: typing.Optional[int] = None)

Bases: object

Code author: Riley Smith

Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.

A 2-D, rectangular-grid self-organizing map class built on NumPy.

Parameters
  • m (int) – (Optional) shape along dimension 0 (vertical) of the SOM

  • n (int) – (Optional) shape along dimension 1 (horizontal) of the SOM

  • dim (int) – (Optional) dimensionality (number of features) of the input space

  • lr (LearningStrategy) – (Optional) learning strategy used to update the SOM weights

  • sigma (NeighbourhoodStrategy) – (Optional) neighbourhood strategy used to compute the step applied to each weight.

  • max_iter (int or float) – (Optional) maximum number of iterations; training stops once this many iterations have been reached

  • metric (callable) – (Optional) metric used to compute the distance between the train data and the neurons, and between the neurons and the test data

  • random_state (int) – (Optional) integer seed to the random number generator for weight initialization. This will be used to create a new instance of Numpy’s default random number generator (it will not call np.random.seed()). Specify an integer for deterministic results.

_compute_points_inertia(X: numpy.ndarray, *args, bmus_indices: Optional[Union[int, list, numpy.ndarray]] = None, metric: Optional[callable] = None, n_jobs: int = 1, **kwargs) -> numpy.ndarray

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Compute the inertia for a set of points. The inertia of a point is defined as the squared distance from that point to its closest cluster centre (its BMU).

Note

*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:

  • *args must always be a collection of ndarray with shapes similar to that of X

  • **kwargs are keyword arguments which have no constraints on their type or shape

See the metric specific implementation for more details.

Parameters
  • X (ndarray) – input matrix (2D)

  • bmus_indices (int, list [int] or ndarray [int]) – (Optional) indices of the best matching units for all the points. If None, the bmus are computed.

  • metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.

  • *args – additional arguments to pass to the metric. These arguments are looped similarly to X, so they should be a collection of ndarray with the same shape. See the metric specific signature to know which parameters to pass.

  • **kwargs – additional keyword arguments to pass to the metric. See the metric specific signature to know which parameters to pass.

Returns

inertia for all the points

Return type

ndarray [float]
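As a rough illustration of the definition above, the following plain-Python sketch computes the inertia of a few points from a list of cluster centres. It assumes a Euclidean metric, and the helper names (squared_distance, points_inertia) are hypothetical; it is not the library's NumPy implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def points_inertia(points, centres):
    """Squared distance from each point to its closest centre (its BMU)."""
    return [min(squared_distance(p, c) for c in centres) for p in points]


# Two cluster centres and three points in a 2-feature space (made-up values).
centres = [[0.0, 0.0], [1.0, 1.0]]
points = [[0.0, 1.0], [2.0, 1.0], [0.0, 0.0]]
inertia = points_inertia(points, centres)
print(inertia)
```

One value is returned per input point, which matches the "inertia for all the points" return description.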

_find_bmu(x: numpy.ndarray, *args, metric: Optional[callable] = None, **kwargs) -> int

Code author: Riley Smith

Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.

Find the index of the best matching unit for the input vector x.

Parameters
  • x (ndarray) – input vector (1D)

  • metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.

  • *args – additional arguments to pass to the metric. This must be a tuple or list of 1D ndarray with the same shape as x. See the metric specific signature to know which parameters to pass.

  • **kwargs – additional keyword arguments to pass to the metric. See the metric specific signature to know which parameters to pass.

Returns

index of the best matching unit

Return type

int
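A minimal sketch of a best matching unit search, assuming the neurons are stored as a flat list of weight vectors and that the metric takes an input vector and a weight vector. The names euclidean and find_bmu are hypothetical stand-ins, not the library's functions.

```python
import math


def euclidean(x, w):
    """Euclidean distance between an input vector and a weight vector."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def find_bmu(x, weights, metric=euclidean):
    """Index of the weight vector closest to x under the given metric."""
    distances = [metric(x, w) for w in weights]
    return distances.index(min(distances))


# A 2 x 2 SOM flattened to 4 neurons in a 2-feature space (made-up weights).
weights = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
bmu = find_bmu([0.9, 0.2], weights)
print(bmu)
```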

_find_bmus(X: numpy.ndarray, *args, metric: Optional[callable] = None, n_jobs: int = 1, **kwargs) -> numpy.ndarray

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Find the indices of the best matching unit for the input matrix X.

Note

*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:

  • *args must always be a collection of ndarray with shapes similar to that of X

  • **kwargs are keyword arguments which have no constraints on their type or shape

See the metric specific implementation for more details.

Parameters
  • X (ndarray) – input matrix (2D)

  • metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.

  • n_jobs (int) – (Optional) number of threads used to find the BMUs. This parameter is only used when the _find_bmus_bydata() method is used.

  • *args – additional arguments to pass to the metric. These arguments are looped similarly to X, so they should be a collection of ndarray with the same shape. See the metric specific signature to know which parameters to pass.

  • **kwargs – additional keyword arguments to pass to the metric. See the metric specific signature to know which parameters to pass.

Returns

indices of the best matching units

Return type

ndarray [int]
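The way *args are "looped similarly to X" can be pictured with the sketch below. It assumes a hypothetical metric (chi2_metric) that takes one extra per-sample array of uncertainties; the metric, its signature and the find_bmus helper are illustrations only and are not taken from the library.

```python
def chi2_metric(x, w, err):
    """Hypothetical metric: squared residuals weighted by per-feature errors."""
    return sum(((xi - wi) / ei) ** 2 for xi, wi, ei in zip(x, w, err))


def find_bmus(X, weights, *args, metric):
    """BMU index for each row of X; every extra array is read row by row with X."""
    bmus = []
    for row_index, x in enumerate(X):
        # Pick the row of each extra array that belongs to this sample.
        row_args = [extra[row_index] for extra in args]
        distances = [metric(x, w, *row_args) for w in weights]
        bmus.append(distances.index(min(distances)))
    return bmus


weights = [[0.0, 0.0], [1.0, 1.0]]
X = [[0.4, 0.9], [0.4, 0.9]]
errors = [[1.0, 1.0], [0.1, 10.0]]  # same shape as X
bmus = find_bmus(X, weights, errors, metric=chi2_metric)
print(bmus)
```

The two rows of X are identical, yet they map to different neurons because their uncertainties differ, which is why each extra array needs the same shape as X.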

_find_bmus_bydata(X: numpy.ndarray, *args, metric: Optional[callable] = None, n_jobs: int = 1, **kwargs) -> numpy.ndarray

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Find the indices of the best matching unit for the input matrix X by looping through the data.

Note

*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:

  • *args must always be a collection of ndarray with shapes similar to that of X

  • **kwargs are keyword arguments which have no constraints on their type or shape

See the metric specific implementation for more details.

Parameters
  • X (ndarray) – input matrix (2D)

  • metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.

  • n_jobs (int) – (Optional) number of threads used to find the BMUs

  • *args – additional arguments to pass to the metric. These arguments are looped similarly to X, so they should be a collection of ndarray with the same shape. See the metric specific signature to know which parameters to pass.

  • **kwargs – additional keyword arguments to pass to the metric. See the metric specific signature to know which parameters to pass.

Returns

indices of the best matching units

Return type

ndarray [int]

_find_bmus_byweight(X: numpy.ndarray, *args, metric: Optional[callable] = None, **kwargs) -> numpy.ndarray

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Find the indices of the best matching unit for the input matrix X by looping through the weights.

Note

*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:

  • *args must always be a collection of ndarray with shapes similar to that of X

  • **kwargs are keyword arguments which have no constraints on their type or shape

See the metric specific implementation for more details.

Parameters
  • X (ndarray) – input matrix (2D)

  • metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.

  • *args – additional arguments to pass to the metric. These arguments are looped similarly to X, so they should be a collection of ndarray with the same shape. See the metric specific signature to know which parameters to pass.

  • **kwargs – additional keyword arguments to pass to the metric. See the metric specific signature to know which parameters to pass.

Returns

indices of the best matching units

Return type

ndarray [int]
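Going by their descriptions, _find_bmus_bydata() and _find_bmus_byweight() differ in which loop is outermost. The plain-Python sketch below shows both loop orders, assuming a squared Euclidean metric and hypothetical function names; how the library vectorises the inner loop and chooses between the two methods is not covered here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def bmus_by_data(X, weights):
    """Outer loop over data points: one full scan of the weights per point."""
    bmus = []
    for x in X:
        distances = [sq_dist(x, w) for w in weights]
        bmus.append(distances.index(min(distances)))
    return bmus


def bmus_by_weight(X, weights):
    """Outer loop over weights: keep a running best distance for every point."""
    best_dist = [float("inf")] * len(X)
    best_idx = [0] * len(X)
    for j, w in enumerate(weights):
        for i, x in enumerate(X):
            d = sq_dist(x, w)
            if d < best_dist[i]:
                best_dist[i], best_idx[i] = d, j
    return best_idx


weights = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
X = [[0.1, 0.8], [0.9, 0.9], [0.7, 0.2]]
print(bmus_by_data(X, weights), bmus_by_weight(X, weights))
```

Both orders return the same indices; which one is faster in practice would depend on the relative sizes of the data set and the map.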

_get_locations(m: int, n: int) -> numpy.ndarray

Code author: Riley Smith

Return the indices of an m by n array. Indices are returned as float to save time.

Parameters
  • m (int) – shape along dimension 0 (vertical) of the SOM

  • n (int) – shape along dimension 1 (horizontal) of the SOM

Returns

indices of the array

Return type

ndarray [float]
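For illustration, the indices of an m by n grid can be listed as below. This sketch assumes row-major ordering and one (row, column) pair per neuron; the exact layout used by the library is not stated in this page.

```python
def get_locations(m, n):
    """(row, column) pairs of an m by n grid, in row-major order, as floats."""
    return [[float(i), float(j)] for i in range(m) for j in range(n)]


locations = get_locations(2, 3)
print(locations)
```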

property cluster_centers_: numpy.ndarray

Code author: Riley Smith

Give the coordinates of each cluster centre as an array of shape (m, n, dim).

Returns

cluster centres

Return type

ndarray [int]
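The relation between a flat cluster index and the (m, n, dim) layout can be sketched as follows. The sketch assumes the weights are stored as a flat, row-major list of m*n vectors, which is an assumption rather than something this page states; to_grid is a hypothetical helper.

```python
def to_grid(weights, m, n):
    """Regroup a flat list of m*n weight vectors into m rows of n vectors."""
    return [weights[i * n:(i + 1) * n] for i in range(m)]


m, n = 2, 3
# Six made-up weight vectors with dim = 2.
weights = [[float(k), float(k) + 0.5] for k in range(m * n)]
grid = to_grid(weights, m, n)

# Under the row-major assumption, flat index 5 sits at row 1, column 2.
row, col = divmod(5, n)
print(row, col, grid[row][col])
```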

fit(X: numpy.ndarray, *args, epochs: int = 1, shuffle: bool = True, n_jobs: int = 1, unnormalise_weights: bool = False, **kwargs) -> None

Code author: Riley Smith

Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.

Take data (a tensor of type float64) as input and fit the SOM to that data for the specified number of epochs.

Note

*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:

  • *args must always be a collection of ndarray with shapes similar to that of X

  • **kwargs are keyword arguments which have no constraints on their type or shape

See the metric specific implementation for more details.

Parameters
  • X (ndarray) – training data. Must have shape (n, self.dim) where n is the number of training samples.

  • epochs (int) – (Optional) number of times to loop through the training data when fitting

  • shuffle (bool) – (Optional) whether to randomize the order of the training data when fitting. Can be seeded with np.random.seed() before calling the fit() method.

  • n_jobs (int) – (Optional) number of threads used to find the BMUs at the end of the loop. This parameter is only used when the _find_bmus_bydata() method is used.

  • unnormalise_weights (bool) – whether to unnormalise weights or not

  • *args – additional arguments to pass to the metric. These arguments are looped similarly to X, so they should be a collection of ndarray with the same shape. See the metric specific signature to know which parameters to pass.

  • **kwargs – additional keyword arguments to pass to the metric. See the metric specific signature to know which parameters to pass.
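How epochs, shuffle and max_iter might interact can be sketched with the scheduling loop below. It assumes that training stops at whichever limit is reached first, and it uses Python's random module as a stand-in for NumPy's shuffling; training_schedule and its seed argument are hypothetical and only return the order in which samples would be visited.

```python
import random


def training_schedule(n_samples, epochs=1, shuffle=True, max_iter=3000, seed=None):
    """Return the order in which sample indices would be visited."""
    rng = random.Random(seed)
    visited = []
    for _ in range(epochs):
        order = list(range(n_samples))
        if shuffle:
            rng.shuffle(order)  # new random order at every epoch
        for index in order:
            if len(visited) >= max_iter:
                return visited  # assumed behaviour: stop once max_iter is hit
            visited.append(index)
    return visited


# 4 samples x 3 epochs would give 12 steps, but max_iter caps it at 10.
schedule = training_schedule(n_samples=4, epochs=3, shuffle=True, max_iter=10, seed=0)
print(len(schedule))
```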

fit_predict(X: numpy.ndarray, *args, **kwargs) -> numpy.ndarray

Code author: Riley Smith

Convenience method for calling fit() followed by predict().

Warning

This method has not been kept up to date with the other changes made to this class. It may not work as expected.

Parameters
  • X (ndarray) – data of shape (n, self.dim). The data to fit and then predict.

  • *args – optional arguments for the fit() method

  • **kwargs – optional keyword arguments for the fit() method

Returns

ndarray of shape (n,). The index of the predicted cluster for each item in X (after fitting the SOM to the data in X).

Return type

ndarray [float]

fit_transform(X: numpy.ndarray, *args, **kwargs) -> numpy.ndarray

Code author: Riley Smith

Convenience method for calling fit() followed by transform(). Unlike in sklearn, this is not implemented more efficiently (the efficiency is the same as calling fit() directly followed by transform()).

Warning

This method has not been kept up to date with the other changes made to this class. It may not work as expected.

Parameters
  • X (ndarray) – data of shape (n, self.dim) where n is the number of samples

  • *args – optional arguments for the fit() method

  • **kwargs – optional keyword arguments for the fit() method

Returns

ndarray of shape (n, self.m*self.n). The Euclidean distance from each item in X to each cluster center.

Return type

ndarray[float]

get(param: str) -> numpy.ndarray

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Return the given physical parameter if it exists.

Parameters

param (str) – parameter to return

Returns

array of the physical parameter values associated with each node

Return type

ndarray

Raises

KeyError – if param is not found

property inertia_: numpy.ndarray

Code author: Riley Smith

Inertia.

Returns

computed inertia

Return type

ndarray [float]

Raises

AttributeError – if the SOM does not have the inertia already computed

property n_iter_: int

Code author: Riley Smith

Number of iterations.

Returns

number of iterations

Return type

int

Raises

AttributeError – if the number of iterations is not initialised yet

predict(X: numpy.ndarray, *args, metric: Optional[callable] = None, n_jobs: int = 1, **kwargs) -> numpy.ndarray

Code author: Riley Smith

Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.

Predict cluster for each element in X.

Note

*args and **kwargs are additional arguments and keyword arguments which can be passed depending on the metric used. In this implementation:

  • *args must always be a collection of ndarray with shapes similar to that of X

  • **kwargs are keyword arguments which have no constraints on their type or shape

See the metric specific implementation for more details.

Parameters
  • X (ndarray) – input data. Must have shape (n, self.dim) where n is the number of samples.

  • metric (callable) – (Optional) metric to use. If None, the metric provided at init is used.

  • n_jobs (int) – (Optional) number of threads used to find the BMUs. This parameter is only used when the _find_bmus_bydata() method is used.

  • *args – additional arguments to pass to the metric. These arguments are looped similarly to X, so they should be a collection of ndarray with the same shape. See the metric specific signature to know which parameters to pass.

  • **kwargs – additional keyword arguments to pass to the metric. See the metric specific signature to know which parameters to pass.

Returns

an ndarray of shape (n,). The predicted cluster index for each item in X.

Return type

ndarray [int]

Raises
  • NotImplementedError – if the fit() method has not been called yet

  • ValueError – if X is not a 2-dimensional array, or if the second dimension of X does not have a length equal to self.dim

static read(fname: str, *args, **kwargs)

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Read the result of a SOM written into a binary file with the write() method.

Parameters
  • fname (str) – input file

  • *args – other arguments passed to pickle.load

  • **kwargs – other keyword arguments passed to pickle.load

Returns

the loaded SOM object

Return type

SOM

Raises

TypeError – if fname is not of type str

set(param: str, value: numpy.ndarray) -> None

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Set the given physical parameter. The value must be an array of shape (self.n*self.m,).

Parameters
  • param (str) – parameter to set

  • value (ndarray) – array with the values of the physical parameter to store

Raises

ValueError – if value is not a 1-dimensional array of length self.m*self.n
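The set()/get() pair can be pictured as a small per-node store. The class below is a hypothetical stand-in using plain lists (NodeParameters is not part of the library), and "redshift" is just an example parameter name chosen for illustration.

```python
class NodeParameters:
    """Hypothetical stand-in storing one value per SOM node for named parameters."""

    def __init__(self, m, n):
        self.m, self.n = m, n
        self._params = {}

    def set(self, param, value):
        # One value is expected for each of the m*n nodes.
        if len(value) != self.m * self.n:
            raise ValueError(f"expected {self.m * self.n} values, got {len(value)}")
        self._params[param] = list(value)

    def get(self, param):
        return self._params[param]  # raises KeyError if param was never set


store = NodeParameters(m=2, n=2)
store.set("redshift", [0.1, 0.5, 1.0, 2.0])
values = store.get("redshift")
print(values)
```

With such a store, the value attached to the BMU of a data point could be looked up by indexing the returned array with the predicted cluster index.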

step(x: numpy.ndarray, counter: int, *args, **kwargs) -> None

Code author: Riley Smith

Modified by Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>.

Do one step of training on the given input vector.

Parameters
  • x (ndarray) – input vector (1D)

  • counter (int) – global counter used to compute the neighbourhood radius and the learning rate

  • *args – additional arguments to pass to the metric. This must be a tuple or list of 1D ndarray with the same shape as x. See the metric specific signature to know which parameters to pass.

  • **kwargs – additional keyword arguments to pass to the metric. See the metric specific signature to know which parameters to pass.
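For orientation, a generic textbook SOM update is sketched below. It is not the library's implementation: the linear learning-rate decay, the Gaussian neighbourhood with a constant radius, and the names som_step, lr0 and sigma are all assumptions made for this example.

```python
import math


def som_step(x, weights, locations, counter, max_iter=3000, lr0=1.0, sigma=1.0):
    """One textbook SOM update, applied in place to the list of weight vectors."""
    # Learning rate decays linearly with the global counter (assumed schedule).
    lr = lr0 * (1.0 - counter / max_iter)
    # Best matching unit: neuron whose weights are closest to x.
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    bmu = dists.index(min(dists))
    for k, w in enumerate(weights):
        # Squared distance on the grid between neuron k and the BMU.
        grid_d2 = sum((a - b) ** 2 for a, b in zip(locations[k], locations[bmu]))
        # Gaussian neighbourhood with a constant radius sigma (assumed form).
        h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
        for f in range(len(w)):
            w[f] += lr * h * (x[f] - w[f])


locations = [[0.0, 0.0], [0.0, 1.0]]  # a 1 x 2 grid
weights = [[0.0, 0.0], [1.0, 1.0]]
som_step([0.0, 0.2], weights, locations, counter=0)
print(weights)
```

In this sketch the BMU moves all the way onto the input at counter 0 because the learning rate starts at 1, while its neighbour moves only part of the way.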

property train_bmus_: numpy.ndarray

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Best matching units indices for the train set.

Returns

BMUs indices for the train set

Return type

ndarray [int]

Raises

AttributeError – if the number of iterations is not initialised yet

transform(X: numpy.ndarray, *args, **kwargs) -> numpy.ndarray

Code author: Riley Smith

Transform the data X into cluster distance space.

Warning

This method has not been kept up to date with the other changes made to this class. It may not work as expected.

Parameters

X (ndarray) – input data. Must have shape (n, self.dim) where n is the number of samples.

Returns

transformed data of shape (n, self.n*self.m). The Euclidean distance from each item in X to each cluster center.

Return type

ndarray [float]
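The "cluster distance space" described above can be sketched in plain Python as a table with one row per sample and one column per cluster centre. The function below is a hypothetical stand-in using math.dist (Python 3.8+), not the library's NumPy code.

```python
import math


def transform(X, centres):
    """Euclidean distance from every row of X to every cluster centre."""
    return [[math.dist(x, c) for c in centres] for x in X]


centres = [[0.0, 0.0], [3.0, 4.0]]  # two made-up cluster centres
X = [[0.0, 0.0], [3.0, 0.0]]
distances = transform(X, centres)
print(distances)
```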

write(fname: str, *args, **kwargs) -> None

Code author: Wilfried Mercier - IRAP <wilfried.mercier@irap.omp.eu>

Write the result of the SOM into a binary file.

Parameters
  • fname (str) – output filename

  • *args – other arguments passed to pickle.dump

  • **kwargs – other keyword arguments passed to pickle.dump

Raises

TypeError – if fname is not of type str
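A write()/read() round trip based on pickle might look like the sketch below. TinySOM is a hypothetical stand-in class, and for simplicity the sketch pickles only the attribute dictionary; what the library itself serialises is an assumption here.

```python
import os
import pickle
import tempfile


class TinySOM:
    """Hypothetical stand-in for a fitted SOM; only holds a list of weights."""

    def __init__(self, weights):
        self.weights = weights

    def write(self, fname, *args, **kwargs):
        if not isinstance(fname, str):
            raise TypeError("fname must be of type str")
        with open(fname, "wb") as f:
            # Extra arguments are forwarded to pickle.dump (e.g. protocol=...).
            pickle.dump(self.__dict__, f, *args, **kwargs)

    @staticmethod
    def read(fname, *args, **kwargs):
        if not isinstance(fname, str):
            raise TypeError("fname must be of type str")
        with open(fname, "rb") as f:
            state = pickle.load(f, *args, **kwargs)
        som = TinySOM.__new__(TinySOM)  # rebuild without calling __init__
        som.__dict__.update(state)
        return som


path = os.path.join(tempfile.mkdtemp(), "som.pkl")
TinySOM([[0.0, 1.0], [2.0, 3.0]]).write(path, protocol=pickle.HIGHEST_PROTOCOL)
restored = TinySOM.read(path)
print(restored.weights)
```

Since pickle can execute arbitrary code when loading, files should only be read back from trusted sources.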