trw.hparams.interpret_params

Module Contents

Functions

is_discrete(values)

Test if a list of values is discrete or continuous

median_by_category(categories, values)

Calculate the median for each categorical attribute

_plot_scatter(plot_name, x_values, x_name, y_values, y_name, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None, median_max_num=20)

Scatter plot with optional named ticks (x, y) and median display (x, y)

_plot_importance(plot_name, x_names, y_values, y_name, y_errors=None, x_name='hyper-parameters')

_plot_param_covariance(plot_name, x_name, x_values, y_name, y_values, xy_values, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None)

discretize(values)

Map strings to integers and record the mapping

analyse_hyperparameters(hprams_path_pattern, output_path, hparams_to_visualize=None, params_forest_n_estimators=5000, params_forest_max_features_ratio=0.6, top_k_covariance=5, create_graphs=True, verbose=True, dpi=300)

Hyper-parameter importance estimation using random forest regressors

Attributes

logger

trw.hparams.interpret_params.logger
trw.hparams.interpret_params.is_discrete(values)

Test if a list of values is discrete or continuous.

Parameters
  • values – the list to test

Returns
  True if the values are discrete, False otherwise
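The exact rule used by `is_discrete` is not documented here. As a rough illustration only, the sketch below assumes that strings, booleans and integers count as discrete and that anything else (for example floats) counts as continuous. The name `is_discrete_sketch` is hypothetical and is not part of `trw`.

```python
import numbers


def is_discrete_sketch(values):
    """Return True if every value looks categorical or integer-valued.

    Assumption (not confirmed by the library documentation): strings,
    booleans and integers are discrete; anything else is continuous.
    """
    return all(isinstance(v, (str, bool, numbers.Integral)) for v in values)


print(is_discrete_sketch(['adam', 'sgd']))   # categorical names
print(is_discrete_sketch([0.001, 0.01]))     # float-valued learning rates
```

Under this assumption, a list mixing integers and floats is treated as continuous, because a single non-discrete value is enough to fail the check.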

trw.hparams.interpret_params.median_by_category(categories, values)

Calculate the median for each categorical attribute.

Parameters
  • categories – the categories

  • values – the values

Returns
  list of tuple (category, median value)
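A minimal sketch of the grouping described above, written with the standard library only. It is not the library's implementation: the name `median_by_category_sketch` is hypothetical, and returning the categories in first-seen order is an assumption.

```python
from collections import defaultdict
from statistics import median


def median_by_category_sketch(categories, values):
    """Group `values` by their category and return (category, median) pairs."""
    grouped = defaultdict(list)
    for category, value in zip(categories, values):
        grouped[category].append(value)
    # One tuple per category, in the order each category was first seen
    return [(category, median(vals)) for category, vals in grouped.items()]


optimizers = ['adam', 'sgd', 'adam', 'sgd', 'adam']
losses = [0.2, 0.5, 0.4, 0.7, 0.3]
for category, med in median_by_category_sketch(optimizers, losses):
    print(category, med)
```

The median is less sensitive than the mean to a single diverged run, which makes it a reasonable summary to overlay on a scatter plot of losses.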

trw.hparams.interpret_params._plot_scatter(plot_name, x_values, x_name, y_values, y_name, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None, median_max_num=20)

Scatter plot with optional named ticks (x, y) and median display (x, y).

trw.hparams.interpret_params._plot_importance(plot_name, x_names, y_values, y_name, y_errors=None, x_name='hyper-parameters')
trw.hparams.interpret_params._plot_param_covariance(plot_name, x_name, x_values, y_name, y_values, xy_values, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None)
trw.hparams.interpret_params.discretize(values)

Map strings to integers and record the mapping.

Returns
  (values, mapping)
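The idea can be sketched as follows. This is an illustration under assumptions, not the library's code: the name `discretize_sketch` is hypothetical, and assigning integers in order of first appearance, as well as returning the mapping as a `dict` from string to integer, are choices made for this example.

```python
def discretize_sketch(values):
    """Encode each distinct string as an integer and keep the mapping.

    Assumption: integers are assigned in order of first appearance.
    """
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        encoded.append(mapping[value])
    return encoded, mapping


encoded, mapping = discretize_sketch(['relu', 'tanh', 'relu', 'elu'])
print(encoded)
print(mapping)
```

Keeping the mapping makes it possible to label plot ticks with the original names after the values have been converted to numbers.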

trw.hparams.interpret_params.analyse_hyperparameters(hprams_path_pattern, output_path, hparams_to_visualize=None, params_forest_n_estimators=5000, params_forest_max_features_ratio=0.6, top_k_covariance=5, create_graphs=True, verbose=True, dpi=300)

Hyper-parameter importance estimation using random forest regressors

In simulations, the importance ordering of the hyper-parameters is correct, but the importance values themselves may be overestimated (for the best parameter) and underestimated (for the others).

The scatter plot for each hparam is useful to understand in which direction the hyper-parameter should be modified.

The covariance plot can be used to understand the relation between the most important hyper-parameters.

WARNING: [1] With correlated features, strong features can end up with low scores, and the method can be biased towards variables with many categories. For more details, see http://blog.datadive.net/selecting-good-features-part-iii-random-forests/ and https://link.springer.com/article/10.1186%2F1471-2105-8-25

Parameters
  • params_forest_n_estimators – number of trees used to estimate the loss from the hyperparameters

  • params_forest_max_features_ratio – the maximum ratio of features to be used. Not all features are selected, in order to limit the decrease in importance caused by correlated features [1]

  • hprams_path_pattern – a glob pattern used to select the hyper-parameter files

  • hparams_to_visualize – a list of hparam names to visualize or None. If None, display from the most important (i.e., causing the most loss variation) to the least

  • create_graphs – if True, export matplotlib visualizations

  • top_k_covariance – export the parameter covariance for the most important k hyper-parameters

  • output_path – where to export the graph

  • dpi – the resolution of the exported graph

  • verbose – if True, display additional information
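To illustrate how a glob pattern such as the one expected by `hprams_path_pattern` selects files, here is a small standard-library sketch. The file names (`hparams-0.pkl`, `notes.txt`) and the helper `select_hparams_files` are hypothetical. The actual file naming and loading used by `trw` are not described in this section.

```python
import glob
import os
import tempfile


def select_hparams_files(path_pattern):
    """Return the sorted list of paths matching a glob pattern."""
    return sorted(glob.glob(path_pattern))


with tempfile.TemporaryDirectory() as root:
    # Create a few empty placeholder files (hypothetical names)
    for name in ('hparams-0.pkl', 'hparams-1.pkl', 'notes.txt'):
        open(os.path.join(root, name), 'w').close()

    # '*' matches any run of characters within a single path component
    pattern = os.path.join(root, 'hparams-*.pkl')
    selected_names = [os.path.basename(p) for p in select_hparams_files(pattern)]

print(selected_names)
```

Only the files matching the pattern are kept, so unrelated files in the same directory are ignored.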

Returns