trw.hparams.interpret_params

Module Contents

Functions

is_discrete(values)

Test whether a list of values is discrete or continuous

median_by_category(categories, values)

Calculate the median for each categorical attribute

_plot_scatter(plot_name, x_values, x_name, y_values, y_name, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None, median_max_num=20)

Scatter plot with optional named ticks (x, y) and median display (x, y)

_plot_importance(plot_name, x_names, y_values, y_name, y_errors=None, x_name='hyper-parameters')

_plot_param_covariance(plot_name, x_name, x_values, y_name, y_values, xy_values, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None)

discretize(values)

Map strings to integers and record the mapping

analyse_hyperparameters(run_results: List[trw.hparams.store.RunResult], output_path: str, loss_fn: Callable[[trw.hparams.store.Metrics], float] = lambda metrics: metrics['loss'], hparams_to_visualize: List[str] = None, params_forest_n_estimators: int = 5000, params_forest_max_features_ratio: float = 0.6, top_k_covariance: int = 5, create_graphs: bool = True, verbose: bool = True, dpi: int = 300) → Dict[str, List]

Importance hyper-parameter estimation using random forest regressors.

Attributes

logger

trw.hparams.interpret_params.logger
trw.hparams.interpret_params.is_discrete(values)

Test whether a list of values is discrete or continuous.

Parameters
  • values – the list to test

Returns

True if discrete, False otherwise
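The docstring does not define what counts as "discrete". As a hypothetical sketch (not the trw implementation), one could treat a list as discrete unless every value is a real number, so string or boolean choices (e.g. optimizer names) are discrete while numeric values, integers included, are treated as continuous. The function name `is_discrete_sketch` is illustrative only.

```python
from numbers import Real
from typing import Any, Sequence


def is_discrete_sketch(values: Sequence[Any]) -> bool:
    """Hypothetical rule: discrete unless every value is a real number.

    Booleans are counted as discrete even though bool subclasses int."""
    for value in values:
        # bool is a subclass of int in Python, so check it explicitly
        if isinstance(value, bool) or not isinstance(value, Real):
            return True
    return False


print(is_discrete_sketch(["adam", "sgd"]), is_discrete_sketch([0.1, 0.01, 3]))
```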

trw.hparams.interpret_params.median_by_category(categories, values)

Calculate the median for each categorical attribute.

Parameters
  • categories – the categories

  • values – the values

Returns

list of tuples (category, median value)
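A minimal sketch of the documented contract (one `(category, median value)` tuple per distinct category), using the standard library. The name `median_by_category_sketch` and the first-seen ordering of categories are assumptions; the trw function may order its output differently.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, List, Sequence, Tuple


def median_by_category_sketch(
    categories: Sequence[Hashable], values: Sequence[float]
) -> List[Tuple[Hashable, float]]:
    """Return (category, median value) pairs, one per distinct category."""
    assert len(categories) == len(values), "expected one value per category entry"
    grouped: Dict[Hashable, List[float]] = defaultdict(list)
    for category, value in zip(categories, values):
        grouped[category].append(value)
    # dicts preserve insertion order, so categories appear in first-seen order
    return [(category, median(vals)) for category, vals in grouped.items()]


print(median_by_category_sketch(["adam", "sgd", "adam", "sgd", "adam"], [3, 9, 5, 7, 4]))
```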

trw.hparams.interpret_params._plot_scatter(plot_name, x_values, x_name, y_values, y_name, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None, median_max_num=20)

Scatter plot with optional named ticks (x, y) and median display (x, y).
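The `discrete_random_jitter` parameter is not documented. A common reason for such a parameter is to spread integer-coded discrete positions with small random noise so overlapping points stay visible; the sketch below assumes that reading (uniform noise in `[-jitter, jitter]`) and uses a hypothetical helper name, `jitter_sketch`.

```python
import random
from typing import List, Optional, Sequence


def jitter_sketch(positions: Sequence[int], amplitude: float = 0.2,
                  seed: Optional[int] = None) -> List[float]:
    """Hypothetical: add uniform noise in [-amplitude, amplitude] to each
    integer-coded discrete position so stacked points separate visually."""
    rng = random.Random(seed)
    return [p + rng.uniform(-amplitude, amplitude) for p in positions]


print(jitter_sketch([0, 0, 1, 2], amplitude=0.2, seed=0))
```

With an amplitude below 0.5, jittered points from neighbouring categories cannot overlap, so each cluster stays attached to its tick.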

trw.hparams.interpret_params._plot_importance(plot_name, x_names, y_values, y_name, y_errors=None, x_name='hyper-parameters')
trw.hparams.interpret_params._plot_param_covariance(plot_name, x_name, x_values, y_name, y_values, xy_values, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None)
trw.hparams.interpret_params.discretize(values)

Map strings to integers and record the mapping.

Returns

(values, mapping)
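A minimal sketch of the `(values, mapping)` contract, assuming codes are assigned in first-seen order and that the mapping goes from integer code back to the original value (handy for tick labels). Both choices and the name `discretize_sketch` are assumptions, not the trw implementation.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def discretize_sketch(values: Sequence[Hashable]) -> Tuple[List[int], Dict[int, Hashable]]:
    """Encode each distinct value as an int and return (codes, code -> value)."""
    value_to_code: Dict[Hashable, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in value_to_code:
            # next unused integer, in order of first appearance
            value_to_code[value] = len(value_to_code)
        codes.append(value_to_code[value])
    mapping = {code: value for value, code in value_to_code.items()}
    return codes, mapping


print(discretize_sketch(["adam", "sgd", "adam", "rmsprop"]))
```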

trw.hparams.interpret_params.analyse_hyperparameters(run_results: List[trw.hparams.store.RunResult], output_path: str, loss_fn: Callable[[trw.hparams.store.Metrics], float] = lambda metrics: ..., hparams_to_visualize: List[str] = None, params_forest_n_estimators: int = 5000, params_forest_max_features_ratio: float = 0.6, top_k_covariance: int = 5, create_graphs: bool = True, verbose: bool = True, dpi: int = 300) → Dict[str, List]

Importance hyper-parameter estimation using random forest regressors.

In simulations, the ranking of the hyper-parameters by importance is correct, but the importance values themselves may be overestimated for the most important hyper-parameter and underestimated for the others.

The scatter plot of each hyper-parameter helps show in which direction that hyper-parameter should be changed.

The covariance plot helps show the relationship between the most important hyper-parameters.
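The docstring names random forest regressors but not the exact importance measure. A minimal sketch, assuming scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` (the trw implementation may differ), with `params_forest_max_features_ratio` turned into an integer `max_features`. The function name `rank_hparams_sketch` and the synthetic runs are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_hparams_sketch(X, y, names, n_estimators=200, max_features_ratio=0.6, seed=0):
    """Fit a forest predicting the loss from hyper-parameter values and
    rank the hyper-parameters by impurity-based importance."""
    # turn the ratio into a feature count, keeping at least one feature per split
    max_features = max(1, int(max_features_ratio * X.shape[1]))
    forest = RandomForestRegressor(
        n_estimators=n_estimators, max_features=max_features, random_state=seed
    )
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]  # most important first
    return [(names[i], float(importances[i])) for i in order]


# Synthetic runs: the loss depends strongly on column 0, weakly on column 1,
# and not at all on column 2
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=200)
ranking = rank_hparams_sketch(X, y, ["learning_rate", "dropout", "unused"])
print(ranking)
```

Impurity-based importances are normalized to sum to 1, so they read as relative shares; consistent with the note above, the ranking is more trustworthy than the individual values.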

Warning

With correlated features, strong features can end up with low scores, and the method can be biased towards variables with many categories. See [1] and [2] for more details.

[1] http://blog.datadive.net/selecting-good-features-part-iii-random-forests/

[2] https://link.springer.com/article/10.1186%2F1471-2105-8-25
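The first effect in the warning can be checked on synthetic data. The sketch below assumes scikit-learn's impurity-based importances and uses hypothetical data: it fits the same forest with and without an exact copy of the dominant feature. Because ties between identical columns are broken at random, the copy typically takes a large share of the original feature's importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
learning_rate = rng.uniform(size=300)
other = rng.uniform(size=300)
y = 5.0 * learning_rate + 0.5 * other + rng.normal(scale=0.05, size=300)


def importances(X):
    """Impurity-based importances of a forest predicting y from X."""
    forest = RandomForestRegressor(n_estimators=200, random_state=0)
    forest.fit(X, y)
    return forest.feature_importances_


alone = importances(np.column_stack([learning_rate, other]))
# append an exact copy of the dominant column: a perfectly correlated feature
with_copy = importances(np.column_stack([learning_rate, other, learning_rate]))
print(f"alone: {alone[0]:.2f}  with copy: {with_copy[0]:.2f}  copy: {with_copy[2]:.2f}")
```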

Parameters
  • run_results – a list of runs

  • output_path – where to export the graphs

  • loss_fn – a function that extracts a single value (the loss) from the metrics of a run

  • hparams_to_visualize – a list of hyper-parameter names (strings) to visualize

  • params_forest_n_estimators – number of trees used to estimate the loss from the hyper-parameters

  • params_forest_max_features_ratio – the maximum ratio of features (hyper-parameters) the random forest may use. Not all features are selected, in order to limit the drop in importance caused by correlated features [1]

  • top_k_covariance – export the parameter covariance for the most important k hyper-parameters

  • create_graphs – if True, export matplotlib visualizations

  • verbose – if True, display additional information

  • dpi – the resolution of the exported graphs, in dots per inch

Returns

a dictionary of 2 lists holding the hyper-parameter names and their importances
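The default `loss_fn` reads the value stored under `'loss'`. A small sketch of a custom `loss_fn`, assuming a run's metrics can be indexed like a dict and, as the name suggests, that lower losses mean better runs. The `'accuracy'` key and the `negative_accuracy` helper are hypothetical.

```python
from typing import Callable, Dict

# Assumption: a run's metrics behave like a dict of metric name -> value
Metrics = Dict[str, float]

# The documented default: use the value stored under 'loss'
default_loss_fn: Callable[[Metrics], float] = lambda metrics: metrics['loss']


def negative_accuracy(metrics: Metrics) -> float:
    """Hypothetical loss_fn for a 'higher is better' metric: negate it so
    that, like a loss, lower values mean better runs."""
    return -metrics['accuracy']  # 'accuracy' is a hypothetical metric name


print(default_loss_fn({'loss': 0.25}), negative_accuracy({'accuracy': 0.9}))
```

Such a function would be passed as `loss_fn=negative_accuracy`; the metric name must exist in every stored run, or the lookup raises `KeyError`.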