`trw.hparams.interpret_params`

Module Contents

Functions

`is_discrete`(values)	Test if a list of values is discrete or continuous
`median_by_category`(categories, values)	Calculate the median for each categorical attribute
`_plot_scatter`(plot_name, x_values, x_name, y_values, y_name, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None, median_max_num=20)	Scatter plot with optional named ticks (x, y) and median display (x, y)
`_plot_importance`(plot_name, x_names, y_values, y_name, y_errors=None, x_name='hyper-parameters')
`_plot_param_covariance`(plot_name, x_name, x_values, y_name, y_values, xy_values, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None)
`discretize`(values)	Map strings to integers and record the mapping
`analyse_hyperparameters`(run_results: List[trw.hparams.store.RunResult], output_path: str, loss_fn: Callable[[trw.hparams.store.Metrics], float] = lambda metrics: metrics['loss'], hparams_to_visualize: List[str] = None, params_forest_n_estimators: int = 5000, params_forest_max_features_ratio: float = 0.6, top_k_covariance: int = 5, create_graphs: bool = True, verbose: bool = True, dpi: int = 300) → Dict[str, List]	Estimate hyper-parameter importance using random forest regressors.

Attributes

logger

trw.hparams.interpret_params.logger

trw.hparams.interpret_params.is_discrete(values)

Test if a list of values is discrete or continuous.

Parameters

values – the list to test

Returns

True if discrete, False otherwise
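The module does not document how `is_discrete` decides, so the sketch below uses a hypothetical heuristic and is not trw's implementation. It treats a list as discrete (categorical) as soon as one value is not a real number, such as an optimizer name given as a string.

```python
from numbers import Number
from typing import Any, Sequence


def is_discrete(values: Sequence[Any]) -> bool:
    """Hypothetical heuristic: the values are discrete (categorical) if any
    of them is not a real number, e.g. a string such as 'adam'."""
    for v in values:
        # bool is a subclass of int, so handle it explicitly as categorical
        if isinstance(v, bool) or not isinstance(v, Number):
            return True
    return False


print(is_discrete(['adam', 'sgd', 'adam']))  # True
print(is_discrete([0.1, 0.01, 0.001]))  # False
```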

trw.hparams.interpret_params.median_by_category(categories, values)

Calculate the median value for each category.

Parameters

categories – the category of each value
values – the values

Returns

a list of tuples (category, median value)
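A minimal sketch of the grouping that `median_by_category` describes is shown below. It assumes that `categories` and `values` are parallel sequences and keeps the categories in first-appearance order. That ordering is an assumption, because the module does not specify one.

```python
import statistics
from typing import Dict, Hashable, List, Sequence, Tuple


def median_by_category(categories: Sequence[Hashable],
                       values: Sequence[float]) -> List[Tuple[Hashable, float]]:
    # group the values by category, keeping the first-appearance order
    groups: Dict[Hashable, List[float]] = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    # one (category, median value) tuple per category
    return [(category, statistics.median(vs)) for category, vs in groups.items()]


print(median_by_category(['adam', 'sgd', 'adam', 'adam'], [1.0, 10.0, 3.0, 5.0]))
# [('adam', 3.0), ('sgd', 10.0)]
```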

trw.hparams.interpret_params._plot_scatter(plot_name, x_values, x_name, y_values, y_name, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None, median_max_num=20)

Scatter plot with optional named ticks (x, y) and median display (x, y).
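When a hyper-parameter is discrete, many runs share the same x coordinate and their points overlap. The sketch below shows one common way to spread them out. It assumes that `discrete_random_jitter` is the half-width of a uniform noise added to each coordinate. That interpretation and the helper name `jitter` are hypothetical and not taken from trw.

```python
import random
from typing import List, Optional, Sequence


def jitter(values: Sequence[float], amount: float = 0.2,
           seed: Optional[int] = None) -> List[float]:
    """Hypothetical helper: add uniform noise in [-amount, amount] to each value
    so that points sharing the same discrete coordinate do not overlap."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# x coordinates of 6 runs using 2 discretized categories (0 and 1)
x = [0, 0, 0, 1, 1, 1]
print(jitter(x, amount=0.2, seed=0))
```

The jitter only moves the points on screen. A 0.2 half-width keeps the points of adjacent integer-coded categories in separate bands.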

trw.hparams.interpret_params._plot_importance(plot_name, x_names, y_values, y_name, y_errors=None, x_name='hyper-parameters')

trw.hparams.interpret_params._plot_param_covariance(plot_name, x_name, x_values, y_name, y_values, xy_values, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None)

trw.hparams.interpret_params.discretize(values)

Map strings to integers and record the mapping.

Returns

a tuple (values, mapping)
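The sketch below shows one way to perform the mapping that `discretize` describes. It assumes that integers are assigned in first-appearance order and that the recorded mapping goes from the original value to its integer. Both choices are assumptions, and trw may order or orient the mapping differently.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def discretize(values: Sequence[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    mapping: Dict[Hashable, int] = {}
    encoded: List[int] = []
    for v in values:
        if v not in mapping:
            # assign the next free integer to a newly seen value
            mapping[v] = len(mapping)
        encoded.append(mapping[v])
    return encoded, mapping


encoded, mapping = discretize(['adam', 'sgd', 'adam', 'rmsprop'])
print(encoded)  # [0, 1, 0, 2]
print(mapping)  # {'adam': 0, 'sgd': 1, 'rmsprop': 2}
```

You can invert the recorded mapping (`{i: name for name, i in mapping.items()}`) to recover readable tick labels for a plot of the integer-coded values.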

trw.hparams.interpret_params.analyse_hyperparameters(run_results: List[trw.hparams.store.RunResult], output_path: str, loss_fn: Callable[[trw.hparams.store.Metrics], float] = lambda metrics: ..., hparams_to_visualize: List[str] = None, params_forest_n_estimators: int = 5000, params_forest_max_features_ratio: float = 0.6, top_k_covariance: int = 5, create_graphs: bool = True, verbose: bool = True, dpi: int = 300) → Dict[str, List]

Estimate the importance of each hyper-parameter using random forest regressors.

In simulations, the ordering of hyper-parameter importances is correct, but the importance values themselves may be over-estimated for the most important parameter and under-estimated for the others.

The scatter plot of each hyper-parameter shows in which direction that hyper-parameter should be modified.

The covariance plot can be used to understand the relation between the most important hyper-parameters.
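The estimation described above can be sketched with scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_`. The data, hyper-parameter names and loss below are synthetic and hypothetical. Passing the ratio straight to `max_features` is an assumption about how `params_forest_max_features_ratio` is used. This is not trw's actual code, and trw's default of 5000 trees is reduced here to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_runs = 200
names = ['learning_rate', 'dropout', 'batch_size']  # hypothetical hyper-parameters
# one row per run, one column per hyper-parameter
X = rng.uniform(0.0, 1.0, size=(n_runs, len(names)))
# synthetic loss: driven mostly by learning_rate, weakly by dropout;
# batch_size has no effect at all
loss = 3.0 * (X[:, 0] - 0.3) ** 2 + 0.3 * X[:, 1] + rng.normal(0.0, 0.01, n_runs)

# regress the loss on the hyper-parameters
forest = RandomForestRegressor(n_estimators=500, max_features=0.6, random_state=0)
forest.fit(X, loss)

# two lists: hyper-parameter names and importances, most important first
order = np.argsort(forest.feature_importances_)[::-1]
sorted_names = [names[i] for i in order]
sorted_importances = [float(forest.feature_importances_[i]) for i in order]
print(list(zip(sorted_names, sorted_importances)))
```

Only the ranking should be trusted here. As noted above, the importance values themselves can be biased.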

Warning

With correlated features, strong features can end up with low scores, and the method can be biased towards variables with many categories. See 1 and 2 for more details.
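The first caveat can be reproduced with a small synthetic experiment using scikit-learn's impurity-based importances. When an informative hyper-parameter is duplicated, the forest can split on either copy, so each copy receives roughly half of the importance that the single column would get. The data and the setup below are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_runs = 300
x = rng.uniform(0.0, 1.0, n_runs)              # informative hyper-parameter
noise_feature = rng.uniform(0.0, 1.0, n_runs)  # irrelevant hyper-parameter
loss = 2.0 * x + rng.normal(0.0, 0.05, n_runs)


def importances(X: np.ndarray) -> np.ndarray:
    forest = RandomForestRegressor(n_estimators=300, max_features=1.0, random_state=0)
    forest.fit(X, loss)
    return forest.feature_importances_


# the informative hyper-parameter appears once
alone = importances(np.column_stack([x, noise_feature]))
# the same hyper-parameter plus a perfectly correlated copy
duplicated = importances(np.column_stack([x, x, noise_feature]))
print(alone[0], duplicated[0], duplicated[1])
```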

1: http://blog.datadive.net/selecting-good-features-part-iii-random-forests/
2: https://link.springer.com/article/10.1186%2F1471-2105-8-25

Parameters

run_results – a list of runs
output_path – where to export the graphs
loss_fn – a function extracting a single value (the loss) from the metrics of a run
hparams_to_visualize – a list of hyper-parameter names to visualize
params_forest_n_estimators – number of trees used to estimate the loss from the hyperparameters
params_forest_max_features_ratio – the maximum fraction of the features to be used. Using only a subset of the features limits the decrease in importance caused by correlated features 1
top_k_covariance – export the parameter covariance for the most important k hyper-parameters
create_graphs – if True, export matplotlib visualizations
verbose – if True, display additional information
dpi – the resolution of the exported graphs, in dots per inch
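The sketch below shows how `loss_fn` and the hyper-parameter values of each run could become the training set that the forest regresses on. The dict-based `runs` list and the `build_dataset` helper are hypothetical stand-ins, because the real `RunResult` objects have their own structure. String values are integer-encoded in first-appearance order.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# hypothetical stand-in for a list of RunResult: hyper-parameters and metrics per run
runs = [
    {'hparams': {'lr': 0.1, 'optimizer': 'adam'}, 'metrics': {'loss': 0.9}},
    {'hparams': {'lr': 0.01, 'optimizer': 'sgd'}, 'metrics': {'loss': 0.4}},
    {'hparams': {'lr': 0.001, 'optimizer': 'adam'}, 'metrics': {'loss': 0.6}},
]


def build_dataset(runs: Sequence[Dict[str, Any]],
                  loss_fn: Callable[[Dict[str, Any]], float],
                  hparam_names: Sequence[str]
                  ) -> Tuple[List[List[float]], List[float], Dict[str, Dict[Any, int]]]:
    mappings: Dict[str, Dict[Any, int]] = {name: {} for name in hparam_names}
    X: List[List[float]] = []
    for run in runs:
        row = []
        for name in hparam_names:
            value = run['hparams'][name]
            if isinstance(value, str):
                # encode categorical values as integers, in first-appearance order
                value = mappings[name].setdefault(value, len(mappings[name]))
            row.append(float(value))
        X.append(row)
    # loss_fn reduces the metrics of each run to the single regression target
    y = [loss_fn(run['metrics']) for run in runs]
    return X, y, mappings


X, y, mappings = build_dataset(runs, lambda metrics: metrics['loss'], ['lr', 'optimizer'])
print(X)  # [[0.1, 0.0], [0.01, 1.0], [0.001, 0.0]]
print(y)  # [0.9, 0.4, 0.6]
```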

Returns

a dictionary of 2 lists holding the hyper-parameter names and their importance
