trw.hparams.interpret_params
Module Contents
Functions
| is_discrete | Test whether a list of values is discrete or continuous |
| median_by_category | Calculate the median for each categorical attribute |
| _plot_scatter | Scatter plot with optional named ticks (x, y) and median display (x, y) |
| _plot_importance | |
| _plot_param_covariance | |
| discretize | Map strings to integers and record the mapping |
| analyse_hyperparameters | Hyper-parameter importance estimation using random forest regressors |
Attributes
- trw.hparams.interpret_params.logger
- trw.hparams.interpret_params.is_discrete(values)
Test whether a list of values is discrete or continuous.
- Parameters
values – the list to test
- Returns
True if discrete, False otherwise
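The criterion used by `is_discrete` is not stated here. As an illustration only, the sketch below uses one plausible rule: a list counts as discrete as soon as it contains a string. The name `looks_discrete` and the rule itself are assumptions, not the library's implementation.

```python
from typing import Any, Sequence


def looks_discrete(values: Sequence[Any]) -> bool:
    """Hypothetical criterion: any string value marks the list as discrete."""
    # Assumption: categorical hyper-parameters are stored as strings,
    # while continuous ones are stored as numbers.
    return any(isinstance(v, str) for v in values)


print(looks_discrete(["adam", "sgd", "adam"]))  # True
print(looks_discrete([0.1, 0.01, 0.001]))       # False
```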
- trw.hparams.interpret_params.median_by_category(categories, values)
Calculate the median for each categorical attribute.
- Parameters
categories – the categories
values – the values
- Returns
list of tuple (category, median value)
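A minimal sketch of a per-category median with the documented return shape, a list of (category, median value) tuples. The helper name `median_per_category`, the length check, and the ordering by first appearance are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Hashable, List, Sequence, Tuple


def median_per_category(
    categories: Sequence[Hashable], values: Sequence[float]
) -> List[Tuple[Hashable, float]]:
    """Group values by category and return (category, median) pairs."""
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    grouped = defaultdict(list)
    for category, value in zip(categories, values):
        grouped[category].append(value)
    # Assumption: categories are reported in order of first appearance.
    return [(category, median(group)) for category, group in grouped.items()]


result = median_per_category(
    ["adam", "sgd", "adam", "sgd", "adam"],
    [3.0, 5.0, 1.0, 7.0, 2.0],
)
print(result)  # [('adam', 2.0), ('sgd', 6.0)]
```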
- trw.hparams.interpret_params._plot_scatter(plot_name, x_values, x_name, y_values, y_name, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None, median_max_num=20)
Scatter plot with optional named ticks (x, y) and median display (x, y).
- trw.hparams.interpret_params._plot_importance(plot_name, x_names, y_values, y_name, y_errors=None, x_name='hyper-parameters')
- trw.hparams.interpret_params._plot_param_covariance(plot_name, x_name, x_values, y_name, y_values, xy_values, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None)
- trw.hparams.interpret_params.discretize(values)
Map strings to integers and record the mapping.
- Returns
(values, mapping)
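The idea of mapping strings to integers while recording the mapping can be sketched as follows. The helper name `strings_to_ints`, the string-to-integer direction of the mapping, and the assignment of codes in order of first appearance are assumptions; only the (values, mapping) return shape comes from the description above.

```python
from typing import Dict, List, Sequence, Tuple


def strings_to_ints(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Replace each distinct string with an integer code; return codes and mapping."""
    mapping: Dict[str, int] = {}
    encoded: List[int] = []
    for value in values:
        # Assumption: codes are assigned in order of first appearance.
        if value not in mapping:
            mapping[value] = len(mapping)
        encoded.append(mapping[value])
    return encoded, mapping


encoded, mapping = strings_to_ints(["relu", "tanh", "relu", "selu"])
print(encoded)  # [0, 1, 0, 2]
print(mapping)  # {'relu': 0, 'tanh': 1, 'selu': 2}
```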
- trw.hparams.interpret_params.analyse_hyperparameters(hprams_path_pattern, output_path, hparams_to_visualize=None, params_forest_n_estimators=5000, params_forest_max_features_ratio=0.6, top_k_covariance=5, create_graphs=True, verbose=True, dpi=300)
Estimate hyper-parameter importance using random forest regressors.
From simulation, the ordering of hyper-parameter importance is correct, but the importance value itself may be overestimated (for the best parameter) and underestimated (for the others).
The scatter plot for each hyper-parameter helps to understand in which direction that hyper-parameter should be modified.
The covariance plot can be used to understand the relationship between the most important hyper-parameters.
WARNING: [1] With correlated features, strong features can end up with low scores, and the method can be biased towards variables with many categories. For more details, see http://blog.datadive.net/selecting-good-features-part-iii-random-forests/ and https://link.springer.com/article/10.1186%2F1471-2105-8-25
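The general technique, ranking hyper-parameters by the feature importances of a random forest fitted to predict the loss, can be sketched as below. This is not the library's code: it assumes scikit-learn's `RandomForestRegressor`, assumes that `params_forest_n_estimators` and `params_forest_max_features_ratio` correspond to `n_estimators` and `max_features`, and uses synthetic data with made-up hyper-parameter names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic hyper-parameter search (assumption): 300 runs, 3 hyper-parameters.
names = ["learning_rate", "dropout", "weight_decay"]
X = rng.uniform(0.0, 1.0, size=(300, 3))
# The loss depends strongly on the first column and weakly on the second.
loss = 3.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.05, size=300)

forest = RandomForestRegressor(
    n_estimators=200,   # kept small here; the signature above defaults to 5000
    max_features=0.6,   # fraction of features considered at each split
    random_state=0,
)
forest.fit(X, loss)

importances = forest.feature_importances_
# Spread of the importance across individual trees, usable as error bars.
errors = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)

order = np.argsort(importances)[::-1]
ranking = [names[i] for i in order]
for i in order:
    print(f"{names[i]}: {importances[i]:.3f} +/- {errors[i]:.3f}")
```

Restricting `max_features` below 1.0 means no split sees every feature, which is one way to reduce the effect described in the warning, where a correlated feature absorbs the importance of another.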
- Parameters
params_forest_n_estimators – number of trees used to estimate the loss from the hyperparameters
params_forest_max_features_ratio – the maximum fraction of features to be used. Not all features are selected, in order to limit the importance decrease caused by correlated features [1]
hprams_path_pattern – a glob pattern used to select the hyper-parameter files
hparams_to_visualize – a list of hparam names to visualize, or None. If None, hyper-parameters are displayed from the most important (i.e., causing the most loss variation) to the least
create_graphs – if True, export matplotlib visualizations
top_k_covariance – export the parameter covariance for the most important k hyper-parameters
output_path – where to export the graph
dpi – the resolution of the exported graph
verbose – if True, display additional information
- Returns