trw.hparams.interpret_params
Module Contents
Functions
| is_discrete | Test if a list of values is discrete or continuous |
| median_by_category | Calculate the median for each categorical attribute |
| _plot_scatter | Scatter plot with optional named ticks (x, y) and median display (x, y) |
| _plot_importance | |
| _plot_param_covariance | |
| discretize | Map strings to integers and record the mapping |
| analyse_hyperparameters | Hyper-parameter importance estimation using random forest regressors. |
Attributes
- trw.hparams.interpret_params.logger
- trw.hparams.interpret_params.is_discrete(values)
Test if a list of values is discrete or continuous
:param values: the list to test
:return: True if discrete, False otherwise
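The criterion applied by is_discrete is not stated on this page. As an illustration only, the sketch below assumes one plausible rule: a list is discrete when every value is a string, a boolean, or an integer. The name is_discrete_sketch and the rule itself are assumptions, not the library's implementation.

```python
from numbers import Integral
from typing import Any, Sequence


def is_discrete_sketch(values: Sequence[Any]) -> bool:
    """Hypothetical rule: discrete if all values are strings, booleans, or integers."""
    # bool is a subclass of int, so Integral already covers it.
    return all(isinstance(v, (str, Integral)) for v in values)


print(is_discrete_sketch(['adam', 'sgd', 'adam']))  # categorical names
print(is_discrete_sketch([16, 32, 64]))             # integer choices
print(is_discrete_sketch([0.001, 0.01, 0.1]))       # real-valued samples
```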
- trw.hparams.interpret_params.median_by_category(categories, values)
Calculate the median for each categorical attribute
:param categories: the categories
:param values: the values
:return: list of tuple (category, median value)
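The grouping described above can be sketched with the standard library alone. The function name median_by_category_sketch is hypothetical, and the order of the returned tuples (first appearance of each category) is an assumption; the library may order them differently.

```python
from collections import defaultdict
from statistics import median
from typing import Hashable, List, Sequence, Tuple


def median_by_category_sketch(
        categories: Sequence[Hashable],
        values: Sequence[float]) -> List[Tuple[Hashable, float]]:
    """Group values by category, then take the median of each group."""
    grouped = defaultdict(list)
    for category, value in zip(categories, values):
        grouped[category].append(value)
    # Assumption: categories are reported in order of first appearance.
    return [(category, median(group)) for category, group in grouped.items()]


optimizers = ['adam', 'sgd', 'adam', 'sgd', 'adam']
losses = [3.0, 5.0, 2.0, 7.0, 4.0]
medians = median_by_category_sketch(optimizers, losses)
print(medians)
```

For the sample data, the 'adam' group is [3.0, 2.0, 4.0] and the 'sgd' group is [5.0, 7.0], so the medians are 3.0 and 6.0 respectively.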
- trw.hparams.interpret_params._plot_scatter(plot_name, x_values, x_name, y_values, y_name, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None, median_max_num=20)
Scatter plot with optional named ticks (x, y) and median display (x, y)
:param plot_name:
:param x_values:
:param x_name:
:param y_values:
:param y_name:
:param discrete_random_jitter:
:param x_ticks:
:param y_ticks:
:param median_max_num:
:return:
- trw.hparams.interpret_params._plot_importance(plot_name, x_names, y_values, y_name, y_errors=None, x_name='hyper-parameters')
- trw.hparams.interpret_params._plot_param_covariance(plot_name, x_name, x_values, y_name, y_values, xy_values, discrete_random_jitter=0.2, x_ticks=None, y_ticks=None)
- trw.hparams.interpret_params.discretize(values)
Map strings to integers and record the mapping
:param values:
:return: (values, mapping)
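A minimal sketch of such an encoding is shown below. It assumes that integer codes are assigned in order of first appearance and that the mapping goes from string to integer; neither detail is confirmed by this page, and discretize_sketch is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def discretize_sketch(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Replace each string with an integer code and return the codes with the mapping."""
    mapping: Dict[str, int] = {}
    encoded: List[int] = []
    for value in values:
        if value not in mapping:
            # Assumption: a new string receives the next unused integer.
            mapping[value] = len(mapping)
        encoded.append(mapping[value])
    return encoded, mapping


encoded, mapping = discretize_sketch(['relu', 'tanh', 'relu', 'elu'])
print(encoded)
print(mapping)
```

Keeping the mapping alongside the encoded values makes it possible to label plot ticks with the original strings later on.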
- trw.hparams.interpret_params.analyse_hyperparameters(run_results: List[trw.hparams.store.RunResult], output_path: str, loss_fn: Callable[[trw.hparams.store.Metrics], float] = lambda metrics: ..., hparams_to_visualize: List[str] = None, params_forest_n_estimators: int = 5000, params_forest_max_features_ratio: float = 0.6, top_k_covariance: int = 5, create_graphs: bool = True, verbose: bool = True, dpi: int = 300) → Dict[str, List]
Hyper-parameter importance estimation using random forest regressors.
In simulations, the ordering of the hyper-parameter importances is correct, but the importance value itself may be overestimated for the best parameter and underestimated for the others.
The scatter plot for each hyper-parameter helps to understand in which direction that hyper-parameter should be modified.
The covariance plot can be used to understand the relationships between the most important hyper-parameters.
Warning
With correlated features, strong features can end up with low scores, and the method can be biased towards variables with many categories. See references 1 and 2 for more details.
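The general technique of estimating importances with a random forest can be sketched as follows. This is not the function's implementation: it assumes scikit-learn and NumPy are available, uses synthetic data, and the hyper-parameter names learning_rate and dropout are made up for the illustration. The synthetic loss depends mostly on learning_rate, so that parameter should receive the larger importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_runs = 200

# Hypothetical hyper-parameters sampled for each run.
learning_rate = rng.uniform(0.0, 1.0, n_runs)
dropout = rng.uniform(0.0, 1.0, n_runs)

# Synthetic loss dominated by learning_rate, with a weak dropout effect and noise.
loss = 3.0 * learning_rate + 0.1 * dropout + rng.normal(0.0, 0.05, n_runs)

X = np.column_stack([learning_rate, dropout])

# max_features below 1.0 means each split only considers a subset of the features.
forest = RandomForestRegressor(n_estimators=100, max_features=0.6, random_state=0)
forest.fit(X, loss)

# Impurity-based importances, normalized to sum to 1.
importances = forest.feature_importances_
# Spread of the importances across the individual trees, usable as error bars.
errors = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)

for name, importance, error in zip(['learning_rate', 'dropout'], importances, errors):
    print(f'{name}: {importance:.3f} +/- {error:.3f}')
```

Because impurity-based importances are normalized, a gain for one parameter is a loss for the others, which is one reason the absolute values should be read with caution.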
- Parameters
run_results – a list of runs
output_path – where to export the graphs
loss_fn – a function to extract a single value (loss) from a list of metrics
hparams_to_visualize – a list of hyper-parameter names (strings) to visualize
params_forest_n_estimators – number of trees used to estimate the loss from the hyperparameters
params_forest_max_features_ratio – the maximum ratio of features to be used. Note that not all features are selected, in order to limit the decrease in importance caused by correlated features (see reference 1)
top_k_covariance – export the parameter covariance for the k most important hyper-parameters
create_graphs – if True, export matplotlib visualizations
verbose – if True, display additional information
dpi – the resolution of the exported graph
- Returns
two lists holding the hyper-parameter names and their importances
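The loss_fn argument reduces the metrics of one run to a single number. The sketch below assumes that the metrics of a run behave like a dictionary of named float values and that a 'loss' entry exists; both the structure and the key are assumptions made for illustration and are not confirmed by this page.

```python
from typing import Dict, List


def loss_from_metrics(metrics: Dict[str, float]) -> float:
    """Extract the scalar to minimize from the metrics of one run."""
    # Hypothetical key: substitute whichever metric the runs actually record.
    return float(metrics['loss'])


# Made-up metrics for two runs, only to show the reduction to one value per run.
runs_metrics: List[Dict[str, float]] = [
    {'loss': 0.42, 'accuracy': 0.81},
    {'loss': 0.35, 'accuracy': 0.86},
]
losses = [loss_from_metrics(metrics) for metrics in runs_metrics]
print(losses)
```

A function of this shape could then be passed as loss_fn, provided the metrics really expose the chosen key.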