Performance
ICCP contains a module called Performance, which provides all the functionality needed to assess the performance of agents and to search through the logs easily. This part of ICCP makes working with agents much easier and faster: one can quickly iterate to optimize an agent's performance and compare it against other agents. To understand why and how the performance assessment was defined this way, please read the background information on Performance.
Class methods and definitions
Utility functions
To understand why and how utility functions are used, we refer to the background information on them.
cumulative_reward
def cumulative_reward(data: pd.DataFrame) -> float:
""" Given a dataframe containing a reward column, computes cumulative reward"""
negative_cumulative_heating
def negative_cumulative_heating(data: pd.DataFrame) -> float:
""" Given a dataframe containing a reward column, computes cumulative reward"""
constant_utility
def constant_utility(data: pd.DataFrame) -> float:
""" Constant utility function, to be used when there is no preference between two sessions"""
Statistics computation
IQR
def IQR(arr: np.ndarray) -> float:
""" Computes the inter-quartile range of an array"""
CVaR
def CVaR(arr: np.ndarray, alpha: float = 0.05) -> float:
""" Computes the conditional value at risk of an array with risk threshold alpha"""
Across time (during training)
Definitions of methods to evaluate in-training risk and dispersion of a single run.
compute_dispersion_across_time
def compute_dispersion_across_time(
data: pd.DataFrame, column_name: str, window: int
) -> float:
"""We define the measure of dispersion across a training run as the mean of the
rolling inter-quartile range.
Args:
data (pd.DataFrame): DataFrame containing the specified column
column_name (str): column name over which dispersion should be computed
window (int): window length of the rolling inter-quartile range.
Returns:
float: the measure of dispersion across time
"""
compute_risk_across_time
def compute_risk_across_time(
data: pd.DataFrame, column_name: str, alpha: float, window: int
) -> float:
"""We define the measure of risk across a training run as the mean of the
rolling conditional value at risk.
Args:
data (pd.DataFrame): DataFrame containing the specified column
column_name (str): column name over which risk should be computed
alpha (float): risk threshold of the conditional value at risk.
window (int): window length of the rolling conditional value at risk.
Returns:
float: the measure of risk across time
"""
across_time
def across_time(
data: pd.DataFrame,
window: int = 3 * 6,
column_name: str = "Reward",
alpha: float = 0.05,
) -> Tuple[float, float]:
"""Given a dataframe summarizing a training session, computes the mean risk and mean dispersion of the specified column
over a specified time window.
Why do we use a window?
So that the computed risk and dispersion still make sense in the context of real-life usage. A training session
may span over five months of simulated data, but we should assess the behaviour of the controller over the span of hours or a day,
as that is the time span effectively experienced by the human users. In this case, a window = 3*6 spans 3 hours because the
Simple Environment does 6 steps per hour.
Args:
data (pd.DataFrame): Summary DataFrame of the training session.
window (int, optional): Time window over which dispersion and risk are computed. Defaults to 3*6.
column_name (str, optional): Name of the column over which we compute. Defaults to "Reward".
alpha (float, optional): Risk threshold. Defaults to 0.05.
Returns:
Tuple[float, float]: (dispersion, risk)
"""
Across runs (during training)
Definitions of methods to evaluate in-training risk and dispersion of multiple runs.
across_runs
def across_runs(
agent: Agent,
agent_config_path: str, # absolute path to the logging of the agent such that its configuration can be loaded
parameter: Tuple[str, List[Any]],
num_episodes: int,
num_iterations: int = None,
column_names: List[str] = [
"Reward",
"Tset",
], # list of the columns in summary df on which we wish to measure risk and dispersion
utility_function: Callable[[pd.DataFrame], float] = cumulative_reward,
window: int = 3 * 6, # window to compute iqr and cvar across time
alpha=0.05,
) -> Dict[str, Any]:
"""Given an agent initilaized with its environment, the absolute path to the configuration / log we wish to assess,
loads the correct configuration on the agent.
Then, given the parameter name that we wish to vary and the list of the different values it will take,
it iterates over all the possible new configurations and:
- compute the dispersion and risk of the specified column names for each training run / session (by using across_time())
- compute the dispersion and risk of the utility function over all training runs / sessions
Args:
agent (Agent): Agent to be assessed, only needs to be instantiated with the correct environment
agent_config_path (str): Absolute path to the configuration / log we wish to assess
parameter (Tuple[str, List[Any]]): name of the parameter to vary and the list of values it will take
num_episodes (int): number of episodes of each training session
num_iterations (int, optional): number of iterations of each training session. Defaults to None.
column_names (List[str], optional): Name of the columns over which we will compute dispersion and risk at each training session.
Defaults to [ "Reward", "Tset", ].
utility_function (Callable[[pd.DataFrame], float], optional): Utility function to assess performance of a training session.
Defaults to cumulative_reward.
window (int, optional): Time window over which dispersion and risk are computed. Defaults to 3*6.
alpha (float, optional): Risk threshold. Defaults to 0.05.
Returns:
(Dict[str,Any]): dictionary summarizing all results of the performance assessment.
"""
Across fixed policy (after training)
Definitions of methods to evaluate post-training risk and dispersion over multiple runs of a fixed policy.
across_fixed_policy
def across_fixed_policy(
agent: Agent,
agent_config_path: str, # absolute path to the logging of the agent such that its configuration can be loaded
num_testing: int,
num_episodes: int,
num_iterations: int = None,
column_names: List[str] = [
"Reward",
"Tset",
], # list of the columns in summary df on which we wish to measure risk and dispersion
utility_function: Callable[[pd.DataFrame], float] = cumulative_reward,
window: int = 3 * 6, # window to compute iqr and cvar across time
alpha=0.05,
) -> Dict[str, Any]:
"""Given an agent initilaized with its environment, the absolute path to the configuration / log we wish to assess,
loads the correct configuration on the agent.
It will test the fixed policy num_testing times and:
- compute the dispersion and risk of the specified column names for each testing run / session (by using across_time())
- compute the dispersion and risk of the utility function over all testing runs / sessions
Args:
agent (Agent): Agent to be assessed, only needs to be instantiated with the correct environment
agent_config_path (str): Absolute path to the configuration / log we wish to assess
num_testing (int): number of times the fixed policy will be tested
num_episodes (int): number of episodes of each testing session
num_iterations (int, optional): number of iterations of each testing session. Defaults to None.
column_names (List[str], optional): Name of the columns over which we will compute dispersion and risk at each testing session.
Defaults to [ "Reward", "Tset", ].
utility_function (Callable[[pd.DataFrame], float], optional): Utility function to assess performance of a testing session.
Defaults to cumulative_reward.
window (int, optional): Time window over which dispersion and risk are computed. Defaults to 3*6.
alpha (float, optional): Risk threshold. Defaults to 0.05.
Returns:
(Dict[str,Any]): dictionary summarizing all results of the performance assessment.
"""
Useful methods
- How to use search_paths is explained in Searching through logs.
- How to use all_combinations_list and search_similar is explained in Hyperparameter tuning.
- How to use load_trained_agent, load_json_params, and load_summary_df is explained in Loading from logs.