Using the make_hills() function

Using the `make_hills()` function#

The hillmaker.hills.make_hills function is the gateway to hillmaker and is used by the CLI, the object oriented API, or on its own to launch the hillmaking process. It has numerous input arguments for customizing how hillmaker works. In this tutorial we will describe all of the input arguments and discuss their use. This same information applies to CLI as well as the object oriented API’s Scenario.make_hills method.

import pandas as pd
import hillmaker as hm

ssu_stopdata = 'https://raw.githubusercontent.com/misken/hillmaker-examples/main/data/ssu_2024.csv'
# ssu_stopdata = './data/ssu_2024.csv'
ssu_stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
ssu_stops_df.info() # Check out the structure of the resulting DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59877 entries, 0 to 59876
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   PatID      59877 non-null  int64         
 1   InRoomTS   59877 non-null  datetime64[ns]
 2   OutRoomTS  59877 non-null  datetime64[ns]
 3   PatType    59877 non-null  object        
 4   LOS_hours  59877 non-null  float64       
dtypes: datetime64[ns](2), float64(1), int64(1), object(1)
memory usage: 2.3+ MB

help(hm.make_hills)

Help on function make_hills in module hillmaker.legacy:

make_hills(scenario_name: str = None, data: str | pathlib.Path | pandas.core.frame.DataFrame = None, in_field: str = None, out_field: str = None, start_analysis_dt: str | datetime.date | datetime.datetime | pandas._libs.tslibs.timestamps.Timestamp | numpy.datetime64 = None, end_analysis_dt: str | datetime.date | datetime.datetime | pandas._libs.tslibs.timestamps.Timestamp | numpy.datetime64 = None, cat_field: str = None, bin_size_minutes: int = 60, export_bydatetime_csv: bool = True, export_summaries_csv: bool = True, csv_export_path: str | pathlib.Path = PosixPath('.'), make_all_dow_plots: bool = True, make_all_week_plots: bool = True, export_all_dow_plots: bool = False, export_all_week_plots: bool = False, plot_export_path: str | pathlib.Path = PosixPath('.'), **kwargs) -> Dict
    Compute occupancy, arrival, and departure statistics by category, time bin of day and day of week.
    
    Main function that first calls `bydatetime.make_bydatetime` to calculate occupancy, arrival
    and departure values by date by time bin and then calls `summarize.summarize`
    to compute the summary statistics.
    
    Parameters
    ----------
    scenario_name : str
        Used in output filenames
    data : str, Path, or DataFrame
        Base data containing one row per visit. If Path-like, data is read into a DataFrame.
    in_field : str
        Column name corresponding to the arrival times
    out_field : str
        Column name corresponding to the departure times
    start_analysis_dt : datetime-like, str
        Starting datetime for the analysis (must be convertible to pandas Timestamp)
    end_analysis_dt : datetime-like, str
        Ending datetime for the analysis (must be convertible to pandas Timestamp)
    cat_field : str, optional
        Column name corresponding to the categories. If none is specified, then only overall occupancy is summarized.
        Default is None
    bin_size_minutes : int, optional
        Number of minutes in each time bin of the day, default is 60. This bin size is used for plots and reporting and
        is an aggregation of computations done at the finer bin size resolution specified by `resolution_bin_size_mins`.
        Use a value that divides into 1440 with no remainder.
    cats_to_exclude : list, optional
        Category values to ignore, default is None
    occ_weight_field : str, optional
        Column name corresponding to the weights to use for occupancy incrementing, default is None
        which corresponds to a weight of 1.0
    percentiles : list or tuple of floats (e.g. [0.5, 0.75, 0.95]), optional
        Which percentiles to compute. Default is (0.25, 0.5, 0.75, 0.95, 0.99)
    los_units : str, optional
        The time units for length of stay analysis.
        See https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html for allowable values (smallest
        value allowed is 'seconds', largest is 'days'). The default is 'hours'.
    
    export_bydatetime_csv : bool, optional
       If True, bydatetime DataFrames are exported to csv files. Default is False.
    export_summaries_csv : bool, optional
       If True, summary DataFrames are exported to csv files. Default is False.
    csv_export_path : str or Path, optional
        Destination path for exported csv files, default is the current directory
    
    make_all_dow_plots : bool, optional
       If True, day of week plots are created for occupancy, arrival, and departure. Default is True.
    make_all_week_plots : bool, optional
       If True, full week plots are created for occupancy, arrival, and departure. Default is True.
    export_all_dow_plots : bool, optional
       If True, day of week plots are exported for occupancy, arrival, and departure. Default is False.
    export_all_week_plots : bool, optional
       If True, full week plots are exported for occupancy, arrival, and departure. Default is False.
    plot_export_path : str or Path, optional, default is the current directory
        Destination path for exported png files, default is the current directory
    
    plot_style : str, optional
        Matplotlib built in style name. Default is 'ggplot'.
    figsize : Tuple, optional
        Figure size. Default is (15, 10)
    bar_color_mean : str, optional
        Matplotlib color name for the bars representing mean values. Default is 'steelblue'
    alpha : float, optional
        Transparency level for bars. Default = 0.5.
    plot_percentiles : list or tuple of floats (e.g. [0.75, 0.95]), optional
        Which percentiles to plot. Default is (0.95)
    pctile_color : list or tuple of color codes (e.g. ['blue', 'green'] or list('gb'), optional
        Line color for each percentile series plotted. Order should match order of percentiles list.
        Default is ('black', 'grey').
    pctile_linestyle : List or tuple of line styles (e.g. ['-', '--']), optional
        Line style for each percentile series plotted. Default is ('-', '--').
    pctile_linewidth : List or tuple of line widths in points (e.g. [1.0, 0.75])
        Line width for each percentile series plotted. Default is (0.75, 0.75).
    cap : int, optional
        Capacity of area being analyzed, default is None
    cap_color : str, optional
        matplotlib color code, default='r'
    xlabel : str, optional
        x-axis label, default='Hour'
    ylabel : str, optional
        y-axis label, default='Patients'
    main_title : str, optional
        Main title for plot, default = '{Occupancy or Arrivals or Departures} by time of day and day of week'
    main_title_properties : None or dict, optional
        Dict of main title properties, default={{'loc': 'left', 'fontsize': 16}}
    subtitle : str, optional
        subtitle for plot, default = 'Scenario: {scenario_name}'
    subtitle_properties : None or dict, optional
        Dict of subtitle properties, default={'loc': 'left', 'style': 'italic'}
    legend_properties : None or dict, optional
        Dict of legend properties, default={{'loc': 'best', 'frameon': True, 'facecolor': 'w'}}
    first_dow : str, optional
        Controls which day of week appears first in plot. One of 'mon', 'tue', 'wed', 'thu', 'fri', 'sat, 'sun'
    
    edge_bins: int, default 1
        Occupancy contribution method for arrival and departure bins. 1=fractional, 2=entire bin
    highres_bin_size_minutes : int, optional
        Number of minutes in each time bin of the day used for initial computation of the number of arrivals,
        departures, and the occupancy level. This value should be <= `bin_size_minutes`. The shorter the duration of
        stays, the smaller the resolution should be. The current default is 5 minutes.
    keep_highres_bydatetime : bool, optional
        Save the high resolution bydatetime dataframe in hills attribute. Default is False.
    nonstationary_stats : bool, optional
       If True, datetime bin stats are computed. Else, they aren't computed. Default is True
    stationary_stats : bool, optional
       If True, overall, non-time bin dependent, stats are computed. Else, they aren't computed. Default is True
    verbosity : int, optional
        Used to set level in loggers. 0=logging.WARNING (default=0), 1=logging.INFO, 2=logging.DEBUG
    
    
    Returns
    -------
    dict of DataFrames and plots
       The bydatetime DataFrames, all summary DataFrames and any plots created.
    
    Example
    -------
    Use like this::
    
        # Required inputs
        scenario_name = 'ssu_summer24'
        in_field_name = 'InRoomTS'
        out_field_name = 'OutRoomTS'
        start_date = '2024-06-01'
        end_date = '2024-08-31'
    
        # Optional inputs
        cat_field_name = 'PatType'
        bin_size_minutes = 30
        csv_export_path = './output'
    
        # Optional plotting inputs
        plot_export_path = './output'
        plot_style = 'default'
        bar_color_mean = 'grey'
        percentiles = [0.85, 0.95]
        plot_percentiles = [0.95, 0.85]
        pctile_color = ['blue', 'green']
        pctile_linewidth = [0.8, 1.0]
        cap = 110
        cap_color = 'black'
        main_title = 'Occupancy summary'
        main_title_properties = {'loc': 'center', 'fontsize':20}
        subtitle = 'Summer 2024 analysis'
        subtitle_properties = {'loc': 'center'}
        xlabel = ''
        ylabel = 'Patients'
    
        # Optional plotting related inputs
    
        # Use legacy function interface
        hills = hm.make_hills(scenario_name=scenario_name, data=ssu_stops_df,
                              in_field=in_field_name, out_field=out_field_name,
                              start_analysis_dt=start_date, end_analysis_dt=end_date,
                              cat_field=cat_field_name,
                              bin_size_minutes=bin_size_minutes,
                              csv_export_path=csv_export_path,
                              plot_export_path=plot_export_path, plot_style = plot_style,
                              percentiles=percentiles, plot_percentiles=plot_percentiles,
                              pctile_color=pctile_color, pctile_linewidth=pctile_linewidth,
                              cap=cap, cap_color=cap_color,
                              main_title=main_title, main_title_properties=main_title_properties,
                              subtitle=subtitle, subtitle_properties=subtitle_properties,
                              xlabel=xlabel, ylabel=ylabel
                             )

Required input arguments#

`scenario_name` (str)#

This is a string that gets used in a few places:

part of filenames of exported CSV files,
part of filenames of exported plots,
plot subtitle default

Since it gets used in filenames, best to avoid spaces and special characters (other than underscore). Any non-alphanumeric characters other than the underscore will get transformed to underscores.

`data` (str | Path | DataFrame)#

The data table with each row representing one visit, or stop, by an entity. You can pass a string or a Path object to a CSV file, which will then get read into a pandas DataFrame. Or, you can directly pass in a DataFrame.

For example, in the SSU example, each row is a a patient who visits the short stay unit. In cycle share data, each row might be a rental of a bike for some period of time. Here are the first few records from ssu_stops_df. It is NOT necessary to have a field containing the duration of time that the entity spent in the system (e.g. LOS_hours below). You only need to have fields representing the arrival and departure times from the system - InRoomTS and OutRoomTS in this example.

ssu_stops_df.head()

	PatID	InRoomTS	OutRoomTS	PatType	LOS_hours
0	1	2024-01-01 07:44:00	2024-01-01 09:20:00	IVT	1.600000
1	2	2024-01-01 08:28:00	2024-01-01 11:13:00	IVT	2.750000
2	3	2024-01-01 11:44:00	2024-01-01 12:48:00	MYE	1.066667
3	4	2024-01-01 11:51:00	2024-01-01 21:10:00	CAT	9.316667
4	5	2024-01-01 12:10:00	2024-01-01 12:57:00	IVT	0.783333

`in_field` (str)#

The fieldname in data containing the arrival times. The datatype for the field itself must be a pandas Timestamp (or datetime64).

`out_field` (str)#

The fieldname in data containing the departure times. The datatype for the field itself must be a pandas Timestamp (or datetime64).

ssu_stops_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59877 entries, 0 to 59876
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   PatID      59877 non-null  int64         
 1   InRoomTS   59877 non-null  datetime64[ns]
 2   OutRoomTS  59877 non-null  datetime64[ns]
 3   PatType    59877 non-null  object        
 4   LOS_hours  59877 non-null  float64       
dtypes: datetime64[ns](2), float64(1), int64(1), object(1)
memory usage: 2.3+ MB

`start_analysis_dt` and `end_analysis_dt` (something convertible to a pandas `Timestamp`)#

These two dates define what we call the analysis date range. All records in stops_df whose in_field and out_field values overlap this range in any way, are included in the hillmaker computations.

Care must be taken in selecting the analysis date range. In an example like the SSU, where most patients are staying less than 24 hours, we are probably fine with picking a start_analysis_dt very close or even equal to the earliest arrival date in our stop data. However, for a system in which the length of stay may be on the order of several days, we need to be cognizant of warm up effects. In such a case, if we used the earliest arrival date for the start of the analysis, we are essentially assuming that the system starts out empty on that date. This is certainly not likely to be true in a busy system where entities are staying multiple days. Similarly, the end date should not be after the date of the latest arrival or the system will appear to be emptying out - again, not realistic.

See Example occupancy analysis for more on this issue.

Optional, but frequently used, input arguments#

`cat_field` (str, default=None)#

The fieldname in stops_df containing some sort of categorical information for which you would like to get hillmaker statistics. In the SSU example, this would be the PatType field. If a cat_field is specified, then arrival, departure and occupancy statistics are computed by category as well as overall. A common use of the category field is to specify a location. In this way, one hillmaker run can compute occupancy statitics for multiple locations. An example could be the name of the nursing unit visited as inpatients flow through a hospital. In the cycle share data example, a field specifying whether the renter was a subscription holder or a casual renter, lets us see the very different bike rental patterns by these two distinct populations.

`bin_size_minutes` (int, default=60)#

Central to hillmaker is the notion of dividing each day into equally sized time bins such as hours (bin_size_minutes=60) or half-hours. All of the summary tables and plots will use bin_size_minutes. Pick a value that makes sense for your study and for the level of time of day fluctuations present. Try different values and compare the plots. Large values might obscure important short-term fluctions in arrivals or occupancy.

The value used must be a factor of 1440 (minutes in a day).

More optional input arguments#

`cats_to_exclude` (list, default=None)#

If you specify a category field via cat_field, you can optionally provide a list of specific category values for which you do not want to consider in the analysis. A similar effect could be obtained by pre-filtering out these records from data.

`occ_weight_field` (str, default=None)#

While hillmaker is usually used to compute occupancy summaries, it can also be used to compute associated measures that are directly related to occupancy. A common example is using hillmaker to estimate staffing requirements based on staff to patient ratios (by category). This can be done by specifying a column in the stops Dataframe which contains the weights to use for occupancy incrementing. The default of None corresponds to a weight of 1.0. For example, in the SSU dataset, let’s assume that a 4:1 patient to staff ratio was appropriate for all patient types except CAT, for which 2:1 was needed. We could create a column of occupancy weights of 0.50 for the CAT patients and 0.25 for all other patient types.

`percentiles` (tuple or list, default=(0.25, 0.5, 0.75, 0.95, 0.99))#

Use this parameter to control which percentiles of occupancy (and number of arrivals and departures) are computed.

`los_units` (str, default=’hours’)#

A statistical summary of length of stay (difference between departure and arrival times) is done. This parameter controls the time units to use for reporting the results.

See https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html for allowable values (smallest value allowed is ‘seconds’, largest is ‘days’. The default is ‘hours’.

`export_bydatetime_csv` and `export_summaries_csv` (bool, default=True)#

These two parameters control the exporting to CSV of the bydatetime and summary Dataframe objects. They are exported to the location specified by the csv_export_path parameter.

`csv_export_path` (str or Path, default=current directory)#

Destination path for exported csv files - default is current directory.

`make_all_dow_plots` and `make_all_week_plots` (bool, default=True)#

If make_all_dow_plots=True, day of week plots are created for occupancy, arrival, and departure (resulting in 21 plots). If make_all_week_plots=True, full week plots are created for occupancy, arrival, and departure (3 plots).

For the CLI and the make_hills legacy interface, these parameters default to True and all plots are created by default. In the object oriented API, these default to False and the user has full control over plot creation and exporting.

`plot_export_path` (str or Path, default=current directory)#

Destination path for exported plots (png) files - default is current directory.

Optional plot related input arguments#

`plot_percentiles` (tuple or list of float, default=(0.95, 0.75))#

Percentiles to plot. The order matters as subsequent parameters will allow you to style these series.

`pctile_color` (list or tuple of color codes (e.g. [‘blue’, ‘green’] or list(‘gb’))#

Line color for each percentile series plotted. Order should match order of percentiles list. Default is (‘black’, ‘grey’).

`pctile_linestyle` (list or tuple of line styles (e.g. [‘-’, ‘–‘]), default is (‘-’, ‘–‘)#

Line style for each percentile series plotted.

`pctile_linewidth` (list or tuple of line widths (e.g. [1.0, 0.75]), default is (0.75, 0.75)#

Line width for each percentile series plotted.

`cap` (int, default=None)#

Use this to display a capacity line on the occupancy graphs.

`cap_color` (str, default=’r’)#

Color to use for the capacity line.

`xlabel` and `ylabel` (str, default=’Hour’ and ‘Volume’)#

Axes labels for the plots. For more detailed control over plotting, use the object oriented API.

`main_title` (str, default=’{Occupancy or Arrivals or Departures} by time of day and day of week’)#

Main title for plot.

`main_title_properties` (Dict} default={‘loc’: ‘left’, ‘fontsize’: 16})#

Dictionary of main title properties. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.title.html.

`subtitle` (str, default=’Scenario: {scenario_name}’)#

Subtitle for plot.

`subtitle_properties` (Dict} default={‘loc’: ‘left’, ‘style’: ‘italic’})#

Dictionary of subtitle properties. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.title.html. Both the title and subtitle are actually matplotlib title objects.

`legend_properties` (Dict} default={{‘loc’: ‘best’, ‘frameon’: True, ‘facecolor’: ‘w’})#

Dictionary of legend properties. See https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html.

`first_dow` (str, default=’mon’)#

Controls which day of week appears first in weekly plot. One of ['mon', 'tue', 'wed', 'thu', 'fri', 'sat, 'sun'].

Advanced optional input arguments#

There are a few advanced options related to how occupancy is computed. Best not to use these unless you know what you’re doing.

`edge_bins` (int, default=1)#

Occupancy contribution method for arrival and departure bins. 1=fractional (the default), 2=entire bin. The default way that hillmaker computes occupancy is described in detail in How is occupancy computed?. For the datetime bins corresponding to an entity’s arrival and departure, occupancy in that bin is the fraction of time for which the entity was present during that time bin. Alternatively, you could choose to give “full credit” for occupancy during the arrival and departure bins. However, this can lead to serious overestimates of occupancy, especially for bin sizes that are large relative to typical time spent in the system by each entity. If, for whatever reason, you choose to use edge_bins=2, you should use a small value of the next parameter, highres_bin_size_minutes,

`highres_bin_size_minutes` (int, default=`bin_size_minutes`)#

Number of minutes in each time bin of the day used for initial computation of the number of arrivals, departures, and the occupancy level - i.e. in the creation of the bydatetime table. By default, this is set equal to the value of bin_size_minutes since it doesn’t affect aggregate arrival, occupancy or departure statistics as long as the default of edge_bins=1 is used. So, why would you ever use this parameter?

If you use edge_bins=2, you should use a small value for highres_bin_size_minutes to avoid serious overestimates of occupancy.
You may want to take a very detailed look at occupancy on specific dates for very short time bin sizes.

`keep_highres_bin_size_minutes` (bool, default=False)#

If you want to save the high resolution version of the bydatetime Dateframe in the dictionary returned by make_hills(), then set this parameter to True.

`nonstationary_stats` (bool, default=True)#

If True, datetime bin stats are computed. Else, they aren’t computed. Default is True

`stationary_stats` (bool, default=True)#

If True, overall, non-time bin dependent, stats are computed. Else, they aren’t computed. Default is True

`verbosity` (int, default=1)#

Used to set level in loggers. 0=logging.WARNING, 1=logging.INFO (the default), 2=logging.DEBUG. The default level provides quite a bit of detail about the hillmaking process and is a good way to make sure that everything is okay with your data.

Calling `make_hills()`#

Using the `make_hills` function#

Before the creation of the object-oriented API, hillmaker could be used by calling a single, module level function called make_hills. This type of legacy use is still possible. The make_hills function returns the same hills dictionary returned by the OO API but can also create and export plots and dataframes via function arguments.

We’ll build on the example from the Getting started with hillmaker section. For this scenario, we want to:

use half-hourly time bins
analyze the summer period of 2024-06-01 - 2024-08-31
include a capacity line at 110 for the occupancy plot
plot the 85th and 95th percentile

# Required inputs
scenario_name = 'ssu_summer24'
in_field_name = 'InRoomTS'
out_field_name = 'OutRoomTS'
start_date = '2024-06-01'
end_date = '2024-08-31'

# Optional inputs
cat_field_name = 'PatType'
bin_size_minutes = 30
csv_export_path = './output'

# Optional plotting inputs
plot_export_path = './output'
plot_style = 'default'
bar_color_mean = 'grey'
percentiles = [0.85, 0.95]
plot_percentiles = [0.95, 0.85]
pctile_color = ['blue', 'green']
pctile_linewidth = [0.8, 1.0]
cap = 110
cap_color = 'black'
main_title = 'Occupancy summary'
main_title_properties = {'loc': 'center', 'fontsize':20}
subtitle = 'Summer 2024 analysis'
subtitle_properties = {'loc': 'center'}
xlabel = ''
ylabel = 'Patients'

# Optional plotting related inputs

# Use legacy function interface
hills = hm.make_hills(scenario_name=scenario_name, data=ssu_stops_df,
                      in_field=in_field_name, out_field=out_field_name,
                      start_analysis_dt=start_date, end_analysis_dt=end_date,
                      cat_field=cat_field_name,
                      bin_size_minutes=bin_size_minutes,
                      csv_export_path=csv_export_path, 
                      plot_export_path=plot_export_path, plot_style = plot_style,
                      percentiles=percentiles, plot_percentiles=plot_percentiles,
                      pctile_color=pctile_color, pctile_linewidth=pctile_linewidth,
                      cap=cap, cap_color=cap_color,
                      main_title=main_title, main_title_properties=main_title_properties,
                      subtitle=subtitle, subtitle_properties=subtitle_properties,
                      xlabel=xlabel, ylabel=ylabel
                     )

Now we can use the get_plot function to retrieve the occupancy plot. All the plots and csv files were also exported.

# Get and display occupancy plot
occ_plot = hm.get_plot(hills, 'occupancy')
occ_plot

_images/79ddd913d3e33d897b3e277e731c52a6bb17789f7ab831ece3e24e7b74277be8.png

Output dictionary#

The make_hills function returns a dictionary containing all of the outputs created. There are convenience functions such as get_plot, get_bydatetime_df and get_summary_df that can be used to pull out items of interest. Of course, you can also access the dictionary directly. Let’s check it out.

hills.keys()

dict_keys(['bydatetime', 'summaries', 'los_summary', 'settings', 'plots', 'runtime'])

`bydatetime` - Detailed bydatetime dataframes#

hills['bydatetime'].keys()

dict_keys(['PatType_datetime', 'datetime'])

hills['bydatetime']['PatType_datetime']

		arrivals	departures	occupancy	day_of_week	dow_name	bin_of_day_str	bin_of_day	bin_of_week
PatType	datetime
ART	2024-06-01 00:00:00	0.0	0.0	0.0	5	Sat	00:00	0	240
	2024-06-01 00:30:00	0.0	0.0	0.0	5	Sat	00:30	1	241
	2024-06-01 01:00:00	0.0	0.0	0.0	5	Sat	01:00	2	242
	2024-06-01 01:30:00	0.0	0.0	0.0	5	Sat	01:30	3	243
	2024-06-01 02:00:00	0.0	0.0	0.0	5	Sat	02:00	4	244
...	...	...	...	...	...	...	...	...	...
OTH	2024-08-31 21:30:00	0.0	0.0	0.0	5	Sat	21:30	43	283
	2024-08-31 22:00:00	0.0	0.0	0.0	5	Sat	22:00	44	284
	2024-08-31 22:30:00	0.0	0.0	0.0	5	Sat	22:30	45	285
	2024-08-31 23:00:00	0.0	0.0	0.0	5	Sat	23:00	46	286
	2024-08-31 23:30:00	0.0	0.0	0.0	5	Sat	23:30	47	287

22080 rows × 8 columns

hm.get_bydatetime_df(hills, by_category=False)

	arrivals	departures	occupancy	day_of_week	dow_name	bin_of_day_str	bin_of_day	bin_of_week
datetime
2024-06-01 00:00:00	0.0	2.0	11.366667	5	Sat	00:00	0	240
2024-06-01 00:30:00	0.0	2.0	10.200000	5	Sat	00:30	1	241
2024-06-01 01:00:00	0.0	1.0	8.500000	5	Sat	01:00	2	242
2024-06-01 01:30:00	0.0	0.0	8.000000	5	Sat	01:30	3	243
2024-06-01 02:00:00	0.0	1.0	7.100000	5	Sat	02:00	4	244
...	...	...	...	...	...	...	...	...
2024-08-31 21:30:00	0.0	2.0	2.433333	5	Sat	21:30	43	283
2024-08-31 22:00:00	0.0	0.0	1.000000	5	Sat	22:00	44	284
2024-08-31 22:30:00	1.0	0.0	1.633333	5	Sat	22:30	45	285
2024-08-31 23:00:00	0.0	0.0	2.000000	5	Sat	23:00	46	286
2024-08-31 23:30:00	0.0	0.0	2.000000	5	Sat	23:30	47	287

4416 rows × 8 columns

`summaries` - Occupancy, arrival and departure summary dataframes#

hills['summaries'].keys()

dict_keys(['nonstationary', 'stationary'])

hills['summaries']['nonstationary'].keys()

dict_keys(['PatType_dow_binofday', 'dow_binofday'])

hills['summaries']['nonstationary']['PatType_dow_binofday'].keys()

dict_keys(['occupancy', 'arrivals', 'departures'])

hills['summaries']['nonstationary']['PatType_dow_binofday']['occupancy']

	PatType	day_of_week	dow_name	bin_of_day	bin_of_day_str	count	mean	min	max	stdev	sem	var	cv	skew	kurt	p85	p95
0	ART	0	Mon	0	00:00	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1	ART	0	Mon	1	00:30	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2	ART	0	Mon	2	01:00	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3	ART	0	Mon	3	01:30	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4	ART	0	Mon	4	02:00	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1675	OTH	6	Sun	43	21:30	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1676	OTH	6	Sun	44	22:00	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1677	OTH	6	Sun	45	22:30	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1678	OTH	6	Sun	46	23:00	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1679	OTH	6	Sun	47	23:30	13.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

1680 rows × 17 columns

help(hm.get_summary_df)

Help on function get_summary_df in module hillmaker.hills:

get_summary_df(hills: dict, flow_metric: str = 'occupancy', by_category: bool = True, stationary: bool = False)
    Get summary dataframe
    
    Parameters
    ----------
    hills : dict
        Created by `make_hills`
    flow_metric : str
        Either of 'arrivals', 'departures', 'occupancy' ('a', 'd', and 'o' are sufficient).
        Default='occupancy'
    by_category : bool
        Default=True corresponds to category specific statistics. A value of False gives overall statistics.
    stationary : bool
        Default=False corresponds to the standard nonstationary statistics (i.e. by TOD and DOW)
    
    Returns
    -------
    DataFrame

hm.get_summary_df(hills, flow_metric='arrivals', by_category=False)

	day_of_week	dow_name	bin_of_day	bin_of_day_str	count	mean	min	max	stdev	sem	var	cv	skew	kurt	p85	p95
0	0	Mon	0	00:00	13.0	0.000000	0.0	0.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.0	0.0
1	0	Mon	1	00:30	13.0	0.076923	0.0	1.0	0.277350	0.076923	0.076923	3.605551	3.605551	13.000000	0.0	0.4
2	0	Mon	2	01:00	13.0	0.076923	0.0	1.0	0.277350	0.076923	0.076923	3.605551	3.605551	13.000000	0.0	0.4
3	0	Mon	3	01:30	13.0	0.076923	0.0	1.0	0.277350	0.076923	0.076923	3.605551	3.605551	13.000000	0.0	0.4
4	0	Mon	4	02:00	13.0	0.000000	0.0	0.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
331	6	Sun	43	21:30	13.0	0.230769	0.0	1.0	0.438529	0.121626	0.192308	1.900292	1.451132	0.094545	1.0	1.0
332	6	Sun	44	22:00	13.0	0.076923	0.0	1.0	0.277350	0.076923	0.076923	3.605551	3.605551	13.000000	0.0	0.4
333	6	Sun	45	22:30	13.0	0.230769	0.0	1.0	0.438529	0.121626	0.192308	1.900292	1.451132	0.094545	1.0	1.0
334	6	Sun	46	23:00	13.0	0.307692	0.0	1.0	0.480384	0.133235	0.230769	1.561249	0.946212	-1.339394	1.0	1.0
335	6	Sun	47	23:30	13.0	0.000000	0.0	0.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.0	0.0

336 rows × 16 columns

`los_summary` - Length of stay summary#

hills['los_summary']

{'los_stats': <pandas.io.formats.style.Styler at 0x7f6b95ab83d0>,
 'los_histo': <Figure size 640x480 with 1 Axes>,
 'los_stats_bycat': <pandas.io.formats.style.Styler at 0x7f6b95679b40>,
 'los_histo_bycat': <Figure size 900x600 with 5 Axes>}

hills['los_summary']['los_stats_bycat']

	count	mean	min	max	stdev	cv	skew	p50	p75	p95	p99
PatType
ART	1939	4.0	0.3	12.9	2.0	0.5	0.9	3.7	5.2	7.8	9.9
CAT	3604	4.9	0.1	26.6	3.5	0.7	1.3	4.0	6.7	11.5	15.9
IVT	11227	2.0	0.1	8.3	1.0	0.5	1.0	1.8	2.5	3.9	5.0
MYE	2253	2.0	0.1	8.0	1.1	0.6	1.0	1.8	2.6	4.2	5.4
OTH	1224	8.0	1.4	23.0	3.2	0.4	0.8	7.6	9.8	14.3	17.0

hills['los_summary']['los_histo_bycat']

_images/8bd37b7a70cfb35e765c0dc87a92cdc18865e3c7d0222d6927c7966e36e1ed9a.png

help(hm.get_los_plot)

Help on function get_los_plot in module hillmaker.hills:

get_los_plot(hills: dict, by_category: bool = True)
    Get length of stay histogram from length of stay summary
    
    Parameters
    ----------
    hills : dict
        Created by `make_hills`
    by_category : bool
        Default=True corresponds to category specific statistics. A value of False gives overall statistics.
    
    Returns
    -------
    plot object from matplotlib

hm.get_los_plot(hills, by_category=False)

_images/e1630296114e9113bcb6e86f2709d87173a1b62b7cef0304e72925569e7ccd1b.png

hm.get_los_stats(hills, by_category=False)

	count	mean	min	max	stdev	cv	skew	p50	p75	p95	p99
all	20247	3.1	0.1	26.6	2.6	0.8	2.3	2.2	3.6	8.7	12.9

`plots` - Daily and weekly summary plots#

hills['plots'].keys()

dict_keys(['ssu_summer24_occupancy_plot_week', 'ssu_summer24_arrivals_plot_week', 'ssu_summer24_departures_plot_week', 'ssu_summer24_occupancy_plot_Mon', 'ssu_summer24_occupancy_plot_Tue', 'ssu_summer24_occupancy_plot_Wed', 'ssu_summer24_occupancy_plot_Thu', 'ssu_summer24_occupancy_plot_Fri', 'ssu_summer24_occupancy_plot_Sat', 'ssu_summer24_occupancy_plot_Sun', 'ssu_summer24_arrivals_plot_Mon', 'ssu_summer24_arrivals_plot_Tue', 'ssu_summer24_arrivals_plot_Wed', 'ssu_summer24_arrivals_plot_Thu', 'ssu_summer24_arrivals_plot_Fri', 'ssu_summer24_arrivals_plot_Sat', 'ssu_summer24_arrivals_plot_Sun', 'ssu_summer24_departures_plot_Mon', 'ssu_summer24_departures_plot_Tue', 'ssu_summer24_departures_plot_Wed', 'ssu_summer24_departures_plot_Thu', 'ssu_summer24_departures_plot_Fri', 'ssu_summer24_departures_plot_Sat', 'ssu_summer24_departures_plot_Sun'])

hm.get_plot?

Signature:
hm.get_plot(
    hills: dict,
    flow_metric: str = 'occupancy',
    day_of_week: str = 'week',
)
Docstring:
Get plot object for specified flow metric and whether full week or specified day of week.

Parameters
----------
hills : dict
    Created by `make_hills`
flow_metric : str
    Either of 'arrivals', 'departures', 'occupancy' ('a', 'd', and 'o' are sufficient).
    Default='occupancy'
day_of_week : str
    Either of 'week', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'. Default='week'

Returns
-------
plot object from matplotlib
File:      ~/Documents/projects/hillmaker/src/hillmaker/hills.py
Type:      function

hm.get_plot(hills, 'o', day_of_week='Mon')

_images/812b4fae1dacb85680be119ca77f919ad69179a3dbb7524e93dd6f6b51e9d647.png

hm.get_plot(hills, 'o', 'week')

`settings` - Scenario settings#

hills['settings']

{'scenario_name': 'ssu_summer24',
 'in_field': 'InRoomTS',
 'out_field': 'OutRoomTS',
 'start_analysis_dt': numpy.datetime64('2024-06-01T00:00:00.000000000'),
 'end_analysis_dt': numpy.datetime64('2024-06-01T00:00:00.000000000'),
 'cat_field': 'PatType',
 'occ_weight_field': None,
 'bin_size_minutes': 30,
 'los_units': 'hours',
 'edge_bins': <EdgeBinsEnum.FRACTIONAL: 1>,
 'highres_bin_size_minutes': 30}

`runtime` - Total runtime in seconds#

hills['runtime']

21.24074344

Using a config file#

Instead of passing a bunch of arguments to make_hills, you can use a TOML formatted config file. Here’s what an example config file might look like:

[scenario_data]
scenario_name = "ss_example_1_config"
data = "./data/ssu_2024.csv"

[fields]
in_field = "InRoomTS"
out_field = "OutRoomTS"
# Just remove the following line if no category field
cat_field = "PatType"

[analysis_dates]
start_analysis_dt = 2024-01-02
end_analysis_dt = 2024-03-30

[settings]
bin_size_minutes = 60
verbosity = 1
csv_export_path = './output'
plot_export_path = './output'

# Add any additional arguments here
# Strings should be surrounded in double quotes
# Floats and ints are specified in the normal way as values
# Dates are specified as shown above

# For arguments that take lists, the entries look
# just like Python lists and following the other rules above

# cats_to_exclude = ["IVT", "OTH"]
# percentiles = [0.5, 0.8, 0.9]

# For arguments that take dictionaries, do this:
# main_title_properties = {loc = 'left', fontsize = 16}
# subtitle_properties = {loc = 'left', style = 'italic'}
# legend_properties = {loc = 'best', frameon = true, facecolor = 'w'}

The sections headings, [scenario_data], [fields], and [analysis_dates] aren’t actually necessary. You could actually put all input parameters within the [settings] section. Including the other headings is just an organizational aid.

Warning

You MUST include the [settings] section header.

hills_config = hm.make_hills(config='./input/ssu_example_1_config.toml')

2024-01-16 09:53:48,904 - hillmaker.hills - INFO - Starting scenario ss_example_1_config
2024-01-16 09:53:57,130 - hillmaker.summarize - INFO - Created nonstationary summaries - ['PatType']
2024-01-16 09:53:58,808 - hillmaker.summarize - INFO - Created nonstationary summaries - []
2024-01-16 09:53:58,859 - hillmaker.summarize - INFO - Created stationary summaries - ['PatType']
2024-01-16 09:53:58,879 - hillmaker.summarize - INFO - Created stationary summaries - []
2024-01-16 09:54:00,445 - hillmaker.hills - INFO - bydatetime and summaries by datetime created (seconds): 11.5415
2024-01-16 09:54:00,553 - hillmaker.hills - INFO - By datetime exported to csv in ./output (seconds): 0.1066
2024-01-16 09:54:00,630 - hillmaker.hills - INFO - Summaries exported to csv in ./output (seconds): 0.0765
2024-01-16 09:54:01,691 - hillmaker.plotting - INFO - Full week plots created (seconds): 1.0618
2024-01-16 09:54:03,908 - hillmaker.plotting - INFO - Individual day of week plots created (seconds): 2.2022
2024-01-16 09:54:03,909 - hillmaker.hills - INFO - Total time (seconds): 14.9937

hills_config['settings']

{'scenario_name': 'ss_example_1_config',
 'in_field': 'InRoomTS',
 'out_field': 'OutRoomTS',
 'start_analysis_dt': numpy.datetime64('2024-01-02T00:00:00.000000000'),
 'end_analysis_dt': numpy.datetime64('2024-01-02T00:00:00.000000000'),
 'cat_field': 'PatType',
 'occ_weight_field': None,
 'bin_size_minutes': 60,
 'los_units': 'hours',
 'edge_bins': <EdgeBinsEnum.FRACTIONAL: 1>,
 'highres_bin_size_minutes': 60}

Using the make_hills() function

Contents

Using the make_hills() function#

Required input arguments#

scenario_name (str)#

data (str | Path | DataFrame)#

in_field (str)#

out_field (str)#

start_analysis_dt and end_analysis_dt (something convertible to a pandas Timestamp)#

Optional, but frequently used, input arguments#

cat_field (str, default=None)#

bin_size_minutes (int, default=60)#

More optional input arguments#

cats_to_exclude (list, default=None)#

occ_weight_field (str, default=None)#

percentiles (tuple or list, default=(0.25, 0.5, 0.75, 0.95, 0.99))#

los_units (str, default=’hours’)#

export_bydatetime_csv and export_summaries_csv (bool, default=True)#

csv_export_path (str or Path, default=current directory)#

make_all_dow_plots and make_all_week_plots (bool, default=True)#

plot_export_path (str or Path, default=current directory)#

Optional plot related input arguments#

plot_style (str, default=’ggplot’)#

figsize (tuple, default=(15, 10))#

bar_color_mean (str, default=’steelblue’)#

plot_percentiles (tuple or list of float, default=(0.95, 0.75))#

pctile_color (list or tuple of color codes (e.g. [‘blue’, ‘green’] or list(‘gb’))#

pctile_linestyle (list or tuple of line styles (e.g. [‘-’, ‘–‘]), default is (‘-’, ‘–‘)#

pctile_linewidth (list or tuple of line widths (e.g. [1.0, 0.75]), default is (0.75, 0.75)#

cap (int, default=None)#

cap_color (str, default=’r’)#

xlabel and ylabel (str, default=’Hour’ and ‘Volume’)#

main_title (str, default=’{Occupancy or Arrivals or Departures} by time of day and day of week’)#

main_title_properties (Dict} default={‘loc’: ‘left’, ‘fontsize’: 16})#

subtitle (str, default=’Scenario: {scenario_name}’)#

subtitle_properties (Dict} default={‘loc’: ‘left’, ‘style’: ‘italic’})#

legend_properties (Dict} default={{‘loc’: ‘best’, ‘frameon’: True, ‘facecolor’: ‘w’})#

first_dow (str, default=’mon’)#