How is occupancy computed?#
At a very high level, this is what hillmaker does to compute occupancy for a table of stop records named data
with a category field named category
. Details on each of these steps are included below.
Compute the
bydatetime
table that contains the number of entities that have arrived, departed and are present in each datetime bin of the analysis range for each uniquecategory
value.Use pandas to compute aggregate summary statistics based on the
bydatetime
table.Create plots and summary tables
Compute the bydatetime
table#
In order to compute aggregate summary statistics for occupancy by time of day and day of week, we first need to determine how many patients (entities, in general) arrived, departed, and are present during each time bin of each date over the analysis range. In our Short Stay Unit (SSU) example, we have been using an analysis date range of 2024-01-02 through 2024-09-30. Let’s use hourly time bins. This leads to 6552 hourly datetime bins as shown below. Note that we are using a 24-hour clock.
import pandas as pd
bydatetime_df = pd.read_csv('output/cli_demo_ssu_60_bydatetime_PatType_datetime.csv')
bydatetime_df['datetime']
0 2024-01-02 00:00:00
1 2024-01-02 01:00:00
2 2024-01-02 02:00:00
3 2024-01-02 03:00:00
4 2024-01-02 04:00:00
...
32755 2024-09-30 19:00:00
32756 2024-09-30 20:00:00
32757 2024-09-30 21:00:00
32758 2024-09-30 22:00:00
32759 2024-09-30 23:00:00
Name: datetime, Length: 32760, dtype: object
Assume the very first patient arrives at 06:15 on 2024-01-02 and departs at 09:36 the same day. For the time bins starting at 07:00 and 08:00, the patient is in the unit for the entire time bin. However, for the arrival bin, 06:00, the patient is only present for 45 minutes. Similarly, for the departure bin, the patient is in the unit for 36 minutes.
Time bin |
Minutes occupied |
Occupancy contribution |
---|---|---|
2024-01-02 06:00 |
45 |
\(45/60=0.75\) |
2024-01-02 07:00 |
60 |
\(60/60=1.00\) |
2024-01-02 08:00 |
60 |
\(60/60=1.00\) |
2024-01-02 09:00 |
36 |
\(36/60=0.60\) |
During the hillmaking process, occupancy contributions by datetime bin for each patient are computed and are accumulated in NumPy arrays. Similarly, the number of arriving and departing entities in each datetime bin are also counted and stored in separate arrays. Eventually these arrays are converted to a pandas DataFrame
that we refer to as the bydatetime table. Here’s a few records from a typical example - the PatType
field is the category.
bydatetime_df.head(10)
PatType | datetime | arrivals | departures | occupancy | dow_name | bin_of_day_str | day_of_week | bin_of_day | bin_of_week | |
---|---|---|---|---|---|---|---|---|---|---|
0 | ART | 2024-01-02 00:00:00 | 0.0 | 0.0 | 0.000000 | Tue | 00:00 | 1 | 0 | 24 |
1 | ART | 2024-01-02 01:00:00 | 0.0 | 0.0 | 0.000000 | Tue | 01:00 | 1 | 1 | 25 |
2 | ART | 2024-01-02 02:00:00 | 0.0 | 0.0 | 0.000000 | Tue | 02:00 | 1 | 2 | 26 |
3 | ART | 2024-01-02 03:00:00 | 0.0 | 0.0 | 0.000000 | Tue | 03:00 | 1 | 3 | 27 |
4 | ART | 2024-01-02 04:00:00 | 0.0 | 0.0 | 0.000000 | Tue | 04:00 | 1 | 4 | 28 |
5 | ART | 2024-01-02 05:00:00 | 0.0 | 0.0 | 0.000000 | Tue | 05:00 | 1 | 5 | 29 |
6 | ART | 2024-01-02 06:00:00 | 2.0 | 0.0 | 0.500000 | Tue | 06:00 | 1 | 6 | 30 |
7 | ART | 2024-01-02 07:00:00 | 6.0 | 0.0 | 4.400000 | Tue | 07:00 | 1 | 7 | 31 |
8 | ART | 2024-01-02 08:00:00 | 5.0 | 1.0 | 10.583333 | Tue | 08:00 | 1 | 8 | 32 |
9 | ART | 2024-01-02 09:00:00 | 5.0 | 1.0 | 14.300000 | Tue | 09:00 | 1 | 9 | 33 |
What about the boundaries of the analysis date range?#
As long as the records are in the stops dataframe, hillmaker will account for patients who might have arrived before but discharged after the start date. In our SSU example, the start date was 2024-01-02 because we wanted to ignore the impact of the January 1 holiday. However, records from 2024-01-01 are in the stops dataframe.
ssu_stopdata = 'https://raw.githubusercontent.com/misken/hillmaker-examples/main/data/ssu_2024.csv'
# ssu_stopdata = './data/ssu_2024.csv'
stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df[stops_df['InRoomTS'] < pd.Timestamp('2024-01-02')]
PatID | InRoomTS | OutRoomTS | PatType | LOS_hours | |
---|---|---|---|---|---|
0 | 1 | 2024-01-01 07:44:00 | 2024-01-01 09:20:00 | IVT | 1.600000 |
1 | 2 | 2024-01-01 08:28:00 | 2024-01-01 11:13:00 | IVT | 2.750000 |
2 | 3 | 2024-01-01 11:44:00 | 2024-01-01 12:48:00 | MYE | 1.066667 |
3 | 4 | 2024-01-01 11:51:00 | 2024-01-01 21:10:00 | CAT | 9.316667 |
4 | 5 | 2024-01-01 12:10:00 | 2024-01-01 12:57:00 | IVT | 0.783333 |
5 | 6 | 2024-01-01 14:16:00 | 2024-01-01 17:35:00 | IVT | 3.316667 |
6 | 7 | 2024-01-01 14:40:00 | 2024-01-01 17:24:00 | IVT | 2.733333 |
7 | 8 | 2024-01-01 17:25:00 | 2024-01-02 01:53:00 | CAT | 8.466667 |
Notice that the last patient who arrived on 2024-01-01 wasn’t discharged until 2024-01-02 01:53. If we look at the bydatetime table, we can see the occupancy contributions of this patient.
bydatetime_total_df = pd.read_csv('output/cli_demo_ssu_60_bydatetime_datetime.csv')
bydatetime_total_df.head(3)
datetime | arrivals | departures | occupancy | dow_name | bin_of_day_str | day_of_week | bin_of_day | bin_of_week | |
---|---|---|---|---|---|---|---|---|---|
0 | 2024-01-02 00:00:00 | 0.0 | 0.0 | 1.000000 | Tue | 00:00 | 1 | 0 | 24 |
1 | 2024-01-02 01:00:00 | 0.0 | 1.0 | 0.883333 | Tue | 01:00 | 1 | 1 | 25 |
2 | 2024-01-02 02:00:00 | 0.0 | 0.0 | 0.000000 | Tue | 02:00 | 1 | 2 | 26 |
We see that:
there is one patient in the system from 12a-1a. This patient arrived on 2024-01-01 and had not yet been discharged as of midnight on 2024-01-02.
between 1a-2a (at 01:53), this patient was discharged. The occupancy value of 0.883333 for the 1a-2a time bin means that the patient was discharged after spending approximately \(88\%\) (\(53/60\) minutes) of the 1a-2a period in the SSU.
Similarly, those patients who arrive during the analysis date range but are discharged after the end date, are included by hillmaker for the time spent in the system duing the analysis date range.
Using numpy to speed up the computations#
In early Python versions of hillmaker, the computations described above were done by directly incrementing specific cells of a pandas DataFrame
- the approach is described in this, now outdated, blog post. Unfortunately, updating specific cells in a pandas DataFrame
is really slow.
The current version of hillmaker does all of the occupancy incrementing using vectorized operations on numpy arrays and is much faster (~5-10x). For each stop record, an occupancy incrementor array is computed. For the example in the “Occupancy contribution for first patient” table above, the array would be [0.75 1.0 1.0 0.6]
. The incrementor arrays are then added, at the appropriate spot, to an overall occupancy array, where each array element corresponds to a datetime bin. Adding numpy arrays is very fast. See datetime.py for all the details - it’s pretty well commented.
Compute the arrival, departure and occupancy summaries#
Once the bydatetime
table is computed, summary statistics can then be computed using pandas grouping and aggregate functions such as mean
and quantile
. Here’s a snippet.
occ_PatType_dow_binofday_df = pd.read_csv('output/cli_demo_ssu_60_occupancy_PatType_dow_binofday.csv')
occ_PatType_dow_binofday_df.iloc[50:75]
PatType | day_of_week | dow_name | bin_of_day | bin_of_day_str | count | mean | min | max | stdev | sem | var | cv | skew | kurt | p25 | p50 | p75 | p95 | p99 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
50 | ART | 2 | Wed | 2 | 02:00 | 39.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
51 | ART | 2 | Wed | 3 | 03:00 | 39.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
52 | ART | 2 | Wed | 4 | 04:00 | 39.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
53 | ART | 2 | Wed | 5 | 05:00 | 39.0 | 0.358120 | 0.000000 | 1.333333 | 0.377373 | 0.060428 | 0.142410 | 1.053762 | 1.100851 | 0.604813 | 0.008333 | 0.250000 | 0.575000 | 1.180000 | 1.320667 |
54 | ART | 2 | Wed | 6 | 06:00 | 39.0 | 3.672222 | 1.683333 | 5.716667 | 1.031559 | 0.165182 | 1.064113 | 0.280909 | 0.059364 | -0.694543 | 3.008333 | 3.516667 | 4.408333 | 5.183333 | 5.628000 |
55 | ART | 2 | Wed | 7 | 07:00 | 39.0 | 7.820940 | 4.966667 | 10.633333 | 1.385444 | 0.221849 | 1.919455 | 0.177145 | 0.012103 | -0.394016 | 6.900000 | 7.600000 | 8.875000 | 9.788333 | 10.614333 |
56 | ART | 2 | Wed | 8 | 08:00 | 39.0 | 10.516239 | 6.466667 | 15.383333 | 2.184222 | 0.349755 | 4.770826 | 0.207700 | 0.452734 | -0.204397 | 9.291667 | 10.083333 | 11.633333 | 14.235000 | 15.237667 |
57 | ART | 2 | Wed | 9 | 09:00 | 39.0 | 12.370940 | 6.833333 | 20.316667 | 2.871884 | 0.459869 | 8.247715 | 0.232148 | 0.561270 | 0.441556 | 10.591667 | 12.000000 | 14.108333 | 17.330000 | 19.341333 |
58 | ART | 2 | Wed | 10 | 10:00 | 39.0 | 13.094017 | 8.250000 | 22.400000 | 2.954995 | 0.473178 | 8.731995 | 0.225675 | 0.862908 | 1.336721 | 11.350000 | 12.716667 | 15.008333 | 17.261667 | 21.114333 |
59 | ART | 2 | Wed | 11 | 11:00 | 39.0 | 13.336752 | 7.683333 | 21.333333 | 2.775359 | 0.444413 | 7.702620 | 0.208099 | 0.999808 | 1.666926 | 11.916667 | 12.766667 | 14.266667 | 19.376667 | 20.909000 |
60 | ART | 2 | Wed | 12 | 12:00 | 39.0 | 13.040598 | 8.316667 | 19.433333 | 2.560224 | 0.409964 | 6.554748 | 0.196327 | 0.492908 | 0.419179 | 11.383333 | 12.733333 | 14.683333 | 17.041667 | 19.408000 |
61 | ART | 2 | Wed | 13 | 13:00 | 39.0 | 11.941026 | 8.716667 | 17.983333 | 2.236499 | 0.358126 | 5.001928 | 0.187295 | 0.692769 | 0.055469 | 10.158333 | 11.633333 | 13.158333 | 15.430000 | 17.400667 |
62 | ART | 2 | Wed | 14 | 14:00 | 39.0 | 9.993162 | 4.300000 | 16.133333 | 2.498238 | 0.400038 | 6.241195 | 0.249995 | -0.040163 | 0.074037 | 8.591667 | 9.883333 | 11.683333 | 13.605000 | 15.246667 |
63 | ART | 2 | Wed | 15 | 15:00 | 39.0 | 7.287607 | 2.000000 | 13.616667 | 2.672925 | 0.428011 | 7.144528 | 0.366777 | 0.182371 | 0.306827 | 5.808333 | 7.000000 | 9.008333 | 12.061667 | 13.464667 |
64 | ART | 2 | Wed | 16 | 16:00 | 39.0 | 5.062821 | 1.000000 | 10.316667 | 2.303012 | 0.368777 | 5.303866 | 0.454887 | 0.262786 | -0.271186 | 3.258333 | 5.000000 | 6.708333 | 8.330000 | 10.291333 |
65 | ART | 2 | Wed | 17 | 17:00 | 39.0 | 3.279915 | 0.350000 | 7.950000 | 1.934457 | 0.309761 | 3.742123 | 0.589789 | 0.603736 | -0.200242 | 1.975000 | 3.000000 | 4.616667 | 6.926667 | 7.766333 |
66 | ART | 2 | Wed | 18 | 18:00 | 39.0 | 1.961538 | 0.000000 | 5.833333 | 1.525045 | 0.244203 | 2.325762 | 0.777474 | 0.713760 | -0.158408 | 0.875000 | 1.833333 | 2.975000 | 4.651667 | 5.561000 |
67 | ART | 2 | Wed | 19 | 19:00 | 39.0 | 1.068803 | 0.000000 | 4.000000 | 1.084218 | 0.173614 | 1.175529 | 1.014422 | 0.929103 | 0.299327 | 0.000000 | 1.000000 | 1.641667 | 3.270000 | 3.734000 |
68 | ART | 2 | Wed | 20 | 20:00 | 39.0 | 0.683333 | 0.000000 | 3.716667 | 0.950854 | 0.152258 | 0.904123 | 1.391493 | 1.525316 | 1.818134 | 0.000000 | 0.033333 | 1.000000 | 2.433333 | 3.400000 |
69 | ART | 2 | Wed | 21 | 21:00 | 39.0 | 0.367949 | 0.000000 | 2.216667 | 0.681892 | 0.109190 | 0.464976 | 1.853225 | 1.752237 | 1.752261 | 0.000000 | 0.000000 | 0.291667 | 2.000000 | 2.134333 |
70 | ART | 2 | Wed | 22 | 22:00 | 39.0 | 0.234188 | 0.000000 | 2.000000 | 0.550584 | 0.088164 | 0.303143 | 2.351033 | 2.395984 | 4.757798 | 0.000000 | 0.000000 | 0.000000 | 1.625000 | 2.000000 |
71 | ART | 2 | Wed | 23 | 23:00 | 39.0 | 0.188889 | 0.000000 | 2.000000 | 0.505819 | 0.080996 | 0.255853 | 2.677865 | 2.836720 | 7.494658 | 0.000000 | 0.000000 | 0.000000 | 1.100000 | 2.000000 |
72 | ART | 3 | Thu | 0 | 00:00 | 39.0 | 0.095299 | 0.000000 | 2.000000 | 0.360655 | 0.057751 | 0.130072 | 3.784456 | 4.546027 | 21.998469 | 0.000000 | 0.000000 | 0.000000 | 0.550000 | 1.620000 |
73 | ART | 3 | Thu | 1 | 01:00 | 39.0 | 0.048718 | 0.000000 | 1.500000 | 0.246952 | 0.039544 | 0.060985 | 5.069009 | 5.701831 | 33.637380 | 0.000000 | 0.000000 | 0.000000 | 0.040000 | 1.082000 |
74 | ART | 3 | Thu | 2 | 02:00 | 39.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Using different bin sizes for bydatetime and summary dataframes#
By default, whatever you specify for the bin_size_minutes
parameter (default is 60 minutes) is the resolution at which the bydatetime
table is created. However, if for some reason you want to create (and save) a version of the bydatetime
table with smaller time bin sizes, you can do it.
There is a highres_bin_size_minutes
parameter that you can set to a smaller value than bin_size_minutes
if you would like to compute occupancy in the bydatetime table at a finer resolution but still want to report aggregate statistics using bin_size_minutes
. For example, you could set highres_bin_size_minutes=10
but keep bin_size_minutes=60
. Using the default settings in hillmaker (see next section), this will NOT affect the aggregate statistics. However, it allows you to create a separate version of the bydatetime table at this higher resolution for further analysis. In order to save the high resolution version, set keep_highres_bydatetime=True
.
The edge_bins
parameter and its impact on occupancy calculations#
Since hillmaker’s earliest days, it has always been possible to treat the arrival and departure bins in two different ways. The default behavior, edge_bins=1
, uses the method described in this notebook in which a fractional occupancy contribution is computed based on the fraction of time the entity was in the system during the arrival and departure bins. However, if you really want to give the “full credit” for occupancy during the arrival and departure bins (i.e. use a value of \(1.0\) instead of the fraction of the bin occupied, you can set edge_bins=2
.
Warning
Using edge_bins=2
with coarse time bins and short lengths of stay can lead to dramatic overestimates of occupancy.
If you do use edge_bins=2
for some reason, you should consider setting the highres_bin_size_minutes
to a small value to mitigate overestimating occupancy. Quite honestly, we’ve really only kept this option around for research purposes.