How is occupancy computed?#

At a very high level, this is what hillmaker does to compute occupancy for a table of stop records named data with a category field named category. Details on each of these steps are included below.

  1. Compute the bydatetime table that contains the number of entities that have arrived, departed and are present in each datetime bin of the analysis range for each unique category value.

  2. Use pandas to compute aggregate summary statistics based on the bydatetime table.

  3. Create plots and summary tables

Compute the bydatetime table#

In order to compute aggregate summary statistics for occupancy by time of day and day of week, we first need to determine how many patients (entities, in general) arrived, departed, and are present during each time bin of each date over the analysis range. In our Short Stay Unit (SSU) example, we have been using an analysis date range of 2024-01-02 through 2024-09-30. Let’s use hourly time bins. This leads to 6552 hourly datetime bins as shown below. Note that we are using a 24-hour clock.

import pandas as pd
bydatetime_df = pd.read_csv('output/cli_demo_ssu_60_bydatetime_PatType_datetime.csv')
bydatetime_df['datetime']
0        2024-01-02 00:00:00
1        2024-01-02 01:00:00
2        2024-01-02 02:00:00
3        2024-01-02 03:00:00
4        2024-01-02 04:00:00
                ...         
32755    2024-09-30 19:00:00
32756    2024-09-30 20:00:00
32757    2024-09-30 21:00:00
32758    2024-09-30 22:00:00
32759    2024-09-30 23:00:00
Name: datetime, Length: 32760, dtype: object

Assume the very first patient arrives at 06:15 on 2024-01-02 and departs at 09:36 the same day. For the time bins starting at 07:00 and 08:00, the patient is in the unit for the entire time bin. However, for the arrival bin, 06:00, the patient is only present for 45 minutes. Similarly, for the departure bin, the patient is in the unit for 36 minutes.

Time bin

Minutes occupied

Occupancy contribution

2024-01-02 06:00

45

\(45/60=0.75\)

2024-01-02 07:00

60

\(60/60=1.00\)

2024-01-02 08:00

60

\(60/60=1.00\)

2024-01-02 09:00

36

\(36/60=0.60\)

During the hillmaking process, occupancy contributions by datetime bin for each patient are computed and are accumulated in NumPy arrays. Similarly, the number of arriving and departing entities in each datetime bin are also counted and stored in separate arrays. Eventually these arrays are converted to a pandas DataFrame that we refer to as the bydatetime table. Here’s a few records from a typical example - the PatType field is the category.

bydatetime_df.head(10)
PatType datetime arrivals departures occupancy dow_name bin_of_day_str day_of_week bin_of_day bin_of_week
0 ART 2024-01-02 00:00:00 0.0 0.0 0.000000 Tue 00:00 1 0 24
1 ART 2024-01-02 01:00:00 0.0 0.0 0.000000 Tue 01:00 1 1 25
2 ART 2024-01-02 02:00:00 0.0 0.0 0.000000 Tue 02:00 1 2 26
3 ART 2024-01-02 03:00:00 0.0 0.0 0.000000 Tue 03:00 1 3 27
4 ART 2024-01-02 04:00:00 0.0 0.0 0.000000 Tue 04:00 1 4 28
5 ART 2024-01-02 05:00:00 0.0 0.0 0.000000 Tue 05:00 1 5 29
6 ART 2024-01-02 06:00:00 2.0 0.0 0.500000 Tue 06:00 1 6 30
7 ART 2024-01-02 07:00:00 6.0 0.0 4.400000 Tue 07:00 1 7 31
8 ART 2024-01-02 08:00:00 5.0 1.0 10.583333 Tue 08:00 1 8 32
9 ART 2024-01-02 09:00:00 5.0 1.0 14.300000 Tue 09:00 1 9 33

What about the boundaries of the analysis date range?#

As long as the records are in the stops dataframe, hillmaker will account for patients who might have arrived before but discharged after the start date. In our SSU example, the start date was 2024-01-02 because we wanted to ignore the impact of the January 1 holiday. However, records from 2024-01-01 are in the stops dataframe.

ssu_stopdata = 'https://raw.githubusercontent.com/misken/hillmaker-examples/main/data/ssu_2024.csv'
# ssu_stopdata = './data/ssu_2024.csv'
stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df[stops_df['InRoomTS'] < pd.Timestamp('2024-01-02')]
PatID InRoomTS OutRoomTS PatType LOS_hours
0 1 2024-01-01 07:44:00 2024-01-01 09:20:00 IVT 1.600000
1 2 2024-01-01 08:28:00 2024-01-01 11:13:00 IVT 2.750000
2 3 2024-01-01 11:44:00 2024-01-01 12:48:00 MYE 1.066667
3 4 2024-01-01 11:51:00 2024-01-01 21:10:00 CAT 9.316667
4 5 2024-01-01 12:10:00 2024-01-01 12:57:00 IVT 0.783333
5 6 2024-01-01 14:16:00 2024-01-01 17:35:00 IVT 3.316667
6 7 2024-01-01 14:40:00 2024-01-01 17:24:00 IVT 2.733333
7 8 2024-01-01 17:25:00 2024-01-02 01:53:00 CAT 8.466667

Notice that the last patient who arrived on 2024-01-01 wasn’t discharged until 2024-01-02 01:53. If we look at the bydatetime table, we can see the occupancy contributions of this patient.

bydatetime_total_df = pd.read_csv('output/cli_demo_ssu_60_bydatetime_datetime.csv')
bydatetime_total_df.head(3)
datetime arrivals departures occupancy dow_name bin_of_day_str day_of_week bin_of_day bin_of_week
0 2024-01-02 00:00:00 0.0 0.0 1.000000 Tue 00:00 1 0 24
1 2024-01-02 01:00:00 0.0 1.0 0.883333 Tue 01:00 1 1 25
2 2024-01-02 02:00:00 0.0 0.0 0.000000 Tue 02:00 1 2 26

We see that:

  • there is one patient in the system from 12a-1a. This patient arrived on 2024-01-01 and had not yet been discharged as of midnight on 2024-01-02.

  • between 1a-2a (at 01:53), this patient was discharged. The occupancy value of 0.883333 for the 1a-2a time bin means that the patient was discharged after spending approximately \(88\%\) (\(53/60\) minutes) of the 1a-2a period in the SSU.

Similarly, those patients who arrive during the analysis date range but are discharged after the end date, are included by hillmaker for the time spent in the system duing the analysis date range.

Using numpy to speed up the computations#

In early Python versions of hillmaker, the computations described above were done by directly incrementing specific cells of a pandas DataFrame - the approach is described in this, now outdated, blog post. Unfortunately, updating specific cells in a pandas DataFrame is really slow.

The current version of hillmaker does all of the occupancy incrementing using vectorized operations on numpy arrays and is much faster (~5-10x). For each stop record, an occupancy incrementor array is computed. For the example in the “Occupancy contribution for first patient” table above, the array would be [0.75 1.0 1.0 0.6]. The incrementor arrays are then added, at the appropriate spot, to an overall occupancy array, where each array element corresponds to a datetime bin. Adding numpy arrays is very fast. See datetime.py for all the details - it’s pretty well commented.

Compute the arrival, departure and occupancy summaries#

Once the bydatetime table is computed, summary statistics can then be computed using pandas grouping and aggregate functions such as mean and quantile. Here’s a snippet.

occ_PatType_dow_binofday_df = pd.read_csv('output/cli_demo_ssu_60_occupancy_PatType_dow_binofday.csv')
occ_PatType_dow_binofday_df.iloc[50:75]
PatType day_of_week dow_name bin_of_day bin_of_day_str count mean min max stdev sem var cv skew kurt p25 p50 p75 p95 p99
50 ART 2 Wed 2 02:00 39.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
51 ART 2 Wed 3 03:00 39.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
52 ART 2 Wed 4 04:00 39.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
53 ART 2 Wed 5 05:00 39.0 0.358120 0.000000 1.333333 0.377373 0.060428 0.142410 1.053762 1.100851 0.604813 0.008333 0.250000 0.575000 1.180000 1.320667
54 ART 2 Wed 6 06:00 39.0 3.672222 1.683333 5.716667 1.031559 0.165182 1.064113 0.280909 0.059364 -0.694543 3.008333 3.516667 4.408333 5.183333 5.628000
55 ART 2 Wed 7 07:00 39.0 7.820940 4.966667 10.633333 1.385444 0.221849 1.919455 0.177145 0.012103 -0.394016 6.900000 7.600000 8.875000 9.788333 10.614333
56 ART 2 Wed 8 08:00 39.0 10.516239 6.466667 15.383333 2.184222 0.349755 4.770826 0.207700 0.452734 -0.204397 9.291667 10.083333 11.633333 14.235000 15.237667
57 ART 2 Wed 9 09:00 39.0 12.370940 6.833333 20.316667 2.871884 0.459869 8.247715 0.232148 0.561270 0.441556 10.591667 12.000000 14.108333 17.330000 19.341333
58 ART 2 Wed 10 10:00 39.0 13.094017 8.250000 22.400000 2.954995 0.473178 8.731995 0.225675 0.862908 1.336721 11.350000 12.716667 15.008333 17.261667 21.114333
59 ART 2 Wed 11 11:00 39.0 13.336752 7.683333 21.333333 2.775359 0.444413 7.702620 0.208099 0.999808 1.666926 11.916667 12.766667 14.266667 19.376667 20.909000
60 ART 2 Wed 12 12:00 39.0 13.040598 8.316667 19.433333 2.560224 0.409964 6.554748 0.196327 0.492908 0.419179 11.383333 12.733333 14.683333 17.041667 19.408000
61 ART 2 Wed 13 13:00 39.0 11.941026 8.716667 17.983333 2.236499 0.358126 5.001928 0.187295 0.692769 0.055469 10.158333 11.633333 13.158333 15.430000 17.400667
62 ART 2 Wed 14 14:00 39.0 9.993162 4.300000 16.133333 2.498238 0.400038 6.241195 0.249995 -0.040163 0.074037 8.591667 9.883333 11.683333 13.605000 15.246667
63 ART 2 Wed 15 15:00 39.0 7.287607 2.000000 13.616667 2.672925 0.428011 7.144528 0.366777 0.182371 0.306827 5.808333 7.000000 9.008333 12.061667 13.464667
64 ART 2 Wed 16 16:00 39.0 5.062821 1.000000 10.316667 2.303012 0.368777 5.303866 0.454887 0.262786 -0.271186 3.258333 5.000000 6.708333 8.330000 10.291333
65 ART 2 Wed 17 17:00 39.0 3.279915 0.350000 7.950000 1.934457 0.309761 3.742123 0.589789 0.603736 -0.200242 1.975000 3.000000 4.616667 6.926667 7.766333
66 ART 2 Wed 18 18:00 39.0 1.961538 0.000000 5.833333 1.525045 0.244203 2.325762 0.777474 0.713760 -0.158408 0.875000 1.833333 2.975000 4.651667 5.561000
67 ART 2 Wed 19 19:00 39.0 1.068803 0.000000 4.000000 1.084218 0.173614 1.175529 1.014422 0.929103 0.299327 0.000000 1.000000 1.641667 3.270000 3.734000
68 ART 2 Wed 20 20:00 39.0 0.683333 0.000000 3.716667 0.950854 0.152258 0.904123 1.391493 1.525316 1.818134 0.000000 0.033333 1.000000 2.433333 3.400000
69 ART 2 Wed 21 21:00 39.0 0.367949 0.000000 2.216667 0.681892 0.109190 0.464976 1.853225 1.752237 1.752261 0.000000 0.000000 0.291667 2.000000 2.134333
70 ART 2 Wed 22 22:00 39.0 0.234188 0.000000 2.000000 0.550584 0.088164 0.303143 2.351033 2.395984 4.757798 0.000000 0.000000 0.000000 1.625000 2.000000
71 ART 2 Wed 23 23:00 39.0 0.188889 0.000000 2.000000 0.505819 0.080996 0.255853 2.677865 2.836720 7.494658 0.000000 0.000000 0.000000 1.100000 2.000000
72 ART 3 Thu 0 00:00 39.0 0.095299 0.000000 2.000000 0.360655 0.057751 0.130072 3.784456 4.546027 21.998469 0.000000 0.000000 0.000000 0.550000 1.620000
73 ART 3 Thu 1 01:00 39.0 0.048718 0.000000 1.500000 0.246952 0.039544 0.060985 5.069009 5.701831 33.637380 0.000000 0.000000 0.000000 0.040000 1.082000
74 ART 3 Thu 2 02:00 39.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

Using different bin sizes for bydatetime and summary dataframes#

By default, whatever you specify for the bin_size_minutes parameter (default is 60 minutes) is the resolution at which the bydatetime table is created. However, if for some reason you want to create (and save) a version of the bydatetime table with smaller time bin sizes, you can do it.

There is a highres_bin_size_minutes parameter that you can set to a smaller value than bin_size_minutes if you would like to compute occupancy in the bydatetime table at a finer resolution but still want to report aggregate statistics using bin_size_minutes. For example, you could set highres_bin_size_minutes=10 but keep bin_size_minutes=60. Using the default settings in hillmaker (see next section), this will NOT affect the aggregate statistics. However, it allows you to create a separate version of the bydatetime table at this higher resolution for further analysis. In order to save the high resolution version, set keep_highres_bydatetime=True.

The edge_bins parameter and its impact on occupancy calculations#

Since hillmaker’s earliest days, it has always been possible to treat the arrival and departure bins in two different ways. The default behavior, edge_bins=1, uses the method described in this notebook in which a fractional occupancy contribution is computed based on the fraction of time the entity was in the system during the arrival and departure bins. However, if you really want to give the “full credit” for occupancy during the arrival and departure bins (i.e. use a value of \(1.0\) instead of the fraction of the bin occupied, you can set edge_bins=2.

Warning

Using edge_bins=2 with coarse time bins and short lengths of stay can lead to dramatic overestimates of occupancy.

If you do use edge_bins=2 for some reason, you should consider setting the highres_bin_size_minutes to a small value to mitigate overestimating occupancy. Quite honestly, we’ve really only kept this option around for research purposes.