How is occupancy computed?

How is occupancy computed?#

At a very high level, this is what hillmaker does to compute occupancy for a table of stop records named data with a category field named category. Details on each of these steps are included below.

Compute the bydatetime table that contains the number of entities that have arrived, departed and are present in each datetime bin of the analysis range for each unique category value.
Use pandas to compute aggregate summary statistics based on the bydatetime table.
Create plots and summary tables

Compute the `bydatetime` table#

In order to compute aggregate summary statistics for occupancy by time of day and day of week, we first need to determine how many patients (entities, in general) arrived, departed, and are present during each time bin of each date over the analysis range. In our Short Stay Unit (SSU) example, we have been using an analysis date range of 2024-01-02 through 2024-09-30. Let’s use hourly time bins. This leads to 6552 hourly datetime bins as shown below. Note that we are using a 24-hour clock.

import pandas as pd

bydatetime_df = pd.read_csv('output/cli_demo_ssu_60_bydatetime_PatType_datetime.csv')
bydatetime_df['datetime']

      2024-01-02 00:00:00
      2024-01-02 01:00:00
      2024-01-02 02:00:00
      2024-01-02 03:00:00
      2024-01-02 04:00:00
                ...         
  2024-09-30 19:00:00
  2024-09-30 20:00:00
  2024-09-30 21:00:00
  2024-09-30 22:00:00
  2024-09-30 23:00:00
Name: datetime, Length: 32760, dtype: object

Assume the very first patient arrives at 06:15 on 2024-01-02 and departs at 09:36 the same day. For the time bins starting at 07:00 and 08:00, the patient is in the unit for the entire time bin. However, for the arrival bin, 06:00, the patient is only present for 45 minutes. Similarly, for the departure bin, the patient is in the unit for 36 minutes.

Time bin	Minutes occupied	Occupancy contribution
2024-01-02 06:00	45	\(45/60=0.75\)
2024-01-02 07:00	60	\(60/60=1.00\)
2024-01-02 08:00	60	\(60/60=1.00\)
2024-01-02 09:00	36	\(36/60=0.60\)

During the hillmaking process, occupancy contributions by datetime bin for each patient are computed and are accumulated in NumPy arrays. Similarly, the number of arriving and departing entities in each datetime bin are also counted and stored in separate arrays. Eventually these arrays are converted to a pandas DataFrame that we refer to as the bydatetime table. Here’s a few records from a typical example - the PatType field is the category.

bydatetime_df.head(10)

	PatType	datetime	arrivals	departures	occupancy	dow_name	bin_of_day_str	day_of_week	bin_of_day	bin_of_week
0	ART	2024-01-02 00:00:00	0.0	0.0	0.000000	Tue	00:00	1	0	24
1	ART	2024-01-02 01:00:00	0.0	0.0	0.000000	Tue	01:00	1	1	25
2	ART	2024-01-02 02:00:00	0.0	0.0	0.000000	Tue	02:00	1	2	26
3	ART	2024-01-02 03:00:00	0.0	0.0	0.000000	Tue	03:00	1	3	27
4	ART	2024-01-02 04:00:00	0.0	0.0	0.000000	Tue	04:00	1	4	28
5	ART	2024-01-02 05:00:00	0.0	0.0	0.000000	Tue	05:00	1	5	29
6	ART	2024-01-02 06:00:00	2.0	0.0	0.500000	Tue	06:00	1	6	30
7	ART	2024-01-02 07:00:00	6.0	0.0	4.400000	Tue	07:00	1	7	31
8	ART	2024-01-02 08:00:00	5.0	1.0	10.583333	Tue	08:00	1	8	32
9	ART	2024-01-02 09:00:00	5.0	1.0	14.300000	Tue	09:00	1	9	33

What about the boundaries of the analysis date range?#

As long as the records are in the stops dataframe, hillmaker will account for patients who might have arrived before but discharged after the start date. In our SSU example, the start date was 2024-01-02 because we wanted to ignore the impact of the January 1 holiday. However, records from 2024-01-01 are in the stops dataframe.

ssu_stopdata = 'https://raw.githubusercontent.com/misken/hillmaker-examples/main/data/ssu_2024.csv'
# ssu_stopdata = './data/ssu_2024.csv'
stops_df = pd.read_csv(ssu_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df[stops_df['InRoomTS'] < pd.Timestamp('2024-01-02')]

	PatID	InRoomTS	OutRoomTS	PatType	LOS_hours
0	1	2024-01-01 07:44:00	2024-01-01 09:20:00	IVT	1.600000
1	2	2024-01-01 08:28:00	2024-01-01 11:13:00	IVT	2.750000
2	3	2024-01-01 11:44:00	2024-01-01 12:48:00	MYE	1.066667
3	4	2024-01-01 11:51:00	2024-01-01 21:10:00	CAT	9.316667
4	5	2024-01-01 12:10:00	2024-01-01 12:57:00	IVT	0.783333
5	6	2024-01-01 14:16:00	2024-01-01 17:35:00	IVT	3.316667
6	7	2024-01-01 14:40:00	2024-01-01 17:24:00	IVT	2.733333
7	8	2024-01-01 17:25:00	2024-01-02 01:53:00	CAT	8.466667

Notice that the last patient who arrived on 2024-01-01 wasn’t discharged until 2024-01-02 01:53. If we look at the bydatetime table, we can see the occupancy contributions of this patient.

bydatetime_total_df = pd.read_csv('output/cli_demo_ssu_60_bydatetime_datetime.csv')
bydatetime_total_df.head(3)

	datetime	departures	occupancy	dow_name	bin_of_day_str	day_of_week	bin_of_day	bin_of_week
0	2024-01-02 00:00:00	0.0	1.000000	Tue	00:00	1	0	24
1	2024-01-02 01:00:00	1.0	0.883333	Tue	01:00	1	1	25
2	2024-01-02 02:00:00	0.0	0.000000	Tue	02:00	1	2	26

We see that:

there is one patient in the system from 12a-1a. This patient arrived on 2024-01-01 and had not yet been discharged as of midnight on 2024-01-02.
between 1a-2a (at 01:53), this patient was discharged. The occupancy value of 0.883333 for the 1a-2a time bin means that the patient was discharged after spending approximately \(88\%\) (\(53/60\) minutes) of the 1a-2a period in the SSU.

Similarly, those patients who arrive during the analysis date range but are discharged after the end date, are included by hillmaker for the time spent in the system duing the analysis date range.

Using numpy to speed up the computations#

In early Python versions of hillmaker, the computations described above were done by directly incrementing specific cells of a pandas DataFrame - the approach is described in this, now outdated, blog post. Unfortunately, updating specific cells in a pandas DataFrame is really slow.

The current version of hillmaker does all of the occupancy incrementing using vectorized operations on numpy arrays and is much faster (~5-10x). For each stop record, an occupancy incrementor array is computed. For the example in the “Occupancy contribution for first patient” table above, the array would be [0.75 1.0 1.0 0.6]. The incrementor arrays are then added, at the appropriate spot, to an overall occupancy array, where each array element corresponds to a datetime bin. Adding numpy arrays is very fast. See datetime.py for all the details - it’s pretty well commented.

Compute the arrival, departure and occupancy summaries#

Once the bydatetime table is computed, summary statistics can then be computed using pandas grouping and aggregate functions such as mean and quantile. Here’s a snippet.

occ_PatType_dow_binofday_df = pd.read_csv('output/cli_demo_ssu_60_occupancy_PatType_dow_binofday.csv')
occ_PatType_dow_binofday_df.iloc[50:75]

	PatType	day_of_week	dow_name	bin_of_day	bin_of_day_str	count	mean	min	max	stdev	sem	var	cv	skew	kurt	p25	p50	p75	p95	p99
50	ART	2	Wed	2	02:00	39.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
51	ART	2	Wed	3	03:00	39.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
52	ART	2	Wed	4	04:00	39.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
53	ART	2	Wed	5	05:00	39.0	0.358120	0.000000	1.333333	0.377373	0.060428	0.142410	1.053762	1.100851	0.604813	0.008333	0.250000	0.575000	1.180000	1.320667
54	ART	2	Wed	6	06:00	39.0	3.672222	1.683333	5.716667	1.031559	0.165182	1.064113	0.280909	0.059364	-0.694543	3.008333	3.516667	4.408333	5.183333	5.628000
55	ART	2	Wed	7	07:00	39.0	7.820940	4.966667	10.633333	1.385444	0.221849	1.919455	0.177145	0.012103	-0.394016	6.900000	7.600000	8.875000	9.788333	10.614333
56	ART	2	Wed	8	08:00	39.0	10.516239	6.466667	15.383333	2.184222	0.349755	4.770826	0.207700	0.452734	-0.204397	9.291667	10.083333	11.633333	14.235000	15.237667
57	ART	2	Wed	9	09:00	39.0	12.370940	6.833333	20.316667	2.871884	0.459869	8.247715	0.232148	0.561270	0.441556	10.591667	12.000000	14.108333	17.330000	19.341333
58	ART	2	Wed	10	10:00	39.0	13.094017	8.250000	22.400000	2.954995	0.473178	8.731995	0.225675	0.862908	1.336721	11.350000	12.716667	15.008333	17.261667	21.114333
59	ART	2	Wed	11	11:00	39.0	13.336752	7.683333	21.333333	2.775359	0.444413	7.702620	0.208099	0.999808	1.666926	11.916667	12.766667	14.266667	19.376667	20.909000
60	ART	2	Wed	12	12:00	39.0	13.040598	8.316667	19.433333	2.560224	0.409964	6.554748	0.196327	0.492908	0.419179	11.383333	12.733333	14.683333	17.041667	19.408000
61	ART	2	Wed	13	13:00	39.0	11.941026	8.716667	17.983333	2.236499	0.358126	5.001928	0.187295	0.692769	0.055469	10.158333	11.633333	13.158333	15.430000	17.400667
62	ART	2	Wed	14	14:00	39.0	9.993162	4.300000	16.133333	2.498238	0.400038	6.241195	0.249995	-0.040163	0.074037	8.591667	9.883333	11.683333	13.605000	15.246667
63	ART	2	Wed	15	15:00	39.0	7.287607	2.000000	13.616667	2.672925	0.428011	7.144528	0.366777	0.182371	0.306827	5.808333	7.000000	9.008333	12.061667	13.464667
64	ART	2	Wed	16	16:00	39.0	5.062821	1.000000	10.316667	2.303012	0.368777	5.303866	0.454887	0.262786	-0.271186	3.258333	5.000000	6.708333	8.330000	10.291333
65	ART	2	Wed	17	17:00	39.0	3.279915	0.350000	7.950000	1.934457	0.309761	3.742123	0.589789	0.603736	-0.200242	1.975000	3.000000	4.616667	6.926667	7.766333
66	ART	2	Wed	18	18:00	39.0	1.961538	0.000000	5.833333	1.525045	0.244203	2.325762	0.777474	0.713760	-0.158408	0.875000	1.833333	2.975000	4.651667	5.561000
67	ART	2	Wed	19	19:00	39.0	1.068803	0.000000	4.000000	1.084218	0.173614	1.175529	1.014422	0.929103	0.299327	0.000000	1.000000	1.641667	3.270000	3.734000
68	ART	2	Wed	20	20:00	39.0	0.683333	0.000000	3.716667	0.950854	0.152258	0.904123	1.391493	1.525316	1.818134	0.000000	0.033333	1.000000	2.433333	3.400000
69	ART	2	Wed	21	21:00	39.0	0.367949	0.000000	2.216667	0.681892	0.109190	0.464976	1.853225	1.752237	1.752261	0.000000	0.000000	0.291667	2.000000	2.134333
70	ART	2	Wed	22	22:00	39.0	0.234188	0.000000	2.000000	0.550584	0.088164	0.303143	2.351033	2.395984	4.757798	0.000000	0.000000	0.000000	1.625000	2.000000
71	ART	2	Wed	23	23:00	39.0	0.188889	0.000000	2.000000	0.505819	0.080996	0.255853	2.677865	2.836720	7.494658	0.000000	0.000000	0.000000	1.100000	2.000000
72	ART	3	Thu	0	00:00	39.0	0.095299	0.000000	2.000000	0.360655	0.057751	0.130072	3.784456	4.546027	21.998469	0.000000	0.000000	0.000000	0.550000	1.620000
73	ART	3	Thu	1	01:00	39.0	0.048718	0.000000	1.500000	0.246952	0.039544	0.060985	5.069009	5.701831	33.637380	0.000000	0.000000	0.000000	0.040000	1.082000
74	ART	3	Thu	2	02:00	39.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000

Using different bin sizes for bydatetime and summary dataframes#

By default, whatever you specify for the bin_size_minutes parameter (default is 60 minutes) is the resolution at which the bydatetime table is created. However, if for some reason you want to create (and save) a version of the bydatetime table with smaller time bin sizes, you can do it.

There is a highres_bin_size_minutes parameter that you can set to a smaller value than bin_size_minutes if you would like to compute occupancy in the bydatetime table at a finer resolution but still want to report aggregate statistics using bin_size_minutes. For example, you could set highres_bin_size_minutes=10 but keep bin_size_minutes=60. Using the default settings in hillmaker (see next section), this will NOT affect the aggregate statistics. However, it allows you to create a separate version of the bydatetime table at this higher resolution for further analysis. In order to save the high resolution version, set keep_highres_bydatetime=True.

The `edge_bins` parameter and its impact on occupancy calculations#

Since hillmaker’s earliest days, it has always been possible to treat the arrival and departure bins in two different ways. The default behavior, edge_bins=1, uses the method described in this notebook in which a fractional occupancy contribution is computed based on the fraction of time the entity was in the system during the arrival and departure bins. However, if you really want to give the “full credit” for occupancy during the arrival and departure bins (i.e. use a value of \(1.0\) instead of the fraction of the bin occupied, you can set edge_bins=2.

Warning

Using edge_bins=2 with coarse time bins and short lengths of stay can lead to dramatic overestimates of occupancy.

If you do use edge_bins=2 for some reason, you should consider setting the highres_bin_size_minutes to a small value to mitigate overestimating occupancy. Quite honestly, we’ve really only kept this option around for research purposes.