Utilization of big data to improve management of the emergency departments. Results of a systematic review

Corrado De Vito1, Carolina Di Paolo1, Annamaria Mele1, Silvia Iorio2, Giuseppe Migliara1, Grazia Pia Prencipe1,
  1. Department of Public Health and Infectious Diseases, Sapienza University of Rome
  2. Department of Molecular Medicine, Sapienza University of Rome


Background. The emphasis on using big data is growing exponentially in several sectors, including biomedicine, life sciences and scientific research, mainly due to advances in information technologies and data analysis techniques. Medical sciences can now rely on a large amount of biomedical information, and big data can aggregate information across multiple scales, from DNA to ecosystems. Given these premises, we wondered whether big data could be useful for analyzing complex systems such as Emergency Departments (EDs), in order to improve their management and, eventually, patient outcomes.
Methods. We performed a systematic review of the literature to identify studies that applied big data in EDs and to describe what has already been done and what the expectations, issues and challenges in this field are.
Results. Overall, eight studies met our inclusion criteria. They concerned three main activities: the management of ED visits, ED processes and activities and, finally, the prediction of the outcome of ED patients. Although the results of the studies show good prospects for the use of big data in the management of emergency departments, some issues still make their use difficult. Most of the predictive models and algorithms have been applied only in retrospective studies, without considering the challenges and costs of real-time use of big data. Only a few studies highlight the possible usefulness of the large volume of clinical data stored in electronic health records for generating evidence in real time.
Conclusion. The proper use of big data in this field still requires better management of the information flow to allow real-time application.


There is a huge and increasing interest in big data and its applications, even though there is still no single definition of the term. Big data could be defined as a massive volume of heterogeneous data [1], or as electronic data sets so large and diverse that they cannot be easily managed with traditional software [2]. The clearest and most common definition, however, is through the “4 Vs”: volume (scale or quantity of data), velocity (speed and analysis of real-time or near-real-time data), variety (different forms of data, often from disparate data sources), and veracity (quality assurance of the data). The first 3 Vs are found in most of the literature [3,4] and could be considered distinguishing characteristics of big data [5]. The fourth V could be considered a goal [6]. Thus, the label “big data” refers to more than just the volume of data; it also refers to the use of a wide variety of sources, such as genetic sequences, social media, purchase records, mobile device tracking, medical monitoring devices, electronic health records (EHRs), wearable video devices, health-related mobile phone apps, and other electronic traces left by the progressive digitization of modern life [7].
Despite the challenges that big data needs to overcome, the advanced analytics it promises offer considerable opportunities for most stakeholders in health care. Emergency Departments (EDs) are among the most important hospital units and one of the main routes of patient admission. EDs produce a vast amount of data every day and can benefit substantially from big data analytics, which in turn may offer an opportunity to improve resource use, reduce costs, optimize supplies and staffing, decrease wait times and, eventually, improve the quality of care provided to patients and their outcomes [8]. Gathering nontraditional digital information as well, such as social media (like Facebook and Twitter), Google Trends and environmental data, could be useful for disease surveillance and for operating in real time in emergency situations.
Two of the most important opportunities offered by big data in EDs are the development of useful forecasting tools and simulation models [9]. Classic prediction models have few variables and need to be very simple so that providers can manage them easily, but advances in technology may make it possible to use tens or hundreds of variables, enabling a more tailored medicine that can also manage patients with complex disease. These prediction tools could consider the patient’s entire clinical profile and benefit from nonclinical factors [10].
The aim of this systematic review is to analyze the results of the studies that applied big data in EDs and to describe what has already been done and what the expectations, issues and challenges in this field are.


Literature search and eligibility criteria
To describe the application of big data in EDs, a systematic review of the literature was performed between January 2017 and March 2017 using the PubMed and Scopus databases, with the following search string: (big data) AND (emergency care OR emergency department). Articles were retrieved from the electronic databases and by hand-searching the references of the retrieved studies, and duplicates were removed. Some articles were excluded after title screening. Articles were considered eligible if they focused on the application of big data in EDs and were written in English. No restrictions were applied to the type of publication (e.g. editorial papers, short reports, systematic reviews, conference proceedings, commentaries, book reviews, datasets) or to the setting and country of the studies. Two authors reviewed the abstracts and full texts of the resulting articles, and disagreements were resolved by consensus. Articles not pertinent to big data and the management of EDs or emergency care, or not providing sufficient detail, were excluded.
Data collection and analysis
Two authors independently extracted results from the retrieved articles; disagreements were resolved by a third author. Reviewers used a summary table to identify the key points of each article. The most relevant trends found were discussed by the researchers during a further consensus meeting, after which the reviewers identified common themes about the application of big data in EDs that recurred across multiple articles.


Overall, eight articles met the inclusion criteria and were included in the systematic review (Figure 1). Because of the considerable heterogeneity of the themes that emerged, the retrieved articles were categorized into specific subtopics. The researchers identified three subtopics, described separately below: (a) management of ED visits; (b) ED process and activities; (c) prediction of the outcome of ED patients (Table 1).
Management of ED visits
Araz et al. [11], Ram et al. [12] and Scales et al. [13] investigated the possibility of using big data to predict the number of ED visits in specific situations, in order to improve the management of patient access, triage and high-cost patients. The first two studies correlated internet data with data from electronic health records (EHRs) and clinical data to predict the number of ED visits. Araz et al. [11] developed, through a cross-sectional study, a forecasting model to better manage hospital and other health care resources and to improve service quality in near real time during the seasonal influenza outbreak. To evaluate which variables most accurately predicted influenza-like-illness-related (ILI-related) ED visits, the authors performed a correlation analysis using the following variables: Omaha and Nebraska Google Flu Trends (GFT) data, Douglas County positive laboratory test results for influenza antigen, Douglas County ILI surveillance data, and Nebraska ILI provider data. Their analysis showed that the use of GFT data can greatly improve statistical forecasting and surveillance of ILI-related ED visits. Ram et al. [12] introduced a new prediction model to estimate the number of asthma-related ED visits in a specific area. The authors combined asthma-related ED visit data from EHRs, social media data from Twitter, internet users’ search interests from Google and pollution sensor data from the United States Environmental Protection Agency (EPA), all coming from the same geographic area and the same time range. By analyzing the relationship between ED visit data and each data source, they showed that the highest prediction accuracy was achieved by combining air quality data with Twitter data. The aim of the retrospective cohort study by Scales et al. [13] was to assess both the incidence of ED revisits among patients with kidney stones and the variables associated with them.
Data on all patients in California were collected from three databases: the California State Emergency Department Database (SEDD), which collects all ED visits that do not lead to hospitalization; the California State Inpatient Database (SID), which also collects ED visits that lead to hospitalization; and the State Ambulatory Surgery Database (SASD), which gathers all ED revisits that required urgent ambulatory procedures. Through multivariate analysis, the authors showed that 11% of all patients had at least one secondary ED visit within 30 days of the first visit. Analyzing the covariates for the whole cohort, the authors did not find differences in the proportion of revisits between males and females, while small differences emerged for Hispanic and young patients. Moreover, revisits were directly associated with insurance status (the risk was lower in patients with private insurance than in Medicaid payees), and inversely associated with the density of urologic care in different residential areas and with the performance of a complete blood count at the first ED visit.
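The kind of correlation analysis used by Araz et al. [11] and Ram et al. [12] earlier in this subsection can be illustrated with a minimal sketch. It is not the authors' code: the weekly series below are invented for illustration, the function names are hypothetical, and the idea of testing several lags (whether an internet signal leads ED visits by some weeks) is an assumption about how such a signal might be screened before being used in a forecasting model.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lagged_correlation(signal, visits, lag):
    """Correlate signal[t] with visits[t + lag], for a lag >= 0 in weeks."""
    if lag == 0:
        return pearson(signal, visits)
    return pearson(signal[:-lag], visits[lag:])


# Synthetic weekly data, invented for illustration only (not from any study).
search_index = [10, 12, 18, 30, 55, 80, 60, 35, 20, 12]  # e.g. a search-trend index
ed_visits = [8, 9, 11, 17, 29, 52, 78, 61, 33, 19]       # e.g. ILI-related ED visits

# Pick the lag (0 to 3 weeks) at which the signal tracks ED visits most closely.
best_lag = max(range(4), key=lambda k: lagged_correlation(search_index, ed_visits, k))
print("best lag (weeks):", best_lag)
```

In this toy data the visit series was constructed to follow the search index by about one week, so the one-week lag gives the strongest correlation. Real surveillance data would be noisier and would require out-of-sample validation before any forecasting claim.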
Emergency Department process and activities
Three studies described how to increase the quality and safety of emergency care management by creating real-world simulations with advanced technologies capable of capturing large volumes of data [14, 15, 16]. Chong et al. [14] aimed to build an efficient operating system to improve the management of EDs, taking into account waiting time, occupancy, admission volume and staff numbers. The authors built a patient flow model integrating a qualitative phase (conceptual model development, expert review, conceptual model finalization) with a quantitative phase (stock and flow development, model equations development, data integration, scenario design and testing, sensitivity analysis, model validation). Using a system dynamics model, an approach to understanding the nonlinear behavior of complex systems over time, the inputs (e.g. acute bed numbers, maximum waiting room capacity, average staff numbers per site), collected from different sources (expert review, Hong Kong Hospital Authority, Hong Kong Census and Statistics Department statistics, Accident & Emergency Medicine Academic Unit), were integrated into the patient flow model to estimate the outputs (e.g. duration in ED, occupancy in waiting room, acute bed occupancy). A pilot simulation showed that adjusting some parameters, such as staff number and bed capacity, can reduce the length of stay in EDs during daytime. Kuo et al. [15] described a methodology to improve the collection of data about patients’ activities, in order to build accurate models of ED processes and activities, through the use of a radio frequency identification (RFID) system. The RFID system consists of three components: RFID tags (embedded in a wristband given to patients at registration in the ED), RFID readers and middleware. The authors claim that such an RFID model can be used by the ED to examine the impact of various factors at play inside the ED and to support decision making.
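As a rough illustration of how readings from such an RFID system could feed a process model, the sketch below converts tag reads into the time each patient spent in each ED area. The record layout (tag, area, minute), the area names and the rule that a patient stays in an area until the next read are all assumptions made for this example; they are not taken from Kuo et al. [15].

```python
from collections import defaultdict


def time_per_area(reads):
    """Turn RFID reads into minutes spent per area for each tag.

    reads: iterable of (tag_id, area, minute) tuples in any order.
    Assumption: a patient remains in an area until the next read of the same
    tag; the final read (e.g. "exit") therefore contributes no duration.
    """
    by_tag = defaultdict(list)
    for tag, area, minute in reads:
        by_tag[tag].append((minute, area))

    result = {}
    for tag, events in by_tag.items():
        events.sort()  # chronological order
        durations = defaultdict(int)
        for (start, area), (end, _next_area) in zip(events, events[1:]):
            durations[area] += end - start
        result[tag] = dict(durations)
    return result


# Hypothetical reads, deliberately out of order (minutes since midnight shift start).
reads = [
    ("P1", "waiting", 10), ("P1", "triage", 0),
    ("P2", "triage", 5), ("P1", "treatment", 55),
    ("P2", "waiting", 12), ("P1", "exit", 95), ("P2", "exit", 40),
]
stays = time_per_area(reads)
print(stays["P1"])  # {'triage': 10, 'waiting': 45, 'treatment': 40}
```

Durations of this kind are the sort of empirical input that a simulation model of ED processes would need for calibration.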
The aim of the study by Bruballa et al. [16] was to obtain knowledge about ED behavior in emergency situations (e.g. pandemics, mass accidents) from data generated by a simulation of the real system. Data were generated by an ED simulator (modelled on a real system), integrated and stored in a data warehouse, and then analyzed using data mining techniques to observe patterns and trends. The simulator included six primary areas (admission area, triage boxes, waiting rooms, diagnosis boxes, treatment boxes and x-ray area) and five different types of active agents (patients, admission staff, nurses, doctors and x-ray technicians). The authors argue that, at present, the best way to manage emergency situations is to use simulation models to better understand the real situations that may arise.
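The stock-and-flow logic behind a system dynamics model such as the one used by Chong et al. [14] can be sketched in a few lines. This is a deliberately simplified illustration with a single stock (patients waiting), a constant inflow and a staff-limited outflow; the function name, parameters and numbers are invented and do not reproduce the published model.

```python
def simulate_waiting_room(arrivals_per_hour, staff, patients_per_staff_hour,
                          hours, dt=0.25, initial_waiting=0.0):
    """Euler integration of one stock: patients in the waiting room.

    Inflow  = arrivals_per_hour (constant, an assumption).
    Outflow = staff capacity, capped so the stock never becomes negative.
    Returns the stock level after each time step of length dt (hours).
    """
    waiting = initial_waiting
    history = []
    for _ in range(int(round(hours / dt))):
        capacity = staff * patients_per_staff_hour  # patients per hour
        # Cannot treat more patients than are waiting plus those arriving.
        outflow = min(capacity, waiting / dt + arrivals_per_hour)
        waiting += (arrivals_per_hour - outflow) * dt
        history.append(waiting)
    return history


# Invented scenario: 12 arrivals per hour, each clinician treats 4 patients per hour.
two_staff = simulate_waiting_room(12, staff=2, patients_per_staff_hour=4, hours=8)
three_staff = simulate_waiting_room(12, staff=3, patients_per_staff_hour=4, hours=8)
print(two_staff[-1], three_staff[-1])  # 32.0 0.0
```

With two clinicians the queue grows by four patients per hour, while a third clinician keeps it empty. This mirrors, in toy form, the finding that adjusting staff numbers changes time spent in the ED; a realistic model would add time-varying arrivals, bed capacity and feedback loops.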
Prediction of the outcome of Emergency Department patients
Yang et al. [17] and Taylor et al. [18] described the potential use of big data in predicting the health conditions of patients admitted to the ED. Yang et al. [17] described how fractional information from different data sources (e.g. ambient non-invasive data sensors supported by appropriate collection techniques) can be gathered in real time and applied to specific studies, integrating this information with the expert knowledge of clinicians in an automated learning process. As an example of the benefits of using massive data for early trauma outcome prediction, the authors described the use of the photoplethysmograph waveform, the corresponding minute-by-minute heart rate and peripheral capillary oxygen saturation measured with a pulse oximeter, and the associated ECG to discriminate patients with internal bleeding from non-bleeding patients and to predict autonomous resuscitation in a trauma center. In the article by Taylor et al. [18], a retrospective cohort study, the authors highlighted the strength of using the large volume of data stored in the electronic health records (EHRs) of health care systems to improve the quality and timing of mortality prediction in patients admitted to the ED with sepsis. Clinical data on adult patients admitted to the ED with a diagnosis of infection or with systemic inflammatory response syndrome (SIRS) from October 2013 to October 2014 were collected from the EHRs of four trauma centers and included various kinds of information: demographic data, ED procedures, laboratory results, nursing interventions, past medical history, vital signs, etc. All the data collected were analyzed using three models (a logistic regression model, a classification and regression tree model, and a random forest model).
The predictive accuracy of each model was compared with that of traditional clinical decision rules (CDRs) and validated analytic models (the Confusion, Urea Nitrogen, Respiratory Rate, Blood Pressure, 65 years of age and older score; the Mortality in Emergency Department Sepsis score; the Rapid Emergency Medicine Score). The authors showed that the random forest model was more accurate in estimating sepsis mortality than the traditional models, supporting the usefulness of a local big data-driven approach.
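To make the idea of comparing predictive accuracy concrete, the sketch below computes the area under the ROC curve (AUC) for two sets of risk scores. The text above does not state which accuracy metric Taylor et al. [18] used, so the choice of AUC is an assumption, and all scores and outcomes are invented. The AUC is computed here as the probability that a randomly chosen patient who died received a higher score than a randomly chosen survivor, with ties counted as one half.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    scores:   predicted risk for each patient (higher means higher risk).
    outcomes: 1 if the patient died, 0 if the patient survived.
    """
    died = [s for s, y in zip(scores, outcomes) if y == 1]
    survived = [s for s, y in zip(scores, outcomes) if y == 0]
    if not died or not survived:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for d in died:
        for s in survived:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(died) * len(survived))


# Invented data for eight hypothetical patients (three deaths).
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
model_risk = [0.90, 0.80, 0.35, 0.40, 0.30, 0.20, 0.10, 0.05]  # e.g. a fitted model
rule_points = [3, 2, 1, 2, 1, 1, 0, 0]                         # e.g. a point-based rule

print(round(auc(model_risk, outcomes), 3))   # 0.933
print(round(auc(rule_points, outcomes), 3))  # 0.833
```

A point-based rule produces many tied scores, which limits how finely it can rank patients. This is one reason a model using many EHR variables may discriminate better, although a fair comparison requires validation on data not used to fit the model.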


The changes that take place in the health of a population become explicitly visible within EDs, which for this reason are considered the ‘canary in the coalmine for the healthcare system’ [19]. Overcrowding in EDs is having negative effects on patient care in different international contexts: delays in the treatment of serious illness, an increase in patients who discharge themselves without being seen and, lastly, an increase in mortality rates [20, 21, 22, 23, 24]. Many of these issues can be attributed to inappropriate ED attendance, which gives us a clue for identifying weaknesses and critical problems within the primary care system and, more generally, in the management of psychological, social and socio-economic support.
It is well known that access to the emergency department, and therefore the population’s demand for emergency services, depends heavily on timely and satisfactory access to primary care services, on the distribution of risk factors for acute, chronic, traumatic and mental illness and, lastly, on the percentage of socio-economically vulnerable people in each territory or area.
This complex network of specific factors can be understood and studied through ED-based big data research. By analyzing the situation in EDs, it is possible to create simulation models that work as sensors of the actual system and help to plan actions that can serve as decision support, in order to prevent emergency medicine from becoming the epicentre of the healthcare system [25].
Therefore, while such a data system is useful for measuring a population’s health needs, the critical aspect of analyzing big data from EDs is the need to combine and synthesize these data with other information sources, including the data banks of territorial health services, social services, justice and legal services, educational services, etc. Other data sources that are not strictly medical consist of trends obtained from web search engines or social media, such as Google Flu Trends, Facebook or Twitter. However, access to a huge amount of data is not in itself a guarantee of success, and indeed poses new challenges, such as the storage, quality, standardization, analysis and interpretation of the data collected.
One of the biggest challenges in using data generated from different sources is that the data are often unstructured and non-standardized, which makes them difficult to share even within the same organization [26]. Furthermore, the quality of these data is often suboptimal [27] and most EHRs rely on self-reported data [26]. Data obtained from the web or from social media often overestimate the variables of interest and are prone to trend manipulation [28]. GFT is currently no longer publicly accessible [29].
Regarding statistical analysis, a main challenge is the hyper-dimensionality of the data, i.e. the number of characteristics recorded for each individual greatly exceeds the number of individuals observed. The use of statistical techniques able to handle such a mass of data (e.g. machine learning, data mining, high-dimensional correlation) has not yet been clearly defined in the medical field [30] and, above all, these techniques are often applied without a background theory about the relationships under examination [31].
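The practical risk of hyper-dimensionality can be shown with a small simulation, sketched here under stated assumptions: the outcome and all features are pure random noise, the sample sizes are invented, and the function names are hypothetical. When features far outnumber patients, at least one noise feature will typically appear strongly correlated with the outcome purely by chance.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strongest_noise_correlation(n_patients, n_features, seed=0):
    """Largest |correlation| between a random outcome and random features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_patients)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# Many features, few patients: a strong "signal" emerges from pure noise.
wide = strongest_noise_correlation(n_patients=10, n_features=1000)
# Few features, many patients: no noise feature looks important.
tall = strongest_noise_correlation(n_patients=1000, n_features=10)
print(f"wide data: {wide:.2f}, tall data: {tall:.2f}")
```

This is why analyses of wide ED datasets generally need safeguards such as held-out validation, correction for multiple comparisons, or a prior hypothesis about which relationships to test.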
Real-time implementation of big data in clinical contexts has been little explored in the literature, although it is a fundamental step towards daily use. As shown by our systematic review, algorithms and models have been built to take advantage of big data to improve the management of different aspects of the ED: management of visits, processes and activities, and patient outcomes. Although these models have proved useful for better managing ED activities, such as planning clinical and economic resources and improving clinical outcomes in specific situations, they have been applied only in retrospective studies, without considering the challenges and costs of real-time use of big data. Only a few studies highlight the possible usefulness of the large volume of clinical data stored in EHRs for generating evidence in real time [32]. However, some issues concerning the globalization of these data have emerged in the literature: fragmentation of data, language barriers, differences in terminology, data acquisition and cleansing, and lack of data standardization [26].
In conclusion, two of the most important opportunities offered by big data in EDs are the development of useful forecasting tools and simulation models [9]. Classic prediction models have few variables and need to be very simple so that providers can manage them easily, but advances in technology may make it possible to use tens or hundreds of variables, enabling a more tailored medicine that can also manage patients with complex disease. These prediction tools could consider the patient’s entire clinical profile and benefit from nonclinical factors [27].
Processing big data and performing real-time actions in critical situations is a challenging task [1] that will require growing knowledge and experience to become possible in the near future.


Figure 1. Flow diagram for selection of studies included in the Systematic Review


Table 1. Summary characteristics of the studies included in the systematic review


  1. Rathore MM, Ahmad A, Paul A, Wan J, Zhang D. Real-time Medical Emergency Response System: Exploiting IoT and Big Data for Public Health. J Med Syst. 2016 Dec;40(12):283.
  2. Perry DC, Parsons N, Costa ML. ‘Big Data’ reporting guidelines: how to answer big questions, yet avoid big problems. Bone Joint J 2014;96-B(12):1575-7.
  3. McAfee A, Brynjolfsson E. Big data: the management revolution. Harv Bus Rev 2012 Oct;90(10):60-6, 68, 128.
  4. Heudecker N. Hype Cycle for Big Data. Gartner. 2013 Jul 31. URL: https://www.gartner.com/doc/2574616/hype-cycle-big-data-.
  5. May M. Big biological impacts from big data. Science 2014;344:1298-1300.
  6. Kayyali B, Knott D, Van Kuiken S. The big-data revolution in US health care: accelerating value and innovation. McKinsey & Company. 2013 Apr. URL: https://digitalstrategy.nl/wp-content/uploads/E2-2013
  7. Cate FH. The big data debate. Science 2014;346:818.
  8. Hillestad R, Bigelow J, Bower A, Girosi F, Meili R, Scoville R, et al. Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Aff (Millwood) 2005;24(5):1103-1117.
  9. Wong, H.T. Biometeorological Modelling and Forecasting of Ambulance Demand for Hong Kong: A Spatio-Temporal Approach. Ph.D. Thesis, The University of Hong Kong, Hong Kong, China, February 2012.
  10. Janke AT, Overbeek DL, Kocher KE, Levy PD. Exploring the potential of predictive analytics and Big Data in Emergency Care. Ann Emerg Med. 2016 Feb;67(2):227-36.
  11. Araz OM., Bentley D., Muelleman RL. Using Google Flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska. Am J Emerg Med, 2014; Sep;32(9):1016-23.
  12. Ram S., Zhang W., Williams M., Pengetnze Y. Predicting Asthma-Related Emergency Department Visits Using Big Data. IEEE J Biomed Health Inform, 2015; Jul;19(4):1216-23.
  13. Scales CD. Jr., Lin L., Saigal CS., Bennett CJ., Ponce NA., Mangione CM., Litwin MS. Emergency department revisits for patients with kidney stones in California. Acad Emerg Med, 2015; Apr;22(4):468-74.
  14. Chong M., Wang M., Lai X., et al. Patient Flow Evaluation with System Dynamic Model in an Emergency Department. IEEE International Congress on Big Data, 2015; New York.
  15. Kuo Y-H., Leung J. M.Y., Tsoi K.K.F., Meng H. M., Graham C.A. Embracing Big Data for Simulation Modelling of Emergency Department Processes and Activities. IEEE International Congress on Big Data, 2015; New York.
  16. Bruballa E., Taboada M., Cabrera E., Rexachs D., Luque E. Simulation and Big Data: A Way to Discover Unusual Knowledge in Emergency Departments. International Conference on Future Internet of Things and Cloud, 2014; Barcelona.
  17. Yang S., Njoku M., Mackenzie CF. ‘Big data’ approaches to trauma outcome prediction and autonomous resuscitation. British Journal of Hospital Medicine, 2014; Nov;75(11):637-41.
  18. Taylor R.A., Pare JR., Venkatesh A.K., Mowafi H., Melnick E.R., Fleischman W., Hall M.K. Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach. Acad Emerg Med, 2016; Mar;23(3):269-78.
  19. Kamal N. Addressing Emergency Department overcrowding through a systems approach using big data research. Journal of Health & Medical Informatics. 2014;5(1):148.
  20. Derose SF, Gabayan GZ, Chiu VY, Yiu SC, Sun BC. Emergency Department Crowding Predicts Admission Length-of-Stay But Not Mortality in a Large Health System. Medical care. 2014;52(7):602-611.
  21. Guttmann A, Schull MJ, Vermeulen MJ, et al. Association between waiting times and short term mortality and hospital admission after departure from emergency department: population based cohort study from Ontario, Canada. BMJ. 2011;342.
  22. Miro O, Antonio MT, Jimenez S, et al. Decreased health care quality associated with emergency department overcrowding. Eur J Emerg Med. 1999;6:105–107.
  23. Sun BC, Hsia RY, Weiss RE, et al. Effect of Emergency Department crowding on outcomes of admitted patients. Ann Emerg Med. 2013;61:605–611.
  24. Chalfin DB, Trzeciak S, Likourezos A, et al. Impact of delayed transfer of critically ill patients from the emergency department to the intensive care unit. Crit Care Med. 2007;35:1477–1483.
  25. Pines JM, Decker SL, Hu T. Exogenous predictors of national performance measures for emergency department crowding. Ann Emerg Med. 2012;60:293–298.
  26. Kruse CS, et al. Challenges and opportunities of big data in health care: a systematic review. JMIR Med Inform. 2016 Nov 21;4(4):e38.
  27. Janke AT, et al. Exploring the potential of predictive analytics and big data in emergency care. Ann Emerg Med. 2016 Feb;67(2):227-36.
  28. Lazer D, et al. The parable of Google Flu: traps in big data analysis.
  29. Available at: https://research.googleblog.com/2015/08/the-next-chapter-for-flu-trends.html
  30. Binder H, et al. Big data in medical science – a biostatistical view. Dtsch Arztebl Int. 2015 Feb;112(9):137–142.
  31. Coveney PV. Big data need big theory too. Philos Trans A Math Phys Eng Sci. 2016 Nov 13;374(2080):20160153.
  32. Bates DW. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Affairs, 2014; Jul;33(7):1123-31.