It’s finally spring, even in Boston, and our yard is atwitter. Birds are tweeting from the dogwood tree and yew hedge while waiting their turn to grab sunflower seeds from the feeder. Our backyard birds include a brilliant scarlet Mr. Cardinal, who trails along behind his rust-and-brown Missus. We’ve been pleasantly surprised to see a pair of bluebirds taking turns on the suet. Every morning velvety brown doves stroll about collecting seed from the ground, and little black-capped chickadees cluster on the feeder. It’s good to see birds and people basking in these warm, sunny spring days.
What 140 characters can tell us
These days, as it turns out, people are tweeting more than the birds, through Twitter, a free online social networking service. Twitter allows short posts of up to 140 characters, or “tweets,” that are published in real time. Unlike Facebook or LinkedIn, where posts are typically shared only with approved friends, family, or contacts, Twitter posts are broadcast to the public. Because tweets are published in real time and are publicly available as soon as they are posted, Twitter is a source of timely data. And Twitter generates a lot of data. The Internet Live Stats website has a digital counter spinning like hummingbird wings as it tallies tweets around the globe in real time. Worldwide, there are on average 6,000 tweets per second and over 500 million tweets per day. Internet Live Stats notes that it took 3 years, from Twitter’s launch in 2006 until 2009, for 1 billion tweets to be sent; now 1 billion tweets are posted every 2 days.
Big data is a hot topic in healthcare, so it is no surprise that Health IT folks are interested in Twitter data. Recently Twitter has worked with academic healthcare researchers to share the big data generated by billions of tweets. In 2011, researchers at Johns Hopkins used Twitter data to spot emerging influenza outbreaks. The CDC tracks flu outbreaks using data on hospital visits and diagnoses, but this approach results in a lag of several weeks between an outbreak and the availability of data about it. Real-time tweets essentially saying “I have the flu” identify cases much sooner.
Predicting behavior and outcomes through big data
Asthma is a major public health problem, resulting in almost 2 million emergency department (ED) visits per year in the United States. If the locations and timing of asthma outbreaks can be predicted, local public health agencies and healthcare providers can intervene proactively to improve patient outcomes and reduce high-cost ED utilization. Last July, researchers at the University of Arizona published an article in the IEEE Journal of Biomedical and Health Informatics (vol. 19, no. 4) about using big data to predict asthma-related ED visits. The Arizona team combined nontraditional digital information, such as Twitter data, Google search data, and environmental sensor data, with ED clinical data to develop a predictive model for asthma outbreaks. The model showed good real-time predictive value.
The value of Twitter data lies in its huge volume and its public, real-time publication, but the dataset has limitations. Both the Johns Hopkins and University of Arizona research groups had to develop ways to sort through Twitter “noise.” A tweet about the flu in the news, or about worries over catching the flu, is not the same as a tweet from someone who actually has the flu. Research teams have created algorithms and data-extraction techniques to identify the meaningful tweets. Location is another challenge: knowing where disease cases or outbreaks occur is what makes the data useful for public health interventions, yet only about 3% of tweets include geographic identification.
Unconventional use of datasets to solve public health and community problems has worked in other situations, too. Dr. Atul Gawande, the keynote speaker at the National Kidney Foundation (NKF) meeting this week, will talk about individualized care and patient safety. In his 2011 New Yorker article, “The Hot Spotters,” he described how Dr. Jeffrey Brenner used hospital billing records to identify ED visits by victims of serious assault in Camden, NJ. The ED data were used to create crime-victim maps. Dr. Brenner subsequently used ED data to identify “hot spot” housing areas whose residents had high healthcare resource utilization. Today Dr. Brenner is the Executive Director of the Camden Coalition of Healthcare Providers, which continues to use data in innovative ways to humanize healthcare and reduce its fragmentation.
Healthcare data is no longer just traditional claims data or even clinical EHR data. Social media data from outlets like Twitter create opportunities for new healthcare insights, particularly where health intersects with social behavior. New data sources such as Twitter, Google search, and pervasive sensing technology will offer real-time insights. Knowing through predictive modeling what is happening now, or what is about to happen, will hopefully enable healthcare to become more proactive, focused more on prevention, health management, and wellness than on disease treatment.
I’m predicting a beautiful spring with lots of tweets and a low pollen count, meaning fewer allergies and minimal Kleenex utilization. No actual data, just high hopes and a gut feeling!
Dugan Maddux, MD, FACP, is the Vice President for CKD Initiatives at Fresenius Medical Care North America (FMC-NA). Before her foray into the business side of medicine, Dr. Maddux spent 18 years practicing nephrology in Danville, Virginia. During this time, she and her husband, Dr. Frank Maddux, developed a nephrology-focused electronic health record. She and Frank also developed Voice Expeditions, which features the Nephrology Oral History project, a collection of interviews with early dialysis pioneers.