The Future of HIV Prevention and Treatment Research – “Big Data”?
By Jason Chiu
Even after three decades of prevention and treatment efforts, HIV/AIDS remains a major public health challenge worldwide. The U.S. is experiencing a concentrated epidemic among men who have sex with men (MSM), yet traditional public health strategies struggle to engage MSM in HIV prevention and treatment. MSM continue to have the highest HIV rates, and they are the least likely to be tested for HIV, be linked to care, adhere to medications, and survive 5 years after an HIV diagnosis. Some researchers have turned to “big data” to gain insights into HIV epidemiology and to improve public health preparedness and intervention effectiveness.
What is big data? While there is no agreed-upon definition, many people characterize big data by its large size and complexity. Many of us immediately associate big data with Internet giants such as Facebook, Amazon, Google, Twitter, and many others. Let’s take a look at Facebook. Facebook collects data from the profiles, posts, updates, likes, groups, messages, and other features of its 1.28 billion users worldwide; just imagine the amount of data it has collected over more than 10 years. Through complex algorithms, these data have been used to connect individuals (“People You May Know”), improve the user experience and interface, and, most importantly, maximize targeted advertising (profits!). Big data have also been used in health research. Google and Yahoo researchers have used their search logs (specifically, influenza-related search terms) to predict influenza activity ahead of the Centers for Disease Control and Prevention (CDC), helping public health agencies better prepare for influenza seasons.
Researchers at CBAM (@uclacbam) and the Center for Digital Behavior (@cdbucla) have begun to explore the power of big data in HIV prevention and treatment. In one study, researchers identified HIV-related tweets using sexual and drug-related keywords (e.g., “sex,” “get high,” and other street names for drugs and sex) and found that the counties disproportionately affected by HIV (according to CDC data) had the highest numbers of HIV-related tweets. This suggests that Twitter might be a cheaper, real-time alternative for tracking the HIV epidemic. The team has also used Facebook to deliver HIV education to a private virtual community (a private Facebook group). By observing posts in this group, the study found that participants who posted about HIV prevention and testing were more likely to request an HIV self-testing kit at the end of the study. While these results are promising, more studies are needed to clarify the role of big data in the fight against HIV.
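For readers who have never handled text data, the keyword approach in the Twitter study can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the CBAM/CDB team’s actual pipeline: the keyword list, the function names (`build_pattern`, `keyword_tweets_by_county`), and the assumption that every tweet has already been assigned to a county are all placeholders. It simply counts, per county, how many tweets mention any keyword, which is the kind of county-level tally that could then be compared against CDC HIV data.

```python
import re
from collections import Counter

# HYPOTHETICAL keyword list for illustration only; the study's real terms are not listed in this post.
KEYWORDS = ["sex", "get high"]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word or phrase."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def keyword_tweets_by_county(tweets, keywords):
    """Count keyword-matching tweets per county.

    `tweets` is an iterable of (county, text) pairs. We assume each tweet has
    already been geolocated to a county (a hypothetical preprocessing step).
    Returns (matches, totals): Counters of matching tweets and of all tweets per county.
    """
    pattern = build_pattern(keywords)
    matches, totals = Counter(), Counter()
    for county, text in tweets:
        totals[county] += 1
        if pattern.search(text):
            matches[county] += 1
    return matches, totals


if __name__ == "__main__":
    # Made-up example tweets, not real data.
    sample = [
        ("County A", "Trying to get high this weekend"),
        ("County A", "Sex ed should start earlier"),
        ("County A", "Traffic is terrible today"),
        ("County B", "Great game last night"),
        ("County B", "Who wants to GET HIGH?"),
    ]
    matches, totals = keyword_tweets_by_county(sample, KEYWORDS)
    for county in sorted(totals):
        share = matches[county] / totals[county]
        print(f"{county}: {matches[county]} of {totals[county]} tweets matched ({share:.0%})")
```

Because the pattern matches whole words, a term like “sex” will not match “sexual”; a real study would need a carefully curated term list (including street names) and a way to handle slang, misspellings, and location data, which is exactly where collaboration with text-mining experts helps.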
With all the affordable, easy-to-use technologies available (think of all the data collection, storage, and analysis tools), we are living in an age of data explosion. So why aren’t more people using big data? Well, long story short, it’s complicated and difficult to analyze. When I hear the word data, for example, I think of numbers, but that’s no longer the whole picture. Posts, searches, and tweets are text data, and most of us in HIV prevention are not trained in the machine learning techniques needed to analyze them. Moreover, the sheer size of these data sets is overwhelming, and conventional analytical tools cannot handle them. Traditionally, we focus on a few outcomes measured before and after an intervention, and most people would agree that even this kind of longitudinal analysis is difficult. Now imagine dealing with real-time data arriving every second. Public health practitioners need to foster interdisciplinary collaboration with experts in data mining, data visualization, and other big data analytical techniques to help make sense of all the data we now have access to.
In conclusion (and the take-home message of this blog post), as HIV researchers and practitioners, we should become familiar with big data methods, because big data approaches will give us real-time surveillance of the HIV epidemic and of our interventions, helping us make better and more cost-effective decisions in HIV prevention and treatment.