With AI, Your Apple Watch Could Flag Signs of Diabetes
Before modern chemistry brought doctors blood and urine tests for diagnosing diabetes, they had to rely on their taste buds. Sweet-tasting pee has long been the disease’s telltale biomarker; mellitus comes from the Latin for “honeyed.” Too much sugar in your bodily fluids means your metabolism has gone haywire: either your body isn’t making enough insulin or your cells aren’t responding to it.
But a little over a decade ago, a group of researchers discovered a less obvious link. One of the complications of diabetes is nerve damage, and in the cardiovascular system that damage can cause irregular heart rates. Which you can measure, either with electricity or light. So one day soon, doctors might diagnose diabetes with their patients’ wrist bling instead of blood pricks or pee strips. Oh, what a difference a few centuries make.
In 2005, heart rate sensors were something only elite athletes and very sick people used. Today, one in five Americans own one. Which is why there’s now a deep learning company trying to make something out of the connection between heart rate and diabetes. On Wednesday, at the annual AAAI Conference on Artificial Intelligence in New Orleans, digital health-tracking startup Cardiogram presented research suggesting the Apple Watch’s heart rate sensor and step counter can make a good guess at whether or not a person has diabetes—when paired with the right machine-learning algorithms, of course.
Apple has been eyeing a career change for its signature wearable, from personal trainer to personal physician, for a while now. In November the company teamed up with health insurer Aetna to give away more than 500,000 Apple Watches as part of a pilot to try to reduce health costs. And it embarked on a study with Stanford to test the watch’s skills at detecting irregular heartbeats, which can lead to stroke or heart attack. This most recent collaboration between Cardiogram, a San Francisco-based startup staffed by former Google engineers, and a landmark UC San Francisco heart health study is just the latest sign of the watch’s medical ambitions.
Cardiogram offers a free app for organizing heart-rate data from the Apple Watch and from devices with similar sensors, such as those made by Fitbit and Garmin or running Android Wear. It takes the same kind of artificial neural networks that Google uses to turn speech into text and repurposes them to interpret heart-rate and step-count data. On its own, that data is mostly meaningless for detecting disease, and not just because the sensors themselves have significant errors. Training a model that can pick out condition-specific patterns requires labeled data. To learn what a diabetic heart rate signature looks like, it needs some diabetics.
That’s where UCSF comes in. In 2013 it kicked off a major heart disease project called the Health eHeart study, aiming to collect massive amounts of digital health data on one million people. As of mid-January, the study had registered 196,000 participants, who each fill out a survey about known medical conditions, family histories, medications, and blood test results. About 40,000 of them have also opted to link that information with their Cardiogram app.
“That’s where we get our labels,” says Cardiogram co-founder Brandon Ballinger, who previously worked as a tech lead on Google’s speech recognition software. “In medicine, your labeled answers each represent a life at risk. Compared to what an internet company is working with, it’s actually a very small number of examples.”
So Cardiogram has had to adopt some tricks from the tech world to train its neural network, DeepHeart, to spot human disease. One of these is a technique called semi-supervised sequence learning, which was originally invented to work on text data like Amazon product reviews. But instead of a sequence of words, they sub in a sequence of heart rate measurements—about 4,000 per week. Some fancy math compresses that information into a single number summarizing the amount of heart rate variability. Then those summaries are what get tied to labeled patient data, and the real training can begin.
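The two-stage idea described here (summarize lots of unlabeled heart rate sequences, then train on a small labeled set) can be illustrated with a deliberately simplified Python sketch. It is not semi-supervised sequence learning itself, which pretrains a recurrent neural network on unlabeled sequences, and it is not DeepHeart. Instead it swaps in a hand-written, RMSSD-style variability statistic (computed on heart-rate readings rather than beat-to-beat intervals), uses the unlabeled pool only to learn how to scale that statistic, and fits a one-feature logistic regression on the labeled users. All function names and the synthetic data are hypothetical.

```python
import math
import statistics


def hrv_summary(bpm):
    """Compress a sequence of heart-rate readings (beats per minute) into one
    number: the root mean square of successive differences (RMSSD-style).
    Hypothetical stand-in for a learned summary; not Cardiogram's method."""
    diffs = [b - a for a, b in zip(bpm, bpm[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def fit_scaler(sequences):
    """Unsupervised stage: learn the mean and spread of the summary across
    every user's data, labeled or not."""
    summaries = [hrv_summary(s) for s in sequences]
    return statistics.mean(summaries), statistics.pstdev(summaries)


def train_classifier(labeled, scaler, steps=2000, lr=0.1):
    """Supervised stage: one-feature logistic regression, fit by gradient
    descent on the much smaller labeled set of (sequence, label) pairs."""
    mu, sigma = scaler
    data = [((hrv_summary(s) - mu) / sigma, y) for s, y in labeled]
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def predict(sequence, scaler, model):
    """Estimated probability that a sequence belongs to the positive class."""
    mu, sigma = scaler
    w, b = model
    x = (hrv_summary(sequence) - mu) / sigma
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def toy_week(swing, n=4000):
    """Hypothetical week of about 4,000 readings alternating around 70 bpm."""
    return [70 + swing * (i % 2) for i in range(n)]


# Synthetic data: eight unlabeled users, only four labeled ones. The
# "positive" users are simply constructed to have low variability; this
# says nothing about real patients.
unlabeled = [toy_week(s) for s in range(1, 9)]
labeled = [(toy_week(1), 1), (toy_week(2), 1), (toy_week(7), 0), (toy_week(8), 0)]

scaler = fit_scaler(unlabeled)
model = train_classifier(labeled, scaler)
print(round(predict(toy_week(1), scaler, model), 2))
print(round(predict(toy_week(8), scaler, model), 2))
```

The point of the split is that the unlabeled pool, which can be large, only shapes how readings are summarized and scaled, while the scarce labeled examples set the classifier’s weights. In a real system the summary would be learned from the data rather than written by hand.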
Using this method, DeepHeart was able to spot diabetics who weren’t part of the training group 85 percent of the time. The results are on par with the company’s previous work: Last year, Cardiogram and UCSF released results showing that DeepHeart could wrestle one week’s worth of a person’s Apple Watch data into predictions for hypertension, sleep apnea, and atrial fibrillation with accuracy rates between 80 and 90 percent.
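Headline percentages like these depend on how they are computed, especially when most people in a test group don’t have the condition. A minimal Python sketch with made-up numbers (a hypothetical held-out group of 100 people, 10 of them with diabetes) shows how a “model” that never flags anyone still scores 90 percent on plain accuracy while catching none of the actual cases. The function names are illustrative only.

```python
def accuracy(predictions, labels):
    """Fraction of held-out people whose predicted label matches the truth."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def sensitivity(predictions, labels):
    """Fraction of truly positive people (label 1) that the model flags."""
    flagged = [p for p, y in zip(predictions, labels) if y == 1]
    return sum(flagged) / len(flagged)


# Hypothetical held-out group: 10 people with diabetes, 90 without.
labels = [1] * 10 + [0] * 90
always_no = [0] * 100  # a "model" that never flags anyone

print(accuracy(always_no, labels))     # 0.9
print(sensitivity(always_no, labels))  # 0.0
```
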
So how do Cardiogram’s algorithms make good guesses without directly measuring the amount of sugar in someone’s blood? Nobody really knows.
“Diabetes is very clearly a cardiovascular condition, but it’s not one with an obvious physiological connection to heart rate variability,” says Mark Pletcher, one of the principal investigators of the Health eHeart study and a co-author on the paper presented Wednesday. When you train machine learning algorithms on data without knowing the mechanisms behind the underlying patterns, you often get a signal without understanding why. “It makes me nervous, frankly. We’ve had a lot of internal discussions about whether this could be picking up medications diabetics use or some other extraneous factor. But we haven’t come up with anything.”
That’s the kind of thing that sends up red flags for Eric Topol, a cardiologist and Director of the Scripps Translational Science Institute, where he’s leading the digital health arm of the NIH’s billion-dollar Precision Medicine Initiative. “This combines features of the black box of algorithms and the black box of biology,” he says of the Cardiogram study. “It’s unconvincing and shaky. At best it would be considered hypothesis-generating.” The hypothesis here is that DeepHeart might be picking up a diabetes signal. But it might be picking up something else.
Ballinger is quick to counter these kinds of criticisms. If your wearable tells you you’re at increased risk for diabetes, and you go to the doctor and get diagnosed by traditional means, then you’re still getting the standard quality of care, he says. So what if it’s a black box that gets you in the door? Still, he recognizes the need for prospective validation to really demonstrate the AI’s accuracy: screening people who have not yet been diagnosed with diabetes, and following them to see whether they do in fact develop the disease. He says the company is actively investing in those kinds of future studies.
With the right testing, Ballinger sees business potential in his black box intelligence. Cardiogram’s app for Apple Watch and other devices is free today. But as soon as later this year, the startup plans to add features that advise users to get tested for atrial fibrillation, high blood pressure, sleep apnea, or diabetes. To stay on the right side of the US Food and Drug Administration, the app can’t function as a standalone diagnostic; it’s more like some friendly advice. But it’s the kind of advice an insurer might cover if it thought it would get people into treatment earlier and save healthcare costs.
Which leaves them a long way to go, given the evidence that’s currently out there. Or rather, lack thereof. “Setting aside the accuracy piece, which is something the FDA would want to know about, there’s almost no data out there on whether or not these wearables can actually change patient outcomes,” says Brennan Spiegel, a gastroenterologist and the director of Health Services Research at Cedars-Sinai in Los Angeles. “Creating the tech isn’t the hard part. The hard part is using the tech to change patient behavior. And that’s really hard to do. It’s not a computer science, it’s behavioral and social science.”
Still, if the Health eHeart and Cardiogram studies can say one thing pretty definitively at this point, it’s that people are eager to engage with apps capable of medical-grade measurements, if and when they become available. The question is whether a healthier you is truly just a push notification away.