This example also illustrates nicely why this is a valuable computation technique. If you’re able to “lazily” (a technical term) leave data unstructured until its value is certain, you can eliminate the significant design, storage, and data entry costs associated with database schemas.
We briefly touched on IBM’s Watson previously in our article on Cisco’s CES talk on the Internet of Things. The Internet of Things will create a lot of data. Some of it, especially in hindsight, will not be optimally structured. And, as our example above illustrated, it is actually more efficient to leave data unstructured when there is a great deal of it and the relative future importance of various features is uncertain. That’s where Watson will come into play.
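To make the “lazy” idea concrete, here is a minimal sketch in Python. Everything in it is our own illustration rather than anything IBM or Cisco has published: the sample notes, the `TEMPERATURE` pattern, and the `extract_temperatures` helper are all hypothetical. The point is only that the raw text is stored untouched, and a structure is imposed later, once somebody decides a feature (here, temperature readings) is worth extracting.

```python
import re

# Raw, unstructured notes are stored as-is; no schema is designed up front.
# (Made-up sample data, for illustration only.)
RAW_NOTES = [
    "Sensor 12 reported 71.5F at the loading dock; door left open.",
    "Customer called about late delivery. Sensor 7 read 38.0F in the truck.",
    "Routine check, nothing unusual.",
]

# Hypothetical pattern, written only once someone decides temperatures matter.
TEMPERATURE = re.compile(r"Sensor (\d+) \w+ (\d+(?:\.\d+)?)F")


def extract_temperatures(notes):
    """Lazily pull (sensor_id, degrees_f) pairs out of free text on demand."""
    for note in notes:
        for sensor, degrees in TEMPERATURE.findall(note):
            yield int(sensor), float(degrees)


readings = list(extract_temperatures(RAW_NOTES))
print(readings)
```

If a different feature turns out to matter next year, only a new extraction function is needed; the stored notes never have to be migrated.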
Apparently, IBM Watson could mine text very well for Jeopardy answers, somehow. But what is this technology, exactly? Can it do something besides play a mean game of Jeopardy?
Is medicine like playing Jeopardy!?
IBM’s first killer application for Watson is assisting medical doctors in keeping up with the latest research. A huge amount of medical research is published each year. Articles sometimes provide new insights into the diagnosis and treatment of disease, especially the more exotic and interesting cases. But the amount of new literature is vast and nearly impossible for specialists to keep up with, let alone your average general practitioner.
Enter Watson, which can understand large quantities of text almost like a human. It can also take “natural language” questions from humans (medical doctors) and respond to them in a natural way, as it proved on Jeopardy. (It doesn’t really yet “understand” the way a human does, but it is able to create statistical models of the meanings of questions and text.) So, when it is asked a question by a human, it is able to find the medical articles that are statistically most likely to be relevant in answering that question, and present that result back to the doctor.
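For readers who want a feel for what “statistically most likely to be relevant” can mean, here is a toy sketch. It is emphatically not Watson’s method, which the article does not describe in detail. We are swapping in a classic textbook technique, TF-IDF weighting with cosine similarity, and the three “articles” are made-up one-line stand-ins. The function names (`tokenize`, `rank`) are ours.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def rank(question, documents):
    """Return document indices ordered from most to least similar."""
    doc_tokens = [tokenize(d) for d in documents]
    n = len(documents)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    def vector(tokens):
        # Term count times a smoothed inverse document frequency.
        counts = Counter(tokens)
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = vector(tokenize(question))
    scores = [cosine(q, vector(tokens)) for tokens in doc_tokens]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Made-up stand-ins for article abstracts (illustration only).
ARTICLES = [
    "Porphyria symptoms include sensitivity to sunlight and anemia.",
    "Statins lower cholesterol and reduce the risk of heart attack.",
    "New imaging methods improve early detection of lung tumors.",
]

order = rank("Which disease causes sensitivity to sunlight?", ARTICLES)
print(ARTICLES[order[0]])
```

A real question-answering system layers far more on top (parsing, synonyms, evidence scoring), but the basic move is the same: turn the question and the texts into numbers, then rank by how well they match.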
Now, to be perfectly fair, this may not be an entirely new concept. (More on why tech to search scientific articles isn’t new in a bit.) Moreover, medical articles play to the computer’s strengths (much like the game of chess in that other famous IBM exhibition). While they are pretty far along in the continuum of structured versus unstructured text, medical articles still have more structure than an average newspaper article or short story. Researchers go through a precise ritual when writing a medical article. There’s an abstract, introduction, conclusion, and so on. Space is very limited, so researchers describe concisely what they are doing in a set number of words in each section, following a pre-set scientific style. (This is unlike a classic novel by, say, Agatha Christie or Lewis Carroll, which might jump from first-person to third-person narration or switch to verse mid-story.) Scientific articles use a lot of jargon. This also plays to the strength of computers, which can have a potentially unlimited vocabulary.
Searching scientific articles isn’t new tech
Scientific articles use citations to link paragraphs and sentences to other scientific articles. These citations also follow one of a small number of allowed formats, which provide a standardized reference intended to retrieve the cited article. The vast majority of these other articles, going back several decades, will already be online. Again, advantage computer, since it will be able to instantly retrieve and scan each citation to learn more about the meaning of the article. The poor human specialist must either already be familiar with the article (as is sometimes the case with highly cited articles in specialized fields) or spend time retrieving and reading it.
Moreover, in many cases the National Library of Medicine (NLM) and similar groups have electronically annotated cited articles in a machine-readable way. (Each discipline has its own system, but medicine and biology often use the MeSH ontology.) These annotations were originally intended to speed up researchers’ searches for related articles in PubMed/Medline (the online abstract search system set up by the NLM, which took the place of multiple similar commercial services in the 1990s). If you knew (or know) the MeSH terms for the subjects you are interested in, you can pull the matching abstracts over the Internet via Medline. The abstracts, in turn, sometimes link to the full-text articles on publisher sites.
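As a rough sketch of what pulling abstracts by MeSH term looks like in practice, the snippet below builds a PubMed search URL. It assumes NCBI’s public E-utilities ESearch endpoint and its `[MeSH Terms]` field tag, neither of which is named in this article, so check the current NCBI documentation before relying on it. The helper name `mesh_search_url` and the two example headings are ours. The code only constructs the URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not named in the article).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def mesh_search_url(mesh_terms, max_results=20):
    """Build a PubMed query URL that requires every given MeSH heading."""
    # Tag each heading so PubMed matches it against MeSH annotations only.
    term = " AND ".join(f'"{t}"[MeSH Terms]' for t in mesh_terms)
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": max_results,   # cap on the number of article IDs returned
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


# Example headings chosen purely for illustration.
url = mesh_search_url(["Asthma", "Child"])
print(url)
```

Fetching that URL should return a list of PubMed article IDs, which can then be exchanged for abstracts in a second request. This is the kind of structured hook that gives a machine a head start over a human reader.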
We were curious to know what the folks at IBM thought about some of our proposed uses for Watson, so we posted to the IBM developer forum. Will Sennett of IBM was kind enough to write a detailed response on the IBM site.
Here’s an excerpt: “I’d have to dig a bit more at the FBI assistant example … certainly solutions in the big data and analytics realm that are a great fit for government…. On the HR side, I think you’re spot on. In fact, one of our Watson Mobile Developer
[Watson] application difficulty and complexity is probably dependent on the data ….”
Read his full response on the IBM forum.