

24
Apr

Watson from IBM: Why semantic text tech helps analytics

Posted by Acculation in analytics, Art, artificial intelligence, crowdsourcing, education, Featured, Internet of Things, math, tech, unstructured data, watson with 7 comments.

IBM's Watson natural language processing software is named after IBM founder, Thomas J. Watson, Sr., pictured here in this 1920s photo from IBM's corporate archives. Photo: Wikimedia/IBM/CC-BY-SA-3.0

Thus, a computer system like Watson that analyzes medical articles need not limit itself strictly to the human-readable text. It could parse the semi-structured citation data for each paragraph and sentence, and jump from there to the machine-readable MeSH terms that human indexers have already assigned to each cited article. From these MeSH terms, it could gain further insight into the meaning of the paragraph or sentence referencing the cited article. (Medical articles typically have a great many citations.) In addition to MeSH terms, there is the machine-readable Science Citation Index (and its competitors), which seeks to quantify the quality, importance, or influence of scientific articles by counting the number of times each has been cited in other scientific articles.
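
To make the idea concrete, here is a minimal Python sketch of that lookup. It is only an illustration under stated assumptions: the article IDs and their MeSH assignments are made-up sample data, and `mesh_terms_for_sentence` is a hypothetical helper, not part of Watson or of any NLM interface. It follows a sentence's citations to the subject headings already assigned to the cited articles and tallies them.

```python
from collections import Counter

# Hypothetical sample data: MeSH headings a human indexer assigned to each
# cited article. Real assignments would come from the NLM's records.
MESH_BY_ARTICLE = {
    "article-101": ["Lung Neoplasms", "Tomography, X-Ray Computed"],
    "article-102": ["Lung Neoplasms", "Smoking"],
    "article-103": ["Smoking", "Risk Factors"],
}


def mesh_terms_for_sentence(cited_ids, mesh_by_article=MESH_BY_ARTICLE):
    """Tally the MeSH headings of every article a sentence cites."""
    counts = Counter()
    for article_id in cited_ids:
        # Cited articles with no annotations simply contribute nothing.
        counts.update(mesh_by_article.get(article_id, []))
    return counts


# A sentence that cites two of the sample articles.
sentence_topics = mesh_terms_for_sentence(["article-101", "article-102"])
print(sentence_topics.most_common(1))  # [('Lung Neoplasms', 2)]
```

The heading that turns up most often across a sentence's citations is a reasonable guess at what the sentence is about, even before any of the human-readable text is parsed.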

However outstanding Watson’s ability to process human-readable text is, it would be foolish for Watson not to use this already machine-readable data to gain additional insights. Many other fields of scientific endeavor don’t yet have extensive machine-readable subject annotations such as the MeSH terms that the NLM has painstakingly assigned to each article. And other areas of scientific endeavor may rely far more on non-text to convey meaning, such as mathematical equations, theorems, computer code, tables or graphs. Many fields use far fewer citations owing to a much smaller body of relevant literature. Although Watson’s Jeopardy! championship gives a clear hint, we won’t know how significant these differences are until there’s a Watson for pure mathematics or geophysical chemistry.

Of course, these other fields lack the business model of physicians with the financial resources and a real need for a computer assistant. This makes medical articles low-hanging fruit for both business and technical reasons.

We briefly suggested that it perhaps wasn’t all that new a technology. Recall we mentioned that one of the machine-readable data points associated with a scientific article was a count of how often it had been cited in other articles. This has been used for many decades to attempt to quantify (in a crude way) the quality, popularity, or importance of scientific articles. It was sort of the PageRank of its day. (PageRank, named after Google co-founder Larry Page, was the original Google search engine algorithm.)

Origins of Google

In fact, this is where Google got its start. Recall that Google was originally a PhD project at Stanford to help libraries keep track of scientific articles. Google’s founders, then Stanford PhD students, realized that the number of times a web article was linked to could serve much like the Science Citation Index. Thus, counting links provided a way to numerically score articles. (Prior to that, search engines mainly just looked at the keywords in each article.)
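
As a rough illustration of link counting, here is a short Python sketch. The link graph is invented sample data, and both functions are simplified teaching versions: `inbound_counts` is plain citation-style counting, and `pagerank` is the textbook form of the idea (with the commonly quoted damping factor of 0.85 assumed), not Google's production algorithm.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def inbound_counts(links):
    """Score each page by how many other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: a link from a high-scoring page counts for more."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source in pages:
            targets = links.get(source, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                for page in pages:
                    new_rank[page] += damping * rank[source] / n
        rank = new_rank
    return rank


counts = inbound_counts(LINKS)
ranks = pagerank(LINKS)
print(counts["c"])                  # 3
print(max(ranks, key=ranks.get))    # c
```

In this toy graph both scores agree that page "c" matters most. The two diverge on larger graphs, where PageRank rewards a link from an important page more than a link from an obscure one.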

From the beginning, every Google search has implicitly been a question: a request for the most relevant information on a specific topic. You can even ask Google Jeopardy!-like questions. (Well, you know we mean Jeopardy! answers, since questions are answers on Jeopardy!) Originally, the topics of those questions were intended to be scientific, and the pages returned links to scientific articles.

Of course, Watson is obviously much better at solving Jeopardy! trivia than Google or Google Now. Watson wasn’t connected to the Internet during the Jeopardy! challenge. (It had to rely on information it had already downloaded.) It didn’t return a web page, but rather the best sentence giving a concise, human-like solution to the trivia problem. It formulated these sentences on its own, by parsing the information in its databanks. (The game requires each solution to be in the grammatical form of a question. This additional twist proves Watson was generating its own grammatically correct sentences, rather than merely mining existing sentences on web pages.)

Google can do something similar for some simple, common questions that appear to be preprogrammed (“What is the current time in Madrid?”). The search engine Wolfram Alpha that Apple’s Siri sometimes uses can answer an additional universe of questions that appear to be preprogrammed, together with solving some math problems by harnessing a Wolfram-Mathematica engine.

So Watson-like technology is already in products like Siri and Google Now. Of course, these didn’t exist (in their present sophistication) at the time IBM did its Jeopardy! Challenge. Both require access to the Internet to answer questions, and often take a much longer time to respond than Watson was permitted in competition. Often the responses are still lengthy web pages rather than the concise, accurate answers Watson was capable of generating.

So what is Watson, exactly?

Which brings us back to our original problem. What exactly is Watson?

If you visit the Watson developer ecosystem website, the general public only has access to some glossy marketing brochures and videos. These are high-concept and mostly say little about the underlying technology.

(The Wikipedia article on Watson isn’t much more helpful. It is lengthy but mostly discusses the Jeopardy! stunt. We assume this is because so little else is known about the system. It does note a medical application in a field trial for lung cancer diagnosis. 90% of the nurses in that field trial now rely on Watson’s judgement. The article cites Wolfram Alpha, which we mention above in connection with Apple’s Siri, as the main competitor.)

In recent months, IBM has started to explain a bit more, sort of. They’ve announced a Watson app cloud. They’re still only letting a small number of companies in at the moment. (Apparently Elance is one of them.) Everyone else gets glossy marketing brochures and videos.

Eventually, it seems IBM will publish a public API to Watson, as well as provide a hosted cloud service for Watson-enabled apps, a la Google App Engine. (They’re taking sign-ups for people interested in the public announcement.)

One of the initial problems with Watson is that it apparently required a substantial up-front investment. This was in the form of a state-of-the-art data warehouse and the staff to run it.

(Our initial suspicions that Watson was built on top of IBM DB2 were quickly confirmed. In addition to significant software licensing costs, the last time we checked, installing and maintaining a DB2 installation was non-trivial. Developers used to free systems like MySQL may not realize it, but there are multi-million dollar R&D investments in proprietary fast join optimization and parallelization technologies that go into systems like Oracle, SQL Server, or DB2. This, and the existence of legacy software, is why people put up with the maintenance expense of these systems.)

The Watson cloud

So setting up a cloud ecosystem makes perfect sense. IBM will maintain the technology’s complex software and hardware stack. Developers can rent Watson instances a la an Amazon EC2 model. Companies can focus on writing innovative apps, not maintaining complex data warehouse hardware, software, and support staff. The barrier to entry for innovation drops from very substantial to near zero with the cloud-based model.

Elance is mentioned in IBM’s glossy video. We’ve already discussed why searching job descriptions and resumes in a “natural” way could be huge.

The missing technology for analytics and Internet of Things

Semantic text searching and improved natural language and unstructured data processing are the key missing ingredients for analytics and the Internet of Things. We’re sure to have more to say on this key emerging technology in the future.

Tagged: abstract, analytics, art, business, careers, classic, data, education, famous, gadget, intelligent, Internet of Things, math, mine, more, novel, post, science, space, story, tech, us, watson, wolfram

There are 7 comments so far

Author

10 years ago · Reply

We were curious to know what the folks at IBM thought about some of our proposed uses for Watson, so we posted to the IBM developer forum. Will Sennett of IBM was kind enough to write a detailed response on the IBM site.

Here’s an excerpt: “I’d have to dig a bit more at the FBI assistant example … certainly solutions in the big data and analytics realm that are a great fit for government…. On the HR side, I think you’re spot on. In fact, one of our Watson Mobile Developer
[Watson] application difficulty and complexity is probably dependent on the data ….”

Read his full response on the IBM forum.
Reviews of our app, or working more with governments on air quality | Acculation

10 years ago · Reply

[…] recent articles on IBM Watson analytics and Google Glass generated a lot of interest with people contacting us privately to ask for advice […]
Oh send in the trolls. Oh where are the trolls? There aren't any trolls.... | Acculation

10 years ago · Reply

[…] are all the trolls on the Internet? We have done our best to tick people off in this blog. We skewered Google Glass. We did not have kind words for IBM Watson’s marketing department. We’ve even poked fun […]
open semantic meaning platforms: alternatives to IBM Watson? | Acculation

10 years ago · Reply

[…] been a fan of IBM’s Watson semantic meaning analytics system since IBM first announced they were opening up their ecosystem. Around the time of CES we pointed […]
Ebola: Can big data or semantic text help?

10 years ago · Reply

[…] up the topic of semantic text systems. In our earlier article from April, we mentioned a “bear in the woods” scenario. The idea there is that structured data, such as the forms used in hospital […]
Ted Talks on IBM Watson & Bayes' rule in evolution

9 years ago · Reply

[…] of our most read articles have been on IBM Watson, including suggestions & possible alternatives. We’ve pushed IBM several times to come up with better demos for […]
Acculation Author

9 years ago · Reply

Twitter comments updated.
