
24 Apr

Watson from IBM: Why semantic text tech helps analytics

Posted by Acculation in analytics, Art, artificial intelligence, crowdsourcing, education, Featured, Internet of Things, math, tech, unstructured data, watson with 7 comments.

IBM's Watson natural language processing software is named after IBM founder, Thomas J. Watson, Sr., pictured here in this 1920s photo from IBM's corporate archives. Photo: Wikimedia/IBM/CC-BY-SA-3.0

More significantly, the “bad guys” (if you will) are aware of these databases, and may take steps to evade automated surveillance. They may constantly change their calling phone number to avoid being blacklisted. Thus, real humans are still needed to go into these databases and study the unstructured text. If you see people writing descriptions of the same kind of illegal phone call coming from constantly changing numbers, you want to assign someone to investigate further. This is why unstructured text is required: there is no way to know in advance how scammers or criminals will behave. If you tried to anticipate every possible situation with a pre-designed form, the “bad guys” would modify their behavior to “game” the pre-designed form and avoid surveillance.
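To make the "same kind of call, constantly changing numbers" pattern concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions, not how Watson or any real complaint database works: the record layout (`number`, `text`), the sample complaints, and the 0.5 similarity threshold are all hypothetical, and simple word overlap (Jaccard similarity) stands in for real semantic analysis.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercase word set; digits are ignored so phone numbers in the text don't matter."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Fraction of words shared between two word sets (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_pairs(complaints, threshold=0.5):
    """Index pairs of complaints with similar wording but different caller numbers."""
    toks = [tokens(c["text"]) for c in complaints]
    pairs = []
    for i, j in combinations(range(len(complaints)), 2):
        different_number = complaints[i]["number"] != complaints[j]["number"]
        if different_number and jaccard(toks[i], toks[j]) >= threshold:
            pairs.append((i, j))
    return pairs

# Hypothetical free-text complaints (made-up data for illustration only).
complaints = [
    {"number": "555-0101", "text": "Caller claimed to be the IRS and demanded gift cards"},
    {"number": "555-0199", "text": "Caller claimed to be from the IRS and demanded payment in gift cards"},
    {"number": "555-0142", "text": "Robocall about an extended car warranty"},
]

flagged = similar_pairs(complaints)
print(flagged)
```

Here the first two complaints describe the same scam from different numbers, so that pair is the one a human investigator would be pointed to. Blacklisting by number alone would have missed the connection.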

FBI and Crimestoppers databases and Dr. Watson

The more important examples are things like the FBI IC3 database and the Crimestoppers database. In part for the reasons we just mentioned, unstructured text plays an essential role in the complaints filed by citizens. So, at present, a human is still required to read each complaint. Since any one human can only read a small part of these databases, they might miss complex crime trends. This is where a technology like Watson might come in. Watson could read every complaint in FBI IC3 or Crimestoppers, and potentially catch significant details in the aggregate unstructured text that would be overlooked by necessarily more myopic humans. (Note that since every scam involving the Internet is potentially an FBI IC3 complaint, this database of citizen complaints is huge. Most complaints are ultimately ignored due to lack of resources. Watson could help here by connecting dots that otherwise are overlooked.)

We’ve mentioned Yelp for finding restaurants. Although it’s controversial (and there may be legal hazards to those making complaints) it also serves as a repository for citizen complaints against businesses. (There are many other companies recording these kinds of complaints, with various levels of complaint vetting, different business models, and different amounts of controversy.) Yelp and similar databases are often largely public, so the government (or Yelp’s owners) could have Watson analyze this data to find previously undiscovered trends as well.

The IBM Watson team is inviting submissions from people with suggestions for how Watson’s technology might be used. Since this application occurred to us, we thought we’d just throw it out here. (IBM has specific guidelines. For example, the application must involve unstructured text, as well as human language questions for Watson to answer. In this case, the questions to Watson would be about ideal allocation of resources, or whether there were unusual new patterns appearing in the aggregate submissions.)

Dr. Watson the spook

We’re limiting our discussion here to unclassified government databases, which must be huge and full of unstructured text. Of course, the government also has those “other” databases we’ve heard so much about recently. We’re guessing Watson already passed the background checks for a security clearance, so that one’s probably covered. It’s pretty obvious the technology hasn’t made it around to “more civilian” government applications yet, however.

We’re guessing IBM’s #Watson already holds a security clearance.


Forming a posse in the digital age

Maybe semantic processing of text and crowd-sourced law enforcement is how you form a posse in the digital age. (Or maybe we just think that because we’re Americans located out West with an engineering bent. Posses have obvious legal and ethical hazards; we are not being entirely serious in our enthusiasm for the crowd-sourced posse thing here. Perhaps on the emerging risks side, these new technologies might facilitate a dangerous new vigilantism.)

Does #watson semantic text processing let you form a posse in the digital age?


The problem with unstructured text

To date, computers have been mainly good at processing structured text. This means a designer or software developer needs to anticipate in advance how data will be used. They need to carefully design a database schema or web form that captures these use cases in advance.

Humans, on the other hand, are especially good at processing unstructured data. (We may create a structure around that data after the fact.) From an evolutionary standpoint, this makes sense. The importance of things we encounter may not become clear until after the fact. Therefore, it’s usually not possible to design a structure around this data in advance.
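As a rough illustration of the difference, the Python sketch below contrasts a form designed in advance with free text that gets its structure after the fact. Everything in it is hypothetical: the `CallType` categories, the `FormComplaint` record, and the sample reports are our own inventions, and a keyword search is only a crude stand-in for what a semantic text system would do.

```python
from dataclasses import dataclass
from enum import Enum

class CallType(Enum):
    # Categories a form designer anticipated in advance (hypothetical).
    TELEMARKETING = "telemarketing"
    ROBOCALL = "robocall"
    OTHER = "other"

@dataclass
class FormComplaint:
    caller_number: str
    call_type: CallType

def from_form(caller_number: str, choice: str) -> FormComplaint:
    """Fit a report into the fixed schema; anything unanticipated becomes OTHER."""
    try:
        call_type = CallType(choice)
    except ValueError:
        call_type = CallType.OTHER
    return FormComplaint(caller_number, call_type)

def mentions(free_text_reports, keyword):
    """Impose structure after the fact: keep the free-text reports that mention a keyword."""
    return [r for r in free_text_reports if keyword.lower() in r.lower()]

# The fixed form has no slot for a new kind of scam, so the detail is lost.
structured = from_form("555-0101", "fake tax collector")
print(structured.call_type)

# The free text keeps the detail, so it can be found once it turns out to matter.
free_text = [
    "Someone posing as a tax collector demanded gift cards",
    "Robocall about a car warranty",
]
hits = mentions(free_text, "tax collector")
print(hits)
```

The structured record ends up in the catch-all `OTHER` bucket, while the free-text report can still be retrieved later, once someone realizes that "tax collector" calls are worth looking for.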

There’s a bear in the woods, or, Watson come quickly.

Usually, in these kinds of evolutionary examples a lion is used. We’ll use a bear instead. Let’s say there’s a bear in the woods. You thought the bear was tame, but in hindsight it turned out to be very dangerous. You previously thought the information you had about bears was not valuable, so you didn’t bother to organize it since that would involve unnecessary effort. If you’re able to go back and extract crucial information about bears from this previous unstructured data, your genes are much more likely to be passed on. Similarly, the vast majority of information streaming in to your senses will later turn out to be extraneous. By being able to leave this data unstructured until its importance becomes clearer, you’re able to save energy (computational effort) and will have an evolutionary advantage.

Unstructured text helps when the bear in the woods turns out dangerous. #watson



Tagged: abstract, analytics, art, business, careers, classic, data, education, famous, gadget, intelligent, Internet of Things, math, mine, more, novel, post, science, space, story, tech, us, watson, wolfram


There are 7 comments so far

[email protected] Author

10 years ago · Reply

We were curious to know what the folks at IBM thought about some of our proposed uses for Watson, so we posted to the IBM developer forum. Will Sennett of IBM was kind enough to write a detailed response on the IBM site.

Here’s an excerpt: “I’d have to dig a bit more at the FBI assistant example … certainly solutions in the big data and analytics realm that are a great fit for government…. On the HR side, I think you’re spot on. In fact, one of our Watson Mobile Developer
[Watson] application difficulty and complexity is probably dependent on the data ….”

Read his full response on the IBM forum.
Reviews of our app, or working more with governments on air quality | Acculation

10 years ago · Reply

[…] recent articles on IBM Watson analytics and Google Glass generated a lot of interest with people contacting us privately to ask for advice […]
Oh send in the trolls. Oh where are the trolls? There aren't any trolls.... | Acculation

10 years ago · Reply

[…] are all the trolls on the Internet? We have done our best to tick people off in this blog. We skewered Google Glass. We did not have kind words for IBM Watson’s marketing department. We’ve even poked fun […]
open semantic meaning platforms: alternatives to IBM Watson? | Acculation

10 years ago · Reply

[…] been a fan of IBM’s Watson semantic meaning analytics system since IBM first announced they were opening up their ecosystem. Around the time of CES we pointed […]
Ebola: Can big data or semantic text help?

10 years ago · Reply

[…] up the topic of semantic text systems. In our earlier article from April, we mentioned a “bear in the woods” scenario. The idea there is that structured data, such as the forms used in hospital […]
Ted Talks on IBM Watson & Bayes' rule in evolution

9 years ago · Reply

[…] of our most read articles have been on IBM Watson, including suggestions & possible alternatives. We’ve pushed IBM several times to come up with better demos for […]
Acculation Author

9 years ago · Reply

Twitter comments updated.
