Watson from IBM: Why semantic text tech helps analytics
More significantly, the “bad guys” (if you will) are aware of these databases, and may take steps to evade automated surveillance. They may constantly change their calling phone number to avoid being blacklisted. Thus, real humans are still needed to go into these databases and study the unstructured text. If you see people writing descriptions of the same kind of illegal phone call coming from constantly changing numbers, you want to assign someone to investigate further. This is why unstructured text is required: there is no way to know in advance how scammers or criminals will behave. If you tried to anticipate every possible situation with a pre-designed form, the “bad guys” would modify their behavior to “game” the pre-designed form and avoid surveillance.
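To make the idea concrete, here is a minimal sketch of how one might flag complaints whose descriptions look alike even though the calling numbers differ. Everything in it is our own assumption for illustration: the function names, the sample complaints, the simple word-overlap (Jaccard) measure, and the 0.5 threshold. A real semantic system such as Watson would do far more than count shared words.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercase word set for a complaint description."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Word overlap between two sets, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_similar_complaints(complaints, threshold=0.5):
    """Return number pairs whose descriptions are similar but whose numbers differ."""
    flagged = []
    for (num_a, text_a), (num_b, text_b) in combinations(complaints, 2):
        if num_a != num_b and jaccard(tokens(text_a), tokens(text_b)) >= threshold:
            flagged.append((num_a, num_b))
    return flagged

# Hypothetical sample data: (calling number, free-text description)
complaints = [
    ("555-0101", "Caller claimed to be from the tax office and demanded gift cards"),
    ("555-0188", "Caller claimed to be from the tax office and demanded payment in gift cards"),
    ("555-0142", "Robocall offering an extended car warranty"),
]

print(flag_similar_complaints(complaints))
```

With this sample data, the two tax-office complaints are paired despite their different numbers, which is the kind of lead you would hand to a human investigator.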
FBI and Crimestoppers databases and Dr. Watson
The more important examples are things like the FBI IC3 database and the Crimestoppers database. In part for the reasons we just mentioned, unstructured text plays an essential role in the complaint forms used by citizens. So, at present, a human is still required to read each complaint. Since any one human can only read a small part of these databases, they might miss complex crime trends. This is where a technology like Watson might come in. Watson could read every complaint in FBI IC3 or Crimestoppers, and potentially catch significant details in the aggregate unstructured text that would be overlooked by necessarily more myopic humans. (Note that since every scam involving the Internet is potentially an FBI IC3 complaint, this database of citizen complaints is huge. Most complaints are ultimately ignored due to lack of resources. Watson could help here by connecting dots that are otherwise overlooked.)
We’ve mentioned Yelp for finding restaurants. Although it’s controversial (and there may be legal hazards to those making complaints), it also serves as a repository for citizen complaints against businesses. (There are many other companies recording these kinds of complaints, with various levels of complaint vetting, different business models, and different amounts of controversy.) Yelp and similar databases are often largely public, so the government (or Yelp’s owners) could have Watson analyze this data to find previously undiscovered trends as well.
The IBM Watson team is asking people with suggestions for how Watson’s technology might be used to make submissions to them. Since this application occurred to us, we thought we’d just throw it out here. (IBM has specific guidelines. For example, the application must involve unstructured text, as well as involving human language questions for Watson to answer. In this case, the questions to Watson would be about ideal allocation of resources, or whether there were unusual new patterns appearing in the aggregate submissions.)
Dr. Watson the spook
We’re limiting our discussion here to unclassified government databases, which must be huge and full of unstructured text. Of course, the government also has those “other” databases we’ve heard so much about recently. We’re guessing Watson already passed the background checks for a security clearance, so that one’s probably covered. It’s pretty obvious the technology hasn’t made it around to “more civilian” government applications yet, however.
Forming a posse in the digital age
Maybe semantic processing of text and crowd-sourced law enforcement is how you form a posse in the digital age. (Or maybe we just think that because we’re Americans located out West with an engineering bent. Posses have obvious legal and ethical hazards, and we’re not being entirely serious in our enthusiasm for the crowd-sourced posse thing here. Perhaps on the emerging risks side, these new technologies might facilitate a dangerous new vigilantism.)
The problem with unstructured text
To date, computers have been mainly good at processing structured text. This means a designer or software developer needs to anticipate in advance how data will be used. They need to carefully design a database schema or web form that captures these use cases in advance.
Humans, on the other hand, are especially good at processing unstructured data. (We may create a structure around that data after the fact.) From an evolutionary standpoint, this makes sense. The importance of things we encounter may not become clear until after the fact. Therefore, it’s usually not possible to design a structure around this data in advance.
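A rough sketch of the contrast, under our own assumptions: `ComplaintForm`, `raw_notes`, and `extract_after_the_fact` are hypothetical names invented for this illustration, and the plain substring match stands in for the much richer analysis a semantic system would perform.

```python
from dataclasses import dataclass

# Hypothetical pre-designed form: it holds only what the designer anticipated.
@dataclass
class ComplaintForm:
    category: str
    phone_number: str

# The form has no field for how the caller wanted to be paid.
form = ComplaintForm(category="robocall", phone_number="555-0101")

# Free-text notes kept as written, with no structure imposed up front.
raw_notes = [
    "Robocall about a car warranty, hung up.",
    "Caller asked me to pay a fine with gift cards.",
    "Someone wanted gift cards to clear a tax debt.",
]

def extract_after_the_fact(notes, phrase):
    """Build a structure later, once we know which detail matters."""
    return [
        {"note_id": i, "mentions": phrase}
        for i, note in enumerate(notes)
        if phrase in note.lower()
    ]

print(extract_after_the_fact(raw_notes, "gift cards"))
```

The form could never have surfaced the gift-card detail because nobody designed a field for it, while the raw notes still contain it and can be organized once that detail turns out to matter.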
There’s a bear in the woods, or, Watson come quickly.
Usually, in these kinds of evolutionary examples a lion is used. We’ll use a bear instead. Let’s say there’s a bear in the woods. You thought the bear was tame, but in hindsight it turned out to be very dangerous. You previously thought the information you had about bears was not valuable, so you didn’t bother to organize it since that would involve unnecessary effort. If you’re able to go back and extract crucial information about bears from this previous unstructured data, your genes are much more likely to be passed on. Similarly, the vast majority of information streaming in to your senses will later turn out to be extraneous. By being able to leave this data unstructured until its importance becomes clearer, you’re able to save energy (computational effort) and will have an evolutionary advantage.
We were curious to know what the folks at IBM thought about some of our proposed uses for Watson, so we posted to the IBM developer forum. Will Sennett of IBM was kind enough to write a detailed response on the IBM site.
Here’s an excerpt: “I’d have to dig a bit more at the FBI assistant example … certainly solutions in the big data and analytics realm that are a great fit for government…. On the HR side, I think you’re spot on. In fact, one of our Watson Mobile Developer
[Watson] application difficulty and complexity is probably dependent on the data ….”
Read his full response on the IBM forum.