Senior research scientist of the Knowledge Systems Laboratory at Stanford University, Dr. Deborah McGuinness, on "Why should you believe answers from systems on the web?"
I said I like wine and food, and this application is just a little demo that my students and I put up and built off of the Web Ontology Language guide. I wrote a little ontology about how to match wine and food (actually in the late 80s) that was used to show how description logics worked back then. We've used it for a lot of pedagogical examples, and it's been heavily tested by a lot of knowledge representation applications because it was out there for free. It's also the basis of this web ontology guide about how you might model your ontology. It's just a simple hacked-together interface of what you might do.
I might come to the wine agent to figure out the description of the wine that I'm going to drink with my meal, and then go to my local wine cellar and pull out wines that I know are in there. If nothing matches, or if I want to go out and buy something new, I can also send a query off to some online store and find out what matches.
Dr. Deborah McGuinness is associate director and senior research scientist of the Knowledge Systems Laboratory at Stanford University. This text continues her presentation of how the semantic web works.
While I could type in the description of the food, I could also just click on something here. I clicked on fish, but just generic fish. The knowledge base knows that you should have dry white varietals, and that medium-bodied wines match particularly well. And these little links next to them: if you click one, the application can tell you why it's suggesting dry, why it's suggesting white, et cetera.
It's got structured information about all of these wines. It knows that Semillon is a varietal and that Congress Springs is a winery and that this particular one is a medium-bodied, dry white wine. It's also got a connection to an agent that goes over to an online service and poses a query in real time.
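As a rough illustration of the idea (not the wine agent's actual implementation), the Python sketch below matches a food against structured wine descriptions and keeps a reason for each suggested property. The rule table, the reason strings, and the second cellar entry are invented placeholders; only the Congress Springs Semillon description comes from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    name: str
    winery: str
    varietal: str
    color: str
    body: str
    sugar: str


# Hypothetical pairing rules: food -> {property: (required value, reason)}.
# The reason strings are placeholders, not taken from the real ontology.
PAIRING_RULES = {
    "fish": {
        "color": ("white", "placeholder reason: the fish rule asks for white wine"),
        "sugar": ("dry", "placeholder reason: the fish rule asks for dry wine"),
        "body": ("medium", "placeholder reason: medium body suits generic fish"),
    },
}

# A tiny stand-in for the local wine cellar.
CELLAR = [
    Wine("Congress Springs Semillon", "Congress Springs", "Semillon",
         "white", "medium", "dry"),
    Wine("Example Cabernet", "Example Winery", "Cabernet Sauvignon",
         "red", "full", "dry"),  # invented entry for contrast
]


def recommend(food, cellar):
    """Return the wines whose properties satisfy every rule for the food."""
    rules = PAIRING_RULES[food]
    return [
        wine for wine in cellar
        if all(getattr(wine, prop) == value for prop, (value, _) in rules.items())
    ]


def explain(food, prop):
    """Answer 'why is it suggesting this?' for one property of the suggestion."""
    value, reason = PAIRING_RULES[food][prop]
    return f"{prop} should be {value} because {reason}"


if __name__ == "__main__":
    for wine in recommend("fish", CELLAR):
        print(wine.name)
    print(explain("fish", "color"))
```

Keeping the reason next to each rule is what lets the "why" links work: the explanation is looked up from the same structure that produced the suggestion.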
Why should you believe answers from systems on the web? If I tell you to go drink this wine with your meal, you might say I'm going to believe her. But if I tell you from a medical application to amputate your leg, you might want to ask why. You might want to know who put that information in, what it was based on, and how recent the data is.
So if a human or an agent wants to use and integrate system answers, they need to trust them. We believe that one of the enablers of trust is understanding, so you should make your system transparent to facilitate that understanding and trust. You want to provide information about what sources you used, how recently they were updated, and whether they were authoritative. And if you're using more complicated question-answering systems that use hybrid reasoning and manipulate and integrate information, you might have to explain all aspects of that.
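One way to picture this kind of transparency is an answer object that carries its own sources and reasoning steps. The Python sketch below is only an illustration under that assumption; the class names, fields, and sample values are hypothetical and do not reflect the proof markup language or any real system's format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Source:
    name: str
    last_updated: date
    authoritative: bool


@dataclass
class Answer:
    text: str
    sources: list
    reasoning_steps: list = field(default_factory=list)


def justify(answer):
    """Render the answer together with where it came from and how it was derived."""
    lines = [f"Answer: {answer.text}"]
    for source in answer.sources:
        kind = "authoritative" if source.authoritative else "non-authoritative"
        lines.append(
            f"Source: {source.name} ({kind}, updated {source.last_updated.isoformat()})"
        )
    for number, step in enumerate(answer.reasoning_steps, start=1):
        lines.append(f"Step {number}: {step}")
    return "\n".join(lines)


if __name__ == "__main__":
    # All values below are made up for the example.
    answer = Answer(
        text="Serve a dry, medium-bodied white wine",
        sources=[Source("Example wine ontology", date(2004, 1, 15), True)],
        reasoning_steps=["The meal is fish", "The fish rule asks for dry white wine"],
    )
    print(justify(answer))
```

A user or another agent can then decide how much to trust the answer by checking the source's age and authority instead of taking the result on faith.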
We've got a framework for explaining reasoning tasks, and you can store, exchange, combine, and annotate the information. The basis for this system is compatible with DAML+OIL and OWL. It's got a specification of proof and a proof markup language. There's a proof browser for displaying information that gets dumped, for abstracting that information, and for just providing source information.
There's a registration of everything: the inference engines, rules, languages, and ontologies that are used. You get to see the collections of question-answering and reasoning systems and ontologies. There's a proof-generation service so you can use an API and have your system generate the proof markup language automatically.
I might also want to merge two ontologies together. I do some applications in the automotive domain, where some call them cars and some call them autos. I want to know that those are exact synonyms, so the system can identify, from linguistic analysis and from some structural analysis, when somebody might want to merge those two terms together.
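To make the combination of linguistic and structural evidence concrete, here is a minimal Python sketch. It is an assumption-laden toy, not the merging tool's algorithm: the synonym lexicon, the use of neighbor-set overlap as the structural signal, the 0.5 threshold, and both sample ontologies are all hypothetical.

```python
# Hypothetical synonym lexicon standing in for real linguistic analysis.
SYNONYMS = {frozenset({"car", "auto"})}


def linguistic_match(term_a, term_b):
    """True when the names are identical or listed as synonyms."""
    a, b = term_a.lower(), term_b.lower()
    return a == b or frozenset({a, b}) in SYNONYMS


def structural_overlap(neighbors_a, neighbors_b):
    """Jaccard overlap of the terms each concept is connected to."""
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


def merge_candidates(onto_a, onto_b, threshold=0.5):
    """Suggest term pairs that look alike both by name and by structure."""
    candidates = []
    for term_a, neighbors_a in onto_a.items():
        for term_b, neighbors_b in onto_b.items():
            if (linguistic_match(term_a, term_b)
                    and structural_overlap(neighbors_a, neighbors_b) >= threshold):
                candidates.append((term_a, term_b))
    return candidates


if __name__ == "__main__":
    # Each ontology maps a term to the set of terms it is related to.
    ontology_a = {
        "Car": {"Vehicle", "Engine", "Wheel"},
        "Truck": {"Vehicle", "Engine", "Cargo bed"},
    }
    ontology_b = {"Auto": {"Vehicle", "Engine", "Door"}}
    print(merge_candidates(ontology_a, ontology_b))
```

The output is a list of suggestions for a person to confirm, which fits the talk's framing that the system identifies when somebody might merge two terms.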
The system works with a number of languages in an academic setting. It works with a lot of different namespaces and supports collaborative, distributed environments, so that a number of people can edit ontologies at the same time and merge them at the same time.
This one uses taxonomic information in most of its applications. It's still deployed on a lot of these sites today, seven years after I last touched this project. And all it's trying to do is knowledge-enhanced search.
One of the initial places to deploy this was on a community website that had information about restaurants and stores. It had 22 different real estate offices, and somebody did a search for homes for sale and found no homes and no real estate agents, because none of the agents had that phrase on their web page. This was not a very good use of search. But if I had just a small amount of background knowledge, knowing that homes for sale were something that realtors could facilitate, you could actually get those realtor listings and make them available.
This was deployed using a mainstream search interface (Verity in this case), although it's been reengineered to work with other search engines. It has a small amount of background knowledge about what real estate agencies provide and what restaurants provide. It has a GUI for supporting high school students in maintaining these ontologies, a collaborative topic set tool that lets a lot of different people build those ontologies, and then it's got a lot of content either in databases or on the web.
This was just a small application that was aimed at the high school paper initially and then kind of grew out. So this world only works with services, food and drink, shops and stores, and arts and entertainment, and if you want to look at any of these in more detail, you can just click on one on this particular website.
The next slide checks beauty. All the data that we got for this site was supposed to be encoded with a Standard Industrial Classification (SIC) code, and beauty is one of those codes. But a search over the data that used that encoding, or that used the terms the merchant actually put in, only found two "beauty shops" in the entire account, Woolworth's and one particular parlor, because none of the others were tagged with the word "beauty," and beauty didn't show up in the actual listing. But if you add a little bit of background information about what beauty salons typically have, they might have hair design, manicures, pedicures, hair care, et cetera. Now when somebody does a search on that site and looks for beauty, it also looks for the things at beauty parlors like these, and this is all the content on that particular site. It matches on five different words, and so it actually gets hits.
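The beauty example amounts to expanding the query with background knowledge before matching. The Python sketch below shows the idea under simple assumptions: substring matching instead of a real search engine such as Verity, and made-up listing text. Only the expansion terms (hair design, manicures, pedicures, hair care) come from the talk.

```python
# Background knowledge: a query term and the things typically found under it.
BACKGROUND = {
    "beauty": ["hair design", "manicures", "pedicures", "hair care"],
}

# Hypothetical site content; the listing text is invented for the example.
LISTINGS = {
    "Woolworth's": "Beauty supplies and general merchandise",
    "Example Salon": "Hair design, manicures, and pedicures",
    "Example Hardware": "Tools, paint, and garden supplies",
}


def expand(query):
    """Return the query plus any related terms from the background knowledge."""
    term = query.lower()
    return [term] + BACKGROUND.get(term, [])


def plain_search(query, listings):
    """Baseline: match only the literal query word."""
    term = query.lower()
    return [name for name, text in listings.items() if term in text.lower()]


def enhanced_search(query, listings):
    """Knowledge-enhanced search: match the query or any of its expansions."""
    terms = expand(query)
    return [
        name for name, text in listings.items()
        if any(term in text.lower() for term in terms)
    ]


if __name__ == "__main__":
    print(plain_search("beauty", LISTINGS))
    print(enhanced_search("beauty", LISTINGS))
```

With the expansion, "beauty" matches on five different words in total, so a salon that never uses the word "beauty" in its listing is still found.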
In the testing that we did, we usually got 80-90% of the retrievals that we should be getting. One possibly bad thing that came up as a result of doing this matching was that we got a plastic surgeon as a retrieval for a beauty parlor, because he did face-lifts, facials, and other things that were associated with beauty. But that was close enough that most people thought that was a good tradeoff to make.
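In retrieval terms, the 80-90% figure is recall, and the plastic surgeon is a cost in precision. The short Python sketch below computes both for one query; the listing names and counts are hypothetical and chosen only to mirror the tradeoff described.

```python
def recall(retrieved, relevant):
    """Share of the listings we should have found that were actually found."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def precision(retrieved, relevant):
    """Share of the listings we returned that really belonged in the results."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


if __name__ == "__main__":
    # Ten listings that a "beauty" search should ideally return (made up).
    relevant = {f"salon_{i}" for i in range(10)}
    # The search found nine of them, plus one loosely related extra hit.
    retrieved = {f"salon_{i}" for i in range(9)} | {"plastic_surgeon"}
    print(f"recall={recall(retrieved, relevant):.2f}")
    print(f"precision={precision(retrieved, relevant):.2f}")
```

Expanding the query raises recall by finding listings that never mention the search word, and an occasional loosely related hit is the price paid in precision.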
Now I'm going to conclude and go to questions. My view is that the semantic web evolution has really started. I think the time is now to start using these technologies to enable your applications. The markup languages are stable and available and useful. A number of top-level ontologies are out there in existence. I can't say there's one and only one that everybody should start from, because that's not the case. There are a lot, and they're good starting points.
Environments and tools are available in the academic space, and they're coming from the commercial space. A lot of different application areas benefit from precise semantics on the web, the simplest one being search and more complicated ones being configuration. The wine agent shows a smarter search, a simple configuration application, and simple applications of agent usage. The inference web shows one solution path to handling a trust problem. The finder shows a really simple application that doesn't need any rocket scientists or rocket science tools to build it.