This is a simple scenario-building exercise for Telecommunications & Information Technology, which I thought of conducting online. It is just an experiment, and its results could point us to opportunities and warn us about coming threats. I invite everyone connected with the field to participate. In "The Art of the Long View", Peter Schwartz, the master scenario-builder, says that in such an exercise it is important to learn the opinion of everyone related to the focus of the scenario... and so here we go.
Ken Fromm: (Mr. Fromm is an independent consultant to web application and semantic-based startup companies.) A couple of the challenges out there: I know that Scott is involved with some FCC reform issues and some efforts of the SEC in content and digital rights. Do you want to talk a bit about content regulation?
Scott Rafer: (Mr. Rafer is CEO and president of Feedster.) There's content overlap with user overlap. The first time we'll touch FOAF in a meaningful way is when you take everyone's RSS reading lists. Because of the stuff I've done in Wi-Fi, some people want to know what I read on wireless, people outside the Valley who aren't very technical.
My first degree of FOAF stringing, filtered by Feedster for my wireless feeds, is an application that is beginning to crop up. And what all this is predicated on is effectively the current definition of fair use, as it's implemented in the United States. The FCC is trying to crush fair use and actually get to the point where it is regulating software innovation. Under rules passed in November, there are actually kinds of software in the broadcast video world that are illegal to open source. Going down the slippery slope of the FCC saying what we can and cannot put under the LGPL, for instance, is a real problem.
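Rafer's "first degree of FOAF stringing, filtered for my wireless feeds" can be illustrated with a minimal sketch. It assumes the FOAF "knows" links and each person's RSS reading list have already been collected into plain dictionaries; all people, feed names, and topic labels here are hypothetical, and this is not a description of how Feedster actually implemented it.

```python
from collections import Counter

# Hypothetical FOAF-style "knows" links: person -> people they directly know.
KNOWS = {
    "scott": ["alice", "bob"],
    "alice": ["scott", "carol"],
    "bob": ["dave"],
}
# Hypothetical RSS reading lists: person -> feeds they subscribe to.
READING_LISTS = {
    "alice": ["wifi-weekly", "ontology-digest"],
    "bob": ["wifi-weekly", "mesh-notes"],
    "carol": ["mesh-notes"],
}
# Hypothetical topic label for each feed (the "filter" step).
FEED_TOPICS = {
    "wifi-weekly": "wireless",
    "mesh-notes": "wireless",
    "ontology-digest": "semantic-web",
}

def first_degree_feeds(person, topic):
    """Feeds on `topic` read by the people `person` directly knows,
    ranked by how many of those people read each feed."""
    counts = Counter()
    for friend in KNOWS.get(person, []):  # first degree only
        for feed in READING_LISTS.get(friend, []):
            if FEED_TOPICS.get(feed) == topic:
                counts[feed] += 1
    return counts.most_common()

print(first_degree_feeds("scott", "wireless"))
```

Restricting the walk to direct contacts keeps the result small and personal; extending it to friends-of-friends would only require one more loop over KNOWS.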
Fromm: Nova, do you want to talk about some of the challenges going forward with ontologies?
Nova Spivack: (Mr. Spivack is the CEO and president of Radar Networks.) There are several big missing pieces right now in building the semantic web. Certainly the lack of ontologies is a major issue. Then again, there are, I guess Deborah would say, thousands of ontologies, so from one perspective there may not be a lack; there may be too many. When you start looking at these ontologies, what you find is that some of them are overly specialized; maybe they are focused, for example, on particular niches of interest to DARPA, which are not of great use to consumers unless you live in New York (with the paranoia that we all experience there).
But anyway, there are a lot of ontologies about medicine, and then there are upper-level ontologies that try to define abstract, philosophical concepts. But if you're an end user, what you really need are ontology sets that help you work with the types of information and relationships you deal with every day, or when you're shopping, for example.
Currently, there is no good human-readable mid-level ontology that covers common-sense concepts. Cycorp probably has the most impressive ontology. The only problem is that it's so big and complex, with such a steep learning curve before you can actually do anything with it, that it's not really targeted at the needs of normal developers and regular end users. The lack of a good, open ontology that covers common-sense concepts is a big problem. That's something we're working on, too. I think that ultimately something like that ought to come out of the W3C, or be handed to the W3C at some point, to provide at least a basis for describing certain types of entities and relationships that we all have to use in our applications.
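To make "a mid-level ontology covering common-sense concepts" concrete, here is a minimal sketch of the kind of structure involved: a handful of everyday classes linked by subclass ("is-a") relationships, plus a transitive check. The class names are hypothetical, and real ontology languages such as RDFS or OWL express much more (properties, constraints, instances).

```python
# Hypothetical mid-level ontology: each class maps to its direct parent classes.
SUBCLASS_OF = {
    "Restaurant": ["Business", "Place"],
    "Bookstore": ["Store"],
    "Store": ["Business", "Place"],
    "Business": ["Organization"],
    "Place": ["Thing"],
    "Organization": ["Thing"],
}

def ancestors(cls):
    """All classes that `cls` is (transitively) a subclass of."""
    seen = set()
    stack = list(SUBCLASS_OF.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # avoid revisiting shared ancestors
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, []))
    return seen

def is_a(cls, other):
    """True if `cls` is `other` or a (transitive) subclass of it."""
    return cls == other or other in ancestors(cls)

print(is_a("Bookstore", "Place"))
print(sorted(ancestors("Restaurant")))
```

Even this toy shows why a shared, open vocabulary matters: two applications can only agree that a bookstore is a place if they draw on the same hierarchy.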
Audience question: I was involved in the building of the in-flight medical language system for a company. The head of the company often said, "You got to take a top-down, bottom-up approach." Can you speak to any kind of bottom-up approach that you've used for building ontologies, like noun phrase extraction?
Spivack: Certainly, approaches like that have been attempted. If you try to build an ontology from the bottom up, you can basically get a lot of clusters of things, but somebody then has to go and figure out what they mean. There are lower-level systems like WordNet, for example, which is used a lot in the natural language processing community; essentially, it defines words and their relationships to other words. That work is being done at a low level.
The next step is to take a corpus of information and somehow figure out a way to automate the connections from that information to some ontology. With WordNet that is relatively easy (you can match words), but if you want to use higher-level concepts, it is hard. Suppose, for example, you have an ontology of different types of companies, and you take SEC documents and try to match each document to the particular types of companies and industries it describes: that is a hard problem. Even if you have a good ontology, it turns out that you need some pretty clever algorithms to do a decent job of clustering things.
So associating data with ontologies is a problem. When it comes to building ontologies, I come from the top-down school of thought. I've never seen a bottom-up ontology that I liked, and there aren't many. Having built many ontologies, I think the amount of thinking that goes into doing it well is so intensive that, at least without great AI, I just don't think we'll be able to do it anytime in the next couple of decades.
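The "easy" word-matching approach that Spivack contrasts with higher-level concept matching can be sketched as a naive keyword-overlap classifier. This assumes a hypothetical ontology of company types, each with a hand-picked keyword list; it is a baseline, not the "pretty clever algorithms" he says are needed, and it ignores synonyms, word sense, and context entirely.

```python
import re

# Hypothetical ontology of company types, each with hand-picked keywords.
CATEGORY_KEYWORDS = {
    "Telecommunications": {"wireless", "spectrum", "carrier", "broadband"},
    "Pharmaceuticals": {"drug", "clinical", "fda", "trial"},
    "Software": {"software", "license", "subscription", "platform"},
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def best_category(text):
    """Return the category whose keywords overlap most with the document,
    or None if no keyword matches at all."""
    words = tokenize(text)
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    cat, score = max(scores.items(), key=lambda kv: kv[1])
    return cat if score > 0 else None

filing = "The company acquired wireless spectrum to expand its broadband carrier network."
print(best_category(filing))
```

A filing that says "handsets" instead of "wireless", or that mentions a drug trial only in passing, would be misfiled, which is exactly the gap between matching words and matching concepts.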
Question: Deborah had a slide in her presentation with a very thin red line and a very wide chasm between formal ontologies and informal ontologies. Right now, the bulk of the web, the Google experience that most people are using, is off to the left-hand side of that slide. So there are really two chasms to cross. Getting users across to the kind of information that would be beyond the second chasm, and getting them comfortable with what that means and how they can use it and supply it, seems like a very big challenge.
Spivack: Two things are interesting. One is that the tools are actually encoding the data. When you talk about describing something, look at something like Movable Type: when it creates the information, it's tagging things, and it lets the user simply input data into a form that's quite easy to use. The other is that the information is described just enough to be useful for the application. Sometimes you only need to add a little bit of description to make something a lot more useful.
If you think about it, a search engine is just another kind of agent, and people are already using things like this now. Evolving the search engines, and evolving the information that's encoded by the applications, could give you quite a bit of functionality.
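The idea that a tool like Movable Type encodes data as the user fills in a form can be sketched as follows: plain form fields are turned into an RSS 2.0 item, with each user-supplied tag emitted as a <category> element. The function name and field values are hypothetical; this only illustrates how a little structured description rides along with free text, not Movable Type's actual templates.

```python
import xml.etree.ElementTree as ET

def form_to_rss_item(title, link, body, tags):
    """Turn simple blog-form fields into an RSS 2.0 <item> string.
    Each tag becomes a <category> element: a small amount of structured
    description added on top of the free-text body."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = body
    for tag in tags:
        ET.SubElement(item, "category").text = tag
    return ET.tostring(item, encoding="unicode")

print(form_to_rss_item(
    "Panel notes",
    "http://example.com/2005/panel-notes",
    "Notes from the semantic web panel.",
    ["semantic web", "ontologies"],
))
```

The user never writes markup; the tool adds just enough structure (title, link, categories) for an aggregator or search engine to do something useful with the post.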
Rafer: The problem the existing search engines face is that their crawlers can't support this, never mind their indexing. So we had to start from scratch on data. We had 49 feeds in March 2003.
The second gap is still up to developers. One of the things we do, which a lot of people don't use yet but increasingly do, is provide every set of search results as XML as well: you get RSS out of our engine. So to the extent that you want to take advantage of all our Booleans, everything in our table, you can do that, and start creating little links that provide feeds of information pulled out of our index, filtered however you want. That gets you at least toy-level applications over that second gap, too. People in corporate business intelligence departments are doing competitive research this way, at least at the prototype stage.
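Rafer's point about building small applications on top of search results delivered as RSS can be sketched as a filter over a results feed. The sketch assumes the feed has already been fetched (the discussion gives no Feedster URL format, so an inline sample with hypothetical items stands in for it) and keeps only items whose title or description mention every required term, as one might in a competitive-research prototype.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 search-results feed standing in for a fetched one.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Search results: wireless</title>
  <item><title>Carrier cuts Wi-Fi prices</title>
        <link>http://example.com/a</link>
        <description>A competitor lowers hotspot pricing.</description></item>
  <item><title>New mesh router ships</title>
        <link>http://example.com/b</link>
        <description>Hardware review.</description></item>
</channel></rss>"""

def filter_items(feed_xml, required_terms):
    """Return (title, link) pairs for items whose title or description
    contains every term in required_terms (case-insensitive substring match)."""
    channel = ET.fromstring(feed_xml).find("channel")
    results = []
    for item in channel.findall("item"):
        text = " ".join(
            (item.findtext(field) or "") for field in ("title", "description")
        ).lower()
        if all(term.lower() in text for term in required_terms):
            results.append((item.findtext("title"), item.findtext("link")))
    return results

print(filter_items(SAMPLE_FEED, ["competitor", "pricing"]))
```

Because the output is just a list of (title, link) pairs, it could itself be re-emitted as a new feed, which is the "little links that provide feeds" pattern Rafer mentions.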
Member Comments
A few quick remarks:
(a) A refreshing view on ontologies can be found in a recent interview with Tom Gruber; see http://www.sigsemis.org (SIGSEMIS bulletin, vol. 1 (3), page 4).
(b) Here is a sample list of some openly available, not-so-small ontologies of varying quality, developed for a variety of purposes:
- TAP, which covers domains of possible interest to end users/consumers: http://tap.stanford.edu/tap/download.html
- SWETO, which is more useful for testing the scalability and performance of SW tools (populated with a million+ entity and relationship instances of facts): http://lsdis.cs.uga.edu/Projects/SemDis/sweto/ (accessible under Creative Commons terms)
- GlycO, which shows how extensive a domain ontology can be; the first version has 767 classes, is 11 levels deep, and was developed by domain scientists: http://lsdis.cs.uga.edu/Projects/Glycomics/index.php?page=5 (note: a KB with hundreds of thousands of assertions will be posted in the near future) [Open source]
(c) Here are examples of some real-world domain/application/task ontologies developed for commercial enterprises/customers (so not publicly available), with an average of 1 million entity instances (and even more relationship instances), and a couple with over 10 million each: Financial Market, Terrorism, Pharma, Anti-money laundering, Equity Research, Repertoire Management.
(d) Tens of high-quality ontologies (mostly developed by extensive human/committee efforts, unlike commercial-use ontologies, which were primarily populated using high-quality knowledge sources) can be found in the biology and medical domains, which are at the forefront of using ontologies.
(e) Several well-known Internet businesses seem to be adopting more formal knowledge organization using "ontologies" for products, books, etc., and are considering coupling them with Web Services (i.e., Semantic Web Services).
Additional comments based on experiences from building real-world ontologies are on pages 46-53 of this KMWorld 2004 presentation: http://www.kmworld.com/kmw04/presentations/Sheth.pdf
I think the world of Scott Rafer. He's a brilliant guy and if I had serious money to invest, Feedster would be near the top of my list. (Oops ... I only invest in companies in China. ;-)
However, the SDF needs to supplement its semantic web seminar with academic gurus. It's really the academics who are leading the semantic technologies charge.
I did a quick search, and there are over 40 publicly accessible links in my Furl archive, including many academic links. See http://www.furl.net/members/goldentriangle and search with: "semantic web" OR "semantic technologies". I also have a folder specifically on semantic technologies called "Semantic Web". It's searchable, too.