Text Categorization Agents

The Problem
Automated methods are needed to identify illegal activity on the Internet. Much content on the Internet is itself illegal (fraud, for example) or indicates illegal activity, such as discussions about hacking, drug-making, and methods of terrorism, or the trading of pornographic images of children. Law enforcement organizations need to control and counter such illegal Internet activities, but sifting through vast volumes of text manually is not feasible in view of the limited manpower available.

The Solution: NewsHound and ChatHound
Intelligent software agents that can efficiently and reliably locate, read, and categorize Internet text are the solution for coping with the volume of text. To label the documents they read, the agents use text categorization, a machine learning methodology. The agents are trained on samples of positive and negative documents related to a chosen topic, and they generalize from those samples to classify any document efficiently and accurately. ANSER uses two agents: NewsHound, which helps law enforcement monitor Usenet newsgroups, and ChatHound, which monitors Internet chat rooms.
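The idea of training on positive and negative samples can be illustrated with a small sketch. The text does not say which categorization algorithm the agents use, so the example below substitutes a two-class naive Bayes classifier with add-one smoothing purely as an illustration. The class name NaiveBayesCategorizer, the tokenize helper, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCategorizer:
    """Illustrative two-class naive Bayes; not the agents' actual algorithm."""

    def __init__(self):
        # Word and document counts, kept separately for each label.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocabulary = set()

    def train(self, text, is_positive):
        """Add one labeled sample (positive or negative) to the model."""
        tokens = tokenize(text)
        self.word_counts[is_positive].update(tokens)
        self.doc_counts[is_positive] += 1
        self.vocabulary.update(tokens)

    def _log_score(self, tokens, label):
        # Log prior plus smoothed log likelihood of each known token.
        # Requires at least one training sample for each label.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            if token in self.vocabulary:
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def is_match(self, text):
        """Return True if the text looks more like the positive samples."""
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hypothetical training data for a "fraud" topic.
categorizer = NaiveBayesCategorizer()
categorizer.train("wire me money and double your investment guaranteed", True)
categorizer.train("guaranteed returns send money today investment", True)
categorizer.train("meeting about the garden club schedule", False)
categorizer.train("recipe for apple pie with cinnamon", False)

print(categorizer.is_match("guaranteed investment send money"))  # True
print(categorizer.is_match("garden club recipe"))                # False
```

A real agent would need far more training samples than this, and the choice of algorithm and features would affect both speed and accuracy.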

NewsHound applies text categorization to automatically monitor Usenet newsgroup posts. NewsHound agents learn to recognize content by collecting a set of training documents and running our text categorization algorithm on them. The agent then monitors selected newsgroups for posts that match the learned categories and saves those posts for follow-up by law enforcement personnel. That way, instead of having to monitor millions of posts to Usenet newsgroups, law enforcement officers can concentrate on perhaps a few hundred posts that NewsHound identifies as being of interest.
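The monitoring step described above, which reduces a large stream of posts to a short list for review, might look roughly like the following sketch. It is not NewsHound's implementation. The Post record, the keyword_score stand-in scorer, the threshold, and the newsgroup names are assumptions made for illustration; a trained categorizer would take the place of the keyword scorer.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical minimal representation of a newsgroup post."""
    newsgroup: str
    message_id: str
    body: str


def keyword_score(body, weights):
    """Stand-in scorer: sum the weights of known words in the post body."""
    return sum(weights.get(word, 0.0) for word in body.lower().split())


def select_posts(posts, watched_groups, score, threshold):
    """Keep posts from watched newsgroups whose score meets the threshold."""
    saved = []
    for post in posts:
        if post.newsgroup not in watched_groups:
            continue  # ignore newsgroups that are not being monitored
        if score(post.body) >= threshold:
            saved.append(post)  # retained for follow-up by a human reviewer
    return saved


# Hypothetical weights and posts for a "fraud" category.
weights = {"guaranteed": 1.0, "investment": 1.0, "wire": 0.5}
posts = [
    Post("misc.invest", "<1@example>", "guaranteed investment wire funds now"),
    Post("misc.invest", "<2@example>", "which index fund has the lowest fees"),
    Post("rec.gardening", "<3@example>", "guaranteed investment in tomatoes"),
]

matches = select_posts(
    posts,
    watched_groups={"misc.invest"},
    score=lambda body: keyword_score(body, weights),
    threshold=2.0,
)
for post in matches:
    print(post.newsgroup, post.message_id)
```

Passing the scorer in as a function keeps the filtering loop independent of whichever categorization method produces the scores.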

ChatHound monitors public Internet Relay Chat rooms for content of interest to the law enforcement community. It saves the text of the conversations, together with identifying information, so that they can be reviewed asynchronously by law enforcement personnel. This allows law enforcement to focus on the discussions of interest rather than having to wade through volumes of irrelevant information in real time.
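As an illustration of saving conversation text together with identifying information, the sketch below parses a raw Internet Relay Chat channel message into a record that could be stored for later review. The text does not specify what ChatHound records, so the fields chosen here (nickname, user, host, channel, timestamp), the to_record function, and the sample line are assumptions.

```python
import json
import re
from datetime import datetime, timezone

# Matches a raw IRC line of the form ":nick!user@host PRIVMSG target :text".
PRIVMSG = re.compile(
    r"^:(?P<nick>[^!\s]+)!(?P<user>[^@\s]+)@(?P<host>\S+)"
    r" PRIVMSG (?P<target>\S+) :(?P<text>.*)$"
)


def to_record(raw_line, received_at):
    """Turn one raw IRC line into a dict, or None if it is not a PRIVMSG."""
    match = PRIVMSG.match(raw_line.rstrip("\r\n"))
    if match is None:
        return None
    record = match.groupdict()
    # Store the receipt time so the conversation can be reviewed later.
    record["received_at"] = received_at.isoformat()
    return record


# Hypothetical sample line and timestamp.
line = ":alice!al@example.org PRIVMSG #chat :hello there\r\n"
record = to_record(line, datetime(2000, 1, 1, tzinfo=timezone.utc))

# One JSON object per line is a simple format for asynchronous review.
print(json.dumps(record))
```

Lines that are not channel or private messages, such as server pings, return None and can simply be skipped.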

Other Law Enforcement Applications
Text categorization agents can monitor many types of electronic documents (for example, on personal computers or the World Wide Web) for content related to terrorism, drug operations, child exploitation, or electronic crime. Text categorization may also be used to automate or assist data entry. Crimes reported online can be categorized and prioritized using text categorization in either an automatic or an assisting mode.
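One way to read the distinction between automatic and assisting modes is sketched below: a report is categorized automatically when the categorizer is confident, and otherwise the top category is only suggested to a person. This is an interpretation, not a described design. The triage_report and prioritize functions, the 0.9 threshold, the scores, and the priority ranks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Triage:
    """Hypothetical result of categorizing one online crime report."""
    category: str
    confidence: float
    mode: str  # "automatic" or "assist"


def triage_report(category_scores, auto_threshold=0.9):
    """Pick the top-scoring category and decide how it should be applied.

    category_scores maps category names to scores in [0, 1] and must
    contain at least one entry.
    """
    category, confidence = max(category_scores.items(), key=lambda kv: kv[1])
    # High confidence: file the report automatically. Otherwise the
    # category is only a suggestion for a person to confirm.
    mode = "automatic" if confidence >= auto_threshold else "assist"
    return Triage(category, confidence, mode)


def prioritize(triaged, priority):
    """Order reports by category rank; unknown categories go last."""
    return sorted(triaged, key=lambda t: priority.get(t.category, len(priority)))


# Hypothetical scores and priority ranks (lower rank is handled first).
priority = {"child exploitation": 0, "terrorism": 1, "electronic crime": 2}
reports = [
    triage_report({"electronic crime": 0.95, "terrorism": 0.05}),
    triage_report({"terrorism": 0.6, "electronic crime": 0.4}),
]
for item in prioritize(reports, priority):
    print(item.category, item.mode)
```

The threshold sets how much work is handed to people: raising it sends more reports to the assisting mode for human confirmation.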