Text Categorization Agents
The Problem
Automated methods to identify illegal activity on the Internet are
needed. Much Internet content is itself illegal (fraud, for example) or indicates
illegal activity, such as discussions about hacking, drug-making, methods
of terrorism, and trading pornographic images of children. Law enforcement
organizations need to control and counter such illegal Internet activities.
Sifting through vast volumes of text manually is not feasible
given the limited manpower available.
The Solution: NewsHound and ChatHound
Automated methods to identify illegal activity on the Internet are needed
to cope with the volume. Intelligent software agents able to efficiently
and reliably locate, read, and categorize Internet text are the solution.
To label the documents they read, the software agents use text categorization,
a machine learning methodology. With it, the agents are trained to efficiently
and accurately classify any document by generalizing from samples of positive
and negative documents related to a chosen topic. ANSER uses two agents:
NewsHound to help law enforcement monitor Usenet newsgroups, and ChatHound
to monitor Internet chat rooms.
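The idea of generalizing from positive and negative sample documents can be illustrated with a small sketch. The source does not say which learning algorithm the agents use, so the example below assumes a multinomial naive Bayes classifier with add-one smoothing as a stand-in; the class name, function names, and training sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Binary multinomial naive Bayes with add-one (Laplace) smoothing.

    An assumed stand-in, not the algorithm described in the source.
    """

    def fit(self, positive_docs, negative_docs):
        # Count word occurrences separately for each class.
        self.counts = {True: Counter(), False: Counter()}
        self.doc_totals = {True: len(positive_docs), False: len(negative_docs)}
        for label, docs in ((True, positive_docs), (False, negative_docs)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def log_odds(self, text):
        """Return log P(positive | text) minus log P(negative | text)."""
        n_docs = self.doc_totals[True] + self.doc_totals[False]
        scores = {}
        for label in (True, False):
            total_words = sum(self.counts[label].values())
            score = math.log(self.doc_totals[label] / n_docs)  # class prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                smoothed = self.counts[label][word] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            scores[label] = score
        return scores[True] - scores[False]

    def predict(self, text):
        """True when the text looks more like the positive samples."""
        return self.log_odds(text) > 0


# Hypothetical training samples for a "prize fraud" topic.
positive = [
    "wire the money now to claim your lottery prize",
    "send your bank account number to claim the prize",
    "urgent transfer fee required to release your winnings",
]
negative = [
    "the meeting is moved to tuesday afternoon",
    "new kernel release fixes the network driver",
    "recipe for bread needs flour water and yeast",
]

classifier = NaiveBayesTextClassifier().fit(positive, negative)
print(classifier.predict("claim your prize by sending a transfer fee"))
print(classifier.predict("the kernel driver release"))
```

A production system would need far more training documents per topic and a held-out set to measure how often the classifier is wrong in each direction.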
NewsHound applies
text categorization to automatically monitor Usenet newsgroup posts. NewsHound
agents learn to recognize content by collecting a set of training documents
and running our text categorization algorithm. The agent then monitors
selected newsgroups for posts that match the categories. NewsHound saves
the posts for follow-up by law enforcement personnel. That way, instead
of having to monitor millions of posts to Usenet newsgroups, law enforcement
officers can concentrate on perhaps a few hundred posts that NewsHound
identifies as being of interest.
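The monitoring step described above, in which posts from selected newsgroups are scored and the matches are kept for follow-up, might be sketched as follows. This is only an illustration under stated assumptions: the `Post` record, the `flag_posts` function, the keyword-based scorer standing in for a trained classifier, and the threshold value are all hypothetical and do not come from NewsHound itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class Post:
    """Hypothetical record for one newsgroup post."""
    newsgroup: str
    message_id: str
    body: str


def flag_posts(
    posts: Iterable[Post],
    score: Callable[[str], float],
    threshold: float,
    watched_groups: Set[str],
) -> List[Post]:
    """Keep posts from watched groups whose score reaches the threshold."""
    flagged = []
    for post in posts:
        if post.newsgroup not in watched_groups:
            continue  # only selected newsgroups are monitored
        if score(post.body) >= threshold:
            flagged.append(post)  # saved for human follow-up
    return flagged


# Assumed stand-in for a trained classifier: fraction of topic words.
TOPIC_TERMS = {"prize", "lottery", "transfer", "fee"}


def keyword_score(body: str) -> float:
    words = body.lower().split()
    if not words:
        return 0.0
    return sum(word in TOPIC_TERMS for word in words) / len(words)


posts = [
    Post("misc.test", "<1@example.invalid>", "claim your lottery prize today"),
    Post("misc.test", "<2@example.invalid>", "meeting moved to tuesday"),
    Post("alt.unwatched", "<3@example.invalid>", "lottery prize transfer fee"),
]

flagged = flag_posts(posts, keyword_score, threshold=0.3,
                     watched_groups={"misc.test"})
for post in flagged:
    print(post.message_id, post.body)
```

Passing the scorer in as a function keeps the filtering loop independent of whichever categorization method is actually trained.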
ChatHound monitors
public Internet Relay Chat rooms for content of interest to the law enforcement
community. It saves the text of the conversations, together with identifying
information, so that it can be reviewed asynchronously by law enforcement personnel.
This allows law enforcement to focus on the discussions of interest rather
than having to wade through volumes of irrelevant information in real time.
Other Law Enforcement Applications
Text categorization agents can monitor many types of electronic documents
(for example, on personal computers or the World Wide Web), including
content related to terrorism, drug operations, child exploitation, or
electronic crime. Also, text categorization may be used to automate or
assist data entry. Crimes reported online can be categorized and prioritized
using text categorization in an automatic or an assisting mode.
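One way to read the distinction between an automatic and an assisting mode is as a confidence cutoff: a label is applied automatically when the categorizer is confident, and is otherwise only suggested to a human reviewer. The sketch below assumes that reading. The category names, keyword lists, priority table, and the 0.75 cutoff are hypothetical placeholders, and the keyword scorer stands in for a trained categorizer.

```python
from typing import Callable, Dict, List, NamedTuple


class Triage(NamedTuple):
    report: str
    category: str
    confidence: float
    mode: str  # "automatic" or "assisting"


# Hypothetical priority table: a higher number is handled first.
PRIORITY = {"intrusion": 2, "fraud": 1}

# Hypothetical keyword lists standing in for a trained categorizer.
KEYWORDS = {
    "fraud": {"payment", "refund", "invoice", "scam"},
    "intrusion": {"password", "login", "breach", "malware"},
}


def keyword_categorize(text: str) -> Dict[str, float]:
    """Score each category by the share of its keywords found in the text."""
    words = set(text.lower().split())
    return {cat: len(words & terms) / len(terms)
            for cat, terms in KEYWORDS.items()}


def triage_reports(
    reports: List[str],
    categorize: Callable[[str], Dict[str, float]],
    auto_threshold: float = 0.75,
) -> List[Triage]:
    """Label each report, choose a mode, and sort by priority."""
    results = []
    for report in reports:
        scores = categorize(report)
        category = max(scores, key=scores.get)
        confidence = scores[category]
        # Confident labels are applied directly; others are suggestions.
        mode = "automatic" if confidence >= auto_threshold else "assisting"
        results.append(Triage(report, category, confidence, mode))
    results.sort(key=lambda t: (-PRIORITY.get(t.category, 0), -t.confidence))
    return results


reports = [
    "scam invoice demanded payment and refused refund",
    "someone changed my login password",
]
results = triage_reports(reports, keyword_categorize)
for item in results:
    print(item.category, item.mode, item.confidence)
```

Lowering the cutoff sends more reports through the automatic path at the cost of more mislabeled entries, so the value would have to be tuned against reviewed data.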