
The BootCaT Toolkit version 0.1.2

Simple Utilities for Bootstrapping Corpora and Terms from the Web


Copyright © 2003, 2004 Marco Baroni and Silvia Bernardini.

BootCaT is a suite of Perl scripts; you may copy or redistribute it under the same terms as Perl itself.

If you use the BootCaT toolkit, we would be glad to receive your feedback.
Please send email to baroni@sslmit.unibo.it and/or silvia@sslmit.unibo.it.

If you publish work based on the BootCaT tools, please cite:
M. Baroni and S. Bernardini. 2004. BootCaT: Bootstrapping corpora and terms from the web. Proceedings of LREC 2004.

The Perl scripts included in the BootCaT toolkit implement an iterative procedure to bootstrap specialized corpora and terms from the web. The only input required is a list of "seeds" (terms that are expected to be typical of the domain of interest).

In implementing the algorithm, we followed the old UNIX adage that each program should do only one thing, but do it well. Thus, we developed a small, independent tool for each separate subtask of the algorithm.

As a result, BootCaT is highly modular: one can easily run a subset of the programs, inspect intermediate output files, add new tools to the suite, or change one program without having to worry about the others.
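
To make the iterative procedure and the one-tool-per-subtask design more concrete, here is a minimal sketch in Python. It is not the toolkit's Perl code and does not reproduce its interface: the function names, the parameters, the frequency-based term selection, and the in-memory `toy_search` stand-in for a web search are all assumptions made for illustration.

```python
import random
from collections import Counter


def make_queries(seeds, n_queries, tuple_size, rng):
    """Combine seeds into random tuples; each tuple becomes one query string."""
    seeds = sorted(set(seeds))
    size = min(tuple_size, len(seeds))
    return [" ".join(rng.sample(seeds, size)) for _ in range(n_queries)]


def extract_terms(corpus, known, top_n):
    """Pick the most frequent words that are not seeds yet.

    Assumption: raw frequency is a crude stand-in for real term extraction.
    """
    counts = Counter(
        word
        for text in corpus
        for word in text.lower().split()
        if word.isalpha() and word not in known
    )
    return [word for word, _ in counts.most_common(top_n)]


def bootstrap(seeds, search, iterations=2, n_queries=5, tuple_size=2,
              top_n=3, rng_seed=0):
    """Grow a corpus and a seed list by alternating querying and extraction.

    `search` is a hypothetical callable mapping a query string to page texts.
    """
    rng = random.Random(rng_seed)
    seeds = list(seeds)
    corpus = []
    for _ in range(iterations):
        for query in make_queries(seeds, n_queries, tuple_size, rng):
            for text in search(query):
                if text not in corpus:  # skip duplicate pages
                    corpus.append(text)
        seeds += extract_terms(corpus, set(seeds), top_n)
    return corpus, seeds


# Toy stand-in for the web: a few in-memory "pages".
DOCS = [
    "corpus linguistics studies language through corpus data",
    "web corpus construction uses seed terms",
    "terms extraction compares corpus frequency",
]


def toy_search(query):
    """Return the pages that contain every word of the query."""
    words = query.split()
    return [doc for doc in DOCS if all(w in doc.split() for w in words)]


corpus, seeds = bootstrap(["corpus", "terms"], toy_search)
print(len(corpus), "pages;", "seeds:", seeds)
```

Because each step is a separate function with plain inputs and outputs, any one of them can be swapped out or run on its own, and the intermediate query and term lists can be inspected between steps, which mirrors the modular design described above.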

For more information about the BootCaT tools, please take a look at the Readme file, available from the download page.

©2004 SSLMIT (University of Bologna).