Parameter description used in the example:
The configuration files are in txt format and are stored in the following folder:
{KD_Root_Folder}\languages\{Language}\configuration_files
Please do not change the folder hierarchy!
This folder contains all the files the tool uses to improve performance and to obtain better results.
The file names are self-explanatory, and the file format is easy to read and edit.
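As a minimal sketch of how the folder layout described above could be resolved from code, the following Java snippet builds the configuration folder path and lists the txt files it contains. The class ConfigFolderSketch, its methods, and the example folder names "KD" and "ITALIAN" are assumptions made for illustration; they are not part of the KD library, and the actual name of the language folder depends on your language pack.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the KD library.
class ConfigFolderSketch {

    // Builds {KD_Root_Folder}/languages/{Language}/configuration_files.
    static Path configurationFolder(String kdRootFolder, String languageFolder) {
        return Paths.get(kdRootFolder, "languages", languageFolder, "configuration_files");
    }

    // Returns the sorted names of the txt files found directly in the folder.
    static List<String> listConfigurationFiles(Path folder) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed example values; pass your own root folder and language folder name.
        String root = args.length > 0 ? args[0] : "KD";
        String language = args.length > 1 ? args[1] : "ITALIAN";
        Path folder = configurationFolder(root, language);
        System.out.println("Configuration folder: " + folder);
        if (Files.isDirectory(folder)) {
            for (String name : listConfigurationFiles(folder)) {
                System.out.println(name);
            }
        } else {
            System.out.println("Folder not found; check the path and the folder hierarchy.");
        }
    }
}
```

The sketch only reads the folder; it never moves or renames anything, which keeps the required hierarchy intact.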
If you use the tool in your own code, remember to use the KD_loader object to update the serialized data file.
e.g.: KD_loader.run_the_updater(lang, configuration.languagePackPath);
Below is an example of code integration:
import java.util.LinkedList;

import eu.fbk.dh.kd.lib.KD_configuration;
import eu.fbk.dh.kd.lib.KD_core;
import eu.fbk.dh.kd.lib.KD_core.Language;
import eu.fbk.dh.kd.lib.KD_keyconcept;
import eu.fbk.dh.kd.lib.KD_loader;

public class Main {
    public static void main(String[] args) {
        String languagePackPath = args[0]; // taken from the command line
        String pathToFile = args[1];       // taken from the command line
        Language lang = Language.ITALIAN;  // specify the language

        // Create a new instance of the KD_configuration object
        KD_configuration configuration = new KD_configuration();

        // Configuration setup
        configuration.numberOfConcepts = 20;
        configuration.max_keyword_length = 4;
        configuration.local_frequency_threshold = 2;
        configuration.prefer_specific_concept = KD_configuration.Prefer_Specific_Concept.MEDIUM;
        configuration.skip_proper_noun = false;
        configuration.skip_keyword_with_proper_noun = false;
        configuration.rerank_by_position = false;
        configuration.group_by = KD_configuration.Group.NONE;
        configuration.column_configuration = KD_configuration.ColumExtraction.TOKEN_POS_LEMMA;
        configuration.only_multiword = false;
        configuration.tagset = KD_configuration.Tagset.TEXTPRO;

        // Override the default path with the one taken from the command line
        configuration.languagePackPath = languagePackPath;

        // Update the serialized data if something has changed in the configuration files
        KD_loader.run_the_updater(lang, configuration.languagePackPath);

        // Create an instance of the KD core
        KD_core kd_core = new KD_core(KD_core.Threads.TWO);
        LinkedList<KD_keyconcept> concept_list = kd_core.extractExpressions(lang, configuration, pathToFile, null);

        // Loop over the extracted key-phrases and print the results
        for (KD_keyconcept k : concept_list) {
            System.out.println(k.getString() + "\t" + k.getSysnonyms() + "\t" + k.score + "\t" + k.frequency);
        }
    }
}
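The example above reads both paths from the command line without checking them. A minimal sketch of a pre-check that could run before any KD call is shown here; the class ArgsCheckSketch, its validate method, and the usage message are hypothetical helpers written for illustration and are not part of the KD library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the KD library.
class ArgsCheckSketch {

    // Returns null when the arguments look usable, otherwise an error message.
    static String validate(String[] args) {
        if (args == null || args.length < 2) {
            return "Usage: <languagePackPath> <pathToFile>";
        }
        Path languagePack = Paths.get(args[0]);
        Path input = Paths.get(args[1]);
        if (!Files.isDirectory(languagePack)) {
            return "Language pack folder not found: " + languagePack;
        }
        if (Files.isDirectory(input) || !Files.isReadable(input)) {
            return "Input file not readable: " + input;
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.exit(1);
        }
        System.out.println("Arguments look valid; the KD calls can follow here.");
    }
}
```

Failing early with a clear message avoids an ArrayIndexOutOfBoundsException when an argument is missing and makes a wrong language pack path easier to spot.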
This software is provided as is. For new versions and updates, please check the project web page: KD Key-Phrases Digger at DH FBK
Keyphrase Digger (KD) is released under Apache License 2.0.
For distributors of proprietary software, please contact Rachele Sprugnoli (sprugnoli@fbk.eu).
For attribution, please always cite the following paper:
Moretti, G., Sprugnoli, R., Tonelli, S. "Digging in the Dirt: Extracting Keyphrases from Texts with KD". In Proceedings of the Second Italian Conference on Computational Linguistics (CLiC-it 2015), Trento, Italy.
KD lib uses:
If you want to see the source code of the main class of the KD runnable package (the only part that contains a GPL v2 licensed library), please click here