Analysis
Overview
Support for spell-checking needs to be added to Sophie. After this task is completed, spell-checking capabilities should be available. In the first revision of this task, the expected end-user functionality is defined, research is performed to find an appropriate third-party spell-checking library to integrate into Sophie, and a preliminary design for the integration is produced. The actual implementation takes place in subsequent revisions.
Task requirements
- Perform research to find an appropriate library for spell-checking. Spell check in Sophie will initially be implemented in the following way:
- The Tools tab will contain a spell-check palette. It will contain the following elements:
- Spell-check button - will run a spell check on the text within the currently selected chain.
- Toggle underline button - will turn on/off underlining of misspelled words (they will be underlined with a dotted line as in Trac).
- Replace/Ignore buttons - will replace/ignore the currently selected misspelled word.
- A list of misspelled words - it will contain the currently found misspelled words.
- Clicking on a misspelled word will highlight it (select it) in the text (and go to the page it is on if necessary).
- A list of suggestions for correction - it will contain possible corrections for the selected misspelled word.
- Double-clicking on a suggestion will replace the misspelled word.
- When a word is replaced/ignored, the next misspelled word is selected.
- Initially up to two languages in a book should be selectable (that is, spell-check should be performed against two dictionaries).
- UI for language selection is not defined yet.
- NOTE: These requirements are subject to refinement in the next revision of the task. They are listed here to serve as general guidelines for the research.
- Describe the research findings in the Design section of this page. Include the following information about each library:
- Name and website;
- Licensing information;
- Provided features (incl. supported languages or dictionary format);
- Ease of integration.
- In the Implementation section, suggest the library that is most appropriate for use in Sophie. Provide a preliminary design of how to integrate it into Sophie. The required spell-check functionality can be redefined here based on the library chosen.
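The palette behaviour listed above (a list of misspelled words, Replace/Ignore, and automatic selection of the next word) can be sketched as a small model class. This is only an illustration under stated assumptions: `SpellPaletteModel` and `Misspelling` are hypothetical names, not Sophie or library classes, and wrapping around to the first remaining word after the last one is handled is an assumption the requirements do not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of one misspelled word and its offset in the chain text. */
class Misspelling {
    int start;            // offset of the word; shifts when earlier text changes length
    final String word;

    Misspelling(int start, String word) {
        this.start = start;
        this.word = word;
    }
}

/** Hypothetical model behind the spell-check palette (not actual Sophie code). */
class SpellPaletteModel {
    private final StringBuilder text;
    private final List<Misspelling> pending = new ArrayList<>();
    private int selected = -1;   // index into pending, -1 when nothing is left

    SpellPaletteModel(String chainText) {
        this.text = new StringBuilder(chainText);
    }

    /** Registers a misspelled word; the first one added becomes the selection. */
    void addMisspelling(int start, String word) {
        pending.add(new Misspelling(start, word));
        if (selected < 0) {
            selected = 0;
        }
    }

    /** Returns the currently selected misspelled word, or null if none remain. */
    String selectedWord() {
        return selected < 0 ? null : pending.get(selected).word;
    }

    String text() {
        return text.toString();
    }

    /** Ignore button: drops the selected word and selects the next one. */
    void ignoreSelected() {
        requireSelection();
        removeSelected();
    }

    /** Replace button: rewrites the text, fixes later offsets, selects the next word. */
    void replaceSelected(String replacement) {
        requireSelection();
        Misspelling current = pending.get(selected);
        text.replace(current.start, current.start + current.word.length(), replacement);
        int delta = replacement.length() - current.word.length();
        for (Misspelling other : pending) {
            if (other.start > current.start) {
                other.start += delta;
            }
        }
        removeSelected();
    }

    private void requireSelection() {
        if (selected < 0) {
            throw new IllegalStateException("No misspelled word is selected");
        }
    }

    private void removeSelected() {
        pending.remove(selected);        // the next word slides into this index
        if (pending.isEmpty()) {
            selected = -1;
        } else if (selected >= pending.size()) {
            selected = 0;                // assumption: wrap to the first remaining word
        }
    }
}
```

In such a design the palette widgets would only call `replaceSelected`, `ignoreSelected` and `selectedWord`, so the selection rule stays in one place regardless of which library finds the misspellings.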
Task result
This wiki page (containing research findings and preliminary design for integration).
Implementation idea
Jazzy and Suggester are two possible candidates. Suggester provides search suggestions as well.
Related
How to demo
Show the research findings and how the library will be integrated.
Design
Here is a list of the libraries reviewed:
Jazzy
- Websites: http://jazzy.sourceforge.net/, http://sourceforge.net/projects/jazzy
- License and code: LGPL, open-source
- Features:
- Based on Aspell algorithms (it started as a port of Aspell).
- Dictionaries specified as word lists (can easily generate them from Aspell dictionaries).
- Comes with an English dictionary only, but any of the roughly 90 Aspell dictionaries can be made to work.
- Can be used on Strings or in a JTextComponent.
- Documentation:
- No online docs or support forums (except a few unanswered threads on SourceForge).
- Relatively good JavaDoc and in-code comments.
- Internal:
- Uses an event-driven approach (each spelling error is an event containing the misspelled word and a list of suggestions; listeners can be attached to handle the error).
- Words are passed as a WordTokenizer object constructed from a String.
- Useful links:
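The event-driven approach described under Internal can be illustrated with a minimal, self-contained sketch. The names below (`SpellingErrorEvent`, `SpellingErrorListener`, `EventDrivenChecker`) are hypothetical stand-ins chosen for this example and are not Jazzy's actual API. The suggestion logic is a placeholder that only catches transposed letters; it is not the Aspell-based algorithm Jazzy uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical event: one misspelled word plus its suggested corrections. */
class SpellingErrorEvent {
    final String word;
    final List<String> suggestions;

    SpellingErrorEvent(String word, List<String> suggestions) {
        this.word = word;
        this.suggestions = suggestions;
    }
}

/** Hypothetical listener, notified once per spelling error. */
interface SpellingErrorListener {
    void spellingError(SpellingErrorEvent event);
}

/** Hypothetical checker that reports errors as events instead of returning them. */
class EventDrivenChecker {
    private final Set<String> dictionary;   // assumed to hold lower-case words
    private final List<SpellingErrorListener> listeners = new ArrayList<>();

    EventDrivenChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    void addListener(SpellingErrorListener listener) {
        listeners.add(listener);
    }

    /** Splits the text into words, fires one event per unknown word, returns the error count. */
    int check(String text) {
        int errors = 0;
        for (String token : text.split("[^\\p{L}']+")) {
            if (token.isEmpty() || dictionary.contains(token.toLowerCase(Locale.ROOT))) {
                continue;
            }
            errors++;
            SpellingErrorEvent event = new SpellingErrorEvent(token, suggest(token));
            for (SpellingErrorListener listener : listeners) {
                listener.spellingError(event);
            }
        }
        return errors;
    }

    /** Placeholder: suggests dictionary words made of the same letters (transpositions). */
    private List<String> suggest(String word) {
        String key = sortedLetters(word);
        List<String> result = new ArrayList<>();
        for (String candidate : dictionary) {
            if (sortedLetters(candidate).equals(key)) {
                result.add(candidate);
            }
        }
        Collections.sort(result);   // deterministic order for display
        return result;
    }

    private static String sortedLetters(String s) {
        char[] letters = s.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }
}
```

A listener in Sophie could, for example, append each event to the palette's list of misspelled words, which keeps the UI decoupled from whichever checker produces the events.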
Suggester Spellcheck
- Websites: http://www.softcorporation.com/products/suggester/, http://www.softcorporation.com/products/spellcheck/
- License and code: Free-to-use binaries; the source is proprietary; no standard licence
- Features:
- Dictionary compression (up to 2GB in memory).
- Fast (0.002ms to check a word against the dictionary, 40ms to provide suggestions).
- Written in Java 1.2.
- Provides dictionaries for 9 languages.
- Documentation:
- No documentation provided with Basic Edition. Advanced and Enterprise versions (which are paid) provide documentation.
- No access to the code except for some samples.
- Internal:
- Uses .ind files (LaTeX processed index data) packed in jars as dictionaries.
- The JAR contains classes named a.class, b.class, etc., which suggests that the code is obfuscated.
Other
- JOrtho - open-source, GPL-licenced, JTextComponent based only.
- JSpell - commercial server-based solution.
- JMySpell - open-source, LGPL-licenced; supports OpenOffice dictionaries (some of them LGPL-licensed), which are more compact. It is at an early stage of development, has no documentation or JavaDoc, and only one project actually uses it. It might be a viable option in the future if it becomes stable.
Conclusion
Jazzy seems to be the better choice. It is flexible with respect to language dictionaries, incorporates powerful algorithms, provides JavaDoc, and seems easy to use and modify. As a disadvantage, it might be slower due to its event-driven approach. Suggester, however, has no documentation, so it would be more difficult to understand, and it would be harder to supply it with dictionaries as well. A prototype/demo of using Jazzy in Sophie will be provided in the Implementation section.
Note: A possible issue is that the dictionary files are too big (a lot bigger than the .dic + .aff files that JMySpell uses, for example).
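To show how plain word-list dictionaries fit the requirement of checking against two languages, here is a minimal sketch. It assumes one word per line in each list; `WordListDictionary` and `MultiLanguageChecker` are hypothetical names for this example and not part of Jazzy or Sophie. A word counts as correct if any of the loaded dictionaries contains it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical dictionary backed by a plain word list (one word per line). */
class WordListDictionary {
    private final Set<String> words = new HashSet<>();

    /** Reads all non-empty lines from the source and stores them lower-cased. */
    static WordListDictionary load(Reader source) {
        WordListDictionary dictionary = new WordListDictionary();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    dictionary.words.add(word.toLowerCase(Locale.ROOT));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dictionary;
    }

    boolean contains(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }
}

/** Hypothetical checker that accepts a word if any of its dictionaries knows it. */
class MultiLanguageChecker {
    private final List<WordListDictionary> dictionaries;

    MultiLanguageChecker(WordListDictionary... dictionaries) {
        this.dictionaries = Arrays.asList(dictionaries);
    }

    boolean isCorrect(String word) {
        for (WordListDictionary dictionary : dictionaries) {
            if (dictionary.contains(word)) {
                return true;
            }
        }
        return false;
    }
}
```

Holding every word in a hash set is simple, but it is also where the size concern noted above comes from: a fully expanded word list takes more space than a stem list plus affix rules.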
Implementation
Describe and link the implementation results here (from the wiki or the repository).
Testing
Place the testing results here.
Comments
- http://sourceforge.net/projects/jazzydicts/files/ provides a lot of dictionaries in Jazzy format and a conversion tool under the GPL license.
- http://wiki.services.openoffice.org/wiki/Dictionaries#Bulgarian_.28Bulgaria.29 - OpenOffice.org dictionaries that can be converted using the above-mentioned tool.
- http://sourceforge.net/projects/bgoffice/files/ - a lot of Bulgarian dictionaries.