About file formats management
Content Type handling architecture
With this article, I start a (long?) series of articles comparing how things are done in different projects. Today, I will compare how Eclipse and OpenOffice handle file content types. It is a very common feature, but it is not that easy to get right, as we will see. I will first give details about OpenOffice.org, then about an Eclipse plugin, and finally give some hints as a conclusion.
The problem is quite simple: given a file URI, with which editors can I open it? For instance, an Emof (Xmi) file can be seen as an Emof model, an Xml infoset or plain text, as shown in the previous figure. Thus all editors that know those models can be used. But first you need to determine that it is an Emof file. The file extension is usually not enough, so you need to inspect the file to discover its exact content-type. In our example, you might need to look at the xml namespaces used and/or at the root type of the xml. Once those two steps are done correctly, you can decide which editors to propose to the user.
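To make the inspection step concrete, here is a minimal sketch in plain Java (StAX from the JDK, no Eclipse or OOo API) that reads only as far as the root element and looks at its namespace. The class name, the type labels ("emof", "xml", "text") and above all the namespace URI are assumptions made for the example; the real Emof namespace is not the one used here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class RootElementSniffer {

    // Placeholder namespace used only for this example; it is NOT the real Emof namespace.
    static final String EMOF_NS = "http://example.org/emof";

    /** Returns the qualified name of the root element, or null if the input cannot be parsed as XML. */
    static QName rootElement(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Sniffing should never process DTDs or fetch external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getName(); // the first element is the root: stop here
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML: fall through and report "no root element".
        }
        return null;
    }

    /** Deep detection: from most specific to least specific type. */
    static String detect(String content) {
        QName root = rootElement(
                new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
        if (root == null) {
            return "text";
        }
        if (EMOF_NS.equals(root.getNamespaceURI())) {
            return "emof";
        }
        return "xml";
    }

    public static void main(String[] args) {
        System.out.println(detect("<m:Package xmlns:m=\"http://example.org/emof\"/>"));
        System.out.println(detect("<html><body/></html>"));
        System.out.println(detect("a;b;c"));
    }
}
```

Since an Emof file is also valid Xml and valid text, the order of the checks matters: the most specific type has to be tested first.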
OpenOffice.org
OpenOffice.org is a good example because it needs to handle a lot of different file types and can convert between them. For instance, you can open a CSV (Comma Separated Value) file, modify the corresponding spreadsheet and save it as an Excel file. While opening the file, you will be asked what the separator is.
The basic scheme in OOo is:
- Open dialog box
- content type detection (done by a service with interface XTypeDetection).
- extended type detection (done by a service with interface XExtendedFilterDetection)
- retrieve import filter from type (done by a complex query via FilterFactory)
- optionally display the dialog box that asks for options and is given by UIComponent field of the filter if this field is set
- set target model, i.e. the model to populate
- populate the model by calling the method filter(mediaDesc)
A MediaDescriptor is passed along the process. It describes the properties of a document regarding the relationship between the loaded document and the resource. In the process of loading a CSV file, the media-descriptor is filled in like this:
- set the URL property
- set Media-type to csv (because of the file-extension)
- add details (for a word document, it could be the version number)
- the CSV filter is linked with the media-descriptor (FilterName property)
- the separator character (‘;’ for instance) given by the user is added to the FilterData property
- csvFilter.setTargetDocument(doc) is called with a new model of kind scalc
- csvFilter.filter(mediaDesc) populates the scalc model
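The steps above can be mimicked in a few lines of plain Java. This is only a sketch of the flow, not the UNO API: the media-descriptor is modelled as a simple map, the ImportFilter interface and the "csv-import" filter name are made up for the example, and the content is handed over through an InputStream entry so that the example does not need a real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class MediaDescriptorSketch {

    /** Simplified stand-in for an import filter; NOT the UNO interface. */
    interface ImportFilter {
        void setTargetDocument(List<List<String>> doc);
        boolean filter(Map<String, Object> mediaDesc);
    }

    static final class CsvFilter implements ImportFilter {
        private List<List<String>> target;

        @Override
        public void setTargetDocument(List<List<String>> doc) {
            this.target = doc;
        }

        @Override
        public boolean filter(Map<String, Object> mediaDesc) {
            // Everything the filter needs comes from the descriptor.
            String separator = (String) mediaDesc.get("FilterData");
            InputStream in = (InputStream) mediaDesc.get("InputStream");
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    target.add(Arrays.asList(line.split(Pattern.quote(separator), -1)));
                }
                return true;
            } catch (IOException e) {
                return false;
            }
        }
    }

    /** Light detection: the type is just a string derived from the file extension. */
    static String detectType(String url) {
        int dot = url.lastIndexOf('.');
        return dot < 0 ? "" : url.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    static List<List<String>> loadText(String url, String content, String separator) {
        Map<String, Object> mediaDesc = new LinkedHashMap<>();
        mediaDesc.put("URL", url);
        mediaDesc.put("MediaType", detectType(url));
        mediaDesc.put("InputStream",
                new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
        if (!"csv".equals(mediaDesc.get("MediaType"))) {
            throw new IllegalArgumentException("no filter registered for " + url);
        }
        ImportFilter filter = new CsvFilter();
        mediaDesc.put("FilterName", "csv-import"); // hypothetical filter name
        mediaDesc.put("FilterData", separator);    // the answer given by the user
        List<List<String>> doc = new ArrayList<>(); // stands in for a new scalc model
        filter.setTargetDocument(doc);
        if (!filter.filter(mediaDesc)) {
            throw new IllegalStateException("import failed for " + url);
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(loadText("file:///tmp/data.csv", "a;b\n1;2\n", ";"));
    }
}
```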
Saving a document is pretty straightforward. Note that if the filter also implements the interface ExportFilter it is used by default.
To sum it up:
- a type is just a string,
- there is one media-descriptor per document/file,
- light and deep type detection are done,
- the filter is retrieved by matching over the media-descriptor,
- there are import and export filters,
- filters might need to ask the user for details (like the separator in CSV),
- there are a limited number of models (swriter, scalc, etc).
Eclipse
Eclipse is a little bit different, as it was built to support several editors for the same content-type and/or model. For instance, you can open an Uml file with the Uml editor, the xml editor and the text editor. Moreover, you can use the basic Uml editor, the reflexive Uml editor, the TopCased Uml editor or even your own.
Here again the basic scheme is:
- find content type (IContentTypeManager then IContentTypeMatcher then IContentDescriber)
- find the parser for this kind of content
So far, it seems pretty similar, but Eclipse is a bunch of layers. So let’s say we would like to add a textual representation for Emof named HUTN. You need to create a plugin for that. In the configuration file plugin.xml, you would declare your new content-type:
<extension
    point="org.eclipse.core.contenttype.contentTypes">
  <content-type id="org.omg.content_type.hutn" name="Human textual notation"
      base-type="org.eclipse.emf.ecore"
      file-extensions="hutn"/>
</extension>
I have not declared any describer here, but imagine the format was based on Xml: I could have added the following to my extension to declare that the root element must be abc.
<describer class="org.eclipse.core.runtime.content.XMLRootElementContentDescriber2">
  <parameter name="element" value="abc"/>
</describer>
Then, of course, you need to declare (and implement) a parser. Here we will use the EMF part of Eclipse:
<extension
    point="org.eclipse.emf.ecore.content_parser">
  <parser
      class="fr.rtaw.eclipse.hutn.HutnResourceFactory"
      contentTypeIdentifier="org.omg.content_type.hutn">
  </parser>
</extension>
The only code you need to write is an implementation of org.eclipse.emf.ecore.resource.impl.ResourceImpl, especially the two methods doLoad(InputStream inputStream, Map<?, ?> options) and doSave, plus the factory.
The big question is what the base-type of Hutn should be! org.eclipse.core.runtime.text? org.eclipse.emf.ecore? If you choose text, then you won’t be able to use the native Ecore editors. If you choose ecore, then it is not really the right hierarchy, as Hutn is not an xml variant…
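The dilemma can be illustrated with a small model of the hierarchy, written in plain Java rather than against the Eclipse API. The ContentType class below is a made-up stand-in for IContentType (only its isKindOf test is imitated), the editor labels are invented, and the chain text, then xml, then ecore is an assumption consistent with the remark above that ecore is an xml variant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ContentTypeHierarchy {

    /** Hypothetical stand-in for a content type with a single base type. */
    static final class ContentType {
        final String id;
        final ContentType baseType; // null for a root type

        ContentType(String id, ContentType baseType) {
            this.id = id;
            this.baseType = baseType;
        }

        /** True if 'other' is this type or one of its ancestors. */
        boolean isKindOf(ContentType other) {
            for (ContentType t = this; t != null; t = t.baseType) {
                if (t == other) {
                    return true;
                }
            }
            return false;
        }
    }

    // Assumed chain: text <- xml <- ecore.
    static final ContentType TEXT = new ContentType("org.eclipse.core.runtime.text", null);
    static final ContentType XML = new ContentType("org.eclipse.core.runtime.xml", TEXT);
    static final ContentType ECORE = new ContentType("org.eclipse.emf.ecore", XML);

    // Editor label -> content type the editor is bound to (labels are illustrative).
    static final Map<String, ContentType> EDITORS = new LinkedHashMap<>();
    static {
        EDITORS.put("Text editor", TEXT);
        EDITORS.put("XML editor", XML);
        EDITORS.put("Ecore editor", ECORE);
    }

    /** An editor is proposed when the file's type is a kind of the editor's bound type. */
    static List<String> editorsFor(ContentType type) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, ContentType> entry : EDITORS.entrySet()) {
            if (type.isKindOf(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ContentType hutnAsText = new ContentType("org.omg.content_type.hutn", TEXT);
        ContentType hutnAsEcore = new ContentType("org.omg.content_type.hutn", ECORE);
        System.out.println("base-type text:  " + editorsFor(hutnAsText));
        System.out.println("base-type ecore: " + editorsFor(hutnAsEcore));
    }
}
```

With text as the base type, only the text editor matches. With ecore as the base type, the Ecore editor matches, but so does the xml editor, which makes no sense for a non-xml syntax.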
The trick is to explicitly link to each editor:
<extension
    point="org.eclipse.ui.editors">
  <editor
      id="org.eclipse.emf.ecore.presentation.EcoreEditorID"
      name="Ecore editor">
    <contentTypeBinding contentTypeId="org.omg.content_type.hutn"/>
  </editor>
</extension>
Unfortunately, you will lose the icon and the translated label.
To sum it up:
- type is an instance of ContentType,
- types are placed in a hierarchy,
- there is one resource type per document content-type,
- this resource knows how to load and save itself,
- generic file describers are provided.
Comparison
| | OOo | Eclipse |
| type | string | IContentType |
| resource description | MediaDescriptor | IContentDescription |
| type detection | XTypeDetection, XExtendedFilterDetection | IContentDescriber |
Conclusion
Content-types and models can be placed in two distinct hierarchies, maybe even lattices. It is important to have both so as to be generic. The following figure displays the ‘is-instance-of’ hierarchy for models.
Moreover, an editor may be attached to a model or a meta-model. Indeed, there are more and more reflexive editors, especially based on Emof. Thus the relation between content-type, model and editor can be described by the following class diagram:
The file describer mechanisms in OpenOffice and Eclipse seem appropriate and complete, so I won’t comment on them. Being able to ask the user for input is quite nice; it is often used to ask for the charset and a few other options. But tying the format to a GUI dialog box is not a good thing: you can neither automate things nor run tests. Instead, one should pass the builder a configuration object that can answer all of its questions.
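As a rough illustration of that last point, here is a sketch in plain Java of a CSV import that gets its answers from an options object instead of a dialog. All the names (ImportOptions, FixedOptions, importCsv) are invented for the example, and the parser is deliberately naive (no quoting or escaping).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CsvOptionsSketch {

    /** The questions the importer may ask; hypothetical interface. */
    interface ImportOptions {
        char separator();
        Charset charset();
    }

    /** Non-interactive answers, usable in scripts and tests. */
    static final class FixedOptions implements ImportOptions {
        private final char separator;
        private final Charset charset;

        FixedOptions(char separator, Charset charset) {
            this.separator = separator;
            this.charset = charset;
        }

        @Override
        public char separator() {
            return separator;
        }

        @Override
        public Charset charset() {
            return charset;
        }
    }

    /** The importer only sees the interface, never a dialog box. */
    static List<List<String>> importCsv(byte[] data, ImportOptions options) {
        String text = new String(data, options.charset());
        String separator = Pattern.quote(String.valueOf(options.separator()));
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.isEmpty()) {
                rows.add(Arrays.asList(line.split(separator, -1)));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] data = "name;city\nAda;London\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(importCsv(data, new FixedOptions(';', StandardCharsets.UTF_8)));
    }
}
```

An interactive implementation of the same interface could still open a dialog box the first time a question is asked, so the GUI case is not lost; it just stops being the only way to drive the import.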
This article is already quite long so I won’t add much, leaving you free for thoughts and discussions in the comments.
A start could be: how is it done on other rich client platforms? NetBeans, for instance?
Further reading
About OOo:
- http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/OfficeDev/Integrating_Import_and_Export_Filters
- http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/OfficeDev/Handling_Documents
- http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/OfficeDev/Configuring_a_Filter_in_OpenOffice.org
- http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/OfficeDev/Filter_Options
- http://api.openoffice.org/docs/common/ref/com/sun/star/document/module-ix.html
- http://framework.openoffice.org/documentation/filters/services.html
About Eclipse:
- http://www.developer.com/java/other/article.php/3648736
- http://help.eclipse.org/stable/nftopic/org.eclipse.platform.doc.isv/reference/api/org/eclipse/core/runtime/content/package-tree.html
- http://help.eclipse.org/stable/index.jsp?topic=/org.eclipse.platform.doc.isv/reference/extension-points/org_eclipse_core_runtime_contentTypes.html


