Deconstructing RIS (part II)

I've tried to explain in a previous entry what's wrong with the RIS bibliography format, and I also figured out a likely reason why RIS came to be what it is. In this entry I'd like to show a possible way to "fix" RIS.

What are our options for designing a reference data format? Using XML is a fairly obvious first step to make the data fit for validation. Beyond that, there are basically two approaches, each with its pros and cons:

- Use the smallest possible number of general-purpose elements. This approach has been promoted e.g. by Bruce D'Arcus and is enormously flexible, something that is greatly appreciated in the humanities. However, reference data of this kind do not lend themselves well to being stored in databases (except in native XML databases), and the approach gives reference data authors no guidance on how to encode their data.

- Use a predefined set of reference types, each of which has its own structure. This approach is less flexible (but see below how some flexibility can still be gained), but the reference author gets fairly good guidance on how to encode the data.

In the first case the question is "how do I encode this?". In the second case the question is "which type is the best fit?". I believe the second question is far easier to answer for the uninitiated.

So what are the goals?

- Preserve the scope of RIS. For the sake of compatibility all RIS reference types should be supported, and the new format should be able to hold all information contained in RIS entries.
- Extend the scope of RIS where the latter is crippled. If there is a CHAP type and a BOOK type, then there must also be a SONG type along with the SOUND type.
- Untangle the multiple-purpose fields like M1-M3, IS and the like. Only analogous information should be stored in the same elements/database fields. Define separate elements/database fields for unrelated information. With 200GB platters hitting the market, there is no need to fold unrelated information into the same field.
- Sanitize the illogical A1-A3 and T1-T3 levels. These used to be a mix of the orthogonal concepts of "most likely to be asked for" on the one hand and the three-layer librarian approach on the other. Stick with the latter.
- Drop the distinction between journals and other publications that contain parts. Turn each publication with parts into a separately citable entry. Do the same with sets which are composed of several publications.
- Support relations between entries. "Is-part-of" is an obvious one, but we also might have "also-published-as" or "cited-after" relationships.
- Provide validation during data entry. This is best done using an XML schema language.
- Turn the schema into a data entry form. The schema should restrict data entry for each supported publication type so that you can't enter information which is not useful for this type, and which would therefore not be stored in the database anyway.
- Turn the schema into a database schema blueprint. It should be easy to deduce what information needs to be stored in order to support all reference types.
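
To make the goals about separately citable containers and relations between entries a little more concrete, here is a minimal Python sketch. It is not part of any existing tool: the relation names are the ones listed above, while the `Entry` class, the `containers` helper, the entry ids, and the `SERIES` type name are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Relation names taken from the goals above; the rest is hypothetical.
RELATION_KINDS = {"is-part-of", "also-published-as", "cited-after"}


@dataclass
class Entry:
    entry_id: str
    ref_type: str
    title: str
    # Each relation is a (kind, target entry id) pair.
    relations: list = field(default_factory=list)

    def relate(self, kind, target_id):
        if kind not in RELATION_KINDS:
            raise ValueError(f"unknown relation kind: {kind}")
        self.relations.append((kind, target_id))


def containers(entries, entry_id):
    """Follow is-part-of links upward and return the ids of all containers."""
    chain = []
    seen = {entry_id}
    current = entry_id
    while True:
        parents = [t for k, t in entries[current].relations if k == "is-part-of"]
        # Stop at the top of the hierarchy, or if the links form a cycle.
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)


# A chapter inside a book inside a series: each level is its own entry.
entries = {
    "chap1": Entry("chap1", "CHAP", "A chapter"),
    "book1": Entry("book1", "BOOK", "The containing book"),
    "ser1": Entry("ser1", "SERIES", "The series"),  # hypothetical type name
}
entries["chap1"].relate("is-part-of", "book1")
entries["book1"].relate("is-part-of", "ser1")

print(containers(entries, "chap1"))  # ['book1', 'ser1']
```

Because every container is an entry of its own, the same lookup works whether the container is a monograph, a journal, or a set.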

A first attempt to improve RIS was the risx.dtd, used as an XML data format in RefDB. This was deliberately a small step, designed just to fix the most obvious problems. Because it was implemented as an XML DTD, it added the capability to validate your references, and it made risx a target for transformations of bibliographic data stored in other SGML or XML applications. It also cleaned up the A1-A3 and T1-T3 mess by using three levels of bibliographic information. However, data entry was not really simplified, as risx offered the user little help with encoding different reference types. Validation was limited to checking whether the structure matched what the database could store; it did not take into account the special requirements of each reference type. risx also made no attempt to clean up the RIS multi-purpose fields and other relics of record-based data storage. On the other hand, risx is fairly simple to understand and to store in a database.

Time for another try then. I figured that a DTD would not be flexible enough to implement the idea of a data entry form. It would have required more than 30 top-level elements, one for each reference type, each using a wealth of subelements to encode the information appropriately for that type. Remember that risx didn't prevent you from adding e.g. part information to a book. This was still valid, albeit useless.

Relax NG allows you to rearrange a limited set of elements in an almost unlimited number of different patterns. The idea was to define e.g. a publication element which can hold all the information that any reference type might want to put in there. This automatically tells a database schema designer which fields and relationships need to be implemented. Each reference type is then implemented by a set of patterns which pick the required subelements, e.g. from the pool of subelements defined in the publication element. This in turn tells the authors of reference information which combinations of elements are allowed. The schema (the working name is rbib, for want of something better) is implemented in three files which separate these concerns. rbib.rnc defines the reference types, rbib-start.rnc defines the allowed top-level elements, and rbib-library is sort of the element pool.
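
The pool-and-patterns idea can be mimicked outside of Relax NG. The following Python sketch is only an analogy and does not reproduce rbib: the element names, the two content models, and the functions `check_schema` and `validate` are all assumptions chosen for illustration.

```python
# Hypothetical element pool: everything a publication element may hold.
# The names are illustrative and are not taken from rbib.
PUBLICATION_POOL = {
    "author", "title", "pubdate", "publisher",
    "volume", "issue", "pages", "mediatype",
}

# Each reference type picks its required and optional elements from the pool.
# These content models are made up for the example.
REFERENCE_TYPES = {
    "BOOK": {"required": {"author", "title", "pubdate"},
             "optional": {"publisher"}},
    "CHAP": {"required": {"author", "title", "pages"},
             "optional": {"pubdate"}},
}


def check_schema():
    """Make sure every type only uses elements that exist in the pool."""
    for name, pattern in REFERENCE_TYPES.items():
        extra = (pattern["required"] | pattern["optional"]) - PUBLICATION_POOL
        if extra:
            raise ValueError(f"{name} uses elements outside the pool: {sorted(extra)}")


def validate(ref_type, entry):
    """Return a list of problems; an empty list means the entry is valid."""
    pattern = REFERENCE_TYPES[ref_type]
    allowed = pattern["required"] | pattern["optional"]
    present = set(entry)
    problems = [f"missing required element: {e}"
                for e in sorted(pattern["required"] - present)]
    problems += [f"element not allowed in {ref_type}: {e}"
                 for e in sorted(present - allowed)]
    return problems


check_schema()
book = {"author": "Doe, J.", "title": "An example", "pubdate": "2005",
        "pages": "10-20"}
print(validate("BOOK", book))  # ['element not allowed in BOOK: pages']
```

The pool plays the role of the database blueprint (every field that must be storable), while the per-type patterns play the role of the data entry form (which fields a given type accepts).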

Now let's see how this schema addresses the goals mentioned above:

- Preserve the scope of RIS: rbib currently implements all types known to RIS, except GEN, which is not supposed to be used by reference authors. This type made sense decades ago when Reference Manager still used a record-based database engine. If a reference did not match one of the predefined types, you could dump the data any way you wanted into the record by means of the GEN type. It does not make any sense in the context of rbib, hence it was dropped. Some information acceptable to RIS was restricted. E.g. I don't see why a BOOK entry should contain page information. If you're interested in a chapter, it is a CHAP entry. If you want to refer to particular pages in a book, it is a BOOK entry, but the page information goes into the citation in your document. In a few cases the content models of particular types were extended to simplify the schema. E.g. in RIS, journal articles and abstracts on the one hand and magazine and newspaper articles on the other differ only in that the former may record a media type. By allowing the media type field for the latter types too, all four types now share the same content model.
- Extend the scope of RIS where it is crippled: This is currently not implemented, but doable and planned for the near future. Improved handling for author names and additional types (e.g. for encoding songs on a record) come to mind.
- Untangle the multiple-purpose fields like M1-M3, IS and the like: This is implemented.
- Sanitize the illogical A1-A3 and T1-T3 levels: rbib uses analytical, monographic, and series information to avoid these problems.
- Drop the distinction between journals and other publications that contain parts: Several types that can contain parts may now act as a container, regardless of whether they're monographs or published periodically. Same for series.
- Support relations between entries: some are implemented, but more could be added.
- Provide validation during data entry: obviously done by using Relax NG.
- Turn the schema into a data entry form: done by the design of the schema, see below.
- Turn the schema into a database schema blueprint: also done by the design of the schema, as discussed above.

How does this affect data entry? If we use a validating editor like the marvellous nXML mode in Emacs (there are also non-free tools that support this, so you don't have to become an Emacs adept), we won't be able to add part information to a book entry, as the schema does not accept it. All you need to do at the beginning is figure out the most appropriate reference type for your data. From then on, you can basically let nXML mode suggest the next element and fill in its value if it is available, until you've finished the reference entry.
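
What "suggest the next element" means can be illustrated with a small Python sketch. It does not show how nXML mode or Relax NG work internally. It assumes a strongly simplified content model, an ordered sequence in which each element occurs at most once, and the element names, `BOOK_MODEL`, and `suggest_next` are hypothetical.

```python
# Hypothetical ordered content model for a book entry: (element, required?).
BOOK_MODEL = [
    ("author", True),
    ("title", True),
    ("edition", False),
    ("pubdate", True),
    ("publisher", False),
]


def suggest_next(model, entered):
    """Return the elements that may come next, given those entered so far."""
    names = [name for name, _ in model]
    position = 0
    for element in entered:
        # An entered element must appear at or after the current position.
        if element not in names[position:]:
            raise ValueError(f"{element} is not allowed here")
        index = names.index(element, position)
        # Required elements must not be skipped.
        skipped = [n for n, required in model[position:index] if required]
        if skipped:
            raise ValueError(f"{element} entered before required {skipped[0]}")
        position = index + 1
    # Offer every optional element up to and including the next required one.
    suggestions = []
    for name, required in model[position:]:
        suggestions.append(name)
        if required:
            break
    return suggestions


print(suggest_next(BOOK_MODEL, []))                   # ['author']
print(suggest_next(BOOK_MODEL, ["author", "title"]))  # ['edition', 'pubdate']
```

An element that the type does not define, such as page information in this made-up book model, is never suggested and is rejected if entered.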

All in all, the first implementation of the rbib schema does not attempt to solve all problems of bibliographic data once and for all, but it is a suggestion that helps both users and database implementers get beyond the current limitations.

