Monday, June 12, 2006

Processing usefulchem-molecules with Excel and the CDK

I have modified MoleculeBlogInfo.xls to generate mol and cml files from the entries in usefulchem-molecules. To use this software, you must first download and unzip the Chemical Development Kit (CDK) (both the CDK and its source code can also be found at sourceforge.net). I suggest placing the CDK in the folder C:\cdk; if you choose a different folder, you must edit MoleculeBlogInfo.ini, modifying the line

ConversionProgram=java -cp .;C:\cdk ProcessSMILES

accordingly. It may also be necessary to install the latest version (5.0) of the Java Runtime Environment (JRE).
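If you keep the CDK somewhere other than C:\cdk, the edit amounts to swapping the folder in the -cp (classpath) argument. On Windows, classpath entries are separated by semicolons, so Java searches "." (the current directory, presumably where ProcessSMILES lives) and then the CDK folder. The sketch below does that rewrite programmatically. It is only an illustration: the class IniUpdater and the method setCdkFolder are hypothetical, it assumes MoleculeBlogInfo.ini is plain key=value text with one setting per line, and its List.of call needs Java 9 or later (newer than the JRE 5.0 mentioned above).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites the ConversionProgram line of
// MoleculeBlogInfo.ini for a CDK folder other than C:\cdk.
// Assumption: the .ini file is plain "key=value" text, one setting per line.
class IniUpdater {
    static final String KEY = "ConversionProgram=";

    static List<String> setCdkFolder(List<String> lines, String cdkFolder) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(KEY)) {
                // ';' separates Windows classpath entries: "." first, then the CDK folder
                out.add(KEY + "java -cp .;" + cdkFolder + " ProcessSMILES");
            } else {
                out.add(line); // every other setting is copied unchanged
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ini = List.of("ConversionProgram=java -cp .;C:\\cdk ProcessSMILES");
        // prints ConversionProgram=java -cp .;D:\tools\cdk ProcessSMILES
        System.out.println(setCdkFolder(ini, "D:\\tools\\cdk").get(0));
    }
}
```

Editing the line by hand in a text editor achieves the same result; the point is simply that only the folder after ".;" changes.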

The CDK is a powerful Java library of chemical conversion and manipulation routines, and is used in a number of open source programs such as Jmol and JChemPaint. With it, you can easily write programs to generate molecular coordinates from SMILES, calculate molecular weights, compare fingerprints, and much more. The CDK Web Services and the "Java Snippet" page by Rajarshi Guha are excellent examples of CDK usage, and a good source of sample code.
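The arithmetic behind a molecular weight calculation is simple: sum the average atomic weight of each element times its count. The following self-contained Java sketch shows that step for a simple molecular formula such as C6H6. It is not the CDK API and not what ProcessSMILES does; the class FormulaWeight, its four-element weight table, and the restriction to formulas without parentheses are assumptions made for illustration, and a real program would rely on the CDK's own, far more complete, element data and SMILES parsing.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaWeight {
    // Standard average atomic weights (g/mol) for a few common elements only.
    static final Map<String, Double> WEIGHTS = Map.of(
        "C", 12.011, "H", 1.008, "N", 14.007, "O", 15.999);

    // One element symbol followed by an optional count, e.g. "C6", "H", "Cl2".
    static final Pattern TERM = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    static double molecularWeight(String formula) {
        Matcher m = TERM.matcher(formula);
        double total = 0.0;
        int pos = 0;
        while (m.find()) {
            // Terms must be contiguous; anything skipped over is invalid input.
            if (m.start() != pos) throw new IllegalArgumentException("Bad formula: " + formula);
            Double w = WEIGHTS.get(m.group(1));
            if (w == null) throw new IllegalArgumentException("No weight for " + m.group(1));
            int count = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            total += w * count;
            pos = m.end();
        }
        if (pos != formula.length()) throw new IllegalArgumentException("Bad formula: " + formula);
        return total;
    }

    public static void main(String[] args) {
        // Benzene: 6 * 12.011 + 6 * 1.008; prints 78.114
        System.out.println(String.format(Locale.ROOT, "%.3f", molecularWeight("C6H6")));
    }
}
```

Rejecting unknown elements outright, rather than silently skipping them, keeps a missing table entry from producing a plausible-looking but wrong weight.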

Like FileConvert.xls, MoleculeBlogInfo.xls now creates cml files, which can be viewed with Jmol at the demonstration page. In addition to generating individual cml files, MoleculeBlogInfo.xls will also generate an xml file for the entire usefulchem-molecules site when you press the "Save Results" button. This xml file has the following structure (I've omitted the actual data):

<?xml version="1.0" encoding="ISO-8859-1"?>
<molecules source="http://usefulchem-molecules.blogspot.com/" xmlns="http://www.xml-cml.org/schema">
  <molecule id="UC####">
    <smiles></smiles>
    <ucNumber></ucNumber>
    <canonicalMW></canonicalMW>
    <naturalMW></naturalMW>
    <atomArray>
      <atom></atom>
    </atomArray>
    <bondArray>
      <bond></bond>
    </bondArray>
    <suppliers>
      <supplier></supplier>
    </suppliers>
  </molecule>
  <!-- etc. -->
</molecules>
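Because the root element declares a default namespace (http://www.xml-cml.org/schema), every unprefixed element in the file belongs to that namespace, so a program reading the file back has to be namespace-aware. Here is a minimal sketch using only the JDK's DOM parser that collects each molecule's id and SMILES string. The class MoleculeIndex and the method readSmiles are hypothetical, the code assumes the element names exactly as in the outline above, and the two sample molecules in main are made up for illustration, not real usefulchem-molecules entries.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class MoleculeIndex {
    // Default namespace declared on the <molecules> root element.
    static final String CML_NS = "http://www.xml-cml.org/schema";

    // Returns molecule id -> SMILES string, in document order.
    static Map<String, String> readSmiles(InputStream in) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // without this, the *NS lookups below match nothing
        DocumentBuilder b = f.newDocumentBuilder();
        Document doc = b.parse(in);
        Map<String, String> result = new LinkedHashMap<>();
        NodeList molecules = doc.getElementsByTagNameNS(CML_NS, "molecule");
        for (int i = 0; i < molecules.getLength(); i++) {
            Element mol = (Element) molecules.item(i);
            NodeList smiles = mol.getElementsByTagNameNS(CML_NS, "smiles");
            // A molecule without a <smiles> child maps to an empty string.
            String s = smiles.getLength() > 0 ? smiles.item(0).getTextContent().trim() : "";
            result.put(mol.getAttribute("id"), s);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data in the same shape as the generated file.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
            + "<molecules xmlns=\"" + CML_NS + "\">"
            + "<molecule id=\"example1\"><smiles>c1ccccc1</smiles></molecule>"
            + "<molecule id=\"example2\"><smiles>CCO</smiles></molecule>"
            + "</molecules>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readSmiles(in)); // prints {example1=c1ccccc1, example2=CCO}
    }
}
```

For the real file, pass a FileInputStream opened on wherever "Save Results" wrote it; the parser honors the ISO-8859-1 declaration in the header.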
A version of FileConvert.xls that uses the CDK is also available.


Creative Commons Attribution Share-Alike 2.5 License