PyLSD

What is PyLSD?

PyLSD is a layer over the LSD software. The reader of this page should already be familiar with LSD. PyLSD processes enhanced LSD input files, and thus solves structure elucidation problems that LSD alone cannot solve. The enhancements make it possible to deal with compounds whose molecular formula is not exactly known, or in which some atoms have an unknown hybridization state and/or an unknown multiplicity (number of attached hydrogen atoms). In addition, the solutions can be ranked by decreasing likelihood, according to how well the experimental 13C NMR chemical shifts match the predicted ones.

An ambiguous (or complex) LSD problem, one that includes a variable molecular formula (VMF) and variable status atoms (VSA), is converted by PyLSD into a set of unambiguous (or simple) LSD problem files that can be processed separately by the LSD software. The solution files are then grouped together for ranking using the 13C chemical shift prediction by nmrshiftdb2, structure diagram generation, and display.

PyLSD is written in the Python language.

Find pyLSD on linkedin

Find pyLSD on GitHub

License

PyLSD is free software distributed under the GPL license.

Version

This is version alpha-8. PyLSD-a8 is functional but still needs many improvements.

History

Here is the History file

Installation

Click here for installation and testing instructions. This page also indicates where to put the PyLSD command files and how to run them.

Running pyLSD

INSTALL.html indicates which intermediate files are created and provides advice for troubleshooting (see the First-aid section). The solution file, in SDF format, is named mypylsdfile_0.sdf and is created in the LSD/Data directory. It can be used to improve the 2D structure depictions that outlsd generates.

Turning an LSD data file into a pyLSD data file

An LSD data file is not (yet) a valid PyLSD data file. The conversion can be achieved by adding two new commands: FORM and PIEC.

The FORM command indicates the molecular formula of the unknown compound. It has a single argument, a character string between double quotes, as in FORM "C 21 H 22 N 2 O 2" for strychnine. All formula parts, element symbols and coefficients alike, are separated by spaces.

The PIEC command is a story in itself that is told later, in a separate section. It takes a single integer parameter that sets an upper limit on the number of connected parts in the problem solutions (well, roughly...). Adding a PIEC 1 command to an LSD data file achieves what the user generally wants to do.

Ranking solutions by comparing experimental and predicted chemical shift values requires that all solutions produced by LSD be preserved. This is achieved by writing either a DUPL 1 or a DUPL 0 command in the pyLSD input file.

The SHIX command has no effect in LSD and may be considered there as documentation for chemical shift values. In pyLSD, the SHIX command is also used to determine whether two atoms with identical status are equivalent or not. Two atoms with identical status and different chemical shifts are not considered equivalent. Even if solution ranking is of no interest to the user, the SHIX commands are necessary for pyLSD to produce a correct solution set.

The pinene.lsd LSD data file has been adapted to pyLSD and is available for testing. Please note that the location of the substructure files has been updated, because the Filters directory is no longer in the current directory but in ../LSD/. Running "python lsd.py pinene.lsd" from the command line with "Variant" as current directory should display the structure of pinene.

PyLSD specific commands

FORM

Molecular FORMula

The string argument of the FORM command contains chemical element symbols, each followed by the number of occurrences of that element in the molecule. The number of occurrences may be either an integer or a range in the form n-m, in which n and m are two integers (n < m). If at least one number of occurrences is a range, then a MOMA command is required.

Example: FORM "C 1 H 3 N 1 O 2-3", taken from the mixture.lsd PyLSD data file. This molecular formula fits nitromethane, methyl nitrite and methyl nitrate.
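The way a FORM string with ranges covers several fixed formulas can be sketched in Python. This is only an illustration of the syntax described above, not the code PyLSD uses; the function names parse_form and expand_form are hypothetical.

```python
from itertools import product


def parse_form(form):
    """Split a FORM string into (symbol, minimum, maximum) triples."""
    tokens = form.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating element symbols and counts")
    parts = []
    for symbol, count in zip(tokens[0::2], tokens[1::2]):
        # "2-3" gives low="2", high="3"; "1" gives low="1", high=""
        low, _, high = count.partition("-")
        parts.append((symbol, int(low), int(high or low)))
    return parts


def expand_form(form):
    """List every fixed formula covered by a FORM string, as dicts."""
    parts = parse_form(form)
    symbols = [symbol for symbol, _, _ in parts]
    ranges = [range(low, high + 1) for _, low, high in parts]
    return [dict(zip(symbols, counts)) for counts in product(*ranges)]


for formula in expand_form("C 1 H 3 N 1 O 2-3"):
    print(formula)
```

For the mixture.lsd example, the sketch lists two fixed formulas, one with two oxygen atoms and one with three.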

MOMA

MOlecular MAss

The argument of the MOMA command indicates a molecular mass or a molecular mass range. A molecular mass is the sum of the atomic masses of the atoms that constitute a molecule, expressed in atomic mass units (amu). The atomic masses are integer values (number of nucleons in the most abundant isotope), according to the first line of the paragraphs in Variant/statuslist.txt.

Example: MOMA 1-1000, taken from the mixture.lsd PyLSD data file. The molecular mass must be between 1 and 1000 amu, thus meaning that no constraint on mass is imposed.
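As a rough illustration of the mass constraint, the following Python sketch computes integer molecular masses and tests them against a MOMA range. The NOMINAL_MASS table is hard-coded here for the four elements of the example (PyLSD reads its values from Variant/statuslist.txt), the function names are hypothetical, and treating both bounds as inclusive is an assumption.

```python
# Nucleon counts of the most abundant isotopes, for this example only.
# PyLSD itself takes these values from Variant/statuslist.txt.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}


def nominal_mass(formula):
    """Sum the integer atomic masses of a formula given as {symbol: count}."""
    return sum(NOMINAL_MASS[symbol] * count for symbol, count in formula.items())


def within_moma(formula, low, high):
    """Tell whether the mass lies in the range (bounds assumed inclusive)."""
    return low <= nominal_mass(formula) <= high


candidates = [
    {"C": 1, "H": 3, "N": 1, "O": 2},
    {"C": 1, "H": 3, "N": 1, "O": 3},
]
for formula in candidates:
    print(formula, nominal_mass(formula), within_moma(formula, 1, 1000))
```

With MOMA 1-1000 both candidate formulas pass; a narrower range such as 60-70 would keep only the first one.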

MULT

MULTiplicity

This is an extension of the LSD MULT command. For chemical element symbols, LSD supports usual symbols for usual valence (S for divalent sulfur) and usual symbols followed by an unusual valence (S4 for tetravalent sulfur). PyLSD also supports alternative valences, but the usual valence must then be given explicitly (S24 for di- and tetravalent sulfur). For hybridization state, multiplicity and electric charge, alternative values are given, as usual, between parentheses and are separated by blanks.

Example: MULT 20 N35 (2 3) (0 1 2) (0 1) defines atom 20 as a nitrogen atom that is either tri- or pentavalent, sp2 or sp3, bound to 0, 1 or 2 hydrogen atoms, with either a 0 or a +1 electric charge.
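To show how many atom statuses such a command covers, here is a small Python sketch that expands an extended MULT line into every combination of its alternatives. It is not PyLSD's parser: the function names are hypothetical, each digit after the element symbol is assumed to be one valence, a missing charge field is taken as 0, and no check against Variant/statuslist.txt is made, so some combinations may not be chemically valid.

```python
import re
from itertools import product


def split_fields(arguments):
    """Split MULT arguments into lists of alternatives, one list per field."""
    fields = []
    for group, bare in re.findall(r"\(([^)]*)\)|(\S+)", arguments):
        fields.append(group.split() if group else [bare])
    return fields


def expand_mult(line):
    """Return (atom, symbol, valence, hybridization, multiplicity, charge) tuples."""
    keyword, arguments = line.split(maxsplit=1)
    if keyword != "MULT":
        raise ValueError("not a MULT command")
    fields = split_fields(arguments)
    if len(fields) == 4:
        fields.append(["0"])  # assumption of this sketch: no charge field means 0
    atom, element, hybridizations, multiplicities, charges = fields
    match = re.fullmatch(r"([A-Z][a-z]?)(\d*)", element[0])
    if match is None:
        raise ValueError("unrecognized element field")
    symbol, digits = match.groups()
    # One digit per valence; None stands for the usual valence of the element.
    valences = [int(digit) for digit in digits] or [None]
    return [
        (int(atom[0]), symbol, valence, int(hyb), int(mult), int(charge))
        for valence, hyb, mult, charge in product(
            valences, hybridizations, multiplicities, charges
        )
    ]


statuses = expand_mult("MULT 20 N35 (2 3) (0 1 2) (0 1)")
print(len(statuses))
print(statuses[0])
```

The example line expands to 2 x 2 x 3 x 2 = 24 raw combinations before any chemical filtering.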

ELEC

Molecular ELECtric charge

The ELEC command either imposes a single molecular electric charge value or proposes alternative values. If no ELEC command is present, the imposed value is 0. Electric charges are expressed as integers, in proton electric charge units. Alternative values are given, as usual, between parentheses and are separated by blanks.

Example: ELEC (-1 0 1) constrains the molecular electric charge to be -1, 0 or +1, in proton electric charge units.

MAXP/MAXN

MAXimum number of Positively/Negatively charged atoms

The MAXP/MAXN command has a single integer argument that is the maximum number of positively/negatively charged atoms in the molecule. If no MAXP/MAXN command is present, then no such limit is applied.

Example: MAXP 1 constrains the molecule to have at most 1 positively charged atom.
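The combined effect of ELEC, MAXP and MAXN on a set of atomic charges can be sketched as follows. This is a hedged illustration, not PyLSD code: the function charges_allowed and its parameters are hypothetical, and the molecular charge is taken as the sum of the atomic charges.

```python
def charges_allowed(atom_charges, elec=(0,), maxp=None, maxn=None):
    """Check atomic charges against ELEC, MAXP and MAXN style constraints.

    elec lists the allowed molecular charges (default: 0 only, as when no
    ELEC command is present); maxp/maxn set to None mean "no limit".
    """
    if sum(atom_charges) not in elec:
        return False
    positive = sum(1 for charge in atom_charges if charge > 0)
    negative = sum(1 for charge in atom_charges if charge < 0)
    if maxp is not None and positive > maxp:
        return False
    if maxn is not None and negative > maxn:
        return False
    return True


# Heavy atoms of a nitro compound drawn with N(+) and O(-): C, N, O, O
nitro = [0, 1, -1, 0]
print(charges_allowed(nitro, maxp=1))      # one positive atom is tolerated
print(charges_allowed(nitro, maxp=0))      # no positive atom is tolerated
print(charges_allowed([0, 1, 0, 0]))       # molecular charge +1, ELEC defaults to 0
print(charges_allowed([0, 1, 0, 0], elec=(-1, 0, 1)))
```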

DEMU

DEfault MUlt parameter

The DEMU command is only necessary if the molecular formula has alternatives. In this case, each element has a minimum number of occurrences. The number of MULT commands for an element cannot be higher than this minimum number of occurrences. If the actual number of occurrences is strictly higher than the minimum number, the supplementary atoms get, by default, the most general status for the element, as inferred from Variant/statuslist.txt. The DEMU command overrides the default status for an element, which is given as the first command parameter. The following parameters are those of a MULT command.

Example: DEMU N N (1 2 3) (0 1 2 3) (0 1) indicates that any supplementary nitrogen atom (relative to the minimum number of nitrogen atoms, as indicated by the FORM command) is a trivalent nitrogen, of any hybridization and any multiplicity, either not electrically charged or with a single positive charge.

The PIEC command

The CNTD command with 1 as argument forces LSD to deliver connected (one-piece) solutions. When its argument is 0, this control is disabled. The pyLSD-specific PIEC command also acts on solution connectivity, but at a different level. If one or more VSAs are present, the first task of pyLSD is to propose a coordinance for each VSA. The coordinance concerns only the graph of heavy (non-hydrogen) atoms, in which every chemical bond is counted as a single bond. The coordinance of an atom is simply the number of its neighbors. The molecular coordinance is the sum of the coordinances of all the heavy atoms; it is equal to twice the number of bonds between heavy atoms (again, all bonds being counted as single). It can be proved that:
number of rings = number of bonds - number of atoms + number of connected parts.
Each atom has a defined coordinance, so the molecule has a defined number of bonds; it also has a defined number of atoms (the VMF ambiguity has already been resolved at this stage). A set of possible numbers of connected parts therefore corresponds to a set of possible numbers of rings. If all the possible numbers of rings are negative, then the currently proposed set of VSA coordinances is not valid. The parameter of the PIEC command indicates that the number of connected parts of the solution lies between 1 and the parameter value, inclusive.

When looking for all the isomers of benzene, made of neutral carbon atoms, sp, sp2 or sp3, bound to 0, 1, 2 or 3 hydrogen atoms, one possibility is that all atoms are monocoordinated (6 sp atoms, each bound to 1 H atom, as in a set of 3 acetylene molecules). The molecular coordinance is 6, resulting in 3 bonds. With 6 atoms and 1 as the single possible number of pieces, the number of rings would be -2. With PIEC 1, the possibility of having 6 monocoordinated carbon atoms is therefore not explored further. The tri-acetylene solution can only be produced with a PIEC 3 command. However, PIEC 1 does not prevent the generation of a non-connected solution. The solution that consists of cyclobutadiene and acetylene has 5 bonds. With 6 atoms and 1 connected part, the number of rings is 0, which is acceptable. Therefore, the PIEC command is only a way to eliminate unrealistic possibilities in the assignment of particular coordinance values to the VSAs, and not a real control on the solution connectivity. Such a control can only be achieved by changing LSD itself through a modification of the CNTD command.
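The arithmetic of the benzene isomer example can be reproduced with a few lines of Python. This sketch only applies the ring-count relation given above to a list of heavy-atom coordinances; the function name ring_counts is hypothetical and the code is not taken from PyLSD.

```python
def ring_counts(coordinances, max_pieces):
    """Map each admissible number of connected parts to its number of rings.

    coordinances: coordinance of each heavy atom.
    max_pieces:   the PIEC parameter (1 <= parts <= max_pieces).
    An empty result means the coordinance set is rejected.
    """
    total = sum(coordinances)
    if total % 2:
        return {}  # the molecular coordinance is twice the number of bonds
    bonds = total // 2
    atoms = len(coordinances)
    result = {}
    for pieces in range(1, max_pieces + 1):
        rings = bonds - atoms + pieces
        if rings >= 0:
            result[pieces] = rings
    return result


six_monocoordinated = [1] * 6                       # e.g. three acetylene molecules
cyclobutadiene_and_acetylene = [2, 2, 2, 2, 1, 1]

print(ring_counts(six_monocoordinated, 1))          # rejected with PIEC 1
print(ring_counts(six_monocoordinated, 3))          # accepted with PIEC 3
print(ring_counts(cyclobutadiene_and_acetylene, 1)) # accepted although disconnected
```

The last call shows why PIEC 1 is not a true connectivity control: 5 bonds, 6 atoms and 1 assumed part give 0 rings, so the coordinance set passes even though the actual structure has two parts and one ring.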

Solution ranking

If at least one carbon atom has its experimental chemical shift defined by a SHIX command, then the chemical shifts are predicted for each solution. The sum of the absolute values of the differences between experimental and predicted values is used as the criterion for solution ranking. The best-fit solution is presented first.
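The ranking criterion can be sketched in Python as follows. The function names are hypothetical, the pairing of experimental and predicted values by atom index is an assumption, and the shift values are invented placeholders used only to exercise the code; they are not nmrshiftdb2 predictions.

```python
def shift_score(experimental, predicted):
    """Sum of absolute differences over atoms that have an experimental shift."""
    return sum(abs(experimental[atom] - predicted[atom]) for atom in experimental)


def rank_solutions(experimental, predictions):
    """Return solution names sorted from best fit (lowest score) to worst."""
    return sorted(
        predictions,
        key=lambda name: shift_score(experimental, predictions[name]),
    )


# Invented placeholder values in ppm, keyed by atom index.
experimental = {1: 20.0, 2: 130.0}
predictions = {
    "solution A": {1: 22.0, 2: 128.0},   # score 4.0
    "solution B": {1: 21.0, 2: 130.5},   # score 1.5
}
print(rank_solutions(experimental, predictions))
```

Atoms without an experimental shift are simply left out of the sum in this sketch.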

Acknowledgments

Contact

jean.marc.nuzillard@gmail.com