PyLSD is a layer over the LSD software. The reader of this page should already be familiar with LSD. PyLSD processes enhanced LSD input files, and thus solves structure elucidation problems LSD cannot solve. The enhancements allow one to deal with compounds for which the exact molecular formula is not precisely known, for which some atoms have an unknown hybridization state and/or an unknown multiplicity (number of attached hydrogen atoms). In addition, the solutions can be ranked by decreasing order of likelihood, according to the matching of experimental 13C NMR chemical shifts with predicted ones.
An ambiguous (or complex) LSD problem, that include a variable molecular formula (VMF) and variable status atoms (VSA), is converted by PyLSD into a set of unambiguous (or simple) LSD problem files that can be separately processed by the LSD software. The solutions files are then grouped together for ranking using the 13C chemical shift prediction by nmrshiftdb2, structure diagam generation, and display.
PyLSD in written in Python language.
Find pyLSD on GitHub
PyLSD is a free software that is distributed under the GPL license.
This is version alpha-8. PyLSD-a8 is functional but still needs many improvements.
Here is the History file
Click here for installation and testing instructions. This page also indicates where to put the PyLSD command files and how to run them.
python lsd.py mypylsdfile.lsd
commandINSTALL.html indicates which intermediate files are created and provides advices for troubleshooting (see the First-aid section). The solution file, in SDF format is named mypylsdfile_0.sdf, and is created in the LSD/Data directory. It can be used in order to improve the 2D structure depictions outlsd generates.
An LSD data file is not (yet) a valid PyLSD data file.
The conversion can be achieved by adding two new commands: FORM
and PIEC
.
The FORM
command indicates the molecular formula of the unknown compound.
It has a single argument, a character string between double quotes,
such as in FORM "C 21 H 22 N 2 O 2"
for strychnine.
All formula parts, elements symbols and coefficients are separated by spaces.
The PIEC
command is a story in itself that is telled later, in a separate section.
It takes a single integer as parameter that fixes an upper limit to the number of connected parts in
the problem solutions (well, roughly...).
Adding a PIEC 1
command to a LSD data file achieves what the user
generally wants to do.
Solution ranking by comparison between experimental and predicted chemical shift values
requires to preserve all solutions produced by LSD the writing of either a DUPL 1
or a DUPL 0
to the input file to pyLSD.
The SHIX
command has no effect in LSD and may be considered there as
documentation for chemical shift values. The SHIX
command in pyLSD is also used
to determine whether two atoms with identical status are equivalent or not.
Two atoms with identical status and different chemical shifts are not considered as equivalent.
Even though solution ranking is not considered as significant for the user, the
SHIX
commands are necessary for a correct production of a solution set by pyLSD.
The pinene.lsd LSD data file has been adapted to pyLSD and is available for testing. Please notice that the location of the substructure files have been updated because the Filters directory is not any more in the current directory but in ../LSD/. Running "python lsd.py pinene.lsd" from the command line with "Variant" as current directory should display the structure of pinene.
FORM
Molecular FORMula
The string argument of the FORM
command
contains chemical element symbols that are followed by an indication
about the number of occurences of these elements in the molecule.
The number of occurences may be either an integer or a range in the form n-m
,
in which n and m are two integers (n < m).
If at least one number of occurences is a range, then a MOMA
is required.
Example: FORM "C 1 H 3 N 1 O 2-3"
, taken from the mixture.lsd PyLSD data file.
This molecular formula fits with nitromethane, methyl nitrite and nitromethane.
MOMA
MOlecular MAss
The argument of the MOMA
command indicates a molecular mass or
a molecular mass range. A molecular mass is the sum of the atomic masses of the atoms
that constitute a molecule, expressed in atomic mass units (amu),
The atomic masses are integer values (number of nucleons in
the most abundant isotope) according to the first line of the paragraphs in
Variant/statuslist.txt.
Example: MOMA 1-1000
, taken from the mixture.lsd PyLSD data file.
The molecular mass must be between 1 and 1000 amu, thus meaning that no constraint on mass is imposed.
MULT
MULTiplicity
This is an extension of the LSD MULT
command.
For chemical element symbols, LSD supports usual symbols for usual valence
(S for divalent sulfur) and usual symbols followed by unusual valence
(S4 for tetravalent sulfur). PyLSD also supports alternative valences but the
usual valence must explicitely be given (S24 for di- and tetravalent sulfur).
For hybridization state, multiplicity and electric charge, alternative values
are given, as usual, between parenthesis and are separated by blanks.
Example: MULT 20 N35 (2 3) (0 1 2) (0 1)
, defines atom 20
as a nitrogen atom either tri- or pentavalent, sp2 or sp3, bound to 0, 1 or 2
hydrogen atoms, with either a 0 or a +1 electric charge.
ELEC
Molecular ELECtric charge
The ELEC
command either imposes a single molecular electric charge value
or proposes alternative values. If no ELEC
command is present,
the imposed value is 0. Electric charges are expressed by integers, in
proton electric charge units. Alternative values
are given, as usual, between parenthesis and are separated by blanks.
Example: ELEC (-1 0 1)
, constrains the molecular electric charge
to be -1, 0 or +1, in proton electric charge units.
MAXP/MAXN
MAXimum number of Positively/Negatively charged atoms
The MAXP/MAXN
command has a single integer argument that is
the maximum number of positively/negatively charged atoms in the molecule.
If no MAXP/MAXN
command is present, then no control takes place.
Example: MAXP 1
constrains the molecule to have at most 1
positively charged atom.
DEMU
DEfault MUlt parameter
The DEMU
command is only necessary if the molecular formula
has alternatives. In this case, each element has a minimum number of occurences.
The number of MULT commands for an element cannot be higher than the minimum number
of occurences. If the actual number of occurences is strictly higher than
the minimum number, the supplementary atoms get, by default, the most general status
for the element, as inferred from Variant/statuslist.txt.
The DEMU
command overrides the default status for an element,
given as first command parameter. The following parameters are those of
a MULT
command.
Example: DEMU N N (1 2 3) (0 1 2 3) (0 1)
indicates that
any supplementary nitrogen atom (relatively to the minimum number
of nitrogen atoms, as indicated by the FORM
command)
is a trivalent nitrogen, of any hybridization, any multiplicity,
either not electrically charged or with a single positive charge.
PIEC
command.
The CNTD
command with 1 as argument forces LSD to deliver
connected (in one piece) solutions. When its argument is 0, this control is disabled.
The PIEC
pyLSD-specific command operates on solution connectivity
but at a different level.
If one or more VSAs are present, the first task of pyLSD to propose a coordinance to each VSA.
The coordinance concerns only the graph of heavy (non-hydrogen) atoms, considering each
chemical bond between them as simple. The coordinance of an atom is simply the number
of its neighbors. The molecular coordinance is sum the coordinances of all the heavy atoms;
it is equal to twice the number of bonds between atoms (again, all bonds are simple).
It can be proved that:
number of rings = number of bonds - number of atoms + number of connected parts.
Considering that each atom has a defined coordinance and therefore that the molecule has
a defined number of bonds, that the molecule has a defined number of atoms (the VMF ambiguity
has already been resolved at this time), then a set of possible number of connected parts
corresponds to a set of possible number of rings.
If all the possible number of rings are negative, then the currently proposed VSA
coordinance set is not a valid one.
The parameter of the PIEC
indicates that the number of connected parts of the
solution is comprised between 1 and the parameter value (included).
Looking for all the isomers of benzene, made of neutral carbon atoms, sp, sp2 or sp3,
bound to 0, 1, 2 or 3 hydrogen atoms, it might be possible to consider that all atoms
are monocoordinated (6 sp atoms, each bound to 1 H atom, like in a set of 3 acetylene molecules).
The molecular coordinance is 6, resulting in 3 bonds. With 6 atoms and 1 as single possible
number of pieces, the number of rings would be ‑2. With PIEC 1
, the possibility
of having 6 monocoordinated carbon atoms must not be further explored.
The tri-acetylene solution can only be produced with a PIEC 3
command.
However, PIEC 1
does not prevent the generation of a non-connected solution.
The solution that consists in cyclobutadiene and acetylene has 5 bonds. With 6 atoms and
1 connected part, the number of rings is 0, which is acceptable.
Therefore, the PIEC
command is only a way to eliminate unrealistic possibilities
in the assignment of particular coordinance values to the VSAs, and not a real
control on the solution connectivity. This control can only be achieved by changing LSD itself
through a modification of the CNTD
command.
If at least one carbon atom has its experimental chemical shift defined by a
SHIX
command, then the prediction of the chemical shifts will be carried out.
The sum of the absolute values of the differences between experimental
and predicted values is used as criterion for solution ranking.
The best fit solution is presented first.