There has been a great deal of discussion (far more than we had expected) on our project. Some of this has been directly relevant in responding to our direct requests for annotation of specific outliers, and we acknowledge posts from Egon Willighagen, Christoph Steinbeck, Jean-Claude Bradley, Wolfgang Robien and the University of Mainz. A lot of the discussion has been of general interest but not directly relevant to the aims of the project, which was to show what fully automatic systems can do, not to create specific resources (“correct NMRShiftDB”). It is possible, though not necessary, that the work might be more generally valuable, depending on what we found.
PMR: Thank you. This is a generous offer and we may wish to take it up. Contrary to your comment that I am an NMR expert, I’m really not - I’m an eChemist and this exercise in NMR is because I wish to liberate NMR from the pages of journals. If we find that Henry’s program needs more data or that yours has fewer problems it could be extremely valuable. We would want the actual data to be Open so that others can re-use it. PMR is quoted as “We downloaded the whole of NMRShiftDB. When we started we had NO idea of the quality. …” That is quite true. There were a number of public comments on NMRShiftDB ranging from (mild) approval to (mild) disapproval, some scalar values for RMS against various prediction programs, and some figures on misassignments etc. These gave relatively little indication of the detailed data quality - e.g. the higher moments of variation. If there is currently a full list of NMRShiftDB entries with your annotations this would be valuable. Currently I can find a number of comments on individual entries with gross problems at http://nmrpredict.orc.univie.ac.at/csearchlite/hallofshame.html but these seem anecdotal rather than a complete list.
PMR: and the second set of comments:
2) Regarding “the only Open collection of spectra is NMRShiftDB - open nmr database on the web.” Just to clarify, these are NOT NMR spectra actually. Unless NMRShiftDB has a capability I am not aware of, NMRShiftDB is a database of molecular structures with associated assignments (and maybe in some cases just a list of shifts; maybe all don’t have to be assigned).
PMR: Thank you for the correction. I should have said peaklists with assignments.
4) Regarding “We knew in advance that certain calculations would be inappropriate. Large molecules (> 20 heavy atoms) would take too long.” The 20 heavy atom limit is a real constraint. I judge that most pharmaceuticals in use today are over 20 atoms (Xanax, sildenafil, ketoconazole, Singulair for example). I would hope that members of the NMR community are watching your work, as it should be of value to them, but I believe 20 atoms is a severe constraint. That said, I know that with more time you could do larger molecules, but a day per molecule is likely enough time investment.
6) Regarding “So we have a final list of about 300 candidates.” Out of a total of over 20000 individual structures, your analysis was performed on 1.5% of the dataset. How many data points was this, out of interest?
7) Regarding “probably 20% of entries have misassignments and transcription errors. Difficult to say but probably about 1-5%”. This suggests about 25% of shifts associated with my estimated 3000 shifts are in error. This is about 750 data points, and this conclusion was made by the study of 300 molecules. For sure the 25% does not carry over to the entire database. It is of MUCH higher quality than that. My earlier posting suggested that there were about 250 BAD points. The subjective criteria are discussed here (http://www.chemspider.com/blog/?p=44). Wolfgang suggested about 300 bad points but we were both being very conservative. You discussed the difference between 250 and 300 here on your blog, as you likely recall.
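The arithmetic in points 6) and 7) can be checked with a short sketch. The figures are the ones quoted in the comment (about 300 candidates, "over 20000" structures rounded to 20000, an estimated 3000 shifts, and a combined 25% error rate read from the quoted 20% plus 1-5%). The variable names are illustrative only, and the 25% is the commenter's reading, not a measured value.

```python
# Figures quoted in the comment above; variable names are illustrative.
candidates = 300             # final list of candidate structures
total_structures = 20000     # "over 20000", rounded down for this check
estimated_shifts = 3000      # commenter's estimate of the shifts involved
combined_error_rate = 0.25   # commenter's reading of 20% plus 1-5%

# Share of the database that was analysed
fraction_analysed = candidates / total_structures

# Number of shifts that would be in error at the combined rate
implied_bad_shifts = estimated_shifts * combined_error_rate

print(f"fraction of structures analysed: {fraction_analysed:.1%}")
print(f"implied shifts in error: {implied_bad_shifts:.0f}")
```

Both numbers rest on the rounded inputs, so they are order-of-magnitude checks rather than precise counts.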
PMR: Nick will detail these later. We believe that the QM method is sufficiently powerful to show misassignments of a very few ppm - I will not give figures before we have done the work. With known variance it is possible to give a formal probability that peaks are misassigned. I have shown some examples of what we believe to be clear misassignments but we have not gone back to the authors or literature (which often does not have enough information to decide). I do not believe you can compare your estimates with ours, as you and we have not defined what a misassignment is.
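One way to read "with known variance it is possible to give a formal probability" is as a tail probability under a normal error model. The sketch below assumes that the error between observed and calculated shift is Gaussian with zero mean and a known standard deviation. The function name, the 2.0 ppm standard deviation and the example shifts are hypothetical and are not taken from the project.

```python
import math


def two_sided_tail_probability(observed_ppm, calculated_ppm, sigma_ppm):
    """Probability of a deviation at least this large if the assignment is correct.

    Assumes zero-mean Gaussian errors with a known standard deviation
    sigma_ppm (an assumption of this sketch, not a project result).
    """
    if sigma_ppm <= 0:
        raise ValueError("sigma_ppm must be positive")
    # Deviation expressed in units of the standard deviation
    z = abs(observed_ppm - calculated_ppm) / sigma_ppm
    # P(|Z| >= z) for a standard normal variable Z
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical example: observed and calculated shifts 6 ppm apart, sigma 2 ppm
p = two_sided_tail_probability(128.4, 122.4, 2.0)
print(f"{p:.4f}")
```

A small value flags a peak worth checking. It is not by itself the probability that the peak is misassigned, which would also need a prior and an agreed definition of a misassignment.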
Regarding “We realise that other groups have access to larger and they claim better data sets. But they are closed. I shall argue in a later post that closed approaches hold back the quality of scientific data.” I assume your comments are regarding Wolfgang Robien and ACD/Labs. It is true that we have access to larger datasets, but we can limit the conversation to NMRShiftDB since we ALL have access to that. Robien’s and ACD/Labs’ algorithms can adequately deal with the NMRShiftDB dataset. For the neural nets and increment-based approach, over 200,000 data points can be calculated in less than 5 minutes (http://www.chemspider.com/blog/?p=213). You have access to the same dataset and can handle 300 of the structures. Your statement is moot: it is NOT about database size but about algorithmic capabilities.
PMR: My statement was about size and quality of datasets and is completely correct. It has nothing to do with algorithms. I am not interested in comparing the speed of algorithms but am concerned about metrics for the quality of data. I shan’t discuss speed of algorithms unrelated to the current project.
PMR: …… Contrary to your comment that I am an NMR expert, I’m really not - I’m an eChemist and this exercise in NMR is because I wish to liberate NMR from the pages of journals. If we find that Henry’s program needs more data or that yours has fewer problems it could be extremely valuable. We would want the actual data to be Open so that others can re-use it. PMR is quoted as “We downloaded the whole of NMRShiftDB. When we started we had NO idea of the quality. …” That is quite true. There were a number of public comments on NMRShiftDB ranging from (mild) approval to (mild) disapproval, some scalar values for RMS against various prediction programs, and some figures on misassignments etc. These gave relatively little indication of the detailed data quality - e.g. the higher moments of variation. If there is currently a full list of NMRShiftDB entries with your annotations this would be valuable. Currently I can find a number of comments on individual entries with gross problems, but these seem anecdotal rather than a complete list.
OK, you are not an NMR spectroscopist but you want to liberate NMR data from the pages of the journals. There are so many people around working in this field who are doing excellent science - they are coming from companies AND academia - and I am quite sure they have different ideas about the value of the data and the access to them, but you never talked to them. You have talked to Christoph about that, I am quite sure, but did you also analyze his contributions?
The project was started ca. 6 years ago. At that time there were a few ‘players’ around (ACD, BIORAD, SPECINFO, CSEARCH) - there is severe structural overlap between NMRShiftDB and their collections. Conclusion: There was not even the slightest intention by Christoph (as project manager) to get information about these collections and to.
Related article:
http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=751