#Cheminformatics

Fwd: "Unleashing 4 million IUPAC names into the wild" chembl.blogspot.com/2025/08/un

"In support of Egon Willighagen's 'One Million IUPAC Names' project, we have just released more than 4 million IUPAC names text-mined from patents. Here are the details as listed on Zenodo:"

Oscar4 5.3.0 has been released: github.com/BlueObelisk/oscar4/

"OSCAR (Open Source Chemistry Analysis Routines) is an open source extensible system for the automated annotation of chemistry in scientific articles. It can be used to identify chemical names, reaction names, ontology terms, enzymes and chemical prefixes and adjectives, and chemical data such as state, yield, IR, NMR and mass spectra and elemental analyses."

I think I am going to try to recover a bit of #cheminformatics / #chemistry #history, and make the index of the Internet Journal of Chemistry (IJC) FAIR in @wikidata

While the journal no longer exists, many articles are cited quite a few times.

I did some exploration some time ago, and for some I found full text "self-archiving" versions online.

And, TIL that Web of Science has entries for the articles too, which I just added for the 9 articles already in #Wikidata: w.wiki/Eide

Scientific discussion moved from "letters to the editor" to PubPeer. There is a lot to be said about that, but one thing that must be said is that PubPeer can disappear.

But Letters to the Editor are preserved, for better or worse. If I am not mistaken, it was @dalke who pointed me at #ICCS2025 to a 1977-1978 discussion via such letters. I looked them up, annotated some of the citations with the Citation Typing Ontology, made a @nanopub, and put them in @wikidata

New preprint with #opensource #cheminformatics by @Kohulan et al.: "Cheminformatics Microservice V-3: A Web Portal for Chemical Structure Manipulation and Analysis" doi.org/10.26434/chemrxiv-2025

"Here, we present Cheminformatics Microservice V3, a significant update to the existing platform that provides unified programmatic access to cheminformatics libraries, including RDKit, Chemistry Development Kit (CDK), and Open Babel through a RESTful API framework."

ChemRxiv: Cheminformatics Microservice V-3: A Web Portal for Chemical Structure Manipulation and Analysis

The widespread adoption of open-source cheminformatics toolkits remains constrained by technical implementation barriers, including complex installation procedures, dependency management, and integration challenges. Here, we present Cheminformatics Microservice V3, a significant update to the existing platform that provides unified programmatic access to cheminformatics libraries, including RDKit, Chemistry Development Kit (CDK), and Open Babel through a RESTful API framework. This latest version features a newly developed, interactive web-based frontend built with React, providing users with an intuitive graphical interface for manipulating and analysing chemical structures. The frontend supports essential cheminformatics operations, including structure editing, PubChem database integration, batch molecular processing, and standardised InChI/RInChI identifier generation. The microservice V3 addresses critical accessibility barriers in computational chemistry by providing researchers with immediate access to analytical tools, eliminating the need for specialised technical expertise or complex software installations. This approach facilitates reproducible research workflows and broadens the utilisation of cheminformatics methodologies across interdisciplinary research communities. The platform is publicly accessible at https://app.naturalproducts.net, and the complete source code and documentation are available on GitHub.

When I started chemfp, PubChem and CAS had <100M records. Real-world #cheminformatics data doubles about every 10 years (since WWII), so I figured 200M was good enough.

Chemfp's FPB format fails at around 268M records due to its hash table layout. (I stored 16*X instead of X in a 4-byte field.)
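
The 268M figure is consistent with simple arithmetic on a 4-byte field. A minimal sketch, assuming the field is an unsigned 32-bit integer and that X counts records or slots (the post does not say which, and the names below are illustrative, not taken from chemfp):

```python
# Assumption: the 4-byte field is an unsigned 32-bit integer.
FIELD_BITS = 32
SCALE = 16  # the post says 16*X was stored instead of X

max_field_value = 2**FIELD_BITS - 1  # largest value a 4-byte unsigned field holds
max_x = max_field_value // SCALE     # largest X for which 16*X still fits


def fits(x: int) -> bool:
    """Return True if 16*x can be stored in the 4-byte field without overflow."""
    return SCALE * x <= max_field_value


print(f"largest X that fits: {max_x:,}")
```

Under these assumptions the ceiling is 2^32 / 16 = 2^28, a little over 268 million, whereas storing X directly would allow about 4.29 billion.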

I've now been working with bigger synthetic data sets.

I've tweaked the FPB format to handle 1B records. 🎉

Can't go much bigger as id lookups get slow due to 32-bit hash collisions/pigeon-holing.
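
To get a feel for why 32-bit hashes become a bottleneck at this scale, here is a rough estimate, assuming an idealised uniform hash function (chemfp's actual hash function and lookup strategy are not described in the post, and `colliding_fraction` is a hypothetical helper):

```python
import math

HASH_SPACE = 2**32  # number of distinct 32-bit hash values


def colliding_fraction(n_records: int, hash_space: int = HASH_SPACE) -> float:
    """Expected fraction of records whose hash equals that of at least one
    other record, assuming hashes are uniform and independent."""
    # P(no other record shares my hash) = (1 - 1/hash_space) ** (n_records - 1)
    log_p_alone = (n_records - 1) * math.log1p(-1.0 / hash_space)
    return -math.expm1(log_p_alone)  # 1 - P(alone), computed stably


for n in (100_000_000, 268_000_000, 1_000_000_000):
    print(f"{n:>13,} records: {colliding_fraction(n):.1%} share a hash value")
```

Under this model roughly a fifth of the ids in a 1B-record file share their hash with another id, and past 2^32 records the pigeonhole principle guarantees collisions regardless of how good the hash is.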