Blog

Interpretable Word Embeddings from knowledge graph embeddings

Tuesday, Nov 22, 2022 by Knut Jägersberg

A while ago, I created interpretable word embeddings using polar opposites, following the POLAR approach (I used the Jupyter notebook from https://github.com/Sandipan99/POLAR), applied to wikidata5m knowledge graph embeddings (from https://graphvite.io/docs/latest/pretrained_model.html). The result is a gigantic file of pretrained embeddings that sorts concepts along 700 semantic differentials, e.g. good/bad. However, the wikidata5m knowledge graph is huge: roughly 5 million concepts and 13 million spellings. A joined parquet file would probably take 100 GB of disk space.
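The core idea behind POLAR can be sketched in a few lines: each interpretable dimension is the difference vector of an antonym pair, and words are scored by projecting their pretrained embeddings onto those axes. This is a simplified illustration with toy random vectors (the real pipeline uses the wikidata5m embeddings and a change of basis over all 700 pairs); all names and values here are hypothetical.

```python
import numpy as np

# Toy pretrained embeddings (hypothetical random values, for illustration only).
rng = np.random.default_rng(0)
vocab = ["good", "bad", "hot", "cold", "fire", "ice"]
emb = {w: rng.normal(size=8) for w in vocab}

# Each polar dimension is the difference vector of an antonym pair.
pairs = [("good", "bad"), ("hot", "cold")]
polar_dirs = np.stack([emb[a] - emb[b] for a, b in pairs])  # shape (2, 8)

def polar_score(word):
    """Project a word onto the polar axes (simplified: a plain dot product)."""
    return polar_dirs @ emb[word]

# One interpretable score per semantic differential, e.g. good/bad and hot/cold.
print(polar_score("fire"))
```

With the full wikidata5m vocabulary and 700 antonym pairs, the same projection yields the large file of interpretable embeddings described above.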

Continue Reading

Cleaning data science engagement data

Wednesday, Nov 16, 2022 by Knut Jägersberg

Content Intelligence headline engagement

In this post, I'll mix together a bunch of headline datasets I discovered with engagement data and make them suitable for predicting engagement level from text, for the domain of content intelligence.

Data sources:
- tweets on data science
- reddit posts
- search keywords
- blog posts
- ML paper social shares

Content Intelligence Tweets

These tweets come from various topics in data science and content marketing.
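Mixing heterogeneous headline datasets into one training table can be sketched roughly as below. The frames, column names, and the two-level bucketing are hypothetical stand-ins for the actual sources listed above; the point is tagging each row with its source and discretizing raw engagement counts into levels.

```python
import pandas as pd

# Hypothetical per-source frames; the real post pulls tweets, reddit posts, etc.
tweets = pd.DataFrame({"text": ["New NLP paper out"], "engagement": [120]})
reddit = pd.DataFrame({"text": ["How to clean data"], "engagement": [45]})

frames = {"twitter": tweets, "reddit": reddit}

# Tag each row with its source, then stack everything into one table.
combined = pd.concat(
    [df.assign(source=name) for name, df in frames.items()],
    ignore_index=True,
)

# Bucket raw counts into levels so a model predicts a class, not a count.
combined["engagement_level"] = pd.qcut(
    combined["engagement"], q=2, labels=["low", "high"]
)
print(combined)
```

Bucketing per source (rather than globally, as in this sketch) may be preferable, since a "high" engagement tweet and a "high" engagement reddit post sit on very different raw scales.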

Continue Reading