NLP tools
· Word embeddings: GloVe (https://nlp.stanford.edu/projects/glove/), Word2Vec (https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa), BERT (https://pypi.org/project/bert-embedding//), ELMo (https://allennlp.org/elmo), RoBERTa
· Stanford NLP software (https://nlp.stanford.edu/software/)
· Unigrams, bigrams, trigrams
· Linguistic Inquiry and Word Count (LIWC) (https://repositories.lib.utexas.edu/bitstream/handle/2152/31333/LIWC2015_LanguageManual.pdf)
· POS tags: NLTK toolkit (https://www.nltk.org)
· Morphological analysis:
o Polyglot: https://polyglot.readthedocs.io/en/latest/MorphologicalAnalysis.html
o Morfessor: https://morfessor.readthedocs.io/en/latest/
o LegaliPy: https://github.com/henchc/syllabipy
· Flesch reading ease and other readability formulas (Kincaid et al 1975)
· Speciteller: Specificity score (Li and Nenkova 2015)
· Concreteness score (Brysbaert et al 2014)
· Dictionary of Affect and revised 2009 version (Whissell 1989, Whissell 2009)
· Hedge words and phrases (Ulinski et al 2018)
· textstat: tools to extract readability measures from text (readability, complexity, and grade level)
· Tools to restore punctuation in unpunctuated text/ASR results:
· Information on NLP for Chinese data
· Polyglot (multilingual text processing toolkit extracted from 136 languages in Wikipedia)
· Other useful text features:
o Number of filled pauses
o Response latency
o False starts and other speech disfluencies
o Repetitions
o Lexical diversity: determined by type/token ratio
o Creativity: similarity of this response to other responses
· Sentiment lexicons
o The General Inquirer (Stone et al. 1966)
§ Positive (1915), Negative (2291), Strong vs Weak, Pleasure, Pain, etc.
o MPQA Subjectivity Cues Lexicon
§ 2718 positive, 4912 negative
o Bing Liu Opinion Lexicon
§ 2006 positive, 4783 negative
o Product reviews on Amazon
§ Multidomain sentiment analysis dataset
§ Amazon product data, 143 million reviews
o Movie reviews on IMDB
§ Cornell movie review data, labeled with sentiment polarity, scale, and subjectivity
§ Large Movie Review Dataset v1.0, 25k movie reviews
§ IMDB Movie Reviews Dataset, 50k movie reviews
§ Bag of Words Meets Bags of Popcorn, 50k movie reviews
o Reviews from Rotten Tomatoes
§ Stanford Sentiment Treebank, 11k reviews
o Tweets with emoticons
§ Sentiment140, 1.6M tweets
o Twitter data on US airlines
§ Twitter US Airline Sentiment, with negative reasons (e.g. “rude service”)
o Paper reviews
o SentiWordNet
§ WordNet synsets automatically labeled with positivity, negativity, and objectivity
o NRC Word-emotion Association Lexicon (Mohammad and Turney 2011)
§ Labeled by Turkers for joy, sadness, anger, fear, trust, disgust, anticipation, surprise
o Lexicon of Valence, Arousal and Dominance (Warriner et al 2013)
§ AMT ratings of 14k words
o Sentiment in Twitter (Go et al 2009) (Kouloumpis et al 2011)
o Emoji in Twitter (Felbo et al 2017)
o Attention Modeling for Targeted Sentiment (Liu and Zhang 2017)
o BERT in Sentiment Analysis (Google AI Language)
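Two of the simpler text features listed above, n-grams (unigrams, bigrams, trigrams) and lexical diversity as a type/token ratio, can be computed without any toolkit. The following is a minimal sketch only: the regex tokenizer, the function names, and the sample sentence are illustrative assumptions, and a real project would more likely use a tokenizer from NLTK or Stanford NLP.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex tokenizer is good enough for illustration.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams of a token sequence as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def type_token_ratio(tokens):
    """Distinct word types divided by total tokens (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample response, not taken from any corpus.
tokens = tokenize("The cat sat on the mat. The cat slept.")
bigram_counts = Counter(ngrams(tokens, 2))
trigrams = ngrams(tokens, 3)
ttr = type_token_ratio(tokens)
```

Note that the type/token ratio tends to fall as a text gets longer, so it is usually safer to compare responses of similar length.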
Speech approaches
· Aeneas: text/speech alignment (https://www.readbeyond.it/aeneas/)
· End2End speech processing tools: ESPnet
· MFCC features
· Speech processing benchmark: SUPERB
· Acoustic-prosodic features
o OpenSMILE (https://www.audeering.com/opensmile/)
o Parselmouth (https://parselmouth.readthedocs.io/en/stable/)
o Praat (https://www.fon.hum.uva.nl/praat/)
o Prosodic labeling
§ http://www.speech.cs.cmu.edu/tobi/
§ https://www.ling.ohio-state.edu/research/phonetics/E_ToBI/
o Prosodic analysis: AuToBI – A Tool for Automatic ToBI annotation (https://github.com/AndrewRosenberg/AuToBI)
§ PyToBI: ToBI labeling with Python
o Video series in speech acoustics:
· Denoising
o To remove background noise or music: Spleeter, Descript, Audacity, CMGAN, SEMamba
o Denoising script (multiple methods included)
· Audio analysis: Librosa
· Timestamped transcription: https://transkriptor.com/audio-to-text-timestamps/#
· ASR
o Kaldi (https://github.com/kaldi-asr/kaldi)
o Google Cloud Speech-to-Text (https://cloud.google.com/speech-to-text)
o Free: Kaldi, Simon, Mozilla DeepSpeech, Whisper, WhisperX, Coqui, SpeechBrain, Wav2letter
o Paid: Labellerr, Google Cloud Speech-to-Text, Deepgram, Express Scribe, Microsoft Azure, Nuance Dragon, IBM Watson, Verbit, …
o And more: https://www.goodfirms.co/blog/best-free-open-source-speech-recognition-software
o Basic information: https://cmusphinx.github.io/wiki/tutorialconcepts/
o https://en.speechocean.com/about/newsdetails/62.html
o https://fosspost.org/open-source-speech-recognition/
· TTS
o Simon King's Merlin video tutorial: http://www.speech.zone/courses/one-off/merlin-interspeech2017/
o http://www.cs.cmu.edu/~awb/synthesizers.html
o NaturalReader, Balabolka, Panopreter Basic, WordTalk, Zabaware Text-to-Speech Reader, WaveNet and WaveNet2, Deep Voice AI, Tacotron2, Tacotron-3, Re-Flow-TTS, WaveGlow, DeepMind TTS, Cartesia
o Noise reduction (https://dl.acm.org/doi/10.1145/2964284.2967306)
o Calculating spectral centroids
· Old and new speech software:
o SoX conversion software: http://sox.sourceforge.net
o http://linux-sound.org/speech.html
· Spectrogram reading practice:
o https://home.cc.umanitoba.ca/~robh/howto.html
o https://linguistics.ucla.edu/people/hayes/103/SpectrogramReading/index.htm
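For the "Calculating spectral centroids" item above: the spectral centroid of a frame is the magnitude-weighted mean of its frequency bins, i.e. the sum of f_k * |X_k| divided by the sum of |X_k|. The sketch below uses a naive pure-Python DFT so that it runs without dependencies. The function names, the frame length, and the 1000 Hz test tone are illustrative assumptions; in practice a library such as Librosa, with windowing and an FFT, would be the usual choice.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 of a real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz (0.0 for a silent frame)."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / total


# Assumed test signal: a pure 1000 Hz sine sampled at 8000 Hz.
# With 256 samples the tone falls exactly on DFT bin 32, so the
# centroid should come out very close to 1000 Hz.
sample_rate = 8000
n = 256
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
centroid = spectral_centroid(tone, sample_rate)
```

No window function is applied here, so tones that do not fall on an exact bin will leak into neighbouring bins and shift the centroid slightly.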
Visual features
· Fisher Vector encoding (FV) (https://papers.nips.cc/paper/1998/file/db1915052d15f7815c8b88e879465a1e-Paper.pdf)
· Vector of Locally Aggregated Descriptors (VLAD) (https://lear.inrialpes.fr/pubs/2010/JDSP10/jegou_compactimagerepresentation.pdf)
· Facial expression detection (FED) (https://www.jstor.org/stable/30204706?seq=1#metadata_info_tab_contents)
Statistical measures and z-score normalization
· More here on ANOVA, Kruskal-Wallis test, regression, t-tests, Wilcoxon signed-rank test, F1 scores and z-score normalization: http://www.cs.columbia.edu/~julia/courses/Resources/Stats.pdf
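As a quick illustration of z-score normalization, each value is rescaled to (x - mean) / standard deviation, so the normalized values have mean 0 and standard deviation 1. This is a minimal sketch, not the procedure from the linked notes: it assumes the population standard deviation, and the pitch values are made-up numbers standing in for one speaker's measurements.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to zero mean and unit (population) standard deviation.

    Assumption: if all values are identical, return zeros rather than
    dividing by a zero standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical mean-pitch measurements (Hz) for one speaker.
pitch_hz = [180.0, 200.0, 220.0, 240.0, 260.0]
normalized = z_scores(pitch_hz)
```

Normalizing within each speaker in this way is one common option for making acoustic features comparable across speakers with different baseline ranges.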
Machine Learning
· Weka
· Scikit-learn (https://scikit-learn.org/stable)
· Deep learning models
o ChatGPT – GPT-3.5, GPT-4 (zero-shot, fine-tuned) (OpenAI 2023)
o Llama 2 (zero-shot, fine-tuned) (Touvron et al 2023)
o PaLM 2 (Anil et al 2023)
o Alpaca-LoRA (Hu et al 2021)
o Transfer learning w/teacher-student network – many papers
o Many BERT uses
o Multi-Modality Multi-Loss Fusion Network: end2end model that optimizes for feature extraction and ML processes; used for multiple modality corpora (Multimodal Learning with Transformers: A Survey, Xu, Zhu, and Clifton 2023)
o ImageBind: https://imagebind.metademolab.com
o Tensor Fusion Network (TFN): for emotion, with PyTorch (tutorials for PyTorch)
o MulT: a transformer encoder with cross-modal attention
o Reading-comprehension datasets:
§ MultiSpanQA (Li et al 2022)
§ SQuAD (Rajpurkar et al 2016)
§ Quoref (Dasigi et al 2019)
· Some other potentially useful papers:
o https://www.aclweb.org/anthology/W16-0301.pdf
o https://www.aclweb.org/anthology/W17-3101.pdf
o http://www.cs.columbia.edu/speech/PaperFiles/2019/clpsych19.pdf
o http://www.cs.columbia.edu/speech/PaperFiles/2010/Hirschberg_etal2010.pdf
o https://docs.google.com/spreadsheets/d/1xDiuK4l5JJvwZKI6kDlMwcO1Twe7h7ItgnpBfmO75wA/edit#gid=0