Author: Melanie Tosik
Translation: Min Li
Proofreading: Ding Nanya
This article is about 1,100 words long; estimated reading time: 3 minutes.
Melanie Tosik currently works at the travel search company WayBlazer, where she generates personalized travel recommendations from natural-language requests. Looking back on her own learning journey, she has compiled this list of resources for beginners who want to get started with natural language processing.
A dependency parse tree visualized on the displaCy website
https://demos.explosion.ai/displacy/?text=Great%2C%20this%20is%20just%20what%20I%20needed!&model=en&cpu=1&cph=0
I remember reading that if you find yourself answering the same question twice, it is probably a good idea to turn the answer into a blog post. Following this principle, and to save time answering questions, here is the canonical version of the question: “My background is in [some field of] science, and I am very interested in learning NLP. Where should I start?”
Before you dive in, please note that the list below is only a very general starting point (and is certainly not exhaustive). To help readers along, I have added brief descriptions in brackets and, where possible, an estimate of the difficulty level. Basic programming skills (e.g., in Python) are recommended.
Online Courses
• Dan Jurafsky and Chris Manning: Natural Language Processing [A fantastic video introduction series]
https://www.youtube.com/watch?v=nfoudtpBV68&list=PL6397E4B26D00A269
• Stanford CS224d: Deep Learning for Natural Language Processing [More advanced machine learning algorithms, deep learning, and neural network architectures for NLP]
http://cs224d.stanford.edu/syllabus.html
• Coursera: Introduction to Natural Language Processing [NLP course offered by the University of Michigan]
https://www.coursera.org/learn/natural-language-processing
Libraries and Open Resources
• spaCy (website, blog) [Python; an emerging open-source library with cool usage examples, API documentation, and demo applications]
Website URL: https://spacy.io/
Blog URL: https://explosion.ai/blog/
Demo App URL: https://spacy.io/docs/usage/showcase
• Natural Language Toolkit (NLTK) (website, book) [Python; a practical introduction to programming for NLP, mainly used for teaching]
Website URL: http://www.nltk.org
Book URL: http://www.nltk.org/book/
• Stanford CoreNLP (website) [A high-quality natural language analysis toolkit written in Java]
Website URL: https://stanfordnlp.github.io/CoreNLP/
Active Blogs
• Natural Language Processing Blog (Hal Daumé)
Blog URL: https://nlpers.blogspot.com/
• Google Research Blog
Blog URL: https://research.googleblog.com/
• Language Log Blog (Mark Liberman)
Blog URL: http://languagelog.ldc.upenn.edu/nll/
Books
• Speech and Language Processing (Daniel Jurafsky and James H. Martin) [A classic NLP textbook covering all the basics, with a 3rd edition coming soon]
https://web.stanford.edu/~jurafsky/slp3/
• Foundations of Statistical Natural Language Processing (Chris Manning and Hinrich Schütze) [More advanced statistical NLP methods]
https://nlp.stanford.edu/fsnlp/
• Introduction to Information Retrieval (Chris Manning, Prabhakar Raghavan, and Hinrich Schütze) [An excellent reference on ranking/search]
https://nlp.stanford.edu/IR-book/
• Neural Network Methods in Natural Language Processing (Yoav Goldberg) [An in-depth introduction to neural network methods for NLP, with an accompanying primer]
https://www.amazon.com/Network-Methods-Natural-Language-Processing/dp/1627052984
Primer: http://u.cs.biu.ac.il/~yogo/nnlp.pdf
Other Miscellaneous
• How to build a word2vec model in TensorFlow [Tutorial]
https://www.tensorflow.org/versions/master/tutorials/word2vec/index.html
• Resources for Deep Learning in NLP [An overview of top resources for deep learning in NLP, organized by topic]
https://github.com/andrewt3000/dl4nlp
• Last Words: Computational Linguistics and Deep Learning – On the Importance of Natural Language Processing (Chris Manning) [Article]
http://mitp.nautil.us/article/170/last-words-computational-linguistics-and-deep-learning
• Understanding Natural Language with Distributed Representations (Kyunghyun Cho) [Self-contained lecture notes on ML/NN methods for NLU]
https://github.com/nyu-dl/NLP_DL_Lecture_Note/blob/master/lecture_note.pdf
• Bayesian Inference with Tears (Kevin Knight) [Tutorial workbook]
http://www.isi.edu/natural-language/people/bayes-with-tears.pdf
• Association for Computational Linguistics (ACL) [Journal anthology]
http://aclanthology.info/
• Quora: How do I learn Natural Language Processing?
https://www.quora.com/How-do-I-learn-Natural-Language-Processing
DIY Projects and Datasets
Image source: http://gunshowcomic.com/
• Nicolas Iderhoff has compiled a comprehensive, publicly available list of NLP datasets. Beyond those, here are some projects I can recommend to NLP beginners who want hands-on practice:
Dataset: https://github.com/niderhoff/nlp-datasets
• Implement Part-of-Speech Tagging (POS tagging) using Hidden Markov Models (HMM).
https://en.wikipedia.org/wiki/Part-of-speech_tagging
https://en.wikipedia.org/wiki/Hidden_Markov_model
• Perform context-free grammar parsing using the CYK algorithm
https://en.wikipedia.org/wiki/CYK_algorithm
https://en.wikipedia.org/wiki/Context-free_grammar
• Compute the semantic similarity between two given words in a text collection, for example using pointwise mutual information (PMI)
https://en.wikipedia.org/wiki/Semantic_similarity
https://en.wikipedia.org/wiki/Pointwise_mutual_information
• Use a Naive Bayes Classifier to filter spam
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering
• Perform spell checking based on the edit distance between words
https://en.wikipedia.org/wiki/Spell_checker
https://en.wikipedia.org/wiki/Edit_distance
• Implement a Markov Chain text generator
https://en.wikipedia.org/wiki/Markov_chain
• Use LDA to implement a topic model
https://en.wikipedia.org/wiki/Topic_model
https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
• Use word2vec to generate word embeddings from large text corpora, such as Wikipedia.
https://code.google.com/archive/p/word2vec/
https://en.wikipedia.org/wiki/Wikipedia:Database_download
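For the HMM part-of-speech tagging project listed above, here is a minimal sketch of one way to start: a bigram HMM whose start, transition, and emission probabilities are estimated from tagged sentences with add-one smoothing, and decoded with the Viterbi algorithm. The class name `HMMTagger` and the three toy training sentences are hypothetical illustrations; a real project would train on a tagged corpus such as those in the dataset list.

```python
import math
from collections import Counter, defaultdict


class HMMTagger:
    """Hypothetical bigram HMM POS tagger: add-one smoothing plus Viterbi decoding."""

    def __init__(self, tagged_sents):
        self.start = Counter()             # tag counts at sentence start
        self.trans = defaultdict(Counter)  # trans[prev_tag][tag]
        self.emit = defaultdict(Counter)   # emit[tag][word]
        self.tag_totals = Counter()
        vocab = set()
        for sent in tagged_sents:
            prev = None
            for word, tag in sent:
                word = word.lower()
                vocab.add(word)
                self.tag_totals[tag] += 1
                self.emit[tag][word] += 1
                if prev is None:
                    self.start[tag] += 1
                else:
                    self.trans[prev][tag] += 1
                prev = tag
        self.tags = sorted(self.tag_totals)
        self.n_sents = len(tagged_sents)
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def log_start(self, t):
        return math.log((self.start[t] + 1) / (self.n_sents + len(self.tags)))

    def log_trans(self, p, t):
        return math.log((self.trans[p][t] + 1) / (sum(self.trans[p].values()) + len(self.tags)))

    def log_emit(self, t, w):
        return math.log((self.emit[t][w] + 1) / (self.tag_totals[t] + self.vocab_size))

    def tag(self, words):
        words = [w.lower() for w in words]
        if not words:
            return []
        # score[t]: best log-probability of any tag sequence ending in tag t so far
        score = {t: self.log_start(t) + self.log_emit(t, words[0]) for t in self.tags}
        backptrs = []
        for w in words[1:]:
            new_score, ptr = {}, {}
            for t in self.tags:
                prev = max(self.tags, key=lambda p: score[p] + self.log_trans(p, t))
                new_score[t] = score[prev] + self.log_trans(prev, t) + self.log_emit(t, w)
                ptr[t] = prev
            score = new_score
            backptrs.append(ptr)
        # Follow the back-pointers from the best final tag to recover the sequence
        path = [max(self.tags, key=score.get)]
        for ptr in reversed(backptrs):
            path.append(ptr[path[-1]])
        return list(zip(words, reversed(path)))


train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sees", "VERB"), ("a", "DET"), ("dog", "NOUN")],
]
tagger = HMMTagger(train)
print(tagger.tag(["The", "dog", "sleeps"]))
# → [('the', 'DET'), ('dog', 'NOUN'), ('sleeps', 'VERB')]
```

Working in log space keeps long sentences from underflowing; the add-one smoothing is the simplest choice and is usually replaced by something better once the basic tagger works.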
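For the CYK project listed above, the core of the algorithm is a chart filled bottom-up over span lengths. The sketch below is a recognizer only (it answers "does the grammar derive this sentence?"); a full parser would additionally store back-pointers in each cell. It assumes the grammar is already in Chomsky normal form, and the function name `cyk_recognize` and the toy grammar are hypothetical.

```python
from itertools import product


def cyk_recognize(words, lexical, binary, start="S"):
    """CYK recognizer for a grammar in Chomsky normal form (CNF).

    lexical: word -> set of nonterminals A with a rule A -> word
    binary:  (B, C) -> set of nonterminals A with a rule A -> B C
    Returns (accepted, table), where table[i][j] holds every nonterminal
    that can derive the span words[i:j].
    """
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n], table


# Hypothetical toy grammar in CNF
LEXICAL = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"}, "telescope": {"N"},
    "saw": {"V"}, "with": {"P"},
}
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("P", "NP"): {"PP"},
}

print(cyk_recognize("the dog saw a cat with a telescope".split(), LEXICAL, BINARY)[0])  # → True
print(cyk_recognize("dog the saw".split(), LEXICAL, BINARY)[0])  # → False
```

The three nested loops over span length, start, and split point give the familiar cubic running time in sentence length.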
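For the PMI project listed above, pointwise mutual information compares how often two words co-occur with how often they would co-occur by chance: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The sketch below assumes co-occurrence means "appear in the same sentence" and estimates each probability as a fraction of sentences; the function name `pmi` and the four-sentence corpus are hypothetical.

```python
import math


def pmi(x, y, sentences):
    """Sentence-level PMI between words x and y (hypothetical helper)."""
    docs = [set(s.lower().split()) for s in sentences]
    n = len(docs)
    c_x = sum(x in d for d in docs)
    c_y = sum(y in d for d in docs)
    c_xy = sum(x in d and y in d for d in docs)
    if c_xy == 0:
        return float("-inf")  # the words never co-occur
    # log2((c_xy/n) / ((c_x/n) * (c_y/n))), rearranged to a single division
    return math.log2(c_xy * n / (c_x * c_y))


corpus = [
    "the cat drinks milk",
    "the cat chases the mouse",
    "the dog drinks water",
    "the dog chases the cat",
]
print(pmi("dog", "water", corpus))  # → 1.0
print(pmi("the", "cat", corpus))    # → 0.0
```

Note that "the" scores 0.0 with "cat" because it appears in every sentence and therefore says nothing about its neighbours; on real corpora, raw PMI also overrates rare pairs, which is why variants such as positive PMI are common.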
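For the spam-filtering project listed above, a multinomial Naive Bayes classifier picks the class c that maximizes log p(c) plus the sum of log p(word | c) over the message's words. The sketch below assumes whitespace tokenization and add-one (Laplace) smoothing; the class name `NaiveBayesSpamFilter` and the six training messages are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes over bag-of-words features, add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # used for the class priors
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in text.lower().split():
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


texts = [
    "win money now", "cheap pills win big", "claim your free prize now",
    "meeting at noon tomorrow", "please review the attached report", "lunch tomorrow with the team",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
clf = NaiveBayesSpamFilter().fit(texts, labels)
print(clf.predict("win a free prize"))            # → spam
print(clf.predict("review the report tomorrow"))  # → ham
```

Summing log-probabilities instead of multiplying probabilities avoids numeric underflow on long messages.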
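For the spell-checking project listed above, the usual starting point is Levenshtein edit distance (minimum number of insertions, deletions, and substitutions), computed by dynamic programming. The sketch below assumes a small vocabulary with word frequencies, suggests the closest known word, and breaks ties in favour of the more frequent word; `edit_distance`, `correct`, and the vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, keeping one DP row at a time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when the characters match
            ))
        prev = cur
    return prev[-1]


def correct(word, vocabulary, max_distance=2):
    """Return the closest known word, or `word` itself if nothing is close enough."""
    if word in vocabulary:
        return word
    dist, _, best = min((edit_distance(word, w), -freq, w) for w, freq in vocabulary.items())
    return best if dist <= max_distance else word


VOCAB = {"language": 50, "natural": 40, "processing": 30, "learning": 20}
print(edit_distance("kitten", "sitting"))  # → 3
print(correct("langauge", VOCAB))          # → language
print(correct("procesing", VOCAB))         # → processing
```

Comparing against every vocabulary word is fine for a toy; larger spell checkers typically generate candidate edits of the misspelling instead.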
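For the Markov chain text generator listed above, the idea is to record which word follows each run of `order` words, then repeatedly sample a successor for the current state. This is a minimal sketch assuming whitespace tokenization; `build_chain` and `generate` are hypothetical names, and the training text must contain more than `order` words.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the words observed after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random walk over the chain, producing at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))  # random starting state
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        out.append(rng.choice(followers))  # repeated followers make frequent words likelier
        state = tuple(out[-len(state):])
    return " ".join(out)


text = "the cat sat on the mat and the cat ate the rat"
chain = build_chain(text, order=1)
print(generate(chain, length=10, seed=0))
```

Keeping duplicates in each follower list means sampling is automatically proportional to observed counts, so no explicit probability table is needed.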
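For the LDA topic-model project listed above, one common way to fit LDA is collapsed Gibbs sampling: each token's topic is resampled in turn from p(z = k | rest), which is proportional to (n_dk + alpha) (n_kw + beta) / (n_k + V beta). The sketch below assumes symmetric priors `alpha` and `beta`, pre-tokenized documents, and a fixed number of iterations; `lda_gibbs` and the toy documents are hypothetical, and the topics found on a corpus this small depend on the random seed.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word) count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # n_dk
    topic_word = [Counter() for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                       # n_k
    assignments = []
    # Start from a random topic for every token
    for d, doc in enumerate(docs):
        z_doc = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, z_doc):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    topics = range(n_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Take the current token out of the counts
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | all other assignments), up to a normalizing constant
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                # Put the token back under its newly sampled topic
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


docs = [
    "cat dog pet fur cat".split(),
    "dog pet vet fur".split(),
    "stock market bank money".split(),
    "bank money stock trade".split(),
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, _ in counts.most_common(3)])
```

The returned count tables can be normalized (after adding the priors) to estimate each document's topic mixture and each topic's word distribution.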
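For the word2vec project listed above, the original C tool linked there is what you would actually run on a Wikipedia dump. To see what it learns, the toy below implements skip-gram with negative sampling in plain Python: each (center, context) pair nudges the two vectors together, while a few randomly drawn words are pushed apart. It is a hypothetical sketch for understanding only; in particular it samples negatives uniformly, whereas the real tool samples from a smoothed unigram distribution and adds tricks such as subsampling frequent words.

```python
import math
import random


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_skipgram(sentences, dim=16, window=2, negatives=3, epochs=100, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling (hypothetical helper); returns {word: vector}."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # Input ("word") vectors start small and random; output ("context") vectors start at zero
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx in ids[lo:pos] + ids[pos + 1:hi]:
                    # One observed (positive) context plus uniformly sampled negatives
                    targets = [(ctx, 1.0)] + [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for t, label in targets:
                        u = w_out[t]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # uses u before it is updated
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_v[k]
    return {w: w_in[i] for w, i in index.items()}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


corpus = [s.split() for s in [
    "the king rules the kingdom",
    "the queen rules the kingdom",
    "the cat chases the mouse",
    "the dog chases the cat",
]]
vectors = train_skipgram(corpus)
print(round(cosine(vectors["king"], vectors["queen"]), 3))
```

On a corpus this tiny the similarities are noisy; the point is the update rule, not the quality of the vectors.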
NLP on Social Media
• Twitter: #nlproc, and a list of NLPers (compiled by Jason Baldridge)
https://twitter.com/hashtag/nlproc
https://twitter.com/jasonbaldridge/lists/nlpers
• Reddit social news site: /r/LanguageTechnology
https://www.reddit.com/r/LanguageTechnology
• Medium publishing platform: NLP
https://medium.com/tag/nlp
Original link:
https://medium.com/towards-data-science/how-to-get-started-in-nlp-6a62aa4eaeff
Min Li is a Senior Project Manager at HP Enterprise, responsible for global operational data analysis, visualization to support decision-making, operations optimization, and driving internal improvements. Interests: exploring the mysterious power of big data, and disruptive innovation.
Reprint Notice
To reprint this article, please: 1. state at the beginning of the text: Reprinted from DataPi THU (ID: DatapiTHU); 2. attach the DataPi QR code at the end of the article.
To apply for reprinting, please send an email to [email protected]