Language Data Mining and Python Programming Course


| Course Name |

Language Data Mining and Python Programming

| Course Introduction |

Python is a high-level programming language with a simple, easy-to-learn syntax. Its emphasis on readability and clarity makes it an ideal first language for beginners. In the context of today’s “New Liberal Arts” initiative, Python plays an important role in linguistic research, offering linguists a rich set of libraries and tools for processing, analyzing, and mining corpus data.

This course teaches students and teachers in linguistics-related fields how to use the Python programming language to process, analyze, and visualize linguistic data. Working upward through characters, words, phrases, sentences, and semantics, it combines theoretical explanation, hands-on practice, and projects, all based on real Chinese and English corpora. Students will learn to use Python tools and libraries to process corpus data, explore the features and regularities of language, and apply these skills in their own linguistic research.

| Course Features |

(1) The course outline was designed by PhD holders in linguistics and applied linguistics, entirely from a linguist’s perspective;

(2) It is designed specifically for linguistics students and researchers with no prior Python experience and is taught step by step;

(3) Real cases and projects strengthen students’ understanding of, and practical skill in, applying Python to linguistics;

(4) Live Q&A support and the original source code are provided;

(5) Recorded classes can be replayed without limit.

| Course Outcomes |

You will learn how to combine Python with rich corpus resources to conduct language research and analysis. You will master downloading, organizing, cleaning, and analyzing corpora with Python, and use these skills to uncover hidden patterns and trends in language. Specific outcomes include, but are not limited to:

(1) Master the fundamentals of Python;

(2) Understand the basic methods of corpus research;

(3) Use Python to organize, clean, and analyze text;

(4) Use statistical and visualization tools to analyze language data and discover hidden patterns;

(5) Complete a small-scale linguistic research project, developing a quantitative, empirical research mindset;

(6) Carry out your own language research project to strengthen your academic competitiveness.

| Course Design and Instructors |

(1) Senior Charles: majors in Artificial Intelligence with a focus on Natural Language Processing; has long tutored undergraduate and graduate students in Python and is proficient in its many applications; won first place in the international SemEval-2023 competition; first author of papers at ACL, a top NLP conference; member of the China Computer Federation (CCF) and CCF CSP certified; has participated in multiple national and provincial projects; technical member of key corpus research groups.

(2) Senior Moa: founder of the “Empirical Linguistics Forum”; PhD candidate in Applied Linguistics; familiar with corpus research paradigms; co-author of multiple papers in SSCI and Chinese core journals; has participated in National Social Science Fund projects and language center projects; designer of the Language Data Mining and Python Programming course.

| Target Audience |

(1) Students and researchers in linguistics-related fields;

(2) Researchers using corpus paradigms;

(3) Students and researchers in computational linguistics;

(4) Text analysis practitioners.

| Course Schedule |

Lecture 1: Local Installation, the Interface, and Basic Python Techniques

1. Introduction to Python and Environment Configuration: Creating Your Own Language Programming World, Starting from “Hello World”

2. Variables, Data Types, and Operators

3. Conditional Statements and Loop Structures

4. Functions and Modules

Time: July 31, 2023, 19:00-21:00

Lecture 2: Reading and Outputting Textual Discourse

1. Reading Online Corpora

2. Reading Self-built Corpora

3. Outputting Formatted Results

Time: August 1, 2023, 19:00-21:00

Lecture 3: Preprocessing Textual Discourse

1. Sentence Segmentation

2. Tokenization

3. Normalization of Words (Lemmatization and Stemming)

4. Removing Chinese and English Stop Words (Deleting Unnecessary Words)

5. Clearing Punctuation, Numbers, and Other Unnecessary Information

Time: August 3, 2023, 19:00-21:00
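As a rough illustration of what Lecture 3 covers, the sketch below chains naive sentence segmentation, tokenization, punctuation/number removal, and stop-word removal using only Python’s standard library. It is a simplified toy, not course material: the regular expressions assume simple English text, `STOP_WORDS` is a tiny hypothetical list, and lemmatization, stemming, and Chinese word segmentation (which normally rely on libraries such as NLTK, spaCy, or jieba) are left out.

```python
import re

# A tiny, hypothetical stop list for illustration only; real projects
# load curated English or Chinese stop-word lists instead.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

def split_sentences(text):
    """Naively split English text after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase and keep alphabetic tokens only, which drops punctuation and numbers."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())

def remove_stop_words(tokens):
    """Filter out any token that appears in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]

text = "The corpus is large. It contains 3 million words of English!"
for sent in split_sentences(text):
    print(remove_stop_words(tokenize(sent)))
# ['corpus', 'large']
# ['it', 'contains', 'million', 'words', 'english']
```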

Lecture 4: Annotation and Statistics at the Word Level of Textual Discourse (Part 1)

1. Introduction and Practice of POS Tagging in Chinese and English

2. Counting Characters, Words, Word Frequencies, and POS

3. Calculating Lexical Density

4. Calculating Lexical Complexity

Time: August 4, 2023, 19:00-21:00
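For a flavor of Lecture 4, one common way to compute lexical density is the proportion of content words (nouns, verbs, adjectives, adverbs) among all tokens, and the type-token ratio (TTR) is one simple measure of lexical diversity, often discussed as part of lexical complexity. The sketch below assumes the text has already been POS-tagged with Universal POS tags; the `tagged` list is hand-written here, whereas in practice it would come from a tagger such as NLTK or spaCy. The course itself may define these measures differently.

```python
# Hand-written (word, Universal POS tag) pairs standing in for tagger output.
tagged = [
    ("the", "DET"), ("students", "NOUN"), ("read", "VERB"),
    ("the", "DET"), ("long", "ADJ"), ("text", "NOUN"),
    ("very", "ADV"), ("carefully", "ADV"),
]

# Tags treated as content (lexical) words in this sketch.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}

def lexical_density(tagged_tokens):
    """Share of tokens whose POS tag marks a content word."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)

def type_token_ratio(tokens):
    """Distinct word forms (types) divided by total tokens."""
    return len(set(tokens)) / len(tokens)

words = [w for w, _ in tagged]
print(lexical_density(tagged))   # 0.75  (6 content words / 8 tokens)
print(type_token_ratio(words))   # 0.875 (7 types / 8 tokens)
```

Note that raw TTR falls as texts get longer, so comparisons are usually made between texts of similar length.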

Lecture 5: Annotation and Statistics at the Word Level of Textual Discourse (Part 2)

1. Word Length Statistics and Distribution

2. High-frequency Word Statistics

3. Specified Vocabulary Retrieval

4. Named Entity Recognition in Chinese and English

5. Visualization of Word-level Data (Including Word Clouds)

Time: August 6, 2023, 19:00-21:00
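Several Lecture 5 topics (high-frequency words, word length distribution, and retrieval of a specified word) can be sketched with the standard library’s `collections.Counter` plus a simple keyword-in-context (KWIC) function. The whitespace tokenization and the `kwic` helper are hypothetical simplifications for illustration, not the course’s own code.

```python
from collections import Counter

tokens = "the cat sat on the mat and the dog sat too".split()

# High-frequency words: the two most common tokens with their counts.
print(Counter(tokens).most_common(2))  # [('the', 3), ('sat', 2)]

# Word length distribution: number of tokens of each length (in characters).
length_dist = Counter(len(t) for t in tokens)
print(sorted(length_dist.items()))  # [(2, 1), (3, 10)]

def kwic(tokens, target, window=2):
    """Return each hit of `target` with up to `window` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

for line in kwic(tokens, "sat"):
    print(line)
# the cat [sat] on the
# the dog [sat] too
```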

Lecture 6: Statistics and Calculations at the Phrase Level of Textual Discourse

1. Extraction of n-grams

2. NLTK’s ngrams() Function

3. spaCy’s noun_chunks Property

4. TextBlob’s ngrams() and noun_phrases

5. Calculation of Collocation Strength

6. Visualization of Phrase-level Data

Time: August 8, 2023, 19:00-21:00
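To illustrate Lecture 6, the sketch below extracts n-grams with `zip` (NLTK’s `nltk.ngrams(sequence, n)` yields the same kind of tuples) and scores collocation strength with a simplified pointwise mutual information (PMI), log2(c(w1 w2) · N / (c(w1) · c(w2))), where N is the token count. PMI is only one of several association measures (t-score and log-likelihood are others), so this is an assumption about the measure, not necessarily the one taught in the course; the toy sentence is invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return consecutive n-token tuples, e.g. bigrams for n=2."""
    return list(zip(*(tokens[i:] for i in range(n))))

def pmi(tokens, w1, w2):
    """Simplified pointwise mutual information for the bigram (w1, w2).

    Uses the token count N for both unigram and bigram probabilities:
    PMI = log2(c(w1 w2) * N / (c(w1) * c(w2))).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(ngrams(tokens, 2))
    c_xy = bigrams[(w1, w2)]
    if c_xy == 0:
        raise ValueError("bigram does not occur in the text")
    return math.log2(c_xy * len(tokens) / (unigrams[w1] * unigrams[w2]))

tokens = "strong tea is strong tea and powerful computers are powerful".split()
print(ngrams(tokens, 2)[:3])  # [('strong', 'tea'), ('tea', 'is'), ('is', 'strong')]
print(round(pmi(tokens, "strong", "tea"), 3))  # 2.322
```

PMI tends to overrate rare pairs, which is why a minimum frequency threshold is often applied before ranking collocations.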

Lecture 7: Statistics and Calculations at the Sentence Level of Textual Discourse

1. Counting the Number of Sentences

2. Calculating Sentence Lengths and Clause Lengths

3. Sentence Length Distribution Statistics

4. Extracting Specific Syntactic Structures

5. Visualization of Sentence-level Data

Time: August 9, 2023, 19:00-21:00
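The sentence-level counts in Lecture 7 can be approximated as follows: count sentences, measure each sentence’s length in word tokens, and tabulate the length distribution. This is a minimal standard-library sketch assuming simple English punctuation; the regex-based segmentation is a naive stand-in for the sentence splitters in libraries such as NLTK or spaCy, and clause-level measures are not covered.

```python
import re
from collections import Counter
from statistics import mean

text = "I came. You went. We saw the city. Then I left it quietly!"

# Naive segmentation: split after ., ! or ? followed by whitespace.
sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Sentence length measured in word tokens (letters and apostrophes).
lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]

print(len(sentences))                   # 4
print(lengths)                          # [2, 2, 4, 5]
print(mean(lengths))                    # 3.25
print(sorted(Counter(lengths).items()))  # [(2, 2), (4, 1), (5, 1)]
```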

Lecture 8: Statistics and Calculations at the Semantic Level of Textual Discourse

1. Semantic Role Labeling

2. Sentiment Analysis of Text

3. Text Clustering Analysis

4. Semantic Similarity

Time: August 10, 2023, 19:00-21:00
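Semantic similarity (Lecture 8) is usually computed with word or sentence embeddings; as a much simpler baseline, the sketch below computes cosine similarity between bag-of-words count vectors. This captures word overlap rather than meaning (synonyms such as “big” and “large” score zero), and the function name and whitespace tokenization are assumptions for illustration only.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over words shared by both texts.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(round(cosine_similarity("the cat sat", "the cat ran"), 3))  # 0.667
print(cosine_similarity("big house", "large home"))               # 0.0
```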

Lecture 9: Corpus Research Based on ChatGPT

1. Constructing Parallel Corpora

2. Data Augmentation Techniques

3. Text Analysis Based on ChatGPT

Time: August 12, 2023, 19:00-21:00


| Course Consultation |

Consultation Group 2

(If the consultation group QR code has expired, please add the course staff’s WeChat below.)


After purchasing, please be sure to add the staff’s WeChat to receive the link to the Q&A study group.


Source: Linguistics Empirical Thinking Forum

Meeting, Lecture, Training, and Competition Information Release

The Language Services WeChat public account helps publicize meetings, lectures, training sessions, and competitions. Organizers can contact Yuyanfuwu (WeChat ID: yuyanfuwu2023) to arrange publication (please include the note “Information Release” in your contact request).


