Introduction to NLP: Tips on Knowledge Graphs and Learning NLP

Recently, I have received many letters from readers. Most are just starting to explore knowledge graphs and natural language processing, and the unfamiliarity leaves them feeling insecure and a bit lost.

So I am taking this opportunity to discuss how to get started with knowledge graphs and natural language processing, drawing on my practical experience over the past few years and sharing some thoughts for everyone to consider.

1. Some Suggestions for Getting Started with Knowledge Graphs

Getting into a new discipline or technology has its own pathways and order. In today's information-rich environment, resources such as videos, public accounts, papers, and slide decks are readily available.

Knowledge graphs are currently a hot topic and are attracting more and more people from both academia and industry.

However, many fall into the odd loop of doing the work for its own sake, whether because of an assignment from an advisor or a directive from a supervisor.

So when we take on this work, we should start with the end in mind; this is the first point.

Working backward from the endpoint and learning something new with a purpose gives a better sense of direction.

Of course, the endpoint is something you have to work out yourself. It may not be easy to answer at first, and careful analysis is required.

For example, as a student, you should start your research from the assigned project. Each project is bound to address a scientific question, and the solution to that question is the endpoint: improving the performance of existing models, say, or creating knowledge graph data for a specific domain.

Similarly, for developers, product requirements are more straightforward: what technology the product needs and what strategies to use. Seeking solutions with these questions in mind is often more direct.

However, students in school may not be so lucky. Their topics may be made up out of thin air and may not reflect real needs, which naturally leads to confusion. This is where the second point comes in: leverage existing public projects to fill the gap.

For example, there are many competitions, such as CCKS, Kaggle, and Alibaba Tianchi, through which companies test or source existing SOTA solutions by way of evaluation, so they come with clearly defined problems. We can take advantage of this to explore real needs and abstract them into specific problems.

The benefit of these evaluations is that many pre-labeled datasets are available, so we do not need to annotate data ourselves. By analyzing the datasets, we can learn the labeling methods, the formal definition of the problem, and how a specific task is evaluated, forming our own closed loop.
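To make "how a specific task is evaluated" concrete, here is a minimal sketch of the precision, recall, and F1 computation that many extraction-style evaluations rely on. The span format `(start, end, entity_type)` and the sample annotations are assumptions made up for illustration; each real competition defines its own format and scoring script, so treat this as a rough picture rather than any competition's official metric.

```python
def prf1(gold, predicted):
    """Return (precision, recall, F1) for two collections of labeled items."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)  # items that match exactly
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (start, end, entity_type) spans in one sentence.
gold = [(0, 2, "PER"), (5, 7, "ORG"), (9, 10, "LOC")]
predicted = [(0, 2, "PER"), (5, 7, "LOC"), (9, 10, "LOC")]

precision, recall, f1 = prf1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the second prediction has the right span but the wrong type, so it counts as an error under exact matching. Reading a dataset's scoring rules at this level of detail is a quick way to understand what the task actually rewards.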

Of course, to take part in evaluation competitions we must know how to code, for example using a deep learning framework such as PyTorch, Keras, or TensorFlow to design loss functions and build models for training and evaluation. This leads to my third point: you must know how to program; programming is the primary productive force.
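As a small illustration of what "designing a loss function" involves, the sketch below writes softmax cross-entropy in plain Python. The scores and the three-label setup are hypothetical; in practice a framework supplies this loss (PyTorch, for instance, provides `torch.nn.CrossEntropyLoss`), and writing it by hand is only meant to show what such a call computes.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


# Hypothetical scores for three relation labels; label 0 is the correct one.
confident_loss = cross_entropy([4.0, 0.5, 0.1], 0)
wrong_loss = cross_entropy([0.1, 0.5, 4.0], 0)
print(confident_loss, wrong_loss)
```

The loss is small when the model puts most of its probability on the correct label and large when it does not, which is the signal training uses to adjust the model.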

Of course, one might say that we have GitHub and GitLab, where many baselines can be adapted easily. However, not all code can be used directly; it still needs adjusting, and sometimes you need to write your own. For those who need to innovate, the ability to build models is even more essential. Talk is cheap; show me the code.

Following this line of thought, we find that if we want to innovate and design models ourselves, we must know how to design and modify them.

This comes at a cost, and the cost is foundational knowledge: basic theories of natural language processing, foundational deep learning theories, and the basic directions for parameter tuning. Only after identifying feasible optimization methods can we validate them and reach the desired results. This leads to my fourth point: build a solid foundation.

If the foundation is weak, everything built on it will shake. The foundation covers an understanding of the objects being processed, basic matrix operations, gradient descent, and optimization theory, as well as a thorough understanding of models like BERT. The best way to reach this understanding is to read the source code and the original papers: the papers contain many foundational knowledge points, and the source code contains the specific implementation details. Reading both together yields better insight.
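For readers who have not yet worked through gradient descent by hand, a minimal sketch may help. It fits a straight line to toy data by repeatedly stepping against the gradient of the mean squared error. The function name `fit_line`, the data, the learning rate, and the step count are all assumptions chosen for illustration, not a recommended configuration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(w, b)
```

Deep learning frameworks automate the gradient computation, but the update rule inside their optimizers follows the same idea, which is why understanding this small case pays off when tuning larger models.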

Of course, this raises another issue: there is too much foundational content to learn (machine learning, deep learning, natural language processing, and so on), which can be overwhelming. So I want to emphasize the fifth point: be targeted, prioritize, and master the key areas.

Everyone's time and energy are limited, and knowledge graphs are themselves a vast domain that can be divided into subfields such as entity extraction, entity relationship extraction, knowledge representation, entity alignment, and knowledge reasoning.

At this point, we must choose a specific direction. Take a brief look at the field first, then choose a point that interests and suits you, and, based on your publication needs, look for hot topics or pressing foundational problems to get the best results.

To sum up: build a solid foundation, combine theory with practice, be targeted, and start with the end in mind.

2. Some Reference Materials for Getting Started with Knowledge Graphs

Of course, there are many excellent reference materials for getting started with knowledge graphs.

In terms of books, many are currently available, but even those running to hundreds of pages are best regarded as popular science; there is not yet a well-written practical handbook.

For courses, you can check Bilibili, where there are many freely available videos of varying quality; choose what you need.

