
Introduction

This article showcases how large language models (LLMs) are applied in real work, through the personal experience of Nicholas Carlini, and discusses how AI technology is changing the way we work and enhancing productivity.
Table of Contents:
1. Nuances
2. My Background
3. How to Utilize Language Models
4. Evaluating the Capabilities of LLMs, Not Their Limitations
5. Conclusion
In today’s tech world, the debate over whether artificial intelligence is overhyped has never ceased. Few experts, however, offer the kind of first-hand perspective that Nicholas Carlini, a security researcher and machine learning scientist at Google DeepMind, provides. Through his article, we see the power and versatility of large language models (LLMs) in practical applications. These are not hollow marketing claims, but tools that can genuinely change how we work, enhance productivity, and spark creativity.

Carlini lists 50 instances of how he personally uses AI, covering aspects like work efficiency, creative writing, and programming assistance. However, surprisingly, these 50 examples are just the tip of the iceberg of his AI applications—he estimates they represent less than 2% of his actual usage. This transformation is astounding, reflecting not only the depth and breadth of AI technology but also highlighting the potential of AI that we might not have fully recognized yet.
Indeed, Carlini’s experience may herald a broader trend: as AI technology continues to advance and proliferate, we may be standing at the forefront of a technological revolution. Just as personal computers and the internet fundamentally changed our lifestyles and work methods, AI may become the next key force driving social change.
So, in the face of such prospects, how should we view the development of AI technology? With caution, or should we embrace the change?
The original text is as follows. Author: Nicholas Carlini, security researcher and machine learning scientist at DeepMind.

Over the past year, I have used large language models for tasks such as:
- Building an entire web application using technologies I had never used before.
- Teaching me how to use various frameworks I had never touched.
- Converting dozens of programs to C or Rust for a 10-100x performance increase.
- Simplifying large codebases, significantly streamlining projects.
- Writing the initial experimental code for almost every research paper I authored in the past year.
- Automating almost every monotonous task and one-off script.
- Almost completely replacing the web searches I used to rely on when setting up and configuring new packages or projects.
- Replacing about 50% of the web searches I used to run when debugging error messages.
Nuances
My Background
Typically, I am not the kind of person who believes things easily. For instance, despite living through the cryptocurrency craze in the information security field a decade ago, I never wrote a single research paper on blockchain. I have never owned any Bitcoin, because I believe it has no real value beyond gambling and fraud. I have always been a skeptic: whenever someone claims that “a new technology will change the world,” my response has been a shrug.
Therefore, when someone first told me that AI would greatly enhance my work efficiency and change my daily work methods, I was equally reserved, responding, “I will believe it when I see the actual results.”
Additionally, I am a security researcher. In nearly a decade of work, I have focused on demonstrating how AI models can completely fail in environments they have never been trained on. I have shown that even a slight perturbation to the input of machine learning models can lead to completely erroneous outputs; or that these models often merely memorize specific cases from their training data and simply repeat them in practical applications. I am acutely aware of the limitations of these systems.
However, now I am here to say that I believe current large language models are the greatest enhancement to my productivity since the advent of the internet. Frankly, if I had to choose between using the internet and a cutting-edge language model today to solve a random programming task, I would likely choose the language model more than half the time.
How to Utilize Language Models
1. Developing Complete Applications for Me
I used to be able to keep up with various emerging frameworks. However, one person’s time is limited, and due to my job nature, I spend most of my time keeping up with the latest research developments rather than the latest JavaScript frameworks.
This means that when I start a new project outside my research area, I typically have two choices: either use the technologies I already know, which may be outdated by ten or twenty years but are usually sufficient for small projects, or try to learn new (often better) methods.
This is where language models come into play. A framework or tool like Docker, Flexbox, or React may be new to me, but it is not new to everyone: thousands of people in the world know these technologies inside and out. And that means current language models likely know them too.
This means I can learn any knowledge needed to solve tasks through interactive learning with language models without relying on static tutorials that assume the reader has specific knowledge and clear objectives.
For instance, earlier this year, when building an LLM evaluation framework, I wanted to run code generated by LLMs in a constrained environment to avoid it deleting random files on my computer and other issues. Docker was the ideal tool for this task, but I had never used it before.
Importantly, the goal of this project was not to learn how to use Docker; Docker was just a tool I needed to achieve my goal. I only needed to grasp 10% of Docker to ensure I could use it safely in the most basic manner.
If it were the 90s, I might have needed to buy a book on Docker and learn from scratch, reading the first few chapters and then trying to skim through to find out how to implement what I wanted. Over the past decade, things have improved; I might have searched online for tutorials on how to use Docker, followed the operations, and then searched for error messages I encountered to see if anyone had faced the same issues.
But today, I only need to ask a language model to teach me how to use Docker.
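To give a sense of what such an answer looks like in practice, here is a minimal sketch of running untrusted, LLM-generated code inside a locked-down container. The image, flags, and paths are illustrative assumptions, not the exact setup from the evaluation framework described above:

```python
import subprocess

def run_untrusted(script_path: str) -> str:
    """Run an LLM-generated Python script in a constrained Docker container."""
    result = subprocess.run(
        [
            "docker", "run",
            "--rm",               # remove the container when it exits
            "--network", "none",  # no network access
            "--memory", "256m",   # cap memory usage
            "--cpus", "1",        # cap CPU usage
            "-v", f"{script_path}:/sandbox/script.py:ro",  # mount read-only
            "python:3.11-slim",
            "python", "/sandbox/script.py",
        ],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout

print(run_untrusted("/tmp/generated.py"))
```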
4. As an API Reference and Search Engine
Real programmers read the reference manuals when they want to understand how a tool works. But I’m a lazy programmer; I prefer to get the answer directly. So now I ask language models.
When I show these examples to people, some become somewhat defensive and say, “LLMs haven’t done anything you couldn’t accomplish with existing tools!” You know what? They’re right. But what can be done with a search engine can also be done with a physical book on the subject; what can be done with a physical book can also be done by reading the source code.
However, each method is simpler than the previous one. When things become simpler, you will do them more frequently, and the way you approach them will also change.
That’s how I can ask, “Which of the $ variables in a shell script holds all the remaining arguments?” and get the answer, followed by another question: “How do I use this thing?” This is actually one of the ways I most commonly use LLMs. The reason I can’t show you more such examples is that I have tools for querying LLMs built into both Emacs and my shell, so 90% of the time I don’t even need to leave my editor.
Searching for content on the internet used to be a skill that needed to be learned. What specific vocabulary do you want to include in your query? Should they be plural or singular? Past tense? What words do you want to avoid appearing on the page? Am I looking for X and Y, or X or Y?
Now, things are different. I can’t remember the last time I used OR in Google. I also can’t remember the last time I used a minus (-) to remove a subset of results. In most cases, today you just need to write down what you want to find, and the search engine will find it for you.
But search engines still aren’t 100% natural-language queries. It’s still a bit like playing a game of reverse Jeopardy, trying to use the keywords that would appear in the answer rather than in the question. This is a skill I think we’ve all forgotten we ever had to learn.
For some simple tasks today (and over time, more and more of them), language models are simply better. I can type, “So I know + corresponds to __add__, but what is ~?” and it will tell me that the answer is __invert__. This is something that is difficult to search for with a standard search engine. Yes, I know there are ways to phrase the query so that I could find the answer. Maybe if I typed “python documentation metaclass ‘add’” I could search the page and get the answer. But you know what else works? Just asking the LLM your question.
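As a quick illustration of what that answer means (my example, not one from the original conversations): Python dispatches operators to double-underscore methods, so + calls __add__ and the unary ~ calls __invert__:

```python
class BitFlags:
    """Tiny wrapper showing how Python operators map to dunder methods."""

    def __init__(self, value: int):
        self.value = value

    def __add__(self, other):   # invoked by: a + b
        return BitFlags(self.value | other.value)

    def __invert__(self):       # invoked by: ~a
        return BitFlags(~self.value & 0xFF)

    def __repr__(self):
        return f"BitFlags({self.value:#04x})"

a, b = BitFlags(0x0F), BitFlags(0x30)
print(a + b)  # BitFlags(0x3f) -- calls __add__
print(~a)     # BitFlags(0xf0) -- calls __invert__
```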
Doing this saves a few seconds each time, but when you’re in the midst of solving a coding task, trying to remember a million things at once, being able to spill out the problem you’re trying to solve and get a coherent answer is amazing.
This is not to say that they are perfect in this regard today. Language models only know things when they have been repeated online frequently enough. What “frequently enough” means depends on the model, so I do need to spend some effort thinking about whether I should ask the model or the internet. But the models will only get better.
Or, whenever I hit a random crash, I dump whatever I’m seeing into the model and ask for an explanation, as I did when I simply typed “zsh no matching remote wildcard transfer problem.”

As a completely separate example: last year, while writing a blog post, I wanted the first letter of a paragraph enlarged, with the rest of the text wrapping around it. This effect is called a drop cap, but I didn’t know that; I only knew the effect I wanted. So I asked the language model, “I want it to look like a fancy book, with the text wrapping around the O,” and it gave me exactly what I wanted. This task is another one in the “I only did this because of LLMs” category: I wouldn’t have considered it worth the time to figure out on my own. But because I could just ask the model, I did it, and it made my post look a little better.
There are two types of programs. First, you have some programs you want to get right; they will exist for a while because you need to maintain them for years, so code clarity is important. Then you have those programs that exist for only about 25 seconds; they help you complete certain tasks and are then immediately discarded.
In these cases, I don’t care about the quality of the code at all, and the programs are completely independent, so I now almost exclusively use LLMs to write these programs for me.
Note: with most of these, you could look at them and say, “Is that it?” But as I mentioned earlier, I only have so many hours a day to work on a project. If I can save time and effort on a one-off program, I will.
Perhaps the most common example is asking for help generating charts to visualize data produced by my research experiments. I have dozens of examples like this; probably closer to a hundred than to zero. They all look basically the same.

Another similar case: I have data in one format and want it converted to another. Usually this is something I only need to do once, and once it’s done, I discard the generated script.

I could give you a thousand other examples. Generally, when I have a sufficiently simple script in mind, I just ask the LLM to write the whole thing. For example, I asked the LLM to write a script that reads my papers aloud so I can make sure they don’t have any silly grammatical issues. In many cases, when I’m not quite sure what I want, I also start by asking the model for some initial code and then iterate from there.

Here’s a one-off task where I just needed to quickly process some data. In 2022, I would have spent two minutes writing it in Python and then waited hours for it to run; since it only runs once, optimizing it would have taken longer than the program’s runtime. But now? You can bet I’d spend the same two minutes requesting Rust code to process the data for me.

Or here’s another example, where I asked the model to download a dataset and do some initial processing on it. Is this easy for me to do myself? Perhaps. But it’s not the task I want to think about; what I want to think about is the research I’ll do with the dataset. Eliminating distractions is incredibly valuable, beyond the few minutes it saves.
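Here is a minimal sketch of the kind of one-off plotting script meant above; the file name and column layout are made-up assumptions, since the original conversations are not reproduced here:

```python
# One-off script: plot accuracy vs. training step from a results CSV.
# Assumes a file "results.csv" with header "step,accuracy" (hypothetical).
import csv

import matplotlib.pyplot as plt

steps, accuracies = [], []
with open("results.csv") as f:
    for row in csv.DictReader(f):
        steps.append(int(row["step"]))
        accuracies.append(float(row["accuracy"]))

plt.plot(steps, accuracies, marker="o")
plt.xlabel("Training step")
plt.ylabel("Accuracy")
plt.title("Experiment results")
plt.savefig("results.png", dpi=150)
```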
Another time, I wrote a program to 3D print pixelated images out of small cubes. For that, I needed to convert PNG files to STL files; but that wasn’t the point of the project, just something that had to happen along the way. So I asked the LLM to solve it for me.

As another example, I recently wanted to set up a new project using Docker Compose. I ran into some issues and just wanted it running; I would figure out what had gone wrong afterwards. So I went back and forth, pasting one error message after another, until it finally gave me a working solution.

I also often find myself requesting a complete solution first and then asking for hints on how to modify it. In one conversation, I first requested a program to parse HTML, then asked for API references and hints on how to improve it.

Recently, I’ve been doing some electronics work. I had a C program running on an Arduino, but I wanted it to run on a Raspberry Pi Pico using MicroPython. The conversion process isn’t interesting; it just has to be done. So I didn’t do the work myself; I asked the language model to do it for me.

For yet another project, I needed to classify images in an interactive loop using some fancy ML model. I could have written it myself, or I could just ask the model to do it for me.
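The HTML-parsing conversation is not reproduced here, but a first-cut script of the kind described might look like the following; the target URL and the choice of requests plus BeautifulSoup are illustrative assumptions:

```python
# First draft of an HTML scraper: fetch a page and list its links.
# The URL is a placeholder; swap in whatever page you actually care about.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com", timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
for link in soup.find_all("a", href=True):
    print(link["href"], "->", link.get_text(strip=True))
```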
10. Explaining Things to Me
I recently started developing an interest in electronics. I did some electronic projects when I was younger and took a few related courses in college. But now that I want to engage in practical electronics projects, I find there are many details I don’t understand, making it difficult to start any project.
I could read a book on practical electronics. I might actually do this at some point to thoroughly understand the subject. But I don’t want to spend my time feeling like I’m learning. Part of the reason I engage in electronics is to step away from reading and writing all day.
This is where LLMs excel. They may not be as knowledgeable as the world’s best experts, but there are likely thousands of people who might know the answers to any electronics question I might ask. This means that language models are likely to know the answers too. They are happy to provide me with answers to all my questions, allowing me to enjoy the fun without getting bogged down in details. While I could certainly find answers by searching online, the convenience of simply having the model do this work for me after a busy day feels very relaxing.
Here are some examples showing how I ask language models about how things work in electronics. Are these answers perfect? Who knows? But you know what they’re better than? Being completely clueless.
11. Solving Tasks with Known Solutions
Almost everything has been done by someone before. The things you want to do are rarely truly novel. Language models are particularly good at providing solutions to things they have seen before.
In a recent project, I needed to enhance the performance of some Python code. So, I (1) requested the LLM to rewrite it in C, and then (2) asked it to build an interface so that I could call the C code from Python.
These tasks are not “difficult.” Converting Python to C is something I am confident I could complete in an hour or two. Although I do not fully understand how the Python to C API works, I believe I could figure it out by reading the documentation. But if it were up to me, I would never do it. It’s not part of the critical path; I would rather let the computer solve the problem than spend time speeding up tasks I don’t frequently need to run.
However, for simple programs, converting Python to C is a mostly mechanical process, and there is a standard Python-to-C calling convention. So I simply asked the LLM to do it for me.
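To make the shape of that interface concrete, here is a minimal sketch of calling C from Python with the standard library’s ctypes module. The function name and build command are illustrative assumptions, not the code from the project described above:

```python
# Suppose the LLM produced this C function, saved as sum.c:
#
#     double sum_array(const double *xs, size_t n) {
#         double total = 0.0;
#         for (size_t i = 0; i < n; i++) total += xs[i];
#         return total;
#     }
#
# compiled with:  gcc -O2 -shared -fPIC -o libsum.so sum.c
import ctypes

lib = ctypes.CDLL("./libsum.so")
lib.sum_array.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.sum_array.restype = ctypes.c_double

data = [1.5, 2.5, 3.0]
c_array = (ctypes.c_double * len(data))(*data)  # pack the list into a C array
print(lib.sum_array(c_array, len(data)))        # 7.0
```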
Since then, I have come to expect this to be something I can do: almost any time I need some high-speed code, I describe what I want in Python and request optimized C. Other times I do the same thing but request Rust output instead, when I think checking the Rust output for correctness will be easier than checking C.

Or, as another example: parallelizing a Python function with the multiprocessing library is not difficult. You write some boilerplate, and it basically does the work for you. But the boilerplate is a bit painful to write, and it gets in the way of the actual work you want to accomplish. Now, whenever I need this, I just ask an LLM (a minimal sketch follows below).

There are also many times when, while testing an API, I start by writing a curl request. Once it works, I want to repeat the task programmatically, so I convert it to Python. In the past, I would have done something ugly, like calling os.popen() to run the curl command directly, but that’s not ideal. The better way is to convert it to Python’s requests library; that takes time, though, so I wouldn’t bother. Now I can simply ask an LLM to do it for me and get a cleaner program in less time.

For an upcoming project, I asked what people commonly use as simple radio transmitters. Because what I really want is the answer most people would give, LLMs are a perfect fit!
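Here is a minimal sketch of the multiprocessing boilerplate mentioned above; the work function is a made-up stand-in for whatever per-item computation you actually need:

```python
# Parallelize a per-item function across all CPU cores.
from multiprocessing import Pool

def process(item: int) -> int:
    # Stand-in for real per-item work (parsing, scoring, converting, ...).
    return item * item

if __name__ == "__main__":  # required on platforms that spawn workers
    with Pool() as pool:
        results = pool.map(process, range(100))
    print(results[:5])  # [0, 1, 4, 9, 16]
```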
Before 2022, when I encountered error messages from some popular tools or libraries, I generally followed these steps:
1. Copy the error message.
2. Paste it into Google search.
3. Click the top Stack Overflow link.
4. Confirm whether the problem matches what I encountered; if not, return to step 2.
5. If that doesn’t work, return to step 2, change the search terms, pray, and so on.
Now, what is this process like in 2024?
1. Copy the error message.
2. Ask the LLM: “How do I fix this error? (error)”
3. If that doesn’t work, reply: “That didn’t work.”
I don’t have any conversation records to show these examples. (Or rather, I searched for an hour and couldn’t find any.) But there’s a very good reason for this: I have integrated it into my workflow.
I am an Emacs user. I have set up my environment so that whenever I run a program and it exits with a non-zero status code (indicating an error occurred), it automatically invokes the current fastest LLM, asks it to explain the error, and simultaneously requests a patch that can be applied directly to fix the bug.
Most of the time, today’s models are not yet good enough to outperform me in this task, but they are gradually improving. Occasionally, when an LLM fixes a bug that I know would be very difficult to trace if I were to do it myself, especially when the error is simply due to a small typo, I am pleasantly surprised.
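The Emacs integration itself is not reproduced here, but a rough sketch of the underlying idea, written as a standalone wrapper, might look like the following. The query_llm function is a hypothetical placeholder for whatever LLM API you have available:

```python
# Run a command; on a non-zero exit code, send the command and its output
# to an LLM and ask for an explanation plus a patch.
import subprocess
import sys

def query_llm(prompt: str) -> str:
    # Hypothetical placeholder: call your LLM API of choice here.
    raise NotImplementedError

def run_with_llm_debugging(argv: list[str]) -> int:
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:  # the program failed
        prompt = (
            f"This command failed: {' '.join(argv)}\n"
            f"Output:\n{proc.stdout}\n{proc.stderr}\n"
            "Explain the error and propose a patch that fixes it."
        )
        print(query_llm(prompt))
    return proc.returncode

if __name__ == "__main__":
    sys.exit(run_with_llm_debugging(sys.argv[1:]))
```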
13. And Countless Other Things
All the conversations I mentioned above account for less than 2% of my total interactions with LLMs over the past year. The reason I haven’t provided links to other examples is not that these are cases of model failure (although there are many such cases), but because: (1) many interactions repeat the patterns I have already mentioned, or (2) they are not easy to explain clearly what happened and why it was useful to me.
I fully expect the frequency with which I use these models to continue increasing. As a reference, my LLM queries through the web interface in 2024 increased by 30% compared to 2023—and I can’t even quantify the increase in API queries, but I guess it has at least doubled or tripled.
Evaluating the Capabilities of LLMs, Not Their Limitations
I see little value in dwelling on the fact that current models still cannot do things like:

- … counting the number of words in a sentence!
- … writing a poem where every word starts with the letter “a”
- … multiplying two-digit numbers
- … randomly selecting an element from a list
Conclusion
The intention behind writing this article is twofold. First, as I mentioned at the beginning of the article, I want to demonstrate that LLMs have provided me with significant value. Additionally, I’ve noticed that many people express interest in using LLMs but do not know how they can help themselves. Therefore, if you are one of those people, I hope to provide some examples through my use cases.
Because at least for me, LLMs can do many things. They cannot do everything, nor can they do most things. But the current models, as they exist now, provide me with considerable value.
After showcasing these examples, a common rebuttal I receive is: “But these tasks are simple! Any computer science undergraduate could learn to do them!” And it’s true. An undergraduate, with a few hours of searching, could tell me how to correctly diagnose a CUDA error and which packages to reinstall. An undergraduate, with a few hours of work, could rewrite that program in C. An undergraduate, with a few hours of studying the relevant textbooks, could teach me whatever I wanted to know about a subject. Unfortunately, I don’t have a magical undergraduate on call to answer my every question. But I do have a language model. So yes: language models are not yet good enough to solve the interesting parts of my job as a programmer, and the current models can only solve the easy tasks.
But five years ago, the best LLMs could do was write paragraphs that looked like English. When they could form coherent thoughts from one sentence to the next, we were all amazed. Their practical utility was almost zero. However, today, they have increased my productivity in programming projects by at least 50% and eliminated enough tedious work to allow me to create things I otherwise would never have attempted.
Therefore, when people say, “LLMs are just hype” and “all LLMs have provided no real value to anyone,” it is clear they are mistaken because they have provided me with value. Now, perhaps I am an exception. Maybe I am the only one who has found ways to make these models useful.
I can only speak for myself.
