Code Generation and Development Support
Introduction/overview
Key ideas
- LLMs benefit software development by suggesting code snippets based on the surrounding programming context, improving development speed and efficiency.
- LLMs can generate entire functions or modules. However, the output always requires human oversight for verification.
- LLMs can also translate a codebase from one language to another, helping developers who want to switch languages for performance or other benefits.
- Beyond coding, LLMs can create and maintain relevant documentation and answer specific queries about the codebase so developers understand different parts of the codebase.
The previous module discussed how LLMs function, learn, and generate new text. Beyond producing responses that range from domain-specific answers to creative writing, LLMs can also generate code and assist in software development.
Aside from the most prominent example, code generation, LLMs can also ingest information such as an entire codebase and suggest documentation or more optimal implementations. You can also add these systems to continuous integration and deployment (CI/CD) pipelines to speed up your software development workflow.
This lesson will teach you how to use LLMs for code generation and development support. You will see how LLMs can reduce implementation time by helping you avoid bugs and spend less time sifting through documentation.
Types of LLM code generation tasks for software development
One of the most immediate examples of using LLMs is code generation. These models can automate and enhance various aspects of coding, from providing real-time assistance to transforming entire codebases. In this section, we explore these aspects in detail.
Code completion to improve developer productivity
Code completion, or autocompletion, is a staple feature of many integrated development environments (IDEs) that LLMs now power to help programmers write code. It anticipates the next lines of code and presents the developer with context-aware suggestions.
This functionality guides developers through variable instantiation, function and class definitions, and the use of libraries or APIs, making coding sessions faster and less error-prone. It lets developers focus more on the big picture and less on getting each line exactly right. Think of it as having an expert friend looking over your shoulder.
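To make the idea concrete, here is a sketch of what a completion might look like. The developer writes the signature and docstring; the body below is a hand-written stand-in for the kind of suggestion an assistant might produce, not output from any specific tool.

```python
# The developer types the signature and docstring; an LLM-powered
# completion engine proposes the body. The body here is a hand-written
# stand-in for a typical suggestion.

def rolling_average(values: list[float], window: int) -> list[float]:
    """Return the moving average over a sliding window."""
    # --- everything below is the suggested completion ---
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

The developer's job shifts from typing the loop out to quickly judging whether the suggested body matches the intent of the docstring.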
Code generation to automate complex coding tasks
LLMs elevate the coding experience by generating entire code segments, functions, or modules based on descriptive prompts. The underlying concept of code generation is that instead of figuring out and writing all lines of code manually, you provide the LLM with a description or objective of what you need, and it generates the corresponding code.
It's important to remember that while code generation can be tremendously helpful, it also requires human oversight. A developer should always review generated code to ensure it accomplishes the desired task optimally and to check for potential errors and logical inconsistencies.
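As an illustration, consider a short descriptive prompt and a plausible response. The prompt wording and the generated body below are hand-written stand-ins for typical model input and output:

```python
# Prompt given to the model (illustrative):
#   "Write a Python function that removes duplicates from a list
#    while preserving the original order."
#
# A plausible generated response (hand-written stand-in):

def dedupe(items: list) -> list:
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # keep only the first occurrence
            seen.add(item)
            result.append(item)
    return result
```

Even for code this simple, review matters: the function assumes every element is hashable, so a list of lists would raise a `TypeError`. Catching that kind of edge case is exactly what human oversight is for.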
Code translation to bridge programming language barriers
Code translation refers to using LLMs to convert a codebase written in one programming language into another. It's like having an interpreter for different coding languages. LLMs are invaluable for projects that require transforming specific modules, or an entire legacy codebase, into a modern, efficient programming language.
Developers can also leverage LLMs to convert code from more accessible development languages like Python to production-ready languages like C++. For example, after prototyping a machine learning model in Python, you could use an LLM to translate the model's inference function into C++ and swap in the C++ equivalents of any acceleration libraries.
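In practice, translation often starts with a carefully constructed prompt. The sketch below builds such a prompt in Python; the helper name, prompt wording, and `predict` function are illustrative assumptions, not any tool's actual API.

```python
# Illustrative sketch: assemble a code-translation prompt for an LLM.
# `build_translation_prompt` and the sample `predict` function are
# invented for this example.

def build_translation_prompt(source_code: str,
                             source_lang: str,
                             target_lang: str) -> str:
    """Wrap source code in instructions asking for a translation."""
    return (
        f"Translate the following {source_lang} function into idiomatic "
        f"{target_lang}. Preserve its behavior exactly and use "
        f"{target_lang} equivalents of any libraries it depends on.\n\n"
        f"```{source_lang.lower()}\n{source_code}\n```"
    )

python_inference = """\
def predict(model, features):
    return model.forward(features)
"""

prompt = build_translation_prompt(python_inference, "Python", "C++")
```

You would then send `prompt` to the model of your choice and review the returned C++ for correctness, just as with generated code.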
Building and maintaining documentation from codebases
Documentation is a crucial aspect of software development that helps developers navigate codebases. LLMs with large context windows, such as Google's Gemini 1.5 Pro, can craft relevant, easily understandable documentation from entire codebases and keep it updated over time.
Keeping API documentation up to date directly benefits the software development team by reducing both development time and errors. As new software releases become publicly available, you can immediately update the documentation to reflect the changes.
Many platforms can ingest entire codebases to create documentation, and they take it a step further with conversational interfaces: you can prompt them to answer questions with references to the documentation, and connect them to communication platforms like Slack.
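As a small-scale illustration, here is the kind of docstring a documentation tool might add to an undocumented function. The docstring text is a hand-written stand-in for generated output, not the output of any specific platform:

```python
# Before: the developer wrote only the body.
#   def apply_discount(price, rate):
#       return round(price * (1 - rate), 2)
#
# After: an LLM documentation pass adds a docstring (stand-in text).

def apply_discount(price: float, rate: float) -> float:
    """Apply a percentage discount to a price.

    Args:
        price: Original price; expected to be non-negative.
        rate: Discount rate as a fraction, e.g. 0.15 for 15%.

    Returns:
        The discounted price, rounded to 2 decimal places.
    """
    return round(price * (1 - rate), 2)
```

Scaled up across a codebase, the same idea yields reference pages that stay in sync with the code they describe.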
Unit test generation for automating testing efficiency
Unit tests are pivotal for ensuring code reliability, but creating them can be cumbersome. Many developers view writing unit tests as tedious despite their importance in verifying that functions behave correctly. This necessary chore is where LLMs can provide an automated and efficient solution.
LLMs are already helpful for developing varied test cases that cover multiple scenarios. This reduces manual effort, minimizes oversights, and increases the thoroughness of the testing process. You can also use LLMs to write and implement more complex testing strategies, such as monkey patching or mocking.
Consider a function that calls an external service: monkey patching lets you replace that dependency with a controlled stand-in during testing.
You could feed the function requirements and the monkey patching rules into an LLM. The LLM could then generate test scenarios and unit tests that validate your function against various possible responses from the external service.
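Here is a minimal sketch of what such a generated test could look like. `fetch_exchange_rate` and `price_in` are hypothetical functions invented for this example; the test swaps the external dependency for a stub, which is the essence of monkey patching.

```python
# Hypothetical production code: `price_in` depends on an external service.

def fetch_exchange_rate(currency: str) -> float:
    """Stand-in for an external call; a real version would hit a network API."""
    raise RuntimeError("no network access during tests")

def price_in(currency: str, usd_amount: float) -> float:
    rate = fetch_exchange_rate(currency)
    return round(usd_amount * rate, 2)

# The kind of unit test an LLM might generate from the requirements.
def test_price_in_with_stubbed_rate():
    # Monkey patch: temporarily replace the external dependency with a
    # stub returning a fixed rate, then restore the original afterwards.
    global fetch_exchange_rate
    original = fetch_exchange_rate
    fetch_exchange_rate = lambda currency: 0.9
    try:
        assert price_in("EUR", 10.0) == 9.0
    finally:
        fetch_exchange_rate = original

test_price_in_with_stubbed_rate()
```

In a real project you would reach for `pytest`'s `monkeypatch` fixture or `unittest.mock.patch`, which handle the save-and-restore automatically; the manual swap above just makes the mechanism explicit.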
Current tools
Several tools have emerged that incorporate LLM technology to enhance the entire software development process:
Coding
- ChatGPT: The most popular choice. Although ChatGPT is primarily known for its conversational capabilities and is not explicitly advertised for coding and development, it can help write code and debug.
- GitHub Copilot: Another popular choice that acts like a coding partner, offering context-aware suggestions ranging from single lines to entire blocks of code.
- Code Llama: A state-of-the-art family of LLMs from Meta AI, built on Llama 2, that can generate code from text prompts. The models provide stable generations with up to 100,000 tokens of context.
- Tabnine: Another coding partner that offers code completion, generation, and error handling.
- Amazon CodeWhisperer: In addition to code suggestions, Amazon CodeWhisperer provides issue identification and resolution recommendations to elevate code quality and maintainability.
- StarCoder 2: An open-source LLM that BigCode and NVIDIA developed for effective source code modeling and generation, capable of producing entire code modules.
- Diffblue: Tailored for Java developers, Diffblue automates unit test creation to reduce manual unit testing effort and improve code reliability.
Code review
- Sourcery: This tool analyzes code to offer refactoring and optimization suggestions, emphasizing code quality and security enhancements.
- What The Diff: Analyzes code changes and generates summaries in plain English.
Documentation tools
- Kapa AI: A chatbot that links to your GitHub repository to answer questions regarding bugs, issues, or specific code implementations.
- Multimodal: As a documentation generator, Multimodal leverages AI to produce comprehensive and up-to-date documentation to support developer understanding and project onboarding.
While LLMs provide immense value in augmenting the capabilities of software developers, it's vital to note they're not perfect. They may occasionally "hallucinate," generating outputs that don't accurately align with the given input. As such, you should treat LLMs as valuable tools that enhance developer efficiency rather than as complete solutions for development tasks.
In Lesson 9, you will learn how to mitigate these hallucinations using the retrieval-augmented generation (RAG) approach.
Additionally, the potential applications of these powerful tools stretch far beyond the immediate coding and development stages. The array of tools we have explored shows that you can integrate LLMs throughout the development lifecycle.
Before incorporating these tools into your workflow, investigate how each platform safeguards your data and prevents accidental intellectual property leaks. You want to maintain the integrity of your work while taking full advantage of the efficiencies that LLMs bring. In module 6, you will learn more about LLM security.