# Source files

# Overview

A Carbon source file is a sequence of Unicode code points in Unicode Normalization Form C ("NFC"), and represents a portion of the complete text of a program.

Program text can come from a variety of sources, such as an interactive programming environment (a so-called "Read-Evaluate-Print-Loop" or REPL), a database, a memory buffer of an IDE, or a command-line argument.

The canonical representation for Carbon programs is in files stored as a sequence of bytes in a file system on disk. Such files have a `.carbon` extension.

# Encoding

The on-disk representation of a Carbon source file is encoded in UTF-8. Such files may begin with a UTF-8 BOM, that is, the byte sequence `EF BB BF` (hexadecimal). This prefix, if present, is ignored.
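As an illustration of the rule above, the following Python sketch decodes the raw bytes of a source file, dropping a leading BOM if one is present. The function name `decode_source` is hypothetical and is not part of any Carbon tooling; rejecting invalid UTF-8 with an exception is an assumption about how a reader might behave, not something this document specifies.

```python
# The UTF-8 encoding of U+FEFF, used as a byte order mark.
UTF8_BOM = b"\xef\xbb\xbf"


def decode_source(data: bytes) -> str:
    """Decode on-disk bytes into program text (hypothetical helper)."""
    # An optional BOM prefix is ignored; only one leading occurrence is dropped.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Strict decoding: malformed UTF-8 raises UnicodeDecodeError (an assumption).
    return data.decode("utf-8")
```

With this sketch, `decode_source(b"\xef\xbb\xbffn Main() {}")` and `decode_source(b"fn Main() {}")` yield the same text.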

No Unicode normalization is performed when reading an on-disk representation of a Carbon source file, so the byte representation is required to be normalized in Normalization Form C. The Carbon source formatting tool will convert source files to NFC as necessary.
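The difference between checking and converting can be sketched in Python with the standard `unicodedata` module. The helper names `is_nfc` and `to_nfc` are hypothetical: the first stands in for the check a reader of source files would need, the second for the conversion a formatting tool might apply. Neither reflects the actual Carbon implementation.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Report whether text is already in Normalization Form C."""
    # unicodedata.is_normalized is available in Python 3.8 and later.
    return unicodedata.is_normalized("NFC", text)


def to_nfc(text: str) -> str:
    """Convert text to NFC, as a formatting tool might (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# "e" followed by U+0301 COMBINING ACUTE ACCENT is not in NFC;
# its NFC form is the single code point U+00E9.
decomposed = "e\u0301"
composed = to_nfc(decomposed)
```

Here `is_nfc(decomposed)` is false while `is_nfc(composed)` is true, so a file containing the decomposed spelling would need to be converted before it is a valid source file.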
