Getting started

Discover GTFS. Develop a basic understanding of the different entities, key concepts, and how to produce a GTFS dataset.

GTFS

The General Transit Feed Specification (GTFS) is an open data standard that allows transit agencies to produce data describing their transit service in a format that can be commonly understood and consumed by a variety of rider-facing software applications. Today GTFS is used by thousands of transit agencies worldwide, and is continually improving the rider experience.

GTFS is split into two components: GTFS Schedule and GTFS Realtime.

To get started, let’s look at GTFS Schedule, the basis of GTFS.

GTFS Schedule

GTFS Schedule can be used to describe a transit system (i.e., agency, stops, routes, and trips) and the service schedules associated with it (i.e., operating days of a service, stop times, and frequency of service). Supplementary information, such as the path taken by a vehicle, transfers, fares, text translations, and navigation for in-station pathways, can also be described.

Below is an introduction to GTFS Schedule. For the complete specification, consult the GTFS Schedule reference.

Basics of a transit system

Before building a GTFS Schedule dataset, recall the basic elements that compose a transit system.

A typical transit system is operated by an agency, and has one or more routes with defined stops located along them. Each route is serviced by one or more trips per day, which are scheduled to stop at the defined stops at certain times on certain days.

The basic elements of a typical transit system can be understood intuitively by reading one of its timetables. See an example below from TriMet (Portland, Oregon, USA).

GTFS Schedule data concepts

Rider-facing software applications leverage GTFS Schedule as the language to describe the basic elements of a transit system to a computer in terms of data, and then to the rider as meaningful information for trip planning. GTFS Schedule accomplishes this by relying on some key data concepts.

Files

A GTFS Schedule dataset is composed of at least 6 comma-separated values (CSV) text files, each with a .txt extension. Each file contains a table whose header row, specified by GTFS Schedule, names the kind of data that belongs in each column. Each subsequent row describes a unique GTFS Schedule data entry.
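As a minimal sketch of this header-plus-rows layout, the snippet below parses a small stops.txt held in a string. The column names are taken from the GTFS Schedule reference. The stop values and the helper name `read_gtfs_table` are made up for illustration.

```python
import csv
import io

# Hypothetical contents of a stops.txt file; the values are invented.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St & 1st Ave,45.5200,-122.6800
S2,Main St & 5th Ave,45.5210,-122.6750
"""


def read_gtfs_table(text):
    """Parse one GTFS text file into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


stops = read_gtfs_table(STOPS_TXT)
for stop in stops:
    # Every value is read as a string; convert coordinates if needed.
    print(stop["stop_id"], stop["stop_name"], float(stop["stop_lat"]))
```

The header row becomes the dictionary keys, so each later row can be read by field name rather than by column position.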

The 6 files that are required in a complete GTFS Schedule dataset describe the basic elements of a transit system:

Core GTFS dataset files

| File name | Defines |
| --- | --- |
| agency.txt | The agency operating the transit service. |
| stops.txt | Stops where vehicles pick up or drop off riders. |
| routes.txt | Transit routes. A route is a group of trips that are displayed to riders as a single service. |
| trips.txt | Trips servicing each route. A trip is a sequence of two or more stops that are serviced by a vehicle at subsequent times. |
| stop_times.txt | Times that a vehicle arrives at and departs from stops along each trip. |
| calendar.txt | Days of the week that a service is offered, with start and end dates defining the period. This file is required unless all dates of service are defined in calendar_dates.txt. |
| calendar_dates.txt | Exceptions to the service days defined in calendar.txt. If calendar.txt is omitted, then calendar_dates.txt is required and must contain all dates of service. |

For all files and fields, consult the complete GTFS Schedule reference.
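The file requirements listed above can be expressed as a small check. This is only a sketch: the function name `missing_core_files` is hypothetical, and it looks at file names only, not at file contents.

```python
# Files that are always part of the core dataset, per the list above.
ALWAYS_REQUIRED = {
    "agency.txt",
    "stops.txt",
    "routes.txt",
    "trips.txt",
    "stop_times.txt",
}
# At least one of these two must be present.
CALENDAR_FILES = {"calendar.txt", "calendar_dates.txt"}


def missing_core_files(file_names):
    """Return a list describing which core files are absent."""
    present = set(file_names)
    missing = sorted(ALWAYS_REQUIRED - present)
    if not (CALENDAR_FILES & present):
        missing.append("calendar.txt or calendar_dates.txt")
    return missing


# A dataset that defines all service dates in calendar_dates.txt only.
names = ["agency.txt", "stops.txt", "routes.txt", "trips.txt",
         "stop_times.txt", "calendar_dates.txt"]
print(missing_core_files(names))
```

Treating the two calendar files as an either-or pair mirrors the conditional requirement described for calendar.txt and calendar_dates.txt.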

Unique IDs and ID referencing

The core GTFS dataset files describe the basic elements of a transit system in separate files. In reality, these elements need to relate to each other across files to yield meaningful information about how a transit system works. This is accomplished with unique IDs that are associated with the row entries of each file.

A unique ID allows the data described across the entire row entry to be summarized by a single value. The unique ID can then be referenced by other files to “pull in”, or reference, the information described in the row of the referenced file, allowing for a concise and robust description of a transit system for trip planning purposes.
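To illustrate how an ID "pulls in" a row from another file, here is a minimal sketch using rows that have already been parsed into dictionaries. The field names (`route_id`, `trip_id`, `service_id`, `route_long_name`) come from the GTFS Schedule reference. The values and the helper `route_for_trip` are invented for the example.

```python
# Rows as they might look after parsing routes.txt and trips.txt.
routes = [
    {"route_id": "R10", "route_short_name": "10", "route_long_name": "Harbor Line"},
    {"route_id": "R20", "route_short_name": "20", "route_long_name": "Hill Line"},
]
trips = [
    {"trip_id": "T1", "route_id": "R10", "service_id": "WEEKDAY"},
    {"trip_id": "T2", "route_id": "R20", "service_id": "WEEKDAY"},
]

# Index routes by their unique ID so any row can be found by one value.
routes_by_id = {route["route_id"]: route for route in routes}


def route_for_trip(trip):
    """Follow the trip's route_id to the full route row it refers to."""
    return routes_by_id[trip["route_id"]]


for trip in trips:
    route = route_for_trip(trip)
    print(trip["trip_id"], "runs on", route["route_long_name"])
```

The trip row stores only the short `route_id` value, yet the whole route description is available through the lookup, which is what keeps each file concise.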

ID referencing structure for a core GTFS Schedule dataset

Using ID referencing, the structure for how GTFS Schedule data files communicate with each other is illustrated in the diagram below. The direction of each arrow indicates that a unique ID is referenced from the preceding file to the file the arrow points to.



As illustrated above:

  • the stop times reference the stops to which they belong, and the trip that the stop times are associated with;

  • the trips reference the day(s) of the week that they operate, and the route that the trip services; and finally,

  • the routes reference the agency that is providing the transit service.
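The references listed above can be checked mechanically. The sketch below assumes each file has been parsed into a list of row dictionaries, and that routes.txt carries an `agency_id` column. The field names follow the GTFS Schedule reference. The function name `find_dangling_references` and the sample data are hypothetical.

```python
# (referencing file, ID field, files in which that ID is defined)
REFERENCES = [
    ("stop_times.txt", "stop_id", ["stops.txt"]),
    ("stop_times.txt", "trip_id", ["trips.txt"]),
    ("trips.txt", "service_id", ["calendar.txt", "calendar_dates.txt"]),
    ("trips.txt", "route_id", ["routes.txt"]),
    ("routes.txt", "agency_id", ["agency.txt"]),
]


def find_dangling_references(dataset):
    """Return (file, field, value) for every ID that is never defined.

    `dataset` maps a file name to its list of row dictionaries.
    """
    problems = []
    for child, field, parents in REFERENCES:
        defined = {
            row[field]
            for parent in parents
            for row in dataset.get(parent, [])
            if field in row
        }
        for row in dataset.get(child, []):
            if row.get(field) not in defined:
                problems.append((child, field, row.get(field)))
    return problems


dataset = {
    "agency.txt": [{"agency_id": "A1"}],
    "stops.txt": [{"stop_id": "S1"}, {"stop_id": "S2"}],
    "routes.txt": [{"route_id": "R10", "agency_id": "A1"}],
    "trips.txt": [{"trip_id": "T1", "route_id": "R10", "service_id": "WEEKDAY"}],
    "calendar.txt": [{"service_id": "WEEKDAY"}],
    "stop_times.txt": [
        {"trip_id": "T1", "stop_id": "S1"},
        {"trip_id": "T1", "stop_id": "S9"},  # S9 is not defined in stops.txt
    ],
}
print(find_dangling_references(dataset))
```

Each entry in `REFERENCES` corresponds to one of the relationships in the list above, so a reported tuple points directly at the file and field where the broken reference sits.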

Setting up a GTFS dataset

Writing a GTFS dataset by hand can be very complicated and tends to introduce errors. A GTFS editor aids the process. There are many open-source tools ranging from simple spreadsheets to tools requiring installation on a local server. Other tools may require payment and are hosted externally.

Find a list of these tools in the Awesome-Transit list maintained by the Center for Urban Transportation Research at the University of South Florida (CUTR @ USF).

More resources to get started