Glossary

A guide to common terms used throughout the San Diego Open Data Portal.


API (Application Programming Interface)

A set of protocols that allows software applications to communicate with each other. APIs enable developers to programmatically access and retrieve data without manually downloading files. Many datasets on this portal offer API endpoints that return data in JSON format.
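As a rough illustration of programmatic access, the Python sketch below builds a query URL and decodes a JSON response. The endpoint URL and the `limit` parameter are placeholders invented for this example, not actual portal endpoints; check a dataset's page for its real API details.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only for illustration.
EXAMPLE_ENDPOINT = "https://example.org/api/traffic-collisions"


def build_query_url(endpoint, **params):
    """Append query parameters to an endpoint URL."""
    return f"{endpoint}?{urlencode(params)}" if params else endpoint


def fetch_json(url, timeout=10):
    """Request a URL and decode the JSON body into Python objects."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_query_url(EXAMPLE_ENDPOINT, limit=5)
print(url)
# fetch_json(url) would return the decoded records if the endpoint were real.
```

The network call is kept in its own function so the URL-building step can be checked without connecting to anything.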

CSV (Comma-Separated Values)

A simple file format that stores tabular data (rows and columns) as plain text. Each line represents a row, and values are separated by commas. CSV files can be opened with spreadsheet software like Microsoft Excel, Google Sheets, or LibreOffice Calc.
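Besides spreadsheet software, CSV can be read with a few lines of code. The sketch below uses Python's standard `csv` module on a small made-up sample; the column names are illustrative and do not come from a real portal dataset.

```python
import csv
import io

# Made-up sample data; the column names are illustrative only.
SAMPLE_CSV = """report_id,date,injured
1001,2024-01-15,0
1002,2024-01-16,2
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
for row in rows:
    # Every CSV value arrives as a string; convert types explicitly.
    print(row["report_id"], int(row["injured"]))
```

For a file on disk, `open(path, newline="")` can be passed to `csv.DictReader` in place of the `io.StringIO` wrapper.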

Data Dictionary

A document that describes the structure, content, and format of a dataset. It typically includes field names, data types, descriptions, and any codes or abbreviations used. Data dictionaries help users understand what each column in a dataset means.
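One way to picture a data dictionary is as a lookup table from field names to types and descriptions. The Python sketch below is a minimal, hypothetical example (the fields and the `apply_dictionary` helper are invented here) that uses such a table to convert raw text values into typed ones.

```python
# Hypothetical data dictionary: field name -> (type converter, description).
DATA_DICTIONARY = {
    "report_id": (int, "Unique identifier for the collision report"),
    "date": (str, "Date of the collision, YYYY-MM-DD"),
    "injured": (int, "Number of people injured"),
}


def apply_dictionary(raw_row, dictionary):
    """Convert a row of strings to the types the data dictionary declares."""
    unknown = set(raw_row) - set(dictionary)
    if unknown:
        raise KeyError(f"Fields missing from data dictionary: {sorted(unknown)}")
    return {name: dictionary[name][0](value) for name, value in raw_row.items()}


typed = apply_dictionary(
    {"report_id": "1001", "date": "2024-01-15", "injured": "0"},
    DATA_DICTIONARY,
)
print(typed)
```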

Dataset

A collection of related data organized around a specific topic or purpose. For example, “Traffic Collisions” is a dataset containing records of vehicle accidents in San Diego. A dataset can be available in multiple file formats (distributions).

DCAT (Data Catalog Vocabulary)

A W3C standard vocabulary for describing datasets in data catalogs. This portal uses DCAT-US, the U.S. government profile, based on DCAT version 2. Following this vocabulary keeps our metadata consistent and interoperable with other government data portals.

Distribution

A specific representation of a dataset in a particular format. For example, a dataset might have distributions in CSV, JSON, and GeoJSON formats. Each distribution contains the same data in a file type suited to a different use.

GeoJSON

A format for encoding geographic data structures using JavaScript Object Notation (JSON). GeoJSON supports points, lines, polygons, and other geometry types. It’s widely used in web mapping applications and can be easily visualized in tools like geojson.io.
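To make the structure concrete, the Python sketch below builds a small GeoJSON FeatureCollection with one Point and reads its coordinates back. The location and properties are made up for illustration, and `point_lat_lons` is a helper invented for this example.

```python
import json

# Illustrative feature; the coordinates and properties are made up.
# GeoJSON positions are ordered [longitude, latitude].
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-117.16, 32.72]},
            "properties": {"name": "Example location"},
        }
    ],
}


def point_lat_lons(collection):
    """Return (latitude, longitude) pairs for every Point feature."""
    pairs = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry and geometry["type"] == "Point":
            lon, lat = geometry["coordinates"][:2]
            pairs.append((lat, lon))
    return pairs


print(json.dumps(feature_collection, indent=2))
print(point_lat_lons(feature_collection))
```

The longitude-first ordering is a common source of mistakes, since many mapping tools and everyday usage put latitude first.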

JSON (JavaScript Object Notation)

A lightweight data format that is easy for humans to read and for machines to parse. JSON organizes data as key-value pairs and arrays. It’s the most common format for web API responses and is widely used in web development.
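The short Python sketch below shows how key-value pairs and arrays map onto everyday programming types. The record itself is made up for illustration.

```python
import json

# Made-up record showing key-value pairs (an object) and an array.
text = '{"dataset": "Traffic Collisions", "formats": ["CSV", "JSON"], "records": 2}'

record = json.loads(text)  # JSON object -> dict, JSON array -> list
print(type(record).__name__, type(record["formats"]).__name__)
print(json.dumps(record, indent=2))  # back to human-readable JSON text
```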

Metadata

Information that describes a dataset, such as its title, description, publisher, update frequency, and license. Metadata helps users discover and understand datasets without having to examine the actual data files. Our portal follows the DCAT-US metadata standard.
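For a sense of what a metadata record can look like, the Python sketch below holds one as a dictionary, loosely modeled on DCAT-US field names, and looks up a distribution by format. Every value, including the publisher name and URLs, is a placeholder, and `find_distribution` is a helper invented for this example rather than part of any portal tooling.

```python
# Illustrative metadata record, loosely modeled on DCAT-US field names.
# All values, including URLs, are placeholders.
dataset = {
    "title": "Traffic Collisions",
    "description": "Records of vehicle accidents in San Diego.",
    "publisher": {"name": "Example Publisher"},
    "accrualPeriodicity": "R/P1D",  # ISO 8601 repeating interval: daily
    "license": "https://example.org/license",
    "distribution": [
        {"format": "CSV", "mediaType": "text/csv",
         "downloadURL": "https://example.org/collisions.csv"},
        {"format": "GeoJSON", "mediaType": "application/geo+json",
         "downloadURL": "https://example.org/collisions.geojson"},
    ],
}


def find_distribution(record, fmt):
    """Return the first distribution whose format matches, or None."""
    for dist in record.get("distribution", []):
        if dist.get("format", "").lower() == fmt.lower():
            return dist
    return None


print(find_distribution(dataset, "csv")["downloadURL"])
```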

Open Data

Data that is freely available for anyone to access, use, modify, and share for any purpose. Open data is typically published in machine-readable formats without access restrictions. The City of San Diego’s Open Data Policy guides our commitment to transparency.

Shapefile

A popular geospatial vector data format developed by Esri. Shapefiles store the location, shape, and attributes of geographic features. They’re commonly used in GIS (Geographic Information System) software like QGIS and ArcGIS. Despite the name, a shapefile consists of several files (at minimum .shp, .shx, and .dbf, often with others such as .prj) that must be kept together.
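Because the component files must travel together, a quick check before sharing or loading a shapefile can help. The Python sketch below is a minimal example that reports which of the three required companion files are missing; the file name `parcels.shp` is hypothetical, and the check assumes lowercase extensions.

```python
from pathlib import Path

# The three components a shapefile needs; others such as .prj are optional.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """List required extensions not found alongside the given .shp path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]


# "parcels.shp" is a hypothetical file name.
print(missing_components("parcels.shp"))
```

An empty list means all three required files were found in the same folder.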