FileDataContext
class great_expectations.data_context.FileDataContext(project_config: Optional[DataContextConfig] = None, context_root_dir: Optional[PathStr] = None, project_root_dir: Optional[PathStr] = None, runtime_environment: Optional[dict] = None)#
Subclass of AbstractDataContext that contains functionality necessary to work in a filesystem-backed environment.
classmethod _create(project_root_dir: Optional[PathStr] = None, runtime_environment: Optional[dict] = None) → SerializableDataContext #
Build a new great_expectations directory and DataContext object in the provided project_root_dir.
_create builds a new “great_expectations” directory in the provided folder, provided one does not already exist. It then initializes a new DataContext in that folder and writes the resulting config.
- Relevant Documentation Links
- Parameters:
project_root_dir – path to the root directory in which to create a new great_expectations directory
runtime_environment – a dictionary of config variables that override both those set in config_variables.yml and the environment
- Returns:
DataContext
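The scaffolding step can be pictured with a plain-Python sketch. This is an assumption about the directory layout, not the actual implementation, which also writes a fully populated config and supporting subdirectories:

```python
import tempfile
from pathlib import Path

# Hypothetical sketch of the scaffolding _create performs.
project_root_dir = Path(tempfile.mkdtemp())
gx_dir = project_root_dir / "great_expectations"
gx_dir.mkdir(exist_ok=True)  # only created if it does not already exist
(gx_dir / "great_expectations.yml").touch()  # config is written here
```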
add_data_docs_site(site_name: str, site_config: DataDocsSiteConfigTypedDict) → None #
Add a new Data Docs Site to the DataContext.
New in version 0.17.2.
Example site config dicts can be found in our “Host and share Data Docs” guides.
- Parameters:
site_name – New site name to add.
site_config – Config dict for the new site.
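A minimal filesystem-backed site config, modeled on the examples in the “Host and share Data Docs” guides (the base_directory path here is illustrative and should be adjusted for your project):

```python
# Illustrative site config for a local, filesystem-backed Data Docs site.
site_config = {
    "class_name": "SiteBuilder",
    "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    "store_backend": {
        "class_name": "TupleFilesystemStoreBackend",
        "base_directory": "uncommitted/data_docs/local_site/",
    },
}
# Assuming `context` is a FileDataContext:
# context.add_data_docs_site(site_name="local_site", site_config=site_config)
```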
add_datasource(name: str | None = None, initialize: bool = True, datasource: FluentDatasource | None = None, **kwargs) → FluentDatasource | None #
Add a new Datasource to the data context, with configuration provided as kwargs.
- Relevant Documentation Links
- Parameters:
name – the name of the new Datasource to add
initialize – if False, add the Datasource to the config, but do not initialize it, for example if a user needs to debug database connectivity.
datasource –
an existing Datasource you wish to persist
New in version 0.15.49: Pass in an existing Datasource instead of individual constructor arguments
kwargs – the configuration for the new Datasource
- Returns:
Datasource instance added.
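The hypothetical call patterns below illustrate the two ways of using this method (assume `context` is a FileDataContext and `existing_ds` is a FluentDatasource you constructed elsewhere; the kwargs shown are illustrative only):

```python
# Configure by kwargs, or persist an existing Datasource object:
#
#   ds = context.add_datasource(name="my_datasource", **datasource_kwargs)
#   ds = context.add_datasource(datasource=existing_ds)
#
# With initialize=False the config is stored but the Datasource is not
# initialized, e.g. while debugging database connectivity:
#
#   context.add_datasource(name="my_db", initialize=False, **datasource_kwargs)
datasource_kwargs = {"name": "my_datasource"}  # illustrative kwargs only
```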
add_or_update_datasource(name: str | None = None, datasource: FluentDatasource | None = None, **kwargs) → FluentDatasource #
Add a new Datasource to the context, or update an existing one, depending on whether it already exists. The configuration is provided as kwargs.
New in version 0.15.48.
- Parameters:
name – The name of the Datasource to add or update.
datasource – an existing Datasource you wish to persist.
kwargs – Any relevant keyword args to use when adding or updating the target Datasource named name.
- Returns:
The Datasource added or updated by the input kwargs.
add_store(store_name: str, store_config: StoreConfigTypedDict) → Store #
Add a new Store to the DataContext.
- Parameters:
store_name – the name to associate with the created store.
store_config – the config to use to construct the store.
- Returns:
The instantiated Store.
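An illustrative store config for a filesystem-backed Expectations Store (the base_directory is an assumption; adjust it for your project):

```python
# Illustrative store config dict.
store_config = {
    "class_name": "ExpectationsStore",
    "store_backend": {
        "class_name": "TupleFilesystemStoreBackend",
        "base_directory": "expectations/",
    },
}
# Assuming `context` is a FileDataContext:
# store = context.add_store(store_name="expectations_store", store_config=store_config)
```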
build_data_docs(site_names: list[str] | None = None, resource_identifiers: list | None = None, dry_run: bool = False, build_index: bool = True) → dict #
Build Data Docs for your project.
- Relevant Documentation Links
- Parameters:
site_names – if specified, build data docs only for these sites, otherwise, build all the sites specified in the context’s config
resource_identifiers – a list of resource identifiers (ExpectationSuiteIdentifier, ValidationResultIdentifier). If specified, rebuild HTML (or other views the data docs sites are rendering) only for the resources in this list. This supports incremental build of data docs sites (e.g., when a new validation result is created) and avoids full rebuild.
dry_run – if True, return a structure containing the URLs of the sites that would be built, without actually building them
build_index – if False, skip building the index page
- Returns:
A dictionary with the names of the updated data documentation sites as keys and the location info of their index.html files as values
- Raises:
ClassInstantiationError – Site config in your Data Context config is not valid.
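The return value maps site names to the locations of their index.html files. The sample below is illustrative, not real output:

```python
# Illustrative shape of the build_data_docs return value.
build_result = {
    "local_site": "file:///projects/gx/uncommitted/data_docs/local_site/index.html",
}
index_urls = list(build_result.values())
```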
delete_data_docs_site(site_name: str) #
Delete an existing Data Docs Site.
New in version 0.17.2.
- Parameters:
site_name – Site name to delete.
delete_datasource(datasource_name: Optional[str]) → None #
Delete a given Datasource by name.
Note that this method causes deletion from the underlying DatasourceStore.
- Parameters:
datasource_name – The name of the target datasource.
- Raises:
ValueError – The datasource_name isn’t provided or cannot be found.
delete_store(store_name: str) → None #
Delete an existing Store from the DataContext.
New in version 0.15.48.
- Parameters:
store_name – The name of the Store to be deleted.
- Raises:
StoreConfigurationError – If the target Store is not found.
get_available_data_asset_names(datasource_names: str | list[str] | None = None, batch_kwargs_generator_names: str | list[str] | None = None) → dict[str, BlockConfigDataAssetNames | FluentDataAssetNames] #
Inspect datasource and batch kwargs generators to provide available data_asset objects.
- Parameters:
datasource_names – List of datasources for which to provide available data asset name objects. If None, return available data assets for all datasources.
batch_kwargs_generator_names – List of batch kwargs generators for which to provide available data_asset_name objects.
- Returns:
Dictionary describing available data assets
- Return type:
data_asset_names
- Raises:
ValueError – datasource_names is not None, a string, or list of strings.
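The argument rule behind that ValueError can be sketched with a small helper of our own (not GX code): datasource_names must be None, a string, or a list of strings.

```python
# Sketch of the datasource_names argument rule (hypothetical helper).
def normalize_datasource_names(datasource_names):
    if datasource_names is None:
        return None
    if isinstance(datasource_names, str):
        return [datasource_names]
    if isinstance(datasource_names, list) and all(
        isinstance(name, str) for name in datasource_names
    ):
        return datasource_names
    raise ValueError("datasource_names must be None, a string, or a list of strings")
```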
get_batch_list(datasource_name: Optional[str] = None, data_connector_name: Optional[str] = None, data_asset_name: Optional[str] = None, batch_request: Optional[BatchRequestBase] = None, batch_data: Optional[Any] = None, data_connector_query: Optional[dict] = None, batch_identifiers: Optional[dict] = None, limit: Optional[int] = None, index: Optional[Union[int, list, tuple, slice, str]] = None, custom_filter_function: Optional[Callable] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, partitioner_method: Optional[str] = None, partitioner_kwargs: Optional[dict] = None, runtime_parameters: Optional[dict] = None, query: Optional[str] = None, path: Optional[str] = None, batch_filter_parameters: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, batch_parameters: Optional[Union[dict, BatchParameters]] = None, **kwargs: Optional[dict]) → List[Batch] #
Get the list of zero or more batches, based on a variety of flexible input types.
get_batch_list is the main user-facing API for getting batches. In contrast to virtually all other methods in the class, it does not require typed or nested inputs. Instead, this method is intended to help the user pick the right parameters.
This method attempts to return any number of batches, including an empty list.
- Parameters:
datasource_name – The name of the Datasource that defines the Data Asset to retrieve the batch for
data_connector_name – The Data Connector within the datasource for the Data Asset
data_asset_name – The name of the Data Asset within the Data Connector
batch_request – Encapsulates all the parameters used here to retrieve a BatchList. Use either batch_request or the other params (but not both)
batch_data – Provides runtime data for the batch; is added as the key batch_data to the runtime_parameters dictionary of a BatchRequest
query – Provides runtime data for the batch; is added as the key query to the runtime_parameters dictionary of a BatchRequest
path – Provides runtime data for the batch; is added as the key path to the runtime_parameters dictionary of a BatchRequest
runtime_parameters – Specifies runtime parameters for the BatchRequest; can include the keys batch_data, query, and path
data_connector_query – Used to specify connector query parameters; specifically batch_filter_parameters, limit, index, and custom_filter_function
batch_identifiers – Any identifiers of batches for the BatchRequest
batch_filter_parameters – Filter parameters used in the data connector query
limit – Part of the data_connector_query, limits the number of batches in the batch list
index – Part of the data_connector_query, used to specify the index of which batch to return. Negative numbers retrieve from the end of the list (ex: -1 retrieves the last or latest batch)
custom_filter_function – A Callable function that accepts batch_identifiers and returns a bool
sampling_method – The method used to sample Batch data (see: Partitioning and Sampling)
sampling_kwargs – Arguments for the sampling method
partitioner_method – The method used to partition the Data Asset into Batches
partitioner_kwargs – Arguments for the partitioning method
batch_spec_passthrough – Arguments specific to the ExecutionEngine that aid in Batch retrieval
batch_parameters – Options for FluentBatchRequest
**kwargs – Used to specify either batch_identifiers or batch_filter_parameters
- Returns:
(List[Batch]) The list of requested Batch instances
- Raises:
DatasourceError – If the specified datasource_name does not exist in the DataContext
TypeError – If the specified types of the batch_request are not supported, or if the datasource_name is not a str
ValueError – If more than one exclusive parameter is specified (ex: specifying more than one of batch_data, query, or path)
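The mutual-exclusivity rule for batch_data, query, and path can be sketched with a helper of our own (not GX code):

```python
# Sketch of the mutual-exclusivity rule: at most one of batch_data,
# query, and path may be given (hypothetical helper).
def check_exclusive_params(batch_data=None, query=None, path=None):
    provided = [
        name
        for name, value in (
            ("batch_data", batch_data),
            ("query", query),
            ("path", path),
        )
        if value is not None
    ]
    if len(provided) > 1:
        raise ValueError(f"Only one of {provided} may be specified")
    return provided
```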
get_datasource(datasource_name: str = 'default') → great_expectations.datasource.fluent.interfaces.Datasource #
Retrieve a given Datasource by name from the context’s underlying DatasourceStore.
- Parameters:
datasource_name – The name of the target datasource.
- Returns:
The target datasource.
- Raises:
ValueError – The input datasource_name is None.
get_validator(datasource_name: Optional[str] = None, data_connector_name: Optional[str] = None, data_asset_name: Optional[str] = None, batch: Optional[Batch] = None, batch_list: Optional[List[Batch]] = None, batch_request: Optional[Union[BatchRequestBase, FluentBatchRequest]] = None, batch_request_list: Optional[List[BatchRequestBase]] = None, batch_data: Optional[Any] = None, data_connector_query: Optional[Union[IDDict, dict]] = None, batch_identifiers: Optional[dict] = None, limit: Optional[int] = None, index: Optional[Union[int, list, tuple, slice, str]] = None, custom_filter_function: Optional[Callable] = None, sampling_method: Optional[str] = None, sampling_kwargs: Optional[dict] = None, partitioner_method: Optional[str] = None, partitioner_kwargs: Optional[dict] = None, runtime_parameters: Optional[dict] = None, query: Optional[str] = None, path: Optional[str] = None, batch_filter_parameters: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, expectation_suite_id: Optional[str] = None, expectation_suite_name: Optional[str] = None, expectation_suite: Optional[ExpectationSuite] = None, create_expectation_suite_with_name: Optional[str] = None, **kwargs) → Validator #
Retrieve a Validator with a batch list and an ExpectationSuite.
get_validator first calls get_batch_list to retrieve a batch list, then creates or retrieves an ExpectationSuite used to validate the Batches in the list.
- Parameters:
datasource_name – The name of the Datasource that defines the Data Asset to retrieve the batch for
data_connector_name – The Data Connector within the datasource for the Data Asset
data_asset_name – The name of the Data Asset within the Data Connector
batch – The Batch to use with the Validator
batch_list – The List of Batches to use with the Validator
batch_request – Encapsulates all the parameters used here to retrieve a BatchList. Use either batch_request or the other params (but not both)
batch_request_list – A List of BatchRequest to use with the Validator
batch_data – Provides runtime data for the batch; is added as the key batch_data to the runtime_parameters dictionary of a BatchRequest
query – Provides runtime data for the batch; is added as the key query to the runtime_parameters dictionary of a BatchRequest
path – Provides runtime data for the batch; is added as the key path to the runtime_parameters dictionary of a BatchRequest
runtime_parameters – Specifies runtime parameters for the BatchRequest; can include the keys batch_data, query, and path
data_connector_query – Used to specify connector query parameters; specifically batch_filter_parameters, limit, index, and custom_filter_function
batch_identifiers – Any identifiers of batches for the BatchRequest
batch_filter_parameters – Filter parameters used in the data connector query
limit – Part of the data_connector_query, limits the number of batches in the batch list
index – Part of the data_connector_query, used to specify the index of which batch to return. Negative numbers retrieve from the end of the list (ex: -1 retrieves the last or latest batch)
custom_filter_function – A Callable function that accepts batch_identifiers and returns a bool
sampling_method – The method used to sample Batch data (see: Partitioning and Sampling)
sampling_kwargs – Arguments for the sampling method
partitioner_method – The method used to partition the Data Asset into Batches
partitioner_kwargs – Arguments for the partitioning method
batch_spec_passthrough – Arguments specific to the ExecutionEngine that aid in Batch retrieval
expectation_suite_id – The identifier of the ExpectationSuite to retrieve from the DataContext (can be used in place of expectation_suite_name)
expectation_suite_name – The name of the ExpectationSuite to retrieve from the DataContext
expectation_suite – The ExpectationSuite to use with the validator
create_expectation_suite_with_name – Creates a Validator with a new ExpectationSuite with the provided name
**kwargs – Used to specify either batch_identifiers or batch_filter_parameters
- Returns:
A Validator with the specified Batch list and ExpectationSuite
- Raises:
DatasourceError – If the specified datasource_name does not exist in the DataContext
TypeError – If the specified types of the batch_request are not supported, or if the datasource_name is not a str
ValueError – If more than one exclusive parameter is specified (ex: specifying more than one of batch_data, query, or path), or if the ExpectationSuite cannot be created or retrieved using either the provided name or identifier
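Hypothetical usage patterns (assume `context` is a FileDataContext and `my_batch_request` is a batch request you built earlier); exactly one way of identifying or creating the suite should be supplied:

```python
# Illustrative call patterns:
#
#   validator = context.get_validator(
#       batch_request=my_batch_request,
#       expectation_suite_name="my_suite",          # retrieve an existing suite
#   )
#   validator = context.get_validator(
#       batch_request=my_batch_request,
#       create_expectation_suite_with_name="new_suite",  # create a fresh suite
#   )
suite_params = (
    "expectation_suite",
    "expectation_suite_name",
    "expectation_suite_id",
    "create_expectation_suite_with_name",
)
```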
list_data_docs_sites() → dict[str, DataDocsSiteConfigTypedDict] #
List all Data Docs Sites with configurations.
New in version 0.17.2.
list_datasources() → List[dict] #
List the configurations of the datasources associated with this context.
Note that any sensitive values are obfuscated before being returned.
- Returns:
A list of dictionaries representing datasource configurations.
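The obfuscation behavior can be pictured with a hypothetical helper of our own (the real masking logic lives inside GX and may mask different keys):

```python
# Illustration of masking sensitive values before returning configs
# (hypothetical helper, not GX code).
MASK = "***"
SENSITIVE_KEYS = {"password", "credentials", "connection_string"}

def obfuscate_config(config: dict) -> dict:
    return {
        key: (MASK if key in SENSITIVE_KEYS else value)
        for key, value in config.items()
    }
```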
update_data_docs_site(site_name: str, site_config: DataDocsSiteConfigTypedDict) → None #
Update an existing Data Docs Site.
New in version 0.17.2.
Example site config dicts can be found in our “Host and share Data Docs” guides.
- Parameters:
site_name – Site name to update.
site_config – Config dict that replaces the existing one.
update_datasource(datasource: great_expectations.datasource.fluent.interfaces.Datasource) → great_expectations.datasource.fluent.interfaces.Datasource #
Update a Datasource that already exists in the store.
- Parameters:
datasource – The Datasource object to update.
- Returns:
The updated Datasource.
update_project_config(project_config: DataContextConfig | Mapping) → DataContextConfig #
Update the context’s config with the values from another config object.
- Parameters:
project_config – The config to use to update the context’s internal state.
- Returns:
The updated project config.
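A shallow dict merge sketches the update semantics; this is an assumption, since the real method performs a richer, validated update on DataContextConfig objects:

```python
# Sketch of a shallow config merge (hypothetical helper, not GX code).
def merge_project_config(current: dict, updates: dict) -> dict:
    merged = dict(current)
    merged.update(updates)
    return merged
```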
view_validation_result(result: CheckpointResult) → None #
Open a validation result in a browser.
New in version 0.16.15.
- Parameters:
result – The result of a Checkpoint run.
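A sketch of the browser-opening mechanism (an assumption; the real method resolves the rendered Data Docs page for the CheckpointResult). The webbrowser call is left commented so the sketch runs headlessly:

```python
import tempfile
from pathlib import Path

# Build a file:// URL for a rendered result page (illustrative).
page = Path(tempfile.mkdtemp()) / "index.html"
page.write_text("<html></html>")
url = page.as_uri()  # file:// URL a browser can open
# import webbrowser; webbrowser.open(url)
```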