Matillion Data Model for Google BigQuery
string
"Auto"
string
""
When a query refers to a table it can leave the project implicit, or qualify
the project directly as the catalog portion of the table:
/* Implicit, resolved against connection string */
SELECT FirstName, LastName FROM `Northwind`.`customers`

/* Explicit, project specified as catalog */
SELECT FirstName, LastName FROM `psychic-valve-137816`.`Northwind`.`customers`
If the query contains unqualified table references then they are resolved this way:
string
""
The DatasetId of the dataset you wish to connect to and view tables of. You can find this value in Google BigQuery, after selecting a project.
After connecting to an initial dataset, you can query the Datasets view to discover the Ids for all datasets in a project. Along with ProjectId.
SELECT * FROM choff04.TestDataset.join_left INNER JOIN join_right
In this case, the provider assumes that join_right comes from TestDataset. Or you can connect with DatasetId=TestDataset;ProjectId=choff04:
SELECT * FROM join_left INNER JOIN join_right
Note that the query is not consulted when QueryPassthrough is enabled, so you must either set the connection ProjectId and DatasetId or qualify each individual table; otherwise the SELECT query fails.
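As a rough illustration of the second approach, the sketch below assembles a JDBC URL that carries ProjectId and DatasetId, so that unqualified names such as join_left have defaults to resolve against. The BigQueryUrl class and its methods are hypothetical helpers written for this example, not part of the driver; only the jdbc:googlebigquery: prefix and the property names come from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a connection URL from name/value properties.
class BigQueryUrl {
    // Joins properties as Name=Value; pairs after the driver prefix.
    static String build(Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:googlebigquery:");
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return url.toString();
    }

    // Sets the defaults used to resolve unqualified table references.
    static String withDefaults(String projectId, String datasetId) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("ProjectId", projectId);
        props.put("DatasetId", datasetId);
        return build(props);
    }
}
```

Calling BigQueryUrl.withDefaults("choff04", "TestDataset") yields a URL ending in ProjectId=choff04;DatasetId=TestDataset; which matches the connection settings described above.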
string
""
The ProjectId of the billing project for executing jobs. You can obtain the project Id in the Google APIs console: In the main menu, click API Project and copy the Id. For example: psychic-valve-137816. (Note that your domain name is not part of the Id.)
The billing project is determined based on the value of this property, the ProjectId and the query:
SELECT FirstName, LastName FROM `psychic-valve-137816`.`Northwind`.`customers`
bool
true
Whether or not to allow large result sets to be stored in temporary tables.
string
""
Google BigQuery queries have a maximum amount of data they are allowed to return directly. If this limit is exceeded, then queries will fail with an error message like Response too large to return. When this option is enabled the response limit does not apply, because all query responses are stored in a Google BigQuery table before being returned.
This option is set differently depending upon whether your connection is using UseLegacySQL or not. By default this option is set using the standard SQL syntax:
DestinationTable=project-name.dataset-name.table-name
When UseLegacySQL is enabled, this option is set using the legacy table syntax:
DestinationTable=project-name:dataset-name.table-name
When using this option with multiple connections, make sure that each connection has its own destination table. Sharing a table between connections can lead to lost results, because parallel queries can overwrite each other's results.
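The two syntaxes differ only in the separator after the project name. The following sketch formats a DestinationTable value for either dialect and derives a distinct table name per connection. The DestinationTables class and the numeric-suffix naming scheme are assumptions made for illustration; any scheme that keeps tables unique per connection would serve.

```java
// Hypothetical helper for building DestinationTable values.
class DestinationTables {
    // Standard SQL: project.dataset.table; legacy SQL: project:dataset.table.
    static String format(String project, String dataset, String table, boolean useLegacySql) {
        char projectSeparator = useLegacySql ? ':' : '.';
        return project + projectSeparator + dataset + '.' + table;
    }

    // Assumed naming scheme: append the connection index so that parallel
    // connections never share a destination table.
    static String perConnection(String project, String dataset, String baseTable,
                                int connectionIndex, boolean useLegacySql) {
        String table = baseTable + "_" + connectionIndex;
        return "DestinationTable=" + format(project, dataset, table, useLegacySql);
    }
}
```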
bool
true
Google BigQuery will cache the results of recent queries, and will use this cache for queries by default. Google BigQuery automatically updates the cache when a table is modified, so performance is generally better without any risk of queries returning stale data.
If this is set to false, the query is always run against the table directly.
string
"100000"
The pagesize can control the number of results returned per page from Google BigQuery. Setting a higher pagesize will cause more data to come back in a single HTTP request, but may take longer to execute. Setting a smaller pagesize will increase the number of HTTP requests to get all the data, but is generally recommended to ensure timeout exceptions do not occur.
Note that this option does not have an effect if UseStorageAPI is enabled and the queries being executed can be executed on the Storage API. See StoragePageSize for more information.
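The trade-off can be estimated with a ceiling division. The sketch below assumes one HTTP request per page, which this page implies but does not state exactly; PageMath is a hypothetical name used only for this example.

```java
// Hypothetical helper: estimates how many pages a result set needs.
class PageMath {
    // Ceiling division: a partially filled last page still costs a request.
    static long requestsNeeded(long totalRows, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalRows + pageSize - 1) / pageSize;
    }
}
```

Under that assumption, 250,000 rows take 3 requests at the default page size of 100000 and 25 requests at a page size of 10000.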
string
"1"
This only applies to queries which are stored to a table instead of streamed directly to the driver. This applies in only three cases:
This property determines how long to wait between checks of whether the query's results are ready. Very large resultsets or complex queries may take longer to process, and a low polling interval may result in many unnecessary requests being made to check the query status.
bool
false
Whether or not to allow update without primary keys.
string
""
Remember to set `AllowUpdatesWithoutKey` to true before using this property.
Set the property like this:
`filterColumns=col1[,col2[,col3]];`
bool
false
If set to true, the query will use BigQuery's Legacy SQL dialect to rebuild the query.
If set to false, the query will use BigQuery's standard SQL: https://cloud.google.com/bigquery/sql-reference/.
When UseLegacySQL is set to false, the value of AllowLargeResultSets is ignored. The query will be run as if AllowLargeResultSets is true.
bool
true
By default the driver will use the Storage API instead of the default REST API. Depending upon the complexity of the query, the driver may execute the query in one of two ways:
The BigQuery Storage API can read data faster and more efficiently than the REST API (accessible by setting this option to false), but is priced differently and requires extra OAuth permissions when using your own OAuth app. It also uses the separate StoragePageSize property instead of PageSize.
The BigQuery REST API requires no extra permissions and uses standard pricing, but is slower than the Storage API.
bool
false
This property only has an effect when UseStorageAPI is enabled. When performing reads against the Storage API, the driver can request data in different formats. By default it uses Avro, but enabling this option makes it use Arrow.
This option should be enabled when working with time series data or other datasets that have many date, time, datetime or timestamp fields. For these datasets, using Arrow can yield noticeable improvements over using Avro. Otherwise Avro and Arrow read times are very close and switching between them is unlikely to make a significant difference.
string
"100000"
When the driver receives a query too complex to be run directly in the Storage API, it creates a query job and uses the Storage API to read from the query results table. If the query job returns fewer than the number of rows provided in this option, then the results are returned directly and the Storage API is not used.
This value should be set between 1 and 100000. Higher values will use the Storage API only for large resultsets, but will be delayed by reading more results from the query job. Lower values will result in smaller delays but will use the Storage API for more queries.
Note that this option only has an effect if UseStorageAPI is enabled and the queries being executed cannot be executed directly on the Storage API. Queries which run directly on Storage never create query jobs.
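The decision described above can be summarized as a small routing function. This is only a sketch of the documented behavior, not the driver's actual implementation; the StorageRouting class, the ReadPath names, and the parameters are hypothetical.

```java
// Hypothetical model of how a query's results are read.
class StorageRouting {
    enum ReadPath { REST_API, DIRECT_STORAGE, JOB_RESULTS_DIRECT, JOB_THEN_STORAGE }

    static ReadPath choose(boolean useStorageApi, boolean runsOnStorage,
                           long jobRowCount, long storageThreshold) {
        if (!useStorageApi) {
            return ReadPath.REST_API;          // Storage API disabled entirely
        }
        if (runsOnStorage) {
            return ReadPath.DIRECT_STORAGE;    // no query job is created
        }
        // Query too complex for Storage: a query job runs first. Fewer rows
        // than the threshold are returned directly from the job.
        return jobRowCount < storageThreshold
                ? ReadPath.JOB_RESULTS_DIRECT
                : ReadPath.JOB_THEN_STORAGE;
    }
}
```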
string
"10000"
When UseStorageAPI is enabled and the query being executed can be run on the Storage API, this option controls how many rows the driver is allowed to buffer on the client.
A higher value will generally make queries faster at the expense of consuming more memory, while lower values will conserve memory but make queries slower.
string
"Streaming"
This section provides only a summary of the mechanisms that each of these modes uses. Please see Advanced Integrations for more details on how to use each of these modes.
When UseLegacySQL is true only Streaming and Upload modes are allowed. The Legacy SQL dialect does not support DML statements.
bool
true
This property determines whether the driver will wait for batch jobs to report their status. By default this property is true, and INSERT queries complete only once Google BigQuery has finished executing them. When this property is false, the INSERT query completes as soon as a job is submitted for it.
The default mode is recommended for reliability:
You can disable this option to achieve lower delays when inserting, but you must also make sure to obey the Google BigQuery rate limits and check each job's status to determine whether it has succeeded or failed.
string
""
Only applies when InsertMode is set to GCSStaging, in which case this option is required.
string
""
Only applies when InsertMode is set to GCSStaging. If not set the driver defaults to writing to the root of the bucket.
string
"_CDataTempTableDataset"
Internally bulk UPDATE and DELETE use Google BigQuery MERGE queries, which require creating a table to hold all the update operations. This option is used along with the target table's region to determine the name of the dataset where these temporary tables are created. Each region must have its own temporary dataset so that the temporary table and the MERGE table can be stored in the same project/dataset. This avoids unnecessary data transfer charges.
For example, the driver would create a dataset called "_CDataTempTableDataset_US" for tables in the US region and a dataset called "_CDataTempTableDataset_asia_southeast_1" for tables in the Singapore region.
string
"OFF"
The following options are available:
string
""
As part of registering an OAuth application, you will receive the OAuthClientId value, sometimes also called a consumer key, and a client secret, the OAuthClientSecret.
string
""
As part of registering an OAuth application, you will receive the OAuthClientId, also called a consumer key. You will also receive a client secret, also called a consumer secret. Set the client secret in the OAuthClientSecret property.
string
""
The OAuthAccessToken property is used to connect using OAuth. The OAuthAccessToken is retrieved from the OAuth server as part of the authentication process. It has a server-dependent timeout and can be reused between requests.
The access token is used in place of your user name and password. The access token protects your credentials by keeping them on the server.
string
"%APPDATA%\\CData\\GoogleBigQuery Data Provider\\OAuthSettings.txt"
When InitiateOAuth is set to GETANDREFRESH or REFRESH, the driver saves OAuth values so that the user does not have to enter OAuth connection properties manually and so that the credentials can be shared across connections or processes.
Instead of specifying a file path, you can use memory storage. Memory locations are specified by using a value starting with 'memory://' followed by a unique identifier for that set of credentials (for example, memory://user1). The identifier can be anything you choose but should be unique to the user. Unlike file-based storage, memory storage requires you to store the credentials manually when closing the connection, so that you can set them in the connection when the process is started again. The OAuth property values can be retrieved with a query to the sys_connection_props system table. If there are multiple connections using the same credentials, the properties should be read from the last connection to be closed.
If left unspecified, the default location is "%APPDATA%\\CData\\GoogleBigQuery Data Provider\\OAuthSettings.txt" with %APPDATA% being set to the user's configuration directory:
Platform | %APPDATA% |
Windows | The value of the APPDATA environment variable |
Mac | ~/Library/Application Support |
Linux | ~/.config |
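The table above can be read as a simple lookup. The following sketch resolves the configuration directory from an operating system name, a home directory, and an environment map. ConfigDir is a hypothetical helper, and treating every system that is neither Mac nor Windows as Linux is a simplifying assumption of this example.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper mirroring the platform table above.
class ConfigDir {
    static String resolve(String osName, String userHome, Map<String, String> env) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return userHome + "/Library/Application Support";
        }
        if (os.startsWith("windows")) {
            return env.get("APPDATA"); // value of the APPDATA environment variable
        }
        return userHome + "/.config";  // assumed fallback: Linux
    }

    // Resolves the directory for the machine the code is running on.
    static String current() {
        return resolve(System.getProperty("os.name"),
                       System.getProperty("user.home"),
                       System.getenv());
    }
}
```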
string
""
By default the driver requests only the scope https://www.googleapis.com/auth/bigquery, but in some cases more scopes are required. For example, when reading data using an external table connected to Google Drive, the additional scope https://www.googleapis.com/auth/drive is also required.
string
""
The verifier code returned from the OAuth authorization URL. This can be used on systems where a browser cannot be launched such as headless systems.
See Establishing a Connection to obtain the OAuthVerifier value.
Set OAuthSettingsLocation along with OAuthVerifier. When you connect, the driver exchanges the OAuthVerifier for the OAuth authentication tokens and saves them, encrypted, to the specified file. Set InitiateOAuth to GETANDREFRESH to automate the exchange.
Once the OAuth settings file has been generated, you can remove OAuthVerifier from the connection properties and connect with OAuthSettingsLocation set.
To automatically refresh the OAuth token values, set OAuthSettingsLocation and additionally set InitiateOAuth to REFRESH.
string
""
The OAuthRefreshToken property is used to refresh the OAuthAccessToken when using OAuth authentication.
string
""
Pair with OAuthTokenTimestamp to determine when the AccessToken will expire.
string
""
Pair with OAuthExpiresIn to determine when the AccessToken will expire.
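Combining the two values is a single addition. The sketch below assumes that OAuthTokenTimestamp is a Unix epoch timestamp in milliseconds and that OAuthExpiresIn is a lifetime in seconds; neither unit is stated on this page, so verify both before relying on the result. TokenExpiry is a hypothetical helper.

```java
import java.time.Instant;

// Hypothetical helper; the units below are assumptions, not documented facts.
class TokenExpiry {
    // Assumes tokenTimestampMillis = creation time in epoch milliseconds
    // and expiresInSeconds = token lifetime in seconds.
    static Instant expiresAt(long tokenTimestampMillis, long expiresInSeconds) {
        return Instant.ofEpochMilli(tokenTimestampMillis).plusSeconds(expiresInSeconds);
    }

    // A token counts as expired at, or after, its expiry instant.
    static boolean isExpired(long tokenTimestampMillis, long expiresInSeconds, Instant now) {
        return !now.isBefore(expiresAt(tokenTimestampMillis, expiresInSeconds));
    }
}
```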
string
""
The name of the certificate store for the client certificate.
The OAuthJWTCertType field specifies the type of the certificate store specified by OAuthJWTCert. If the store is password protected, specify the password in OAuthJWTCertPassword.
OAuthJWTCert is used in conjunction with the OAuthJWTCertSubject field in order to specify client certificates. If OAuthJWTCert has a value, and OAuthJWTCertSubject is set, a search for a certificate is initiated. Please refer to the OAuthJWTCertSubject field for details.
Designations of certificate stores are platform-dependent.
The following are designations of the most common User and Machine certificate stores in Windows:
MY | A certificate store holding personal certificates with their associated private keys. |
CA | Certifying authority certificates. |
ROOT | Root certificates. |
SPC | Software publisher certificates. |
In Java, the certificate store normally is a file containing certificates and optional private keys.
When the certificate store type is PFXFile, this property must be set to the name of the file. When the type is PFXBlob, the property must be set to the binary contents of a PFX file (i.e. PKCS12 certificate store).
string
"GOOGLEJSON"
This property can take one of the following values:
USER | For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note: This store type is not available in Java. |
MACHINE | For Windows, this specifies that the certificate store is a machine store. Note: this store type is not available in Java. |
PFXFILE | The certificate store is the name of a PFX (PKCS12) file containing certificates. |
PFXBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in PFX (PKCS12) format. |
JKSFILE | The certificate store is the name of a Java key store (JKS) file containing certificates. Note: this store type is only available in Java. |
JKSBLOB | The certificate store is a string (base-64-encoded) representing a certificate store in Java key store (JKS) format. Note: this store type is only available in Java. |
PEMKEY_FILE | The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate. |
PEMKEY_BLOB | The certificate store is a string (base-64-encoded) that contains a private key and an optional certificate. |
PUBLIC_KEY_FILE | The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate. |
PUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains a PEM- or DER-encoded public key certificate. |
SSHPUBLIC_KEY_FILE | The certificate store is the name of a file that contains an SSH-style public key. |
SSHPUBLIC_KEY_BLOB | The certificate store is a string (base-64-encoded) that contains an SSH-style public key. |
P7BFILE | The certificate store is the name of a PKCS7 file containing certificates. |
PPKFILE | The certificate store is the name of a file that contains a PPK (PuTTY Private Key). |
XMLFILE | The certificate store is the name of a file that contains a certificate in XML format. |
XMLBLOB | The certificate store is a string that contains a certificate in XML format. |
GOOGLEJSON | The certificate store is the name of a JSON file containing the service account information. Only valid when connecting to a Google service. |
GOOGLEJSONBLOB | The certificate store is a string that contains the service account JSON. Only valid when connecting to a Google service. |
string
""
If the certificate store is of a type that requires a password, this property is used to specify that password in order to open the certificate store.
This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys are not encrypted.
string
"*"
When loading a certificate the subject is used to locate the certificate in the store.
If an exact match is not found, the store is searched for subjects containing the value of the property.
If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value "*" picks the first certificate in the certificate store.
The certificate subject is a comma separated list of distinguished name fields and values. For instance "CN=www.server.com, OU=test, C=US, E=support@cdata.com". Common fields and their meanings are displayed below.
Field | Meaning |
CN | Common Name. This is commonly a host name like www.server.com. |
O | Organization |
OU | Organizational Unit |
L | Locality |
S | State |
C | Country |
E | Email Address |
If a field value contains a comma it must be quoted.
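The lookup order described above (the "*" wildcard, then an exact match, then a substring match) can be sketched over a plain list of subject strings. This models only the documented order and is not the driver's code; SubjectMatcher is a hypothetical name, and real certificate stores hold certificate objects rather than strings.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the subject search order.
class SubjectMatcher {
    static Optional<String> select(List<String> storeSubjects, String wanted) {
        if (storeSubjects.isEmpty()) {
            return Optional.empty();
        }
        if ("*".equals(wanted)) {
            return Optional.of(storeSubjects.get(0)); // first certificate in the store
        }
        for (String subject : storeSubjects) {        // pass 1: exact match
            if (subject.equals(wanted)) {
                return Optional.of(subject);
            }
        }
        for (String subject : storeSubjects) {        // pass 2: subject contains the value
            if (subject.contains(wanted)) {
                return Optional.of(subject);
            }
        }
        return Optional.empty();                      // no certificate selected
    }
}
```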
string
""
The issuer of the JSON Web Token. This is typically either the Client Id or Email Address of the OAuth Application.
This is not required when using the GOOGLEJSON OAuthJWTCertType. Google JSON keys contain a copy of the issuer account.
string
""
The user subject for which the application is requesting delegated access. Typically, the user account name or email address.
string
""
If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine is rejected.
This property can take the following forms:
Description | Example |
A full PEM Certificate (example shortened for brevity) | -----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE----- |
A path to a local file containing the certificate | C:\cert.cer |
The public key (example shortened for brevity) | -----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY----- |
The MD5 Thumbprint (hex values can also be either space or colon separated) | ecadbdda5a1529c58a1e9e09828d70e4 |
The SHA1 Thumbprint (hex values can also be either space or colon separated) | 34a929226ae0819f2ec14b4a3d904f801cbb150d |
If not specified, any certificate trusted by the machine is accepted.
Certificates are validated as trusted by the machine based on the System's trust store. The trust store used is the 'javax.net.ssl.trustStore' value specified for the system. If no value is specified for this property, Java's default trust store is used (for example, JAVA_HOME\lib\security\cacerts).
Use '*' to accept all certificates. Note that this is not recommended due to security concerns.
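Because thumbprints may be written with spaces or colons between the hex values, it helps to normalize them before comparing. The sketch below strips separators and computes a SHA-1 thumbprint with the standard java.security.MessageDigest class. The Thumbprints class is a hypothetical helper, and the assumption that the thumbprint is the hash of the DER-encoded certificate bytes reflects common practice rather than anything stated on this page.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper for comparing certificate thumbprints.
class Thumbprints {
    // Removes whitespace and colons and lowercases the hex digits.
    static String normalize(String thumbprint) {
        return thumbprint.replaceAll("[\\s:]", "").toLowerCase(Locale.ROOT);
    }

    // SHA-1 digest of the given bytes as lowercase hex.
    static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // Assumes derCertificate holds the DER-encoded certificate.
    static boolean matches(byte[] derCertificate, String configuredThumbprint) {
        return sha1Hex(derCertificate).equals(normalize(configuredThumbprint));
    }
}
```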
string
"NONE"
This property specifies the protocol that the driver will use to tunnel traffic through the FirewallServer proxy. Note that by default, the driver connects to the system proxy; to disable this behavior and connect to one of the following proxy types, set ProxyAutoDetect to false.
Type | Default Port | Description |
TUNNEL | 80 | When this is set, the driver opens a connection to Google BigQuery and traffic flows back and forth through the proxy. |
SOCKS4 | 1080 | When this is set, the driver sends data through the SOCKS 4 proxy specified by FirewallServer and FirewallPort and passes the FirewallUser value to the proxy, which determines if the connection request should be granted. |
SOCKS5 | 1080 | When this is set, the driver sends data through the SOCKS 5 proxy specified by FirewallServer and FirewallPort. If your proxy requires authentication, set FirewallUser and FirewallPassword to credentials the proxy recognizes. |
To connect to HTTP proxies, use ProxyServer and ProxyPort. To authenticate to HTTP proxies, use ProxyAuthScheme, ProxyUser, and ProxyPassword.
string
""
This property specifies the IP address, DNS name, or host name of a proxy allowing traversal of a firewall. The protocol is specified by FirewallType: Use FirewallServer with this property to connect through SOCKS or do tunneling. Use ProxyServer to connect to an HTTP proxy.
Note that the driver uses the system proxy by default. To use a different proxy, set ProxyAutoDetect to false.
int
0
This specifies the TCP port for a proxy allowing traversal of a firewall. Use FirewallServer to specify the name or IP address. Specify the protocol with FirewallType.
string
""
The FirewallUser and FirewallPassword properties are used to authenticate against the proxy specified in FirewallServer and FirewallPort, following the authentication method specified in FirewallType.
string
""
This property is passed to the proxy specified by FirewallServer and FirewallPort, following the authentication method specified by FirewallType.
bool
false
This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings.
NOTE: When this property is set to True, the proxy used is determined as follows:
To connect to an HTTP proxy, see ProxyServer. For other proxies, such as SOCKS or tunneling, see FirewallType.
string
""
The hostname or IP address of a proxy to route HTTP traffic through. The driver can use the HTTP, Windows (NTLM), or Kerberos authentication types to authenticate to an HTTP proxy.
If you need to connect through a SOCKS proxy or tunnel the connection, see FirewallType.
By default, the driver uses the system proxy. If you need to use another proxy, set ProxyAutoDetect to false.
int
80
The port the HTTP proxy is running on that you want to redirect HTTP traffic through. Specify the HTTP proxy in ProxyServer. For other proxy types, see FirewallType.
string
"BASIC"
This value specifies the authentication type to use to authenticate to the HTTP proxy specified by ProxyServer and ProxyPort.
Note that the driver will use the system proxy settings by default, without further configuration needed; if you want to connect to another proxy, you will need to set ProxyAutoDetect to false, in addition to ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.
The authentication type can be one of the following:
If you need to use another authentication type, such as SOCKS 5 authentication, see FirewallType.
string
""
The ProxyUser and ProxyPassword options are used to connect and authenticate against the HTTP proxy specified in ProxyServer.
You can select one of the available authentication types in ProxyAuthScheme. If you are using HTTP authentication, set this to the user name of a user recognized by the HTTP proxy. If you are using Windows or Kerberos authentication, set this property to a user name in one of the following formats:
user@domain
domain\user
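For illustration, the two formats can be split into their user and domain parts as follows. ProxyUserName is a hypothetical class for this sketch; the driver accepts the combined string as-is, so parsing like this is only useful when an application needs to validate or convert between the two forms.

```java
// Hypothetical value class for a proxy user name.
class ProxyUserName {
    final String user;
    final String domain;

    ProxyUserName(String user, String domain) {
        this.user = user;
        this.domain = domain;
    }

    // Accepts domain\user or user@domain; otherwise the domain is empty.
    static ProxyUserName parse(String value) {
        int slash = value.indexOf('\\');
        if (slash >= 0) {
            return new ProxyUserName(value.substring(slash + 1), value.substring(0, slash));
        }
        int at = value.lastIndexOf('@');
        if (at >= 0) {
            return new ProxyUserName(value.substring(0, at), value.substring(at + 1));
        }
        return new ProxyUserName(value, "");
    }
}
```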
string
""
This property is used to authenticate to an HTTP proxy server that supports NTLM (Windows), Kerberos, or HTTP authentication. To specify the HTTP proxy, you can set ProxyServer and ProxyPort. To specify the authentication type, set ProxyAuthScheme.
If you are using HTTP authentication, additionally set ProxyUser and ProxyPassword to credentials recognized by the HTTP proxy.
If you are using NTLM authentication, set ProxyUser and ProxyPassword to your Windows user name and password. You may also need these to complete Kerberos authentication.
For SOCKS 5 authentication or tunneling, see FirewallType.
By default, the driver uses the system proxy. If you want to connect to another proxy, set ProxyAutoDetect to false.
string
"AUTO"
This property determines when to use SSL for the connection to an HTTP proxy specified by ProxyServer. This value can be AUTO, ALWAYS, NEVER, or TUNNEL. The applicable values are the following:
AUTO | Default setting. If the URL is an HTTPS URL, the driver will use the TUNNEL option. If the URL is an HTTP URL, the driver will use the NEVER option. |
ALWAYS | The connection is always SSL enabled. |
NEVER | The connection is not SSL enabled. |
TUNNEL | The connection is through a tunneling proxy. The proxy server opens a connection to the remote host and traffic flows back and forth through the proxy. |
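The AUTO rule in the table reduces to a check on the URL scheme. A minimal sketch, assuming the URL in question is the one the driver is about to request; ProxySsl and its Type enum are hypothetical names for this example.

```java
import java.net.URI;

// Hypothetical model of how AUTO resolves to a concrete setting.
class ProxySsl {
    enum Type { AUTO, ALWAYS, NEVER, TUNNEL }

    static Type effective(Type configured, String url) {
        if (configured != Type.AUTO) {
            return configured;                 // explicit settings are used as-is
        }
        String scheme = URI.create(url).getScheme();
        // AUTO: HTTPS URLs use TUNNEL, HTTP URLs use NEVER.
        return "https".equalsIgnoreCase(scheme) ? Type.TUNNEL : Type.NEVER;
    }
}
```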
string
""
The ProxyServer is used for all addresses, except for addresses defined in this property. Use semicolons to separate entries.
Note that the driver uses the system proxy settings by default, without further configuration needed; if you want to explicitly configure proxy exceptions for this connection, you need to set ProxyAutoDetect = false, and configure ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.
string
""
Once this property is set, the driver will populate the log file as it carries out various tasks, such as when authentication is performed or queries are executed. If the specified file doesn't already exist, it will be created.
Connection strings and version information are also logged, though connection properties containing sensitive information are masked automatically.
If a relative filepath is supplied, the location of the log file will be resolved based on the path found in the Location connection property.
For more control over what is written to the log file, you can adjust the Verbosity property.
Log contents are categorized into several modules. You can show/hide individual modules using the LogModules property.
To edit the maximum size of a single logfile before a new one is created, see MaxLogFileSize.
If you would like to place a cap on the number of logfiles generated, use MaxLogFileCount.
Java logging is also supported. To enable Java logging, set Logfile to:
Logfile=JAVALOG://myloggername
As in the above sample, JAVALOG:// is a required prefix to use Java logging, and you will substitute your own Logger.
The supplied Logger's getLogger method is then called, using the supplied value to create the Logger instance. If a logging instance already exists, it will reference the existing instance.
When Java logging is enabled, the Verbosity will now correspond to specific logging levels.
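On the application side, the named logger can be prepared with the standard java.util.logging API before connecting. The sketch below extracts the logger name from a Logfile value and attaches a console handler to it. DriverLogging is a hypothetical helper, and which Level corresponds to each Verbosity value is not specified here, so the level is left as a parameter.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for preparing the logger named in Logfile.
class DriverLogging {
    static final String PREFIX = "JAVALOG://";

    // Returns the logger name that follows the required JAVALOG:// prefix.
    static String loggerName(String logfileValue) {
        if (!logfileValue.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Expected a value starting with " + PREFIX);
        }
        return logfileValue.substring(PREFIX.length());
    }

    // Looks up (or creates) the named logger and routes its records to the console.
    static Logger configure(String logfileValue, Level level) {
        Logger logger = Logger.getLogger(loggerName(logfileValue));
        logger.setLevel(level);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level);
        logger.addHandler(handler);
        return logger;
    }
}
```

Keep a reference to the returned Logger for as long as logging is needed, since java.util.logging holds named loggers only weakly.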
string
"1"
The verbosity level determines the amount of detail that the driver reports to the Logfile. Verbosity levels from 1 to 5 are supported. These are detailed in the Logging page.
string
""
Only the modules specified (separated by ';') will be included in the log file. By default all modules are included.
See the Logging page for an overview.
string
"100MB"
When the limit is hit, a new log is created in the same folder with the date and time appended to the end. The default limit is 100 MB. Values lower than 100 kB will use 100 kB as the value instead.
Adjust the maximum number of logfiles generated with MaxLogFileCount.
int
-1
When the limit is hit, a new log is created in the same folder with the date and time appended to the end and the oldest log file will be deleted.
The minimum supported value is 2. A value of 0 or a negative value indicates no limit on the count.
Adjust the maximum size of the logfiles generated with MaxLogFileSize.
string
"%APPDATA%\\CData\\GoogleBigQuery Data Provider\\Schema"
The path to a directory which contains the schema files for the driver (.rsd files for tables and views, .rsb files for stored procedures). The folder location can be a relative path from the location of the executable. The Location property is only needed if you want to customize definitions (for example, change a column name, ignore a column, and so on) or extend the data model with new tables, views, or stored procedures.
If left unspecified, the default location is "%APPDATA%\\CData\\GoogleBigQuery Data Provider\\Schema" with %APPDATA% being set to the user's configuration directory:
Platform | %APPDATA% |
Windows | The value of the APPDATA environment variable |
Mac | ~/Library/Application Support |
Linux | ~/.config |
string
""
Listing the schemas from databases can be expensive. Providing a list of schemas in the connection string improves the performance.
string
""
Listing the tables from some databases can be expensive. Providing a list of tables in the connection string improves the performance of the driver.
This property can also be used as an alternative to automatically listing tables if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the tables you want in a comma-separated list. Each table should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the table in this property, as in the last example here, to avoid ambiguity between tables that exist in multiple catalogs or schemas.
string
""
Listing the views from some databases can be expensive. Providing a list of views in the connection string improves the performance of the driver.
This property can also be used as an alternative to automatically listing views if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the views you want in a comma-separated list. Each view should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the view in this property, as in the last example here, to avoid ambiguity between views that exist in multiple catalogs or schemas.
bool
true
When using BigQuery views, BigQuery stores a copy of the view schema with the view itself. However, these stored view schemas are not updated when the tables used by the view change. This means that the stored view schema can easily become out of date and cause queries using the view to fail.
By default, the driver will not use the stored view schema and will instead query the view to determine the available columns. This guarantees that the schema will be up to date although it requires the driver to start a query job.
You can disable this option to force the driver to use the stored view schemas. This prevents the driver from running any queries when getting a view schema, but also means that queries using the view will fail if the schema is out of date.
bool
false
By default table descriptions are not shown, since the Google BigQuery API requires an extra request beyond what is usually required for reading tables.
Enabling this option will show table descriptions, but will cost an extra API request for every table when a table list is fetched. This can slow down metadata operations on large datasets.
string
""
Google BigQuery does not natively support primary keys, but for certain DML operations or database tools you may need to define them. By default this option is disabled and no tables will have primary keys except for the ones defined in schema files (if you set Location).
Primary keys are defined using a list of rules which match tables and provide a list of key columns. For example, PrimaryKeyIdentifiers="*=key;transactions=tx_date,tx_serial;user_comments=" has three rules separated by semicolons:
Note that the table names can include just the table, the table and dataset or the table, dataset and project.
Both column and table names may be quoted using SQL quotes:
/* Rules with just table names use the connection ProjectId (or DataProjectId) and DatasetId. All these rules refer to the same table with a connection where ProjectId=someProject;DatasetId=someDataset */
someTable=a,b,c
someDataset.someTable=a,b,c
someProject.someDataset.someTable=a,b,c

/* Any table or column name may be quoted */
`someProject`."someDataset".[someTable]=`a`,[b],"c"
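To make the rule syntax concrete, here is a minimal parser that splits a PrimaryKeyIdentifiers value into a map from table pattern to key columns. It is a simplified sketch: PrimaryKeyRules is a hypothetical class, and the parser does not handle quoted names that themselves contain ';', '=' or ','.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the rule list; quoting is not interpreted.
class PrimaryKeyRules {
    static Map<String, List<String>> parse(String value) {
        Map<String, List<String>> rules = new LinkedHashMap<>(); // keeps rule order
        for (String rule : value.split(";")) {
            if (rule.isEmpty()) {
                continue;
            }
            int eq = rule.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in rule: " + rule);
            }
            String table = rule.substring(0, eq).trim();
            String columns = rule.substring(eq + 1).trim();
            List<String> keys = new ArrayList<>();
            if (!columns.isEmpty()) {            // an empty right side means no key columns
                for (String column : columns.split(",")) {
                    keys.add(column.trim());
                }
            }
            rules.put(table, keys);
        }
        return rules;
    }
}
```

For the example value *=key;transactions=tx_date,tx_serial;user_comments= this produces three entries, with an empty column list for user_comments.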
string
"TABLE,EXTERNAL,VIEW,MATERIALIZED_VIEW"
This option is a comma-separated list of the table type values that the driver displays. Any table-like or view-like entity that doesn't have a matching type will not be reported when listing tables.
For example, to restrict the driver to listing only simple tables and views, set this option to TABLE,VIEW.
bool
false
When AutoCache = true, the driver automatically maintains a cache of your table's data in the database of your choice.
When AutoCache = true, the driver caches to a simple, file-based cache. You can configure its location or cache to a different database with the following properties:
string
""
You can cache to any database for which you have a JDBC driver, including CData JDBC drivers.
The cache database is determined based on the CacheDriver and CacheConnection properties. The CacheDriver is the name of the JDBC driver class that you want to use to cache data.
Note that you must also add the CacheDriver JAR file to the classpath.
The following examples show how to cache to several major databases. Refer to CacheConnection for more information on the JDBC URL syntax and typical connection properties.
The driver simplifies Derby configuration. Java DB is the Oracle distribution of Derby. The JAR file is shipped in the JDK. You can find the JAR file, derby.jar, in the db subfolder of the JDK installation. In most caching scenarios, you need to specify only the following, after adding derby.jar to the classpath:
jdbc:googlebigquery:CacheLocation='c:/Temp/cachedir';InitiateOAuth=GETANDREFRESH;ProjectId=NameOfProject;DatasetId=NameOfDataset;
To customize the Derby JDBC URL, use CacheDriver and CacheConnection. For example, to cache to an in-memory database, use a JDBC URL like the following:
jdbc:googlebigquery:CacheDriver=org.apache.derby.jdbc.EmbeddedDriver;CacheConnection='jdbc:derby:memory';InitiateOAuth=GETANDREFRESH;ProjectId=NameOfProject;DatasetId=NameOfDataset;
The following is a JDBC URL for the SQLite JDBC driver:
jdbc:googlebigquery:CacheDriver=org.sqlite.JDBC;CacheConnection='jdbc:sqlite:C:/Temp/sqlite.db';InitiateOAuth=GETANDREFRESH;ProjectId=NameOfProject;DatasetId=NameOfDataset;
The following is a JDBC URL for the included CData JDBC Driver for MySQL:
jdbc:googlebigquery:CacheDriver=cdata.jdbc.mysql.MySQLDriver;CacheConnection='jdbc:mysql:Server=localhost;Port=3306;Database=cache;User=root;Password=123456';InitiateOAuth=GETANDREFRESH;ProjectId=NameOfProject;DatasetId=NameOfDataset;
The following JDBC URL uses the Microsoft JDBC Driver for SQL Server:
jdbc:googlebigquery:CacheDriver=com.microsoft.sqlserver.jdbc.SQLServerDriver;CacheConnection='jdbc:sqlserver://localhost\sqlexpress:7437;user=sa;password=123456;databaseName=Cache';InitiateOAuth=GETANDREFRESH;ProjectId=NameOfProject;DatasetId=NameOfDataset;
The following is a JDBC URL for the Oracle Thin Client:
jdbc:googlebigquery:CacheDriver=oracle.jdbc.OracleDriver;CacheConnection='jdbc:oracle:thin:scott/tiger@localhost:1521:orcldb';InitiateOAuth=GETANDREFRESH;ProjectId=NameOfProject;DatasetId=NameOfDataset;
NOTE: If you are using a version of Oracle older than 9i, the cache driver is instead oracle.jdbc.driver.OracleDriver.
The following JDBC URL uses the CData JDBC Driver for PostgreSQL:
jdbc:googlebigquery:CacheDriver=cdata.jdbc.postgresql.PostgreSQLDriver;CacheConnection='jdbc:postgresql:User=postgres;Password=admin;Database=postgres;Server=localhost;Port=5432;';InitiateOAuth=GETANDREFRESH;ProjectId=NameOfProject;DatasetId=NameOfDataset;
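The URLs above all follow the same pattern, so they can be assembled programmatically. The following Java sketch shows one way to do that; `CacheUrlBuilder` and `cacheUrl` are hypothetical names, and the choice to always wrap CacheConnection in single quotes simply mirrors the examples above rather than any documented quoting rule.

```java
// Hypothetical helper (not part of the driver): assembles a connection URL
// in the same shape as the documented examples.
final class CacheUrlBuilder {

    static String cacheUrl(String cacheDriver, String cacheConnection,
                           String projectId, String datasetId) {
        return "jdbc:googlebigquery:"
                + "CacheDriver=" + cacheDriver + ";"
                // Assumption: the nested JDBC URL is single-quoted, as in the examples,
                // so that its own semicolons are not read as property separators.
                + "CacheConnection='" + cacheConnection + "';"
                + "InitiateOAuth=GETANDREFRESH;"
                + "ProjectId=" + projectId + ";"
                + "DatasetId=" + datasetId + ";";
    }

    public static void main(String[] args) {
        String url = cacheUrl("org.sqlite.JDBC", "jdbc:sqlite:C:/Temp/sqlite.db",
                "NameOfProject", "NameOfDataset");
        System.out.println(url);
        // With the driver JAR and the cache driver JAR on the classpath, this URL
        // could then be passed to java.sql.DriverManager.getConnection(url).
    }
}
```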
string
""
The cache database is determined based on the CacheDriver and CacheConnection properties. Both properties are required to use the cache database. Examples of common cache database settings can be found below. For more information on setting the caching database's driver, refer to CacheDriver.
The connection string specified in the CacheConnection property is passed directly to the underlying CacheDriver. Consult the documentation for the specific JDBC driver for more information on the available properties. Make sure to include the JDBC driver in your application's classpath.
The driver simplifies caching to Derby, only requiring you to set the CacheLocation property to make a basic connection.
Alternatively, you can configure the connection to Derby manually using CacheDriver and CacheConnection. The following is the Derby JDBC URL syntax:
jdbc:derby:[subsubprotocol:][databaseName][;attribute=value[;attribute=value] ... ]
For example, to cache to an in-memory database, use the following:
jdbc:derby:memory
To cache to SQLite, you can use the SQLite JDBC driver. The following is the syntax of the JDBC URL:
jdbc:sqlite:dataSource
The installation includes the CData JDBC Driver for MySQL. The following is an example JDBC URL:
jdbc:mysql:User=root;Password=root;Server=localhost;Port=3306;Database=cache
The following are typical connection properties:
The JDBC URL for the Microsoft JDBC Driver for SQL Server has the following syntax:
jdbc:sqlserver://[serverName[\instance][:port]][;database=databaseName][;property=value[;property=value] ... ]
For example:
jdbc:sqlserver://localhost\sqlexpress:1433;integratedSecurity=true
The following are typical SQL Server connection properties:
To use integrated security, you will also need to add sqljdbc_auth.dll to a folder on the Windows system path. This file is located in the auth subfolder of the Microsoft JDBC Driver for SQL Server installation. The bitness of the assembly must match the bitness of your JVM.
The following is the conventional JDBC URL syntax for the Oracle JDBC Thin driver:
jdbc:oracle:thin:[userId/password]@[//]host[[:port][:sid]]
For example:
jdbc:oracle:thin:scott/tiger@myhost:1521:orcl
The following are typical connection properties:
Data Source: The connect descriptor that identifies the Oracle database. This can be a TNS connect descriptor, an Oracle Net Services name that resolves to a connect descriptor, or, after version 11g, an Easy Connect name (the host name of the Oracle server with an optional port and service name).
The following is the JDBC URL syntax for the official PostgreSQL JDBC driver:
jdbc:postgresql:[//[host[:port]]/]database[[?option=value][[&option=value][&option=value] ... ]]
For example, the following connection string connects to a database on the default host (localhost) and port (5432):
jdbc:postgresql:postgres
The following are typical connection properties:
string
"%APPDATA%\\CData\\GoogleBigQuery Data Provider"
The CacheLocation is a simple, file-based cache. The driver uses Java DB, Oracle's distribution of the Derby database. To cache to Java DB, you will need to add the Java DB JAR file to the classpath. The JAR file, derby.jar, is shipped in the JDK and located in the db subfolder of the JDK installation.
If left unspecified, the default location is %APPDATA%\CData\GoogleBigQuery Data Provider, where %APPDATA% is the user's configuration directory:
Platform | %APPDATA% |
Windows | The value of the APPDATA environment variable |
Mac | ~/Library/Application Support |
Linux | ~/.config |
int
600
The tolerance, in seconds, for stale data in the cache. This only applies when AutoCache is used. Once the tolerance interval has expired, the driver checks the data source for newer records. Otherwise, it returns the data directly from the cache.
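The decision described here can be pictured as a simple elapsed-time check. The Java sketch below is an illustration only: `CacheToleranceSketch` and `shouldCheckSource` are hypothetical names, and the driver's actual bookkeeping (for example, whether it tracks freshness per table) is not described in this document.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of the tolerance check; not the driver's implementation.
final class CacheToleranceSketch {

    // Returns true when more than toleranceSeconds have passed since the last refresh,
    // meaning the data source should be consulted for newer records.
    static boolean shouldCheckSource(Instant lastRefresh, Instant now, int toleranceSeconds) {
        Duration age = Duration.between(lastRefresh, now);
        return age.compareTo(Duration.ofSeconds(toleranceSeconds)) > 0;
    }

    public static void main(String[] args) {
        Instant lastRefresh = Instant.now().minusSeconds(700);
        // With the default tolerance of 600 seconds, a 700-second-old cache is stale.
        System.out.println(shouldCheckSource(lastRefresh, Instant.now(), 600));
    }
}
```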
bool
false
When Offline = true, all queries execute against the cache as opposed to the live data source. In this mode, certain queries like INSERT, UPDATE, DELETE, and CACHE are not allowed.
bool
false
As you execute queries with this property set, table metadata in the Google BigQuery catalog is cached to the file store specified by CacheLocation, if set, or to the user's home directory otherwise. A table's metadata is retrieved only once, when the table is queried for the first time.
The driver automatically persists metadata in memory for up to two hours when you first discover the metadata for a table or view, so CacheMetadata is generally not required. CacheMetadata becomes useful when metadata operations are expensive, such as when you are working with large amounts of metadata or when you have many short-lived connections.
string
"60"
Google BigQuery and many proxies/firewalls restrict the amount of time that idle connections stay alive before they are forcibly closed. This can be a problem when using the Storage API because the driver may stream data faster than it can be consumed. While the consumer is catching up, the driver does not use its connection and it may be closed by the next time the driver uses it.
To avoid this, the driver automatically closes and reopens the connection if it has been idle for too long. This property controls how many seconds the connection must be idle before the driver resets it. To disable these resets, set this property to 0 or a negative value.
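The reset rule, including the "0 or negative disables it" case, can be summarized in a few lines. This Java sketch uses hypothetical names (`IdleResetSketch`, `shouldReset`) and only models the decision, not how the driver measures idle time or reopens the connection.

```java
// Hypothetical illustration of the idle-reset decision; not the driver's implementation.
final class IdleResetSketch {

    // thresholdSeconds is the value of this property; 0 or a negative value disables resets.
    static boolean shouldReset(long idleSeconds, int thresholdSeconds) {
        if (thresholdSeconds <= 0) {
            return false;
        }
        return idleSeconds >= thresholdSeconds;
    }

    public static void main(String[] args) {
        System.out.println(shouldReset(90, 60));   // idle longer than the threshold
        System.out.println(shouldReset(90, 0));    // resets disabled
    }
}
```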
bool
false
This option affects how string parameters are handled when using direct queries through QueryPassthrough. For example, consider this query:
INSERT INTO proj.data.tbl(x) VALUES (@x)
By default, this option is disabled and string parameters are quoted and escaped into SQL strings. That means that any value can be safely used as a string parameter, but it also means that parameters cannot be used as raw aggregate values:
/*
 * If @x is set to: test value ' contains quote
 *
 * Result is a valid query
 */
INSERT INTO proj.data.tbl(x) VALUES ('test value \' contains quote')

/*
 * If @x is set to: ['valid', ('aggregate', 'value')]
 *
 * Result contains string instead of aggregate:
 */
INSERT INTO proj.data.tbl(x) VALUES ('[\'valid\', (\'aggregate\', \'value\')]')
When this option is enabled, string parameters are inserted directly into the query. This means that raw aggregates can be used as parameters, but it also means that all simple strings must be escaped:
/*
 * If @x is set to: test value ' contains quote
 *
 * Result is an invalid query
 */
INSERT INTO proj.data.tbl(x) VALUES (test value ' contains quote)

/*
 * If @x is set to: ['valid', ('aggregate', 'value')]
 *
 * Result is an aggregate
 */
INSERT INTO proj.data.tbl(x) VALUES (['valid', ('aggregate', 'value')])
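The two modes differ only in whether the parameter value is turned into a SQL string literal before substitution. The Java sketch below reproduces the four results shown above. `StringParameterSketch`, `asLiteral`, and `render` are hypothetical names; the naive text replacement and the extra backslash escaping are assumptions made for illustration, not the driver's actual substitution logic.

```java
// Hypothetical illustration of quoted vs. raw parameter substitution.
final class StringParameterSketch {

    // Quotes and escapes a value as a SQL string literal (option disabled).
    // Assumption: backslashes are escaped too, so the literal stays well-formed.
    static String asLiteral(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Naive substitution: replaces every occurrence of @name in the SQL text.
    // raw = true inserts the value directly (option enabled).
    static String render(String sql, String name, String value, boolean raw) {
        return sql.replace("@" + name, raw ? value : asLiteral(value));
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO proj.data.tbl(x) VALUES (@x)";
        String aggregate = "['valid', ('aggregate', 'value')]";
        System.out.println(render(sql, "x", aggregate, false)); // aggregate becomes a string
        System.out.println(render(sql, "x", aggregate, true));  // aggregate is passed through
    }
}
```

With the option enabled, the caller is responsible for escaping simple strings, for example by applying something like `asLiteral` before binding the parameter.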
string
"1000"
When auditing is enabled with the AuditMode option, this property is used to determine how many rows will be allowed in the audit table at once.
By default this property is 1000, meaning that only the 1000 most recent audit events will be available within the audit table.
This property can also be set to -1, which places no limits on the size of the audit table. In this mode, the audit table should be periodically cleared to prevent the driver from using excessive memory.
DELETE FROM AuditJobs#TEMP
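The "most recent N events" behavior can be pictured as a bounded queue that drops its oldest entry when full. The Java sketch below is a conceptual model only; `BoundedAuditLog` is a hypothetical class, and the driver's audit table is not necessarily implemented this way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a size-limited audit table; not the driver's implementation.
final class BoundedAuditLog {
    private final int limit;                       // -1 means no limit
    private final Deque<String> events = new ArrayDeque<>();

    BoundedAuditLog(int limit) {
        this.limit = limit;
    }

    void record(String event) {
        events.addLast(event);
        // Drop the oldest events once the limit is exceeded.
        while (limit >= 0 && events.size() > limit) {
            events.removeFirst();
        }
    }

    int size() {
        return events.size();
    }

    String oldest() {
        return events.peekFirst();
    }

    // Plays the role of clearing the audit table when no limit is set.
    void clear() {
        events.clear();
    }

    public static void main(String[] args) {
        BoundedAuditLog log = new BoundedAuditLog(3);
        for (int i = 1; i <= 5; i++) {
            log.record("job-" + i);
        }
        System.out.println(log.size() + " events kept, oldest is " + log.oldest());
    }
}
```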
string
""
The driver can record certain internal actions taken when it runs queries. For each of the actions listed in this option, the driver creates a temporary audit table that logs when the action took place, what query caused the action, and any other relevant information.
By default this option is set to 'none' and the driver does not record any audit information. This option can also be set to a comma-separated list of the following actions:
Mode Name | Audit Table | Description | Columns |
start-jobs | AuditJobs#TEMP | Records all jobs started by the driver | Timestamp,Query,ProjectId,Location,JobId |
Refer to AuditLimit for more information on how to limit the size of these tables.
string
""
A list of Google BigQuery options:
Option | Description |
gbqoImplicitJoinAsUnion | This option will prevent the driver from converting an IMPLICIT JOIN into a CROSS JOIN as expected by SQL92. Instead, it will leave it as an IMPLICIT JOIN, which Google BigQuery will execute as a UNION ALL. |
int
0
The maximum lifetime of a connection in seconds. Once the time has elapsed, the connection object is disposed. The default is 0 which indicates there is no limit to the connection lifetime.
bool
false
When set to true, a connection will be made to Google BigQuery when the connection is opened. This property enables the Test Connection feature available in various database tools.
This check acts as a no-op: it only verifies that a connection to Google BigQuery can be made, and nothing from this initial connection is retained.
Setting this property to false may provide performance improvements (depending upon the number of times a connection is opened).
string
"Never"
This property outputs schemas to .rsd files in the path specified by Location.
Available settings are the following:
When you set GenerateSchemaFiles to OnUse, the driver generates schemas as you execute SELECT queries. Schemas are generated for each table referenced in the query.
When you set GenerateSchemaFiles to OnCreate, schemas are only generated when a CREATE TABLE query is executed.
Another way to use this property is to obtain schemas for every table in your database when you connect. To do so, set GenerateSchemaFiles to OnStart and connect.
string
""
Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). If unspecified, this will be set to your project default. If your query is too compute intensive for BigQuery to complete at the standard per TB pricing tier, BigQuery returns a billingTierLimitExceeded error and an estimate of how much the query would cost. To run the query at a higher pricing tier, pass a new value for maximumBillingTier as part of the query request. The maximumBillingTier is a positive integer that serves as a multiplier of the basic price per TB. For example, if you set maximumBillingTier to 2, the maximum cost for that query will be 2x basic price per TB.
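The multiplier arithmetic can be written out as a short calculation. In this Java sketch, `BillingTierSketch` and `maxCost` are hypothetical names, and the base price used in `main` is a placeholder value, not an actual Google BigQuery price.

```java
// Hypothetical illustration of the billing-tier multiplier; prices are placeholders.
final class BillingTierSketch {

    // Upper bound on cost: tier multiplier x base price per TB x terabytes processed.
    static double maxCost(int maximumBillingTier, double basePricePerTb, double terabytesProcessed) {
        if (maximumBillingTier < 1) {
            throw new IllegalArgumentException("maximumBillingTier must be a positive integer");
        }
        return maximumBillingTier * basePricePerTb * terabytesProcessed;
    }

    public static void main(String[] args) {
        double basePricePerTb = 5.0;   // placeholder price, for illustration only
        // With maximumBillingTier = 2, the cap is twice the base price per TB.
        System.out.println(maxCost(2, basePricePerTb, 3.0));
    }
}
```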
string
""
When this value is provided, all jobs will use this value as their default billing cap. If a job uses more than this many bytes, BigQuery will cancel it and it will not be billed. By default there is no cap and all jobs will be billed for however many bytes they consume.
This only has an effect when using DestinationTable or when using the InsertJob stored procedure. BigQuery does not allow standard query jobs to have byte limits.
int
-1
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time.
string
""
The properties listed below are available for specific use cases. Normal driver use cases and functionality should not require these properties.
Specify multiple properties in a semicolon-separated list.
CachePartial=True | Caches only a subset of columns, which you can specify in your query. |
QueryPassthrough=True | Passes the specified query to the cache database instead of using the SQL parser of the driver. |
DefaultColumnSize | Sets the default length of string fields when the data source does not provide column length in the metadata. The default value is 2000. |
ConvertDateTimeToGMT | Determines whether to convert date-time values to GMT, instead of the local time of the machine. |
RecordToFile=filename | Records the underlying socket data transfer to the specified file. |
int
60
The allowed idle time a connection can remain in the pool until the connection is closed. The default is 60 seconds.
int
100
The maximum connections in the pool. The default is 100. To disable this property, set the property value to 0 or less.
int
1
The minimum number of connections in the pool. The default is 1.
int
60
The max seconds to wait for a connection to become available. If a new connection request is waiting for an available connection and exceeds this time, an error is thrown. By default, new requests wait forever for an available connection.
string
""
This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; for example, "*=*".
bool
false
When this is set, queries are passed through directly to Google BigQuery.
bool
false
If this property is set to true, the driver will allow only SELECT queries. INSERT, UPDATE, DELETE, and stored procedure queries will cause an error to be thrown.
string
""
The RTK property may be used to license a build. See the included licensing file to see how to set this property. The runtime key is only available if you purchased an OEM license.
string
""
This option can be set to make the driver use the TABLESAMPLE clause for each table referenced by a query. The value determines the percentage passed to the PERCENT clause. The clause is generated only if this property's value is above zero.
-- Input SQL
SELECT * FROM `tbl`

-- Generated Google BigQuery SQL when TableSamplePercent=10
SELECT * FROM `tbl` TABLESAMPLE SYSTEM (10 PERCENT)
This option is subject to a few limitations:
string
"300"
If Timeout = 0, operations do not time out. The operations run until they complete successfully or until they encounter an error condition.
If Timeout expires and the operation is not yet complete, the driver throws an exception.
bool
false
This property enables connection pooling. The default is false. See Connection Pooling for information on using connection pools.
Lists all the accessible datasets for a given project.
Name | Type | Description |
Id [KEY] | String | The fully qualified, unique, opaque Id of the dataset. |
Kind | String | The resource type. |
FriendlyName | String | A descriptive name for the dataset. |
DatasetReference_ProjectId | String | A unique reference to the container project. |
DatasetReference_DatasetId | String | A unique reference to the dataset, without the project name. |
Lists the partitioning definitions for tables.
Name | Type | Description |
Id [KEY] | String | A unique identifier for the partition. |
ProjectId | String | The project that the table belongs to. |
DatasetId | String | The dataset that the table belongs to. |
TableName | String | The name of the table. |
ColumnName | String | The name of the column used for partitioning. |
ColumnType | String | The type of the partitioning column. |
Kind | String | The type of partitioning used by the table. One of DATE, RANGE or INGESTION. |
RequireFilter | Boolean | Whether a filter on the partition column is required to query the table. |
Lists the partitioning ranges for tables.
Name | Type | Description |
Id | String | A unique identifier for the partition. |
RangeLow | String | The lowest value of the partition column. Either an integer when Kind is RANGE, or a date otherwise. |
RangeHigh | String | The highest value of the partition column. Either an integer when Kind is RANGE, or a date otherwise. |
RangeInterval | String | The range of values which are included in each partition. Only valid when Kind is RANGE. |
DateResolution | String | How much of the date is significant to a TIME or INGESTION partition column. One of DAY, HOUR, MONTH or YEAR. |
Lists all the projects for the authorized user.
Name | Type | Description |
Id [KEY] | String | The unique identifier of the project. |
Kind | String | The resource type. |
FriendlyName | String | A descriptive name for the project. |
NumericId | String | The numeric Id of the project. |
ProjectReference_ProjectId | String | A unique reference to the project. |