Matillion ETL Data Model for Splunk
string
"Auto"
string
""
The URL to your Splunk endpoint; for example, https://yoursitename.splunk.com:8089.
The port should be set to the Splunk management port (default 8089).
string
""
Together with Password, this field is used to authenticate against the Splunk server.
string
""
The User and Password are together used to authenticate with the server.
string
""
If using a TLS/SSL connection, this property can be used to specify the TLS/SSL certificate to be accepted from the server. Any other certificate that is not trusted by the machine is rejected.
This property can take the following forms:
Description | Example |
A full PEM Certificate (example shortened for brevity) | -----BEGIN CERTIFICATE----- MIIChTCCAe4CAQAwDQYJKoZIhv......Qw== -----END CERTIFICATE----- |
A path to a local file containing the certificate | C:\cert.cer |
The public key (example shortened for brevity) | -----BEGIN RSA PUBLIC KEY----- MIGfMA0GCSq......AQAB -----END RSA PUBLIC KEY----- |
The MD5 Thumbprint (hex values can also be either space or colon separated) | ecadbdda5a1529c58a1e9e09828d70e4 |
The SHA1 Thumbprint (hex values can also be either space or colon separated) | 34a929226ae0819f2ec14b4a3d904f801cbb150d |
If not specified, any certificate trusted by the machine is accepted.
Certificates are validated as trusted by the machine based on the System's trust store. The trust store used is the 'javax.net.ssl.trustStore' value specified for the system. If no value is specified for this property, Java's default trust store is used (for example, JAVA_HOME\lib\security\cacerts).
Use '*' to accept all certificates. Note that this is not recommended due to security concerns.
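As a rough illustration of the thumbprint forms listed above, the following Python sketch computes an MD5 or SHA1 thumbprint from DER-encoded certificate bytes and strips the optional space or colon separators. The helper names thumbprint and normalize_thumbprint are hypothetical and are not part of the driver.

```python
import hashlib

def thumbprint(der_bytes: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of DER-encoded certificate bytes."""
    return hashlib.new(algorithm, der_bytes).hexdigest()

def normalize_thumbprint(value: str) -> str:
    """Drop space or colon separators and lowercase the hex digits."""
    return value.replace(":", "").replace(" ", "").lower()
```

A SHA1 thumbprint has 40 hex digits and an MD5 thumbprint has 32, which matches the two example values in the table.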
string
"NONE"
This property specifies the protocol that the driver will use to tunnel traffic through the FirewallServer proxy. Note that by default, the driver connects to the system proxy; to disable this behavior and connect to one of the following proxy types, set ProxyAutoDetect to false.
Type | Default Port | Description |
TUNNEL | 80 | When this is set, the driver opens a connection to Splunk and traffic flows back and forth through the proxy. |
SOCKS4 | 1080 | When this is set, the driver sends data through the SOCKS 4 proxy specified by FirewallServer and FirewallPort and passes the FirewallUser value to the proxy, which determines if the connection request should be granted. |
SOCKS5 | 1080 | When this is set, the driver sends data through the SOCKS 5 proxy specified by FirewallServer and FirewallPort. If your proxy requires authentication, set FirewallUser and FirewallPassword to credentials the proxy recognizes. |
To connect to HTTP proxies, use ProxyServer and ProxyPort. To authenticate to HTTP proxies, use ProxyAuthScheme, ProxyUser, and ProxyPassword.
string
""
This property specifies the IP address, DNS name, or host name of a proxy allowing traversal of a firewall. The protocol is specified by FirewallType: use FirewallType with this property to connect through SOCKS or do tunneling. Use ProxyServer to connect to an HTTP proxy.
Note that the driver uses the system proxy by default. To use a different proxy, set ProxyAutoDetect to false.
int
0
This specifies the TCP port for a proxy allowing traversal of a firewall. Use FirewallServer to specify the name or IP address. Specify the protocol with FirewallType.
string
""
The FirewallUser and FirewallPassword properties are used to authenticate against the proxy specified in FirewallServer and FirewallPort, following the authentication method specified in FirewallType.
string
""
This property is passed to the proxy specified by FirewallServer and FirewallPort, following the authentication method specified by FirewallType.
bool
false
This takes precedence over other proxy settings, so you'll need to set ProxyAutoDetect to FALSE in order to use custom proxy settings.
NOTE: When this property is set to True, the proxy used is determined as follows:
To connect to an HTTP proxy, see ProxyServer. For other proxies, such as SOCKS or tunneling, see FirewallType.
string
""
The hostname or IP address of a proxy to route HTTP traffic through. The driver can use the HTTP, Windows (NTLM), or Kerberos authentication types to authenticate to an HTTP proxy.
If you need to connect through a SOCKS proxy or tunnel the connection, see FirewallType.
By default, the driver uses the system proxy. If you need to use another proxy, set ProxyAutoDetect to false.
int
80
The port the HTTP proxy is running on that you want to redirect HTTP traffic through. Specify the HTTP proxy in ProxyServer. For other proxy types, see FirewallType.
string
"BASIC"
This value specifies the authentication type to use to authenticate to the HTTP proxy specified by ProxyServer and ProxyPort.
Note that the driver will use the system proxy settings by default, without further configuration needed; if you want to connect to another proxy, you will need to set ProxyAutoDetect to false, in addition to ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.
The authentication type can be one of the following:
If you need to use another authentication type, such as SOCKS 5 authentication, see FirewallType.
string
""
The ProxyUser and ProxyPassword options are used to connect and authenticate against the HTTP proxy specified in ProxyServer.
You can select one of the available authentication types in ProxyAuthScheme. If you are using HTTP authentication, set this to the user name of a user recognized by the HTTP proxy. If you are using Windows or Kerberos authentication, set this property to a user name in one of the following formats:
user@domain
domain\user
string
""
This property is used to authenticate to an HTTP proxy server that supports NTLM (Windows), Kerberos, or HTTP authentication. To specify the HTTP proxy, you can set ProxyServer and ProxyPort. To specify the authentication type, set ProxyAuthScheme.
If you are using HTTP authentication, additionally set ProxyUser and ProxyPassword to credentials recognized by the HTTP proxy.
If you are using NTLM authentication, set ProxyUser and ProxyPassword to your Windows user name and password. You may also need these to complete Kerberos authentication.
For SOCKS 5 authentication or tunneling, see FirewallType.
By default, the driver uses the system proxy. If you want to connect to another proxy, set ProxyAutoDetect to false.
string
"AUTO"
This property determines when to use SSL for the connection to an HTTP proxy specified by ProxyServer. This value can be AUTO, ALWAYS, NEVER, or TUNNEL. The applicable values are the following:
AUTO | Default setting. If the URL is an HTTPS URL, the driver will use the TUNNEL option. If the URL is an HTTP URL, the driver will use the NEVER option. |
ALWAYS | The connection is always SSL enabled. |
NEVER | The connection is not SSL enabled. |
TUNNEL | The connection is through a tunneling proxy. The proxy server opens a connection to the remote host and traffic flows back and forth through the proxy. |
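The AUTO rule in the table above can be sketched in Python as follows. This only restates the documented mapping. The function name is hypothetical, and rejecting URL schemes other than HTTP and HTTPS is an assumption, since the table does not cover them.

```python
from urllib.parse import urlparse

VALID_SETTINGS = {"AUTO", "ALWAYS", "NEVER", "TUNNEL"}

def effective_proxy_ssl_type(setting: str, url: str) -> str:
    """Resolve AUTO to TUNNEL (HTTPS URL) or NEVER (HTTP URL)."""
    setting = setting.upper()
    if setting not in VALID_SETTINGS:
        raise ValueError(f"unsupported value: {setting}")
    if setting != "AUTO":
        return setting
    scheme = urlparse(url).scheme.lower()
    if scheme == "https":
        return "TUNNEL"
    if scheme == "http":
        return "NEVER"
    # Assumption: the table only describes HTTP and HTTPS URLs.
    raise ValueError(f"unsupported URL scheme: {scheme}")
```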
string
""
The ProxyServer is used for all addresses, except for addresses defined in this property. Use semicolons to separate entries.
Note that the driver uses the system proxy settings by default, without further configuration needed; if you want to explicitly configure proxy exceptions for this connection, you need to set ProxyAutoDetect = false, and configure ProxyServer and ProxyPort. To authenticate, set ProxyAuthScheme and set ProxyUser and ProxyPassword, if needed.
string
""
Once this property is set, the driver will populate the log file as it carries out various tasks, such as when authentication is performed or queries are executed. If the specified file doesn't already exist, it will be created.
Connection strings and version information are also logged, though connection properties containing sensitive information are masked automatically.
If a relative filepath is supplied, the location of the log file will be resolved based on the path found in the Location connection property.
For more control over what is written to the log file, you can adjust the Verbosity property.
Log contents are categorized into several modules. You can show/hide individual modules using the LogModules property.
To edit the maximum size of a single logfile before a new one is created, see MaxLogFileSize.
If you would like to place a cap on the number of logfiles generated, use MaxLogFileCount.
Java logging is also supported. To enable Java logging, set Logfile to:
Logfile=JAVALOG://myloggername
As in the above sample, JAVALOG:// is a required prefix to use Java logging, and you will substitute your own Logger.
The supplied Logger's getLogger method is then called, using the supplied value to create the Logger instance. If a logging instance already exists, it will reference the existing instance.
When Java logging is enabled, the Verbosity will now correspond to specific logging levels.
string
"1"
The verbosity level determines the amount of detail that the driver reports to the Logfile. Verbosity levels from 1 to 5 are supported. These are detailed in the Logging page.
string
""
Only the modules specified (separated by ';') will be included in the log file. By default all modules are included.
See the Logging page for an overview.
string
"100MB"
When the limit is hit, a new log is created in the same folder with the date and time appended to the end. The default limit is 100 MB. Values lower than 100 kB will use 100 kB as the value instead.
Adjust the maximum number of logfiles generated with MaxLogFileCount.
int
-1
When the limit is hit, a new log is created in the same folder with the date and time appended to the end and the oldest log file will be deleted.
The minimum supported value is 2. A value of 0 or a negative value indicates no limit on the count.
Adjust the maximum size of the logfiles generated with MaxLogFileSize.
string
"%APPDATA%\\CData\\Splunk Data Provider\\Schema"
The path to a directory which contains the schema files for the driver (.rsd files for tables and views, .rsb files for stored procedures). The folder location can be a relative path from the location of the executable. The Location property is only needed if you want to customize definitions (for example, change a column name, ignore a column, and so on) or extend the data model with new tables, views, or stored procedures.
If left unspecified, the default location is "%APPDATA%\\CData\\Splunk Data Provider\\Schema" with %APPDATA% being set to the user's configuration directory:
Platform | %APPDATA% |
Windows | The value of the APPDATA environment variable |
Mac | ~/Library/Application Support |
Linux | ~/.config |
string
""
Listing the schemas from databases can be expensive. Providing a list of schemas in the connection string improves the performance.
string
""
Listing the tables from some databases can be expensive. Providing a list of tables in the connection string improves the performance of the driver.
This property can also be used as an alternative to automatically listing tables if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the tables you want in a comma-separated list. Each table should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the table in this property, as in the last example here, to avoid ambiguity between tables that exist in multiple catalogs or schemas.
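A minimal Python sketch of assembling a Tables value in the format described above. It uses square brackets, one of the three escape styles mentioned, for any name that is not a plain identifier. The helper names are hypothetical, and refusing names that contain a closing bracket is an assumption, because the text does not say how such names are escaped.

```python
import re

_PLAIN_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def quote_identifier(name: str) -> str:
    """Wrap a name in square brackets unless it is a plain identifier."""
    if _PLAIN_IDENTIFIER.match(name):
        return name
    if "]" in name:
        # Assumption: escaping of ']' is not documented here, so refuse it.
        raise ValueError(f"cannot quote name containing ']': {name}")
    return "[" + name + "]"

def tables_property(tables) -> str:
    """Build 'Tables=...' from names or (catalog, schema, table) tuples."""
    entries = []
    for table in tables:
        parts = (table,) if isinstance(table, str) else tuple(table)
        entries.append(".".join(quote_identifier(part) for part in parts))
    return "Tables=" + ",".join(entries)
```

Passing a tuple produces the fully qualified form that the note above recommends when several catalogs or schemas are present.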
string
""
Listing the views from some databases can be expensive. Providing a list of views in the connection string improves the performance of the driver.
This property can also be used as an alternative to automatically listing views if you already know which ones you want to work with and there would otherwise be too many to work with.
Specify the views you want in a comma-separated list. Each view should be a valid SQL identifier with any special characters escaped using square brackets, double-quotes or backticks. For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.
Note that when connecting to a data source with multiple schemas or catalogs, you will need to provide the fully qualified name of the view in this property, as in the last example here, to avoid ambiguity between views that exist in multiple catalogs or schemas.
bool
false
When AutoCache = true, the driver automatically maintains a cache of your table's data in the database of your choice.
When AutoCache = true, the driver caches to a simple, file-based cache. You can configure its location or cache to a different database with the following properties:
string
""
You can cache to any database for which you have a JDBC driver, including CData JDBC drivers.
The cache database is determined based on the CacheDriver and CacheConnection properties. The CacheDriver is the name of the JDBC driver class that you want to use to cache data.
Note that you must also add the CacheDriver JAR file to the classpath.
The following examples show how to cache to several major databases. Refer to CacheConnection for more information on the JDBC URL syntax and typical connection properties.
The driver simplifies Derby configuration. Java DB is the Oracle distribution of Derby. The JAR file is shipped in the JDK. You can find the JAR file, derby.jar, in the db subfolder of the JDK installation. In most caching scenarios, you need to specify only the following, after adding derby.jar to the classpath:
jdbc:splunk:CacheLocation='c:/Temp/cachedir';user=MyUserName;password=MyPassword;URL=MyURL;
To customize the Derby JDBC URL, use CacheDriver and CacheConnection. For example, to cache to an in-memory database, use a JDBC URL like the following:
jdbc:splunk:CacheDriver=org.apache.derby.jdbc.EmbeddedDriver;CacheConnection='jdbc:derby:memory';user=MyUserName;password=MyPassword;URL=MyURL;
The following is a JDBC URL for the SQLite JDBC driver:
jdbc:splunk:CacheDriver=org.sqlite.JDBC;CacheConnection='jdbc:sqlite:C:/Temp/sqlite.db';user=MyUserName;password=MyPassword;URL=MyURL;
The following is a JDBC URL for the included CData JDBC Driver for MySQL:
jdbc:splunk:CacheDriver=cdata.jdbc.mysql.MySQLDriver;CacheConnection='jdbc:mysql:Server=localhost;Port=3306;Database=cache;User=root;Password=123456';user=MyUserName;password=MyPassword;URL=MyURL;
The following JDBC URL uses the Microsoft JDBC Driver for SQL Server:
jdbc:splunk:CacheDriver=com.microsoft.sqlserver.jdbc.SQLServerDriver;CacheConnection='jdbc:sqlserver://localhost\sqlexpress:7437;user=sa;password=123456;databaseName=Cache';user=MyUserName;password=MyPassword;URL=MyURL;
The following is a JDBC URL for the Oracle Thin Client:
jdbc:splunk:CacheDriver=oracle.jdbc.OracleDriver;CacheConnection='jdbc:oracle:thin:scott/tiger@localhost:1521:orcldb';user=MyUserName;password=MyPassword;URL=MyURL;
NOTE: If using a version of Oracle older than 9i, the cache driver will instead be oracle.jdbc.driver.OracleDriver.
The following JDBC URL uses the CData JDBC Driver for PostgreSQL:
jdbc:splunk:CacheDriver=cdata.jdbc.postgresql.PostgreSQLDriver;CacheConnection='jdbc:postgresql:User=postgres;Password=admin;Database=postgres;Server=localhost;Port=5432;';user=MyUserName;password=MyPassword;URL=MyURL;
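The example URLs above share one shape: CacheDriver, then a single-quoted CacheConnection (quoted because the nested URL may itself contain semicolons), then the Splunk connection properties. A small Python sketch of that assembly, with a hypothetical helper name and the placeholder values used above:

```python
def cache_jdbc_url(cache_driver: str, cache_connection: str, **splunk_properties) -> str:
    """Assemble a jdbc:splunk URL that caches through another JDBC driver."""
    parts = [
        f"CacheDriver={cache_driver}",
        # Single quotes keep semicolons inside the nested URL from being
        # read as separators of the outer connection string.
        f"CacheConnection='{cache_connection}'",
    ]
    parts.extend(f"{name}={value}" for name, value in splunk_properties.items())
    return "jdbc:splunk:" + ";".join(parts) + ";"

sqlite_url = cache_jdbc_url(
    "org.sqlite.JDBC",
    "jdbc:sqlite:C:/Temp/sqlite.db",
    user="MyUserName",
    password="MyPassword",
    URL="MyURL",
)
```

This sketch does not handle a nested URL that itself contains a single quote.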
string
""
The cache database is determined based on the CacheDriver and CacheConnection properties. Both properties are required to use the cache database. Examples of common cache database settings can be found below. For more information on setting the caching database's driver, refer to CacheDriver.
The connection string specified in the CacheConnection property is passed directly to the underlying CacheDriver. Consult the documentation for the specific JDBC driver for more information on the available properties. Make sure to include the JDBC driver in your application's classpath.
The driver simplifies caching to Derby, only requiring you to set the CacheLocation property to make a basic connection.
Alternatively, you can configure the connection to Derby manually using CacheDriver and CacheConnection. The following is the Derby JDBC URL syntax:
jdbc:derby:[subsubprotocol:][databaseName][;attribute=value[;attribute=value] ... ]
For example, to cache to an in-memory database, use the following:
jdbc:derby:memory
To cache to SQLite, you can use the SQLite JDBC driver. The following is the syntax of the JDBC URL:
jdbc:sqlite:dataSource
The installation includes the CData JDBC Driver for MySQL. The following is an example JDBC URL:
jdbc:mysql:User=root;Password=root;Server=localhost;Port=3306;Database=cache
The following are typical connection properties:
The JDBC URL for the Microsoft JDBC Driver for SQL Server has the following syntax:
jdbc:sqlserver://[serverName[\instance][:port]][;database=databaseName][;property=value[;property=value] ... ]
For example:
jdbc:sqlserver://localhost\sqlexpress:1433;integratedSecurity=true
The following are typical SQL Server connection properties:
To use integrated security, you will also need to add sqljdbc_auth.dll to a folder on the Windows system path. This file is located in the auth subfolder of the Microsoft JDBC Driver for SQL Server installation. The bitness of the assembly must match the bitness of your JVM.
The following is the conventional JDBC URL syntax for the Oracle JDBC Thin driver:
jdbc:oracle:thin:[userId/password]@[//]host[[:port][:sid]]
For example:
jdbc:oracle:thin:scott/tiger@myhost:1521:orcl
The following are typical connection properties:
Data Source: The connect descriptor that identifies the Oracle database. This can be a TNS connect descriptor, an Oracle Net Services name that resolves to a connect descriptor, or, after version 11g, an Easy Connect naming (the host name of the Oracle server with an optional port and service name).
The following is the JDBC URL syntax for the official PostgreSQL JDBC driver:
jdbc:postgresql:[//[host[:port]]/]database[[?option=value][[&option=value][&option=value] ... ]]
For example, the following connection string connects to a database on the default host (localhost) and port (5432):
jdbc:postgresql:postgres
The following are typical connection properties:
string
"%APPDATA%\\CData\\Splunk Data Provider"
The CacheLocation is a simple, file-based cache. The driver uses Java DB, Oracle's distribution of the Derby database. To cache to Java DB, you will need to add the Java DB JAR file to the classpath. The JAR file, derby.jar, is shipped in the JDK and located in the db subfolder of the JDK installation.
If left unspecified, the default location is "%APPDATA%\\CData\\Splunk Data Provider" with %APPDATA% being set to the user's configuration directory:
Platform | %APPDATA% |
Windows | The value of the APPDATA environment variable |
Mac | ~/Library/Application Support |
Linux | ~/.config |
int
600
The tolerance for stale data in the cache, specified in seconds. This only applies when AutoCache is used. The driver checks with the data source for newer records after the tolerance interval has expired. Otherwise, it returns the data directly from the cache.
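The tolerance check described above amounts to comparing the age of the cached data with the configured interval. The following Python sketch illustrates that idea only. It is not the driver's implementation, the function name is hypothetical, and treating an age exactly equal to the tolerance as expired is an assumption.

```python
import time

def should_check_source(last_refresh: float, tolerance_seconds: int = 600, now=None) -> bool:
    """Return True once the cached data is older than the tolerance."""
    current = time.time() if now is None else now
    # Assumption: an age equal to the tolerance counts as expired.
    return (current - last_refresh) >= tolerance_seconds
```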
bool
false
When Offline = true, all queries execute against the cache as opposed to the live data source. In this mode, certain queries like INSERT, UPDATE, DELETE, and CACHE are not allowed.
bool
false
As you execute queries with this property set, table metadata in the Splunk catalog are cached to the file store specified by CacheLocation if set or the user's home directory otherwise. A table's metadata will be retrieved only once, when the table is queried for the first time.
The driver automatically persists metadata in memory for up to two hours when you first discover the metadata for a table or view; therefore, CacheMetadata is generally not required. CacheMetadata becomes useful when metadata operations are expensive, such as when you are working with large amounts of metadata or when you have many short-lived connections.
int
0
When BatchSize is set to a value greater than 0, the batch operation will split the entire batch into separate batches of size BatchSize. The split batches will then be submitted to the server individually. This is useful when the server has limitations on the size of the request that can be submitted.
Setting BatchSize to 0 will submit the entire batch as specified.
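The splitting behavior can be pictured with a short Python sketch. The function name is hypothetical, and treating negative values like 0 is an assumption, since only 0 and positive values are described.

```python
def split_batch(rows, batch_size: int = 0):
    """Split rows into chunks of batch_size; 0 keeps the batch whole."""
    rows = list(rows)
    if batch_size <= 0:
        # Assumption: negative values behave like 0 (no splitting).
        return [rows]
    return [rows[start:start + batch_size] for start in range(0, len(rows), batch_size)]
```

With five rows and a batch size of 2, this gives three sub-batches, the last one holding the single leftover row.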
int
0
The maximum lifetime of a connection in seconds. Once the time has elapsed, the connection object is disposed. The default is 0 which indicates there is no limit to the connection lifetime.
bool
false
When set to true, a connection will be made to Splunk when the connection is opened. This property enables the Test Connection feature available in various database tools.
This feature acts as a NOOP command as it is used to verify a connection can be made to Splunk and nothing from this initial connection is maintained.
Setting this property to false may provide performance improvements (depending upon the number of times a connection is opened).
bool
false
Whether or not the CData JDBC Driver for Splunk should push the internal fields. These fields include: user, eventtype, etc.
int
-1
Limits the number of rows returned when no aggregation or GROUP BY is used in the query. This helps avoid performance issues at design time.
string
"5"
This property allows you to issue multiple requests simultaneously, thereby improving performance. Default value is 5 threads. Setting a higher value can result in OutOfMemory issues.
string
""
The properties listed below are available for specific use cases. Normal driver use cases and functionality should not require these properties.
Specify multiple properties in a semicolon-separated list.
CachePartial=True | Caches only a subset of columns, which you can specify in your query. |
QueryPassthrough=True | Passes the specified query to the cache database instead of using the SQL parser of the driver. |
DefaultColumnSize | Sets the default length of string fields when the data source does not provide column length in the metadata. The default value is 2000. |
ConvertDateTimeToGMT | Determines whether to convert date-time values to GMT, instead of the local time of the machine. |
RecordToFile=filename | Records the underlying socket data transfer to the specified file. |
int
10000
The Pagesize property affects the maximum number of results to return per page from Splunk. Setting a higher value may result in better performance at the cost of additional memory allocated per page consumed.
int
60
The allowed idle time a connection can remain in the pool until the connection is closed. The default is 60 seconds.
int
100
The maximum connections in the pool. The default is 100. To disable this property, set the property value to 0 or less.
int
1
The minimum number of connections in the pool. The default is 1.
int
60
The max seconds to wait for a connection to become available. If a new connection request is waiting for an available connection and exceeds this time, an error is thrown. By default, new requests wait forever for an available connection.
string
""
This setting is particularly helpful in Entity Framework, which does not allow you to set a value for a pseudo column unless it is a table column. The value of this connection setting is of the format "Table1=Column1, Table1=Column2, Table2=Column3". You can use the "*" character to include all tables and all columns; for example, "*=*".
bool
false
If this property is set to true, the driver will allow only SELECT queries. INSERT, UPDATE, DELETE, and stored procedure queries will cause an error to be thrown.
string
"50"
Determines the number of rows used to determine the column data types.
Setting a high value may decrease performance. Setting a low value may prevent the data type from being determined properly, especially when there is null data.
string
""
The RTK property may be used to license a build. See the included licensing file to see how to set this property. The runtime key is only available if you purchased an OEM license.
bool
true
When SupportEnhancedSQL = true, the driver offloads as much of the SELECT statement processing as possible to Splunk and then processes the rest of the query in memory. In this way, the driver can execute unsupported predicates, joins, and aggregation.
When SupportEnhancedSQL = false, the driver limits SQL execution to what is supported by the Splunk API.
The driver determines which of the clauses are supported by the data source and then pushes them to the source to get the smallest superset of rows that would satisfy the query. It then filters the rest of the rows locally. The filter operation is streamed, which enables the driver to filter effectively for even very large datasets.
The driver uses various techniques to join in memory. The driver trades off memory utilization against the requirement of reading the same table more than once.
The driver retrieves all rows necessary to process the aggregation in memory.
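The split between server-side and client-side filtering described above can be sketched in Python. This is a conceptual illustration only: the rows, the fetch_from_source stand-in, and the choice of which filter is pushed down are all hypothetical, not the driver's internals.

```python
# Hypothetical rows standing in for what the remote API holds.
ROWS = [
    {"DataModelId": "SampleModel", "ObjectName": "SampleSet"},
    {"DataModelId": "SampleModel", "ObjectName": "OtherSet"},
    {"DataModelId": "AnotherModel", "ObjectName": "SampleSet"},
]

def fetch_from_source(data_model_id):
    """Stand-in for the server: applies only the pushed-down equality filter."""
    for row in ROWS:
        if row["DataModelId"] == data_model_id:
            yield row

def run_query(data_model_id, client_predicates):
    """Stream the server's superset of rows and filter the remainder locally."""
    for row in fetch_from_source(data_model_id):
        if all(predicate(row) for predicate in client_predicates):
            yield row

matches = list(run_query("SampleModel", [lambda row: row["ObjectName"] != "OtherSet"]))
```

Because both functions are generators, rows are filtered one at a time as they arrive, which mirrors the streamed filter operation mentioned above.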
int
60
If Timeout = 0, operations do not time out. The operations run until they complete successfully or until they encounter an error condition.
If Timeout expires and the operation is not yet complete, the driver throws an exception.
string
"RowScan"
None | Setting TypeDetectionScheme to None will return all columns as the string type. |
RowScan | Setting TypeDetectionScheme to RowScan will scan rows to heuristically determine the data type. The RowScanDepth determines the number of rows to be scanned. |
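To show why the scan depth matters, here is a simplified Python sketch of heuristic type detection over the first rows of a column. It is not the driver's algorithm. The function name, the three type names, and the assumption that values arrive as strings or None are all illustrative.

```python
def detect_type(values, row_scan_depth: int = 50) -> str:
    """Guess a column type from the first row_scan_depth values."""
    sample = [value for value in values[:row_scan_depth] if value is not None]
    if not sample:
        # Only nulls were scanned, so nothing better than string is known.
        return "string"
    for type_name, caster in (("integer", int), ("double", float)):
        try:
            for value in sample:
                caster(value)
            return type_name
        except (TypeError, ValueError):
            continue
    return "string"
```

In this sketch a shallow scan can miss a later non-numeric value, and a scan that sees only nulls falls back to string. Both are cases of a low depth preventing the type from being determined properly.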
bool
false
This property enables connection pooling. The default is false. See Connection Pooling for information on using connection pools.
bool
false
Whether to use the jobs endpoint instead of the export endpoint. While Jobs generally provide higher performance, the initial response time may be longer. If a Timeout error occurs, set the Timeout connection property to a higher value.
Create, query, update, and delete data models in Splunk.
The driver will use the Splunk API to process search criteria that refer to the Id column. This column supports server-side processing for the = operator. The driver processes other filters client-side within the driver.
For example, the following query is processed server side by the Splunk APIs:
SELECT * FROM DataModels WHERE Id = 'SampleModel'
You can turn off the client-side execution of the query by setting SupportEnhancedSQL to false, in which case any search criteria that refer to other columns will cause an error or inconsistent data.
The Id column is the minimum requirement for an insert. In an insert, the DataModels table allows only the Id and Acceleration columns.
INSERT INTO DataModels (Id, Acceleration) VALUES ('initialname', '{"enabled":false,"earliest_time":"","hunk.file_format":"","hunk.dfs_block_size":0,"hunk.compression_codec":""}' )
The DataModels table allows updates for the Acceleration column when Id is specified. You can also set the Provisional pseudocolumn.
UPDATE DataModels SET Provisional = 'true', Acceleration = '{"enabled":false,"earliest_time": "-1mon", "cron_schedule": "0 */12 * * *","hunk.file_format":"","hunk.dfs_block_size":0,"hunk.compression_codec":""}' WHERE Id = 'initialname'
The DataModels table allows deleting a record when Id is specified.
DELETE FROM DataModels WHERE Id = 'initialname'
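Writing the Acceleration JSON by hand inside a SQL string is error-prone, so the following Python sketch serializes the settings with the json module and embeds them in an INSERT statement like the one above. The helper names are hypothetical. Doubling single quotes is the standard SQL escape and is assumed, not confirmed, to be what the driver expects. Where your client supports parameterized statements, those are usually the safer choice.

```python
import json

def sql_literal(text: str) -> str:
    """Quote a string for SQL by doubling embedded single quotes (assumption)."""
    return "'" + text.replace("'", "''") + "'"

def insert_data_model(model_id: str, acceleration: dict) -> str:
    """Build an INSERT for the DataModels table with JSON acceleration settings."""
    payload = json.dumps(acceleration, separators=(",", ":"))
    return (
        "INSERT INTO DataModels (Id, Acceleration) VALUES ("
        + sql_literal(model_id) + ", " + sql_literal(payload) + ")"
    )

statement = insert_data_model("initialname", {"enabled": False, "earliest_time": ""})
```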
Name | Type | ReadOnly | References | Description |
Id [KEY] | String | False | | Id of the data model. |
LinkId | String | True | | Link of the data model. |
Disabled | Boolean | True | | Indicates if the data model is disabled/enabled. |
UpdatedAt | Datetime | True | | Datetime of the last update of the data model. |
Description | String | True | | Description of the data model. |
Name | String | True | | The name displayed for the data model in Splunk. |
Author | String | True | | Splunk user who created the data model. |
App | String | True | | Splunk app where the data model is shared. |
Owner | String | True | | Splunk user who owns the data model. |
CanShareApp | Boolean | True | | Boolean indicating whether the data model can be shared in an app. |
CanShareGlobal | Boolean | True | | Boolean indicating whether the data model can be shared globally. |
CanShareUser | Boolean | True | | Boolean indicating whether the data model can be shared by the user. |
CanWrite | Boolean | True | | Boolean indicating whether the data model can be extended by the user. |
Modifiable | Boolean | True | | Boolean indicating whether the data model can be modified. |
Removable | Boolean | True | | Boolean indicating whether the data model can be removed. |
Acceleration | String | False | | Acceleration settings for the data model. Supply JSON to specify any or all of the following settings: enabled (true or false), earliest_time (time modifier), or cron_schedule (cron string). |
AccelerationAllowed | Boolean | True | | Boolean indicating that acceleration is allowed or not for the data model. |
AccelerationHunkCompression | String | True | | Specifies the compression codec to be used for the accelerated orc or parquet format files. |
DatasetCommands | String | True | | Data model commands. |
DatasetDescription | String | True | | The JSON describing the data model. |
DatasetCurrentCommand | Integer | True | | Current command of the data model. |
DatasetEarliestTime | Datetime | True | | Earliest time of data model events being processed. |
DatasetLatestTime | Datetime | True | | Latest time of data model events being processed. |
DatasetDiversity | String | True | | Diversity of events being processed. |
DatasetLimiting | Integer | True | | Limitations of events being processed. |
DatasetMode | String | True | | Search mode events being processed. |
DatasetSampleRatio | String | True | | Sample ratio of the data model. |
DatasetFields | String | True | | Indexed fields the data model has. |
DatasetType | String | True | | Dataset type. |
Type | String | True | | Data model type. |
Digest | String | True | | Content digest type. |
TagsWhitelist | String | True | | Whitelist of data model tags. |
ReadPermitions | String | True | | Permissions to read this data model. |
WritePermitions | String | True | | Permissions to write to this data model. |
Sharing | String | True | | Data model sharing type. |
Username | String | True | | Username of the Splunk user. |
Pseudo column fields are used in the WHERE clause of SELECT statements and offer a more granular control over the tuples that are returned from the data source.
Name | Type | Description |
Provisional | Boolean | Indicates whether the data model is provisional. Provisional data models are not saved. Specify true to validate a data model before saving it. |
Create, query, update, and delete datasets in Splunk.
The Datasets table requires DataModelId in the WHERE clause. The DataModelId column supports server-side processing for the = operator. The driver processes other search criteria client-side within the driver.
SELECT * FROM Datasets WHERE DataModelId = 'SampleModel'
You can turn off the client-side execution of the query by setting SupportEnhancedSQL to false, in which case any search criteria that refer to other columns will cause an error or inconsistent data.
Splunk allows inserts only when DataModelId, ParentName, and ObjectName are all specified.
INSERT INTO [Datasets] (ObjectName, ParentName, DataModelId) VALUES ('SampleSet', 'BaseEvent','SampleModel')
The Datasets table allows updates when DataModelId is specified. The columns that can be updated in this case are the following: Description and DisplayName.
When ObjectName is also specified, you can update the following columns: ObjectDisplayName, ParentName, Comment, Fields, Calculations, Constraints, Lineage, ObjectSearchNoFields, ObjectSearch, AutoextractSearch, PreviewSearch, AccelerationSearch, BaseSearch, and TsidxNamespace.
UPDATE Datasets SET Description = 'model description' , DisplayName = 'Model Display Name' WHERE DataModelId = 'SampleModel'
UPDATE Datasets SET ParentName = 'BaseEvent', BaseSearch = '| search (index=* OR index=_*) | fields _time, RootObject', AccelerationSearch = ' search (index=* OR index=_*) ' WHERE DataModelId = 'SampleModel' AND ObjectName = 'SampleSet'
Datasets can be deleted by providing the DataModelId and the ObjectName of the dataset.
DELETE FROM Datasets WHERE DataModelId = 'SampleModel' AND ObjectName = 'SampleSet'
Name | Type | ReadOnly | References | Description |
ObjectName [KEY] | String | False | | Name of the dataset object. |
DatamodelId [KEY] | String | False | DataModels.Id | Id of the data model the object belongs to. |
DisplayName | String | False | | Name of the data model the object belongs to. |
Description | String | False | | Dataset description. |
ObjectNameList | String | True | | List of the objects in the data model. |
ObjectDisplayName | String | False | | Name displayed in Splunk for the object. |
ParentName | String | False | | Name of the Parent Event. |
Comment | String | False | | Dataset comments. |
Fields | String | False | | Dataset events indexed fields. |
Calculations | String | False | | Saved calculations for dataset fields. |
Constraints | String | False | | Saved constraints for dataset fields. |
Lineage | String | False | | Dataset lineage. |
ObjectSearchNoFields | String | False | | Object search query without fields. |
ObjectSearch | String | False | | Saved search query for the object. |
AutoextractSearch | String | False | | Search query for autoextraction. |
PreviewSearch | String | False | | Search preview query. |
AccelerationSearch | String | False | | Search query including acceleration. |
BaseSearch | String | False | | Basic search query. |
TsidxNamespace | String | False | | Allocated namespace. |
EventBased | Integer | True | | Number of Event-Based objects in the data model. |
TransactionBased | Integer | True | | Number of Transaction-Based objects in the data model. |
SearchBased | Integer | True | | Number of Search-Based objects in the data model. |
Create, query, update, and delete search jobs in Splunk.
The driver uses the Splunk APIs to process the search Id (Sid) criteria specified in the WHERE clause. The Sid column supports server-side processing for the = operator. The driver processes other search criteria client-side.
SELECT * FROM SearchJobs
SELECT * FROM SearchJobs WHERE Sid = '123456789.1234'
You can turn off client-side execution of the query by setting SupportEnhancedSQL to false, in which case any search criteria that refer to other columns will cause an error or inconsistent data.
Splunk allows inserts only when EventSearch is specified. You can insert the Custom, EarliestTime, LatestTime, Label, and StatusBuckets columns and all pseudocolumns.
INSERT INTO SearchJobs (Custom, EventSearch, LatestTime, Timeout) VALUES ('custom1=test1, custom2=test2', ' from datamodel SampleModel', 'now', '60')
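The Custom value in the statement above is a comma-separated list of key=value pairs. As a rough sketch, the two hypothetical helpers below convert between a Python dict and that form. They assume keys and values contain no commas or equals signs; whether the driver supports any escaping for such characters is not stated here.

```python
def format_custom(pairs):
    """Join key/value pairs into the comma-separated form used for Custom."""
    return ", ".join("{}={}".format(key, value) for key, value in pairs.items())


def parse_custom(text):
    """Split a Custom string back into a dict (no escaping is handled)."""
    result = {}
    for item in text.split(","):
        key, _, value = item.strip().partition("=")
        if key:
            result[key] = value
    return result


custom = format_custom({"custom1": "test1", "custom2": "test2"})
print(custom)                # custom1=test1, custom2=test2
print(parse_custom(custom))  # {'custom1': 'test1', 'custom2': 'test2'}
```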
The SearchJobs table allows updates of the Custom column only when Sid is specified.
UPDATE SearchJobs SET Custom = 'custom1=test3, custom2=test4' WHERE sid = '123456789.1234'
SearchJobs can be deleted by providing the Sid.
DELETE FROM SearchJobs WHERE Sid = '123456789.1234'
Name | Type | ReadOnly | References | Description |
Sid [KEY] | String | False | | The search Id number. |
EventSearch | String | False | | Subset of the entire search that is before any transforming commands. |
Custom | String | False | | Custom job property. In an INSERT operation, pass the values as a comma-separated list of pairs of keys and values. |
EarliestTime | String | False | | The earliest time a search job is configured to start. |
LatestTime | String | False | | The latest time a search job is configured to start. |
CursorTime | String | True | | The earliest time from which no events are later scanned. Can be used to indicate progress. |
Delegate | String | True | | For saved searches, specifies jobs that were started by the user. Defaults to scheduler. |
DiskUsage | Long | True | | The total amount of disk space used, in bytes. |
DispatchState | String | True | | The state of the search. Can be any of QUEUED, PARSING, RUNNING, PAUSED, FINALIZING, FAILED, or DONE. |
DoneProgress | Double | True | | A number between 0 and 1.0 that indicates the approximate progress of the search. doneProgress = (latestTime-cursorTime) / (latestTime-earliestTime) |
DropCount | Integer | True | | For real-time searches only, the number of possible events that were dropped due to the rt_queue_size (defaults to 100000). |
EventAvailableCount | Integer | True | | The number of events that are available for export. |
EventCount | Integer | True | | The number of events returned by the search. |
EventFieldCount | Integer | True | | The number of fields found in the search results. |
EventIsStreaming | Boolean | True | | Indicates if the events of this search are being streamed. |
EventIsTruncated | Boolean | True | | Indicates if the events of the search are not stored, making them unavailable from the events endpoint for the search. |
EventPreviewableCount | Integer | True | | Number of in-memory events that are not yet committed to disk. |
EventSorting | String | True | | Indicates if the events of this search are sorted, and in which order. |
IsDone | Boolean | True | | Indicates if the search has completed. |
IsEventsPreviewEnabled | String | True | | Indicates if the timeline_events_preview setting is enabled in limits.conf. |
IsFailed | Boolean | True | | Indicates if there was a fatal error executing the search. For example, invalid search string syntax. |
IsFinalized | Boolean | True | | Indicates if the search was finalized (stopped before completion). |
IsPaused | Boolean | True | | Indicates if the search is paused. |
IsPreviewEnabled | Boolean | True | | Indicates if previews are enabled. |
IsRealTimeSearch | Boolean | True | | Indicates if the search is a real-time search. |
IsRemoteTimeline | Boolean | True | | Indicates if the remote timeline feature is enabled. |
IsSaved | Boolean | True | | Indicates that the search job is saved on disk. Search artifacts are saved on disk for 7 days from the last time that the job was viewed or touched. |
IsSavedSearch | Boolean | True | | Indicates if this is a saved search run using the scheduler. |
IsZombie | Boolean | True | | Indicates if the process running the search died without finishing the search. |
Keywords | String | True | | All positive keywords used by this search. A positive keyword is a keyword that is not in a NOT clause. |
Label | String | False | | Custom name created for this search. |
Messages | String | True | | Errors and debug messages. |
NumPreviews | Integer | True | | Number of previews generated so far for this search job. |
Performance | String | True | | A representation of the execution costs. |
Priority | Integer | True | | An integer between 0-10 that indicates the search priority. |
RemoteSearch | String | True | | The search string that is sent to every search peer. |
ReportSearch | String | True | | If reporting commands are used, the reporting search. |
ResultCount | Integer | True | | The total number of results returned by the search. In other words, this is the subset of scanned events (represented by the ScanCount) that actually matches the search terms. |
ResultIsStreaming | Boolean | True | | Indicates if the final results of the search are available using streaming (for example, no transforming operations). |
ResultPreviewCount | Integer | True | | The number of result rows in the latest preview results. |
RunDuration | Decimal | True | | Time in seconds that the search took to complete. |
ScanCount | Integer | True | | The number of events that are scanned or read off disk. |
SearchEarliestTime | Datetime | True | | Specifies the earliest time for a search, as specified in the search command rather than the EarliestTime parameter. It does not snap to the indexed data time bounds for all-time searches. |
SearchLatestTime | Datetime | True | | Specifies the latest time for a search, as specified in the search command rather than the LatestTime parameter. It does not snap to the indexed data time bounds for all-time searches. |
SearchProviders | String | True | | A list of all the search peers that were contacted. |
StatusBuckets | Integer | False | | Maximum number of timeline buckets. |
TTL | String | True | | The time to live, or the time before the search job expires after it completes. |
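The DoneProgress formula in the table above can be worked through with concrete numbers. This is a minimal sketch that assumes the three times are available as epoch seconds; the function name and the clamping to the 0 to 1.0 range are illustrative additions, not documented driver behavior.

```python
def done_progress(earliest_time, latest_time, cursor_time):
    """Approximate search progress from the documented formula:

    doneProgress = (latestTime - cursorTime) / (latestTime - earliestTime)

    All arguments are assumed to be epoch seconds.
    """
    span = latest_time - earliest_time
    if span <= 0:
        raise ValueError("latest_time must be later than earliest_time")
    progress = (latest_time - cursor_time) / span
    # Clamp to the documented 0..1.0 range (an illustrative safeguard).
    return min(1.0, max(0.0, progress))


# The cursor starts at the latest time and moves back toward the earliest time.
print(done_progress(1000, 2000, 1750))  # 0.25
```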
Pseudo column fields are used in the WHERE clause of SELECT statements and offer more granular control over the rows that are returned from the data source.
Name | Type | Description |
SearchMode | String | Searching mode, realtime or normal. If set to realtime, the search runs over the live data. The allowed values are normal, realtime. |
EnableLookups | Boolean | Indicates whether lookups should be applied to events. |
AutoPause | Integer | If specified, the search job pauses after this many seconds of inactivity. (0 means never autopause.) |
AutoCancel | Integer | If specified, the job automatically cancels after this many seconds of inactivity. (0 means never autocancel.) |
AdhocSearchLevel | Integer | The search mode to use. The allowed values are verbose, fast, smart. |
ForceBundleReplication | Boolean | Specifies whether this search should cause (and wait depending on the value of SyncBundleReplication) for bundle synchronization with all search peers. |
IndexEarliest | String | Specify a time string. Sets the earliest inclusive time bounds for the search, based on the index time bounds. |
IndexLatest | String | Specify a time string. Sets the latest exclusive time bounds for the search, based on the index time bounds. |
IndexedRealtime | Boolean | Indicates whether or not to use the indexed-realtime mode for real-time searches. |
IndexedRealtimeOffset | Integer | Sets disk sync delay for indexed real-time search (seconds). |
MaxCount | Integer | The number of events that can be accessible in any given status bucket. |
MaxTime | Integer | The number of seconds to run this search before finalizing. Specify 0 to never finalize. |
Namespace | String | The application namespace in which to restrict searches. |
Now | String | Specify a time string to set the absolute time used for any relative time specifier in the search. Defaults to the current system time. You can specify a relative time modifier for this parameter. For example, specify +2d to specify the current time plus two days. |
ReduceFrequency | Integer | Determines how frequently to run the MapReduce reduce phase on accumulated map values. |
ReloadMacros | Boolean | Specifies whether to reload macro definitions from the configuration file. |
RemoteServerList | Integer | Comma-separated list of (possibly wildcarded) servers from which raw events should be pulled. |
ReplaySpeed | Integer | Indicate a real-time search replay speed factor. For example, 1 indicates normal speed, 0.5 indicates half of normal speed, and 2 indicates twice as fast as normal. |
ReplayStartTime | String | Relative wall-clock start time for the replay. |
ReplayEndTime | String | Relative end time for the replay clock. The replay stops when the clock time reaches this time. |
ReuseMaxSecondsAgo | Integer | Specifies the number of seconds ago to check when an identical search is started and return the search Id of the job instead of starting a new job. |
RequiredField | String | Adds a required field to the search. |
RealTimeBlocking | Boolean | For a real-time search, indicates if the indexer blocks if the queue for this search is full. |
RealTimeIndexFilter | Boolean | For a real-time search, indicates if the indexer prefilters events. |
RealTimeMaxBlockSecs | Integer | For a real-time search with RealTimeBlocking set to true, the maximum time to block. Specify 0 to indicate no limit. |
RealTimeQueueSize | Integer | For a real-time search, the queue size (in events) that the indexer should use for this search. |
Timeout | Integer | The number of seconds to keep this search after processing has stopped. |
SyncBundleReplication | String | Specifies whether this search should wait for bundle replication to complete. |
A dataset object in the example InternalServer data model.
This is an example of a dataset view. These views are generated from dataset objects inside a data model. The driver will use the Splunk APIs to process the following query components; the driver processes other parts of the query client-side in memory.
All columns support server-side processing for the following operators and functions:
LIMIT, ORDER BY, GROUP BY, and HAVING are also processed server-side. The exception is when the selected columns include fields that are not in the GROUP BY clause; in that case, GROUP BY, criteria, and limiting are handled client-side.
When an unsupported criterion or function is used, all processing is completed client-side (except selecting the specified fields). This is also the case when a SELECT statement has a column that is not in the GROUP BY clause.
For example, the driver uses the Splunk APIs to process the following queries.
SELECT Component, Timeendpos as Timeend FROM [AlertsInInternalServer] WHERE Component = 'Saved' OR EventType != '' AND Priority IS NOT NULL AND Linecount NOT IN ('1', '2') ORDER BY Priority DESC LIMIT 5
SELECT AVG(Suppressed), Priority FROM [AlertsInInternalServer] GROUP BY Priority HAVING AVG(Suppressed) > 0
You can turn off client-side execution of the query by setting SupportEnhancedSQL to false, in which case any search criteria that refer to other columns will cause an error or inconsistent data.
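The GROUP BY rule described above can be expressed as a small check. The sketch below is illustrative only and models just that one rule: a grouped query stays server-side when every non-aggregated column in the SELECT list also appears in GROUP BY. The function name is hypothetical, the case-insensitive comparison is an assumption, and unsupported criteria or functions (which also force client-side processing) are not modeled.

```python
def group_by_runs_server_side(plain_columns, group_by_columns):
    """Return True if every non-aggregated selected column is in GROUP BY.

    plain_columns: selected columns that are not wrapped in an aggregate.
    group_by_columns: columns listed in the GROUP BY clause.
    """
    grouped = {name.lower() for name in group_by_columns}
    return all(name.lower() in grouped for name in plain_columns)


# SELECT AVG(Suppressed), Priority ... GROUP BY Priority
print(group_by_runs_server_side(["Priority"], ["Priority"]))  # True

# SELECT AVG(Suppressed), Priority, Component ... GROUP BY Priority
print(group_by_runs_server_side(["Priority", "Component"], ["Priority"]))  # False
```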
Name | Type | Description |
_time | Datetime | |
component | String | |
date_hour | Int | |
date_mday | Int | |
date_minute | Int | |
date_month | String | |
date_second | Int | |
date_wday | String | |
date_year | Int | |
date_zone | Int | |
digest_mode | Int | |
dispatch_time | Int | |
host | String | |
linecount | Int | |
log_level | String | |
priority | String | |
punct | String | |
savedsearch_id | String | |
scheduled_time | Int | |
search_type | String | |
server_alert_actions | String | |
server_app | String | |
server_message | String | |
server_result_count | Int | |
server_run_time | Double | |
server_savedsearch_name | String | |
server_sid | String | |
server_status | String | |
server_user | String | |
source | String | |
sourcetype | String | |
splunk_server | String | |
suppressed | Int | |
thread_id | String | |
timeendpos | Int | |
timestartpos | Int | |
window_time | Int |
An example lookup report representing a view based on a saved report in Splunk.
This is an example of a report view. These views are generated from saved reports in Splunk.
The driver uses the Splunk APIs to process the following query components; the driver processes other parts of the query client-side in memory. You can turn off client-side execution of the query by setting SupportEnhancedSQL to false, in which case any search criteria that refer to other columns will cause an error or inconsistent data.
Runs a saved search, or report, and returns the search results of a saved search. If the search contains replacement placeholder terms, such as $replace_me$, the search processor replaces the placeholders with the strings you specify.
For example:
Will generate the following search statement:
All replacement placeholder terms will be dynamic and saved as Pseudo-Columns.
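To make the placeholder mechanism concrete, here is a rough local sketch of the substitution. Splunk performs the real replacement when it runs the saved search; the function name, the sample search string, and the placeholder names `type` and `count` below are all hypothetical.

```python
import re


def fill_placeholders(search, replacements):
    """Replace $name$ placeholders in a search string with supplied values."""

    def substitute(match):
        name = match.group(1)
        if name not in replacements:
            raise KeyError("no value supplied for placeholder: " + name)
        return replacements[name]

    # A placeholder is a word wrapped in dollar signs, such as $replace_me$.
    return re.sub(r"\$(\w+)\$", substitute, search)


filled = fill_placeholders(
    "index=main sourcetype=$type$ | head $count$",
    {"type": "access_combined", "count": "5"},
)
print(filled)  # index=main sourcetype=access_combined | head 5
```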
All columns support server-side processing for the following operators and functions:
LIMIT, ORDER BY, GROUP BY, and HAVING are also processed server-side. The exception is when the selected columns include fields that are not in the GROUP BY clause; in that case, GROUP BY, criteria, and limiting are handled client-side.
When an unsupported criterion or function is used, all processing is completed client-side (except selecting the specified fields). This is also the case when a SELECT statement has a column that is not in the GROUP BY clause.
For example, the driver processes the following queries server-side:
SELECT Country, Subregion as Sub FROM LookUpReport WHERE Iso2 != '123' OR continent = 'Europe' AND iso3 NOT IN ('example_1', 'example_2') ORDER BY Country DESC LIMIT 5
SELECT AVG(Iso2), Subregion FROM LookUpReport GROUP BY Subregion HAVING AVG(Iso2) > 0
You can turn off client-side execution of the query by setting SupportEnhancedSQL to false, in which case any search criteria that refer to other columns will cause an error or inconsistent data.
Name | Type | Description |
continent | String | |
country | String | |
iso2 | String | |
iso3 | String | |
region_un | String | |
region_wb | String | |
subregion | String |
An example of a table object inside a data model.
This is an example of a view generated from a table object inside a data model. The driver will use the Splunk APIs to process the following query components; the driver processes other parts of the query client-side in memory.
All columns support server-side processing for the following operators and functions.
LIMIT, ORDER BY, GROUP BY, and HAVING are also processed server-side. The exception is when the selected columns include fields that are not in the GROUP BY clause; in that case, GROUP BY, criteria, and limiting are handled client-side.
When an unsupported criterion or function is used, all processing is completed client-side (except selecting the specified fields). This is also the case when a SELECT statement has a column that is not in the GROUP BY clause.
For example, the following queries are processed server-side:
SELECT Component, Timeendpos as Timeend FROM [UploadedModel] WHERE Component = 'Saved' OR DEST_CITY_MARKET_ID != '' AND DEST_AIRPORT_ID NOT IN ('1', '2') ORDER BY ORIGIN_AIRPORT_ID DESC LIMIT 5
SELECT AVG(DEST_AIRPORT_ID), ORIGIN_AIRPORT_ID FROM [UploadedModel] GROUP BY ORIGIN_AIRPORT_ID HAVING AVG(DEST_AIRPORT_ID) > 0
You can turn off client-side execution of the query by setting SupportEnhancedSQL to false, in which case any search criteria that refer to other columns will cause an error or inconsistent data.
Name | Type | Description |
_time | Datetime | |
DEST_AIRPORT_ID | Int | |
DEST_AIRPORT_SEQ_ID | Int | |
DEST_CITY_MARKET_ID | Int | |
host | String | |
linecount | Int | |
ORIGIN_AIRPORT_ID | Int | |
ORIGIN_AIRPORT_SEQ_ID | Int | |
ORIGIN_CITY_MARKET_ID | Int | |
punct | String | |
source | String | |
sourcetype | String | |
splunk_server | String | |
timestamp | String |