Common function description
When using custom queries, the following types of functions are commonly used:
- Date and Time Functions
- String Functions
- Math Functions
Date and Time Functions
The time-related fields in custom queries are divided into the following three types:
1. The time field in the events table
The time field is of the millisecond-level Timestamp type, and all date and time functions can be used directly.
2. The date field in the events table
The date field is of the day-level Timestamp type. If you do not need the information about the hours, minutes, and seconds, using this field will be more efficient. The date field is also an indexed field, so you should use this field as much as possible to filter the date range. Please refer to the "Date Filter" section for details.
Note: Before version 1.10, the date field did not support custom functions and could be replaced by the time field.
3. Other custom Date/Datetime type attributes
This type of attribute is displayed as a millisecond-level Unix timestamp in custom queries. When using date and time functions, you need to convert it to a Timestamp type using the EPOCH_TO_TIMESTAMP function. Please refer to the "Data Type" section for details.
adddate(timestamp startdate, int days), adddate(timestamp startdate, bigint days)
Purpose: Adds a given number of days to a TIMESTAMP value.
Parameters:
- startdate: The starting timestamp of type Timestamp.
- days: The number of days to add. Positive values represent the number of days after the start date, and negative values represent the number of days before the start date.
Return Value: The timestamp after adding the number of days, of the type Timestamp.
datediff(timestamp enddate, timestamp startdate)
Purpose: Returns the number of days between two timestamps, for example:
Parameters:
- enddate: The end time
- startdate:The start time
Return Value: The number of days between the end time and the start time, in integer. If the date of the first parameter time is later than that of the second parameter time, a positive value is returned; conversely, if the date of the first parameter time is earlier than that of the second parameter time, a negative value is returned.
extract(unit FROM timestamp), extract(timestamp, string unit)
Purpose: To extract a numerical time domain from a TIMESTAMP value, such as year, month, day, hour, minute, and second/microsecond.
Parameters:
- The time unit unit string can take values such as: year, month, day, hour, minute, second, millisecond.
Return Value: The integer value of the time domain.
For example: Querying the number of payment orders so far by year and month.
SELECT extract(Year from time) AS Year, extract(Month from time) AS Month, COUNT(*) FROM events
WHERE event = 'payOrder'
GROUP BY Year, Month
ORDER BY Year, Month
trunc(timestamp, string unit)
Purpose: To extract a time domain from the given timestamp time stamp.
Parameters:
- unit: Time unit
- SYYYY, YYYY, YEAR, SYEAR, YYY, YY, Y:YEAR
- Q:QUARTERLY
- MONTH, MON, MM, RM: MONTH
- WW, W: The date of the corresponding week's first day.
- DDD, DD, J: DATE
- DAY, DY, D: The date of the corresponding week's first day.
- HH, HH12, HH24: HOUR
- MI: MINUTE
Return Value: The date after the time domain has been extracted.
For example: The number of events that occurred every day in the last 100 days and the number of days elapsed between the event occurrence time and the current date.
SELECT datediff(now(), trunc(time, 'DD')), COUNT(*) FROM events
WHERE date >= CURRENT_DATE() - INTERVAL '100' day
GROUP BY 1
String Functions
concat(string a, string b…)
Purpose: Concatenates all string-type parameters into one string type.
Parameter:
- string (unlimited number): The string to be joined
Return Value: The concatenated string
Example: Querying address of users who were born after 2000, where the address is a concatenation of province and area.
SELECT concat($province, $city) As Address
FROM users
WHERE yearofbirth > 2000
regexp_like(string source, string pattern[, string options])
Purpose: Determines whether the string source contains a content that matches the regular expression pattern
Parameters:
- source: The string to check
- pattern: The regular expression
- option(Optional): Options
- c: Case-sensitive matching
- i: Case-insensitive matching
- m: Multiple matches. The ^ and $ operators will match the start and end of each line instead of the start and end of the string as a whole.
- n:Newline matching: the dot (.) operator will match a newline. Repetition operators like . can match multiple lines in the source string. (Newlines can be skipped using. skip a few lines)
Return Value: Boolean type indicating whether it matches or not
Example: The number of users with QQ mailbox as the email
SELECT COUNT(*) FROM users
WHERE regexp_like(email, '@qq.com$')
parse_url(string urlString, string partToExtract [, string keyToExtract])
Purpose: Extract specified part of the URL to return the truncated value
Parameters:
- urlString: URL
- partToExtract: The part of the URL to truncate. Valid values are 'PROTOCOL', 'HOST', 'PATH', 'REF', 'AUTHORITY', 'FILE', ‘USERINFO', ‘QUERY'
- PROTOCOL: The protocol, such as HTTP, HTTPS, FTP, etc.
- HOST: The host name
- PATH: The path
- REF: The anchor, which is the string after the # in the URL
- AUTHORITY: Authorization
- FILE: File Name
- USERINFO: User Information
- QUERY: Query Parameters, i.e. the string after the URL
- keyToExtract (optional): When partToExtract is 'QUERY', specify the key of query key-value pairs to get the specified parameter value
Return Value: The extracted value of the specified part in the URL
For example: The access distribution of various paths in the page view event of the day
SELECT parse_url(url, 'PATH'), COUNT(*) FROM events
WHERE date = CURRENT_DATE() AND event = '$pageview'
GROUP BY 1
Mathematical Functions
Mathematical functions are used for some numerical operations. In particular, when doing exponentiation, please use the pow() function instead of the exponentiation operator '**'.
pow(double a, double p), power(double a, double p), dpow(double a, double p), fpow(double a, double p)
Usage: Taking exponentiation, for example:
Parameters:
- a: the base number
- b: the exponent
Return Value: a to the power of b
For example: Query the number of users whose total principal and interest after the maturity of financial products exceeds 100,000.
SELECT count(distinct(user_id)) FROM events
WHERE event = 'buyProduct' AND (capital + capital * pow(rateofinterest,duration)) > 100000
round(double a), round(double a, int d), round(decimal a, int_type d), dround(double a), dround(double a, int d)
Usage: Return the rounded value, for example:
Parameters:
- a: the value to be rounded
- d (optional): the number of decimal places to keep. If there is no such parameter, keep up to the integer part.
Return Value: The rounded value
For example: Query the number of users whose yield rate of financial products exceeds 0.45 percentage points.
SELECT count(distinct(user_id)) FROM events
WHERE event = 'buyProduct' AND round((income/capital),4) * 100 > 0.45
truncate(double_or_decimal a[, digits_to_leave]), dtrunc(double_or_decimal a[, digits_to_leave])
Usage: Removing the decimal part of a number, for example:
Parameters:
- a: The value to be truncated
- digits_to_leave (optional): The number of decimal places to keep, if this parameter is not present, it will be rounded to the integer part
Return value: The truncated value
Impala function
Please refer to: Impala Function Reference Document
Custom Impala Functions of Sensors Analytics
percentile_count function
SELECT day,percentile_count(perf_metric_ic_m_copy, 0.3) FROM events
group by 1