mysqlsla Documentation

mysqlsla is a MySQL statement log analyzer. That is, it reads one or more logs produced by MySQL and tells you something useful about the queries. Since you're reading this, you probably already have a reason in mind for needing to analyze a MySQL log. Perhaps you want to know the most frequent queries in the slow log, or which queries in the general log are the slowest to execute. mysqlsla can tell you these kinds of things. Currently, mysqlsla can produce the following analyses:
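For all log types:
--print-queries: all valid queries read from the logs
--frequency: the most frequently occurring queries
--explain: the queries with the largest estimated result sets
--time-each-query: the slowest queries to execute
--time-all-queries: the total execution time of all queries
For slow logs (mimicking mysqldumpslow):
--slow-time, --slow-lock, --slow-rows-exam, --slow-rows-sent: average query time, lock time, rows examined, and rows sent per query
--slow-total-time, --slow-total-lock, --slow-total-rows-exam, --slow-total-rows-sent: the same values as totals per query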
Except for --print-queries and --time-all-queries, each of these analyses outputs the "worst" queries in descending order. For example, the frequency analysis outputs the most frequently occurring queries first. Because mysqlsla is generally used for query optimization, these analyses quickly tell you which queries to optimize first.

mysqlsla has many bells and whistles (between v1.0 and v1.1 the script doubled in size). These features, all discussed below, further refine the analysis results. The most important feature is the --correlate option, which tells mysqlsla to use the queries from one analysis in all subsequent analyses. This lets you answer intuitive questions like how long the most frequent queries take to execute. Without correlation, "how long" and "the most frequent" would be separate analyses whose results would probably not line up, because the most frequent queries are not always the slowest, and vice versa.

Overall, mysqlsla lets you quickly extract useful information from a set of queries too large to examine by hand. This information is most often used to decide which queries to optimize first. The remainder of this documentation covers every option in mysqlsla and the basics of how to use it. For a worked example of how mysqlsla can be used, read the mysqlsla how-to. But first, here are the limitations and planned features of mysqlsla, so that if you were hoping to, for example, analyze a lot of stored procedures, you don't reach the end only to find that mysqlsla can't do what you want.

Limitations

Future Features

A Note About Options

Technically, command line options take the form --option, but -option works too. All options can be abbreviated as long as the abbreviation is unique. For example, --top can be abbreviated --to, but --time-each-query cannot be abbreviated --ti because --ti could also mean --time-all-queries. Some options also have explicit aliases, such as --rr for --rows-read. Where explicit aliases exist they are noted like "--rows-read (--rr)." Two options, --correlate and --order, require their values to be the explicit aliases of analysis options (this is explained in section 'Analysis Options'). Options that accept multiple values, like --general, expect the values to be comma separated.
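The abbreviation rule can be illustrated with a small resolver. This is a minimal Python sketch of the behavior described above, not mysqlsla's own (Perl) option parsing; the option and alias tables are a hypothetical subset chosen for illustration.

```python
# Hypothetical subset of mysqlsla's options and explicit aliases, for illustration only.
OPTIONS = ["top", "time-each-query", "time-all-queries", "rows-read", "frequency"]
ALIASES = {"rr": "rows-read", "te": "time-each-query", "ta": "time-all-queries", "fq": "frequency"}


def resolve(arg: str) -> str:
    """Map an argument such as '--to' or '-rr' to a full option name."""
    name = arg.lstrip("-")  # both --option and -option are accepted
    if name in OPTIONS:
        return name
    if name in ALIASES:  # explicit aliases are matched exactly
        return ALIASES[name]
    matches = [opt for opt in OPTIONS if opt.startswith(name)]
    if len(matches) == 1:  # a unique prefix is a valid abbreviation
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option: {arg}")
    raise ValueError(f"ambiguous option {arg}: could be {', '.join(matches)}")


print(resolve("--to"), resolve("-rr"))
```

Here resolve("--ti") raises ValueError, because both --time-each-query and --time-all-queries start with "ti".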

General Options

--user USER
--password
--host ADDRESS
--port PORT
--socket SOCKET
--no-mycnf
--db (--Database)
--help
These options work like those of most standard MySQL programs. By default mysqlsla reads ~/.my.cnf; --no-mycnf prevents this. --Database can be abbreviated -D (like the mysql CLI).
 
--general LOG
--slow LOG
--raw LOG
One of these options is required. Each option refers to its respective type of log. To specify multiple logs, give a comma-separated list of log files. For example, to have mysqlsla read two general logs: --general file1.log,file2.log. mysqlsla can read any number and combination of logs, so something like this is valid: --general file1.log --slow file2.log,file3.log --raw file4.log. If mysqlsla can't read a log, it prints a warning and continues reading the other logs. At present, all valid queries from all logs are combined; the analyses cannot be run on a per-log basis. General and slow logs are written by MySQL, and mysqlsla should have no problem reading any variation of them. Raw logs are produced by other means, so they must follow a specific format: SQL statements must be semicolon terminated and newline separated. A single SQL statement can span multiple lines, but its last line must end with a semicolon, and the next SQL statement must start on a new line. Blank lines and lines that start with # or -- are skipped.
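If you generate raw logs yourself, the format rules above can be expressed as a small reader to check your files against. This is a Python sketch of the stated rules, not mysqlsla's parser. How mysqlsla treats leading whitespace, or a final statement with no terminating semicolon, is not stated above, so those choices are assumptions.

```python
def read_raw_log(lines):
    """Yield complete SQL statements from the lines of a raw log.

    A statement may span several lines and ends on the line whose last
    character is a semicolon. Blank lines and lines starting with '#'
    or '--' are skipped.
    """
    buffer = []
    for line in lines:
        stripped = line.strip()  # assumption: surrounding whitespace is ignored
        if not stripped or stripped.startswith(("#", "--")):
            continue
        buffer.append(stripped)
        if stripped.endswith(";"):
            yield " ".join(buffer)
            buffer = []
    # Assumption: trailing text without a terminating semicolon is dropped.


sample = """# a comment line
SELECT *
FROM foo;

-- another comment
UPDATE foo SET a = 1;
""".splitlines()

for statement in read_raw_log(sample):
    print(statement)
```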
 
--beautify As of mysqlsla v1.3, all SQL statements are "flattened," which means they are converted to all lowercase. This improves query abstraction, because otherwise "SELECT * FROM foo;" and "select * from foo;" would be treated as different statements (Perl hash keys are case-sensitive). Unfortunately, flattened statements aren't pretty to read, so this option causes mysqlsla to capitalize the most important SQL keywords, like SELECT, FROM, WHERE, ORDER BY, etc.
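Python dictionaries, like Perl hashes, have case-sensitive keys, so the effect of flattening and of --beautify is easy to demonstrate in a short sketch. The keyword list below is a hypothetical subset; mysqlsla's actual keyword list and matching rules are not given here.

```python
import re
from collections import Counter

# Hypothetical keyword subset, for illustration only.
KEYWORDS = ["select", "from", "where", "order by", "group by", "limit"]


def flatten(sql: str) -> str:
    """Lowercase a statement so that case variants share one dictionary key."""
    return sql.lower()


def beautify(sql: str) -> str:
    """Re-capitalize selected keywords in a flattened statement.

    This naive version would also capitalize keywords inside string literals.
    """
    for keyword in KEYWORDS:
        sql = re.sub(rf"\b{keyword}\b", keyword.upper(), sql)
    return sql


# Without flatten(), these two spellings would be counted under separate keys.
counts = Counter(flatten(q) for q in ["SELECT * FROM foo;", "select * from foo;"])
print(counts)
print(beautify("select id from foo where bar = 1 order by id;"))
```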
 
--examples For all analyses except --print-queries and --time-each-query, mysqlsla prints the abstracted form of queries. This option causes mysqlsla to print a random, non-abstracted example of each query instead.
 

Analyses

--print-queries (--pq) Print all valid queries from all logs. This is useful if you want to see what mysqlsla actually read from the logs. This option is not affected by --top and is not a valid option to --correlate.
 
--frequency (--fq) Print the most frequently occurring queries in descending order. The frequency of each query is listed by count and percentage of all queries. This option is affected by --top and is a valid option to --correlate.
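The frequency analysis amounts to counting identical (abstracted) queries and reporting each count as a share of the total. This minimal Python sketch assumes the queries have already been read and abstracted; the sample list is made up, and the top parameter only mimics --top.

```python
from collections import Counter


def frequency(queries, top=None):
    """Return (query, count, percent_of_all) tuples, most frequent first.

    `top` limits the output like --top; None lists every query.
    """
    counts = Counter(queries)
    total = sum(counts.values())
    return [(query, n, 100.0 * n / total) for query, n in counts.most_common(top)]


log = ["select * from foo", "select * from bar", "select * from foo", "select * from foo"]
for query, n, pct in frequency(log):
    print(f"{n:>3} {pct:5.1f}%  {query}")
```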
 
--explain (--ex) Print the queries with the largest potential result set of rows in descending order. A query's potential result set is the product of the row counts that EXPLAIN gives for all of its tables. Because EXPLAIN's row counts are based on table statistics, this is only an estimate. ANALYZE TABLE updates a table's statistics. A query with a large result set often indicates improper or absent indexes. This option is affected by --rows-read and --top and is a valid option to --correlate.
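The rows-produced estimate is a plain product, so the ranking can be sketched as follows. The EXPLAIN row values here are hypothetical, and the sketch covers only the default rows-produced calculation; how --rows-read computes rows read is not described here, so it is not modeled.

```python
from math import prod


def rows_produced(explain_rows):
    """Estimated result-set size: the product of EXPLAIN's 'rows' value for every table."""
    return prod(explain_rows)


def explain_analysis(plans, top=None):
    """plans maps each query to its list of EXPLAIN 'rows' values.

    Returns (query, estimate) pairs, largest estimate first.
    """
    scored = [(query, rows_produced(rows)) for query, rows in plans.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# Hypothetical EXPLAIN 'rows' values: a three-table join and a single-table scan.
plans = {
    "select ... from a join b join c": [1000, 20, 1],
    "select ... from big": [50000],
}
print(explain_analysis(plans))
```

The join estimate is 1000 × 20 × 1 = 20000 rows, so the 50000-row single-table scan ranks first.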
 
--time-each-query (--te) Time how long each query takes to execute and print the queries in descending order of execution time. No status indicator is given, so if a query takes a really long time, mysqlsla might appear to be frozen, but it's really just waiting for the query to finish. This option is affected by --avg and --top and is a valid option to --correlate.
 
--time-all-queries (--ta) Time how long it takes all queries to execute. This analysis has only one result: the average total execution time of all queries. This option is affected by --avg and --percent: the current time run (or percentage complete of all time runs) is printed while the queries are being timed. The resulting average is an average of all the time runs. This option is not affected by --top and is not a valid option to --correlate.
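The two timing analyses, together with --avg and --percent, can be sketched in Python. A caller-supplied execute function stands in for a real database connection (an assumption; mysqlsla sends the queries to the MySQL server itself). For --time-each-query, the text does not say whether mysqlsla repeats each query back to back or repeats the whole set once per time run, so the loop structure below is also an assumption.

```python
import time


def time_each_query(queries, execute, runs=1):
    """--time-each-query sketch: average each query's time over `runs`, slowest first."""
    results = []
    for query in queries:
        total = 0.0
        for _ in range(runs):
            start = time.perf_counter()
            execute(query)
            total += time.perf_counter() - start
        results.append((query, total / runs))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


def time_all_queries(queries, execute, runs=1, percent=False):
    """--time-all-queries sketch: average total time of the whole set over `runs`."""
    percent = percent or runs >= 20  # --percent is implied for 20 or more runs
    marks = [25, 50, 75]
    totals = []
    for run in range(1, runs + 1):
        if not percent:
            print(f"time run {run}")  # report the current time run
        start = time.perf_counter()
        for query in queries:
            execute(query)
        totals.append(time.perf_counter() - start)
        done = 100 * run / runs
        while percent and marks and done >= marks[0]:
            print(f"{marks.pop(0)}%")  # report percentage complete
    return sum(totals) / runs


def fake_execute(query):
    """Hypothetical stand-in for running a statement; 'slow' sleeps for 20 ms."""
    time.sleep(0.02 if query == "slow" else 0)


print(time_each_query(["fast", "slow"], fake_execute, runs=2))
print(time_all_queries(["fast", "slow"], fake_execute, runs=4))
```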
 
--slow-time (--st, --at)
--slow-lock (--sl, --al)
--slow-rows-exam (--sre, --ar)
--slow-rows-sent (--srs)
These analyses work only with slow logs and they directly mimic the sort (-s) options of mysqldumpslow (except --slow-rows-sent). If used with the --mysqldumpslow (--mds) option, the output is nearly identical to mysqldumpslow. Missing here is the mysqldumpslow sort option 'c' (count); that option is analogous to mysqlsla analysis --frequency (--fq). For all these analyses, the values are averages for each unique query, printed in descending order. These analyses are affected by --top and are valid options to --correlate (as st, sl, sre, and srs, not at, al, ar).
 
--slow-total-time (--stt, --t)
--slow-total-lock (--stl, --l)
--slow-total-rows-exam (--stre, --r)
--slow-total-rows-sent (--strs)
These analyses work only with slow logs and they directly mimic the sort (-s) options of mysqldumpslow (except --slow-total-rows-sent). If used with the --mysqldumpslow (--mds) option, the output is nearly identical to mysqldumpslow. For all these analyses, the values are totals for each unique query, printed in descending order. These analyses are affected by --top and are valid options to --correlate (as stt, stl, stre, strs, not t, l, r).
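Both groups of slow-log analyses reduce to per-query averages and totals. This Python sketch assumes the slow log has already been parsed into (query, Query_time, Lock_time, Rows_examined, Rows_sent) tuples. The parsing step, the sample values, and the rank helper are hypothetical.

```python
from collections import defaultdict

FIELDS = ("time", "lock", "rows_exam", "rows_sent")


def slow_stats(entries):
    """entries: (query, query_time, lock_time, rows_examined, rows_sent) tuples.

    Returns {query: {"count": n, "total": {...}, "avg": {...}}}.
    """
    grouped = defaultdict(list)
    for query, *values in entries:
        grouped[query].append(values)
    stats = {}
    for query, rows in grouped.items():
        totals = {field: sum(r[i] for r in rows) for i, field in enumerate(FIELDS)}
        stats[query] = {
            "count": len(rows),
            "total": totals,
            "avg": {field: totals[field] / len(rows) for field in FIELDS},
        }
    return stats


def rank(stats, kind, field, top=None):
    """Order queries by stats[query][kind][field], largest first.

    ("avg", "time") corresponds to --slow-time; ("total", "time") to --slow-total-time.
    """
    ordered = sorted(stats, key=lambda q: stats[q][kind][field], reverse=True)
    return ordered[:top]


# Hypothetical parsed slow-log entries.
entries = [
    ("select * from t where id = N", 2.0, 0.0, 100, 1),
    ("select * from t where id = N", 4.0, 1.0, 300, 1),
    ("update t set a = N", 1.0, 0.2, 50, 0),
]
stats = slow_stats(entries)
print(rank(stats, "avg", "time"))
print(rank(stats, "total", "rows_exam"))
```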
 

Analysis Options

--order A The default order in which the analyses are run and printed is: pq, fq, ex, te, ta. A new order can be specified with this option. The option's value A is a comma-separated list of analysis aliases. For example, to reverse the default order: --order ta,te,ex,fq,pq. If an analysis is not specified in the new order, it is appended according to its place in the default order; however, this unspecified analysis is not run unless explicitly specified by its own option. For example, the following runs only analyses fq and ex: --order fq,ex. By contrast, the following runs fq, ex, then pq, te, because pq and te are not in the order but are explicitly specified, and in the default order pq comes before te: --order fq,ex --pq --te. (Internally mysqlsla makes the order fq,ex,pq,te,ta.) When an analysis is given in --order, its own option is implied, so the following is redundant: --order fq,ex --fq --ex. Specifying an analysis more than once in the order works but is not supported. For example, the following causes 3 frequency analyses followed by 2 explain analyses: --order fq,fq,fq,ex,ex.
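The ordering rules above can be sketched in Python for the five analyses in the default order. The sketch reproduces the documented examples but is not mysqlsla's code; how the slow-log analyses fit into the default order is not covered.

```python
DEFAULT_ORDER = ["pq", "fq", "ex", "te", "ta"]


def resolve_order(order, enabled):
    """order: aliases given to --order; enabled: aliases turned on by their own options.

    Returns (full_order, analyses_to_run).
    """
    # Analyses missing from --order are appended in their default positions.
    full = list(order) + [a for a in DEFAULT_ORDER if a not in order]
    # Naming an analysis in --order implies its option; others run only if enabled.
    wanted = set(order) | set(enabled)
    return full, [a for a in full if a in wanted]


print(resolve_order(["fq", "ex"], []))            # --order fq,ex
print(resolve_order(["fq", "ex"], ["pq", "te"]))  # --order fq,ex --pq --te
```

The second call yields the internal order fq,ex,pq,te,ta and runs fq, ex, pq, te, matching the example above. Repeated aliases are kept as given, so fq,fq,fq,ex,ex runs three frequency analyses and two explain analyses.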
 
--correlate A By default mysqlsla runs each analysis on the queries independently, so each analysis usually lists the queries in a different order. For example, query X can be the most frequent, query Y can produce the most rows in its result set, and query Z can be the slowest. To correlate analyses means to use the results of one analysis in subsequent analyses, which preserves the order of queries across those analyses. For example, if the analyses are correlated to the frequency analysis, then with the default order of analyses, subsequent analyses ex and te use the results of fq. So if analysis fq results in queries X, Y, Z, analysis ex lists results for queries X, Y, Z even though Y has the largest result set, and analysis te also lists results for queries X, Y, Z even though Z has the longest execution time. Correlating results makes it easy to answer questions like "how long do the top 3 most frequent queries take to execute?" and "how frequently do the top 3 slowest queries occur?" These questions can be answered without correlation, but only by hunting for the same query in each analysis, since each analysis lists the queries in a different order. With correlation, the Nth query in the correlating analysis is the Nth query in every subsequent analysis.
This option's value A is the alias of the correlating analysis: fq, ex, te, or one of the slow-log analyses (st, sl, sre, srs, stt, stl, stre, strs). If --correlate is specified but the analysis is not explicitly specified by its own option or in --order, the analysis is implied and placed first in the order, overriding --order. If the analysis is specified in --order, mysqlsla follows --order. For example, the following correlates on ex and the order is ex, fq: --correlate ex --order fq. But the following correlates on ex and the order is fq, ex: --correlate ex --order fq,ex. The order is important because only subsequent analyses are correlated. If the correlating analysis is last, nothing correlates to it. For example, the following runs analyses ex and te independently, then fq: --correlate fq --order ex,te,fq.
In general, an analysis cannot correlate to another analysis that hasn't run yet.
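The X, Y, Z example above can be reproduced with a short Python sketch. Each analysis is modeled as a hypothetical function that scores a query (higher is "worse"). Before the correlating analysis runs, every analysis ranks the queries by its own score. Afterwards, later analyses reuse the correlating analysis's query list and order but still report their own scores. This illustrates the described behavior and is not mysqlsla's implementation.

```python
def run_analyses(order, metrics, queries, correlate=None, top=None):
    """Run analyses in `order`; metrics maps an alias to a function query -> score.

    Returns {alias: [(query, score), ...]}.
    """
    report = {}
    correlated = None  # query list fixed by the correlating analysis once it has run
    for alias in order:
        score = metrics[alias]
        if correlated is None:
            ranked = sorted(queries, key=score, reverse=True)[:top]
            if alias == correlate:
                correlated = ranked
        else:
            ranked = correlated  # same queries, same order as the correlating analysis
        report[alias] = [(query, score(query)) for query in ranked]
    return report


# The example from the text: X is most frequent, Y has the most rows, Z is slowest.
freq = {"X": 3, "Y": 2, "Z": 1}
rows = {"X": 10, "Y": 100, "Z": 1}
secs = {"X": 1.0, "Y": 2.0, "Z": 5.0}
metrics = {"fq": freq.get, "ex": rows.get, "te": secs.get}

independent = run_analyses(["fq", "ex", "te"], metrics, ["X", "Y", "Z"])
linked = run_analyses(["fq", "ex", "te"], metrics, ["X", "Y", "Z"], correlate="fq")
print(independent["ex"])  # ordered by rows
print(linked["ex"])       # ordered like fq, still showing each query's rows
```

Running with the order ex,te,fq and correlating on fq leaves ex and te uncorrelated, because nothing runs after fq.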
 
--hide This option takes a comma-separated list of analysis aliases (fq, te, st, etc.) and suppresses the output of those analyses while still running them (provided each analysis is enabled by its own option, --order, or --correlate). This is useful for showing only the final analysis. For example, to show only the execution time of the top 3 most frequent queries: --correlate fq --order fq,te --hide fq --top 3.
 
--percent For --time-each-query and --time-all-queries, mysqlsla by default makes 1 time run. If more time runs are specified with --avg, this option reports progress through the time runs as a percentage complete. It affects only --time-all-queries (--time-each-query doesn't indicate which time run it's currently on). The percentage complete is listed as 25%, 50%, 75%. This option is implicitly invoked if 20 or more time runs are specified with --avg.
 
--rows-read (--rr) By default --explain calculates rows produced; this option makes it calculate rows read. Read the article JOIN Rows Produced vs. Rows Read for more information about this distinction.
 
--avg N (--n) For --time-each-query and --time-all-queries mysqlsla by default makes 1 time run. This option causes mysqlsla to make N time runs and average the results. If N is 20 or more, --percent is automatically invoked.
 
--flush-qc This option causes mysqlsla to execute FLUSH QUERY CACHE before starting the analyses.
 
--top N By default mysqlsla lists all queries for every analysis (except --time-all-queries, which has a single result). This option limits the output to the top N queries.
 
--filter S By default mysqlsla allows the following SQL statements: DELETE, DO, INSERT, REPLACE, SELECT, TRUNCATE, UPDATE, USE, CALL, SHOW, ROLLBACK, COMMIT. It discards SET and START. This is new as of v1.2; previous versions only allowed SELECT and USE. Any SQL statement that is not recognized (whether filtered or not) is discarded, to prevent reading junk statements. The default filter can be changed with this option. The parameter S is a comma-separated list of the SQL statements above, or *, each preceded by + to allow the statement or - to discard it. For example, to allow only INSERT and UPDATE: --filter -*,+INSERT,+UPDATE. To discard everything: --filter -*. To discard CALL and allow SET: --filter -CALL,+SET. To discard DELETE and DO and allow SET: --filter -DELETE,-DO,+SET.
 
--safe This overrides any --filter with: -*,+SELECT,+USE. This option can be used to analyze an unknown log safely because it will discard any statements that may modify any database or table.
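How a filter specification changes the set of allowed statement types can be sketched in Python. The text above does not say how items are evaluated, so this sketch assumes they are applied left to right and that a statement's type is its first keyword. The same function shows the effect of --safe.

```python
import re

KNOWN = ["DELETE", "DO", "INSERT", "REPLACE", "SELECT", "TRUNCATE", "UPDATE",
         "USE", "CALL", "SHOW", "ROLLBACK", "COMMIT", "SET", "START"]
DEFAULT_ALLOWED = set(KNOWN) - {"SET", "START"}


def apply_filter(spec, allowed=DEFAULT_ALLOWED):
    """Apply a --filter value such as '-*,+INSERT,+UPDATE' to a set of allowed types.

    Items are applied left to right (an assumption of this sketch).
    """
    result = set(allowed)  # copy, so the default set is never modified
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        sign, name = item[0], item[1:].upper()
        targets = set(KNOWN) if name == "*" else {name}
        if sign == "+":
            result |= targets
        elif sign == "-":
            result -= targets
        else:
            raise ValueError(f"filter item must start with + or -: {item!r}")
    return result


def keep(statement, allowed):
    """Keep a statement only if its leading keyword is a recognized, allowed type."""
    match = re.match(r"\s*([a-z]+)", statement, re.IGNORECASE)
    keyword = match.group(1).upper() if match else ""
    return keyword in KNOWN and keyword in allowed


SAFE = apply_filter("-*,+SELECT,+USE")  # the filter --safe imposes
print(sorted(SAFE))
print(keep("select * from foo;", SAFE), keep("delete from foo;", SAFE))
```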
 
--grep P This option causes mysqlsla to keep only statements that match the Perl regular expression pattern P (case insensitive). It is applied after statement filtering. Since P is put directly into a pattern match (m/P/io), you may need to escape some special characters in the pattern, such as parentheses.
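A candidate pattern can be tried out before passing it to --grep. Python's re syntax is close to, but not identical to, Perl's, so this sketch only approximates an unanchored, case-insensitive m/P/i match. The sample statements are made up.

```python
import re


def grep(statements, pattern):
    """Keep statements containing a match for `pattern`, ignoring case (like m/P/i)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [s for s in statements if regex.search(s)]


statements = ["SELECT NOW()", "select * from users where id = 1", "update orders set a = 1"]
print(grep(statements, r"from\s+users"))
print(grep(statements, r"now\(\)"))  # parentheses escaped to match them literally
```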
 
--mysqldumpslow (--mds) The --slow-* analyses use a mysqlsla format by default. This option changes the format of the --slow-* analyses and --frequency to look like mysqldumpslow.
 

What To Do About Bugs and Errors

I suspect there are issues to be worked out with other variants of MySQL general and slow logs. I know, for example, that MySQL formats the general log slightly differently between major versions. If mysqlsla breaks, send me a message with the error, and expect that I will ask for the portion of your log file that mysqlsla is having problems with.

(Doc rev: Sep 9 2006)