Spreadsheets: Excel, Google Spreadsheet, Numbers

functions and operators: base conversion | date and time | financial | information | logic | matrix | numeric | reference | relational operators | statistical | distributions | tests | linear regression | text | trigonometric

other topics: cell references | cell ranges | copying | errors | importing and exporting | sorting | filtering | joining | aggregation | charts | linear optimization | user defined functions | notes and presentation

implementations: microsoft excel | google spreadsheet | apple numbers

base conversion
usage description
BIN2DEC(number)
BIN2HEX(number)
BIN2OCT(number)
DEC2BIN(number)
DEC2HEX(number)
DEC2OCT(number)
HEX2BIN(number) The hexidecimal number must be provided as a string and be double quoted.

The letters {A, …, F} or {a, …, f} can be used for digits 10 through 15.
HEX2DEC(number)
HEX2OCT(number)
OCT2BIN(number)
OCT2DEC(number)
OCT2HEX(number)
date and time
usage description
DATE(year, month, day) Convert integers year, month, day to a date value.

month must be in {1,…,12} and day must be in {1,…,31}.

The VALUE function converts a date to the number of days since December 31, 1899, which is how dates are represented internally.

Numbers, when added or subtracted from dates, are taken to represent days. Subtracting two dates yields the difference between them in days.
DATEVALUE(date_text) Converts text representation of a date to a date value.
DAY(serial_number) Extracts day of month as integer {1,…,31} from date.

The argument can be a date. "Serial number" is Excel terminology for a number representing days since December 31, 1899.
HOUR(serial_number) Extract hour as {0, …, 23} from date.
MINUTE(serial_number) Extract minute as {0, …, 59} from date.
MONTH(serial_number) Extract month as {1,…,12} from date.
NOW() Return current date and time.
SECOND(serial_number) Extract second as {0,…,59} from date.

Do any spreadsheets handle leap seconds?
TIME(hour, minute, second) Construct a time value.

Time values are represented as a fraction of a day.
TODAY() Return current date.
WEEKDAY(serial_number) Extract day of week as {1, …, 7} where 1 represents Sunday and 7 represents Saturday.
YEAR(serial_number) Extracts 4 digit year from date.
financial
usage description
DB(cost, salvage, life, period)
FV(rate, nper, pmt) Present value of an annuity with num_periods payments.

The first payment is not discounted at all. The sign of the value will be opposite the sign of the payments.
IRR(values)
NPV(rate, value1, [value2], …)
PMT(rate, nper, pv)
PV(rate, nper, pmt)
information
usage description
ISBLANK(value) Usually the argument is a cell reference, and the function is used to determine if the cell is empty.

Returns false if the value in the cell is 0, FALSE(), or text consisting of a space character. Returns false if the argument is an empty string instead of a cell reference.
ISERROR(value)
logical
usage description
AND(logical1, [logical2], …)
FALSE
FALSE()
In a numeric context FALSE is treated as zero.
IF(logical, if_true, [if_false])
NOT(logical)
OR(logical1, [logical2], …)
TRUE
TRUE()
In a numeric context TRUE is treated as one.
matrix
usage description
scalar + array
array + scalar
Add scalar to each element of array.
scalar - array
array - scalar
scalar * array
array * scalar
scalar / array
array / scalar
scalar ^ array
array ^ scalar
MDETERM(array)
MINVERSE(array)
MMULT(array, array)
SUMPRODUCT(array, …) Multiply corresponding elements of arrays which are the same shape, then sum them.
TRANSPOSE(array)
numeric
usage description
number + number
number - number Can also be used as unary operator.
number * number
number / number Performs floating point division.
MOD(number, divisor)
base ^ exponent
POWER(base, exponent)
ABS(number)
CEILING(number[, significance]) Smallest value which is greater than number and a multiple of significance. Default significance is 1.
EXP(number)
FACT(number) Factorial of n.
FLOOR(number[, significance])
LN(number)
LOG10(number)
PERMUT(n, k) The number of ways to choose k objects from n objects when order is significant: n! / (n-k)!
PI() Returns π.
QUOTIENT(numerator, denominator)
RAND() Random number between 0.0 and 1.0.
RANDBETWEEN(bottom, top) Random integer between bottom and top, inclusive.
ROUND(number, num_digits)
SIGN(number) The sign of number: -1, 0, or 1.
SQRT(number)
TRUNC(number, num_digits)
reference
usage description
ADDRESS(row_num, column_num) Return reference for row number and column number.
COLUMN Returns the column of a cell reference (e.g. A3) as a number.
COLUMNS Returns the number of columns in a range.
HLOOKUP Like VLOOKUP but uses the top row as the index.
INDIRECT(ref_text) Convert string, e.g. "A1" to a reference.
MATCH
OFFSET
ROW
ROWS
VLOOKUP(lookup_value, table_array, col_index_num, range_match) Perform a lookup in a table using the leftmost column as the index. The return value will be the same as lookup_value when tuple_index is 1.

Set range_match to FALSE to get get exact matches only. Otherwise the lookup returns the column value whose record has the greatest value in the join column which is less than or equal to the lookup value.
relational operators
usage description
expr1 < expr2 Comparisons can be performed on numbers, text, dates, and booleans.

When comparisons are performed between types, dates are treated as days since December 31, 1899, and TRUE and FALSE are treated as 1 and 0. Text values are always greater than number values.
expr1 <= expr2
expr1 = expr2
expr1 <> expr2
expr1 > expr2
expr1 >= expr2
statistical
usage description
AVERAGE(number1, [number2], …) Average of arguments, which can be ranges. Non-numeric values are excluded.
CORREL
COUNT Counts number of cells containing numeric values.
COUNTIF
COVAR
FREQUENCY
GEOMEAN The geometric mean.
HARMEAN The harmonic mean.
KURT
LARGE(array, k) Select the kth largest element from a range.

When k is one, this is identical to MAX.
MAX(number1, [number2, …) Select largest element from a one or more values or ranges.

Non-numeric values in ranges are ignored.
MEDIAN(number1, [number2, …)
MIN(number1, [number2, …) Select smallest element from a one or more values or ranges.

Non-numeric values in ranges are ignored.
MODE
PERCENTILE
PERCENTRANK
PROB
PRODUCT
QUARTILE
RANK
SKEW
SMALL(array, k) Select the kth smallest element from a range.

When k is one, this is identical to MIN.
STDEV
STDEVP
SUM
SUMIF
VAR
VARP
statistical: distributions
usage description
BETA.DIST The .DIST and .INV functions are Excel only
BETA.INV
BINOM.DIST
BINOM.INV
CHISQ.DIST
CHISQ.INV
EXPON.DIST
F.DIST
F.INV
GAMMA.DIST
GAMMA.INV
NORM.DIST
NORM.INV
POISSON.DIST
T.DIST
T.INV
statistical: tests
usage description
CHISQ.TEST(actual, expected) The Chi-squared test is used to determine whether a sample was drawn from a multinomial distribtuion.

Actual and expected are ranges of k cells which both sum up to n.

The return value is a probability, with a higher value indicating that it is more likely the sample comes from the hypothesized distribution.
CONFIDENCE.NORM
CONFIDENCE.T
FISHER
FISHERINV
STANDARDIZE
T.TEST
Z.TEST(array, x, [sigma])
statitistical: linear regression
usage description
FORECAST
GROWTH
INTERCEPT
SLOPE
STEYX
TREND
text
usage description
text1 & text2 Concatenates text1 and text2.
CHAR(num) Return the character for a num in the range {1, …, 255}. An 8-bit encoding such as Win-1252, Mac OS Roman, or ISO 8859-1 is used.
CODE(text) Return ASCII code for first character in text.
CONCATENATE(text1, [text2], ) Concatenate text1, text2, …
FIND(needle, haystack, [start]) Return index of needle in haystack. If needle is at the start of haystack 1 is returned.

Optionally begin search at start.

#VALUE! if needle is not found.
LEFT(text, num) Extract num leftmost characters.
LEN(text) Length of text in characters.
LOWER(text) Return text in lowercase.
MID(text,start, num) Extract substring.
PROPER(text) Return string in title case.
REPLACE(old, start, num, new) Replace part of old with new. Num characters from start are deleted, and new takes their place.
REPT(text, num) Return text repeated num times.
RIGHT(text, num_chars)
SEARCH(haystack, needle, [start]) Case-insensitive search for needle in haystack. Returns 1-based index of first substrig, otherwise #VALUE!. Optionally begin search at index start.
SUBSTITUTE(text, old, new[, num]) Replace first num occurrences of old with new. If num is not specified, replace all occurrences.
TEXT(value, format) Convert value to text. Format governs the conversion.

TEXT(1234, "000000"): zero pad integer on left
TEXT(42922, "MM/DD/YYYY"): format serial date
TRIM(text)
UNICHAR()
UNICODE()
UPPER(text) Return text in uppercase.
VALUE(text) Convert text to numeric value, or #VALUE! if conversion fails.
trigonometric
usage description
ACOS(number) Inverse trigonometric functions return a value in radians.
ASIN(number)
ATAN(number)
ATAN2(x_num, y_num)
COS(number) Trigonometric functions expect an value in radians.
DEGREES(angle) Convert radians to degrees.
RADIANS(angle)
SIN(number)
TAN(number)
_________________________________ ___________________________________________________________________________________________________

Cell References

rows and columns

Excel, Numbers, and Google Spreadsheet number rows from one: 1, 2, 3, ….

Columns are given one, two, and three letter names A, B, C, …, Y, Z, AA, AB, AC, …, ZY, ZZ, AAA, AAB, AAC, … The column naming scheme imposes a limit of $26 + 26^2 + 26^3 = 18,278$ columns on a spreadsheet.

Since Numbers supports more than 18278 columns it presumably also uses column names of the form AAAA, AAAB, ….

cells

A cell is referenced by its column and row: e.g. B7. When the contents of a cell are copied to a new cell, any cell references it contains are modified by the offset between the target and source cell. This can be prevent with the dollar sign: $B7 to prevent the column from being changed, B$7 to prevent the row from being changed, and $B$7 to prevent either from being changed.

multiple sheets

If an Excel workbook has two sheets: Sheet1 and Sheet2, then a cell in Sheet2 can refer to a cell in Sheet1 with the notation

Sheet1!A1

Google Spreadsheets uses the same notation as Excel.

A Numbers sheet can have more than one table in it. The notation for referring to a cell from Sheet2 is thus:

Sheet1::Table1::A1

spreadsheet size limits

excel (2011) google spreadsheet numbers
cells 1,048,576 * 16,384 200,000 none
rows 1,048,576 200,000 / 256 none
columns 16,384 256 none

the dependency graph

If a cell contains a reference to another cell, then it depends on that cell. If the other cell is updated, then the dependent cell is immediately updated as well. If the user introduces a cycle in the dependency graph then it is not possible for the spreadsheet to calculate the values of the cells in the cycle. The simplest example of a cycle would be to set A0 to =B0 and B0 to =A0.

When Excel, Numbers, and Google Spreadsheet detect a cycle, they will set all the cells in the cycle to an error value.

TODO: how to transpose a sheet.

TODO: sheet organization

Cell Ranges

Rectangular regions of cells can be used as arguments to functions and operands to operators. A rectangular region of cells can also be the output of a single formula.

input:

A portion of a column can be specified with the range notation B1:B7. A portion of a row can be specified with B1:F1. An arbitrary rectangular region of cells can be specified with B1:F7.

named cell range:

A cell range can be given a name. In Excel:

Insert | Name | Define...

In Google Spreadsheet:

Data | Named Ranges...

Cell range names must start with a letter and can contain letters and numbers. It is not permitted to created a cell range name which looks like a cell reference: e.g. A1, B2, …

output:

Excel and Google Spreadsheet support formulas which output a range of cells.

In Excel, one selects the range of cells where the output will go, types in the formula, and uses Ctrl+Shift+Enter to enter it. Failure to use Ctrl+Shift+Enter results in a #VALUE! error. The shape of the selected range of cells must match the size of the output matrix. If it is too small, some values will be lost. If it is too big, the extra values will have #N/A values written to them.

In Google Spreadsheet, one selects the upper left corner of the range of cells that the output will go and then types in the formula. Typing Enter is sufficient for entering the formula. Google Spreadsheet fills in cells by expanding down or to the right.

If you inspect the output cells in Excel, you will see that all of them have the formula in them. However, in the Google Spreadsheet output, only the upper left cell has the formula, and all the other cells contain the value that was computed for that cell.

Copying

Data or formulas can be copied from one cell to another in three ways:

  • using the copy (or cut) and paste short cuts
  • dragging the the lower right corner of the cell to be copied. This is called auto fill.
  • double clicking the lower right corner of the cell to be copied. This is also called auto fill.

There are keyboard shortcuts for Copy, Cut, and Paste. These are Ctrl+C, Ctrl+X, and Ctrl+V on Windows and ⌘C, ⌘X, ⌘V on Mac. These functions can also be performed by selecting from the Edit menu.

When a formula containing cell references is copied, the column and row of the cell reference is incremented by the difference between the source and the target unless dollar signs are used in the cell reference.

If more than one cell was copied or cut, then the same number of cells will have values pasted into them. The selected cell when

Errors

When a cell contains a formula that results in an error, one of the error values below is put in the cell.

When error values are used as values in other formulas they generally propagate.

Error values can also be created by simply typing the value into a cell.

excel

error condition
#NULL!
#DIV/0! division by zero
#VALUE! function didn't like one of its arguments
#REF!
#NAME? unrecognized function or constant name
#NUM! numeric overflow
#N/A error created by NA()

google spreadsheet

error condition
#ERROR! parse error
#NULL!
#DIV/0!
#VALUE!
#REF!
#NAME?
#NUM!
#N/A

Importing and Exporting

Cutting and pasting data from an HTML table to Excel often works correctly. There will be problems if cells contain newlines, however.

In general, cutting and pasting TSV text (tab and newline delimited) into Excel works. It does not seem to work in Numbers.

A good way to export data from Excel is to use Save As… (⇧⌘S) and set the format the UTF-16 Unicode Text .txt. This exports the data in a tab delimited format. Unicode characters are handled correctly. iconv can be used to convert the file to UTF-8. The various "Windows formatted" and "MS-DOS formatted" .txt and .csv options all use 8-bit encodings. Since Excel spreadsheets can contain Unicode data they should be avoided.

Sorting

excel

Select the region that is to be sorted. If you use ⌘A when a filled cell is selected, then the largest possible rectangular region of cells with filled data will be selected. If an empty cell is selected, then ⌘A causes the entire sheet to be selected.

Then select the menu "Data | Sort…" and select the column to use for sorting from the dialog box.

Pay attention to the "my list has headers" check box.

google spreadsheet

Select the label of the column to sort by. Then pull down the "Data" menu and and choose whether to sort the sheet A->Z or Z->A.

numbers

To sort the rows by a column, right click on the column label and select "Sort Ascending" or "Sort Descending".

Filtering

Before filtering, make sure the sheet has a header.

In Excel or Google Spreadsheet, select the data range and then select

Data | Filter

The column header will acquire a drop down menu that one can use to select the values to be displayed.

If the number of distinct values is small, they will be listed and one can use checkboxes to select the desired values.

Excel also permits building expressions using relational and logical operators.

In Numbers, right click the column header and select Show More Options. One will get a pop-up window which can perform sorting, filtering, and aggregation.

Joining

The VLOOKUP function can be used to perform relation join between data in two sheets. We suppose that the data is oriented so that each row corresponds to a record. If it were columns that corresponded to records, one would use the HLOOKUP function.

Imagine that we had a sheet with numbers and their English names:

english
1 one
2 two
3 three
5 five

In another sheet we have numbers with their Spanish names:

spanish
1 uno
2 dos
3 tres
4 quatro

In a third sheet we can join the two sheets on the first column by putting the following in the top row and then copying the values down the columns:

=english!A1 =VLOOKUP(A1, english!$A$1:$B$4, 2, FALSE) =VLOOKUP(A1, spanish!$A$1:$B$4, 2, FALSE)
1 one uno
2 two dos
3 three tres
5 five #N/A

The fourth argument of the VLOOKUP function indicates that an exact match should be used. If TRUE, and there are no exact matches, the record with the highest value less than the lookup value will be used.

To use VLOOKUP, the join column must be first in the sheets being joined. The join columns do not need to be sorted. Be careful of leading or trailing whitespace in strings. Also the type (numeric or text) must match.

By copying the join column from the first sheet, we get a left outer join. If we had copied the join column from the second sheet, we would get a right outer join.

To get an inner join one must manually find the inner join of the two join columns. One technique is to copy the first column and do a left outer join. Sort the result by a column from the second sheet and then delete all rows which have an #N/A in that value.

To get an outer join, copy both join columns into the join sheet, one below the other. Then use Data | Advanced Filter… to and select the "Unique records only" checkbox.

Aggregation

excel:

Select the region containing the data and:

Data | Summarize with PivotTable...

The first pop-up of the wizard looks like this:

pivot-table-wizard1.png

The second pop-up looks like this. Drag the fields which define the "grain" of the aggregation into the "Rows" box. Drag the fields which are going to be summed into the "Values" box.

pivot-table-wizard2.png

Summation is not the only aggregation function which is available. Right click on a field in the "Values" box and select "Field Settings" to change the aggregation function.

If two fields are dropped into the "Rows" box, then the data is summarized in a hierarchical manner by the first field and the combination of the first and second field.

It is also possible to the second field defining the "grain" of the aggregation into the "Columns" box. This can be nice when there is a single field in the "Values" box and putting both "grain" fields in the "Rows" box would create too many rows. Also, one gets summaries by each "grain" field in isolation.

google spreadsheet:

Data | Pivot table report...

numbers:

Right click the column header and select Categorize By This Column.

Charts

To create a chart, highlight the data region and select

Insert | Chart...

The chart is a view on the highlighted data, so if the data is changed, the chart will update itself to reflect the change.

The basic chart types are

  • column
  • bar
  • pie
  • line
  • area
  • scatter

column and bar charts

Column and bar charts are the same chart, except that bar charts are rotated clockwise 90°. The bar chart orientation might be useful for a chart that otherwise would be too wide to fit on a page. When a column chart is converted to a bar chart, the x-axis labels become the y-axis labels. Long text will read better on the y-axis than on the x-axis.

Column charts can be used for multivariate data, which is a range of data with more than one column. The data in each row can be stacked on top of each other or displayed side by side in clusters which are separated from other clusters by space.

If the columns and the rows have headers, these can be included in the selected range and they will be used to create labels. Row labels go on the x-axis of a column chart and the y-axis of a bar chart. Column labels go into a legend. Spreadsheets automatically detect headers if they contain text data, but one can also explicitly tell the spreadsheet to treat the first row or first column as headers.

Excel and Google Spreadsheet also have an option to make the chart treat rows as columns and columns as rows.

pie charts

A pie chart shows a univariate data set as percentages of their sum. It is a somewhat weak chart, since the data could easily be presented in a table, and experiments show that people are poor at estimating percentages from a pie chart.

Pie charts put column headers in the legend and ignore row headers.

Excel has a variant of the pie chart called the donut chart which can be used for multivariate data. Each column is represented by a concentric ring.

line charts

When the row labels are numeric, line charts are more appropriate that column charts. If no header column is selected, the labels are {1, 2, 3, … }.

Line charts can also be used when the labels are date values, in which case the chart is a time series chart. The underlying representation of dates, i.e. days since December 31, 1899, is used.

Line charts don't seem to require that the row header be sorted. If it isn't, the data points will be plotted equidistant from each other in row order.

Column headers are put in a legend.

Line charts can be multivariate; column becomes a line in a different color.

area charts

Area charts are line charts with the area below each line filled in with a solid color.

Stack area charts are useful if the sum of each row is a meaningful quantity. For example if the columns are sales by company, then an stacked area chart will also show the total sales. It tends to be difficult, however, to read off the individual sales for companies that are not in the bottom position of the stack.

Excel has a 100% stacked area chart which can be used to show how market share changes over time. To create this type of chart in the other spreadsheets, one would have to add columns which calculate the percentages.

Google Spreadsheet, by default, uses semi-transparency for non-stacked area charts. In Excel and Numbers one can click on the area and get a popup box which allows one to adjust the opacity.

scatter plots

In Excel, the change the point marker from the default solid diamond, double click on it to get a dialog window.

In Numbers one can change the point marker in the Series pane of the Chart window.

Google Spreadsheet appears to always use solid circles, though the size and the color can be changed.

TODO: how to add another series to a scatterplot with a different color or symbol.

Linear Optimization

To solve a linear optimization problem in Excel, the Solver.Xlam add-in must be selected. Go to:

Tools | Add-Ins...

This provides a Solver button in the Data section of the ribbon.

To use it, allocate a row or column to contain the decision variables, and allocate a cell for the objective function which is defined in terms of those decision variables. The SUMPRODUCT function is useful when defining the objective function.

Click the Solver button, which brings up a dialog.

solver.png

Enter the cell reference of the objective function in the topmost box.

Indicate whether the problem is a minimization or maximization problem using the radio buttons.

Enter the cell range of the decision variables in input box below that.

One enters the constaints one at a time by clicking the Add button, which brings up another dialog.

Note that there is a checkbox that can be used to indicate that all decision variables are non-negative.

User Defined Functions

  • excel: visual basic module
  • google spreadsheet: javascript
  • numbers: doesn't have them

Notes and Presentation

  • notes
  • cell formats: number, alignment, font, border, fill
  • column width and row height
  • frozen rows and columns

To freeze a row in Mac Excel 2011, select under the bottom row to be frozen and use Window | Freeze Panes

Microsoft Excel

  • can't export Unicode characters to CSV

Google Spreadsheet

Google Spreadsheets is implemented in JavaScript and runs in the browser. The spreadsheets are stored on Google's data servers. A Google spreadsheet can easily be published on the web; the document will be assigned a public URL which can be shared. It is also possible to grant read-only or read-write access to others if you know their Google email address.

  • can't handle large data sets (slow with more than 2k rows)

Apple Numbers

A difference between Numbers and Excel concerns the nature of a spreadsheet. In Excel a spreadsheet is always filled with a grid of cells, but in Numbers a spreadsheet is a canvas which by default contains a single table containing a grid of cells. A Numbers canvas can contain multiple tables, or it can contain graphical items such as charts.

When a chart is created in Numbers it is placed outside of the table with the data. In Excel, by contrast, when a chart is created it floats over the spreadsheet at its grid of cells.

A Numbers table has controls at the right edge of the of the column labels and the bottom edge of the row labels to expand the size of the table by a column or a row.

  • no extension language
  • can't import large xlsx files (64k row limit?)
  • lack of online help
  • no pivot table or column grouping
  • no auto fill?
  • canvases with multiple tables
  • export to PDF
  • doesn't trim tables when exporting as CSV
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License