TreeMap

Loading data

TreeMap can load data in various formats and from multiple data sources. The most common ways of importing your own data are tab-delimited or comma-separated files and Excel workbooks. TreeMap can also connect to common relational databases and to some online data providers.

File-based data sources

To load data files, either

  • use the File > Open... (Ctrl+O) menu entry, which opens a dialog for selecting the file:

    Figure 1.1. File chooser dialog for selecting a data file

  • drag and drop a file with a known file extension onto the TreeMap application frame,

  • or double-click on the file if its extension is registered to TreeMap.

Macrofocus TreeMap (*.mtm)

This is the native format used by TreeMap. It can store a copy of the actual data, information about its original data source, and all the configurations made in the TreeMap application. The data are stored in a highly compressed binary format to reduce the file size, while the configuration information is stored in XML format. For a detailed technical specification of the data format, please contact us.

Text (Tab delimited) (*.txt;*.tsv;*.tab;*.raw)

Loading data from tab-delimited text files is straightforward. TreeMap expects the first line to contain the name of each column, with the tab character separating the columns.

The tab-separated values format is a popular method of data interchange among databases and spreadsheets. It stores tabular data (numbers and text) in plain-text form. Although the format is only loosely defined (IANA attempts to standardize it), TreeMap automatically detects its encoding and the type of data values, and it smoothly handles the most common sources of errors. Tab-delimited files are processed like comma-delimited files, except that columns are separated by the tab character.

TreeMap expects the first line to be a header containing the column names, which are used to name the variables. Each record is then located on a separate line, with its values separated by tabs. Each record "should" contain the same number of tab-separated fields. Any field may be quoted with double quotes, and fields containing a line break, a double quote, or a tab should be quoted. A double quote character inside a field must be represented by two double quote characters.
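
The quoting rules above are the usual delimited-text conventions, so they can be reproduced with Python's standard csv module by setting the delimiter to a tab. The following sketch is not TreeMap's own parser; it writes a hypothetical field containing both a tab and double quotes, then reads it back.

```python
import csv
import io

# Hypothetical rows; the last field contains double quotes and a tab,
# so the writer must quote it and double the embedded quotes.
rows = [
    ["Planet", "Note"],
    ["Earth", 'Has "one" moon\tand oceans'],
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
print(text, end="")

# Reading the text back with the same delimiter recovers the original fields.
parsed = list(csv.reader(io.StringIO(text), delimiter="\t"))
```

When writing to a real file, open it with newline="" and encoding="utf-8" so that the csv module controls line endings and the file uses a Unicode encoding.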

After the file has been loaded, TreeMap will attempt to detect the data type of each column. Automatically recognized types are text (String), numbers (Integer and Double) and some more specialized types such as dates (supported formats are "MM/dd/yyyy", "MM/dd/yy", "yyyy-MM-dd", "dd.MM.yyyy HH:mm:ss"), URLs, geometries (in WKT format), and binary data (in Base64 format).
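
The exact detection rules TreeMap applies are not documented here. As a rough, hypothetical approximation, a column can be assigned the first type that fits all of its non-blank values, trying the narrower types first. The infer_column_type helper below is an illustration only: the order of the checks, the 32-bit bound for Integer, and the http(s):// test for URLs are assumptions, and geometries and Base64 data are not handled.

```python
import re
from datetime import datetime

# The date patterns listed above, translated to strptime syntax
# ("MM/dd/yyyy" -> "%m/%d/%Y", and so on).
DATE_FORMATS = ["%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d", "%d.%m.%Y %H:%M:%S"]


def _is_date(value):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


def _is_float(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_int32(value):
    # Whole numbers that fit a 32-bit signed Integer.
    return re.fullmatch(r"[+-]?\d+", value) is not None and -2**31 <= int(value) < 2**31


def infer_column_type(values):
    """Hypothetical helper: guess a column type from its non-blank values."""
    present = [v.strip() for v in values if v.strip()]  # blanks are nulls
    if not present:
        return "String"
    checks = [
        ("Integer", _is_int32),
        ("Double", _is_float),
        ("Date", _is_date),
        ("URL", lambda v: re.match(r"https?://", v) is not None),
    ]
    for name, check in checks:
        if all(check(v) for v in present):
            return name
    return "String"
```

Applied to the columns of Example 1.1, this sketch would yield Integer for "Radius in km" and Double for "Spherical area" (whose values include decimals and numbers beyond the 32-bit range).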

Example 1.1. Data file in tab-delimited format

As an example, the following text file

Planet	Region	Spherical area	Radius in km	Discovery date	Wikipedia article
Mercury	Inner Solar System	18688458.19	2439		http://en.wikipedia.org/wiki/Mercury_(planet)
Venus	Inner Solar System	115066184.2	6052		http://en.wikipedia.org/wiki/Venus
Earth	Inner Solar System	127796483.1	6378		http://en.wikipedia.org/wiki/Earth
Mars	Inner Solar System	36274097.98	3398		http://en.wikipedia.org/wiki/Mars
Jupiter	Outer Solar System	16014816458	71398		http://en.wikipedia.org/wiki/Jupiter
Saturn	Outer Solar System	11309733553	60000		http://en.wikipedia.org/wiki/Saturn
Uranus	Outer Solar System	2026829916	25400	3/13/1781	http://en.wikipedia.org/wiki/Uranus
Neptune	Outer Solar System	1855079046	24300	9/23/1846	http://en.wikipedia.org/wiki/Neptune
Pluto	Outer Solar System	7547676.35	1550	2/18/1930	http://en.wikipedia.org/wiki/Pluto

will result in the following table being loaded in TreeMap:

Planet   Region              Spherical area  Radius in km  Discovery date  Wikipedia article
String   String              Double          Integer       Date            URL
Mercury  Inner Solar System  18688458.19     2439                          http://en.wikipedia.org/wiki/Mercury_(planet)
Venus    Inner Solar System  115066184.2     6052                          http://en.wikipedia.org/wiki/Venus
Earth    Inner Solar System  127796483.1     6378                          http://en.wikipedia.org/wiki/Earth
Mars     Inner Solar System  36274097.98     3398                          http://en.wikipedia.org/wiki/Mars
Jupiter  Outer Solar System  16014816458     71398                         http://en.wikipedia.org/wiki/Jupiter
Saturn   Outer Solar System  11309733553     60000                         http://en.wikipedia.org/wiki/Saturn
Uranus   Outer Solar System  2026829916      25400         3/13/1781       http://en.wikipedia.org/wiki/Uranus
Neptune  Outer Solar System  1855079046      24300         9/23/1846       http://en.wikipedia.org/wiki/Neptune
Pluto    Outer Solar System  7547676.35      1550          2/18/1930       http://en.wikipedia.org/wiki/Pluto

Although TreeMap autodetects the character encoding used to represent international and special (non-ASCII) characters, we recommend using a Unicode encoding, typically UTF-8 or UTF-16.

To force TreeMap to parse values as a specific data type, insert an optional second header line that specifies the type of values expected in each column. Possible types are "String" for any kind of textual information, "Integer" for numbers without a fractional component, "Float" and "Double" for single- and double-precision floating-point numbers, and "Color" for color information. Each subsequent line then contains the values for each of the columns.
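
As a sketch, the following Python snippet writes a tab-delimited file with such a type line, using a subset of the columns from Example 1.1. Forcing "Radius in km" to Double is only an illustration of overriding the Integer type detected in that example.

```python
import csv
import io

header = ["Planet", "Region", "Spherical area", "Radius in km"]
# Optional second header line: force "Radius in km" to Double instead of
# the Integer type that auto-detection would assign to these values.
types = ["String", "String", "Double", "Double"]
data = [
    ["Mercury", "Inner Solar System", "18688458.19", "2439"],
    ["Jupiter", "Outer Solar System", "16014816458", "71398"],
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerow(types)
writer.writerows(data)
tsv = buffer.getvalue()
print(tsv, end="")
```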

As an example, you can download the Forbes Global 2000 dataset in this format.

After the data file has been loaded into TreeMap, it will automatically attempt to create a default configuration.

CSV (Comma delimited) (*.csv)

The comma-separated values (CSV) format stores tabular data (numbers and text) in plain-text form. Most spreadsheet and data management applications can export data in this format. Although the format is only loosely defined (RFC 4180 attempts to standardize it), TreeMap automatically detects its encoding and the type of data values, and it smoothly handles the most common sources of errors. Comma-delimited files are processed like tab-delimited files, except that columns are separated by a comma (or a semicolon).

TreeMap expects the first line to be a header containing the column names, which are used to name the variables. Each record is then located on a separate line, with its values separated by commas (or semicolons). Each record "should" contain the same number of comma-separated fields. Any field may be quoted with double quotes, and fields containing a line break, a double quote, or a comma should be quoted. A double quote character inside a field must be represented by two double quote characters.
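
TreeMap's own separator detection is not described here. To check outside TreeMap which separator a file uses, Python's csv.Sniffer can be restricted to the two candidates; semicolons are common in locales that use the comma as the decimal separator, as in this hypothetical sample.

```python
import csv

sample = (
    "Planet;Region;Spherical area\n"
    "Mercury;Inner Solar System;18688458,19\n"
    "Venus;Inner Solar System;115066184,2\n"
)

# Only consider the two separators accepted for CSV files.
dialect = csv.Sniffer().sniff(sample, delimiters=",;")
print(repr(dialect.delimiter))

# Parse the sample with the detected dialect.
rows = list(csv.reader(sample.splitlines(), dialect))
```

The sniffer picks the candidate that occurs the same number of times on every line, which is why the decimal commas do not mislead it here.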

After the file has been loaded, TreeMap will attempt to detect the data type of each column. Automatically recognized types are text (String), numbers (Integer and Double) and some more specialized types such as dates (supported formats are "MM/dd/yyyy", "MM/dd/yy", "yyyy-MM-dd", "dd.MM.yyyy HH:mm:ss"), URLs, geometries (in WKT format), and binary data (in Base64 format).

Example 1.2. Data file in comma delimited format

As an example, the following text file

Planet,Region,Spherical area,Radius in km,Discovery date,Wikipedia article
Mercury,Inner Solar System,18688458.19,2439,,http://en.wikipedia.org/wiki/Mercury_(planet)
Venus,Inner Solar System,115066184.2,6052,,http://en.wikipedia.org/wiki/Venus
Earth,Inner Solar System,127796483.1,6378,,http://en.wikipedia.org/wiki/Earth
Mars,Inner Solar System,36274097.98,3398,,http://en.wikipedia.org/wiki/Mars
Jupiter,Outer Solar System,16014816458,71398,,http://en.wikipedia.org/wiki/Jupiter
Saturn,Outer Solar System,11309733553,60000,,http://en.wikipedia.org/wiki/Saturn
Uranus,Outer Solar System,2026829916,25400,3/13/1781,http://en.wikipedia.org/wiki/Uranus
Neptune,Outer Solar System,1855079046,24300,9/23/1846,http://en.wikipedia.org/wiki/Neptune
Pluto,Outer Solar System,7547676.35,1550,2/18/1930,http://en.wikipedia.org/wiki/Pluto

will result in the following table being loaded in TreeMap:

Planet   Region              Spherical area  Radius in km  Discovery date  Wikipedia article
String   String              Double          Integer       Date            URL
Mercury  Inner Solar System  18688458.19     2439                          http://en.wikipedia.org/wiki/Mercury_(planet)
Venus    Inner Solar System  115066184.2     6052                          http://en.wikipedia.org/wiki/Venus
Earth    Inner Solar System  127796483.1     6378                          http://en.wikipedia.org/wiki/Earth
Mars     Inner Solar System  36274097.98     3398                          http://en.wikipedia.org/wiki/Mars
Jupiter  Outer Solar System  16014816458     71398                         http://en.wikipedia.org/wiki/Jupiter
Saturn   Outer Solar System  11309733553     60000                         http://en.wikipedia.org/wiki/Saturn
Uranus   Outer Solar System  2026829916      25400         3/13/1781       http://en.wikipedia.org/wiki/Uranus
Neptune  Outer Solar System  1855079046      24300         9/23/1846       http://en.wikipedia.org/wiki/Neptune
Pluto    Outer Solar System  7547676.35      1550          2/18/1930       http://en.wikipedia.org/wiki/Pluto

Although TreeMap autodetects the character encoding used to represent international and special (non-ASCII) characters, we recommend using a Unicode encoding, typically UTF-8 or UTF-16.

To force TreeMap to parse values as a specific data type, insert an optional second header line that specifies the type of values expected in each column. Possible types are "String" for any kind of textual information, "Integer" for numbers without a fractional component, "Float" and "Double" for single- and double-precision floating-point numbers, and "Color" for color information. Each subsequent line then contains the values for each of the columns.

As an example, you can download the Forbes Global 2000 dataset in this format.

After the data file has been loaded into TreeMap, it will automatically attempt to create a default configuration.

Microsoft Excel Workbook (*.xls;*.xlsx;*.xlsm)

TreeMap can read files produced by Microsoft Excel, including the more recent Office Open XML format, even if Excel is not installed on the local computer. The first row is expected to contain the name of each column. If the workbook contains multiple sheets, a dialog lets you choose which one to load.

To force TreeMap to parse values as a specific data type, insert an optional second header row that specifies the type of values expected in each column. Possible types are "String" for any kind of textual information, "Integer" for numbers without a fractional component, "Float" and "Double" for single- and double-precision floating-point numbers, and "Color" for color information. Each subsequent row then contains the values for each of the columns.

As an example, you can download the Forbes Global 2000 dataset in this format.

ODF Spreadsheet (*.ods)

TreeMap can read files in the native OpenOffice and LibreOffice format.

SPSS (*.sav)

TreeMap can read files in the native SPSS format.

SAS (*.sas7bdat)

TreeMap can read files in the native SAS format.

HCIL TM3 (*.tm3)

For backward compatibility, files can be loaded in the data format used by the University of Maryland Treemap. A description of the actual data format is provided on the HCIL website.

TreeML (*.treeml)

For backward compatibility, files can be loaded in the TreeML file format. A description of the actual data format is provided on the HCIL website.

Microsoft Project (*.mpp)

Microsoft Project Exchange (*.mpx)

Handled identically to the Microsoft Project format.

Microsoft Project Data Interchange (*.xml)

Handled identically to the Microsoft Project format.

Zip Archive (*.zip)

TreeMap analyzes the contents of the archive, including its hierarchical structure. A typical use is finding out which files and directories take up the most space in the archive.

Java Archive (*.jar;*.war)

Handled identically to the Zip Archive format.
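
To illustrate the per-directory totals that such a hierarchy is built from, the following sketch sums the sizes of archive members for every directory level using Python's zipfile module (JAR and WAR files use the same container format). It is not TreeMap's implementation, and it assumes uncompressed sizes (ZipInfo.file_size) are what matter; ZipInfo.compress_size would instead measure the space actually taken up inside the archive.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def size_by_directory(archive):
    """Sum the uncompressed sizes of archive members per directory level."""
    totals = defaultdict(int)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Member names always use "/" separators; "." stands for the root.
            for parent in PurePosixPath(info.filename).parents:
                totals[str(parent)] += info.file_size
    return dict(totals)


# Usage with a small in-memory archive (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("docs/manual.txt", "x" * 300)
    zf.writestr("docs/img/logo.bin", b"\0" * 200)
    zf.writestr("README", "hello")
buffer.seek(0)
result = size_by_directory(buffer)
print(result)
```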

Text Document (*.text)

This will analyze the text, compute word count statistics, and open the results as a tag cloud.

Web page (*.html)

Handled identically to the Text Document format.
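
TreeMap's tokenization and stop-word rules are not documented here. The following minimal sketch shows the general idea of word count statistics for plain text; the lower-casing, the letters-only word pattern, and the hypothetical min_length filter are assumptions, and a web page would first need its HTML tags stripped.

```python
import re
from collections import Counter


def word_counts(text, min_length=3):
    """Count case-insensitive word occurrences, skipping very short words."""
    # [^\W\d_]+ matches runs of letters (including non-ASCII letters).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)


counts = word_counts("The treemap shows the hierarchy; the treemap is compact.")
print(counts.most_common(2))
```

In a tag cloud, each word's count would then drive its font size.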

ESRI Shapefile (*.shp)

This is a popular geospatial vector data format for geographic information systems (GIS) software. Shapefiles spatially describe features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.

Microsoft Access (*.mdb;*.accdb)

Access database tables can be loaded directly into TreeMap. However, this is supported only on the Windows platform and requires Microsoft Access or the Microsoft Access Database Engine to be installed.

Directory-based data sources

Because a treemap is an excellent visualization for finding out which files and directories take up space on a hard disk, TreeMap can scan the directory structure of your file system. Choose File > Open Directory... (Ctrl+F) and then specify the location of the root directory in the following dialog:

Figure 1.2. File chooser dialog for selecting a root directory
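
Scanning a directory amounts to aggregating file sizes bottom-up, so that each directory's total includes everything beneath it. A minimal Python sketch of that aggregation, assuming symbolic links should be neither followed nor counted (TreeMap's own behavior may differ); the directory_sizes helper and the sample tree are hypothetical:

```python
import os
import tempfile


def directory_sizes(root):
    """Map every directory under `root` to the total size of the files below it."""
    totals = {}
    # Bottom-up walk: each subdirectory is totalled before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        size = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):  # skip symbolic links
                    size += os.path.getsize(path)
            except OSError:
                pass  # unreadable or vanished file: skip it
        for name in dirnames:
            # Symlinked directories are listed but never walked, so default to 0.
            size += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = size
    return totals


# Usage: build a small tree and report sizes relative to its root.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "docs", "img"))
    for parts, size in [(("docs", "manual.pdf"), 100),
                        (("docs", "img", "logo.png"), 50),
                        (("notes.txt",), 10)]:
        with open(os.path.join(root, *parts), "wb") as f:
            f.write(b"\0" * size)
    sizes = {os.path.relpath(d, root): s for d, s in directory_sizes(root).items()}

print(sizes)
```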

Database connectivity

TreeMap can directly import data from popular relational database servers installed on the local computer or on a remote machine. Currently supported are:

  • MySQL

  • Oracle

  • Microsoft SQL Server

  • PostgreSQL

  • IBM DB2

  • SAP MaxDB

  • PostGIS

Please contact support if your database system is not currently supported. Any data source queryable through a JDBC driver can easily be integrated into TreeMap.

Microsoft Access is also supported, but as a file-based data source.

To start importing data from a database, choose File > Open Database... (Ctrl+D). This opens a dialog for defining the required parameters:

Figure 1.3. Database query dialog

On-line data sources

Stock quote data from Yahoo Finance, as well as all the example datasets provided on our website, can be accessed directly through the File > Open Dataset submenu. This submenu also provides integration with TreeMap Server.

Automatic default configuration

By default, TreeMap automatically assigns the first categorical variable to the label, the second categorical variable (if available) to the grouping, the first numerical variable to the size, and the second numerical variable (if available) to the color.
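
The rule can be expressed as a small function. This is a hypothetical sketch: it assumes each column has already been classified as categorical or numerical (TreeMap's criteria are not spelled out here), and the returned dictionary is not TreeMap's configuration format. Applied to the first four columns of Example 1.1, it labels items by Planet, groups them by Region, sizes them by Spherical area, and colors them by Radius in km.

```python
def default_configuration(columns):
    """Hypothetical sketch of the default mapping rule.

    `columns` is an ordered list of (name, kind) pairs, where kind is
    "categorical" or "numerical".
    """
    categorical = [name for name, kind in columns if kind == "categorical"]
    numerical = [name for name, kind in columns if kind == "numerical"]

    def nth(items, index):
        # The second variable of each kind is optional.
        return items[index] if index < len(items) else None

    return {
        "label": nth(categorical, 0),
        "grouping": nth(categorical, 1),
        "size": nth(numerical, 0),
        "color": nth(numerical, 1),
    }


config = default_configuration([
    ("Planet", "categorical"),
    ("Region", "categorical"),
    ("Spherical area", "numerical"),
    ("Radius in km", "numerical"),
])
print(config)
```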

Data types

All data types support null (blank) values. Supported types are:

Text

String

Represents character strings such as "abc".

StringPath

Represents an array of character strings. Values should be delimited by commas.

HtmlString

Represents a tagged string in HTML format.

Numbers

Byte

The Byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive).

Short

The Short data type is a 16-bit signed two's complement integer. It has a minimum value of -32,768 and a maximum value of 32,767 (inclusive).

Integer

The Integer data type is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive). For integral values, this data type is generally the default choice: it is large enough for most numbers, but if you need a wider range of values, use Long instead.

Long

The Long data type is a 64-bit signed two's complement integer. It has a minimum value of -9,223,372,036,854,775,808 and a maximum value of 9,223,372,036,854,775,807 (inclusive). Use this data type when you need a range of values wider than those provided by Integer.
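
All four integer types follow the same two's complement rule: with n bits, the range is -2^(n-1) to 2^(n-1) - 1. A short Python check of the figures above, plus a hypothetical helper that picks the narrowest type able to hold a given value:

```python
def signed_range(bits):
    """Smallest and largest value of a two's complement integer of `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


INTEGER_TYPES = [("Byte", 8), ("Short", 16), ("Integer", 32), ("Long", 64)]


def narrowest_integer_type(value):
    """Return the smallest listed type that can hold `value`, or None."""
    for name, bits in INTEGER_TYPES:
        low, high = signed_range(bits)
        if low <= value <= high:
            return name
    return None


for name, bits in INTEGER_TYPES:
    low, high = signed_range(bits)
    print(f"{name:<8}{low:>28,}{high:>28,}")
```

For example, Jupiter's spherical area (16014816458) needs a Long, while its radius in km (71398) fits an Integer.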

Float

The Float data type is a single-precision 32-bit IEEE 754 floating-point number. Its exact range of values is beyond the scope of this discussion, but it provides roughly 7 significant decimal digits of precision. This data type should never be used for precise values, such as currency; use the BigDecimal type instead.

Double

The Double data type is a double-precision 64-bit IEEE 754 floating-point number. Its exact range of values is beyond the scope of this discussion, but it provides roughly 15 to 17 significant decimal digits of precision. For decimal values, this data type is generally the default choice. As with Float, it should never be used for precise values, such as currency.

BigDecimal

An arbitrary-precision signed decimal number.
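
The precision limits of Float and Double, and why BigDecimal is preferred for exact values, can be demonstrated in Python, whose float is a 64-bit double. In this sketch, struct round-trips a value through a 32-bit float to emulate Float, and decimal.Decimal stands in for BigDecimal:

```python
import struct
from decimal import Decimal


def as_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Double: 0.1 and 0.2 have no exact binary representation, so the sum is off.
print(0.1 + 0.2 == 0.3)  # False

# Float: Jupiter's spherical area from the examples needs 11 significant
# digits, more than a 32-bit float can hold, so it gets rounded.
print(as_float32(16014816458.0))  # 16014816256.0

# BigDecimal-style arithmetic on decimal strings stays exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```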

StringDouble

A Double data type with support for formatting patterns.

Others

Boolean

The Boolean data type has only two possible values: true and false. Use this data type for simple flags that track true/false conditions. This data type represents one bit of information.

Date

Represents a specific instant in time, with millisecond precision.

Color

The Color data type encapsulates colors in the default sRGB color space. Every color has an alpha value, either an implicit 1.0 or an explicitly specified one. The alpha value defines the transparency of a color and can be represented either as a float in the range 0.0 to 1.0 or as an integer in the range 0 to 255. An alpha value of 1.0 (or 255) means that the color is completely opaque, and an alpha value of 0.0 (or 0) means that it is completely transparent. The color components are never premultiplied by the alpha component, whether a color is created with an explicit alpha or its color and alpha components are read back.
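
The two alpha scales are related by a factor of 255. A minimal sketch of the conversion, assuming round-to-nearest when going from the 0.0 to 1.0 scale to the 0 to 255 scale; the helper names are hypothetical:

```python
def alpha_to_int(alpha):
    """Map an alpha in 0.0-1.0 to the 0-255 scale, rounding to nearest."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be within 0.0-1.0")
    return int(alpha * 255 + 0.5)


def alpha_to_float(alpha):
    """Map an alpha in 0-255 to the 0.0-1.0 scale."""
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be within 0-255")
    return alpha / 255


# Non-premultiplied RGBA: the red channel stays 255 regardless of alpha.
half_transparent_red = (255, 0, 0, alpha_to_int(0.5))
print(half_transparent_red)  # (255, 0, 0, 128)
```

With premultiplied alpha, the same color would instead store its red channel scaled by the alpha, which is the representation the Color type avoids.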

Icon

A small fixed size picture, typically used to decorate components.

Image

Represents graphical images.

URL

The URL data type represents a Uniform Resource Locator, a pointer to a "resource" on the World Wide Web. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine. More information on the types of URLs and their formats can be found in the URL Specification.

File

A representation of file and directory pathnames.

byte[]

For binary data.

Geometry

Represents geometric information, such as points, lines, and polygons.