What is Stata? Stata is a full-featured statistical programming language for Windows, Macintosh, Unix and Linux. It can be considered a “stat package,” like SAS, SPSS, RATS, or EViews. The number of variables is limited to 2,047 in standard Stata/IC, but can be much larger in Stata/SE or Stata/MP. The number of observations is limited only by available memory.

Stata has traditionally been a command-line-driven package that operates in a graphical (windowed) environment. Stata version 11 (released July 2009) contains a graphical user interface (GUI) for command entry. Stata may also be used in a command-line environment on a shared system (e.g., Unix) if you do not have a graphical interface to that system.

Stata is advertised as having three major strengths:

  • Data manipulation
  • Statistics
  • Graphics

Stata is an excellent tool for data manipulation: moving data from external sources into the program, cleaning it up, generating new variables, generating summary data sets, merging data sets and checking for merge errors, collapsing cross-section time-series data on either of its dimensions, reshaping data sets from “long” to “wide” (or back), and so on. This makes Stata well suited to answering ad hoc questions about any aspect of the data.
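
Several of the data manipulation tasks listed above can be sketched in a short do-file. This is only an illustration, not a recommended workflow: it uses the auto data set that ships with Stata, and the names price_per_lb, mean_price, mean_mpg, and the four-row income panel are made up for the example.

```stata
* Load an example data set that ships with Stata
sysuse auto, clear

* Generate a new variable (illustrative name): price per pound of weight
generate price_per_lb = price / weight

* Build a summary data set: mean price and mpg by origin (foreign)
preserve
collapse (mean) mean_price=price mean_mpg=mpg, by(foreign)
tempfile summary
save `summary'
restore

* Merge the group means back onto the car-level data and check the result
merge m:1 foreign using `summary'
tabulate _merge
assert _merge == 3      // stop with an error if any observation failed to match
drop _merge

* Reshape a small hypothetical panel from "long" to "wide"
clear
input id year income
1 2008 100
1 2009 110
2 2008 200
2 2009 220
end
reshape wide income, i(id) j(year)
list
```

The tabulate _merge and assert lines are one simple way to check for merge errors; in real work the expected values of _merge depend on how the two data sets are supposed to overlap.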

In terms of statistics, Stata provides all of the standard univariate, bivariate and multivariate statistical tools, from descriptive statistics and t-tests through one-, two- and N-way ANOVA, regression, and principal components. Stata’s regression capabilities are full-featured: regression diagnostics, prediction, robust estimation of standard errors, instrumental variables and two-stage least squares, seemingly unrelated regressions, vector autoregressions, and error correction models. It also has a very powerful set of techniques for the analysis of limited dependent variables, including logit, probit, ordered logit and probit, and multinomial logit.
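
A few of the statistical tools mentioned above look like this in practice. The sketch again assumes the auto data set that ships with Stata; the choice of variables and the name price_hat are purely illustrative and carry no substantive meaning.

```stata
* Load an example data set that ships with Stata
sysuse auto, clear

* Descriptive statistics
summarize price mpg weight

* Two-sample t-test: does mean mpg differ between domestic and foreign cars?
ttest mpg, by(foreign)

* One-way ANOVA of mpg across repair-record categories
oneway mpg rep78

* Linear regression with robust standard errors, then fitted values
regress price mpg weight, vce(robust)
predict price_hat, xb

* Limited dependent variable: logit for a binary outcome
logit foreign mpg weight
```

Each estimation command follows the same pattern (command, dependent variable, regressors, then options after a comma), which is why moving from regress to logit or probit usually requires changing only the command name.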



