Introduction
To use data is to work at the level of the unit of analysis or observation. Presumably, you want to work at the level of the unit of analysis/observation because you wish to create, verify or test a statistic.
Your efforts will be concentrated on finding data, interpreting its structure with a codebook, converting it to an appropriate format for analysis and performing the desired analysis.
For most, the "Introduction to Data" occurs in the context of a course in statistical analysis where the data you have used has been preformatted for use with a statistical package such as SPSS, SAS or STATA. Once you begin your own research, you will discover that few datasets come prepared for you in the same way. In fact, you will find that getting your data in an appropriate form for analysis is often more involved than the analysis itself!
Source: Data Library-- Introduction To Data Handling
Why? Because a given file has two "formats" in play: the data format and the file format. The first refers to the logical structure of the data inside the file (e.g., a "rectangular" file versus a "hierarchical" file). The second refers to the type of file independent of its contents (e.g., an ASCII file versus an MS Excel file).
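To make the rectangular/hierarchical distinction concrete, here is a minimal Python sketch. The record layout is an assumption made up for illustration: lines starting with "H" describe a household and lines starting with "P" describe a person in the most recent household. Real hierarchical files define their own record types in the codebook.

```python
# Hypothetical hierarchical file: "H" lines describe a household,
# "P" lines describe the persons belonging to the most recent household.
HIERARCHICAL = """\
H,101,urban
P,1,34
P,2,36
H,102,rural
P,1,58
"""

def flatten(text):
    """Turn household/person records into one rectangular row per person."""
    rows = []
    household = []
    for line in text.splitlines():
        fields = line.split(",")
        if fields[0] == "H":
            household = fields[1:]              # remember the current household
        elif fields[0] == "P":
            rows.append(household + fields[1:])  # household fields + person fields
    return rows

rectangular = flatten(HIERARCHICAL)
for row in rectangular:
    print(",".join(row))
```

After flattening, every row has the same four fields (household id, household type, person number, age), which is the shape most statistical packages expect.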
Find Data with our list of Data Sources
This link will take you to a list of the major data sources on the net. They typically provide users with some combination of data discovery, data ordering/downloading, lists of links, documentation and online analysis.
Examining Files
Before you do any work with a file, it's worthwhile to verify its contents and their integrity. Utilities that can help you do this include word/line/character counters, multi-format viewers and file type guessers. See File Examination & Viewing Utilities for a list of basic programs. Some are free, some are shareware and some can be used in a UNIX environment only. What is available to you will vary from department to department, so if you don't know what your options are, contact your department about available computing resources.
Use the Codebook
You might be wondering how you would know what to look for in a file you have never used before when you are trying to verify its contents and integrity. That, and everything else you need to know, should be in the codebook. Typical elements include:
- A description of how the data was collected, including the sampling design;
- The variables contained in the data;
- In the case of surveys, the survey instrument or questionnaire used to solicit responses from the respondent and the coded values of each question;
- The location and format of the variable within the raw data file;
- Meaning of the coded values for each variable.
Source: Reading And Using A Codebook. Social Science Research Computing Data Library, University of Chicago
For more information on codebooks, see Introduction to Data Handling from the Social Science Research Computing Data Library, University of Chicago.
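As a rough sketch of how a codebook is used in practice, the Python below checks a raw file against the record length a codebook might state and then pulls out each variable by its column location. Everything specific here is an assumption for illustration: the variable names, column positions, record length, coded values and the three data lines are all made up.

```python
# Hypothetical codebook entries: variable name -> (start column, width).
# Columns are numbered from 1, as most codebooks do.
CODEBOOK = {
    "id": (1, 4),
    "age": (5, 2),
    "sex": (7, 1),
}
RECORD_LENGTH = 7                           # assumed record length from the codebook
SEX_LABELS = {"1": "male", "2": "female"}   # assumed meaning of the coded values

RAW = "0001341\n0002272\n0003581\n"         # made-up raw data file contents

def check_records(text, record_length):
    """Return the record count and the line numbers of wrong-length records."""
    lines = text.splitlines()
    bad = [n for n, line in enumerate(lines, start=1) if len(line) != record_length]
    return len(lines), bad

def parse_record(line, codebook):
    """Slice one fixed-width record into named variables."""
    return {
        name: line[start - 1:start - 1 + width]
        for name, (start, width) in codebook.items()
    }

count, bad = check_records(RAW, RECORD_LENGTH)
records = [parse_record(line, CODEBOOK) for line in RAW.splitlines()]
for r in records:
    print(r["id"], int(r["age"]), SEX_LABELS[r["sex"]])
```

If the record count or any record length disagrees with the codebook, that is a sign the file was truncated or altered in transfer and should be checked before analysis.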
Work With the Files
As noted in the introduction, getting a data file isn't merely a matter of finding what content you want. That's only step one. Step two is getting that file into a usable form. You may have to negotiate the file format, the file size, the file transfer and the file conversion.
Compression
Compression is the coding of data to save storage space or transmission time. Although data is already coded in digital form for computer processing, it can often be coded more efficiently (using fewer bits). There are many compression algorithms and utilities. Compressed data must be decompressed before it can be used.
The standard Unix compression utility is called compress, though GNU's superior gzip has largely replaced it. Other compression utilities include zip, PKZIP, StuffIt and WinZip.
ICPSR generally uses gzip, which has the file extension ".gz". For additional compression software and notes on file extensions, see the Compression FAQ, particularly "What is this .xxx file type? Where can I find the corresponding compression program?"
Source: Free Online Dictionary of Computing. Some content added by GPL.
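If you would rather not install a separate utility, a ".gz" file can also be handled from a script. The following is a minimal sketch using the gzip module in Python's standard library; the file names and the sample contents are hypothetical, and the work is done in a temporary directory so nothing on your system is overwritten.

```python
import gzip
import os
import shutil
import tempfile

def compress_file(src, dst):
    """Write a gzip-compressed copy of src to dst."""
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

def decompress_file(src, dst):
    """Restore the original bytes from a .gz file."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

# Hypothetical file names inside a throwaway directory.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "survey.dat")
gz_path = raw_path + ".gz"
restored_path = os.path.join(workdir, "restored.dat")

# Made-up, highly repetitive data: 1000 records of 8 bytes each.
with open(raw_path, "wb") as f:
    f.write(b"0001341\n" * 1000)

compress_file(raw_path, gz_path)
decompress_file(gz_path, restored_path)

original_size = os.path.getsize(raw_path)
compressed_size = os.path.getsize(gz_path)
print(original_size, compressed_size)
```

Comparing the restored file with the original, byte for byte, is a simple way to confirm that nothing was lost in the round trip.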
Conversion
There are a number of software applications that can be used to analyze a data file. Often, user A will create a data file in a format specific to software application 1 and user B needs it formatted for application 2. As a result, a market has appeared for still other software applications that can do the conversion from 1 to 2. The link above goes to a list of conversion tools.
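Simple text-to-text conversions do not always need a dedicated tool. As an illustration only, this Python sketch converts tab-delimited text to comma-separated text with the standard csv module; the sample data is made up, and conversions between proprietary formats (for example SPSS to SAS) are a different matter that the tools in the list address.

```python
import csv
import io

def tab_to_csv(tab_text):
    """Convert tab-delimited text to comma-separated text."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)   # fields containing commas are quoted automatically
    return out.getvalue()

# Made-up sample: note the comma inside the name field.
TAB = "id\tname\tcity\n1\tSmith, J.\tChicago\n"
converted = tab_to_csv(TAB)
print(converted)
```

The quoting matters: a naive search-and-replace of tabs with commas would split "Smith, J." into two fields and shift every column after it.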
Transferring
File transfer is the movement of one or more files from one location to another. A collection of electronically-stored files can be moved by physically moving the electronic storage medium, such as a computer diskette, hard disk, or compact disk from one place to another or by sending the files over a telecommunications medium. On the Internet, the File Transfer Protocol (FTP) is a common way to transfer a single file or a relatively small number of files from one computer to another. For larger file transfers (a single large file or a large collection of files), file compression and aggregation into a single archive is commonly used. (A zip file is a popular implementation.)
Source: searchNetworking.com
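The aggregation step mentioned above can be sketched with Python's standard zipfile module. The file names and contents below are hypothetical, and the transfer itself (by FTP or otherwise) is not shown; the sketch only bundles a data file and its codebook into one compressed archive and then verifies the archive.

```python
import os
import tempfile
import zipfile

def bundle(paths, archive_path):
    """Collect several files into one compressed zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store each file under its base name, not its full path.
            zf.write(path, arcname=os.path.basename(path))
    return archive_path

def check_archive(archive_path):
    """Return the sorted member names and the first corrupt member (None if all are intact)."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(zf.namelist()), zf.testzip()

# Hypothetical files in a throwaway directory.
workdir = tempfile.mkdtemp()
paths = []
for name, text in [("data.txt", "0001341\n0002272\n"), ("codebook.txt", "id 1-4\nage 5-6\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

archive = bundle(paths, os.path.join(workdir, "study.zip"))
names, first_bad = check_archive(archive)
print(names, first_bad)
```

Keeping the codebook in the same archive as the data means the two travel together, so whoever receives the file can interpret it.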
"Download" and "upload" are commonly used to denote file transfer as well as "ftp" (used as a noun and verb) even though these are technically not interchangeable terms.
Do the Analysis
There are resources to help you learn how to do analysis, as well as free online software. See the links below for specifics:
About Data Analysis
- Are there any free, web-accessible statistical analysis programs?
- Where can I find help with choosing a method of data analysis?
- Where can I go to learn how to do data analysis?
- Are there any guides to data entry into specific statistical packages?
- Converting Text Tables into Excel Files
- How do I prepare an MSExcel file for use with a statistical program?
- Quick and Easy Subsetting with Perl
- Where can I find help with importing a data file into SAS or SPSS?
- Where can I get help with STATA?
- Where can I get help with SPSS?
- Where can I get help with using SAS?
