Pandas

From PrattWiki
Revision as of 18:30, 19 January 2020 by DukeEgr93

This page is currently very much in draft form and is focused on commands needed to get numerical data from a file into Python. The Pandas package can do *much* more than that!

File Types

Pandas can load data from a text file or from an Excel spreadsheet. The commands on this page assume that Pandas has been imported with import pandas as pd.

Text Files

For text files, you need to figure out two things:

  • How are individual data points separated in the file? (tabs, commas, spaces, etc.)
    • If separated by commas, use pd.read_csv("file") to load the data into a data frame
    • If separated by tabs, use pd.read_table("file") to load the data into a data frame
    • If separated by some other character, use pd.read_csv("file", sep="X") where X is replaced by whatever character is between data points
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will use the first row as column labels by default
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_csv() or pd.read_table() command.
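The choices above can be sketched in a few lines. This is a minimal illustration, not a definitive recipe: the data and column names (time, voltage) are made up, and io.StringIO stands in for a file on disk so the example runs without any external files. With a real file, pass the file name as a string instead.

```python
import io

import pandas as pd

# Hypothetical file contents, held in memory so no files on disk are needed
comma_text = "time,voltage\n0.0,1.5\n0.1,1.7\n"    # commas, with a header row
tab_text = "time\tvoltage\n0.0\t1.5\n0.1\t1.7\n"   # tabs, with a header row
semi_text = "0.0;1.5\n0.1;1.7\n"                   # semicolons, no header row

# Comma-separated: read_csv uses the first row as column labels by default
df_comma = pd.read_csv(io.StringIO(comma_text))

# Tab-separated: read_table also uses the first row as column labels
df_tab = pd.read_table(io.StringIO(tab_text))

# Some other separator and no header row: name the separator and
# pass header=None so the first row is kept as data
df_semi = pd.read_csv(io.StringIO(semi_text), sep=";", header=None)

print(df_comma.columns.tolist())  # ['time', 'voltage']
print(df_semi.columns.tolist())   # [0, 1]
```

When header=None is used, Pandas labels the columns with integers starting at 0, which is why the last data frame has columns 0 and 1.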

Excel Files

For Excel files, you need to figure out two things:

  • Does the file have one sheet or more than one sheet?
    • If there is only one sheet, use pd.read_excel("file") to load the data into a data frame
    • If there are multiple sheets, include sheet_name=X where X is either an integer indicating which sheet to load (counting from left to right, with 0 being the leftmost sheet) or a string containing the sheet name. You can also load multiple sheets at once, but that is not covered yet.
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, pd.read_excel() will use the first row as column labels by default
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_excel() command.
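One way to combine the two Excel decisions is a small wrapper around pd.read_excel(). This is only a sketch: the helper names read_excel_options and load_sheet, the file name data.xlsx, and the sheet name "Trial 2" are all hypothetical and used purely for illustration. Reading .xlsx files also generally requires an extra package such as openpyxl to be installed.

```python
import pandas as pd


def read_excel_options(sheet=0, has_header=True):
    """Build keyword arguments for pd.read_excel (hypothetical helper).

    sheet:      integer position (0 is the leftmost sheet) or a sheet name
    has_header: True if the first row holds column labels
    """
    # header=0 uses the first row as labels; header=None keeps it as data
    return {"sheet_name": sheet, "header": 0 if has_header else None}


def load_sheet(path, sheet=0, has_header=True):
    """Load one sheet of an Excel file into a data frame (hypothetical helper)."""
    return pd.read_excel(path, **read_excel_options(sheet, has_header))


# Example calls, assuming a workbook named data.xlsx exists:
# df_first = load_sheet("data.xlsx")                       # leftmost sheet, with headers
# df_named = load_sheet("data.xlsx", sheet="Trial 2")      # sheet chosen by name
# df_plain = load_sheet("data.xlsx", sheet=1, has_header=False)  # second sheet, no headers
```

The wrapper adds nothing that pd.read_excel() cannot do directly; it just makes the two questions (which sheet, and whether there is a header row) explicit in the call.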