Overview
This section discusses data access through web APIs. A web Application Programming Interface (API) is a callable interface exposing an end-point (URL) on an HTTP web server and defining a request/response set of methods. By passing in a 'request' defined by the interface, we expect a 'response', most often returning data as an XML or JSON stream.
We start the examples using the Federal Reserve Economic Data (FRED) API. This is a well-documented, straightforward API provided by the Research Division of the Federal Reserve Bank of St. Louis. Below, we go into detail on how the Python fredapi library provides a wrapper to the FRED API, making easy work of returning time-series data as a pandas DataFrame.
Beyond illustrating the acquisition of data payloads by calling into APIs, we provide basic examples for visualization using matplotlib and visually-enhanced reporting featuring the pandas style property.
The SAS analog for calling into a web API is PROC HTTP. We illustrate PROC SGPLOT as an analog to the pandas plot() method and the SAS ODS sub-system as an analog to the pandas style property.
When working with libraries that call web APIs, it is useful to have a diagnostic tool like the free Postman app to send well-formed GETs and POSTs over HTTP and examine the returned payloads. For the most part, these Python programs build a URL string composed of an end-point name, an authorization token, and a collection of arguments forming the request for the target data. The Postman app helps isolate testing of forming the correct API call from correctly parsing the returned payload.
fredapi Library
In this example we call the fredapi library to retrieve FRED (Federal Reserve Economic Data) time-series data. The fredapi library offers conveniences like sub-setting, parsing, and analysis services. Best of all, it returns the data as either a pandas Series or DataFrame, so we don't have to supply the parsing logic that arranges values from incoming payloads into these structures.
import os
import pandas as pd
from fredapi import Fred
FRED_API_KEY = os.environ.get("FRED_TOKEN")
fred = Fred(api_key=FRED_API_KEY)
s = fred.get_series('sp500', observation_start='2020-01-01', observation_end='2020-02-20')
print(s.tail())
Returns
2020-02-14 3380.16
2020-02-17 NaN
2020-02-18 3370.29
2020-02-19 3386.15
2020-02-20 3373.23
dtype: float64
This could hardly be any simpler!
As a good practice, the required FRED API key is retrieved from an environment variable arbitrarily named FRED_TOKEN. Refer to this short YouTube video for setting Windows or Linux environment variables.
The environ attribute of the os module enables a portable way to call operating system services. Since it behaves like a dictionary, we chain the get() method and supply the key FRED_TOKEN, returning the value assigned to this environment variable. If the target environment variable is not found, a None object is returned. You will need to restart your Python interpreter after setting an environment variable, since the environment is static and read at process start-up time.
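A minimal sketch of this lookup behavior; NO_SUCH_VARIABLE is a hypothetical, unset name used only for illustration.

```python
import os

# os.environ behaves like a dictionary of the process environment:
# get() returns the variable's value, or None (or a supplied default)
# when the variable is not set -- no KeyError is raised.
api_key = os.environ.get("FRED_TOKEN")                    # None if not set
fallback = os.environ.get("NO_SUCH_VARIABLE", "missing")  # hypothetical name
print(api_key, fallback)
```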
The fred.get_series method accepts as an argument the valid name of a series found within the FRED database, in this case the Standard and Poor's 500 daily close value. observation_start and observation_end are arguments identifying the start date and end date for the series values to be returned.
Return the S&P 500 daily close value as a Series and plot it over time.
import matplotlib.pyplot as plt
fred.get_series('sp500').plot()
plt.show()
Since the time-series is returned as a Series, the plot() method renders a line plot with dates assigned to the x-axis and corresponding values assigned to the y-axis. Not supplying the observation_start and observation_end arguments returns all values from the series.

We expand our API call example to return multiple series for comparative analysis. The U.S. Census Bureau conducts a quarterly survey of a sample of U.S. men and women, age 16 and above, asking them to self-report pre-tax wages. The unit of measure is 1982-1984 seasonally-adjusted CPI dollars. See details here.
dict = {}
dict['wmn_earnings'] = fred.get_series('LES1252882800Q')
dict['men_earnings'] = fred.get_series('LES1252881900Q')
df = pd.DataFrame.from_dict(dict)
df.plot(figsize=(12,5))
plt.legend(loc='best')
plt.show()
We need two calls to the fred.get_series method to enable comparisons for women and men. We begin by declaring an empty dictionary called dict, then add a key arbitrarily named wmn_earnings whose value is the pandas Series returned by calling the fred.get_series method. We use the same approach to return the corresponding series for men's earnings.
The code above renders a line plot. The pandas plot() method plots all labeled DataFrame columns, giving men's and women's earnings as values plotted along the Y-axis and dates plotted along the X-axis.

To understand how the dictionary key/value pairs are mapped into a DataFrame, the code below 'unpacks' the key/value pairs of the dict dictionary.
for key, value in dict.items():
    print(key, 'corresponds to', value)
Returns:
wmn_earnings corresponds to 1979-01-01 251.0
1979-04-01 250.0
1979-07-01 251.0
1979-10-01 249.0
1980-01-01 244.0
...
2018-10-01 314.0
2019-01-01 317.0
2019-04-01 319.0
2019-07-01 323.0
2019-10-01 327.0
Length: 164, dtype: float64
men_earnings corresponds to 1979-01-01 408.0
1979-04-01 405.0
1979-07-01 399.0
1979-10-01 393.0
1980-01-01 385.0
...
2018-10-01 392.0
2019-01-01 392.0
2019-04-01 394.0
2019-07-01 394.0
2019-10-01 395.0
Length: 164, dtype: float64
Once the dictionary key/value pairs are created by calling fred.get_series, the df DataFrame is constructed by calling the pd.DataFrame.from_dict method.
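A minimal sketch, using made-up labels, of how from_dict maps each dictionary key to a column and aligns the Series indexes on their union:

```python
import pandas as pd

# Two Series with partially overlapping indexes; from_dict maps each
# dictionary key to a column and aligns rows on the union of the
# index labels, filling gaps with NaN.
d = {'a': pd.Series([1.0, 2.0], index=['x', 'y']),
     'b': pd.Series([3.0, 4.0], index=['y', 'z'])}
demo = pd.DataFrame.from_dict(d)
print(demo)
```

The same alignment explains why both earnings series share a single DatetimeIndex in the df DataFrame.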
You may wonder how we acquired the date values for the X-axis. The fred.get_series method call mapped incoming JSON date values to a DatetimeIndex. This is another example of the abstraction provided by the fredapi library.
df.index
Returns:
DatetimeIndex(['1979-01-01', '1979-04-01', '1979-07-01', '1979-10-01',
'1980-01-01', '1980-04-01', '1980-07-01', '1980-10-01',
'1981-01-01', '1981-04-01',
...
'2017-07-01', '2017-10-01', '2018-01-01', '2018-04-01',
'2018-07-01', '2018-10-01', '2019-01-01', '2019-04-01',
'2019-07-01', '2019-10-01'],
dtype='datetime64[ns]', length=164, freq=None)
Now that we have the basics for producing the line plot, we can annotate it to improve its legibility.
plt.figure(figsize=(12,5))
ax = df.wmn_earnings.plot(grid = True, label = 'Women', color = 'blue')
df.men_earnings.plot(grid = True, label = 'Men', color = 'red')
ax.set_ylabel('82-84 CPI Adjusted $', fontsize='x-large')
ax.set_title('Median Weekly Earnings 16 Years Old and Above', fontsize='xx-large')
plt.legend(loc='best')
plt.figtext(0.99, 0.01, 'Source: U.S. Bureau of Labor Statistics', horizontalalignment='right')
plt.show()
Renders:

API Call Anatomy
The abstraction offered by the fredapi library makes our programs compact. The library calls are intuitive and, as noted above, handle the parsing logic mapping XML or JSON payloads into DataFrames (or Series). These features simplify coding efforts. However, this comes at a potential 'cost': our program logic now has a dependency on an open-source library which may or may not be updated if the FRED API changes.
That is why it makes sense to have a good understanding of what happens 'under-the-covers'. By following the documentation for the FRED API, we can read the details on exactly how these API calls work to return a JSON-formatted payload.
As you can see from the screenshot from the Postman app, our simple call into the FRED API is composed of the following:
| Element | Value |
|---|---|
| end-point | https://api.stlouisfed.org/fred/series/observations? |
| series_id | LES1252882800Q |
| api_key | masked |
| file_type | JSON |
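The same request can be composed by hand from these elements using the standard library, which is essentially what the fredapi library does internally. The FRED_TOKEN lookup and the 'masked' fallback below are illustrative assumptions:

```python
import os
from urllib.parse import urlencode

# Build the FRED observations request from its elements:
# end-point, series_id, api_key, and file_type.
endpoint = 'https://api.stlouisfed.org/fred/series/observations'
params = {'series_id': 'LES1252882800Q',
          'api_key': os.environ.get('FRED_TOKEN', 'masked'),
          'file_type': 'json'}
url = endpoint + '?' + urlencode(params)
print(url)
```

Sending this URL with any HTTP client (or pasting it into Postman) returns the same JSON payload the fredapi library parses on our behalf.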
PROC HTTP
Base SAS provisions the general-purpose PROC HTTP for calling into web APIs using HTTP GET and POST requests.
Analogous to the fredapi library, SAS/ETS software provisions the SASEFRED LIBNAME engine, offering conveniences such as mapping the XML payload into a SAS dataset along with other features like frequency aggregation, value formatting, etc. This feature is not illustrated here.
The Base SAS code below makes two calls to PROC HTTP to return separate series for men and women's wage data. Because end-point parameters use an ampersand (&) to delimit the URL elements, we call the %nrstr function to mask this symbol at Macro compile time. See this seminal paper on SAS Macro processing internals for details.
%let token = %sysget(FRED_TOKEN);
/******************************************************************************************/
/* Process the women's wage data */
/******************************************************************************************/
filename womens temp;
%let url = "https://api.stlouisfed.org/fred/series/observations?series_id=LES1252882800Q%nrstr(&api_key)=&token%nrstr(&file_type)=json";
proc http
   url    = &url.
   method = "GET"
   out    = womens;
run;
libname in_json1 JSON fileref=womens;
data date  (keep = value)
     values(keep = value);
   set in_json1.alldata;
   if p2 = 'date'  then output date;
   if p2 = 'value' then output values;
run;
data womens_wages(drop = temp_d temp_v);
   merge date  (rename = (value = temp_d))
         values(rename = (value = temp_v));
   date  = input(temp_d, yymmdd10.);
   value = input(temp_v, 8.);
   format date  mmddyy10.
          value 8.;
run;
/******************************************************************************************/
/* Process the men's wage data */
/******************************************************************************************/
filename mens temp;
%let url = "https://api.stlouisfed.org/fred/series/observations?series_id=LES1252881900Q%nrstr(&api_key)=&token%nrstr(&file_type)=json";
proc http
   url    = &url.
   method = "GET"
   out    = mens;
run;
libname in_json2 JSON fileref=mens;
data date  (keep = value)
     values(keep = value);
   set in_json2.alldata;
   if p2 = 'date'  then output date;
   if p2 = 'value' then output values;
run;
data mens_wages(drop = temp_d temp_v);
   merge date  (rename = (value = temp_d))
         values(rename = (value = temp_v));
   date  = input(temp_d, yymmdd10.);
   value = input(temp_v, 8.);
   format date  mmddyy10.
          value 8.;
run;
data all_wages;
   merge womens_wages(rename=(value=womens))
         mens_wages  (rename=(value=mens));
run;
title "Median Weekly Earnings 16 Years Old and Above";
proc sgplot data=all_wages;
   series x = date y = mens   / lineattrs = (color = red);
   series x = date y = womens / lineattrs = (color = blue);
   xaxis label = 'Date' grid;
   yaxis label = '82-84 CPI Adjusted $' grid values = (225 to 425 by 25);
   keylegend / location=inside position=bottomright across=1;
run;
Renders:
The use of the JSON LIBNAME engine handles the parsing logic, loading the incoming JSON into a SAS dataset. This JSON payload returns all values as strings. In the Python example, mapping the JSON string values to floats and the date strings to a DatetimeIndex is handled by the fredapi library, as shown by examining the source for its _parse helper function.
In the case of SAS, the JSON LIBNAME engine returns variable values as strings. As a result, we call the INPUT function with the corresponding informats, mapping string values representing dates to SAS date values. Similarly, we convert the corresponding string variable values to numerics. Absent this logic, PROC SGPLOT would render the Y-axis values in alphabetical rather than ascending numerical order.
requests library
While the fredapi library is specific to the FRED API, in many circumstances mastering the general-purpose requests library suits a wide range of public APIs.
We begin by making API calls into GitHub's REST API. Unlike the FRED API, calls into GitHub require an authorization header as part of the GET or POST; the FRED API permits passing credentials as parameters on the URL string. Observe the Postman GET request below:
Notice that rather than passing key/value pairs as URL parameters like the FRED API example, these key/value pairs are passed as part of the HTTP header. In the image above, the Postman UI highlights the Headers tab.
The requests library provisions the ability to pass an end-point and an optional headers parameter to the requests.get method.
import os
import requests
token = os.environ.get("GIT_TOKEN")
header = {'Content-Type': 'application/json', 'Authorization': 'token ' + token}
r = requests.get('https://api.github.com/user', headers=header)
json_dict = r.json()
for key, value in json_dict.items():
    print(key + ':', value)
Returns:
login: RandyBetancourt
id: 22532833
node_id: MDQ6VXNlcjIyNTMyODMz
avatar_url: https://avatars3.githubusercontent.com/u/22532833?v=4
gravatar_id:
url: https://api.github.com/users/RandyBetancourt
html_url: https://github.com/RandyBetancourt
followers_url: https://api.github.com/users/RandyBetancourt/followers
following_url: https://api.github.com/users/RandyBetancourt/following{/other_user}
gists_url: https://api.github.com/users/RandyBetancourt/gists{/gist_id}
starred_url: https://api.github.com/users/RandyBetancourt/starred{/owner}{/repo}
subscriptions_url: https://api.github.com/users/RandyBetancourt/subscriptions
organizations_url: https://api.github.com/users/RandyBetancourt/orgs
repos_url: https://api.github.com/users/RandyBetancourt/repos
events_url: https://api.github.com/users/RandyBetancourt/events{/privacy}
received_events_url: https://api.github.com/users/RandyBetancourt/received_events
type: User
site_admin: False
name: Randy Betancourt
company: None
blog: https://www.pythonforsasusers.com
location: None
email: None
hireable: None
bio: Author and analytics consultant
public_repos: 3
public_gists: 0
followers: 9
following: 0
created_at: 2016-09-29T23:17:29Z
updated_at: 2020-02-27T20:19:15Z
In this example, the headers= argument is the dictionary named header, sending these key/value pairs along with the request to the requests.get method:
'Content-Type' : 'application/json'
'Authorization' : 'token ' + token
With the r object defined we can manipulate the returned content. The line:
json_dict = r.json()
calls the requests library's json() decoder, defining the json_dict dictionary. Assuming the request is valid, the JSON stream from the web server behind the API is written as key/value pairs into the json_dict dictionary; there is no need to supply parsing or flattening logic for the returned JSON stream. The decoder maps JSON types to native Python types: strings remain strings, numbers become ints or floats, booleans become True/False, and null becomes None. When a payload delivers numbers as quoted strings, however, we need to write logic to cast them ourselves.
We could call the pd.DataFrame.from_dict(json_dict) method to create a DataFrame from this dictionary (not shown).
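The type mapping performed by the decoder can be seen with the standard library's json module on a miniature, made-up fragment of the payload above; the json() method of a requests Response behaves the same way:

```python
import json

# JSON strings decode to str, numbers to int/float, booleans to bool,
# and null to None -- no numeric casting logic is needed on our side.
payload = '{"login": "RandyBetancourt", "id": 22532833, "site_admin": false, "company": null}'
d = json.loads(payload)
print(type(d['id']).__name__, d['site_admin'], d['company'])
```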
Exploring the requests library further, we wish to create a DataFrame for recorded low and high temperatures from one of the thousands of NOAA weather stations in the U.S. Cody Gette, from the University of North Dakota, has a great write-up named Using an API and NOAA.
To start, we acquire a token from NOAA to access the Climate Data Online (CDO) API.
We construct the function noaa_json_data() to return low, high, and average daily temperatures from a single station. An obvious improvement is establishing default values for each parameter and passing them as a dictionary to the function call. Another useful addition is adding try/except/finally blocks for flow control to handle errors and events. As my graduate studies advisor frequently said, "This is an exercise left to the reader".
from datetime import date, datetime
def noaa_json_data():
    endpoint = "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"
    datasetid = 'GHCND'
    stationid = 'GHCND:USR0000ICOP'
    limit = 1000
    startdate = date(2020, 1, 1)
    enddate = date(2020, 1, 14)
    params = 'datasetid=' + str(datasetid) + '&' + \
             'stationid=' + str(stationid) + '&' + \
             'limit=' + str(limit) + '&' + \
             'startdate=' + startdate.strftime("%Y-%m-%d") + '&' + \
             'enddate=' + enddate.strftime("%Y-%m-%d") + '&' + \
             'units=standard'
    # token is assumed to hold the NOAA CDO token acquired above
    r = requests.get(endpoint, params=params, headers={'token': token})
    print("Sending URL: " + r.url)
    print("Status Code from NOAA: " + str(r.status_code))
    df = pd.DataFrame.from_dict(r.json()['results'])
    df.date = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S')
    return df
Since JSON has no native date type, the decoded payload delivers dates as strings; we call the to_datetime method, casting the date values from strings to a datetime64 column type.
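The cast can be seen in isolation on a couple of made-up ISO-8601 strings like those NOAA returns:

```python
import pandas as pd

# ISO-8601 strings become a datetime64[ns] column after to_datetime.
raw = pd.Series(['2020-01-01T00:00:00', '2020-01-02T00:00:00'])
converted = pd.to_datetime(raw, format='%Y-%m-%dT%H:%M:%S')
print(converted.dtype)
```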
df_tmp1 = noaa_json_data()
The code above calls the function; because the function returns a DataFrame, this assignment creates the df_tmp1 DataFrame. A Python def statement only defines the function; its body is not executed until the function is called. The function calls print() to echo the formed URL sent to the API, and a subsequent print() call reports the HTTP status code.
Sending URL: https://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&stationid=GHCND:USR0000ICOP&limit=1000&startdate=2020-01-01&enddate=2020-01-14&units=standard
Status Code from NOAA: 200
Printing the first 6 rows of the df_tmp1 DataFrame displays values for daily average temperature (TAVG), daily maximum (TMAX), and daily minimum (TMIN).
print(df_tmp1.head(6))
Returns:
date datatype station attributes value
0 2020-01-01 TAVG GHCND:USR0000ICOP ,,U, 24.0
1 2020-01-01 TMAX GHCND:USR0000ICOP H,,U, 29.0
2 2020-01-01 TMIN GHCND:USR0000ICOP H,,U, 17.0
3 2020-01-02 TAVG GHCND:USR0000ICOP ,,U, 18.0
4 2020-01-02 TMAX GHCND:USR0000ICOP H,,U, 27.0
5 2020-01-02 TMIN GHCND:USR0000ICOP H,,U, 5.0
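The improvements suggested earlier, parameter defaults and try/except for flow control, can be sketched as below. This variant, its name, and its defaults are illustrative assumptions rather than part of the original example; token is assumed to hold the NOAA CDO token.

```python
import requests
import pandas as pd

def noaa_json_data_v2(datasetid='GHCND',
                      stationid='GHCND:USR0000ICOP',
                      startdate='2020-01-01',
                      enddate='2020-01-14',
                      limit=1000):
    """Hypothetical variant of noaa_json_data() with defaults and error handling."""
    endpoint = 'https://www.ncdc.noaa.gov/cdo-web/api/v2/data'
    params = {'datasetid': datasetid, 'stationid': stationid,
              'limit': limit, 'startdate': startdate,
              'enddate': enddate, 'units': 'standard'}
    try:
        # token is assumed to be defined earlier with the NOAA CDO token
        r = requests.get(endpoint, params=params,
                         headers={'token': token}, timeout=30)
        r.raise_for_status()
    except requests.exceptions.RequestException as e:
        print('Request failed:', e)
        return None
    df = pd.DataFrame.from_dict(r.json()['results'])
    df.date = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S')
    return df
```

Passing parameters as a dict lets requests handle URL encoding, and raise_for_status() turns HTTP error codes into exceptions the caller can catch.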
For our needs we wish to "flatten", or unstack, the DataFrame so that each row contains the average, maximum, and minimum reading for a given day. Below, the first three lines split the df_tmp1 DataFrame into three DataFrames based on the temperature reading type.
tmax = df_tmp1.loc[df_tmp1['datatype'] == 'TMAX']
tmin = df_tmp1.loc[df_tmp1['datatype'] == 'TMIN']
tavg = df_tmp1.loc[df_tmp1['datatype'] == 'TAVG']
df = pd.merge(pd.merge(tmax, tmin, how='inner', on='date', suffixes=['_TMAX', '_TMIN']),
              tavg, how='inner', on='date')
To 'glue' the three DataFrames together, associating the three temperature values with a single date value, we nest two calls to the pd.merge method, each performing an inner join returning rows where the date values match in the two input DataFrames.
df.head(3)
Returns:
             date_TMAX datatype_TMAX       station_TMAX attributes_TMAX  value_TMAX    rec_date            date_TMIN datatype_TMIN  ...
0  2020-01-01T00:00:00          TMAX  GHCND:USR0000ICOP           H,,U,        29.0  2020-01-01  2020-01-01T00:00:00          TMIN  ...
1  2020-01-02T00:00:00          TMAX  GHCND:USR0000ICOP           H,,U,        27.0  2020-01-02  2020-01-02T00:00:00          TMIN  ...
2  2020-01-03T00:00:00          TMAX  GHCND:USR0000ICOP           H,,U,        35.0  2020-01-03  2020-01-03T00:00:00          TMIN  ...
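The role of the suffixes= parameter in disambiguating the overlapping column names can be seen on a made-up pair of frames:

```python
import pandas as pd

# Both frames carry a 'value' column; suffixes= renames the
# overlapping columns on the merged result.
tmax_demo = pd.DataFrame({'date': ['2020-01-01', '2020-01-02'],
                          'value': [29.0, 27.0]})
tmin_demo = pd.DataFrame({'date': ['2020-01-01', '2020-01-02'],
                          'value': [17.0, 5.0]})
merged = pd.merge(tmax_demo, tmin_demo, how='inner', on='date',
                  suffixes=['_TMAX', '_TMIN'])
print(merged)
```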
To simplify matters, we create the new wd DataFrame by supplying a list of the column names we wish to copy from the df DataFrame, similar to a SAS KEEP list. Recall the outer brackets ([ ]) are the indexer slicing DataFrame columns and the inner brackets denote the list of column names.
wd = df[['date', 'value_TMAX', 'value_TMIN', 'value']].copy()   # copy() avoids a SettingWithCopyWarning
wd.set_index('date', inplace=True)
wd.rename(columns={'value': 'value_TAVG'})
Returns:
value_TMAX value_TMIN value_TAVG
date
2020-01-01 29.0 17.0 24.0
2020-01-02 27.0 5.0 18.0
2020-01-03 35.0 4.0 14.0
2020-01-04 34.0 0.0 21.0
2020-01-05 25.0 -2.0 8.0
2020-01-06 25.0 -10.0 6.0
2020-01-07 33.0 -2.0 19.0
2020-01-08 30.0 5.0 19.0
2020-01-09 20.0 -4.0 8.0
2020-01-10 18.0 -17.0 4.0
2020-01-11 28.0 -6.0 10.0
2020-01-12 20.0 -9.0 11.0
2020-01-13 17.0 -4.0 10.0
2020-01-14 18.0 0.0 13.0
An alternative approach to "flattening" the df DataFrame uses pandas GroupBy and unstack(). See the section on groupby operations here for further details and additional examples.
#call the function
df_tmp2 = noaa_json_data()
#create row counter
df_tmp2['rc'] = df_tmp2.groupby('date').cumcount() + 1
#pivot the dataframe
wd2 = df_tmp2.set_index(['date', 'rc']).unstack()
#rename the columns
wd2.columns = [x + '_' + str(y) for (x, y) in wd2.columns]
# keep the measurement columns
wd2 = wd2[['value_1', 'value_2', 'value_3']]
We call our defined noaa_json_data() function again to illustrate and execute the following steps:
1. Create the row counter column rc with a groupby for the date column.
2. Pivot the df_tmp2 DataFrame by calling the unstack() method, creating wd2. Columns date and rc become indexes.
3. Rename the wd2 columns since they act as labels in plots and reports.
The rc column is needed as a means to 'trigger' the pivot when creating the wd2 DataFrame. Without an action like sum(), mean(), or agg(), a groupby object is only a definition, much like an SQL view, until it is evaluated. See the section on pivot here for details and additional examples.
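The cumcount/unstack pattern can be seen on a tiny synthetic frame mimicking the NOAA layout; to keep the sketch small, only the value column is pivoted here:

```python
import pandas as pd

# Three readings per date; cumcount() numbers them 1..3 within each
# date, and unstack() pivots that counter into columns.
demo = pd.DataFrame({
    'date': ['2020-01-01'] * 3 + ['2020-01-02'] * 3,
    'datatype': ['TAVG', 'TMAX', 'TMIN'] * 2,
    'value': [24.0, 29.0, 17.0, 18.0, 27.0, 5.0]})
demo['rc'] = demo.groupby('date').cumcount() + 1
wide = demo.set_index(['date', 'rc'])['value'].unstack()
print(wide)
```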
wd2 = wd2.rename(columns={"value_1": "Average",
                          "value_2": "Maximum",
                          "value_3": "Minimum"})
The pandas rename() method accepts a dictionary where the key is the old column name and value is the new name.
Returns:
Average Maximum Minimum
date
2020-01-01 24.0 29.0 17.0
2020-01-02 18.0 27.0 5.0
2020-01-03 14.0 35.0 4.0
2020-01-04 21.0 34.0 0.0
2020-01-05 8.0 25.0 -2.0
2020-01-06 6.0 25.0 -10.0
2020-01-07 19.0 33.0 -2.0
2020-01-08 19.0 30.0 5.0
2020-01-09 8.0 20.0 -4.0
2020-01-10 4.0 18.0 -17.0
2020-01-11 10.0 28.0 -6.0
2020-01-12 11.0 20.0 -9.0
2020-01-13 10.0 17.0 -4.0
2020-01-14 13.0 18.0 0.0
We can easily visualize this DataFrame.
import matplotlib.pyplot as plt
wd2.plot()
plt.show()