DataFrame Index

SAS users tend to think of indexing SAS data sets to improve query performance or avoid dataset sorting.   A SAS index stores values in ascending order for a specific variable or variables and manages information on how to locate a given observation(s) in the dataset.

pandas automatically creates an index structure at DataFrame creation time for both rows and columns.   We ecountered the RangeIndex used as the default row index for DataFrame rows and columns.   It is responsible for holding the axis labels and other metadata like integer-based location identifiers,   axis names,   etc.

Think of a DataFrame index as a means for labeling rows and columns.   Recall that at DataFrame creation time,  the RangeIndex object is created as the default index similar to the automatic _N_ variable SAS establishes at dataset creation time.

Just as you can return SAS dataset observations using the automatic variable _N_,   a DataFrame's default RangeIndex return,   or slice rows and columns using a zero-based offset.

Create Index

When a DataFrame is assigned an index,   rows are accessible by supplying integer values from the RangeIndex as well the row or column label defined by an index.

Build and display the df Dataframe.

import pandas as pd
df = pd.DataFrame([['0071', 'Patton'  , 17,  27],
                   ['1001', 'Joyner'  , 13,  22],
                   ['0091', 'Williams', 111, 121],
                   ['0110', 'Jurat'   , 51,  55]],
         columns = ['ID',   'Name', 'Before', 'After'])
print(df)
         ID      Name  Before  After
    0  0071    Patton      17     27
    1  1001    Joyner      13     22
    2  0091  Williams     111    121
    3  0110     Jurat      51     55


Call the set_index method with the inplace=True argument and display the DataFrame with ID column as the index.

df.set_index('ID', inplace=True)
print(df)
          Name  Before  After
ID
0071    Patton      17     27
1001    Joyner      13     22
0091  Williams     111    121
0110     Jurat      51     55

Observe how the rows are now labeled with values from the ID column.

Indexers

Slice DataFrame rows or columns using three indexers.

1.  [ ] operator enables selection by columns or by rows.

2.  .loc indexer uses row and column labels for slicing.   Column labels and row labels are assigned to an index,  either with the index= parameter at DataFrame creation time or with the df.set_index method.

3.  .iloc indexer uses integer positions (from 0 to length-1 of the axis) for sub-setting rows and columns provided by the default RangeIndex.

All indexers return a DataFrame.

Slice Columns by Position

Construct and print i DataFrame.

import pandas as pd
i = pd.DataFrame([['Patton'   , 17,  27],
                  ['Joyner'   , 13,  22],
                  ['Williams' , 111, 121],
                  ['Jurat'    , 51,  55],
                  ['Aden'     , 71,  70]])
print(i)
          0    1    2
0    Patton   17   27
1    Joyner   13   22
2  Williams  111  121
3     Jurat   51   55
4      Aden   71   70


Display the default RangeIndex for the rows and columns.

print(' Row Index:   ',    i.index, '\n', 'Column Index:', i.columns)
 Row Index:    RangeIndex(start=0, stop=5, step=1)
 Column Index: RangeIndex(start=0, stop=3, step=1)


Slice column 0 (zero).

i[0]
0      Patton
1      Joyner
2    Williams
3       Jurat
4        Aden

The [ ] operator accepts a Python list of columns,  or rows to return.   Recall a Python list is a mutable structure holding a collection of items.   List literals are written within square brackets [ ] with commas (,) to indicate multiple list items.

Create the df DataFrame.   Slice by placing the column labels Name and After in a list and display first 4 rows.

import pandas as pd
df = pd.DataFrame([['I','North', 'Patton', 17, 27],
                   ['I', 'South','Joyner', 13, 22],
                   ['I', 'East', 'Williams', 111, 121],
                   ['I', 'West', 'Jurat', 51, 55],
                   ['II','North', 'Aden', 71, 70],
                   ['II', 'South', 'Tanner', 113, 122],
                   ['II', 'East', 'Jenkins', 99, 99],
                   ['II', 'West', 'Milner', 15, 65],
                   ['III','North', 'Chang', 69, 101],
                   ['III', 'South','Gupta', 11, 22],
                   ['III', 'East', 'Haskins', 45, 41],
                   ['III', 'West', 'LeMay', 35, 69],
                   ['III', 'West', 'LeMay', 35, 69]],
                   columns=['District', 'Sector', 'Name', 'Before', 'After'])

df[['Name', 'After']].head(4)
       Name  After
0    Patton     27
1    Joyner     22
2  Williams    121
3     Jurat     55


The Python list with [' Name', 'After'] inside the DataFrame slice operator,  [ ] results in two pair of brackets [ ].  The outer pair is the syntax for the DataFrame [ ] indexer while the inner pair hold the column names in a Python list.

SAS analog.

data df;
length sector $ 6
       name $ 8
       district $ 3;

infile cards dlm=',';
input district $
       sector $
       name $
       before
       after;
list;
datalines;
I,   North, Patton,   17,  27
I,   South, Joyner,   13,  22
I,   East,  Williams, 111, 121
I,   West,  Jurat,    51,  55
II,  North, Aden,     71,  70
II,  South, Tanner,   113, 122
II,  East,  Jenkins,  99,  99
II,  West,  Milner,   15,  65
III, North, Chang,    69,  101
III, South, Gupta,    11,  22
III, East,  Haskins,  45,  41
III, West,  LeMay,    35,  69
III, West,  LeMay,    35,  69
;;;;
run;

proc print data = df(obs=4);
   var name after;
run;

PROC PRINT output.

PROC PRINT

Slice Rows by Position

The syntax for the DataFrame [ ] indexer is:

df:[start : stop : step]

The start position is included in the output and the stop position is not.  An empty start position implies 0 (zero).

df[:3]
District Sector      Name  Before  After
0        I  North    Patton      17     27
1        I  South    Joyner      13     22
2        I   East  Williams     111    121


Return every other row.

df[::2]
   District Sector      Name  Before  After
0         I  North    Patton      17     27
2         I   East  Williams     111    121
4        II  North      Aden      71     70
6        II   East   Jenkins      99     99
8       III  North     Chang      69    101
10      III   East   Haskins      45     41
12      III   West     LeMay      35     69


Return every other row with SAS.

data df1;
   set df;
if mod(_n_, 2) ^= 0 then output;
run;
NOTE: There were 13 observations read from the data set WORK.DF.
NOTE: The data set WORK.DF1 has 7 observations and 5 variables

PROC PRINT


Slice Rows and Columns by Position

The .iloc indexer uses integer positions (from 0 to length-1 of the axis) using the default RangeIndex to slice rows and columns.  The syntax for the .iloc indexer is:

df.iloc[row selection, column selection]


Slice first and last row.

df.iloc[[0, -1]]
       District Sector    Name  Before  After
Name
Patton        I  North  Patton      17    100
LeMay       III   West   LeMay      35     69


SAS Analog.

data df1;
   set df end = last;

if name in ('Patton', 'Jurat', 'Gupta') then after = 100;
if _n_  = 1 then output;
if last = 1 then output;
run;

proc print data = df1 noobs;  
run;

PROC PRINT

Slice Rows and Columns by Label

The .loc indexer is a method for returning rows and columns by labels.

To retrieve rows from the df DataFrame by label,   set the Name column as the DataFrame index.  This action assigns values from the Name column as labels for the rows.

Dislay the df DataFrame before index creation:

print(df)
   District Sector      Name  Before  After
0         I  North    Patton      17     27
1         I  South    Joyner      13     22
2         I   East  Williams     111    121
3         I   West     Jurat      51     55
4        II  North      Aden      71     70
5        II  South    Tanner     113    122
6        II   East   Jenkins      99     99
7        II   West    Milner      15     65
8       III  North     Chang      69    101
9       III  South     Gupta      11     22
10      III   East   Haskins      45     41
11      III   West     LeMay      35     69
12      III   West     LeMay      35     69


Call the .index attribute for DataFrame df.

print(df.index)
RangeIndex(start=0, stop=13, step=1)


Create the index in-place and display first 4 rows.   drop=True drops the Name column from the DataFrame although it is still available to label rows.

df.set_index('Name', inplace = True, drop = True)
print(df.head(4))
         District Sector  Before  After
Name
Patton          I  North      17     27
Joyner          I  South      13     22
Williams        I   East     111    121
Jurat           I   West      51     55


Call the .index attribute to display the new index.

print(df.index)
Index(['Patton', 'Joyner', 'Williams', 'Jurat', 'Aden', 'Tanner', 'Jenkins','Milner', 'Chang', 'Gupta', 'Haskins', 'LeMay', 'LeMay'],
dtype='object', name='Name')


Using the index,  slice the DataFrame.  The syntax for the .loc indexer is:

df.loc[row selection, column selection]


Slice rows beginning with label Patton ending with label Aden inclusive.  No value after the comma requests all columns.

df.loc['Patton': 'Aden', ]
         District Sector  Before  After
Name
Patton          I  North      17     27
Joyner          I  South      13     22
Williams        I   East     111    121
Jurat           I   West      51     55
Aden           II  North      71     70


Slice rows and columns.  Column slice uses a list.

df.loc['LeMay', ['Before','After']]
       Before  After
Name
LeMay      35     69
LeMay      35     69


Conditional Slicing

Slice rows with Boolean operators where Sector equals West and Before greater than 20.

df.loc[(df['Sector'] == 'West') & (df['Before'] > 20)]
      District Sector  Before  After
Name
Jurat        I   West      51     55
LeMay      III   West      35     69
LeMay      III   West      35     69


Slice rows where values in the Name column ending in "r" by chaining the str.endswith method,   keep columns District and Sector.

df.loc[df['Name'].str.endswith("r"), ['Name, District', 'Sector']]
KeyError: 'the label [Name] is not in the [index]'


This raises a KeyError since we dropped the Name column earlier.   One remedy is reset the index,   'returning' Name as a column.

df.reset_index(inplace = True)
df.loc[df['Name'].str.endswith("r"), ['Name', 'District', 'Sector']]
     Name District Sector
1  Joyner        I  South
5  Tanner       II  South
7  Milner       II   West 


A better approach:   Keep the Name column as the index and chain the .str.endswith attribute to the .index attribute.

df.set_index('Name', inplace = True, drop = True) 
df.loc[df.index.str.endswith('r'), ['District', 'Sector']]
       District Sector
Name
Joyner        I  South
Tanner       II  South
Milner       II   West


SAS analog.

proc sql;
   select name
         ,district
         ,sector
   from df
where substr(reverse(trim(name)),1,1) = 'r';
quit;

PROC PRINT

The <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html"isin attribute returns a Boolean indicating if a Python list of elements are values found in the target DataFrame column Sector.

df.loc[df['Sector'].isin(['North', 'South'])]
       District Sector  Before  After
Name
Patton        I  North      17     27
Joyner        I  South      13     22
Aden         II  North      71     70
Tanner       II  South     113    122
Chang       III  North      69    101
Gupta       III  South      11     22


SAS analog with the <a href="http://support.sas.com/documentation/cdl//en/sqlproc/69822/HTML/default/viewer.htm#n17vpypuojeg9zn1psc26z2ymju5.htm"IN condition.

proc sql;
   select *
   from df
where sector in ('North', 'South');
quit;

PROC SQL

Updating with .loc Indexer

The .loc indexer can update values.   Slice rows and columns before update.

df.loc[['Patton', 'Jurat', 'Gupta'], 'After']
Name
Patton    27
Jurat     55
Gupta     22
Name: After, dtype: int64


Update value for After column and display results.

df.loc[['Patton', 'Jurat', 'Gupta'], ['After']] = 100
df.loc[['Patton', 'Jurat', 'Gupta'], 'After']
Name
Patton    100
Jurat     100
Gupta     100
Name: After dtype: int64


SAS analog.

data df;
   set df;
   if _n_ = 1 then put
      'Name     After';

   if name in ('Patton', 'Jurat', 'Gupta') then do;
      after = 100;
      put @1 name @10 after;
   end;
run;
Name     After
Patton   100
Jurat    100
Gupta    100


MultiIndexing

This section introduces the DataFrame   MultiIndex,   also known as hierarchical indexing.   Often the data for analysis is captured at the detail level.   As part of performing an exploratory analysis,   a MultiIndex DataFrame provides a multi-dimensional ‘view’ of data.

Call the DataFrame contructor method to create the tickets DataFrame.   Set the idx row MultiIndex with Year as the outer-level and Month as the inner-level using the .from_product attribute.   Set the columns MultiIndex with Area as the outer-level and When as the inner-level using the .from_product attribute.

import pandas as pd
import numpy as np
np.random.seed(654321)
idx = pd.MultiIndex.from_product([[2015, 2016, 2017, 2018],
                         [1, 2, 3]],
                 names = ['Year', 'Month'])
columns=pd.MultiIndex.from_product([['City' , 'Suburbs', 'Rural'],
                         ['Day' , 'Night']],
                 names = ['Area', 'When'])

data = np.round(np.random.randn(12, 6),2)
data = abs(np.floor_divide(data[:] * 100, 5))

tickets = pd.DataFrame(data, index=idx, columns = columns).sort_index().sort_index(axis=1)


Display the tickets DataFrame.

print(tickets)
Area        City       Rural       Suburbs
When         Day Night   Day Night     Day Night
Year Month
2015 1      15.0  18.0   9.0   3.0     3.0   3.0
     2      11.0  18.0   3.0  30.0    42.0  15.0
     3       5.0  54.0   7.0   6.0    14.0  18.0
2016 1      11.0  17.0   1.0   0.0    11.0  26.0
     2       7.0  23.0   3.0   5.0    19.0   2.0
     3       9.0  17.0  31.0  48.0     2.0  17.0
2017 1      21.0   5.0  22.0  10.0    12.0   2.0
     2       5.0  33.0  19.0   2.0     7.0  10.0
     3      31.0  12.0  19.0  17.0    14.0   2.0
2018 1      25.0  10.0   8.0   4.0    20.0  15.0
     2      35.0  14.0   9.0  14.0    10.0   1.0
     3       3.0  32.0  33.0  21.0    24.0   6.0


Print the row Multi-index.

MultiIndex(levels=[[2015, 2016, 2017, 2018], [1, 2, 3]],
           codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]],
           names=['Year', 'Month'])

To slice DataFrame rows refer to: [2015, 2016, 2017, 2018] as the outer level of the MultiIndex to indicate Year and: [1, 2, 3] as the inner level of the MultiIndex to indicate Month for row slicing.

To slice columns refer to: ['City', 'Rural', 'Suburbs'] as the outer levels of the of the MultiIndex to indicate Area and: ['Day', 'Night'] as the inner portion of the MultiIndex to indicate When for column slicing.

Create the analog ticketsdataset with PROC TABULATE to render data shaped like the tickets DataFrame.  The Python and SAS code call different random number generators.

data tickets;
   length Area $ 7
          When $  9;
   call streaminit(123456);
   do year = 2015, 2016, 2017, 2018;
      do month = 1, 2, 3;
         do area = 'City', 'Rural', 'Suburbs';
            do when = 'Day', 'Night';
               tickets = abs(int((rand('Normal')*100)/5));
               output;
            end;
         end;
      end;
   end;
run;

proc tabulate;
   var tickets;;
   class area 
         when 
         year 
         month;
   table year * month ,
         area=' ' * when=' ' * sum=' ' * tickets=' ';
run;

PROC TABULATE

Slicing with MultiIndexes

A MultiIndex has the ability to slice by “partial” labels identifying data subgroups.   Partial selection “drops” levels of the hierarchical index analogous to row and column slicing.

tickets['Rural']
When         Day  Night
Year Month
2015 1       9.0    3.0
     2       3.0   30.0
     3       7.0    6.0
2016 1       1.0    0.0
     2       3.0    5.0
     3      31.0   48.0
2017 1      22.0   10.0
     2      19.0    2.0
     3      19.0   17.0
2018 1       8.0    4.0
     2       9.0   14.0
     3      33.0   21.0


For each month how many tickets were issued in the city during night time?

tickets['City', 'Night']
Year  Month
2015  1        18.0
      2        18.0
      3        54.0
2016  1        17.0
      2        23.0
      3        17.0
2017  1         5.0
      2        33.0
      3        12.0
2018  1        10.0
      2        14.0
      3        32.0


Create and print the sum_tickets DataFrame by applying the sum function to return the sum of all tickets by year.

sum_tickets = tickets.sum(level = 'Year')
print(sum_tickets)
Area   City        Rural      Suburbs
When   Day  Night  Day  Night Day  Night
Year
2015  31.0  90.0  19.0  39.0  59.0  36.0
2016  27.0  57.0  35.0  53.0  32.0  45.0
2017  57.0  50.0  60.0  29.0  33.0  14.0
2018  63.0  56.0  50.0  39.0  54.0  22.0


The SAS analog.

ods output 
   table = sum_tickets (keep = area
                                when
                                year
                                tickets_sum);
proc tabulate data=tickets;
   var tickets;
   class area 
         when 
         year;
   table year,
         area=' ' * when=' ' * sum=' ' * tickets=' ';run;

ods output close;
proc print data = sum_tickets;
run;

PROC TABULATE

Apply the sum function along a column.

sum_tickets2 = tickets.sum(level = 'Area', axis=1)
print(sum_tickets2)
Area        City  Rural  Suburbs
Year Month
2015 1      33.0   12.0      6.0
     2      29.0   33.0     57.0
     3      59.0   13.0     32.0
2016 1      28.0    1.0     37.0
     2      30.0    8.0     21.0
     3      26.0   79.0     19.0
2017 1      26.0   32.0     14.0
     2      38.0   21.0     17.0
     3      43.0   36.0     16.0
2018 1      35.0   12.0     35.0
     2      49.0   23.0     11.0
     3      35.0   54.0     30.0


Advanced Slicing with MultiIndexes

Slicing rows and columns with the .loc indexer can be used with a MultiIndexed DataFrame using similar syntax.   It supports Boolean logic for filtering.

Slice rows for 2018 and return all columns.

tickets.loc[2018]
Area   City       Rural       Suburbs
When    Day Night   Day Night     Day Night
Month
1      25.0  10.0   8.0   4.0    20.0  15.0
2      35.0  14.0   9.0  14.0    10.0   1.0
3       3.0  32.0  33.0  21.0    24.0   6.0


Slice rows for month 3,   year 2018 and all columns.

tickets.loc[2018, 3, :]
Area       City       Rural       Suburbs
When        Day Night   Day Night     Day Night
Year Month
2018 3      3.0  32.0  33.0  21.0    24.0   6.0


MultiIndex Slicing Rows and Columns

Slice the 3rd month for each year.   Seems like the correct syntax.

tickets.loc[(:,3),:]
>>> tickets.loc[(:,3),:]
  File "<stdin>", line 1
    tickets.loc[(:,3),:]
                 ^
SyntaxError: invalid syntax

However,   a : (colon) is illeage inside a tuple.   As a convenience Python’s built-in slice(None) function selects all the content for a level.   Let's try again.

tickets.loc[(slice(None), 3), :]
Area        City       Rural       Suburbs
When         Day Night   Day Night     Day Night
Year Month
2015 3       5.0  54.0   7.0   6.0    14.0  18.0
2016 3       9.0  17.0  31.0  48.0     2.0  17.0
2017 3      31.0  12.0  19.0  17.0    14.0   2.0
2018 3       3.0  32.0  33.0  21.0    24.0   6.0

The syntax slice(None) is the slicer for the Year column to include all values for a given level,  in this case,  2015 to 2018 followed by 3 to designate the level for month.   Empty values for the column selection using : (colon) returns all columns.

Slice months 2 and 3 for all years and no column slices.

tickets.loc[(slice(None), slice(2,3)), :]
 Area        City       Rural       Suburbs
When         Day Night   Day Night     Day Night
Year Month
2015 2      11.0  18.0   3.0  30.0    42.0  15.0
     3       5.0  54.0   7.0   6.0    14.0  18.0
2016 2       7.0  23.0   3.0   5.0    19.0   2.0
     3       9.0  17.0  31.0  48.0     2.0  17.0
2017 2       5.0  33.0  19.0   2.0     7.0  10.0
     3      31.0  12.0  19.0  17.0    14.0   2.0
2018 2      35.0  14.0   9.0  14.0    10.0   1.0
     3       3.0  32.0  33.0  21.0    24.0   6.0


Eventually we have difficulty supplying a collection of tuples for the slicers used by the .loc indexer. pandas provide the IndexSlice object to deal with this situation.

idx = pd.IndexSlice
tickets.loc[idx[2015:2018, 2:3], :]
Area        City       Rural       Suburbs
When         Day Night   Day Night     Day Night
Year Month
2015 2      11.0  18.0   3.0  30.0    42.0  15.0
     3       5.0  54.0   7.0   6.0    14.0  18.0
2016 2       7.0  23.0   3.0   5.0    19.0   2.0
     3       9.0  17.0  31.0  48.0     2.0  17.0
2017 2       5.0  33.0  19.0   2.0     7.0  10.0
     3      31.0  12.0  19.0  17.0    14.0   2.0
2018 2      35.0  14.0   9.0  14.0    10.0   1.0
     3       3.0  32.0  33.0  21.0    24.0   6.0

In the above example,  tickets.loc[idx[2015:2018, 2:3], :]   return years 2015:2018 inclusive on the outer level of the row MultiIndex and months 2 and 3 inclusive on the inner level.   The : (colon) designates the start and stop positions for the row labels.   Following the row slicer is a , (comma) designating the column slicer,   in this case,   all columns.


Request months 2 and 3 for year 2018 as the row slice with City and Rural from the Areaouter-level column slice.

idx = pd.IndexSlice
tickets.loc[idx[2018:, 2:3 ], idx['City', 'Day' : 'Night']]
Area        City
When         Day Night
Year Month
2018 2      35.0  14.0
     3       3.0  32.0

In the example above,  the column slicer does not slice along the inner level of the MultiIndex for When.


MultiIndex Conditional Slicing

The .loc indexer can use a Boolean Mask for slicing based an criteria applied to values in the DataFrame.

Which months are the number of issued tickets in the city during day time exceeding 25?   Create a Boolean mask representing a slicer and apply it to the tickets DataFrame.

mask = tickets[('City' ,'Day' )] > 25
tickets.loc[idx[mask], idx['City', 'Day']]
Year  Month
2017  3        31.0
2018  2        35.0
Name: (City, Day), dtype: float64

In the example above,   rows are sliced conditionally by defined mask object.   Columns are sliced using City from the outer-level Area and Day from the inner-level When.

The where attribute returns a DataFrame the same size as the original whose corresponding values are returned when the condition is True.   When the condition is False,   the default behavior is to return NaN's.

missing = "XXX"
tickets.where(tickets> 30, other = missing)
Area       City       Rural       Suburbs
When        Day Night   Day Night     Day Night
Year Month
2015 1      XXX   XXX   XXX   XXX     XXX   XXX
     2      XXX   XXX   XXX   XXX      42   XXX
     3      XXX    54   XXX   XXX     XXX   XXX
2016 1      XXX   XXX   XXX   XXX     XXX   XXX
     2      XXX   XXX   XXX   XXX     XXX   XXX
     3      XXX   XXX    31    48     XXX   XXX
2017 1      XXX   XXX   XXX   XXX     XXX   XXX
     2      XXX    33   XXX   XXX     XXX   XXX
     3       31   XXX   XXX   XXX     XXX   XXX
2018 1      XXX   XXX   XXX   XXX     XXX   XXX
     2       35   XXX   XXX   XXX     XXX   XXX
     3      XXX    32    33   XXX     XXX   XXX


Cross Section

DataFrames provision the xs cross section method to slice rows and columns from an indexed DataFrame or 'partial data' in the case of a MultiIndexed DataFrame.

tickets.xs((1), level='Month')
Area  City       Rural       Suburbs
When   Day Night   Day Night     Day Night
Year
2015  15.0  18.0   9.0   3.0     3.0   3.0
2016  11.0  17.0   1.0   0.0    11.0  26.0
2017  21.0   5.0  22.0  10.0    12.0   2.0
2018  25.0  10.0   8.0   4.0    20.0  15.0


The xs method works along a column axis.

tickets.xs(('City'), level='Area', axis = 1)
When         Day  Night
Year Month
2015 1      15.0   18.0
     2      11.0   18.0
     3       5.0   54.0
2016 1      11.0   17.0
     2       7.0   23.0
     3       9.0   17.0
2017 1      21.0    5.0
     2       5.0   33.0
     3      31.0   12.0
2018 1      25.0   10.0
     2      35.0   14.0
     3       3.0   32.0


Sum of all issued tickets during daylight for each area.

tickets.xs(('Day'), level='When', axis = 1).sum()
Area
City       178.0
Rural      164.0
Suburbs    178.0


The SAS analog.

proc sql;
   select unique area
        , sum(tickets) as Sum_by_Area
   from tickets
       where when = 'Day'
   group by area;
quit;

PROC TABULATE