2 BLADE
2.1 General Structure of BLADE
BLADE is stored in the Product network drive of your virtual machine in DataLab. There are separate folders for different file types, I recommend working with csv files if you are an R or Python user. There is a separate csv file for each financial year and source, e.g. one file will represent BAS returns in FY18-19. Alongside general data cleaning, you will need to merge across different sources, as well as appending data for each year, if you are interested in panel data.
The ‘core’ of BLADE consists of four different data sources containing administrative data which most businesses will complete during the financial year:
- Indicative Data Items: This contains the ‘spine’ of ABNs from which all other datasets can be linked.
- BAS: These are filled out by all firms that remit GST and include information on sales, operating expenses, purchases and capital expenditures
- BIT: These files contain income tax data (including balance sheet data) that is submitted to the ATO.
- PAYG: This contains information on employee headcount and wages expense when payment summaries are submitted to the ATO.
There are other datasets within the extended version of BLADE which cover R & D expenses and customs data, however these tend to cover a smaller proportion of firms within the population.
2.1.1 TAU or ABN levels
There are two levels of observation inside BLADE, the ABN level, which encompasses a legal entity and the TAU level, which attempts to identify the ‘economic firm’. A TAU may consist of multiple ABNs, for example, all of the legal entities that Kmart Australia operates would be aggregated into one TAU. TAUs can be aggregated further into an enterprise id, which in the case of Kmart, would identify its parent company Wesfarmers Ltd.
2.2 BLADE Utils in R
2.2.1 Append multiple time periods together
library(matrixStats)
library(data.table)
<- " " # Directory containing BLADE source files in CSV format
blade_path
<-" " # Directory where merged files will go
destination_path
<- c("frame", "BAS", "BIT", "PAYG") # File path patterns
file_patterns
<- c("ind_items.csv", "BAS.csv", "BIT.csv", "PAYG.csv") # Destination files
dest_patterns
for (i in 1:4) {
<- list.files(path, pattern=file_patterns[i], full.names=TRUE)
files <- lapply(files, fread)
files for (j in 1:length(files)) {
<- files[[j]][,source:=match(j,files)+10]
files[[j]]
}<- rbindlist(files)
files fwrite(files,file = paste(destination_path, dest_patterns[i], sep="/"))
}
2.2.2 Merging BLADE core datasets
# This is assuming you have completed the appending step
# Read in the files you saved down
<- c("ind_items", "BAS", "BIT", "PAYG")
objects
for (i in 1:4){
assign(objects[i],fread(paste(destination_path, dest_patterns[i], sep="/")))
}
# Merge datatable objects
for (i in list(BAS,BIT,PAYG)){
<- i[ind_items, on = .(id,tsid)]
ind_items }