May 30, 2024

Calling USASpending API with Python

At work this week I spent some time diving back into Python. I had some ETL jobs to think about and started to write a class to connect to different sources. A talented coworker passed off some code to connect to Dataverse and that honestly got me most of the way there.

I started listing the different possible data sources I would need to connect to and a good chunk of them were API-based. While I can write Power Query to connect to APIs easily, it is not second nature yet for me with Python. So I decided in my free time to brush up on it.

USASpending API

My choice API for practicing is the USA Spending API. It is pretty well documented and easy to use. I started with simple GET requests. If you are unfamiliar, GET requests are simpler than POST requests when getting data. You do not have to pass in any keys or authenticate parameters and there is no JSON that you have to write to get data back. You may have to do some string manipulation with the URL, but that is where the complexity usually ends.

The requests I worked with mostly ended up in two camps:

  • Simple responses that fit on one page of results
  • Longer responses that needed you to paginate

The paginated responses usually had keys up front that dealt with page_metadata, so catching that was important to that I could set up a while loop:

# check if there is another page
    if 'page_metadata' in json_response:
      has_next = json_response['page_metadata']['hasNext']

      # get the results
      result= pd.concat([result, pd.json_normalize(json_response, "results")])

      # if there is another page
      while has_next==True:

The other key to pagination was if there was another page, you could get the next page number and use that to manipulate the URL and get the results from the next page:

# build the next url dynamically and GET
response = re.get(f'{url}/?page={next}')

String interpolation in Power Apps, C#, and Python is literally my favorite thing right now. I am not writing a lot of C#, but when doing training I try to use it as much as possible. In Power Apps it is my go-to for string concatenation between multiple fields and variables.

Future Enhancements

At some point, I would like to start doing POST requests, but that is a bigger commitment as each endpoint may have different parameters. Some of them support website activities such as auto-completion, so I may be able to skip over those.

I would also like to dump the data frame results into DuckDB for SQL querying. That would be relatively easy to do. I just haven’t figured out queries that would be cool to show.

Lastly, some visuals with Altair would be cool to figure out. I love the Deneb visual in Power BI and would like to use Vega in Python.

The Gist

Enough yapping. Here is a simple script with some examples in comments: