A design sketch for a data aggregator and reporting tool

We chatted today at work about generating reports that aggregate many people's trading positions across many stocks. The way we do it now slows down quickly as the data grows, so I wondered whether we could use some tricks to build a faster reporting system, one that might also be more general and independent of the storage technology.

We can easily enough produce an endpoint that serves a single day's trades as JSON. For example, /trades/22-March-2014 produces

[
	{"client-id": "joe",
	 "trades": [
		{"quantity": 101, "price": 34.5, "amount": 3484.5, "stock": "AAPL"},
		{"quantity": 50, "price": 32.65, "amount": 1632.5, "stock": "AAPL"},
		{"quantity": -30, "price": 35.1, "amount": -1053, "stock": "AAPL"}
	]},
	{"client-id": "mary",
	 "trades": [
		{"quantity": -1000, "price": 2.78, "amount": -2780, "stock": "BPL"}
	]}
]

What we chatted about is providing another service that hits this URL each day and auto-aggregates the data for us. This aggregating system would be configured as follows (using pseudo-json):

{
	client-id: {
		trades.stock: {
			sum: quantity,amount
		}
	} 
}
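To make the intent of that config concrete, here is a minimal sketch (in Python, purely as an illustration) of what the aggregator would do with it: group each client's trades by stock and sum the quantity and amount fields. The function name and hand-rolled grouping are my own; a real system would drive this from the config rather than hard-code it.

```python
from collections import defaultdict

def aggregate_trades(trades_by_client):
    """Group each client's trades by stock and sum quantity and amount.
    A hand-coded sketch of what the configured aggregator would do;
    field names match the example payload above."""
    result = []
    for client in trades_by_client:
        sums = defaultdict(lambda: {"quantity": 0, "amount": 0.0})
        for trade in client["trades"]:
            s = sums[trade["stock"]]
            s["quantity"] += trade["quantity"]
            s["amount"] += trade["amount"]
        result.append({
            "client-id": client["client-id"],
            "trades": [{"stock": stock, **totals}
                       for stock, totals in sums.items()],
        })
    return result
```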

so now we could go to

/aggregate/trades/22-March-2014

and it would show us the summed position of each client for each stock like this:

[
	{"client-id": "joe",
	 "trades": [
		{"quantity": 121, "amount": 4064, "stock": "AAPL"}
	]},
	{"client-id": "mary",
	 "trades": [
		{"quantity": -1000, "amount": -2780, "stock": "BPL"}
	]}
]

What if we wanted an aggregated report of trades between 01-Nov-2012 and 22-March-2014? This aggregating system could also auto-aggregate in blocks of time, so there would be an aggregate for each week of the year, each month of the year, and each year. We could couple this with another little RESTful service, let's say

/range?from=01-Nov-2012&to=22-March-2014

which would return the days, weeks, months and years that cover this range:

[
	{month:Nov-2012},
	{month:Dec-2012},
	{year:2013},
	{month:Jan-2014},
	{month:Feb-2014},
	{week:1-March-2014},
	{week:2-March-2014},
	{week:3-March-2014},
	{day:22-March-2014}
]
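The decomposition above can be computed greedily: at each date, take the largest pre-aggregated block that fits entirely inside the range. Here is a sketch under some assumptions of mine: that "weeks" are 7-day blocks aligned to the 1st of each month (which is what the example output suggests), and that block labels use abbreviated month names (Mar rather than March).

```python
from datetime import date, timedelta

def decompose(start, end):
    """Greedily cover [start, end] with the largest available blocks:
    whole years, then whole months, then 7-day 'weeks' aligned to the
    1st of the month, then single days. Block naming is an assumption."""
    blocks = []
    d = start
    while d <= end:
        # a whole calendar year fits
        if d.month == 1 and d.day == 1 and date(d.year, 12, 31) <= end:
            blocks.append({"year": str(d.year)})
            d = date(d.year + 1, 1, 1)
            continue
        # a whole calendar month fits
        next_month = date(d.year + d.month // 12, d.month % 12 + 1, 1)
        if d.day == 1 and next_month - timedelta(days=1) <= end:
            blocks.append({"month": d.strftime("%b-%Y")})
            d = next_month
            continue
        # a 7-day week starting on day 1, 8, 15 or 22 of the month fits
        week_end = d + timedelta(days=6)
        if (d.day - 1) % 7 == 0 and week_end <= end and week_end.month == d.month:
            week_no = (d.day - 1) // 7 + 1
            blocks.append({"week": f"{week_no}-{d.strftime('%b-%Y')}"})
            d += timedelta(days=7)
            continue
        # fall back to a single day
        blocks.append({"day": d.strftime("%d-%b-%Y")})
        d += timedelta(days=1)
    return blocks
```

Run on the example range, this reproduces the list above: two whole months of 2012, all of 2013, two whole months of 2014, three whole weeks of March, and the final day.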

We can now go down through this list, getting the aggregate for each block of time and aggregating it with the previous one, folding the results into each other. All the constituent aggregates are pre-computed, so each one is a quick lookup, and combining them is a quick process. It should be possible to make a system like this work for any JSON data, and it should be able to support several kinds of aggregating function.
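The fold step can be sketched like this (again in Python, as an illustration): merge two pre-computed aggregates by summing quantity and amount per client and stock, then reduce the whole list of block aggregates with that merge. The function name and shapes are assumptions matching the /aggregate example output.

```python
def merge_aggregates(a, b):
    """Fold two pre-computed aggregates into one by summing quantity
    and amount per (client, stock). Shapes match the /aggregate output."""
    merged = {}
    for report in (a, b):
        for client in report:
            stocks = merged.setdefault(client["client-id"], {})
            for t in client["trades"]:
                s = stocks.setdefault(t["stock"], {"quantity": 0, "amount": 0.0})
                s["quantity"] += t["quantity"]
                s["amount"] += t["amount"]
    return [
        {"client-id": cid,
         "trades": [{"stock": stock, **sums} for stock, sums in stocks.items()]}
        for cid, stocks in merged.items()
    ]

# Folding a whole list of block aggregates is then just a reduce:
#   from functools import reduce
#   total = reduce(merge_aggregates, block_aggregates)
```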

I’m sincerely hoping the demand for this requirement in our system continues, as it would be fun to build something like this. It feels like the right separation of concerns. Of course, there is every chance something like this is already out there, in which case I hope it gets pointed out to me.
