
SRE DevOps AWS Engineer (Accounting & Finance)
New York, NY 10011
Recruited by: Todd Kosik | Director of Recruiting | HCR Group

  • Hiring Company: HCR Group
  • Industry: Accounting & Finance
  • Compensation (salary): 175K to 205K
  • Expires: Feb 08, 2020

Job Description

Our client, a global quant hedge fund with offices in the US and Europe, is seeking an SRE/DevOps engineer with cloud/AWS and financial services experience. In the long term, this person will be responsible for managing the firm's cloud presence: handling security, setting up vendor integrations, and keeping systems running and performant through monitoring and optimization.

High-level architecture goal:
The company has an event-based architecture. Publishers to the stream create and maintain stateful connections to the outside world (e.g., connections to exchanges or clients). Consumer processes listen to the event stream and, in turn, can publish new events back to it. Most consumers are stateful; some communicate back to the outside world in response to these events. End-to-end latency should be sub-second, and ideally much faster. A dashboard application subscribes to these events and runs on internal risk managers' desktops; these dashboards can also publish events to the stream. Most events must be captured and persisted on an encrypted WORM device for an audit trail, with an additional replica persisted for research, batch reporting, and analytics. Their software is built in both Python and Java.
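As a minimal sketch of the publisher side of this architecture, the following assumes boto3 and the Kinesis Data Streams API; the event envelope fields, stream name, and helper names are illustrative, not the client's actual schema:

```python
import json
import time
import uuid

def encode_event(event_type, payload):
    # Wrap a business payload in a minimal envelope.
    # Field names here are illustrative assumptions.
    return json.dumps({
        "id": str(uuid.uuid4()),
        "type": event_type,
        "ts": time.time(),
        "payload": payload,
    }).encode("utf-8")

def publish(stream_name, event_type, payload, partition_key):
    # Requires AWS credentials and an existing stream at runtime.
    import boto3
    kinesis = boto3.client("kinesis")
    kinesis.put_record(
        StreamName=stream_name,
        Data=encode_event(event_type, payload),
        PartitionKey=partition_key,
    )
```

At runtime, `publish()` needs AWS credentials and a provisioned stream; stateful consumers would more likely use the KCL (as the posting notes) rather than raw `get_records` calls.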

AWS Technologies being utilized:
• Kinesis Streams and Firehose: All event streams are currently implemented using the Kinesis Data Streams API and the KCL/KPL.
• S3: For their audit trail and stream replayability.
• EC2: For both stream publishers and consumers. Would like to containerize these applications to run on a hosted container service such as ECS/Fargate. Also, running an OpenVPN instance.
• Lambda: For state-less event consumers and generic triggers.
• Managed SFTP: For external vendors pushing data to us.
• CDK/CloudFormation
• CodeCommit
• VPC Endpoints: Several of their third-party data providers (e.g., market data providers, market surveillance experts) expose services to them via AWS PrivateLink.
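To illustrate the stateless-consumer pattern behind the Lambda bullet above, a handler for Kinesis-triggered Lambda invocations might look like the following sketch (Kinesis delivers record data base64-encoded in the Lambda event payload; the processing step is a placeholder):

```python
import base64
import json

def handler(event, context):
    # Each invocation receives a batch of Kinesis records.
    results = []
    for record in event["Records"]:
        # Record data arrives base64-encoded; decode then parse JSON.
        data = json.loads(base64.b64decode(record["kinesis"]["data"]))
        results.append(data)  # stateless processing would go here
    return {"processed": len(results)}
```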

Beginning work with the following technologies:
• ECR/ECS/Fargate
• CodeBuild/CodePipeline
• Postgres RDS: For databases that manage external vendor data and translations. (e.g., instrument master, trading hours and calendars, exchange metadata, end-of-day positions)
• Client VPN: Currently, we're running an OpenVPN server on an EC2 instance. Due to its current manual configuration, it would be easier to codify the hosted VPN solution.
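To illustrate the kind of reference-data lookups the Postgres RDS bullet describes, here is a minimal sketch using the stdlib sqlite3 module as a stand-in for Postgres; the table name, columns, and sample row are illustrative, not the client's actual schema:

```python
import sqlite3

# sqlite3 stands in for Postgres RDS in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instrument_master (
        symbol TEXT PRIMARY KEY,
        exchange TEXT NOT NULL,
        trading_open TEXT,
        trading_close TEXT
    )
""")
conn.execute(
    "INSERT INTO instrument_master VALUES (?, ?, ?, ?)",
    ("ESZ5", "CME", "17:00", "16:00"),  # illustrative sample row
)

def trading_hours(symbol):
    # Look up open/close times for one instrument; None if unknown.
    return conn.execute(
        "SELECT trading_open, trading_close FROM instrument_master WHERE symbol = ?",
        (symbol,),
    ).fetchone()

print(trading_hours("ESZ5"))  # → ('17:00', '16:00')
```

The same queries run unchanged against Postgres via a driver such as psycopg2, with the schema managed alongside the other infrastructure as code.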

Potential future projects:
• Software Development Life Cycle (SDLC): As developers, we would like to be able to commit code to a source repository and have a build system that compiles the code, runs tests, and then deploys the applications to a segregated environment (staging or production). We currently have some infrastructure as code written using the AWS CDK in Java and Python. However, we haven't yet fully automated our application deployments.
• Environment Segregation and Developer Sandboxes: Everyone is currently an admin. Would ideally like to create a set of IAM policies that would allow us to not step on each other's toes while still being empowered to explore new technologies (i.e., we don't want to have to ask permission in order to test out a new implementation idea, but we also don't want to be able to wipe out a production service accidentally). Furthermore, we need to be able to create some controls and audit trails around deployments to the production environment.
• Monitoring: Need a monitoring solution that can capture performance metrics (e.g., database load, end-to-end latency, Kinesis event throughput, EC2 instance memory utilization, network traffic, etc.).
• Alerting: We have an explicit "alerts" stream where we can publish business-level alerts (e.g., risk limits breached, clients taking large losses, large market move, etc.). However, we do not currently have a mechanism that issues alerts based on technical infrastructure performance.
• Configuration and Secrets: Many of their client services are currently configured via code or text files committed into source control. It would be useful to see best practices around this.
• End of day batch processing: Need an implementation of a job scheduling service that understands dependencies and tracks successes and failures.
• End-to-end Security Audit: This is a large ask. There are likely entire companies dedicated to just this. Call this a "nice to have" from a cloud consultant. However, it would be useful to learn best practices around data management and security.
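The end-of-day batch item above asks for a scheduler that understands dependencies and failures. A minimal pure-Python sketch of the idea follows (Kahn's topological ordering with failure propagation); the status vocabulary is illustrative, and a real deployment would more likely use a managed service such as AWS Step Functions or AWS Batch:

```python
from collections import defaultdict

def run_batch(jobs, deps):
    """Run jobs in dependency order, skipping any job whose upstream
    dependency did not succeed. `jobs` maps name -> callable,
    `deps` maps name -> list of prerequisite job names."""
    indegree = {name: 0 for name in jobs}
    children = defaultdict(list)
    for name, prereqs in deps.items():
        for p in prereqs:
            indegree[name] += 1
            children[p].append(name)
    ready = [n for n, d in indegree.items() if d == 0]
    status = {}
    while ready:
        name = ready.pop()
        if any(status.get(p) != "ok" for p in deps.get(name, [])):
            status[name] = "skipped"  # upstream failure propagates
        else:
            try:
                jobs[name]()
                status[name] = "ok"
            except Exception:
                status[name] = "failed"
        for child in children[name]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return status
```

For example, if job `c` depends on a job `b` that raises, `c` is marked `"skipped"` while independent jobs still run to completion.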