Principal Data Scientist

Job Locations: US-MD-Columbia | US-CA-San Diego
Computer/Data Science
Position Type: Regular Full-Time
Clearance Level: Able to obtain

Overview

We are seeking a Principal Data Scientist to lead the shaping of requirements for the largest and most challenging machine learning and predictive analytics solutions, guide and significantly contribute to solving the hard problems of implementing those solutions, support customer and capture engagements to demonstrate our differentiation, and mentor data scientists across the company. The successful candidate will have experience using data science to understand environments and processes, detect risks, forecast behaviors, and optimize related courses of action for enterprise programs, especially those seeking to accelerate decision cycles, improve objectivity, and reduce surprise.

The Principal Data Scientist will have broad exposure to multiple programs and projects across the company, coupled with the ability to directly influence design and implementation. They will discern and champion best practices for machine learning and predictive/prescriptive analytics, ensuring that “the BigBear way” is consistent and differentiated and pushes the envelope of what is possible. They will educate our clients and staff to help them understand what we can achieve, then take a leading hands-on role in solving the hardest problems and shepherding the process of getting projects started following their designs. Our Principals will stay connected to emerging methods and technologies and support related experimentation and adoption, driving the future evolution of the company.

This is an ideal opportunity to be part of one of the fastest growing AI/ML companies in the industry. Here, we're in this business together. We own it, we make it thrive, and we enjoy the challenges of our work. We know that our employees play the largest role in our continual success. That is why we foster an environment of growth and development, with an emphasis on opportunity, recognition, and work-life balance. We give the same high level of commitment to our employees that we give to our clients. If this sounds like the place where you want to be, we'd enjoy speaking with you.

What you will do

  • Perform the most senior design and development tasks for machine learning, predictive, and prescriptive analytics on projects, including:
    • Design and develop reusable workflows for data integration, transformation, and feature engineering required to train models
    • Train and validate models, often in big data, distributed environments
    • Perform exploratory data analysis to understand data and model-derived insights
    • Perform model introspection to understand model behaviors and explain predictions
    • Visualize data and model-derived insights to communicate internally and externally
    • Integrate models into operational systems
  • Support solution development, prototyping, and writing of technical proposals
  • Act as a visionary, trusted customer liaison and “product manager” in our most complicated accounts, engaging with mission owners/operators to discover applications for ML, set project expectations, define requirements, and review results
  • Provide ongoing advice to projects around technical solutions, best practices, and efficiencies
  • Maintain awareness of the latest data science methods and technologies and proactively support the adoption of truly beneficial new capabilities
  • Lead trade studies, analyses of alternatives, and assessments of existing systems
  • Mentor data scientists as assigned
  • Work with product development teams to ensure customer functional requirements are fully considered, best practices and differentiated learnings are shared, and a single identity is supported

What you need to have

  • Post-graduate degree in a technical field
  • Minimum of 15 years of proven success in the design, development, and deployment of multiple complex or large-scale operational predictive capabilities, where individual coding contributions were foundational and broadly applicable to success
  • Extensive experience managing or providing senior-level technical subject matter expertise and support for Data Science efforts in support of large, complex projects or programs
  • Ability to describe the data assumptions and processing steps of common machine learning methods, and which methods are appropriate for a variety of use cases
  • Demonstrated expertise with the following:
    • ML Libraries such as scikit-learn, pandas, NumPy, PySpark, MXNet, MLlib, Weka, spaCy, and fastText
    • ML Platforms such as KNIME, Spark, SageMaker, H2O, and TensorFlow
    • Java, Python, and SQL including related ecosystems and frameworks, like Eclipse and Jupyter Notebook
    • Distributed platforms (e.g., HBase, PrestoDB, Athena, Spark, Kafka) and the cloud (e.g., AWS, GCP, Azure)
    • Performing dimensionality reduction, especially through encoders
    • Selecting features appropriate to a target using bottom-up approaches at scale, like Boruta
    • Automating the assessment of model validity to drive hyper-parameter tuning
    • Transforming data to create stationary inputs, especially when facing heteroskedasticity, periodicity, and trends
  • Able to match customer requirements to appropriate technologies and approaches, describe the pros and cons of each option, and provide optimal recommendations
  • Able to communicate complex technical concepts to external business and technical audiences, and write at a level appropriate for a senior executive
  • Strong problem-solving skills
  • Able to work independently and self-identify tasks
  • Ability to obtain and maintain a TS/SCI clearance

What we'd like you to have

  • Active TS clearance with SCI eligibility
  • Familiarity with time series forecasting and survival modeling
  • Machine Learning specialist certification from AWS, Google Cloud, or Microsoft Azure
  • Experience with distributed databases (like CockroachDB, Impala, Drill, Greenplum, or Spark SQL), graph databases (especially Neo4j and Neptune), NoSQL query engines (especially Elasticsearch, MongoDB, and DynamoDB), streaming/time-series databases (like InfluxDB), and schema-at-query engines (especially PrestoDB)
  • Experience deploying data and processing in commercial clouds, especially AWS, Google Cloud, and Microsoft Azure

About

The Company is a new leader in decision dominance serving the national defense and intelligence communities. The Company delivers high-end capabilities across the data and digital spectrum to provide information superiority and decision support. It offers a comprehensive suite of solutions including artificial intelligence and machine learning, data science, advanced analytics, offensive and defensive cyber, data management, cloud solutions, digital engineering, and systems integration. The Company's customers, including the U.S. Intelligence Community, Department of Defense, and U.S. Federal Government, rely on its advanced technology solutions to analyze information, manage risk, and solve complex problems, leading to better decision making. Headquartered in Columbia, Maryland, the Company has additional locations in Virginia, Massachusetts, and California. The Company is an Equal Opportunity/Veterans/Disabled Employer.

