Python, PySpark

Experience: 5-10 years

Job Description

  • Design and implement scalable data processing pipelines using PySpark for batch and real-time data processing.
  • Develop Python scripts and libraries to integrate, cleanse, and process data from various data sources (databases, APIs, etc.).
  • Optimize PySpark jobs for performance and resource efficiency.
  • Collaborate with data engineers and analysts to define data transformation and aggregation requirements.
  • Analyze and troubleshoot issues in large-scale distributed data systems.
  • Develop custom functions for data manipulation and aggregation using the Spark DataFrame and RDD APIs.
  • Perform data analysis and create insightful visualizations for internal stakeholders.
  • Write unit tests, document code, and maintain code quality through peer reviews and best practices.
  • Monitor and optimize the performance of Apache Spark clusters.
  • Work with cloud platforms such as AWS, Azure, or GCP to deploy Spark jobs and other data processing services.

Skills and Qualifications

  • Strong experience with the Python programming language and libraries such as Pandas, NumPy, and Pytest.
  • Hands-on experience with Apache Spark and PySpark for distributed data processing.
  • Familiarity with data formats such as CSV, JSON, Parquet, Avro, and other big data file formats.
  • Solid understanding of Hadoop ecosystem components (HDFS, YARN, etc.).
  • Proficiency with data wrangling, transformation, and aggregation.
  • Experience with performance tuning, resource management, and debugging Spark applications.
  • Understanding of cloud-based data storage solutions (S3, GCS, etc.) and distributed file systems.
  • Familiarity with data warehousing concepts and tools.
  • Good knowledge of SQL and relational databases (e.g., PostgreSQL, MySQL).
  • Strong problem-solving skills and ability to work with large datasets.
  • Excellent communication and collaboration skills, with the ability to work effectively in a team environment.

Preferred Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • Experience with additional big data tools such as Flink, Kafka, and Hive.
  • Knowledge of Docker and containerization for deployment.
  • Experience with workflow orchestration tools for ETL, such as Airflow or Luigi.
  • Familiarity with machine learning concepts and frameworks (e.g., MLlib, scikit-learn).

Job ID: 122964763