Databricks Bundle Laravel Package

codibly/databricks-bundle


Getting Started

Minimal Steps

  1. Installation: Run composer require codibly/databricks-bundle in your Symfony 3 project root. Ensure guzzlehttp/guzzle is installed (required for HTTP requests).

  2. Enable the Bundle: Add new Codibly\DatabricksBundle\CodiblyDatabricksBundle() to the array returned by registerBundles() in app/AppKernel.php.

  3. Configure the Driver: Add the bundle configuration to app/config/config.yml:

    codibly_databricks:
        driver: guzzle
        host: "https://<your-databricks-instance>.cloud.databricks.com"
        token: "%env(DATABRICKS_TOKEN)%"  # Store token in .env
    
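  4. First Use Case: Inject the DatabricksClient service into a controller or service and call its methods, for example:

    use Codibly\DatabricksBundle\Client\DatabricksClientInterface;
    use Symfony\Bundle\FrameworkBundle\Controller\Controller;
    use Symfony\Component\HttpFoundation\JsonResponse;

    class MyController extends Controller
    {
        // Services as action arguments need Symfony 3.3+ (controller.service_arguments tag).
        public function runJob(DatabricksClientInterface $client)
        {
            $response = $client->jobs()->runNow(123); // Replace 123 with your job ID
            return new JsonResponse($response);
        }
    }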
    

Where to Look First

  • Documentation: Check Resources/doc/index.md in the bundle or Symfony’s hosted docs.
  • Client API: Explore src/Client/ for available methods (e.g., JobsClient, ClustersClient).
  • Configuration: Review Resources/config/services.yml for service definitions.

Implementation Patterns

Core Workflows

  1. Job Management

    • Submit Jobs: Use jobs()->create() with a payload array defining the job (e.g., cluster spec, notebook path).
      $payload = [
          'new_cluster' => ['spark_version' => '7.3.x-scala2.12'],
          'notebook_task' => ['notebook_path' => '/path/to/notebook']
      ];
      $client->jobs()->create($payload);
      
    • Monitor Jobs: Poll job status via jobs()->get(123) or listen to webhooks (implement DatabricksWebhookListener).
  2. Cluster Operations

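    • Create/Terminate Clusters: Use clusters()->create() and clusters()->delete(). In the Databricks REST API, clusters/delete terminates a cluster rather than removing it permanently.
      $cluster = $client->clusters()->create(['spark_version' => '7.3.x-scala2.12']);
      $client->clusters()->delete($cluster['cluster_id']);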
      
  3. Data Operations

    • Run SQL Queries: Use sql()->statements() for ad-hoc queries.
      $result = $client->sql()->statements()->run(
          'SELECT * FROM my_table',
          123  // Warehouse ID
      );
      
  4. Webhooks

    • Listen for Events: Implement a listener class (here App\EventListener\DatabricksWebhookListener) with an onWebhook() method and bind it to the databricks.webhook event in services.yml:
      services:
          App\EventListener\DatabricksWebhookListener:
              tags:
                  - { name: 'kernel.event_listener', event: 'databricks.webhook', method: 'onWebhook' }
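
The polling option mentioned under Job Management can be sketched as a small loop with an attempt budget. This is a minimal sketch under assumptions, not the bundle's API: waitForRun(), its parameters, and the $fetchState callable are hypothetical names. The callable is expected to wrap whatever bundle call returns the run's state, and the terminal state names follow the life_cycle_state values of the Databricks Jobs API.

```php
<?php
declare(strict_types=1);

/**
 * Polls until the run reaches a terminal state or the attempt budget runs out.
 *
 * @param callable $fetchState  Hypothetical: returns the current life_cycle_state string.
 * @param int      $maxAttempts Maximum number of polls.
 * @param int      $delayMs     Pause between polls in milliseconds.
 */
function waitForRun(callable $fetchState, int $maxAttempts = 30, int $delayMs = 2000): string
{
    // Terminal life_cycle_state values in the Databricks Jobs API.
    $terminal = ['TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $state = (string) $fetchState();
        if (in_array($state, $terminal, true)) {
            return $state;
        }
        // Do not sleep after the final poll.
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000);
        }
    }

    throw new RuntimeException("Run did not finish after {$maxAttempts} polls");
}
```

With the bundle, $fetchState might wrap the status call shown above, assuming the response exposes the state under a key such as ['state']['life_cycle_state']; that response shape is an assumption and should be checked against the client you actually have.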
      

Integration Tips

  • Environment Variables: Store sensitive data (e.g., token) in .env and reference via %env(DATABRICKS_TOKEN)%.
  • Retry Logic: Wrap API calls in a retry mechanism for transient failures (e.g., GuzzleHttp\Middleware::retry() pushed onto a GuzzleHttp\HandlerStack).
  • Logging: Log requests and responses, either through a logger setter if the client exposes one or by decorating DatabricksClientInterface:
    $client->setLogger($logger); // If supported; otherwise, log manually.
    
  • Async Jobs: Use Symfony’s Messenger component to queue job submissions for background processing.
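
The retry advice above can be sketched without committing to a particular HTTP driver. This is a minimal sketch, assuming transient failures surface as exceptions; retryWithBackoff(), its parameters, and the injectable $sleep callable are hypothetical and not part of the bundle or Guzzle.

```php
<?php
declare(strict_types=1);

/**
 * Runs $operation, retrying on exceptions with exponentially growing delays.
 *
 * @param callable      $operation   The API call to attempt.
 * @param int           $maxAttempts Total attempts, including the first one.
 * @param int           $baseDelayMs Delay before the first retry; doubles each time.
 * @param callable|null $sleep       Receives the delay in ms; injectable for tests.
 *
 * @return mixed Whatever $operation returns on success.
 */
function retryWithBackoff(callable $operation, int $maxAttempts = 4, int $baseDelayMs = 500, ?callable $sleep = null)
{
    $sleep = $sleep ?? function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // Out of attempts: surface the last failure.
            }
            // With the defaults: 500 ms, 1000 ms, 2000 ms.
            $sleep($baseDelayMs * (2 ** ($attempt - 1)));
        }
    }
}
```

In practice the catch would likely be narrowed to exceptions that indicate a 429 or 5xx response, so that permanent errors such as invalid parameters fail immediately instead of being retried.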

Gotchas and Tips

Pitfalls

  1. Deprecated Bundle

    • Last release: 2018-01-31. The bundle targets Symfony 3.x and may need patches to work with newer PHP or Guzzle versions.
    • Mitigation: Fork the repo and update dependencies (e.g., guzzlehttp/guzzle:^7.0) if you need to run on newer PHP or Guzzle versions.
  2. Token Management

    • Hardcoding tokens in config.yml is insecure. Always use .env and restrict file permissions (chmod 600 .env).
    • Tip: Rotate tokens periodically and implement token revocation logic.
  3. Rate Limiting

    • Databricks enforces rate limits. Handle 429 Too Many Requests by:
      • Adding delays between requests.
      • Using exponential backoff (e.g., via GuzzleHttp\RetryMiddleware, created with GuzzleHttp\Middleware::retry()).
  4. Webhook Delays

    • Webhook events may arrive late, out of order, or more than once. Use job_id and run_id to deduplicate events in your listener, and keep handlers idempotent so that ordering does not matter.
  5. Cluster Auto-Termination

    • Clusters may terminate unexpectedly (e.g., due to inactivity). Implement health checks or auto-restart logic:
      if ($client->clusters()->get($clusterId)['state'] === 'TERMINATED') {
          $client->clusters()->create($clusterConfig);
      }
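
The deduplication advice for webhooks can be sketched as a small guard keyed on job_id and run_id. This is a sketch under assumptions: WebhookDeduplicator is a hypothetical class, the payload is assumed to be an associative array carrying job_id and run_id, and the in-memory array stands in for a shared store (cache or database) that a real deployment would need across requests.

```php
<?php
declare(strict_types=1);

final class WebhookDeduplicator
{
    /** @var array<string, bool> Keys of events already handled. */
    private $seen = [];

    /**
     * Returns true the first time a (job_id, run_id) pair is seen, false afterwards.
     */
    public function shouldProcess(array $payload): bool
    {
        if (!isset($payload['job_id'], $payload['run_id'])) {
            throw new InvalidArgumentException('Payload must contain job_id and run_id');
        }

        $key = $payload['job_id'] . ':' . $payload['run_id'];
        if (isset($this->seen[$key])) {
            return false; // Duplicate delivery: skip it.
        }

        $this->seen[$key] = true;

        return true;
    }
}
```

A listener would call shouldProcess() at the top of its handler and return early on false. If a single run emits several event types (for example start and completion), the event type would need to be part of the key as well.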
      

Debugging

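  • Enable Guzzle Debugging: Add a middleware to log requests:
    use GuzzleHttp\HandlerStack;
    use GuzzleHttp\Middleware;

    $stack = HandlerStack::create();
    // $logger is any PSR-3 logger, e.g. the Symfony "logger" service.
    $stack->push(Middleware::tap(function ($request) use ($logger) {
        $logger->debug('Databricks Request:', ['url' => (string) $request->getUri(), 'body' => (string) $request->getBody()]);
    }));
    $client->setHandler($stack); // If the client exposes a handler setter.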
    
  • Validate API Responses: Do not rely on the HTTP status code alone. Databricks REST error payloads carry error_code and message fields, so check for them before using a result:
    $response = $client->jobs()->runNow(123);
    if (isset($response['error_code'])) {
        throw new \RuntimeException($response['error_code'] . ': ' . $response['message']);
    }
    

Extension Points

  1. Custom Clients: Extend Codibly\DatabricksBundle\Client\AbstractClient to add domain-specific methods:

    namespace App\Client;

    use Codibly\DatabricksBundle\Client\AbstractClient;

    class MyCustomClient extends AbstractClient {
        public function customMethod() {
            return $this->request('POST', '/api/2.0/custom-endpoint', ['data' => 'value']);
        }
    }
    

    Register the service in services.yml:

    services:
        App\Client\MyCustomClient:
            arguments: ['@codibly_databricks.client']
    
  2. Event Dispatching: Trigger Symfony events for critical actions (e.g., job completion). In Symfony 3 the event name is the first argument to dispatch():

    $dispatcher->dispatch('databricks.job.completed', new DatabricksJobEvent($jobId, $runId));
    
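  3. Mocking for Tests: Use Guzzle’s MockHandler to simulate API responses:

    use GuzzleHttp\Handler\MockHandler;
    use GuzzleHttp\HandlerStack;
    use GuzzleHttp\Psr7\Response;

    $mock = new MockHandler([
        new Response(200, [], json_encode(['run_id' => 456])),
    ]);
    $client->setHandler(HandlerStack::create($mock)); // If the client exposes a handler setter.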
    

Configuration Quirks

  • Driver Overrides: The bundle defaults to guzzle, but you can swap drivers by implementing Codibly\DatabricksBundle\Client\DriverInterface and updating services.yml.
  • Proxy Support: Configure Guzzle to use a proxy via config.yml:
    codibly_databricks:
        guzzle_options:
            proxy: 'http://proxy.example.com:8080'
    