Handling large data in an ETL tool has always been a painstaking process, especially when the transfer involves a powerful CRM such as Salesforce, whose objects can hold millions of records. Reading and processing data at this scale puts even a powerful data integration tool to the test.
The data must not only be read carefully but also processed seamlessly to produce reliable output. Adeptia Connect has proven capability when it comes to handling large data sets. It flexibly interacts with various connectors to transform data for its customers, Salesforce being one of the most in-demand connectors.
The robust engine of Adeptia Connect enables users to break the data from Salesforce into chunks and process it for the desired output. To read large volumes of data from Salesforce in chunks, the user should use Bulk API PK Chunking. It is supported for objects such as Account, Campaign, CampaignMember, Case, CaseHistory, Contact, Event, EventRelation, Lead, LoginHistory, Opportunity, Task, User, and custom objects in Salesforce.
For example, a user can use Adeptia Connect to pull information (i.e., object IDs) from Salesforce and dump it into a local database instance. Using this dumped data, the operation of reading large data sets can then be carried out.
Let’s assume a user enables PK Chunking for the following query on an Account table with 10,000,000 records.
1. SELECT Name FROM Account
If, for example, the chunk size is 250,000 records and the starting record ID is 001300000000000, the query is split into 40 queries, each of which is submitted as a separate batch.
1. SELECT Name FROM Account WHERE Id >= 001300000000000 AND Id < 00130000000132G
2. SELECT Name FROM Account WHERE Id >= 00130000000132G AND Id < 00130000000264W
3. SELECT Name FROM Account WHERE Id >= 00130000000264W AND Id < 00130000000396m
...
40. SELECT Name FROM Account WHERE Id >= 00130000000euQ4 AND Id < 00130000000fxSK
Each query is then executed on a chunk of 250,000 records specified by base-62 ID boundaries.
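To see how those boundaries line up, here is a minimal, self-contained sketch (not Adeptia or Salesforce code) that treats a Salesforce ID suffix as a base-62 number, using the digits 0-9, A-Z, a-z. Advancing the suffix by the chunk size reproduces the boundary IDs in the queries above:

```python
# Base-62 alphabet used by Salesforce record IDs: 0-9, then A-Z, then a-z.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def base62_encode(n: int, width: int = 5) -> str:
    """Encode an integer as a zero-padded base-62 string."""
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)).rjust(width, "0")

def base62_decode(s: str) -> int:
    """Decode a base-62 string back to an integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

# With a 250,000-record chunk size, each boundary's suffix advances by
# 250,000 in base 62: 250,000 -> "0132G", 500,000 -> "0264W", and so on,
# matching the ID boundaries in the example queries.
prefix = "0013000000"  # the fixed leading part of the example IDs
for i in range(3):
    lo = prefix + base62_encode(i * 250_000)
    hi = prefix + base62_encode((i + 1) * 250_000)
    print(f"SELECT Name FROM Account WHERE Id >= {lo} AND Id < {hi}")
```

Decoding the last boundary in the list confirms the split: `base62_decode("0fxSK")` is 10,000,000, exactly 40 chunks of 250,000 records.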
Though PK Chunking is designed for extracting entire tables, it can also be used with filtered queries. Because records may be filtered out of each query's results, the number of results for a chunk can be less than the chunk size. Likewise, the IDs of deleted records are counted when the query is split into chunks, but those records are omitted from the results. Therefore, if soft-deleted records fall within a given chunk's ID boundaries, the number of returned results is less than the chunk size.
The default chunk size is 100,000 records and the maximum is 250,000. The default starting ID is the first record in the table; however, a user can specify a different starting ID to restart a job that failed between chunked batches.
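In the Salesforce Bulk API, these options are passed through the `Sforce-Enable-PKChunking` request header via its `chunkSize` and `startRow` parameters. The sketch below is an illustration of that header only, not Adeptia's actual implementation; the helper name is hypothetical:

```python
def build_pk_chunking_header(chunk_size: int = 100_000,
                             start_row: str = "") -> dict:
    """Build the Sforce-Enable-PKChunking header for a Bulk API job.

    chunk_size defaults to 100,000 (the documented default) and may not
    exceed 250,000; start_row lets a job that failed between chunked
    batches restart from a known record ID instead of the first record.
    """
    if not 1 <= chunk_size <= 250_000:
        raise ValueError("chunkSize must be between 1 and 250,000")
    value = f"chunkSize={chunk_size}"
    if start_row:
        value += f"; startRow={start_row}"
    return {"Sforce-Enable-PKChunking": value}

# Restart a chunked extract from the example starting ID:
headers = build_pk_chunking_header(chunk_size=250_000,
                                   start_row="001300000000000")
print(headers)
```

The resulting dictionary would be merged with the authorization headers when the Bulk API job is created; omitting the header entirely leaves PK Chunking disabled.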
When a query is successfully chunked, the status of the original batch is marked NOT_PROCESSED. If chunking fails, the status shows FAILED, but any chunked batches that were successfully queued before the failure are processed as normal. Once the original batch's status changes to NOT_PROCESSED, the user needs to monitor the subsequent batches. The user can retrieve the results from each subsequent batch after it completes, and only then safely close the job.
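The monitoring loop described above can be sketched as follows. This is a simplified illustration under assumed inputs: the batch dictionaries and the state strings stand in for the Bulk API's batch-status responses, and a real client would poll until no batch remains queued or in progress:

```python
def batches_to_collect(batches: list) -> list:
    """Return IDs of chunked batches whose results are ready to fetch.

    The original batch is marked NOT_PROCESSED once chunking succeeds
    and carries no results itself; only completed chunk batches do.
    """
    ready = []
    for batch in batches:
        state = batch["state"]
        if state == "NOT_PROCESSED":
            continue                 # original batch: skip, no results here
        if state == "Completed":
            ready.append(batch["id"])  # results ready for this chunk
        # Queued / InProgress batches are left for the next polling pass
    return ready

# Hypothetical snapshot of a chunked job's batches:
batches = [
    {"id": "B0", "state": "NOT_PROCESSED"},  # the original, chunked batch
    {"id": "B1", "state": "Completed"},
    {"id": "B2", "state": "InProgress"},
]
print(batches_to_collect(batches))  # -> ['B1']
```

Only once every chunk batch has completed and its results have been retrieved can the job be safely closed.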
Adeptia recommends enabling PK Chunking when querying tables with more than 10 million records or when a bulk query consistently times out. That said, the effectiveness of PK Chunking depends on the specifics of the query and the queried data.
Note: Refer to this link for a detailed description of Bulk API PK Chunking.
Adeptia’s adept capabilities in dealing with data from Salesforce have made it a leader when it comes to managing data with Salesforce as a connector.