000-421 - IBM Certified Solution Developer - InfoSphere DataStage v8.5
Example Questions
Which two statements are true about the usage of scratch disk? (Choose two.)
You can define multiple scratch disk spaces to distribute disk I/O.
The parallel framework uses the disk space specified in the scratch disk setting to buffer virtual data set records.
A customer requires that a single output file generated by a parallel job be created in sort order. Which two job designs would achieve this goal? (Choose two.)
Specify both partition and sort key columns on the input to the target Sequential File stage.
Set the "Execution Mode" in the Advanced stage properties tab of the parallel Sort stage to execute sequentially.
You are running a DataStage job using a 2-node configuration file. How can a fixed-width single sequential file be read in parallel? (Choose two.)
Set the Execution Mode of the Sequential File stage to "Parallel".
Set the "Number of Readers per Node" optional property to a value greater than 1.
You are processing groups of rows in a Transformer. The first row in each group contains "1" in the Flag column and "0" in the remaining rows of the group. At the end of each group you want to sum and output the QTY column values. Which three techniques will enable you to retrieve the sum of the last group? (Choose three.)
Within each group sort the Flag column in ascending order. Output the sum each time you process the row with a "1" in the Flag column.
Output a running total for each group for each row. Follow the Transformer stage by an Aggregator stage. Take the MAX of the QTY column for each group.
Output the sum that you generated up to the previous row each time you process a row with a "1" in the Flag column. Use the LastRow() function to determine when the last group is done.
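The third technique above can be sketched outside DataStage. The following is a minimal Python illustration, not Transformer code: the function name `group_sums` and the `(flag, qty)` tuple layout are assumptions made for the example, and the final flush after the loop stands in for the LastRow() check.

```python
def group_sums(rows):
    """Sum QTY per group, where the first row of each group has Flag == "1".

    `rows` is an iterable of (flag, qty) tuples (a hypothetical layout).
    """
    totals = []
    running = 0
    seen_any = False
    for flag, qty in rows:
        if flag == "1" and seen_any:
            # A new group starts: emit the sum accumulated up to the previous row.
            totals.append(running)
            running = 0
        running += qty
        seen_any = True
    if seen_any:
        # No later "1" row will arrive for the last group, so flush it here.
        # This plays the role of the LastRow() test in the Transformer.
        totals.append(running)
    return totals


if __name__ == "__main__":
    sample = [("1", 5), ("0", 3), ("1", 2), ("0", 4), ("0", 1)]
    print(group_sums(sample))
```

Without the final flush, the sum of the last group would never be output, which is the gap the question is pointing at.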
What role must a user have to delete shared metadata assets from the repository?
Information Analyzer User
The number of File Set data files created depends upon what three items? (Choose three.)
File system limitations.
Number of processing nodes in the default node pool.
Number of disks in the export or default disk pool connected to each processing node in the default node pool.
Using a DB2 for z/OS source database, a 200 million row source table with 30 million distinct values must be aggregated to calculate the average value of two column attributes. What would provide optimal performance while satisfying the business requirements?
Select all source rows using a DB2 API stage. Aggregate using a Sort Aggregator.
What is the correct method to process a file containing multiple record types using a Complex Flat File stage?
Define record definitions on the Constraints tab of the Complex Flat File stage.
Configuring the weighting column of an Aggregator stage affects which three options? (Choose three.)
Sum
Sum of Weights
Percent Coefficient of Variation
There is a requirement to transfer a large file using an FTP Enterprise stage. How can you minimize processing time when a transfer failure occurs?
Manually split the file into multiple files and specify restartable mode on a transfer.
You write a job control routine to control a sequence of jobs running as a single unit of work. What are three valid job status types you can trap for? (Choose three.)
DSJS.RUNOK
DSJS.STOPPED
DSJS.RUNFAILED
You have run ten instances of the same job the previous evening. You want to examine the job logs for all instances but can only find five of them. How can you avoid this in the future for this job?
Set the $APT_AUTOLOG_PURGE environment variable to False.
Which environment variable determines where the temporary scores are stored?
APT_SAVE_SCORE
You are assigned to correct a job from another developer. The job contains 20 stages sourcing data from two Data Sets and many sequential files. The annotation in the job indicates who wrote the job and when, not the objective of the job. All link and stage names use the default names. One of the output columns has an incorrect value which should have been obtained using a lookup. What could the original developer have done to make this task easier for maintenance purposes?
Named all stages and links based on what they do.
Which of the following is not an ODBC connector property?
Remote server
Which three statements are true about stage variables in a Transformer Stage? (Choose three.)
Stage variables can be set to NULL.
Varchar stage variables can be initialized with spaces.
The expression executed for a stage variable can refer to a stage variable which is executed later.
When you run a parallel job, any error messages and warnings are written to the job log and can be viewed from the Director client. What two levels of message handlers are there? (Choose two.)
job level
record level
A star schema data warehouse consists of four dimension tables and one fact table. How many Slowly Changing Dimensions (SCD) stages will you need in your jobs to update the star schema tables?
four
You are experiencing performance issues for a given job. You are assigned the task of understanding what is happening at run time for that job. What are the first two steps you should take to understand the job performance issues? (Choose two.)
Review the objectives of the job.
Replace Transformer stages with custom operators.
Which two steps are required to change from a normal lookup to a sparse lookup in an ODBC Enterprise stage? (Choose two.)
Change the lookup option in the stage properties to "Sparse".
Establish a relationship between the key field column in the source stage with the database table field.
Which two statements are true regarding access to a MQ queue? (Choose two.)
MQ stage connects to a queue manager in Client mode only.
MQ connector stage is capable of connecting to Queue manager in both Server and Client mode.
A job reads from a sequential file using a Sequential File stage with option "number of readers" set to 2. This data goes to a Transformer stage and then is written to a data set using the Data Set stage. The default configuration file has three nodes. The environment variable $APT_DISABLE_COMBINATION is set to "True" and partitioning is set to "Auto". How many processes will be created?
5
You set environment variable $APT_ORACLE_LOAD_OPTIONS='OPTIONS(DIRECT=TRUE, PARALLEL=TRUE)' for loading index organized tables. Which statement is accurate regarding the resulting effect of this environment variable setting?
Oracle load will fail when executed.
You are required to use a Make Vector stage in your job. What three requirements must be met in order to use this stage? (Choose three.)
Input columns must all be of the same data type.
Input columns must form a numeric sequence.
All columns are combined into a vector of the same length as the number of columns.
The derivation for a stage variable is: Upcase(input_column1) : ' ' : Upcase(input_column2). Suppose that input_column1 contains a NULL value. Which behavior is expected?
NULL is written to the target stage variable.
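The behavior named in this answer is null propagation: once any operand of the concatenation is NULL, the whole expression evaluates to NULL. A minimal Python sketch of that rule follows, using `None` for NULL. The helpers `upcase` and `concat` are hypothetical stand-ins written for this example, not DataStage functions, and they model only the behavior stated in the answer above.

```python
def upcase(value):
    """Upper-case a string, passing NULL (None) through unchanged."""
    return None if value is None else value.upper()


def concat(*parts):
    """Concatenate parts; a single NULL operand makes the whole result NULL."""
    if any(part is None for part in parts):
        return None
    return "".join(parts)


if __name__ == "__main__":
    # input_column1 is NULL, so the stage variable receives NULL.
    print(concat(upcase(None), " ", upcase("smith")))
    # Both columns populated: the usual concatenated result.
    print(concat(upcase("john"), " ", upcase("smith")))
```

In a real job, the usual defense is to handle the NULL explicitly before concatenating, rather than relying on the expression to cope with it.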
How must the input data set be organized for input into the join stage? (Choose two.)
Unsorted
Sorted in ascending order
Which two tasks can the Slowly Changing Dimensions (SCD) stage perform? (Choose two.)
Look up whether a record with a matching business key value exists in a dimension table. If it does not, retrieve a new surrogate key value and insert a new row into the dimension table.
Look up whether a record with a matching business key value exists in a dimension table. If it does, mark the record as not-current, and generate a new record with new values for selected fields.
A DataStage job uses an Inner Join to combine data from two source parallel datasets that were written to disk in sort order based on the join key columns. Which two methods could be used to dramatically improve performance of this job? (Choose two.)
Disable job monitoring.
Add a parallel sort stage before each Join input, specifying the "Don't Sort, Previously Grouped" sort key mode for each key.
Which Oracle Enterprise stage read property can be set using -dboptions to tune job performance?
arraysize
Records in a source file must be copied to multiple output streams for further processing. Which two conditions would require the use of a Transformer stage instead of a Copy stage? (Choose two.)
Renaming one or more output columns.
Directing selected output records down one output link rather than another.
How is DataStage Table Metadata shared among DataStage projects?
Use the "Shared Table Creation Wizard" to create a copy of the table in the shared repository.
In which two situations is it appropriate to use a Sparse Lookup? (Choose two.)
When reference data is significantly larger than the streaming data (100:1).
When invoking a stored procedure within a database per row in the streaming link.
What stage allows for more than one reject link?
Lookup stage
Your job design calls for using a target ODBC Connector stage. The target database is found on a remote server. The target table you are writing into contains a single column primary key. What are the three "Write mode" properties that allow the possibility of multiple SQL actions? (Choose three.)
Insert then update
Delete then insert
Update then insert
Input rows to a Transformer contain a product name field and a field with a list of colors the product can be ordered with. The colors are separated by the pipe character (|). Here is an example of an input row: "Shirt"| ....| "Red, Blue, Black"|... For each input row, you want to output multiple output rows, one for each color in the list. For the above example input row, three rows are to be output, one per color: "Shirt" ... "Red", "Shirt" ... "Blue", "Shirt" ... "Black". Which three operations will you need to accomplish this? (Choose three.)
Use the Count() function over the ColorList field to determine the number of loop iterations.
Use the @ITERATION variable to determine which color in the ColorList field to extract using the Field function.
Specify the following loop condition: @ITERATION <= n, where n is a stage variable initialized with number of loop iterations.
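The three operations above combine into a simple loop. The sketch below mimics it in Python rather than in Transformer expressions: `count` and `field` are Python stand-ins written for this example (they are not the DataStage functions themselves), `explode_colors` is a hypothetical name, and the pipe delimiter is taken from the question text.

```python
def count(string, substring):
    """Stand-in for Count(): number of occurrences of substring in string."""
    return string.count(substring)


def field(string, delimiter, occurrence):
    """Stand-in for Field(): the 1-based occurrence-th delimited substring."""
    parts = string.split(delimiter)
    if 1 <= occurrence <= len(parts):
        return parts[occurrence - 1]
    return ""


def explode_colors(product, color_list, delimiter="|"):
    """Emit one (product, color) row per color in color_list."""
    # Number of loop iterations: delimiters + 1 (the stage variable n).
    n = count(color_list, delimiter) + 1
    rows = []
    iteration = 1  # plays the role of @ITERATION
    while iteration <= n:  # loop condition: @ITERATION <= n
        rows.append((product, field(color_list, delimiter, iteration).strip()))
        iteration += 1
    return rows


if __name__ == "__main__":
    for row in explode_colors("Shirt", "Red|Blue|Black"):
        print(row)
```

Computing `n` once per input row, before the loop starts, is what the stage variable is for: the loop condition is then a cheap comparison on every iteration.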
You are asked to identify the jobs and shared containers that use the ADDRESS column. The Size has changed from 50 to 120 characters in the source system so the jobs must be updated with the new size. What feature of the Designer will locate where a column is used in a DataStage project?
From Designer Tools open an Advanced Find dialog. Select Columns from the Type list and enter the column name in the Name To Find field.
How are Shared Table definitions created from the DataStage Client?
Using the "Shared Table Creation Wizard" from the DataStage Client.
What are two fundamental functions of the Information Server Source Code Integration based on the Eclipse Team framework? (Choose two.)
Send to Source Code Control Workspace
Replace from Source Code Control Workspace
Rows of data going into a Transformer stage are sorted and hash partitioned by the Input.Product column. Using stage variables, how can you determine when a new row is the first of a new group of Product rows?
Create a stage variable named sv_Product and follow it by a second stage variable named sv_IsNewProduct. Map the Input.Product column to sv_Product. The derivation for sv_IsNewProduct is: IF Input.Product = sv_Product THEN "YES" ELSE "NO".
Which three of the following options does the dsjob command have? (Choose three.)
Stopping a job
Specifying an appropriate log file
Listing projects, jobs, stages, links, and parameters
You are responsible for deploying objects into your customer's production environment. To ensure the stability of the production system, the customer does not permit compilers on production machines. They have also protected the project, and only development machines have the required compiler. What two options will allow jobs with a parallel Transformer to execute on the customer's production machines? (Choose two.)
Export the jobs with Information Server Manager with the executables.
Create a package with Information Server Manager and select the option to include executables.
In which two situations would you use the Web Services Client stage? (Choose two.)
You do not need both input and output links in a single web service operation.
You need the Web service to act as either a data source or a data target during an operation.
Which two statements are correct when referring to an Aggregator Stage? (Choose two.)
Use Hash method for a limited number of distinct key values.
Use Hash method with a large number of distinct key-column values.
Which three methods can be used to import metadata from a Web Services Description Language (WSDL) document? (Choose three.)
XML Table Definitions
Web Services WSDL Definitions
Web Service Function Definitions
The purchase history record contains CustID, ProductID, ProductType and TotalAmount. You need to retain the record of greatest TotalAmount per CustID and ProductType using the Remove Duplicates stage. Which two statements accomplish this requirement? (Choose two.)
Hash-partition on CustID; Sort on CustID, ProductType and TotalAmount.
Hash-partition on CustID and ProductType; Sort on CustID, ProductType and TotalAmount.
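The sort-then-deduplicate pattern behind both answers can be illustrated in plain Python. This is a sketch under stated assumptions, not DataStage code: `keep_greatest` is a hypothetical name, records are modeled as dictionaries, and the choice to sort TotalAmount in descending order and retain the first record of each group is an assumption, since the answers do not state the sort direction or which duplicate is kept.

```python
def keep_greatest(records):
    """Keep the record with the greatest TotalAmount per (CustID, ProductType)."""
    # Sort on the grouping keys, then on TotalAmount descending (assumed),
    # so the record to retain is the first one in each group.
    ordered = sorted(
        records,
        key=lambda r: (r["CustID"], r["ProductType"], -r["TotalAmount"]),
    )
    kept = []
    last_key = None
    for record in ordered:
        key = (record["CustID"], record["ProductType"])
        if key != last_key:
            # First record of a new key group: retain it, drop the rest.
            kept.append(record)
            last_key = key
    return kept


if __name__ == "__main__":
    history = [
        {"CustID": 1, "ProductID": 10, "ProductType": "A", "TotalAmount": 50},
        {"CustID": 1, "ProductID": 11, "ProductType": "A", "TotalAmount": 75},
        {"CustID": 1, "ProductID": 12, "ProductType": "B", "TotalAmount": 20},
        {"CustID": 2, "ProductID": 10, "ProductType": "A", "TotalAmount": 30},
    ]
    for record in keep_greatest(history):
        print(record)
```

Either partitioning choice works for the same reason: all records that share a CustID and ProductType land in the same partition, so the per-partition sort sees every candidate for a given group.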
A client needs to process a flat file where a set of values in the import data columns should be treated as Null. What is the best way to handle multiple Null values using a Sequential File stage?
On the Output Link format tab, specify a separator character in the dependent Null field value separator property and then use this separator to delimit the null values in the Null field value property.
You are setting up project defaults. Which three items can be set in DataStage Administrator? (Choose three.)
default for compile options
defaults for environment variables
default for Runtime Column Propagation
What are two advantages of using Runtime Column Propagation (RCP)? (Choose two.)
Only columns used in the data flow need to be defined.
Columns not specifically used in the flow are propagated as if they were.
You would like to pass values into parameters that will be used in a variety of downstream activity stages within a job sequence. What are three valid ways to do this? (Choose three.)
Use local parameters.
Use environment variables.
Use the User Variables Activity stage to populate the local parameters from an outside source such as a file.
You have a parallel shared container that is used by other parallel jobs within your project. Part of the logic in this shared container has been changed. Which two statements are true regarding this change to the parallel shared container? (Choose two.)
Jobs using this parallel shared container need to be re-compiled.
Jobs using this parallel shared container need to be re-compiled only when the metadata of the container is changed.