Tuesday, January 15, 2013

Datastage Job Report Script


@echo off
:: Required Components:
::     dsjob.exe (Windows Version)
:: -----------------------------------------------------------------
:: Command Line Parameters:
:: 1. Host
:: 2. User
:: 3. Password
:: 4. Project
:: -----------------------------------------------------------------
:: Ensure that everything that are set here are not permanent.
:: -----------------------------------------------------------------
    SETLOCAL
:: -----------------------------------------------------------------
:: Test for command line parameters.
:: -----------------------------------------------------------------
    IF "%1"=="" GOTO Syntax
    IF "%2"=="" GOTO Syntax
    IF "%3"=="" GOTO Syntax
    IF "%4"=="" GOTO Syntax
:: -----------------------------------------------------------------
:: Set paramters.
:: -----------------------------------------------------------------
    SET Host=%1
    SET User=%2
    SET Password=%3
    SET Project=%4
:: -----------------------------------------------------------------
:: Hard-coded values.  Dependent on each computer.
:: -----------------------------------------------------------------
    SET Designer=C:\Progra~1\Ascential\DataStage\dsdesign.exe
    SET DsJob=C:\Progra~1\Ascential\DataStage\dsjob.exe
    SET JobList=DsJobReportList.txt
    SET ProjectList=ProjectList.txt
    SET DSLog=DsJobReportLog
    SET BackupDir=c:\DataStage\KimD\Jobs\%Project%
:: -----------------------------------------------------------------
:: Get the current Date
:: -----------------------------------------------------------------
    FOR /f "tokens=2-4 delims=/ " %%a in ('DATE/T') do SET DsxDate=%%c%%a%%b
:: -----------------------------------------------------------------
:: Get the current Time
:: -----------------------------------------------------------------
    FOR /f "tokens=1* delims=:" %%a in ('ECHO.^|TIME^|FINDSTR "[0-9]"') do (SET DsxTime=%%b)
:: -----------------------------------------------------------------
:: Set delimeters so that current time can be broken down into components
:: then execute FOR loop to parse the DsxTime variable into Hr/Min/Sec/Hun.
:: -----------------------------------------------------------------
    SET delim1=%DsxTime:~3,1%
    SET delim2=%DsxTime:~9,1%
    FOR /f "tokens=1-4 delims=%delim1%%delim2% " %%a in ('echo %DsxTime%') do (
        set DsxHr=%%a
        set DsxMin=%%b
        set DsxSec=%%c
        set DsxHun=%%d
    )
:: -----------------------------------------------------------------
:: If provided directory is missing an ending \, append it.
:: Validate %BackupDir%'s existance.
:: -----------------------------------------------------------------
    if exist %BackupDir%\ set BackupDir=%BackupDir%\
    if NOT exist %BackupDir% GOTO BadMain
:: -----------------------------------------------------------------
:: Set the log file name to improve readability of code.
    SET LogFileName=%BackupDir%%DSLog%-%DsxDate%-%DsxHr%_%DsxMin%_%DsxSec%.log
:: -----------------------------------------------------------------
:: Announce to log of this program's run.
:: -----------------------------------------------------------------
    ECHO. > %LogFileName%
    ECHO DsJobReport ran on %DsxDate% %DsxHr%:%DsxMin%:%DsxSec% with the following parameters >> %LogFileName%
    ECHO Host=%Host% >> %LogFileName%
    ECHO User=%User% >> %LogFileName%
    ECHO BackupDir=%BackupDir%%DsxDate%\ >> %LogFileName%
    ECHO Designer=%Designer% >> %LogFileName%
    ECHO DsJob=%DsJob% >> %LogFileName%
    ECHO ProjectList=%ProjectList% >> %LogFileName%
    ECHO JobList=%JobList% >> %LogFileName%
    ECHO DSLog=%DSLog% >> %LogFileName%
    ECHO. >> %LogFileName%
:: -----------------------------------------------------------------
:: Pull project information (and validate parameters).
:: -----------------------------------------------------------------
    %DsJob% -server %Host% -user %User% -password %Password% -lprojects > %ProjectList%
    IF NOT %ERRORLEVEL%==0 GOTO BadParam
:: -----------------------------------------------------------------
:: Pull job list
:: -----------------------------------------------------------------
    %DsJob% -server %Host% -user %User% -password %Password% -ljobs %Project% > %JobList%
    IF NOT %ERRORLEVEL%==0 GOTO BadParam
:: -----------------------------------------------------------------
:: Report valid projects to log.
:: -----------------------------------------------------------------
    ECHO Jobs found on %Host%: >> %LogFileName%
    type %JobList% >> %LogFileName%
:: -----------------------------------------------------------------
:: Create subdirectory within Backup Directory
:: Validate ability to create %BackupDir%%DsxDate%\.
:: -----------------------------------------------------------------
    if exist %BackupDir%%DsxDate%\ GOTO DirCont
    ECHO ***  Creating: %BackupDir%%DsxDate%\
    mkdir %BackupDir%%DsxDate%\
    IF NOT %ERRORLEVEL%==0 GOTO BadDir
    ECHO. >> %LogFileName%
    ECHO ***  Created: %BackupDir%%DsxDate%\ >> %LogFileName%
    ECHO. >> %LogFileName%
:: -----------------------------------------------------------------
:DirCont
    ECHO.
    ECHO Saving Job Report on %Host% ...
    ECHO.
:: -----------------------------------------------------------------
:: Begin Backup Loop
::
:: NOTE:  %ERRORLEVEL% does not work for some reason.
:: -----------------------------------------------------------------
    for /F "tokens=1" %%i in (%JobList%) do (
        ECHO Saving Job Report on Job %%i
        ECHO %Designer% /H=%Host% /U=%User% /P=%Password% %Project% %%i /R /RP=%BackupDir%%DsxDate% /RX >> %LogFileName%
        %Designer% /H=%Host% /U=%User% /P=%Password% %Project% %%i /R /RP=%BackupDir%%DsxDate% /RX >> %LogFileName%
        IF NOT %ERRORLEVEL%==0 GOTO ProjFail

        ECHO. >> %LogFileName%
        ECHO *** Completed Job Report for Job: %%i on Host: %Host% Project: %Project% >> %LogFileName%
        ECHO     to File: %BackupDir%%DsxDate%\%%i.html >> %LogFileName%
        ECHO. >> %LogFileName%
    )
:: -----------------------------------------------------------------
::    ECHO *** Export completed successfully for projects:
::    type %TempFile%
:: -----------------------------------------------------------------
    GOTO EXITPT
:: -----------------------------------------------------------------
:: a job failed to be exported.
:: -----------------------------------------------------------------
:ProjFail
    ECHO.
    ECHO *** ERROR:  Failed to Export Job: %%i on Host: %Host% on Project: %Project%
    ECHO.
    ECHO Please ensure that nobody else is accessing this server while you
    ECHO are running this backup script.
    ECHO.

    ECHO. >> %LogFileName%
    ECHO *** ERROR:  Failed to Export Job: %%i on Host: %Host% on Project: %Project% >> %LogFileName%
    ECHO. >> %LogFileName%
GOTO EXITPT
:: -----------------------------------------------------------------
:: Report that paramters are not valid on screen and log file.
:: Note: Password are not reported for better security.
:: -----------------------------------------------------------------
:BadParam
    echo.
    echo Invalid parameters - Unable to access Server.
    echo.
    echo. >> %LogFileName%
    echo Invalid parameters - Unable to access Server. >> %LogFileName%
    echo. >> %LogFileName%
GOTO Syntax
:: -----------------------------------------------------------------
:: Report that directory is non-existant.
:: -----------------------------------------------------------------
:BadMain
    echo.
    echo Bad/Non-existing directory: %BackupDir%
    echo.
    echo Please ensure that you have permission to access/create directories
    echo and files.  Also ensure that directory listed exists.
    echo.
    echo. >> %LogFileName%
    echo Bad/Non-existing directory: %BackupDir% >> %LogFileName%
    echo. >> %LogFileName%
GOTO EXITPT
:: -----------------------------------------------------------------
:: Report that program does not have privs to create directory.
:: -----------------------------------------------------------------
:BadDir
    echo.
    echo Unable to create subdirectory: %BackupDir%%DsxDate%\
    echo.
    echo Please ensure that you have permission to access/create directories
    echo and files.
    echo.
    echo. >> %LogFileName%
    echo Unable to create subdirectory: %BackupDir%%DsxDate%\ >> %LogFileName%
    echo. >> %LogFileName%
GOTO EXITPT
:: -----------------------------------------------------------------
:: Report proper syntax usage.
:: -----------------------------------------------------------------
GOTO ENDPOINT
:: -----------------------------------------------------------------
:EXITPT
    ECHO. >> %LogFileName%
:: -----------------------------------------------------------------
:ENDPOINT
    ENDLOCAL

Tuesday, January 8, 2013

Ton of Data Warehousing and Datastage Interview Questions


·         Types of Stages in DS? Explain with Examples
·         What are active stages and passive stages?
·         Can you filter data in hashed file? (No)
·         Difference between sequential and hashed file?
·         How do you populate time dimension?
·         Can we use target hashed file as lookup? (Yes)
·         What is Merge Stage?
·         What is Job Sequencer?
·         What are stages in sequences?
·         How do you pass parameters?
·         What parameters you used in your project?
·         What are log tables?
·         What is job controlling?
·         Facts and dimension tables?
·         Confirmed dimensions?
·         Difference between OLTP and OLAP?
·         Difference between star schema and snow flake schema?
·         What are hierarchies? Examples?
·         What are materialized views?
·         What is aggregation?
·         What is surrogate key? Is it used for both fact and dimension tables?
·         Why do you go for oracle sequence generator rather than datastage routine?
·         Flow of data in datastage?
·         Initial loading and incremental loading?
·         What is SCD? Types?
·         How do you develop SCD type2 in your project?
·         How do you load dimension data and fact data? Which is first?
·         Difference between oracle function and procedure?
·         Difference between unique and primary key?
·         Difference between union and union all?
·         What is minus operator?
·         What is audit table?
·         If there is a large hash file and a smaller oracle table and if you are looking up from
·         transformer in different jobs which will be faster?
·         Tell me about SCD’s?
·         How did you implement SCD in your project?
·         What are derivations in transformer?
·         How do you use surrogate key in reporting?
·         Logs view in datastage, logs in Informatica which is clear?
·         How does pivot stage work?
·         What is surrogate key? What is the importance of it? How did you implement it in your
·         project?
·         Totally how many jobs did you developed and how many lookups did you use totally?
·         How do constraint in transformer work?
·         How will you declare a constraint in datastage?
·         How will you handle rejected data?
·         Give me some performance tips in datastage?
·         Can we use sequential file as a lookup?
·         How does hash file stage lookup?
·         Why can’t we use sequential file as a lookup?
·         What is data warehouse?
·         What is ‘Star-Schema’?
·         What is ‘Snowflake-Schema’?
·         What is difference between Star-Schema and Snowflake-Schema?
·         What is mean by surrogate key?
·         What is ‘Conformed Dimension’?
·         What is Factless Fact Table?
·         When will we use connected and unconnected lookup?
·         Which cache supports connected and unconnected lookup?
·         What is the difference between SCD Type2 and SCD Type3?
·         What is difference between data mart and data warehouse?
·         What is composite key?
·         What is surrogate key? When you will go for it?
·         What is dimensional modeling?
·         What are SCD and SGT? Difference between them? Example of SGT from your project.
·         How do you import your source and targets? What are the types of sources and targets?
·         What is Active Stages and Passive Stages means in datastage?
·         What is difference between Informatica and DataStage? Which do you think is best?
·         What are the stages you used in your project?
·         What do you mean by parallel processing?
·         What is difference between Merge Stage and Join Stage?
·         What is difference between Copy Stage and Transformer Stage?
·         What is difference between ODBC Stage and OCI Stage?
·         What is difference between Lookup Stage and Join Stage?
·         What is difference between Change Capture Stage and Difference Stage?
·         What is difference between Hashed file and Sequential File?
·         What are different Joins used in Join Stage?
·         How you decide when to go for join stage and lookup stage?
·         What is partition key? Which key is used in round robin partition?
·         How do you handle SCD in datastage?
·         What are Change Capture Stage and Change Apply Stages?
·         How many streams to the transformer you can give?
·         What is primary link and reference link?
·         What is routine? What is before and after subroutines? These are run after/before job or
·         stage?
·         What is Config File? Each job having its own config file or one is needed?
·         What is Node?
·         What is IPC Stage? What it increase performance?
·         What is Sequential buffer?
·         What are Link Partioner and Link Collector?
·         What are the performance tunning you have done in your project?
·         Did you done scheduling? How? Can you schedule a job at the every end date of month?
·         How?
·         What is job sequence? Had you run any jobs?
·         What is status view? Why you clear this? If you clear the status view what internally
·         done?
·         What is hashed file? What are the types of hashed file? Which you use? What is default?
·         What is main advantage of hashed file? Difference between them. (static and dynamic)
·         What are containers? Give example from your project.
·         What are parameters and parameter file?
·         How do you convert columns to rows and rows to columns in datastage? (Using Pivot
·         Stage).
·         What is Pivot Stage?
·         What is execution flow of constraints, derivations and variables in transformer stage?
·         What are these?
·         How do you eliminate duplicates in datastage? Can you use hash file for it?
·         If 1st and 8th record is duplicate then which will be skipped? Can you configure it?
·         How do you import and export datastage jobs? What is the file extension? (See each
·         component while importing and exporting).
·         How do you rate yourself in DataStage?
·         Explain DataStage Architecture?
·         What is repository? What are the repository items?
·         What is difference between routine and transform?
·         When you write the routines?
·         What is the complex situation you faced in DataStage?
·         System variable, what are system variables used your project?
·         What are the different datastage functions used in your project?
·         Difference between star schema and snow flake schema?
·         What is confirmed, degenerated and junk dimension?
·         What are confirmed facts?
·         Different type of facts and their examples?
·         What are approaches in developing data warehouse?
·         Different types of hashed files?
·         What are routines and transforms? How you used in your project?
·         Difference between Data Mart and Data Warehouse?
·         What is surrogate key? How do you generate it?
·         What are environment variables and global variables?
·         How do you improve the performance of the job?
·         What is SCD? How do you developed SCD type1 and SCD type2?
·         How do you generate surrogate key in datastage?
·         What is job sequence?
·         What are plug-ins?
·         How much data you can get every day?
·         What is the biggest table and size in your schema or in your project?
·         What is the size of data warehouse (by loading data)?
·         How do you improve the performance of the hashed file?
·         What is IPC Stage?
·         What are the different types of stages and used in your project?
·         What are the operations you can do in IPC Stage and transformer stage?
·         What is merge stage? How do you merge two flat files?
·         What is difference between ODBC and ORACLE OCI stage?
·         What difference between sequential file and hashed file?
·         Can you use sequential file as source to hashed file? Have you done it? What error it will
·         give?
·         Why hashed file improve the performance?
·         Can aggregator and transformer stage used for sorting data? How
·         How many input links you can give to transformer?
·         Definition of Slowly Changing Dimensions? Types?
·         What is iconv and oconv functions?
·         What is the advantage of using OCI stage as compared to ODBC stage
·         What is the difference between Interprocess and inprocess? Which one is the best?

Sunday, January 6, 2013

Informatica Interview Questions


1. What is a staging area? Do we need it? What is the purpose of a staging area?
2. What is a three-tier data warehouse?
3. What are the various methods of getting incremental records or delta records from the source systems?
4. What are the various tools? - Name a few
5. What is latest version of Power Center / Power Mart?
6. What is the difference between Power Center & Power Mart?
7. What are the various transformation available?
8. What are the modules in Power Mart?
9. What are active transformation / Passive transformations?
10. What are the different Lookup methods used in Informatica?
11. Can Informatica load heterogeneous targets from heterogeneous sources?
12. How do we call shell scripts from informatica?
13. What is Informatica Metadata and where is it stored?
14. What is a mapping, session, worklet, workflow, mapplet?
15. How can we use mapping variables in Informatica? Where do we use them?
16. What are parameter files? Where do we use them?
17. Can we override a native sql query within Informatica? Where do we do it? How do we do it?
18. Can we use procedural logic inside Infromatica? If yes how, if now how can we use external procedural logic in informatica?
19. Do we need an ETL tool? When do we go for the tools in the market?
20. How do we extract SAP data Using Informatica? What is ABAP? What are IDOCS?
21. How to determine what records to extract?
*                     Timestamps
*                     Deletes are logical with timestamped deletes
*                     Triggers on source system tables (Generally we dont do this as this decreases the source system efficiency)
*                     Application Integration Software TIBCO , MQSERIES
*                     File Compares (least method)
*                     Snapshots in Oracle(daily)
*                     Oracle Streams
22. What is Full load & Incremental or Refresh load?
23. Techniques of Error Handling - Ignore , Rejecting bad records to a flat file , loading the records and reviewing them (default values)
24. What are snapshots? What are materialized views & where do we use them? What is a materialized view log?
25. What is partitioning? What are the types of partitioning?
26. When do we Analyze the tables? How do we do it?
27. Compare ETL & Manual development?1. What is a staging area? Do we need it? What is the purpose of a staging area?
2. What is a three-tier data warehouse?
3. What are the various methods of getting incremental records or delta records from the source systems?
4. What are the various tools? - Name a few
5. What is latest version of Power Center / Power Mart?
6. What is the difference between Power Center & Power Mart?
7. What are the various transformation available?
8. What are the modules in Power Mart?
9. What are active transformation / Passive transformations?
10. What are the different Lookup methods used in Informatica?
11. Can Informatica load heterogeneous targets from heterogeneous sources?
12. How do we call shell scripts from informatica?
13. What is Informatica Metadata and where is it stored?
14. What is a mapping, session, worklet, workflow, mapplet?
15. How can we use mapping variables in Informatica? Where do we use them?
16. What are parameter files? Where do we use them?
17. Can we override a native sql query within Informatica? Where do we do it? How do we do it?
18. Can we use procedural logic inside Infromatica? If yes how, if now how can we use external procedural logic in informatica?
19. Do we need an ETL tool? When do we go for the tools in the market?
20. How do we extract SAP data Using Informatica? What is ABAP? What are IDOCS?
21. How to determine what records to extract?
*                     Timestamps
*                     Deletes are logical with timestamped deletes
*                     Triggers on source system tables (Generally we dont do this as this decreases the source system efficiency)
*                     Application Integration Software TIBCO , MQSERIES
*                     File Compares (least method)
*                     Snapshots in Oracle(daily)
*                     Oracle Streams
22. What is Full load & Incremental or Refresh load?
23. Techniques of Error Handling - Ignore , Rejecting bad records to a flat file , loading the records and reviewing them (default values)
24. What are snapshots? What are materialized views & where do we use them? What is a materialized view log?
25. What is partitioning? What are the types of partitioning?
26. When do we Analyze the tables? How do we do it?
27. Compare ETL & Manual development?

Saturday, January 5, 2013

Data Warehousing Interview Questions


1. What is a data-warehouse?
2. What are Data Marts?
4. What is ER Diagram?
5. What is a Star Schema?
6. What is Dimensional Modeling?
7. What Snow Flake Schema?
8. What are the Different methods of loading Dimension tables?
9. What are Aggregate tables?
10. What is the Difference between OLTP and OLAP?
11. What is ETL?
12. What are the various ETL tools in the Market?
13. What are the various Reporting tools in the Market?
14. What is Fact table?
15. What is a dimension table?
16. What is a lookup table?
17. What is a general purpose scheduling tool? Name some of them?
18. What are modeling tools available in the Market? Name some of them?
19. What is real time data-warehousing?
20. What is data mining?
21. What is Normalization? First Normal Form, Second Normal Form, Third Normal Form?
22. What is ODS?
23. What type of Indexing mechanism do we need to use for a typical data warehouse?
24. Which columns go to the fact table and which columns go the dimension table?
     (My user needs to see <data element> <data element> broken by
     <data element> <data element>
     All elements before broken = Fact Measures
     All elements after broken =    Dimension Elements
25. What is a level of Granularity of a fact table? What does this signify?
     (Weekly level summarization there is no need to have Invoice Number in the fact table anymore)
26. How are the Dimension tables designed?
     De-Normalized , Wide, Short , Use Surrogate Keys, Contain Additional date fields and flags.
27. What are slowly changing dimensions?
28. What are non-additive facts? (Inventory, Account balances in bank)
29. What are conformed dimensions?
30. What is VLDB? (Database is too large to back up in a time frame then it's a VLDB)
31. What are SCD1, SCD2, and SCD3?


Few more questions.......



1. What are the different Dimensional modeling Techniques are Available?
2. What is the Difference between Star-schema and snow-flake-schema? When we go
for star and snow-flake?
3. What are the types of dimension and facts are in DW?
4. What is the life cycle of Data warehousing project?
5. What is a Data-model?
6. What is the Difference between Top-down Approach and Bottom-up Approach?
7. What is a factless-fact Table?
8. What is a confirmed-dimension?
9. What is a junk-dimension?
10. What is a cleansing?
11. Tell me about your current project?
12. What is your role or job profile in the project?
13. What is your Job profile?
14. What is dimesion and fact?
15. What are types of dimensions?
16. What are confirmed dimensions?
17. What are generated dimensions?
18. What are slowly changing dimensions?
19. How many data marts in your project?
20. What is data mart name in your project?
21. What is the size of your data mart?
22. What is factless fact table? Give example.
23. How many fact tables are used in the project?
24. What is your fact table name in your project?
25. How many dimension tables used in the project?
26. What are the names of the dimension tables?
27. What is Schema? Types? Explain Star-Schema and Snowflake Schema with difference.
Which schema you used in your project? Why?
28. Why star-schema called as star-schema? Give example.
29. How frequently and from where you get the data as source?
30. What is difference between data mart and data warehouse?
31. What is composite key?
32. What is surrogate key? When you will go for it?
33. What is dimensional modeling?
34. What are SCD and SGT? Difference between them? Example of SGT from your project.
35. How do you rate yourself in data warehouse?
36. What is the status of your current project?
37. What is data warehouse?
38. What is ‘Star-Schema’?
39. What is ‘Snowflake-Schema’?
40. What is difference between Star-Schema and Snowflake-Schema?
41. What is mean by surrogate key?
42. What is ‘Conformed Dimension’?
43. What is Factless Fact Table?
44. When will we use connected and unconnected lookup?
45. Which cache supports connected and unconnected lookup?
46. What is the difference between SCD Type2 and SCD Type3?
47. Draw the ETL Architecture?
48. Draw the DWH Architecture?

Few more Here
You can find answers for all these questions in this blog itself.