DATA ENGINEERING EXCELLENCE: A CATALYST FOR ADVANCED DATA ANALYTICS IN MODERN ORGANIZATIONS

Krishnamurthy Oku
Rama Krishna Vaddy
Abhinay Yada
Ravi Kumar Batchu

Abstract

This study examines "Data Engineering Excellence" in modern organizations and its role as a catalyst for advanced data analytics initiatives. Using a mixed-methods approach that combines a literature review with real-world case studies, the research highlights the strategic integration of robust data engineering practices. The key components explored are cutting-edge technologies, best practices, and data governance frameworks. The findings reveal tangible benefits, including enhanced data quality, reduced latency, and improved scalability, which in turn improve the efficacy of advanced analytics. The study also addresses economic implications, showing cost savings and increased operational efficiency, and it emphasizes ethical considerations in data handling and privacy. Overall, the research contributes to the discourse on data engineering and analytics by underscoring the strategic importance of Data Engineering Excellence to modern organizational success.

How to Cite
DATA ENGINEERING EXCELLENCE: A CATALYST FOR ADVANCED DATA ANALYTICS IN MODERN ORGANIZATIONS. (2024). International Journal of Creative Research In Computer Technology and Design, 6(6), 1-10. https://jrctd.in/index.php/IJRCTD/article/view/34
