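Convert a categorical feature into the number of times each value occurs.
Example: convert the item ID into the number of times that item ID appears in the full dataset.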
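A minimal pandas sketch of count encoding, included to make the idea concrete. The column name `item_id` and the toy data are hypothetical, and this is not necessarily how `fe_count.py` implements it.

```python
import pandas as pd

# Toy data; the column name "item_id" is hypothetical.
df = pd.DataFrame({"item_id": ["a", "b", "a", "c", "a", "b"]})

# Count how many rows each category appears in, then map the counts back
# onto the original column.
counts = df["item_id"].value_counts()
df["item_id_count"] = df["item_id"].map(counts)
```

When train and test sets are both available, the counts are usually computed on their concatenation so that both share the same encoding.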
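Group by one column and compute the cumulative sum of another column.
Example: each purchase record is one sample. Grouping by user and taking the cumsum of the purchase amount gives the user's cumulative spending up to and including that purchase.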
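The grouped cumulative sum can be sketched with pandas as follows. The column names `user_id` and `amount` are hypothetical, and the rows are assumed to already be in chronological order.

```python
import pandas as pd

# Toy purchase records, assumed to be in chronological order.
df = pd.DataFrame({
    "user_id": [1, 2, 1, 1, 2],
    "amount": [10.0, 5.0, 20.0, 30.0, 15.0],
})

# Running total of "amount" within each user, including the current row.
df["amount_cumsum"] = df.groupby("user_id")["amount"].cumsum()
```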
todo
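After sorting by time, group by one column and compute, for another column, the difference between the current sample and the sample N rows before (or after) it.
Example: after sorting by time, compute the difference between the amount of the user's current purchase and the amount of the user's previous purchase.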
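A small pandas sketch of the grouped difference, assuming hypothetical columns `user_id`, `time`, and `amount`. A positive period compares with earlier rows, a negative period with later rows.

```python
import pandas as pd

# Toy purchase records, deliberately not in chronological order.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 1, 2],
    "time": pd.to_datetime([
        "2024-01-02", "2024-01-01", "2024-01-03", "2024-01-04", "2024-01-05",
    ]),
    "amount": [25.0, 10.0, 7.0, 40.0, 4.0],
})

# Sort by time first so that "previous" and "next" are well defined.
df = df.sort_values("time")

# Current amount minus the same user's previous amount (N = 1).
df["amount_diff_prev"] = df.groupby("user_id")["amount"].diff(1)
# Current amount minus the same user's next amount (N = -1).
df["amount_diff_next"] = df.groupby("user_id")["amount"].diff(-1)
```

The first record of each user has no predecessor, so its `amount_diff_prev` is NaN.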
Reduce the dimensionality of the original features with four methods (PCA, ICA, GRP, and SRP) and use the reduced output as features.
from autox.autox_competition.feature_engineer import FeatureDimensionReduction
featureDimensionReduction = FeatureDimensionReduction()
featureDimensionReduction.fit(df, id_column=['row_id', 'time_id', 'investment_id'], target='target')
dr_feature = featureDimensionReduction.transform(df)
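To illustrate what such a transformer might do internally, here is a sketch built directly on scikit-learn. It assumes that "grp" and "srp" stand for Gaussian and sparse random projection, which the text above does not spell out. The number of components, the column naming scheme, and the random data are also hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA, FastICA
from sklearn.random_projection import (
    GaussianRandomProjection,
    SparseRandomProjection,
)

# Random numeric features standing in for the original feature matrix.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 8)), columns=[f"f{i}" for i in range(8)])

n_components = 3  # hypothetical choice
reducers = {
    "pca": PCA(n_components=n_components, random_state=0),
    "ica": FastICA(n_components=n_components, random_state=0),
    "grp": GaussianRandomProjection(n_components=n_components, random_state=0),
    "srp": SparseRandomProjection(n_components=n_components, random_state=0),
}

# Fit each reducer and collect its output columns side by side.
parts = []
for name, reducer in reducers.items():
    Z = reducer.fit_transform(X.to_numpy())
    cols = [f"{name}_{i}" for i in range(n_components)]
    parts.append(pd.DataFrame(Z, columns=cols, index=X.index))

dr_features = pd.concat(parts, axis=1)
```

ID and target columns would be dropped before fitting, which is presumably the purpose of the `id_column` and `target` arguments in the usage above.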
Exponentially weighted moving average.
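As a minimal sketch, pandas exposes this through `Series.ewm`. The smoothing factor `alpha=0.5` and the toy series are hypothetical. With `adjust=False` the recursion is `y[0] = x[0]` and `y[t] = (1 - alpha) * y[t-1] + alpha * x[t]`.

```python
import pandas as pd

# A toy series, assumed to be ordered in time.
s = pd.Series([10.0, 20.0, 30.0])

# Recursive exponentially weighted mean; alpha=0.5 is an arbitrary choice.
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()
```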
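Feed each sample into a trained GBDT model (for example, one with 30 trees) and use the index of the leaf node that the sample falls into in each tree as a feature.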
from autox.autox_competition.feature_engineer import FeatureGbdt
featureGbdt = FeatureGbdt()
featureGbdt.fit(X_train, y_train, objective='binary', num_of_features=50)
lgb_feature_train = featureGbdt.transform(X_train)
lgb_feature_test = featureGbdt.transform(X_test)
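The leaf-index idea can be sketched without AutoX. The example below swaps in scikit-learn's `GradientBoostingClassifier` instead of LightGBM (which the variable names above suggest AutoX uses), purely to keep the sketch short. The synthetic data and hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary classification data split into train and test parts.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train = X[:150], X[150:], y[:150]

# Train a GBDT with 30 trees.
n_trees = 30
gbdt = GradientBoostingClassifier(n_estimators=n_trees, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)

# apply() returns, for every sample and tree, the index of the leaf the
# sample lands in. For binary classification the last axis has size 1.
leaf_train = gbdt.apply(X_train)[:, :, 0].astype(int)
leaf_test = gbdt.apply(X_test)[:, :, 0].astype(int)
```

Each of the 30 columns is a categorical feature, so it is typically one-hot encoded or passed to a model that handles categorical inputs.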
Convert image inputs into vector features.
For columns identified as long text, extract NLP information.
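Group by one column and compute the rank of another column within the aggregation window.
Example: compute which of the user's occurrences within the current day the current sample is (first, second, and so on).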
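The example can be sketched in pandas by ranking timestamps within each (user, day) group. The column names `user_id` and `time` are hypothetical.

```python
import pandas as pd

# Toy event log; rows are not required to be sorted.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 1, 1],
    "time": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 12:00", "2024-01-01 10:00",
        "2024-01-02 08:00", "2024-01-01 18:00",
    ]),
})

# The aggregation window is one calendar day per user.
df["date"] = df["time"].dt.normalize()

# Rank timestamps within each (user, day); method="first" breaks ties by
# row order so the result is always 1, 2, 3, ...
df["nth_in_day"] = (
    df.groupby(["user_id", "date"])["time"].rank(method="first").astype(int)
)
```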
Time-series features: compute statistics over a rolling window (mean, variance, median, maximum, minimum).
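A short pandas sketch of rolling-window statistics. The window size of 3 rows, the `min_periods=1` setting, and the column names are hypothetical choices.

```python
import pandas as pd

# Toy daily time series.
df = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=6, freq="D"),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
df = df.sort_values("time")

# Window over the current row and the two rows before it.
window = df["value"].rolling(window=3, min_periods=1)
df["value_roll_mean"] = window.mean()
df["value_roll_var"] = window.var()
df["value_roll_median"] = window.median()
df["value_roll_max"] = window.max()
df["value_roll_min"] = window.min()
```

If the rolled column is the prediction target, shift it by one step before rolling so that the current value does not leak into its own feature.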
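Group by one column and get the value of another column from the sample N rows before (or after) the current one.
Example: get whether the user defaulted in their previous record.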
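A pandas sketch of the grouped shift, assuming hypothetical columns `user_id`, `time`, and `is_default`, and assuming that "previous" means previous in time.

```python
import pandas as pd

# Toy loan records.
df = pd.DataFrame({
    "user_id": [1, 2, 1, 2, 1],
    "time": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05",
    ]),
    "is_default": [0, 1, 1, 0, 0],
})
df = df.sort_values("time")

# Value from the same user's previous record (N = 1).
df["prev_is_default"] = df.groupby("user_id")["is_default"].shift(1)
# Value from the same user's next record (N = -1).
df["next_is_default"] = df.groupby("user_id")["is_default"].shift(-1)
```

Features shifted from later rows use future information, so they are only valid when that information is available at prediction time.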
Time-series features: get lag information.
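Group by one column and compute statistics of another column within the window: mean, minimum, maximum, median, and variance for continuous variables, and nunique for categorical variables.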
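These group statistics can be sketched with `groupby(...).transform`, which broadcasts each group's statistic back to every row of that group. The column names are hypothetical, and the window here is simply all rows that share the same key.

```python
import pandas as pd

# Toy purchase records with one continuous and one categorical column.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "amount": [10.0, 20.0, 30.0, 5.0, 15.0],
    "merchant": ["a", "b", "a", "c", "c"],
})

# Continuous column: mean, min, max, median, variance per user.
for stat in ["mean", "min", "max", "median", "var"]:
    df[f"amount_{stat}_by_user"] = df.groupby("user_id")["amount"].transform(stat)

# Categorical column: number of distinct values per user.
df["merchant_nunique_by_user"] = df.groupby("user_id")["merchant"].transform("nunique")
```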
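Convert a categorical variable into the mean of the label within each category.
Example: with annual income as the label, convert education level (a categorical variable) into the average annual income for that education level.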
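A deliberately simple sketch of target encoding follows. The column names, the toy data, and the fallback to the global mean for unseen categories are all assumptions, and `fe_target_encoding.py` may work differently.

```python
import pandas as pd

# Labeled training data and unlabeled test data.
train = pd.DataFrame({
    "education": ["bachelor", "master", "bachelor", "phd", "master"],
    "income": [50.0, 70.0, 60.0, 90.0, 80.0],
})
test = pd.DataFrame({"education": ["master", "phd", "highschool"]})

# Mean label per category, computed on the training data only.
category_means = train.groupby("education")["income"].mean()
global_mean = train["income"].mean()

# Map the means onto the test set; categories never seen in training
# fall back to the global mean.
test["education_te"] = test["education"].map(category_means).fillna(global_mean)
```

Encoding the training rows with means that include their own labels leaks the target, so the training set is commonly encoded out of fold: each fold is encoded with means computed on the other folds.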
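Decompose a time column into component features.
The extracted information includes: year, month, day, hour, week of the year, day of the week, whether it is a workday, quarter, whether it is the start of the month, and whether it is the end of the month.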
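The listed components map closely onto the pandas `.dt` accessor, as sketched below. The column name `time` is hypothetical, and "workday" is approximated as Monday to Friday, ignoring public holidays.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 08:30", "2024-02-29 23:00", "2024-03-16 12:00",
    ]),
})

t = df["time"].dt
df["year"] = t.year
df["month"] = t.month
df["day"] = t.day
df["hour"] = t.hour
df["weekofyear"] = t.isocalendar().week.astype(int)  # ISO week number
df["dayofweek"] = t.dayofweek  # Monday = 0, Sunday = 6
# Assumption: a workday is Monday to Friday; holidays are not considered.
df["is_workday"] = (t.dayofweek < 5).astype(int)
df["quarter"] = t.quarter
df["is_month_start"] = t.is_month_start.astype(int)
df["is_month_end"] = t.is_month_end.astype(int)
```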
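Train a model on the labeled data, predict on the unlabeled data, select the samples predicted with high confidence, label them with the predicted results, and add them to the dataset as pseudo-labels.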
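The pseudo-labeling steps can be sketched with scikit-learn. The logistic regression model, the confidence threshold of 0.95, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: the first 100 rows are treated as labeled, the rest as
# unlabeled (their true labels are simply ignored).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_lab, y_lab, X_unlab = X[:100], y[:100], X[100:]

# 1. Train on the labeled data.
model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# 2. Predict on the unlabeled data and measure confidence as the highest
#    class probability.
proba = model.predict_proba(X_unlab)
confidence = proba.max(axis=1)

# 3. Keep only confident predictions (the threshold is a hypothetical choice).
threshold = 0.95
mask = confidence >= threshold
pseudo_y = model.classes_[proba.argmax(axis=1)][mask]

# 4. Append the pseudo-labeled samples to the training set and retrain.
X_aug = np.vstack([X_lab, X_unlab[mask]])
y_aug = np.concatenate([y_lab, pseudo_y])
model_aug = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
```

A stricter threshold adds fewer but cleaner pseudo-labels, while a looser one adds more samples at the cost of more label noise.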
AutoX is an efficient AutoML tool aimed mainly at data mining tasks on tabular data.