Matching two arrays and removing values from one of them

Nothing too exciting but here it comes.

So while I was doing some scraping, I ran into a small problem: I already had some data that didn't exactly match the new stuff I had scraped, but parts of it were contained in the new stuff. What I needed to do was find every occurrence of the old data in the new data and remove it, because I already had that lying around.

I had two files with the two data sets to be matched, each with one entry per line:

<?php
//load data
$dataset1 = file_get_contents('data1.csv');
$dataset2 = file_get_contents('data2.csv');

//make data into arrays
$dataarr1 = explode("\n", $dataset1);
$dataarr2 = explode("\n", $dataset2);

foreach($dataarr1 as $d1) { //shoop da loop!
        foreach($dataarr2 as $d2key => $d2) {
                //escape the old value so it is matched literally, not as a pattern
                preg_match("/".preg_quote($d1, "/")."/", $d2, $matches);
                if (!empty($matches) && $matches[0] != "") {
                        unset($dataarr2[$d2key]); //remove from the array, not the raw string
                }
        }
}

//write the data
$fileres = fopen("outdata.csv", "w+");
fwrite($fileres, implode("\n", $dataarr2));
fclose($fileres);
?>

So what this essentially does is take two datasets and check every entry of the first against every entry of the second. If an entry from the first is contained in an entry from the second, that entry is deleted from the second dataset. Once all combinations have been tried, whatever is left of the second dataset is written to a file.
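
If all you need is plain "is this string contained in that one" matching, a regex isn't strictly required. Here is a minimal sketch of the same idea using strpos() instead of preg_match(). The function name remove_contained and the sample data are made up for illustration, and I'm assuming blank lines and stray whitespace in the old data should be ignored.

```php
<?php
// Hypothetical helper (not part of the script above): returns the entries of
// $new that do not contain any non-empty entry of $old as a substring.
function remove_contained(array $old, array $new)
{
    // Trim stray "\r"/whitespace and drop blank entries,
    // since an empty needle would match every line.
    $needles = array_filter(array_map('trim', $old), 'strlen');

    $kept = array();
    foreach ($new as $line) {
        $found = false;
        foreach ($needles as $needle) {
            if (strpos($line, $needle) !== false) {
                $found = true;
                break; // one hit is enough, no need to test the rest
            }
        }
        if (!$found) {
            $kept[] = $line;
        }
    }
    return $kept;
}

// Made-up sample data
$old = array("alice", "bob");
$new = array("alice@example.com", "carol@example.com", "bob smith", "dave");

// Keeps "carol@example.com" and "dave"
print_r(remove_contained($old, $new));
```

Building a fresh array instead of calling unset() on the one being looped over also means the result is re-indexed from 0, which is handy if you want to keep working with it afterwards.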