[00:09] *** BlueMaxim has joined #archiveteam
[01:38] *** j08nY has quit IRC (Remote host closed the connection)
[01:53] *** nertzy has quit IRC (This computer has gone to sleep)
[02:12] *** ZexaronS has quit IRC (Leaving)
[02:20] *** fie has joined #archiveteam
[03:00] *** Aranje has quit IRC (Three sheets to the wind)
[03:17] *** ndiddy has quit IRC ()
[04:58] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:05] *** Sk1d has joined #archiveteam
[05:17] *** zyphlar has joined #archiveteam
[05:35] *** zenguy has quit IRC (Read error: Operation timed out)
[05:42] *** zenguy has joined #archiveteam
[05:45] *** zenguy has quit IRC (Read error: Operation timed out)
[05:49] *** zenguy has joined #archiveteam
[07:08] *** schbirid has joined #archiveteam
[08:01] *** Jonison has joined #archiveteam
[08:02] *** atomotic has joined #archiveteam
[08:47] *** zyphlar has quit IRC (Quit: Connection closed for inactivity)
[09:03] *** j08nY has joined #archiveteam
[10:21] *** Guest has joined #archiveteam
[10:25] *** anhedonis has joined #archiveteam
[11:03] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:25] *** ZexaronS has joined #archiveteam
[11:47] *** Smiley has joined #archiveteam
[11:47] *** SmileyG has quit IRC (Read error: Connection reset by peer)
[11:48] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[12:26] *** j08nY has quit IRC (Quit: Leaving)
[12:43] *** atomotic has joined #archiveteam
[13:32] *** icedice has joined #archiveteam
[13:35] *** icedice2 has joined #archiveteam
[13:36] *** anhedonis has quit IRC (Quit: anhedonis)
[13:36] *** ZexaronS has quit IRC (Leaving)
[13:36] *** atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
[13:37] *** icedice has quit IRC (Ping timeout: 250 seconds)
[13:52] *** ZexaronS has joined #archiveteam
[14:36] *** atomotic has joined #archiveteam
[16:26] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[16:31] *** Atom has quit IRC (Read error: Operation timed out)
[16:35] *** MMovie2 has quit IRC (Read error: Operation timed out)
[16:45] *** MMovie has joined #archiveteam
[17:36] *** ats has quit IRC (Quit: kernel!)
[17:39] *** MMovie has quit IRC (Read error: Operation timed out)
[17:40] *** ats has joined #archiveteam
[17:50] *** MMovie has joined #archiveteam
[18:26] *** powerKitt has joined #archiveteam
[18:27] https://blogs.msdn.microsoft.com/bharry/2017/03/31/shutting-down-codeplex/ Have we grabbed a copy of CodePlex yet?
[18:27] Not yet, but it's known: http://archiveteam.org/index.php?title=CodePlex
[18:28] Also, it doesn't look like they'll nuke the content anytime soon. For once, a company handles a service shutdown properly.
[18:29] Yeah, and the download tools for the archived projects sound pretty complete.
[18:30] We should probably still grab a copy of the archive, like we did with the Google Code archive.
[18:30] Read only archives don't last forever.
[18:31] Oh, definitely.
[18:37] *** j08nY has joined #archiveteam
[19:11] Is there any cache of http://web.archive.org/web/19980509084931/http://members.visi.net/ member websites? http://members.visi.net/~fathom was from before 1999 I think so internet archive doesn't have it ( http://www.infotoday.com/online/mar02/OnTheNet.htm )
[19:12] *** TheLovina has quit IRC (Read error: Operation timed out)
[19:20] kittymeow: http://web.archive.org/web/*/http://members.visi.net/*
[19:38] *** fie has quit IRC (Ping timeout: 370 seconds)
[19:39] *** TheLovina has joined #archiveteam
[19:46] *** powerKitt has quit IRC (Quit: Page closed)
[19:57] https://imgbox.com/
[19:57] "Dear User,
[19:57] We’d like to inform you that we will be shutting down services on June 30th, 2017.
Please download and backup your files before this date.
[19:57] Sincerely,
[19:57] The Team "
[20:02] *** RichardG has quit IRC (Ping timeout: 370 seconds)
[20:12] *** RichardG has joined #archiveteam
[20:20] *** fie has joined #archiveteam
[20:47] *** schbirid has quit IRC (Quit: Leaving)
[21:08] Hey ArchiveTeam, as you may know, it's the UK general election today, and therefore I've compiled a list of all the candidate's facebook, twitter and campaign websites.
[21:08] There's a lot of URLs - 6098 total, (2324 twitter, 2303 facebook and 1471 websites).
[21:08] It would be great if they were processed as soon as possible because I imagine tomorrow they'll all start changing and being taken down.
[21:08] I've got the URL lists in three separate files - what's the next step? ArchiveBot?
[21:09] *** TheLovina has quit IRC (Read error: Operation timed out)
[21:09] *** ndiddy has joined #archiveteam
[21:13] facebook tends to not work well in archivebot, we are generally banned for scraping
[21:13] but websites, yes
[21:13] that sounds like it took a ton of work!
[21:13] Here are the files: websites: https://pastebin.com/cayH7CKT :: twitters: https://pastebin.com/1Puq0M4W :: facebooks: https://pastebin.com/RuTw1Rbg
[21:14] Not really, there were a few websites I scraped to find all the links, but it only took an hour or so.
[21:14] I've never used archivebot, so can I just leave those files with you?
[21:14] i've got a lot of work that needs doing, but hopefully someone else here can take up the charge?
[21:19] Just looking at the archivebot readthedocs, is there a way to do recursive archiving from a list of urls?
[21:19] yes but it's liable to go off the rails unless you keep it on a tight leash
[21:19] tapedrive: might want to check your lists, i see "http://www.Facebook/Lab4fav" in there
[21:19] you probably want !a < http://url --no-offsite-links --ignore-sets=blogs , at the very least
[21:20] Kaz: I've scraped these from many other sites, so there may be some incorrect ones.
[21:21] xmc: Is that url for the file with the url list, or each individual url?
[21:21] the file with the url list
[21:22] make sure it's a plain text file, one url per line
[21:47] *** ZexaronS has quit IRC (Leaving)
[21:47] With that many URLs it'd probably be a good idea to split it into multiple jobs
[21:47] Archivebot instances aren't always very stable
[21:50] Would you recommend aborting now, and splitting it, or keeping it going and monitoring it?
[21:51] Eh, keep monitoring it
[22:38] *** Jonison has quit IRC (Read error: Connection reset by peer)
[22:42] *** nertzy has joined #archiveteam
[22:53] *** icedice2 has quit IRC (Quit: Leaving)
[23:05] Would it be acceptable to add more workers to the UK election candidates websites, seeing as they're all hosted on different domains, and time is critical with it?
[23:06] more workers how? if you want to split it into multiple jobs, that would be sensible
[23:07] With the concurrency command
[23:07] oh! yeah go for it
[23:42] Have we archived the main UK party websites? If not, they'd probably be pretty important.
[23:42] However, they have a large amount of pages.
[23:50] *** Atom has joined #archiveteam
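
Editor's sketch, not part of the log: the advice given between 21:19 and 21:47 (check the list for malformed entries such as "http://www.Facebook/Lab4fav", keep it a plain text file with one URL per line, and split a large list into multiple jobs) could be prepared roughly as below. The function names `is_plausible_url` and `clean_and_split` and the chunk size are hypothetical, and the hostname check is only a rough heuristic assumed for illustration; nothing here is an ArchiveBot tool or API.

```python
from urllib.parse import urlsplit


def is_plausible_url(url: str) -> bool:
    """Heuristic check: http(s) scheme and a hostname with a real domain part.

    Rejects entries like "http://www.Facebook/Lab4fav", where the host is
    just "www." plus a single label with no top-level domain.
    """
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError on some malformed hosts
        return False
    if parts.scheme not in ("http", "https"):
        return False
    labels = (parts.hostname or "").split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]  # ignore a leading "www" when counting labels
    return len(labels) >= 2 and all(labels)


def clean_and_split(lines, chunk_size):
    """Deduplicate, drop blanks, separate bad URLs, and chunk the rest.

    Returns (chunks, bad): chunks is a list of URL lists, each intended to
    be saved as its own one-URL-per-line text file; bad holds the rejected
    entries for manual review.
    """
    seen = set()
    good, bad = [], []
    for line in lines:
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        (good if is_plausible_url(url) else bad).append(url)
    chunks = [good[i:i + chunk_size] for i in range(0, len(good), chunk_size)]
    return chunks, bad


if __name__ == "__main__":
    sample = [
        "https://www.facebook.com/Lab4fav",
        "http://www.Facebook/Lab4fav",  # malformed entry quoted in the log
        "https://twitter.com/example",
        "http://example.org/",
    ]
    chunks, bad = clean_and_split(sample, chunk_size=2)
    for n, chunk in enumerate(chunks, start=1):
        print(f"job {n}:", "\n".join(chunk), sep="\n")
    print("needs review:", bad)
```

Each chunk would then be uploaded as its own plain text file and passed to a separate `!a <` job with the options suggested at 21:19.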