#archiveteam 2017-06-08,Thu

Time Nickname Message
00:09 🔗 BlueMaxim has joined #archiveteam
01:38 🔗 j08nY has quit IRC (Remote host closed the connection)
01:53 🔗 nertzy has quit IRC (This computer has gone to sleep)
02:12 🔗 ZexaronS has quit IRC (Leaving)
02:20 🔗 fie has joined #archiveteam
03:00 🔗 Aranje has quit IRC (Three sheets to the wind)
03:17 🔗 ndiddy has quit IRC ()
04:58 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:05 🔗 Sk1d has joined #archiveteam
05:17 🔗 zyphlar has joined #archiveteam
05:35 🔗 zenguy has quit IRC (Read error: Operation timed out)
05:42 🔗 zenguy has joined #archiveteam
05:45 🔗 zenguy has quit IRC (Read error: Operation timed out)
05:49 🔗 zenguy has joined #archiveteam
07:08 🔗 schbirid has joined #archiveteam
08:01 🔗 Jonison has joined #archiveteam
08:02 🔗 atomotic has joined #archiveteam
08:47 🔗 zyphlar has quit IRC (Quit: Connection closed for inactivity)
09:03 🔗 j08nY has joined #archiveteam
10:21 🔗 Guest has joined #archiveteam
10:25 🔗 anhedonis has joined #archiveteam
11:03 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
11:25 🔗 ZexaronS has joined #archiveteam
11:47 🔗 Smiley has joined #archiveteam
11:47 🔗 SmileyG has quit IRC (Read error: Connection reset by peer)
11:48 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
12:26 🔗 j08nY has quit IRC (Quit: Leaving)
12:43 🔗 atomotic has joined #archiveteam
13:32 🔗 icedice has joined #archiveteam
13:35 🔗 icedice2 has joined #archiveteam
13:36 🔗 anhedonis has quit IRC (Quit: anhedonis)
13:36 🔗 ZexaronS has quit IRC (Leaving)
13:36 🔗 atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
13:37 🔗 icedice has quit IRC (Ping timeout: 250 seconds)
13:52 🔗 ZexaronS has joined #archiveteam
14:36 🔗 atomotic has joined #archiveteam
16:26 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
16:31 🔗 Atom has quit IRC (Read error: Operation timed out)
16:35 🔗 MMovie2 has quit IRC (Read error: Operation timed out)
16:45 🔗 MMovie has joined #archiveteam
17:36 🔗 ats has quit IRC (Quit: kernel!)
17:39 🔗 MMovie has quit IRC (Read error: Operation timed out)
17:40 🔗 ats has joined #archiveteam
17:50 🔗 MMovie has joined #archiveteam
18:26 🔗 powerKitt has joined #archiveteam
18:27 🔗 powerKitt https://blogs.msdn.microsoft.com/bharry/2017/03/31/shutting-down-codeplex/ Have we grabbed a copy of CodePlex yet?
18:27 🔗 JAA Not yet, but it's known: http://archiveteam.org/index.php?title=CodePlex
18:28 🔗 JAA Also, it doesn't look like they'll nuke the content anytime soon. For once, a company handles a service shutdown properly.
18:29 🔗 powerKitt Yeah, and the download tools for the archived projects sound pretty complete.
18:30 🔗 powerKitt We should probably still grab a copy of the archive, like we did with the Google Code archive.
18:30 🔗 powerKitt Read only archives don't last forever.
18:31 🔗 JAA Oh, definitely.
18:37 🔗 j08nY has joined #archiveteam
19:11 🔗 kittymeow Is there any cache of http://web.archive.org/web/19980509084931/http://members.visi.net/ member websites? http://members.visi.net/~fathom was from before 1999 I think so internet archive doesn't have it ( http://www.infotoday.com/online/mar02/OnTheNet.htm )
19:12 🔗 TheLovina has quit IRC (Read error: Operation timed out)
19:20 🔗 schbirid kittymeow: http://web.archive.org/web/*/http://members.visi.net/*
19:38 🔗 fie has quit IRC (Ping timeout: 370 seconds)
19:39 🔗 TheLovina has joined #archiveteam
19:46 🔗 powerKitt has quit IRC (Quit: Page closed)
19:57 🔗 icedice2 https://imgbox.com/
19:57 🔗 icedice2 "Dear User,
19:57 🔗 icedice2 We’d like to inform you that we will be shutting down services on June 30th, 2017. Please download and backup your files before this date.
19:57 🔗 icedice2 Sincerely,
19:57 🔗 icedice2 The Team "
20:02 🔗 RichardG has quit IRC (Ping timeout: 370 seconds)
20:12 🔗 RichardG has joined #archiveteam
20:20 🔗 fie has joined #archiveteam
20:47 🔗 schbirid has quit IRC (Quit: Leaving)
21:08 🔗 tapedrive Hey ArchiveTeam, as you may know, it's the UK general election today, and therefore I've compiled a list of all the candidates' Facebook, Twitter and campaign websites.
21:08 🔗 tapedrive There are a lot of URLs: 6098 total (2324 Twitter, 2303 Facebook and 1471 websites).
21:08 🔗 tapedrive It would be great if they were processed as soon as possible because I imagine tomorrow they'll all start changing and being taken down.
21:08 🔗 tapedrive I've got the URL lists in three separate files - what's the next step? ArchiveBot?
21:09 🔗 TheLovina has quit IRC (Read error: Operation timed out)
21:09 🔗 ndiddy has joined #archiveteam
21:13 🔗 xmc facebook tends to not work well in archivebot, we are generally banned for scraping
21:13 🔗 xmc but websites, yes
21:13 🔗 xmc that sounds like it took a ton of work!
21:13 🔗 tapedrive Here are the files: websites: https://pastebin.com/cayH7CKT :: twitters: https://pastebin.com/1Puq0M4W :: facebooks: https://pastebin.com/RuTw1Rbg
21:14 🔗 tapedrive Not really, there were a few websites I scraped to find all the links, but it only took an hour or so.
21:14 🔗 tapedrive I've never used archivebot, so can I just leave those files with you?
21:14 🔗 xmc i've got a lot of work that needs doing, but hopefully someone else here can take up the charge?
21:19 🔗 tapedrive Just looking at the archivebot readthedocs, is there a way to do recursive archiving from a list of urls?
21:19 🔗 xmc yes but it's liable to go off the rails unless you keep it on a tight leash
21:19 🔗 Kaz tapedrive: might want to check your lists, i see "http://www.Facebook/Lab4fav" in there
21:19 🔗 xmc you probably want !a < http://url --no-offsite-links --ignore-sets=blogs , at the very least
21:20 🔗 tapedrive Kaz: I've scraped these from many other sites, so there may be some incorrect ones.
21:21 🔗 tapedrive xmc: Is that url for the file with the url list, or each individual url?
21:21 🔗 xmc the file with the url list
21:22 🔗 xmc make sure it's a plain text file, one url per line
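
A minimal sketch of the list check discussed above, assuming the scraped lists are plain text with one URL per line. The function name `clean_url_list` and the `expected_domain` parameter are hypothetical and not part of ArchiveBot; the sketch only strips whitespace, drops blank lines and duplicates, and rejects entries whose scheme or host looks wrong (for example "http://www.Facebook/Lab4fav" in a list that should contain only facebook.com URLs).

```python
from urllib.parse import urlsplit


def clean_url_list(lines, expected_domain=None):
    """Return (good, rejected) URL lists from raw lines, one URL per line.

    expected_domain is an optional, hypothetical filter: when given, only
    hosts equal to it or ending in "." + expected_domain are accepted.
    """
    good, rejected, seen = [], [], set()
    for raw in lines:
        url = raw.strip()
        if not url:
            continue  # skip blank lines
        try:
            parts = urlsplit(url)
            host = parts.hostname or ""  # hostname is lowercased by urlsplit
        except ValueError:
            rejected.append(url)
            continue
        # Structural check: http(s) scheme and a host with at least one dot.
        ok = parts.scheme in ("http", "https") and "." in host
        if ok and expected_domain is not None:
            ok = host == expected_domain or host.endswith("." + expected_domain)
        if not ok:
            rejected.append(url)
        elif url not in seen:
            seen.add(url)  # keep the first occurrence only
            good.append(url)
    return good, rejected


if __name__ == "__main__":
    sample = [
        "https://www.facebook.com/example\n",
        "http://www.Facebook/Lab4fav\n",
        "\n",
    ]
    good, rejected = clean_url_list(sample, expected_domain="facebook.com")
    print("\n".join(good))  # cleaned list, ready to save as a plain text file
    print("rejected:", rejected)
```

The rejected entries would still need a manual look, since a check like this cannot tell whether a well-formed URL actually points at the right candidate.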
21:47 🔗 ZexaronS has quit IRC (Leaving)
21:47 🔗 MrRadar With that many URLs it'd probably be a good idea to split it into multiple jobs
21:47 🔗 MrRadar Archivebot instances aren't always very stable
21:50 🔗 tapedrive Would you recommend aborting now, and splitting it, or keeping it going and monitoring it?
21:51 🔗 MrRadar Eh, keep monitoring it
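
A rough sketch of the splitting MrRadar suggests, assuming the URL list is already loaded as a Python list. The helper name `split_into_jobs` and the choice of four jobs are illustrative assumptions, not anything ArchiveBot prescribes; each resulting chunk would be saved as its own plain text file, one URL per line, and submitted as a separate job.

```python
def split_into_jobs(urls, jobs):
    """Split urls into `jobs` consecutive chunks of near-equal size."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    size, extra = divmod(len(urls), jobs)
    chunks, start = [], 0
    for i in range(jobs):
        # The first `extra` chunks take one additional URL each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(urls[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    # Placeholder URLs standing in for the 1471 campaign websites.
    urls = [f"http://example.org/{n}" for n in range(1471)]
    for chunk in split_into_jobs(urls, 4):
        print(len(chunk))
```

Smaller jobs mean that one unstable instance loses only part of the list rather than the whole run.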
22:38 🔗 Jonison has quit IRC (Read error: Connection reset by peer)
22:42 🔗 nertzy has joined #archiveteam
22:53 🔗 icedice2 has quit IRC (Quit: Leaving)
23:05 🔗 tapedrive Would it be acceptable to add more workers to the UK election candidates websites, seeing as they're all hosted on different domains, and time is critical with it?
23:06 🔗 xmc more workers how? if you want to split it into multiple jobs, that would be sensible
23:07 🔗 tapedrive With the concurrency command
23:07 🔗 xmc oh! yeah go for it
23:42 🔗 whydomain Have we archived the main UK party websites? If not, they'd probably be pretty important.
23:42 🔗 whydomain However, they have a large number of pages.
23:50 🔗 Atom has joined #archiveteam
