#archiveteam 2017-06-08,Thu


Time	Nickname	Message
00:09 ^🔗		BlueMaxim has joined #archiveteam
01:38 ^🔗		j08nY has quit IRC (Remote host closed the connection)
01:53 ^🔗		nertzy has quit IRC (This computer has gone to sleep)
02:12 ^🔗		ZexaronS has quit IRC (Leaving)
02:20 ^🔗		fie has joined #archiveteam
03:00 ^🔗		Aranje has quit IRC (Three sheets to the wind)
03:17 ^🔗		ndiddy has quit IRC ()
04:58 ^🔗		Sk1d has quit IRC (Ping timeout: 250 seconds)
05:05 ^🔗		Sk1d has joined #archiveteam
05:17 ^🔗		zyphlar has joined #archiveteam
05:35 ^🔗		zenguy has quit IRC (Read error: Operation timed out)
05:42 ^🔗		zenguy has joined #archiveteam
05:45 ^🔗		zenguy has quit IRC (Read error: Operation timed out)
05:49 ^🔗		zenguy has joined #archiveteam
07:08 ^🔗		schbirid has joined #archiveteam
08:01 ^🔗		Jonison has joined #archiveteam
08:02 ^🔗		atomotic has joined #archiveteam
08:47 ^🔗		zyphlar has quit IRC (Quit: Connection closed for inactivity)
09:03 ^🔗		j08nY has joined #archiveteam
10:21 ^🔗		Guest has joined #archiveteam
10:25 ^🔗		anhedonis has joined #archiveteam
11:03 ^🔗		atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
11:25 ^🔗		ZexaronS has joined #archiveteam
11:47 ^🔗		Smiley has joined #archiveteam
11:47 ^🔗		SmileyG has quit IRC (Read error: Connection reset by peer)
11:48 ^🔗		BlueMaxim has quit IRC (Read error: Operation timed out)
12:26 ^🔗		j08nY has quit IRC (Quit: Leaving)
12:43 ^🔗		atomotic has joined #archiveteam
13:32 ^🔗		icedice has joined #archiveteam
13:35 ^🔗		icedice2 has joined #archiveteam
13:36 ^🔗		anhedonis has quit IRC (Quit: anhedonis)
13:36 ^🔗		ZexaronS has quit IRC (Leaving)
13:36 ^🔗		atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
13:37 ^🔗		icedice has quit IRC (Ping timeout: 250 seconds)
13:52 ^🔗		ZexaronS has joined #archiveteam
14:36 ^🔗		atomotic has joined #archiveteam
16:26 ^🔗		atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
16:31 ^🔗		Atom has quit IRC (Read error: Operation timed out)
16:35 ^🔗		MMovie2 has quit IRC (Read error: Operation timed out)
16:45 ^🔗		MMovie has joined #archiveteam
17:36 ^🔗		ats has quit IRC (Quit: kernel!)
17:39 ^🔗		MMovie has quit IRC (Read error: Operation timed out)
17:40 ^🔗		ats has joined #archiveteam
17:50 ^🔗		MMovie has joined #archiveteam
18:26 ^🔗		powerKitt has joined #archiveteam
18:27 ^🔗	powerKitt	https://blogs.msdn.microsoft.com/bharry/2017/03/31/shutting-down-codeplex/ Have we grabbed a copy of CodePlex yet?
18:27 ^🔗	JAA	Not yet, but it's known: http://archiveteam.org/index.php?title=CodePlex
18:28 ^🔗	JAA	Also, it doesn't look like they'll nuke the content anytime soon. For once, a company handles a service shutdown properly.
18:29 ^🔗	powerKitt	Yeah, and the download tools for the archived projects sound pretty complete.
18:30 ^🔗	powerKitt	We should probably still grab a copy of the archive, like we did with the Google Code archive.
18:30 ^🔗	powerKitt	Read only archives don't last forever.
18:31 ^🔗	JAA	Oh, definitely.
18:37 ^🔗		j08nY has joined #archiveteam
19:11 ^🔗	kittymeow	Is there any cache of http://web.archive.org/web/19980509084931/http://members.visi.net/ member websites? http://members.visi.net/~fathom was from before 1999 I think so internet archive doesn't have it ( http://www.infotoday.com/online/mar02/OnTheNet.htm )
19:12 ^🔗		TheLovina has quit IRC (Read error: Operation timed out)
19:20 ^🔗	schbirid	kittymeow: http://web.archive.org/web//http://members.visi.net/
19:38 ^🔗		fie has quit IRC (Ping timeout: 370 seconds)
19:39 ^🔗		TheLovina has joined #archiveteam
19:46 ^🔗		powerKitt has quit IRC (Quit: Page closed)
19:57 ^🔗	icedice2	https://imgbox.com/
19:57 ^🔗	icedice2	"Dear User,
19:57 ^🔗	icedice2	We’d like to inform you that we will be shutting down services on June 30th, 2017. Please download and backup your files before this date.
19:57 ^🔗	icedice2	Sincerely,
19:57 ^🔗	icedice2	The Team "
20:02 ^🔗		RichardG has quit IRC (Ping timeout: 370 seconds)
20:12 ^🔗		RichardG has joined #archiveteam
20:20 ^🔗		fie has joined #archiveteam
20:47 ^🔗		schbirid has quit IRC (Quit: Leaving)
21:08 ^🔗	tapedrive	Hey ArchiveTeam, as you may know, it's the UK general election today, and therefore I've compiled a list of all the candidates' facebook, twitter and campaign websites.
21:08 ^🔗	tapedrive	There's a lot of URLs - 6098 total (2324 twitter, 2303 facebook and 1471 websites).
21:08 ^🔗	tapedrive	It would be great if they were processed as soon as possible because I imagine tomorrow they'll all start changing and being taken down.
21:08 ^🔗	tapedrive	I've got the URL lists in three separate files - what's the next step? ArchiveBot?
21:09 ^🔗		TheLovina has quit IRC (Read error: Operation timed out)
21:09 ^🔗		ndiddy has joined #archiveteam
21:13 ^🔗	xmc	facebook tends to not work well in archivebot, we are generally banned for scraping
21:13 ^🔗	xmc	but websites, yes
21:13 ^🔗	xmc	that sounds like it took a ton of work!
21:13 ^🔗	tapedrive	Here are the files: websites: https://pastebin.com/cayH7CKT :: twitters: https://pastebin.com/1Puq0M4W :: facebooks: https://pastebin.com/RuTw1Rbg
21:14 ^🔗	tapedrive	Not really, there were a few websites I scraped to find all the links, but it only took an hour or so.
21:14 ^🔗	tapedrive	I've never used archivebot, so can I just leave those files with you?
21:14 ^🔗	xmc	i've got a lot of work that needs doing, but hopefully someone else here can take up the charge?
21:19 ^🔗	tapedrive	Just looking at the archivebot readthedocs, is there a way to do recursive archiving from a list of urls?
21:19 ^🔗	xmc	yes but it's liable to go off the rails unless you keep it on a tight leash
21:19 ^🔗	Kaz	tapedrive: might want to check your lists, i see "http://www.Facebook/Lab4fav" in there
21:19 ^🔗	xmc	you probably want !a < http://url --no-offsite-links --ignore-sets=blogs , at the very least
21:20 ^🔗	tapedrive	Kaz: I've scraped these from many other sites, so there may be some incorrect ones.
21:21 ^🔗	tapedrive	xmc: Is that url for the file with the url list, or each individual url?
21:21 ^🔗	xmc	the file with the url list
21:22 ^🔗	xmc	make sure it's a plain text file, one url per line
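The two points above (Kaz spotting "http://www.Facebook/Lab4fav" in the list, and xmc asking for a plain text file with one URL per line) could be checked before submitting with something like the sketch below. It is only an illustration, not part of ArchiveBot: the function names and the "host must still contain a dot after dropping a leading www." rule are assumptions made here, and the heuristic will miss other kinds of bad entries.

```python
from urllib.parse import urlsplit


def is_plausible_url(url: str) -> bool:
    """Rough sanity check for one scraped URL (hypothetical heuristic)."""
    if any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname or ""
    except ValueError:
        return False
    if parts.scheme not in ("http", "https"):
        return False
    # Drop a leading "www." and require a dot in what is left, so that
    # "www.facebook" (missing ".com") is rejected.
    bare = host[4:] if host.startswith("www.") else host
    return "." in bare.strip(".")


def clean_url_list(text: str):
    """Return (good, bad) lists: one URL per line, blanks and duplicates dropped."""
    good, bad, seen = [], [], set()
    for line in text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        (good if is_plausible_url(url) else bad).append(url)
    return good, bad
```

Writing "\n".join(good) to a file gives the one-URL-per-line format xmc describes; the entries in bad would still need fixing by hand.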
21:47 ^🔗		ZexaronS has quit IRC (Leaving)
21:47 ^🔗	MrRadar	With that many URLs it'd probably be a good idea to split it into multiple jobs
21:47 ^🔗	MrRadar	Archivebot instances aren't always very stable
21:50 ^🔗	tapedrive	Would you recommend aborting now, and splitting it, or keeping it going and monitoring it?
21:51 ^🔗	MrRadar	Eh, keep monitoring it
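MrRadar's suggestion of splitting a long list into multiple jobs could be sketched as below. The function name and the chunk size of 500 are assumptions for illustration; nothing in the log says how large each job should be.

```python
def split_into_jobs(urls, chunk_size):
    """Split a list of URLs into consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


if __name__ == "__main__":
    # Placeholder URLs standing in for the 1471 candidate websites.
    urls = [f"http://example.com/{n}" for n in range(1471)]
    for index, chunk in enumerate(split_into_jobs(urls, 500), start=1):
        print(f"job {index}: {len(chunk)} URLs")
```

Each chunk would then be saved as its own plain text file and submitted as a separate job, so one unstable instance does not take the whole list down with it.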
22:38 ^🔗		Jonison has quit IRC (Read error: Connection reset by peer)
22:42 ^🔗		nertzy has joined #archiveteam
22:53 ^🔗		icedice2 has quit IRC (Quit: Leaving)
23:05 ^🔗	tapedrive	Would it be acceptable to add more workers to the UK election candidates websites, seeing as they're all hosted on different domains, and time is critical with it?
23:06 ^🔗	xmc	more workers how? if you want to split it into multiple jobs, that would be sensible
23:07 ^🔗	tapedrive	With the concurrency command
23:07 ^🔗	xmc	oh! yeah go for it
23:42 ^🔗	whydomain	Have we archived the main UK party websites? If not, they'd probably be pretty important.
23:42 ^🔗	whydomain	However, they have a large number of pages.
23:50 ^🔗		Atom has joined #archiveteam
