00:06 -!- jmad980 has quit IRC (Ping timeout: 369 seconds)
00:08 <Sketchcow> Still uploading.
00:08 <Sketchcow> We're past 675
00:15 -!- RichardG has joined #archiveteam
00:25 -!- megaminxw has joined #archiveteam
00:28 -!- jmad980 has joined #archiveteam
00:42 -!- JetBalsa has joined #archiveteam
01:09 -!- MMovie2 has joined #archiveteam
01:11 -!- MMovie has quit IRC (Read error: Operation timed out)
01:15 -!- Ravenloft has quit IRC (Read error: Connection reset by peer)
01:47 -!- JesseW has quit IRC (Leaving.)
01:49 -!- JesseW has joined #archiveteam
01:54 <Start> so i gathered a list of all the items in jux's s3 bucket (http://user-zip-files.s3.amazonaws.com) and put it into archivebot
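[A listing like Start describes could be gathered from S3's anonymous bucket-listing API, which returns pages of XML. This is a sketch only: the response below is a made-up page, and a real crawl would repeatedly GET `http://user-zip-files.s3.amazonaws.com/?marker=<last key>` until `IsTruncated` comes back false, then hand the object URLs to ArchiveBot.]

```python
import xml.etree.ElementTree as ET

# S3's list-objects responses live in this XML namespace.
NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

def parse_bucket_listing(xml_text):
    """Extract object keys and the truncation flag from one page of an
    S3 GET-bucket (list objects) response."""
    root = ET.fromstring(xml_text)
    keys = [el.text for el in root.iter(NS + "Key")]
    truncated = root.findtext(NS + "IsTruncated") == "true"
    return keys, truncated

# Minimal made-up response page standing in for a real network fetch.
page = """<?xml version="1.0"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>user-zip-files</Name>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>alice.zip</Key></Contents>
  <Contents><Key>bob.zip</Key></Contents>
</ListBucketResult>"""

keys, truncated = parse_bucket_listing(page)
print(keys, truncated)  # ['alice.zip', 'bob.zip'] False
```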
01:54 <Start> jux is being saved over a year after it died
01:55 <JesseW> what is jux?
01:56 <Start> http://www.archiveteam.org/index.php?title=Jux
01:56 <Start> it was a blogging website
01:58 <JesseW> nice. What's the format of the zip files?
01:59 <Start> it appears that each zip file contains all the images on a user's blog
02:00 <Start> most of the posts were grabbed before it shut down: https://archive.org/details/jux_posts_to_nov_24
02:01 <JesseW> nice of them to keep paying for AWS for another year. :-)
02:05 -!- JesseW has quit IRC (Leaving.)
02:12 -!- rctbeast has joined #archiveteam
02:16 -!- schbirid2 has joined #archiveteam
02:19 -!- schbirid has quit IRC (Read error: Operation timed out)
02:34 -!- JesseW has joined #archiveteam
02:37 -!- philpem has quit IRC (Ping timeout: 260 seconds)
02:41 -!- Ravenloft has joined #archiveteam
02:49 -!- Emcy has quit IRC (Ping timeout: 252 seconds)
03:06 -!- ohhdemgir has quit IRC (Remote host closed the connection)
03:11 -!- ohhdemgir has joined #archiveteam
03:29 -!- megaminxw has quit IRC (Quit: Leaving.)
03:57 -!- acridAxid has joined #archiveteam
04:04 -!- dashcloud has quit IRC (Remote host closed the connection)
04:06 -!- dashcloud has joined #archiveteam
04:07 -!- dashcloud has quit IRC (Remote host closed the connection)
04:09 -!- dashcloud has joined #archiveteam
04:59 -!- JetBalsa has quit IRC (Read error: Connection reset by peer)
05:11 <xmc> Start: nice!
05:22 -!- rctbeast has quit IRC (Ping timeout: 240 seconds)
05:26 -!- acridAxid has quit IRC (marauder)
05:29 -!- acridAxid has joined #archiveteam
05:34 -!- VADemon has joined #archiveteam
05:42 <Sketchcow> Great work, Start.
05:59 -!- megaminxw has joined #archiveteam
07:01 -!- WinterFox has joined #archiveteam
07:30 -!- FAMAS has joined #archiveteam
07:32 <FAMAS> greetings to all, as this group is dedicated for purposes of data archival, this user is posting requests for volunteers who will participate in actions of video screenshotting contents displayed via digital devices
08:16 -!- FAMAS has quit IRC (Quit: http://chat.efnet.org (EOF))
08:32 -!- REiN^ has joined #archiveteam
08:33 -!- JesseW has quit IRC (Read error: Operation timed out)
08:39 <xmc> maybe, if you can interest someone in your project
08:40 -!- Ghost_of_ has joined #archiveteam
09:12 -!- BlueMaxim has quit IRC (Quit: Leaving)
09:13 -!- philpem has joined #archiveteam
09:22 -!- vOYtEC_ has quit IRC (Read error: Connection reset by peer)
09:36 -!- scyther has joined #archiveteam
10:13 -!- vOYtEC has joined #archiveteam
11:53 -!- Emcy has joined #archiveteam
12:02 -!- VADemon_ has joined #archiveteam
12:07 -!- VADemon has quit IRC (hub.se irc.efnet.pl)
12:07 -!- dashcloud has quit IRC (hub.se irc.efnet.pl)
12:07 -!- godane has quit IRC (hub.se irc.efnet.pl)
12:18 -!- lytv has quit IRC (Read error: Connection reset by peer)
12:21 -!- lytv has joined #archiveteam
12:24 -!- SimpBrain has quit IRC (Read error: Operation timed out)
12:27 -!- dashcloud has joined #archiveteam
12:32 -!- SimpBrain has joined #archiveteam
12:39 -!- godane has joined #archiveteam
12:47 -!- godane has quit IRC (Excess Flood)
12:49 -!- godane has joined #archiveteam
13:04 -!- zino_ has joined #archiveteam
13:08 <zino_> Hmm. Just realized the combined potential disk sizes of the warrior images for VirtualBox is 68GiB. That seems excessive. Mine has ballooned to 62G so far.
13:12 -!- nertzy2 has joined #archiveteam
13:30 <phuzion> zino_: what seems excessive about it? You need 8GB for the system partition, and 60GB for the data partition.
13:30 <phuzion> ps, warrior questions generally go in #warrior
13:34 -!- nertzy2 has quit IRC (Quit: This computer has gone to sleep)
13:36 -!- megaminxw has quit IRC (Quit: Leaving.)
14:19 -!- HarryCros has joined #archiveteam
14:19 -!- wp494_ has joined #archiveteam
14:20 -!- Emcy_ has joined #archiveteam
14:20 -!- RichardG_ has joined #archiveteam
14:20 -!- WinterFox has quit IRC (Remote host closed the connection)
14:21 -!- Microguru has quit IRC (Ping timeout: 250 seconds)
14:21 -!- wp494 has quit IRC (Ping timeout: 250 seconds)
14:21 -!- lytv has quit IRC (Ping timeout: 250 seconds)
14:21 -!- RichardG has quit IRC (Ping timeout: 250 seconds)
14:21 -!- Gfy has quit IRC (Ping timeout: 250 seconds)
14:21 -!- alard has quit IRC (Ping timeout: 250 seconds)
14:21 -!- diacope has quit IRC (Ping timeout: 250 seconds)
14:22 -!- Emcy has quit IRC (Ping timeout: 250 seconds)
14:22 -!- HCross has quit IRC (Ping timeout: 250 seconds)
14:22 -!- superkuh_ has quit IRC (Ping timeout: 250 seconds)
14:22 -!- lytv has joined #archiveteam
14:28 -!- Gfy has joined #archiveteam
14:32 -!- alard has joined #archiveteam
14:32 -!- swebb sets mode: +o alard
14:35 -!- superkuh_ has joined #archiveteam
14:36 -!- Microguru has joined #archiveteam
14:49 -!- Ghost_of_ has quit IRC (Quit: Leaving)
15:29 <zino_> phuzion: Excessive as in there is no need for it. In what situation should a casual user cache 60GiB of data before uploading?
15:31 -!- RichardG_ is now known as RichardG
15:34 <dashcloud> zino_: probably because 60 GB is enough to cover any kind of project, rather than needing to have separate images for different kinds of projects
15:35 <dashcloud> for text or image projects, 60 GB is very likely overkill and you'll never get there, whereas for video projects, 60 GB is large enough to keep you from constantly running out of space
15:36 <HarryCros> the issue with large sizes like that is that people with slow uploads like me will eventually have a large queue of files to upload
15:38 <zino_> I consider it a bigger problem that a casual contributor suddenly finds his 240G C: SSD maxed out and deletes the whole thing. That will eventually happen even if he never caches more than a gig at any time.
15:39 <zino_> (This is not an actual problem for me. I'm mostly bike-shedding.)
15:39 <SmileyG> I don't think anyone is doing this casually
15:39 <SmileyG> it was discussed previously
15:39 <SmileyG> tho I wonder why it's expanding to use the entire 60GB from the off
15:39 <SmileyG> I thought it expanded as needed.
15:41 <zino_> SmileyG: In a perfect world the same file structure/blocks would be used every time, but in practice you eventually write to most blocks. Deleting data from blocks will not shrink the image.
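[zino_'s point — a dynamically-allocated image grows when the guest first writes a block, but deleting guest data never returns space to the host — can be illustrated with a toy model. The block size and per-task write counts below are made up; VirtualBox's real VDI/VMDK formats are more involved.]

```python
# Toy model of a dynamically-allocated disk image: host-side size is
# driven by the set of guest blocks that have ever been written.
class DynamicImage:
    def __init__(self, block_size=1 << 20):
        self.block_size = block_size   # 1 MiB per block (arbitrary)
        self.allocated = set()         # guest blocks backed by the image file

    def write(self, block_no):
        """Guest writes a block: the image allocates space on first touch."""
        self.allocated.add(block_no)

    def delete(self, block_no):
        """Guest deletes data: the filesystem marks blocks free, but the
        hypervisor cannot see that, so the image file keeps the space."""
        pass  # intentionally a no-op

    def image_size(self):
        return len(self.allocated) * self.block_size

img = DynamicImage()
for task in range(3):                  # three successive warrior tasks
    for b in range(task * 100, task * 100 + 100):
        img.write(b)                   # each task happens to touch fresh blocks
    for b in range(task * 100, task * 100 + 100):
        img.delete(b)                  # ...then uploads and deletes its data

print(img.image_size() // (1 << 20), "MiB")  # 300 MiB, though guest usage is ~0
```

This is why compacting requires a host-side pass over the image rather than just deleting files inside the guest.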
15:41 <SmileyG> yah
15:42 <SmileyG> oh right, you've been running it awhile then?
15:42 <SmileyG> there is an option to 'shrink' it again btw
15:42 <zino_> It's probably been running for 2 years or so.
15:42 <zino_> VirtualBox's CLI interface has something for that I think. Will check when I get some time.
15:52 <zino_> Meh. Waiting for a compile now anyway. I'll do it and pester #warrior with the result.
15:54 -!- Atom__ has joined #archiveteam
16:16 <zino_> Nope. Giving up on that. The warrior is a vmdk, not a vdi, so VirtualBox's tools can't shrink it. Would involve converting it or installing VMware tools.
16:17 -!- Ravenloft has quit IRC (Ping timeout: 606 seconds)
17:28 -!- JesseW has joined #archiveteam
17:56 <DFJustin> if you shut the warrior down properly such that no tasks are running, you can just delete the data partition file and recreate an empty one
18:14 -!- atomotic has joined #archiveteam
18:23 -!- atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
18:39 -!- scyther has quit IRC (Read error: Connection reset by peer)
19:04 -!- diacope has joined #archiveteam
19:29 -!- dserodio has joined #archiveteam
19:46 -!- Microguru has quit IRC (Read error: Connection reset by peer)
19:59 <Sketchcow> Hey, News grabbing people - I was informed about this project that Archive interacts with: http://gdeltproject.org/
20:03 <Sketchcow> Please look at it, make sure we're doing something different.
20:07 <arkiver> Yes, we're doing something different
20:08 <arkiver> The GDELT project misses a lot of non-English websites
20:08 <arkiver> as well as more local news sites
20:09 <arkiver> The GDELT grab also gets videos if a regex for them is provided
20:09 <arkiver> We can also control the news grabs better than GDELT
20:10 <arkiver> As far as I know GDELT does the front page and the subpages (not sure if they always do subpages) and scrapes new articles they find
20:10 <arkiver> We can discover items through better manual settings and URLs, like RSS feeds, etc.
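[RSS-based discovery as arkiver describes can be sketched like this: parse a site's feed and collect one article URL per `<item>`. The feed XML here is invented for illustration; real NewsGrabber service definitions carry their own feed URLs.]

```python
import xml.etree.ElementTree as ET

def discover_article_urls(rss_xml):
    """Pull article URLs out of an RSS 2.0 feed: one <link> per <item>."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("link") for item in root.iter("item")]

# A made-up feed standing in for a real news site's RSS endpoint.
sample_feed = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example News</title>
  <item><title>Story one</title><link>http://news.example/1</link></item>
  <item><title>Story two</title><link>http://news.example/2</link></item>
</channel></rss>"""

print(discover_article_urls(sample_feed))
# ['http://news.example/1', 'http://news.example/2']
```

Polling feeds like this surfaces new articles faster than re-crawling front pages, which is the completeness edge being claimed over GDELT.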
20:10
🔗
|
arkiver |
This makes our crawls probably a bit more complete |
20:11
🔗
|
arkiver |
I'm not sure about this but I think GDELT covers 2 afghanistan newswebsites |
20:11
🔗
|
arkiver |
NewsGrabber currently covers more then 20 |
20:11
🔗
|
arkiver |
<arkiver>The GDELT grab also get's videos if a regex for them is provided |
20:11
🔗
|
|
dserodio has quit IRC (Read error: Operation timed out) |
20:12
🔗
|
arkiver |
^ for that I meant to say NewsGrabber grab also get's videos if a regex for them is provided |
20:12
🔗
|
arkiver |
But we're currently only grabbing videos if the website is supported by youtube-dl |
20:12
🔗
|
arkiver |
So that's why NewsGrabber is different from the GDELT grab |
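[The per-service video regex arkiver mentions might work roughly like this: URLs matching the pattern get routed to the video grabber (youtube-dl), the rest to the normal page grab. The service definition below is hypothetical; real ones live in the NewsGrabber repo's services/ directory.]

```python
import re

# Hypothetical service definition: URLs matching `videoregex` would be
# handed to youtube-dl; everything else goes through the regular grab.
videoregex = re.compile(r"^https?://news\.example/video/\d+")

discovered = [
    "http://news.example/video/123",
    "http://news.example/article/456",
]

video_urls = [u for u in discovered if videoregex.match(u)]
print(video_urls)  # only the /video/ URL matches
```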
20:21
🔗
|
JesseW |
Somebody, please add any of the above that isn't already on http://archiveteam.org/index.php?title=NewsGrabber |
20:41
🔗
|
|
dserodio has joined #archiveteam |
20:56
🔗
|
HarryCros |
and if youtube-dl has decided to behave |
21:09
🔗
|
|
WinterFox has joined #archiveteam |
21:10
🔗
|
|
WinterFox has quit IRC (Client Quit) |
21:10
🔗
|
|
WinterFox has joined #archiveteam |
21:19
🔗
|
|
SilSte has quit IRC (Remote host closed the connection) |
21:21
🔗
|
|
schbirid2 has quit IRC (Quit: Leaving) |
21:47
🔗
|
|
HarryCros is now known as HCross |
21:52
🔗
|
|
JesseW has quit IRC (Read error: Operation timed out) |
22:25
🔗
|
|
acridAxid has quit IRC (marauder) |
22:28
🔗
|
|
acridAxid has joined #archiveteam |
23:00
🔗
|
|
megaminxw has joined #archiveteam |
23:08
🔗
|
|
nertzy2 has joined #archiveteam |
23:09
🔗
|
|
Emcy_ has quit IRC (Ping timeout: 252 seconds) |
23:10
🔗
|
|
nertzy2 has quit IRC (Client Quit) |
23:12
🔗
|
|
Emcy has joined #archiveteam |
23:22
🔗
|
|
JesseW has joined #archiveteam |
23:26 <arkiver> The NewsGrabber project now covers more than 30 sites from the UAE!
23:27 <arkiver> We now fully cover the UAE
23:27 <JesseW> awesome!
23:27 <arkiver> Join NewsGrabber: #newsgrabber
23:28 <JesseW> Is there a nicely formatted, automatically updated list of sites included in NewsGrabber written yet?
23:28 <arkiver> https://github.com/ArchiveTeam/NewsGrabber/tree/master/services
23:28 <arkiver> that looks nice I think
23:34 <arkiver> The bot of NewsGrabber, newsbuddy, gives updates on newly found links and uploads, and can be followed here: #newsgrabberbot
23:35 <JesseW> Hm. I think I'll write something more like what I was thinking of, then.
23:36 <arkiver> what are you thinking of?
23:36 <JesseW> I think it'll be easier to write it than explain it. :-)
23:36 <ersi> making an outdated list
23:36 <ersi> :-)
23:36 <JesseW> ersi: :-P
23:36
🔗
|
HCross |
thing with github is that it updates all the time, and is the list the server feeds off |
23:40
🔗
|
JesseW |
I think what I was thinking of is the writehtmllist function in main.py |
23:41
🔗
|
HCross |
So http://newsgrabber.harrycross.me |
23:42
🔗
|
JesseW |
yes, but apparently with one row per file in services/ and different columns. |