#archiveteam-bs 2013-03-23,Sat

Time	Nickname	Message
00:52 ^🔗	godane	Kevin Pereira's Imaginary Friend: https://archive.org/details/g4tv.com-video39529
01:11 ^🔗	godane	my cpu temp is at 64.5 C
01:12 ^🔗	godane	never seen that before compiling firefox
01:50 ^🔗	joepie91	godane: well hey, it's called firefox
01:50 ^🔗	joepie91	:P
01:59 ^🔗	godane	its made to kill cpus then
04:22 ^🔗	godane	so i found a very old websites about laser discs and stuff
04:22 ^🔗	SketchCow	Grab that shit
04:23 ^🔗	godane	i'm mirroring it because i want to see if i can get past the 3585 files in wayback machine
04:24 ^🔗	godane	this is the website: http://www.blam1.com/
04:24 ^🔗	godane	best to have a stand alone archive of it
04:27 ^🔗	godane	it has bumpers of DiscoVision
04:27 ^🔗	godane	in real media format
04:28 ^🔗	godane	i think this was a database of laser discs
04:28 ^🔗	godane	with reviews
04:29 ^🔗	chronomex	niiice
04:29 ^🔗	chronomex	lddb.com is another I think
04:32 ^🔗	godane	looks like lddb.com was japanld.free.fr
04:33 ^🔗	godane	http://web.archive.org/web/20060114075257/http://japanld.free.fr/
04:33 ^🔗	godane	that one doesn't exist anymore
04:34 ^🔗	godane	must have been a old redirect since it had lddb.com in the page
04:34 ^🔗	godane	the best part of old websites is that everything is on one domain
04:35 ^🔗	godane	not freaking youtube redirect
04:35 ^🔗	godane	no weird comments hosted on other sites
05:05 ^🔗	godane	its over 70mb now
05:06 ^🔗	godane	also i have passed 3585 files in wayback machine
05:06 ^🔗	godane	i'm at 4746 now
05:25 ^🔗	godane	ok so its done
05:25 ^🔗	godane	5628 files in warc.gz
05:37 ^🔗	godane	uploaded: https://archive.org/details/www.blam1.com-20130323
05:43 ^🔗	godane	its called the Blem Entertainment Group
05:49 ^🔗	godane	so i found that discovision.com website is still alive
05:49 ^🔗	godane	grabbing it
05:49 ^🔗	godane	lets see if it beats the 301 total urls in wayback machine
05:49 ^🔗	DFJustin	typo, should be Blam Entertainment Group
05:51 ^🔗	godane	fixed
06:09 ^🔗	godane	it only has 83 files
06:09 ^🔗	godane	discovision.com that is
06:16 ^🔗	godane	also know that blamld.com and blam1.com are the same
06:17 ^🔗	godane	from what i could tell they bought blam1.com
06:17 ^🔗	godane	maybe to stop a porn site or something
06:18 ^🔗	godane	anyways even wayback doesn't have all the files under blamld.com host
06:29 ^🔗	godane	uploaded: https://archive.org/details/www.discovision.com-20130323
06:34 ^🔗	godane	i'm now grabbing cedmagic.com
06:37 ^🔗	godane	its about this: http://en.wikipedia.org/wiki/Capacitance_Electronic_Disc
06:38 ^🔗	chronomex	ceds are cool
09:43 ^🔗	GLaDOS	kennethre: you around?
09:44 ^🔗	GLaDOS	Nevermind!
12:09 ^🔗	zenpho	hi there!
12:09 ^🔗	soultcer	Konnichiwa
12:12 ^🔗	zenpho	i'd like to dip into some of the archived bt internet dialup (http://archive.org/details/archiveteam-btinternet) stuff
12:13 ^🔗	zenpho	i've obtained hanzo warc-tools, grepped thru the CDX files for stuff i'd like to get, and now I think I have some byte offsets for specific spots in specific warc files with the files I'd like to dip into
12:14 ^🔗	zenpho	i don't fancy downloading the entire eleventy-billion gigabytes of warc files see ;o)
12:14 ^🔗	soultcer	I think the IA servers support range requests
12:16 ^🔗	zenpho	I'm struggling to see how to download specific parts of warc files - on a semi-automated basis - so I can unpack the files I'd like to see to my disk
12:17 ^🔗	zenpho	I'm very new to the warc format and tools for working with it - do you guys know if there's a part of warc-tools (or some other nifty warc-friendly tool) which will do what I want?
12:19 ^🔗	soultcer	I don't know about warc-tools, but basically you need to make a http request (be it with python's urllib or with curl) that tells the server to only return a specific range of bytes
12:20 ^🔗	soultcer	curl -L -r 2000-5000 http://archive.org/download/archiveteam-btinternet-u-z/btinternet-u-z.megawarc.warc.gz > extract.warc.gz will fetch only bytes 2000-5000 from the given file
12:20 ^🔗	zenpho	I think I can use wget or curl to specify a specific byte range to download, but I have a hunch I'll end up with just some data with no context, certainly not a valid warc which I can parse and extract data from?
12:21 ^🔗	zenpho	ah. whoops - I was typing whilst you were answering. ;o)
12:21 ^🔗	soultcer	A warc.gz file is basically a succession of warc records each individually gzipped, and then concatenated
12:21 ^🔗	soultcer	As long as you start at the correct offset, it should work
12:21 ^🔗	zenpho	oho, awesome sauce!
12:22 ^🔗	zenpho	i'll give this a go and report back - thanks soultcer!
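
A minimal Python sketch of what soultcer describes above: request only a byte range, then decompress the single gzip member that starts at that offset. The function names are made up for illustration, and it assumes the offset and length come from the CDX data and that the server honours the Range header; nothing here is part of warc-tools.

```python
import urllib.request
import zlib


def range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    return "bytes={}-{}".format(offset, offset + length - 1)


def fetch_range(url, offset, length):
    """Fetch only the requested bytes (assumes the server supports range requests)."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def first_gzip_member(data):
    """Decompress the first gzip member in `data`; bytes after it are ignored."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decompressor.decompress(data)
```

Something like first_gzip_member(fetch_range(url, offset, length)) should then give back one raw WARC record, provided the offset really is the start of a record. If the length is unknown, fetching a generous range still works, because decompression stops at the end of the first member.
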
13:45 ^🔗	Cameron_D	Here, have some light (20k words) reading of tech support stories http://www.reddit.com/user/jon6/submitted/
13:45 ^🔗	Cameron_D	There is great rage to be had
13:46 ^🔗	Cameron_D	(despite the naming similarities it is different to BOFH)
13:51 ^🔗	nwh	similarly r/talesfromtechsupport
13:52 ^🔗	nwh	and r/cablefail
13:52 ^🔗	Cameron_D	well, they are all submitted there, his user page is just a nice portal to list them all
13:57 ^🔗	godane	hey everyone
13:57 ^🔗	godane	i had to restart my cedmagic.com download
13:58 ^🔗	godane	luckily i was only at 12mb and i just passed that without any long wait
13:58 ^🔗	godane	my wifi dropped in my sleep is the reason
13:59 ^🔗	nwh	so, any of you know how to set up an EC2 instance with a GPU?
14:01 ^🔗	Smiley	nope
14:01 ^🔗	nwh	they're not even on the damn lists.
14:02 ^🔗	nwh	is there anywhere that WOULD know?
14:05 ^🔗	godane	i found 10mins of news coverage
14:05 ^🔗	godane	its from good day oregon
14:06 ^🔗	*	nwh twitches
14:11 ^🔗	godane	the video was with the guy that owns cedmagic.com
14:37 ^🔗	godane	i'm past the number of files on wayback machine for cedmagic.com
15:48 ^🔗	godane	is there a way to stop urls with multiple / from downloading
15:53 ^🔗	godane	i will see if adding /// to reject-regex works
15:54 ^🔗	soultcer	Ah, you mean URLs which have multiple "/" in them
15:55 ^🔗	godane	yes
15:55 ^🔗	soultcer	I know heritrix has a filter for that, but I don't know anything for wget
15:55 ^🔗	godane	it has reject-regex
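
A rough Python sketch for trying out a candidate pattern before handing it to wget's reject-regex option. The pattern `[^:]//` is an assumption, not something tested in the channel: it looks for a double slash that is not the one right after the scheme's colon. It avoids lookbehind so that it should also be valid in the POSIX regex syntax wget uses by default, but that has not been verified against wget itself.

```python
import re

# Hypothetical pattern: "//" preceded by anything other than ":",
# so the "//" in "http://" is not counted.
REPEATED_SLASH = re.compile(r"[^:]//")


def has_repeated_slash(url):
    """Return True if the URL contains a run of slashes outside the scheme separator."""
    return REPEATED_SLASH.search(url) is not None
```

Running the URLs from a previous crawl log through has_repeated_slash is a cheap way to see what the pattern would have rejected before restarting a long download.
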
18:12 ^🔗	kennethre	GLaDOS: what's up?
19:00 ^🔗	alard	kennethre: I think GLaDOS wanted to ask you about the ArchiveTeam warrior buildpack. The Python buildpack failed because of this https://github.com/heroku/heroku-buildpack-python/issues/79
19:00 ^🔗	kennethre	alard: ah well my response is the proper answer :)
19:00 ^🔗	alard	But that's fixed now that the AT buildpack uses the latest Python-buildpack tag.
19:00 ^🔗	kennethre	excellent
19:00 ^🔗	alard	So I think GLaDOS is running one Yahoo Messages instance on Heroku now.
19:00 ^🔗	kennethre	awesome
19:01 ^🔗	kennethre	i was going to run some
19:01 ^🔗	kennethre	soon
19:02 ^🔗	alard	Cool. There's a strong competition this time.
21:22 ^🔗	ersi	http://i.imgur.com/z0R4kXI.jpg
21:22 ^🔗	ersi	lul wut
22:05 ^🔗	Smiley	fuck knows
22:05 ^🔗	Smiley	"i think i'm cool because i charged someone $24 for a dongle" ?
22:07 ^🔗	ersi	I was just thinking of the PyCon debacle the whole time
22:08 ^🔗	Smiley	ersi: that too
22:16 ^🔗	ersi	this movie is kinda dope
22:16 ^🔗	ersi	Will Ferrell, time travel and dinosaurs - do I need to say more?
22:29 ^🔗	ivan`	https://www.youtube.com/user/ISO8 who likes trains? ;)
22:29 ^🔗	ivan`	I'm running low on disk after 422GB of k-pop
22:29 ^🔗	ersi	oooh, k-pop
22:30 ^🔗	ersi	hey! I've been on that user and watched some videos before
22:30 ^🔗	ivan`	that was https://www.youtube.com/user/godmd6 which I have 1 copy of
22:30 ^🔗	ivan`	there are at least two great cab view videos in ISO8
22:31 ^🔗	ivan`	https://www.youtube.com/watch?v=632rDJGrH1M https://www.youtube.com/watch?v=cW7IdpV49h0
22:34 ^🔗	ivan`	more, actually
22:45 ^🔗	ersi	huh, Jason Segel was in Slackers
22:53 ^🔗	joepie91	<ivan`>I'm running low on disk after 422GB of k-pop
22:53 ^🔗	joepie91	someone I know would virtually orgasm if he read this
