#archiveteam 2012-01-01,Sun

Time Nickname Message
00:26 godane do you guys backup slashdot?
00:31 SketchCow Not well enough
01:05 bsmith093 SketchCow: isn't that specifically against most CC licenses?
01:09 chronomex CC-NC only
02:17 irata hi all, i'm trying to do a little tiny personal archive project, and have a question.
02:18 irata i had an old Lycos HTMLGear guestbook, and want to archive it. Of course Lycos doesn't give any d/l option.
02:20 irata it displays five entries per page, and you navigate between pages by clicking a form submit button.
02:20 irata my question is: what kind of tools could anyone recommend for grabbing data from a site set up this way?
02:35 irata :-\
02:36 Wyatt|Wor Hmm, have you tried wget on it? I'm still pretty low level at this archival stuff, but it's kind of the go-to tool.
02:37 irata yeah, i'm a total newbie in this regard. i figured if anyone could tell me what to do, it'd be archiveteam. :P
02:47 irata ok, so i'm looking through wget's docs right now... but it seems like it only follows links. problem for me is this guestbook doesn't use links to go from page to page. it uses a form submit button
02:47 Wyatt|Wor Uhm, hang on. I know I dealt with something like this a couple weeks ago...
02:48 Schbirid maybe the URLs are easy?
02:48 Schbirid e.g. just some number increments
02:48 Schbirid i should go to bed
02:48 Schbirid ugh
02:48 Schbirid 4am
02:48 Schbirid good night =D
02:48 Schbirid and happy new year
02:48 irata night
02:49 Wyatt|Wor There is that.
02:50 Wyatt|Wor And if not, look into using --post-file or --post-data
02:50 irata since it uses form submits, the url never actually changes
02:53 Wyatt|Wor (I was hacking a module into wgetpaste for our internal pastebin and I learned of those from that)
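
A minimal sketch of the --post-data approach Wyatt|Wor suggests, done here in Python so the pagination loop is explicit. The endpoint URL and the form field names ("guestbook_id", "page") are hypothetical stand-ins; the real names have to be read out of the guestbook form's HTML source:

    # Sketch: archive a form-paginated guestbook by replaying its POST requests.
    import urllib.parse
    import urllib.request

    URL = "http://htmlgear.lycos.com/guest/view"  # hypothetical endpoint

    def fetch_page(page_num):
        # Encode the fields the submit button would have sent.
        data = urllib.parse.urlencode({
            "guestbook_id": "12345",  # hypothetical hidden field
            "page": str(page_num),    # hypothetical page selector
        }).encode("ascii")
        with urllib.request.urlopen(URL, data=data) as resp:
            return resp.read()

    page = 1
    while True:
        html = fetch_page(page)
        with open("guestbook-page-%04d.html" % page, "wb") as f:
            f.write(html)
        if b"Next" not in html:  # crude stop condition; adjust to the real markup
            break
        page += 1

The wget equivalent would be a shell loop over wget --post-data "guestbook_id=12345&page=N", with the same caveat that the field names must come from the actual form.
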
05:16 bsmith093 woop woop woop HAPPY NEW YEAR!!!11!! but seriously, yay one year left till the end
05:18 dnova one can only hope
05:18 godane i think i may have found something for a twitter clone
05:18 godane called bup
05:19 godane https://github.com/apenwarr/bup
05:21 Wyatt|Wor godane: Don't like status.net?
05:22 godane never seen that one
05:22 Wyatt|Wor It's what identi.ca uses, IIRC.
05:23 Wyatt|Wor OH!
05:23 Wyatt|Wor Twitter backup, not twitter-like
05:23 godane yes
05:24 Wyatt|Wor I misinterpreted "Twitter clone"
05:24 godane i was thinking twitter that's like git/bup
05:25 godane where there is no central server
05:26 Wyatt|Wor bup looks pretty neat, I'll give you that!
05:27 godane https://www.youtube.com/watch?v=u_rOi2OVvwU&feature=channel_video_title
07:08 chronomex go out and party, guys
07:10 Wyatt|Wor Can't. Have qmail to battle.
07:11 SketchCow Happppy neeewww yeearhrfkjdfdhf
07:12 Coderjoe mmm
07:12 Coderjoe a tenth of capn 100 proof
07:13 Coderjoe on an empty stomach over 4 hours. (with 1L of coke)
07:13 Coderjoe in other news, I can still spell?
07:15 Wyatt|Wor Seems to be going well for you in that regard.
07:18 Coderjoe well, I had to call in reinforcements to get home
07:24 Coderjoe btw, zip (without zip64 extensions) is limited to 4 GiB, iirc
07:27 Wyatt|Wor But is that compressed or uncompressed size?
07:28 Coderjoe compressed
07:29 Coderjoe well, as long as you try following the central directory at the end of the file, rather than scanning for zip file-record headers
07:30 Coderjoe the file offset field is, iirc, a 32-bit unsigned integer
07:30 Coderjoe i suppose if you haxed up a multi-volume archive you could get around the problem, as long as no part was over 4 GB
07:31 Coderjoe there is a separate field of the central directory for which "disk" the file starts on
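
Coderjoe's limit is visible in the central directory itself: the per-entry offset and size fields are 32-bit, so without zip64 records nothing in the archive can sit past the 4 GiB mark. A small sketch using Python's standard zipfile module to print those fields (the archive path is a placeholder):

    # Print each member's offset and compressed size as recorded in the
    # central directory; without zip64 extensions both are 32-bit fields,
    # which is where the 4 GiB ceiling comes from.
    import zipfile

    with zipfile.ZipFile("example.zip") as zf:  # placeholder path
        for info in zf.infolist():
            print("%12d %12d %s" % (info.header_offset, info.compress_size, info.filename))

Python's zipfile writes zip64 records automatically when an archive needs them (allowZip64 is on by default), which is the modern way around the limit rather than the multi-volume trick.
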
09:30 DFJustin wonder if that zipview.php can be hooked up to the unarchiver http://wakaba.c3.cx/s/apps/unarchiver
09:30 DFJustin would allow browsing the shareware isos and such too
11:40 emijrp SketchCow: http://www.oclc.org/worldcat/newgrow.htm
15:15 ersi 1L of coke, whoa
18:26 gui77 for mobileme, do you guys download only, then upload on
18:26 gui77 *download, then upload, then download, then upload, and just repeat the process? or do you do both at once?
18:38 Nemo_bis gui77, when I asked (for Splinder) I was told that it's better to keep everything on your disk until data is not published on archive.org
18:38 Nemo_bis otherwise, you should be able to do both at the same time
18:38 Nemo_bis if it makes sense because you have enough bandwidth
19:33 gui77 Nemo_bis: do yo umean until it IS published?
19:33 gui77 *you
19:33 Nemo_bis I suppose so, "until" in English always confuses me
19:33 gui77 aha :) what's your native language?
19:34 Nemo_bis Italian
19:34 gui77 i have enough bandwidth (home connection, not very fast but it's not capped) - my main problem is disk space
19:34 Nemo_bis "finché" or "fintantoché" etc. (roughly "until" / "as long as") can be used for both
19:34 gui77 i'm portuguese!
19:34 Nemo_bis so just rsync with delete option while you keep downloading
19:34 Nemo_bis :)
19:35 gui77 i'll run it the second time (to check) with the delete option, good idea
19:35 gui77 bbl
19:42 Coderjoe no, do not just rsync with delete option
19:42 Coderjoe use the upload script, as it skips incompletes
19:43 Coderjoe (which includes profiles you are currently downloading)
20:59 gui77 Coderjoe: ok then
20:59 gui77 but then i have to erase everything manually, assuming i want to download and re-up more, right?
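
What Coderjoe is warning against: a bare rsync --delete mirrors half-downloaded profiles along with finished ones. A sketch of the safer upload-then-erase pattern, assuming (hypothetically) that in-progress profile directories carry a ".incomplete" marker file; the directory layout and rsync target are placeholders, not the actual mobileme script's conventions:

    # Upload only finished profile directories, then delete each one locally
    # after a clean upload. The ".incomplete" marker, data layout, and rsync
    # target below are assumptions for illustration.
    import pathlib
    import shutil
    import subprocess

    DATA_DIR = pathlib.Path("data")                 # hypothetical layout
    DEST = "rsync://archive.example.org/mobileme/"  # placeholder target

    for profile in sorted(DATA_DIR.iterdir()):
        if not profile.is_dir() or (profile / ".incomplete").exists():
            continue  # skip anything still being downloaded
        subprocess.run(["rsync", "-a", str(profile), DEST], check=True)
        shutil.rmtree(profile)  # erase only after the upload succeeds

This also answers gui77's last question: deleting after each successful upload frees the disk without any manual cleanup pass.
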
