it's an rca home theater vcr
it powers on but the machine will not take the tape
so i now have a tape hitting the 10000k mark
not of your tapes didn't for some reason
anyways i'm only putting this tape at 6000k now
mostly cause i only capture tv stuff around 5000k
btw i got the money from patreon now
glad to hear you got the payment situation fixed godane
btw i found a youtube channel with all Sightings episodes
i'm grabbing it for the archive and myspleen cause they been looking for all episodes of it
so got sci-fi airing of star trek
why does sci-fi airing it matter? just for commercials and stuff?
mundus: what's up with your server?
it has intro with william shatner talking about the episode
so it's for commercials and stuff
sometimes bad edits on stations
anyways i got 14 tapes from the guy for $10.01
godane: I did some more digging and found a way to get full PDFs. Details here https://verifiedjoseph.com/f68qUv7lqs/archiveteam/pagesuite-pdfs.txt (i hope it makes sense)
second, can't afford it
mundus: how much was it costing?
$7/mo
hmm
Bandwidth cost?
I know it's not much
But I don't have much money
Unlimited bw
Does anyone know where I can find the old imdb database? Or a movie database dataset?
Can someone archive this? ftp://ftp.fu-berlin.de/pub/misc/movies/database/temporaryaccess/
https://sourceforge.net/p/imdbpy/mailman/message/35922484/
And perhaps this ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/
IMDB got rid of their database dumps and got a new format but it is missing a lot of data
a lot of cast / crew meta
second: how big are those?
A few gigs
Post said it was
Old files: 49 files, 1.9 GB
nods
New files: 6 files, 361 MB on S3
I can download the S3 stuff and pay for it, just need to know where I can upload it for archiveteam to take it and how they want it
Or I can pay someone to get it and they give me a copy ;D
second: you can upload it to the Internet Archive.
Just make an account (all you need is an email address, which will be permanently and (not very) publicly attached to whatever you upload).
You can download each file and upload it before downloading the next, which should avoid you needing to hold on to larger amounts of space.
How is the IA doing with space?
second: they've got plenty. a few gigs won't even be noticed.
I hope they have backups, also california is on fire, I hope the IA doesn't burn down too
a few *terabytes* wouldn't be noticed
once you get up to a petabyte, it's polite to ask first.
(I'm somewhat exaggerating, but only somewhat)
They have backups; they are working on a backup in Canada, although I haven't heard much about it lately.
second: I'm grabbing the fu-berlin one now.
second: also grabbing
Up to 2.8G so far
second: it looks like the other address, funet.fi, is a mirror; are you sure the data is different?
Somebody2: IMDB releases snapshots with diffs--compare something outside the diffs folder
They will either be the same data at 2 points in time or the same point in time is what I was trying to convey
kisspunch: not sure what you mean? the reported file sizes are identical
the timestamps are a few hours different
Somebody2: I am saying, compare a file that's not a diff if you're going to do that check. In any case, I'm just grabbing both and running a deduplicator after
sounds good. Let me know if your deduplication finds that they are different, and I'll grab the second one.
Pretty sure they're the same (actors.list.gz is the same size) but I'll double check tomorrow or so
Expect it to be 14G
My internet's not that fast, I just have an old dump :)
JAA: send me an SSH public key over query or email or whatnot, I can grant you access to archivebot@archivebot-proto2 and then you can register new pipelines
YAYAYAY! New pipeline energy!
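(Editorial sketch, not part of the log.) The mirror check discussed above, comparing files outside the diffs folder rather than trusting matching sizes, could be done by hashing both downloads. The function names and the `skip_dirs` default below are assumptions made for illustration; nobody in the channel says which deduplicator or hash was actually used.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte dumps never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_mirror(root, skip_dirs=("diffs",)):
    """Map relative path -> digest, skipping directories named in skip_dirs."""
    root = Path(root)
    skipped = set(skip_dirs)
    index = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # rel.parts[:-1] are the parent directories of this entry.
        if path.is_file() and not skipped & set(rel.parts[:-1]):
            index[rel.as_posix()] = sha256_of(path)
    return index


def compare_mirrors(root_a, root_b, skip_dirs=("diffs",)):
    """Return (differing, only_in_a, only_in_b) as sorted relative paths."""
    a = index_mirror(root_a, skip_dirs)
    b = index_mirror(root_b, skip_dirs)
    differing = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return differing, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())
```

Three empty lists would mean the two mirrors hold the same non-diff files, so the second download adds nothing. Equal file sizes alone do not show that, which is why the contents are hashed.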
https://blog.archive.org/2017/10/10/books-from-1923-to-1941-now-liberated/
One of the points about this focuses on whether copies can be bought for a fair price.
If there are only a few copies, can someone buy them, and announce that they are no longer for sale, and thereby trigger section 108(h)?
How shockingly reasonable of US copyright law.
pikhq: yeah, ain't it?
second: I've now got the fu-berlin one; it's 13G in size. I'll wait to hear from kisspunch about whether the funet.fi one is different before going after that.
ugh, there are so many versions of fdupes
Somebody2: It's the same.
yipdw: Excellent, will do in a bit.
Thank you Somebody2
Did anyone by chance download the aws bucket for the imdb data?
Since you have to pay S3's exorbitant bandwidth fees (it's a Requester-Pays bucket), I kind of doubt it. I believe IMDB is still working on an HTTP interface without those fees. See: https://getsatisfaction.com/imdb/topics/imdb-data-now-available-in-amazon-s3
Them not having the HTTP interface up seems to be the reason why the FTP servers are still online.
Does anyone know where I can find a last.fm dump?
Is there a channel for Amazon Forum archival?
qw3rty3: No, there isn't.
re that new order forum. 125$ for a forum that would run on a 5$ host... wtf
I don't know the story behind this case, but I've seen similar setups before, and there it was a matter of "never change a running system" mixed with "I'm too lazy to do anything about it".
VerifiedJ: that's basically a slower version of pdfcat though, it's not pristine so to speak
second: a quick google search gives me https://www.demonforums.net/Thread-Last-fm-Dump-Re-upload https://leakninja.com/39243-lastfm-1-8gb-dump-12.html
oh hey, https://btdig.com/85f39f1d94917d61277725e7da85d8177a5c12eb/ /last.fm/lastfm.txt.gz
Any way to upload a torrent larger than 100gb to internetarchive?
What's the best way to archive different source code repositories? I know about svnrdump for SVN repos, but what about other software?
git, Mercurial, Bazaar, CVS, etc. In particular, what to do if the repository itself is not public but only accessible through a web frontend? (There's an ArchiveBot job currently grabbing a CVSweb instance; that's the immediate trigger for these questions, though I've been wondering about it for longer.)
git clone etc
github-backup for stuff on github that might have other useful things like issues, wiki pages, etc
For git clone the harder part is keeping your mirror up to date--the initial clone yeah, git clone works fine
that guy that I asked to PM me about the whole deal involving NCIX got back to me and he straight up refused despite having PMed someone else already
/shrug
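(Editorial sketch, not part of the log.) On keeping a git mirror up to date: one common approach, assumed here rather than taken from the channel, is to clone once with `git clone --mirror` and refresh later with `git remote update --prune`. The helper names, the example URL in the usage note, and the "a top-level HEAD file means the mirror already exists" check are all illustrative.

```python
import subprocess
from pathlib import Path


def mirror_command(url, dest):
    """Pick the git command needed to create or refresh a bare mirror."""
    dest = Path(dest)
    # A bare/mirror repository keeps HEAD at its top level (assumed marker).
    if (dest / "HEAD").is_file():
        return ["git", "-C", str(dest), "remote", "update", "--prune"]
    return ["git", "clone", "--mirror", url, str(dest)]


def sync_mirror(url, dest):
    """Run the chosen command; raises CalledProcessError if git fails."""
    subprocess.run(mirror_command(url, dest), check=True)
```

Running `sync_mirror("https://example.org/project.git", "project.git")` on a schedule would clone on the first run and fetch new refs on later ones. This only covers what git itself stores; issues and wiki pages still need something like the github-backup tool mentioned above.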